Class: Word2Vec

eclairjs/mllib/feature.Word2Vec

new Word2Vec()

Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns a vector representation for each word in the vocabulary. These vectors can be used as features in natural language processing and machine learning algorithms. We use the skip-gram model in our implementation and the hierarchical softmax method to train the model. The variable names in the implementation match those in the original C implementation. For the original C implementation, see https://code.google.com/p/word2vec/. For research papers, see "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".

Source:

eclairjs/mllib/feature/Word2Vec.js, line 42
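
Every setter documented under Methods lists Word2Vec as its return type, so a configuration helper can apply the setters one after another. The sketch below is only an illustration under that assumption: `configureWord2Vec`, `SETTERS`, and the option keys are hypothetical names, and the constructor is passed in as an argument because how Word2Vec is loaded depends on the EclairJS setup in use.

```javascript
// Hypothetical mapping from option keys to the setters documented on this page.
var SETTERS = {
  vectorSize: 'setVectorSize',
  learningRate: 'setLearningRate',
  numPartitions: 'setNumPartitions',
  numIterations: 'setNumIterations',
  seed: 'setSeed',
  minCount: 'setMinCount',
  windowSize: 'setWindowSize'
};

// Word2VecCtor is assumed to be the Word2Vec constructor documented here.
function configureWord2Vec(Word2VecCtor, options) {
  var opts = options || {};
  var word2vec = new Word2VecCtor();
  Object.keys(SETTERS).forEach(function (key) {
    if (opts[key] !== undefined) {
      // Re-assigning the result works whether the setter returns the same
      // instance or a new one; both are consistent with the documented type.
      word2vec = word2vec[SETTERS[key]](opts[key]);
    }
  });
  return word2vec;
}

// Hypothetical usage, assuming `Word2Vec` and an RDD `words` already exist:
// var model = configureWord2Vec(Word2Vec, { vectorSize: 50, minCount: 2 }).fit(words);
```

Options that are left out are not touched, so the defaults listed for each setter stay in effect.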

Methods

fit(dataset) → {module:eclairjs/mllib/feature.Word2VecModel}

Computes the vector representation of each word in the vocabulary.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs.RDD	an RDD of words

Source:

eclairjs/mllib/feature/Word2Vec.js, line 141

Returns:

a Word2VecModel

Type: module:eclairjs/mllib/feature.Word2VecModel

setLearningRate(learningRate) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the initial learning rate (default: 0.025).

Parameters:

Name	Type	Description
`learningRate`	float

Source:

eclairjs/mllib/feature/Word2Vec.js, line 73

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec
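
The word "initial" implies that the rate changes during training. In the original C implementation the rate decays linearly with the share of training words already processed and never drops below 0.0001 times the starting value. The sketch below illustrates that schedule only; it assumes, without confirmation from this page, that the implementation here behaves similarly. The function and parameter names are hypothetical.

```javascript
// Linear learning-rate decay as in the original C word2vec (illustrative only).
// initialRate:      the value passed to setLearningRate
// wordsProcessed:   number of training words seen so far
// totalTrainWords:  total number of words to be trained on
function decayedLearningRate(initialRate, wordsProcessed, totalTrainWords) {
  var alpha = initialRate * (1 - wordsProcessed / (totalTrainWords + 1));
  var floor = initialRate * 0.0001; // lower bound on the rate
  return alpha < floor ? floor : alpha;
}
```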

setMinCount(minCount) → {module:eclairjs/mllib/feature.Word2Vec}

Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5).

Parameters:

Name	Type	Description
`minCount`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 130

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec
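
To make the effect of minCount concrete, the following plain JavaScript sketch counts tokens in an in-memory array and keeps those that appear at least minCount times. It does not use the EclairJS API; `buildVocabulary` is a hypothetical helper that only mirrors the filtering rule described above.

```javascript
// Returns the sorted list of tokens that occur at least `minCount` times.
function buildVocabulary(tokens, minCount) {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  var counts = Object.create(null);
  tokens.forEach(function (token) {
    counts[token] = (counts[token] || 0) + 1;
  });
  return Object.keys(counts)
    .filter(function (word) { return counts[word] >= minCount; })
    .sort();
}

// Example: with minCount = 2, "c" (seen once) is dropped.
// buildVocabulary(['a', 'b', 'a', 'c', 'a', 'b'], 2) returns ['a', 'b']
```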

setNumIterations(numIterations) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the number of iterations (default: 1), which should be smaller than or equal to the number of partitions.

Parameters:

Name	Type	Description
`numIterations`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 96

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec

setNumPartitions(numPartitions) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the number of partitions (default: 1). Use a small number for accuracy.

Parameters:

Name	Type	Description
`numPartitions`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 84

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec

setSeed(seed) → {module:eclairjs/mllib/feature.Word2Vec}

Sets random seed (default: a random integer).

Parameters:

Name	Type	Description
`seed`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 107

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec

setVectorSize(vectorSize) → {module:eclairjs/mllib/feature.Word2Vec}

Sets vector size (default: 100).

Parameters:

Name	Type	Description
`vectorSize`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 62

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec

setWindowSize(window) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the window size, i.e. the maximum number of context words considered on each side of a target word (default: 5).

Parameters:

Name	Type	Description
`window`	integer

Source:

eclairjs/mllib/feature/Word2Vec.js, line 118

Returns:

Type: module:eclairjs/mllib/feature.Word2Vec
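
As an illustration of what the window controls in a skip-gram model, the plain JavaScript sketch below lists the (target, context) pairs produced from one sentence with a fixed window. It is a simplification and not the implementation: the original C code additionally shrinks the window by a random amount for each target word, which is omitted here. `skipGramPairs` is a hypothetical helper name.

```javascript
// Returns [target, context] pairs for every word and each neighbour that lies
// at most `window` positions to its left or right.
function skipGramPairs(sentence, window) {
  var pairs = [];
  for (var i = 0; i < sentence.length; i++) {
    var start = Math.max(0, i - window);
    var end = Math.min(sentence.length - 1, i + window);
    for (var j = start; j <= end; j++) {
      if (j !== i) {
        pairs.push([sentence[i], sentence[j]]);
      }
    }
  }
  return pairs;
}

// Example: skipGramPairs(['a', 'b', 'c'], 1) returns
// [['a', 'b'], ['b', 'a'], ['b', 'c'], ['c', 'b']]
```

A larger window yields more pairs per word and captures broader context, at the cost of more training work per sentence.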