new Word2Vec()
Word2Vec creates vector representations of words in a text corpus.
The algorithm first constructs a vocabulary from the corpus
and then learns a vector representation for each word in the vocabulary.
These vector representations can be used as features in
natural language processing and machine learning algorithms.
This implementation uses the skip-gram model and trains it with the
hierarchical softmax method. The variable names in the implementation
match those of the original C implementation.
For the original C implementation, see https://code.google.com/p/word2vec/
For research papers, see
Efficient Estimation of Word Representations in Vector Space
and
Distributed Representations of Words and Phrases and their Compositionality.
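As a rough illustration of using the learned vectors as features, the sketch below compares word vectors by cosine similarity. It is plain JavaScript and does not call EclairJS; the function name `cosineSimilarity` and the three-dimensional vectors are made up for the example and do not come from a trained model.

```javascript
// Cosine similarity between two numeric vectors of equal length.
// Returns a value in [-1, 1]; returns 0 if either vector is all zeros.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical vectors, invented for illustration only.
const exampleVectors = {
  king: [0.8, 0.1, 0.3],
  queen: [0.7, 0.2, 0.3],
  apple: [-0.2, 0.9, -0.4]
};

console.log(cosineSimilarity(exampleVectors.king, exampleVectors.queen));
console.log(cosineSimilarity(exampleVectors.king, exampleVectors.apple));
```

With these invented values, the first pair scores higher than the second, which is the kind of signal a downstream algorithm could use as a feature.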
Methods
fit(dataset) → {module:eclairjs/mllib/feature.Word2VecModel}
Computes the vector representation of each word in the vocabulary.
Parameters:
Name | Type | Description |
---|---|---|
dataset | module:eclairjs.RDD | an RDD of words |
Returns:
a Word2VecModel
setLearningRate(learningRate) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the initial learning rate (default: 0.025).
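The word "initial" implies that the rate changes during training. The sketch below shows one possible schedule, assuming a linear decay with a small floor, similar in spirit to the original C implementation. The function name `decayedLearningRate`, its parameters, and the floor factor of 0.0001 are assumptions for illustration; the schedule EclairJS actually applies is not stated in this documentation.

```javascript
// Linearly decays the learning rate as training progresses.
// initialRate:    the value passed to setLearningRate (e.g. 0.025)
// wordsProcessed: number of training words seen so far (hypothetical counter)
// totalWords:     total number of training words (hypothetical)
function decayedLearningRate(initialRate, wordsProcessed, totalWords) {
  // Assumed lower bound so the rate never reaches zero.
  const floor = initialRate * 0.0001;
  const rate = initialRate * (1 - wordsProcessed / (totalWords + 1));
  return Math.max(rate, floor);
}

console.log(decayedLearningRate(0.025, 0, 1000));   // start of training
console.log(decayedLearningRate(0.025, 500, 1000)); // roughly halfway
```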
Parameters:
Name | Type | Description |
---|---|---|
learningRate | float | |
Returns:
setMinCount(minCount) → {module:eclairjs/mllib/feature.Word2Vec}
Sets minCount, the minimum number of times a token must appear to be included in the word2vec
model's vocabulary (default: 5).
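To make the effect of minCount concrete, here is a minimal sketch of vocabulary construction with a frequency cutoff. It is plain JavaScript rather than the EclairJS implementation; the function name `buildVocabulary` and the sample tokens are hypothetical.

```javascript
// Counts token occurrences and keeps only tokens that appear
// at least minCount times, ordered by descending frequency.
function buildVocabulary(tokens, minCount) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  const vocabulary = [];
  for (const [word, count] of counts) {
    if (count >= minCount) {
      vocabulary.push({ word: word, count: count });
    }
  }
  vocabulary.sort((x, y) => y.count - x.count);
  return vocabulary;
}

// Hypothetical corpus: "c" appears once and is dropped when minCount is 2.
const sampleTokens = ['a', 'b', 'a', 'c', 'a', 'b'];
console.log(buildVocabulary(sampleTokens, 2));
```

A higher minCount shrinks the vocabulary and removes rare tokens, for which there is little data to learn a useful vector.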
Parameters:
Name | Type | Description |
---|---|---|
minCount | integer | |
Returns:
setNumIterations(numIterations) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the number of iterations (default: 1), which should be smaller than or equal to the number of
partitions.
Parameters:
Name | Type | Description |
---|---|---|
numIterations | integer | |
Returns:
setNumPartitions(numPartitions) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the number of partitions (default: 1). Use a small number for accuracy.
Parameters:
Name | Type | Description |
---|---|---|
numPartitions | integer | |
Returns:
setSeed(seed) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the random seed (default: a random integer).
Parameters:
Name | Type | Description |
---|---|---|
seed | integer | |
Returns:
setVectorSize(vectorSize) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the vector size, the number of dimensions of each word vector (default: 100).
Parameters:
Name | Type | Description |
---|---|---|
vectorSize | integer | |
Returns:
setWindowSize(window) → {module:eclairjs/mllib/feature.Word2Vec}
Sets the size of the context window of words (default: 5).
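In the skip-gram model, the window determines which neighboring words are paired with each center word during training. The sketch below generates (center, context) pairs for a fixed window in plain JavaScript. The function name `skipGramPairs` is hypothetical, and using the full window at every position is a simplification: the original C implementation additionally shrinks the window at random for each position.

```javascript
// Generates [center, context] pairs for every word in the sentence,
// pairing it with up to `window` words on each side.
function skipGramPairs(sentence, window) {
  const pairs = [];
  for (let i = 0; i < sentence.length; i++) {
    const start = Math.max(0, i - window);
    const end = Math.min(sentence.length - 1, i + window);
    for (let j = start; j <= end; j++) {
      if (j !== i) {
        pairs.push([sentence[i], sentence[j]]);
      }
    }
  }
  return pairs;
}

console.log(skipGramPairs(['a', 'b', 'c'], 1));
// [ [ 'a', 'b' ], [ 'b', 'a' ], [ 'b', 'c' ], [ 'c', 'b' ] ]
```

A larger window produces more pairs per word and captures broader context, at the cost of more training work.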
Parameters:
Name | Type | Description |
---|---|---|
window | integer | |