Class: Word2Vec

eclairjs/mllib/feature.Word2Vec

new Word2Vec()

Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns a vector representation for each word in the vocabulary. These vectors can be used as features in natural language processing and machine learning algorithms. We use the skip-gram model in our implementation and the hierarchical softmax method to train the model. The variable names in the implementation match those of the original C implementation. The original C implementation is available at https://code.google.com/p/word2vec/ and the research papers are "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".

Methods

fit(dataset) → {module:eclairjs/mllib/feature.Word2VecModel}

Computes the vector representation of each word in the vocabulary.
Parameters:
Name Type Description
dataset module:eclairjs.RDD an RDD in which each element is a sequence of words (for example, one tokenized sentence)
Returns:
a Word2VecModel
Type
module:eclairjs/mllib/feature.Word2VecModel
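
A minimal sketch of how fit might be combined with the setters documented on this page. The helper name trainWord2Vec is hypothetical, and the Word2Vec constructor and the dataset RDD are passed in as arguments so that the snippet assumes nothing about how the module is loaded or how the RDD is built. The seed value 42 is an arbitrary choice for illustration.

```javascript
// Sketch only: `Word2Vec` is the constructor documented on this page and
// `dataset` is an RDD prepared elsewhere. `trainWord2Vec` is a hypothetical
// helper name, not part of EclairJS.
function trainWord2Vec(Word2Vec, dataset) {
    var word2vec = new Word2Vec()
        .setVectorSize(100)   // documented default
        .setMinCount(5)       // documented default
        .setWindowSize(5)     // documented default
        .setSeed(42);         // arbitrary fixed seed, chosen for illustration
    // Each setter returns the Word2Vec instance, so the calls can be chained.
    return word2vec.fit(dataset);   // returns a Word2VecModel
}
```

Chaining works here because every setter is documented as returning the Word2Vec instance; only fit returns a different type.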

setLearningRate(learningRate) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the initial learning rate (default: 0.025).
Parameters:
Name Type Description
learningRate float
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec

setMinCount(minCount) → {module:eclairjs/mllib/feature.Word2Vec}

Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5).
Parameters:
Name Type Description
minCount integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec
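
The effect of minCount can be illustrated with a small plain-JavaScript sketch. This is not EclairJS code, and buildVocabulary is a hypothetical helper: it only mimics the stated rule that a token must appear at least minCount times to enter the vocabulary.

```javascript
// Illustration of the minCount rule in plain JavaScript; not EclairJS code.
// `buildVocabulary` is a hypothetical helper used only for this example.
function buildVocabulary(tokens, minCount) {
    // A prototype-free object avoids clashes with tokens such as "constructor".
    var counts = Object.create(null);
    tokens.forEach(function (token) {
        counts[token] = (counts[token] || 0) + 1;
    });
    // Keep only the tokens that occur at least minCount times.
    return Object.keys(counts).filter(function (token) {
        return counts[token] >= minCount;
    }).sort();
}
```

With the default of 5, rare tokens are dropped from the vocabulary, so a very small corpus may need a lower minCount to retain any words at all.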

setNumIterations(numIterations) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the number of iterations (default: 1), which should be smaller than or equal to the number of partitions.
Parameters:
Name Type Description
numIterations integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec

setNumPartitions(numPartitions) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the number of partitions (default: 1). Use a small number for better accuracy.
Parameters:
Name Type Description
numPartitions integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec

setSeed(seed) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the random seed (default: a random integer).
Parameters:
Name Type Description
seed integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec

setVectorSize(vectorSize) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the vector size, the number of dimensions in each word vector (default: 100).
Parameters:
Name Type Description
vectorSize integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec

setWindowSize(window) → {module:eclairjs/mllib/feature.Word2Vec}

Sets the window size, the maximum number of words on each side of a target word that are treated as its context (default: 5).
Parameters:
Name Type Description
window integer
Returns:
Type
module:eclairjs/mllib/feature.Word2Vec
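
To show what the window controls in a skip-gram model, the following plain-JavaScript sketch lists the (target, context) pairs produced for a given window. It is an illustration only, not the EclairJS implementation, and skipGramPairs is a hypothetical helper. It always uses the full window, whereas word2vec implementations typically shrink the window randomly at each position.

```javascript
// Illustration of skip-gram context windows in plain JavaScript; not EclairJS code.
// `skipGramPairs` is a hypothetical helper used only for this example.
function skipGramPairs(words, window) {
    var pairs = [];
    for (var i = 0; i < words.length; i++) {
        // Clamp the window to the bounds of the sentence.
        var start = Math.max(0, i - window);
        var end = Math.min(words.length - 1, i + window);
        for (var j = start; j <= end; j++) {
            if (j !== i) {
                pairs.push([words[i], words[j]]);   // [target, context]
            }
        }
    }
    return pairs;
}
```

A larger window yields more pairs per target word, which gives each word a broader context at the cost of more training work.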