Class: Word2Vec

eclairjs/mllib/feature.Word2Vec

new Word2Vec()

Word2Vec creates vector representations of words in a text corpus. The algorithm first constructs a vocabulary from the corpus and then learns a vector representation for each word in the vocabulary. The vector representations can be used as features in natural language processing and machine learning algorithms. The implementation uses the skip-gram model and trains it with the hierarchical softmax method. The variable names in the implementation match those of the original C implementation. For the original C implementation, see https://code.google.com/p/word2vec/. For research papers, see "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".

Methods

fit(dataset) → {module:eclairjs/mllib/feature.Word2VecModel}

Computes the vector representation of each word in the vocabulary.
Parameters:
Name Type Description
dataset module:eclairjs/rdd.RDD an RDD of words
Returns:
a Word2VecModel
Type
module:eclairjs/mllib/feature.Word2VecModel

setLearningRate(learningRate)

Sets the initial learning rate (default: 0.025).
Parameters:
Name Type Description
learningRate number
Returns:

setMinCount(minCount)

Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5).
Parameters:
Name Type Description
minCount number
Returns:
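
The effect of minCount on the vocabulary can be illustrated with a minimal sketch in plain JavaScript. It does not use Spark or EclairJS, and `buildVocabulary` is a hypothetical helper written only for this illustration; it assumes a token is kept when its count is greater than or equal to minCount.

```javascript
// Plain JavaScript illustration; no Spark or EclairJS involved.
// buildVocabulary is a hypothetical helper, not part of the EclairJS API.
function buildVocabulary(tokens, minCount) {
  // Count how often each token occurs in the corpus.
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  // Keep only tokens that occur at least minCount times.
  const vocabulary = new Map();
  for (const [token, count] of counts) {
    if (count >= minCount) {
      vocabulary.set(token, count);
    }
  }
  return vocabulary;
}

const tokens = "the cat sat on the mat the cat ran".split(" ");
const vocab = buildVocabulary(tokens, 2);
console.log([...vocab.keys()]); // only "the" (3 occurrences) and "cat" (2) remain
```

With the default of 5, none of the tokens in this tiny corpus would survive, which is why small test corpora usually need a lower minCount.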

setNumIterations(numIterations)

Sets the number of iterations (default: 1), which should be smaller than or equal to the number of partitions.
Parameters:
Name Type Description
numIterations number
Returns:

setNumPartitions(numPartitions)

Sets the number of partitions (default: 1). Use a small number of partitions for better accuracy.
Parameters:
Name Type Description
numPartitions number
Returns:

setSeed(seed)

Sets random seed (default: a random long integer).
Parameters:
Name Type Description
seed number
Returns:

setVectorSize(vectorSize)

Sets vector size (default: 100).
Parameters:
Name Type Description
vectorSize number
Returns:

setWindowSize(window)

Sets the window of words (default: 5).
Parameters:
Name Type Description
window number
Returns:
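
To give a rough idea of what the window controls in a skip-gram model, the sketch below lists the (center word, context word) pairs produced for one sentence. It is plain JavaScript with no Spark or EclairJS dependency, and `contextPairs` is a hypothetical helper. It assumes the full window is always used on both sides of the center word; real implementations may sample a smaller window at each position.

```javascript
// Plain JavaScript illustration; no Spark or EclairJS involved.
// contextPairs is a hypothetical helper, not part of the EclairJS API.
function contextPairs(sentence, window) {
  const pairs = [];
  for (let i = 0; i < sentence.length; i++) {
    // Clamp the window to the sentence boundaries.
    const start = Math.max(0, i - window);
    const end = Math.min(sentence.length - 1, i + window);
    for (let j = start; j <= end; j++) {
      if (j !== i) {
        pairs.push([sentence[i], sentence[j]]); // [center, context]
      }
    }
  }
  return pairs;
}

const pairs = contextPairs(["a", "b", "c", "d"], 1);
console.log(pairs.length); // 6 pairs: a-b, b-a, b-c, c-b, c-d, d-c
```

A larger window yields more pairs per word and lets more distant words count as context.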