new LocalLDAModel(topics)
Local LDA model.
This model stores only the inferred topics.
Parameters:
Name | Type | Description |
---|---|---|
topics | | Inferred topics (vocabSize x k matrix). |
Extends
Methods
(static) load(sc, path) → {module:eclairjs/mllib/clustering.LocalLDAModel}
Parameters:
Name | Type | Description |
---|---|---|
sc | module:eclairjs.SparkContext | |
path | string | |
Returns:
- Type
- module:eclairjs/mllib/clustering.LocalLDAModel
describeTopics(maxTermsPerTopic) → {Array.<module:eclairjs.Tuple2>}
Parameters:
Name | Type | Description |
---|---|---|
maxTermsPerTopic | number | |
- Overrides:
Returns:
- Type
- Array.<module:eclairjs.Tuple2>
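The entry above gives no description, so the following is only a rough sketch of what a call like this typically computes, assuming EclairJS mirrors Spark MLlib, where each topic is described by a pair of matching arrays (term indices, term weights) sorted by decreasing weight. `describeTopicsSketch` is a hypothetical helper written for illustration; it works on a plain array of arrays rather than an EclairJS Matrix and is not part of the library.

```javascript
// Hypothetical helper, not part of EclairJS.
// `topics` is a plain vocabSize x k array of arrays: rows are terms, columns are topics.
function describeTopicsSketch(topics, maxTermsPerTopic) {
  const vocabSize = topics.length;
  const k = vocabSize > 0 ? topics[0].length : 0;
  const result = [];
  for (let topic = 0; topic < k; topic++) {
    // Pair each term index with its weight in this topic's column.
    const weighted = [];
    for (let term = 0; term < vocabSize; term++) {
      weighted.push([term, topics[term][topic]]);
    }
    // Sort by descending weight and keep the requested number of terms.
    weighted.sort((a, b) => b[1] - a[1]);
    const top = weighted.slice(0, maxTermsPerTopic);
    // One [termIndices, termWeights] pair per topic.
    result.push([top.map((p) => p[0]), top.map((p) => p[1])]);
  }
  return result;
}

// Made-up 3-term, 2-topic matrix for illustration.
const exampleTopics = [
  [0.5, 0.1],
  [0.3, 0.2],
  [0.2, 0.7],
];
const described = describeTopicsSketch(exampleTopics, 2);
console.log(JSON.stringify(described));
```

In this sketch a JavaScript two-element array stands in for the Tuple2 that the real method returns.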
docConcentration() → {module:eclairjs/mllib/linalg.Vector}
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This is the parameter to a Dirichlet distribution.
- Inherited From:
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
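To give a feel for what the concentration vector expresses, the sketch below computes the mean of a Dirichlet distribution, which is each entry of alpha divided by the sum of all entries. `dirichletMean` is a hypothetical helper operating on a plain array, not an EclairJS call, and the alpha values are made up.

```javascript
// Hypothetical helper, not part of EclairJS.
// Mean of Dirichlet(alpha): E[theta_i] = alpha_i / sum(alpha).
function dirichletMean(alpha) {
  const total = alpha.reduce((sum, a) => sum + a, 0);
  return alpha.map((a) => a / total);
}

// Made-up concentration values for a 3-topic model.
const priorMean = dirichletMean([0.5, 0.5, 1.0]);
console.log(priorMean);
```

Larger entries in alpha pull the expected topic proportions toward the corresponding topics before any document text is seen.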
k() → {number}
- Overrides:
Returns:
- Type
- number
logLikelihoodwithJavaPairRDD(documents) → {number}
Java-friendly version of logLikelihood
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.PairRDD | |
Returns:
- Type
- number
logLikelihoodwithRDD(documents) → {number}
Calculates a lower bound on the log likelihood of the entire corpus.
See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.RDD | test corpus to use for calculating log likelihood |
Returns:
Variational lower bound on the log likelihood of the entire corpus.
- Type
- number
logPerplexitywithJavaPairRDD(documents) → {number}
Java-friendly version of logPerplexity
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.PairRDD | |
Returns:
- Type
- number
logPerplexitywithRDD(documents) → {number}
Calculates an upper bound on log perplexity per token. (Lower is better.)
See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.RDD | test corpus to use for calculating perplexity |
Returns:
Variational upper bound on log perplexity per token.
- Type
- number
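The relationship between the two bounds can be sketched as follows, assuming EclairJS follows Spark MLlib, where the log perplexity bound is the negated log likelihood bound divided by the total number of tokens in the corpus. `logPerplexityFromBound` is a hypothetical helper that takes plain arrays of term counts; the corpus and the bound value are made up.

```javascript
// Hypothetical helper, not part of EclairJS.
// Converts a corpus-level log-likelihood lower bound into a per-token
// log-perplexity upper bound: negate it and divide by the token count.
function logPerplexityFromBound(logLikelihoodBound, termCountVectors) {
  let tokenCount = 0;
  for (const counts of termCountVectors) {
    for (const c of counts) {
      tokenCount += c;
    }
  }
  return -logLikelihoodBound / tokenCount;
}

// Two made-up documents over a 3-term vocabulary (8 tokens in total).
const corpus = [
  [2, 0, 1],
  [0, 3, 2],
];
const logLikelihoodBound = -16.0; // made-up value for illustration
const logPerplexity = logPerplexityFromBound(logLikelihoodBound, corpus);
console.log(logPerplexity);
```

Because the likelihood bound is negated, a higher (less negative) log likelihood corresponds to a lower log perplexity, which is why lower is better here.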
save(sc, path)
Parameters:
Name | Type | Description |
---|---|---|
sc | module:eclairjs.SparkContext | |
path | string | |
topicConcentration() → {number}
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper
by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
- Inherited From:
Returns:
- Type
- number
topicDistributionswithJavaPairRDD(documents) → {module:eclairjs.PairRDD}
Java-friendly version of topicDistributions
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.PairRDD | |
Returns:
- Type
- module:eclairjs.PairRDD
topicDistributionswithRDD(documents) → {module:eclairjs.RDD}
Predicts the topic mixture distribution for each document (often called "theta" in the
literature). Returns a vector of zeros for an empty document.
This uses a variational approximation following Hoffman et al. (2010), where the approximate
distribution is called "gamma." Technically, this method returns this approximation "gamma"
for each document.
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs.RDD | documents to predict topic mixture distributions for |
Returns:
An RDD of (document ID, topic mixture distribution for document)
- Type
- module:eclairjs.RDD
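As a rough illustration of the last step described above, the sketch below turns a variational parameter vector "gamma" into a topic mixture by rescaling it to sum to 1 (an assumption based on how Spark MLlib reports the result), and returns zeros for an empty document. `topicMixtureSketch` is a hypothetical helper on plain arrays; it does not run the variational inference that produces gamma.

```javascript
// Hypothetical helper, not part of EclairJS.
// `termCounts` is a document's term-count vector; `gamma` is an already
// inferred variational parameter vector with one entry per topic.
function topicMixtureSketch(termCounts, gamma) {
  const isEmpty = termCounts.every((c) => c === 0);
  if (isEmpty) {
    // Empty documents get a vector of zeros, one entry per topic.
    return gamma.map(() => 0);
  }
  // Rescale gamma so the entries sum to 1.
  const total = gamma.reduce((sum, g) => sum + g, 0);
  return gamma.map((g) => g / total);
}

// Made-up values for a 3-topic model.
const mixture = topicMixtureSketch([1, 0, 4], [2, 6, 2]);
const emptyMixture = topicMixtureSketch([0, 0, 0], [2, 6, 2]);
console.log(mixture, emptyMixture);
```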
topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}
- Overrides:
Returns:
- Type
- module:eclairjs/mllib/linalg.Matrix
vocabSize() → {number}
- Overrides:
Returns:
- Type
- number