Class: LocalLDAModel

eclairjs/mllib/clustering.LocalLDAModel

new LocalLDAModel(topics)

Local LDA model. This model stores only the inferred topics.
Parameters:
Name Type Description
topics module:eclairjs/mllib/linalg.Matrix Inferred topics (vocabSize x k matrix).

Extends

Methods

(static) load(sc, path) → {module:eclairjs/mllib/clustering.LocalLDAModel}

Parameters:
Name Type Description
sc module:eclairjs.SparkContext
path string
Returns:
Type
module:eclairjs/mllib/clustering.LocalLDAModel

describeTopics(maxTermsPerTopic) → {Array.<module:eclairjs.Tuple2>}

Parameters:
Name Type Description
maxTermsPerTopic number
Overrides:
Returns:
Type
Array.<module:eclairjs.Tuple2>
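
The page gives no description for describeTopics. As a rough illustration of what such a method typically computes, the sketch below ranks the terms of each topic by weight and keeps the top maxTermsPerTopic. It assumes the behaviour of the underlying Spark MLlib method (each tuple pairs term indices with L1-normalized term weights, sorted by decreasing weight); the function name describeTopicsSketch and the use of plain nested arrays in place of the EclairJS Matrix and Tuple2 types are hypothetical.

```javascript
// Illustrative sketch only, not the EclairJS implementation.
// topics[termIndex][topicIndex] is the weight of a term in a topic,
// i.e. a vocabSize x k matrix stored as nested arrays (an assumption).
function describeTopicsSketch(topics, maxTermsPerTopic) {
  const k = topics.length > 0 ? topics[0].length : 0;
  const result = [];
  for (let topic = 0; topic < k; topic++) {
    // Sum of the column, used to normalize the topic's weights to 1.
    const total = topics.reduce((sum, row) => sum + row[topic], 0);
    const ranked = topics
      .map((row, termIndex) => ({
        termIndex,
        weight: total > 0 ? row[topic] / total : 0,
      }))
      .sort((a, b) => b.weight - a.weight) // heaviest terms first
      .slice(0, maxTermsPerTopic);
    // One [termIndices, termWeights] pair per topic.
    result.push([ranked.map((e) => e.termIndex), ranked.map((e) => e.weight)]);
  }
  return result;
}

// Three vocabulary terms, two topics.
const exampleTopics = [
  [1, 0],
  [2, 6],
  [5, 2],
];
const topTerms = describeTopicsSketch(exampleTopics, 2);
console.log(JSON.stringify(topTerms));
// prints [[[2,1],[0.625,0.25]],[[1,2],[0.75,0.25]]]
```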

docConcentration() → {module:eclairjs/mllib/linalg.Vector}

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution.
Inherited From:
Returns:
Type
module:eclairjs/mllib/linalg.Vector
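
To make the role of "alpha" concrete: for a Dirichlet distribution with parameter vector alpha, the expected proportion of topic i is alpha[i] divided by the sum of all entries. The sketch below computes that prior mean. The function name dirichletMean and the plain array standing in for the returned Vector are assumptions made for illustration.

```javascript
// Illustrative sketch only: a plain array stands in for the Vector
// returned by docConcentration().
function dirichletMean(alpha) {
  if (alpha.length === 0 || alpha.some((a) => !(a > 0))) {
    throw new Error('Dirichlet parameters must be positive');
  }
  const total = alpha.reduce((sum, a) => sum + a, 0);
  // Expected topic proportions under the prior: alpha[i] / sum(alpha).
  return alpha.map((a) => a / total);
}

const priorMean = dirichletMean([1, 1, 2]);
console.log(priorMean); // prints [ 0.25, 0.25, 0.5 ]
```

A symmetric alpha (all entries equal) gives every topic the same expected share; an asymmetric one favours some topics before any document is seen.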

k() → {number}

Overrides:
Returns:
Type
number

logLikelihoodwithJavaPairRDD(documents) → {number}

Java-friendly version of logLikelihoodwithRDD; takes the corpus as a PairRDD.
Parameters:
Name Type Description
documents module:eclairjs.PairRDD
Returns:
Type
number

logLikelihoodwithRDD(documents) → {number}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).
Parameters:
Name Type Description
documents module:eclairjs.RDD test corpus to use for calculating log likelihood
Returns:
Variational lower bound on the log likelihood of the entire corpus.
Type
number

logPerplexitywithJavaPairRDD(documents) → {number}

Java-friendly version of logPerplexitywithRDD; takes the corpus as a PairRDD.
Parameters:
Name Type Description
documents module:eclairjs.PairRDD
Returns:
Type
number

logPerplexitywithRDD(documents) → {number}

Calculates an upper bound on log perplexity per token. (Lower is better.) See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).
Parameters:
Name Type Description
documents module:eclairjs.RDD test corpus to use for calculating perplexity
Returns:
Variational upper bound on log perplexity per token.
Type
number
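
The per-token bound relates to the corpus-level log-likelihood bound in a simple way. The sketch below assumes the definition used by Spark MLlib, which EclairJS wraps: negate the log-likelihood lower bound and divide by the total number of tokens in the corpus. The function name logPerplexitySketch and the representation of documents as [docId, termCounts] pairs with dense count arrays are hypothetical.

```javascript
// Illustrative sketch only, not the EclairJS implementation.
// documents: array of [docId, termCounts], where termCounts[i] is the
// number of times vocabulary term i occurs in the document (an assumption).
function logPerplexitySketch(logLikelihoodBound, documents) {
  let tokenCount = 0;
  for (const [, termCounts] of documents) {
    for (const count of termCounts) {
      tokenCount += count;
    }
  }
  if (tokenCount === 0) {
    throw new Error('corpus contains no tokens');
  }
  // A lower bound on log likelihood becomes an upper bound on
  // log perplexity once negated and averaged per token.
  return -logLikelihoodBound / tokenCount;
}

// Two documents with 3 and 5 tokens, and a log-likelihood bound of -16.
const corpus = [
  [0, [2, 0, 1]],
  [1, [0, 4, 1]],
];
const perTokenBound = logPerplexitySketch(-16, corpus);
console.log(perTokenBound); // prints 2
```

Because the bound is normalized by token count, values from test corpora of different sizes can be compared directly, which is not true of the raw log-likelihood bound.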

save(sc, path)

Parameters:
Name Type Description
sc module:eclairjs.SparkContext
path string

topicConcentration() → {number}

Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution. Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
Inherited From:
Returns:
Type
number

topicDistributionswithJavaPairRDD(documents) → {module:eclairjs.PairRDD}

Java-friendly version of topicDistributionswithRDD; takes the documents as a PairRDD and returns a PairRDD.
Parameters:
Name Type Description
documents module:eclairjs.PairRDD
Returns:
Type
module:eclairjs.PairRDD

topicDistributionswithRDD(documents) → {module:eclairjs.RDD}

Predicts the topic mixture distribution for each document (often called "theta" in the literature). Returns a vector of zeros for an empty document. This uses a variational approximation following Hoffman et al. (2010), where the approximate distribution is called "gamma." Technically, this method returns this approximation "gamma" for each document.
Parameters:
Name Type Description
documents module:eclairjs.RDD documents to predict topic mixture distributions for
Returns:
An RDD of (document ID, topic mixture distribution for document)
Type
module:eclairjs.RDD

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Overrides:
Returns:
Type
module:eclairjs/mllib/linalg.Matrix

vocabSize() → {number}

Overrides:
Returns:
Type
number