JSDoc: Class: LocalLDAModel

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution.

Inherited From:

module:eclairjs/mllib/clustering.LDAModel#docConcentration

Source:

eclairjs/mllib/clustering/LDAModel.js, line 70

Returns:

Type: module:eclairjs/mllib/linalg.Vector
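
As a rough illustration of what the concentration vector encodes: under a Dirichlet prior with parameter alpha, the expected share of topic i in a document is alpha[i] divided by the sum of alpha. The sketch below uses a plain JavaScript array in place of the returned Vector. The helper name `expectedTopicShares` and the sample values are hypothetical and are not part of EclairJS.

```javascript
// Hypothetical helper, not part of EclairJS: prior mean of a Dirichlet(alpha).
// alpha: plain array standing in for the Vector returned by docConcentration().
function expectedTopicShares(alpha) {
  const total = alpha.reduce((sum, a) => sum + a, 0);
  if (total <= 0) {
    throw new Error("concentration values must sum to a positive number");
  }
  // The expected proportion of each topic is its share of the total concentration.
  return alpha.map((a) => a / total);
}

// Assumed example values for a 3-topic model.
const symmetricShares = expectedTopicShares([0.5, 0.5, 0.5]); // equal shares
const asymmetricShares = expectedTopicShares([2, 1, 1]);      // first topic favored
```

A symmetric vector gives every topic the same prior share, while larger entries pull the prior toward the corresponding topics.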

k() → {number}

Overrides:

module:eclairjs/mllib/clustering.LDAModel#k

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 51

Returns:

Type: number

logLikelihoodwithJavaPairRDD(documents) → {number}

Java-friendly version of logLikelihood

Parameters:

Name	Type	Description
`documents`	module:eclairjs.PairRDD

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 119

Returns:

Type: number

logLikelihoodwithRDD(documents) → {number}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).

Parameters:

Name	Type	Description
`documents`	module:eclairjs.RDD	test corpus to use for calculating log likelihood

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 106

Returns:

Variational lower bound on the log likelihood of the entire corpus.

Type: number

logPerplexitywithJavaPairRDD(documents) → {number}

Java-friendly version of logPerplexity

Parameters:

Name	Type	Description
`documents`	module:eclairjs.PairRDD

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 145

Returns:

Type: number

logPerplexitywithRDD(documents) → {number}

Calculates an upper bound on perplexity. (Lower is better.) See Equation (16) in the original Online LDA paper (Hoffman et al., 2010).

Parameters:

Name	Type	Description
`documents`	module:eclairjs.RDD	test corpus to use for calculating perplexity

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 133

Returns:

Variational upper bound on log perplexity per token.

Type: number
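
The relationship between this value and the log-likelihood bound can be sketched in plain JavaScript. The sketch assumes, as an interpretation of "per token" rather than something this page states, that log perplexity is the negated log-likelihood bound divided by the total number of tokens in the corpus. The helper `logPerplexityFromBound`, the array-of-arrays corpus, and the numeric values are hypothetical stand-ins and are not part of EclairJS.

```javascript
// Hypothetical helper, not part of EclairJS.
// logLikelihoodBound: a value such as the one returned by logLikelihoodwithRDD.
// docs: array of term-count arrays (one per document), standing in for the RDD.
function logPerplexityFromBound(logLikelihoodBound, docs) {
  // Total token count is the sum of all term counts over all documents.
  const tokenCount = docs.reduce(
    (total, termCounts) => total + termCounts.reduce((s, c) => s + c, 0),
    0
  );
  if (tokenCount === 0) {
    throw new Error("corpus has no tokens");
  }
  // Assumption: per-token log perplexity = -(log-likelihood bound) / token count.
  return -logLikelihoodBound / tokenCount;
}

// Assumed example: two documents over a 3-term vocabulary, 10 tokens in total.
const sampleCorpus = [[2, 0, 1], [0, 3, 4]];
const sampleBound = -25; // made-up log-likelihood bound
const perTokenLogPerplexity = logPerplexityFromBound(sampleBound, sampleCorpus);
```

Because the log-likelihood bound is negated, a higher (less negative) likelihood bound corresponds to a lower perplexity bound.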

save(sc, path)

Parameters:

Name	Type	Description
`sc`	module:eclairjs.SparkContext
`path`	string

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 91

topicConcentration() → {number}

Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution. Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.

Inherited From:

module:eclairjs/mllib/clustering.LDAModel#topicConcentration

Source:

eclairjs/mllib/clustering/LDAModel.js, line 87

Returns:

Type: number

topicDistributionswithJavaPairRDD(documents) → {module:eclairjs.PairRDD}

Java-friendly version of topicDistributions

Parameters:

Name	Type	Description
`documents`	module:eclairjs.PairRDD

Source:

eclairjs/mllib/clustering/LocalLDAModel.js, line 176

Returns:

Type: module:eclairjs.PairRDD

topicDistributionswithRDD(documents) → {module:eclairjs.RDD}

Predicts the topic mixture distribution for each document (often called "theta" in the literature). Returns a vector of zeros for an empty document. This uses a variational approximation following Hoffman et al. (2010), where the approximate distribution is called "gamma." Technically, this method returns this approximation "gamma" for each document.
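
To make the shape of the result concrete, here is a minimal sketch of turning a per-document "gamma" into a topic mixture. It assumes the values are rescaled to sum to 1 and that an all-zero gamma stands in for an empty document; neither detail is confirmed by this page. The helper `toTopicMixture` and the sample values are hypothetical and are not part of EclairJS.

```javascript
// Hypothetical helper, not part of EclairJS.
// gamma: plain array of non-negative variational parameters for one document.
function toTopicMixture(gamma) {
  const total = gamma.reduce((sum, g) => sum + g, 0);
  if (total === 0) {
    // Mirrors the documented behavior for an empty document: a vector of zeros.
    return gamma.map(() => 0);
  }
  // Assumption: the mixture is gamma normalized to sum to 1.
  return gamma.map((g) => g / total);
}

// Assumed example values for a 3-topic model.
const sampleMixture = toTopicMixture([6, 3, 1]);
const emptyDocMixture = toTopicMixture([0, 0, 0]);
```

The largest entry of the resulting array identifies the topic that dominates the document under this approximation.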