JSDoc: Class: LocalLDAModel

Name	Type	Attributes	Description
`maxTermsPerTopic`	number	<optional>	Maximum number of terms to collect for each topic. Default value of 10.

Inherited From:

module:eclairjs/ml/clustering.LDAModel#describeTopics

Source:

eclairjs/ml/clustering/LDAModel.js, line 177

Returns:

Local DataFrame with one topic per Row, with columns:
- "topic": IntegerType: topic index
- "termIndices": ArrayType(IntegerType): term indices, sorted in order of decreasing term importance
- "termWeights": ArrayType(DoubleType): corresponding sorted term weights

Type: module:eclairjs/sql.DataFrame
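
The shape of the returned rows ("topic", "termIndices", "termWeights") can be illustrated without a Spark cluster. The sketch below is plain JavaScript and is not part of EclairJS: `describeTopicsSketch` is a hypothetical helper, and it assumes the topics are already available as an array of per-topic weight arrays. It only mirrors the ordering rule described for this method (terms sorted by decreasing weight, truncated to `maxTermsPerTopic`).

```javascript
// Hypothetical helper, not an EclairJS API.
// Assumption: topicColumns[t][i] is the weight of term i in topic t.
function describeTopicsSketch(topicColumns, maxTermsPerTopic = 10) {
  return topicColumns.map((weights, topic) => {
    // Pair each weight with its term index, then sort by decreasing weight.
    const ranked = weights
      .map((weight, termIndex) => ({ termIndex, weight }))
      .sort((a, b) => b.weight - a.weight)
      .slice(0, maxTermsPerTopic);
    return {
      topic,
      termIndices: ranked.map((t) => t.termIndex),
      termWeights: ranked.map((t) => t.weight),
    };
  });
}

// Two topics over a three-term vocabulary, keeping the top two terms each.
const rows = describeTopicsSketch([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]], 2);
console.log(JSON.stringify(rows));
```

Each element of `rows` corresponds to one Row of the DataFrame described here; the real method returns a module:eclairjs/sql.DataFrame rather than a plain array.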

estimatedDocConcentration() → {module:eclairjs/mllib/linalg.Vector}

Value for docConcentration estimated from data. If Online LDA was used and optimizeDocConcentration was set to false, then this returns the fixed (given) value for the docConcentration parameter.

Inherited From:

module:eclairjs/ml/clustering.LDAModel#estimatedDocConcentration

Source:

eclairjs/ml/clustering/LDAModel.js, line 100

Returns:

Type: module:eclairjs/mllib/linalg.Vector

hasParent() → {boolean}

Inherited From:

module:eclairjs/ml.Model#hasParent

Source:

eclairjs/ml/Model.js, line 69

Returns:

Type: boolean

isDistributed() → {boolean}

Overrides:

module:eclairjs/ml/clustering.LDAModel#isDistributed

Source:

eclairjs/ml/clustering/LocalLDAModel.js, line 64

Returns:

Type: boolean

logLikelihood(dataset) → {number}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs/sql.DataFrame	test corpus to use for calculating log likelihood

Inherited From:

module:eclairjs/ml/clustering.LDAModel#logLikelihood

Source:

eclairjs/ml/clustering/LDAModel.js, line 143

Returns:

variational lower bound on the log likelihood of the entire corpus

Type: number

logPerplexity(dataset) → {number}

Calculates an upper bound on perplexity. (Lower is better.) See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs/sql.DataFrame	test corpus to use for calculating perplexity

Inherited From:

module:eclairjs/ml/clustering.LDAModel#logPerplexity

Source:

eclairjs/ml/clustering/LDAModel.js, line 160

Returns:

Variational upper bound on log perplexity per token.

Type: number
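
To make "per token" concrete, the following plain JavaScript sketch relates the two bounds. It rests on an assumption that is not stated on this page: that the log perplexity bound equals the negated log likelihood bound divided by the total number of tokens in the corpus. `logPerplexityFromLikelihood` is a hypothetical helper, not an EclairJS function, and the input numbers are made up for illustration.

```javascript
// Hypothetical helper, not an EclairJS API.
// Assumption: logPerplexity = -(logLikelihood bound) / (total token count).
function logPerplexityFromLikelihood(logLikelihoodBound, docTermCounts) {
  // Sum the word counts over every document to get the corpus token count.
  const tokenCount = docTermCounts.reduce(
    (total, counts) => total + counts.reduce((sum, c) => sum + c, 0),
    0
  );
  if (tokenCount === 0) {
    throw new Error('corpus has no tokens');
  }
  return -logLikelihoodBound / tokenCount;
}

// Two documents with 3 and 7 tokens; the bound of -25 is an illustrative value.
const corpus = [[2, 0, 1], [0, 3, 4]];
const perTokenBound = logPerplexityFromLikelihood(-25, corpus);
console.log(perTokenBound);
```

Under that assumption, a higher (less negative) log likelihood on the same corpus gives a lower log perplexity, which is consistent with "lower is better".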

parent() → {module:eclairjs/ml.Estimator}

Inherited From:

module:eclairjs/ml.Model#parent

Source:

eclairjs/ml/Model.js, line 60

Returns:

Type: module:eclairjs/ml.Estimator

setFeaturesCol(value) → {module:eclairjs/mllib/clustering.LDAModel}

The features for LDA should be a module:eclairjs/mllib/linalg.Vector representing the word counts in a document. The vector should be of length vocabSize, with counts for each term (word).

Parameters:

Name	Type	Description
`value`	string

Inherited From:

module:eclairjs/ml/clustering.LDAModel#setFeaturesCol

Source:

eclairjs/ml/clustering/LDAModel.js, line 51

Returns:

Type: module:eclairjs/mllib/clustering.LDAModel
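
The features column is expected to hold one count vector per document. The plain JavaScript sketch below shows one way such counts could be produced from tokens before they are wrapped in a module:eclairjs/mllib/linalg.Vector. `toCountVector` and the sample vocabulary are hypothetical and not part of EclairJS, and dropping out-of-vocabulary tokens is an assumption of this sketch.

```javascript
// Hypothetical helper, not an EclairJS API.
// Builds an array of length vocabSize with one count per vocabulary term.
function toCountVector(tokens, vocabulary) {
  const indexOfTerm = new Map(vocabulary.map((term, i) => [term, i]));
  const counts = new Array(vocabulary.length).fill(0);
  for (const token of tokens) {
    const i = indexOfTerm.get(token);
    if (i !== undefined) {
      counts[i] += 1; // tokens outside the vocabulary are skipped (assumption)
    }
  }
  return counts;
}

const vocabulary = ['spark', 'topic', 'model'];
const features = toCountVector(['topic', 'model', 'topic', 'lda'], vocabulary);
console.log(features);
```

The resulting array has length vocabSize (here 3), matching the requirement stated for the features column.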

setParent(parent) → {object}

Sets the parent of this model.

Parameters:

Name	Type	Description
`parent`	module:eclairjs/ml.Estimator

Inherited From:

module:eclairjs/ml.Model#setParent

Source:

eclairjs/ml/Model.js, line 50

Returns:

Type: object

setSeed(value) → {module:eclairjs/mllib/clustering.LDAModel}

Parameters:

Name	Type	Description
`value`	number

Inherited From:

module:eclairjs/ml/clustering.LDAModel#setSeed

Source:

eclairjs/ml/clustering/LDAModel.js, line 61

Returns:

Type: module:eclairjs/mllib/clustering.LDAModel

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics. WARNING: If this model is actually a module:eclairjs/ml/clustering.DistributedLDAModel instance produced by the Expectation-Maximization ("em") optimizer, then this method could involve collecting a large amount of data to the driver (on the order of vocabSize x k).

Inherited From:

module:eclairjs/ml/clustering.LDAModel#topicsMatrix

Source:

eclairjs/ml/clustering/LDAModel.js, line 116

Returns:

Type: module:eclairjs/mllib/linalg.Matrix
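
Because each column is a topic, reading one topic means slicing one column of the vocabSize x k matrix. The plain JavaScript sketch below assumes the matrix values have already been copied into a flat array in column-major order; that layout, and the helper `topicColumn`, are assumptions for illustration and not part of the EclairJS API.

```javascript
// Hypothetical helper, not an EclairJS API.
// Assumption: values is a flat, column-major array of length vocabSize * k.
function topicColumn(values, vocabSize, k, topic) {
  if (values.length !== vocabSize * k) {
    throw new Error('values length must equal vocabSize * k');
  }
  if (topic < 0 || topic >= k) {
    throw new RangeError('topic index out of range');
  }
  // Column `topic` occupies one contiguous run of vocabSize entries.
  return values.slice(topic * vocabSize, (topic + 1) * vocabSize);
}

// vocabSize = 3, k = 2: the first three values are topic 0, the next three topic 1.
const values = [0.7, 0.2, 0.1, 0.1, 0.3, 0.6];
console.log(topicColumn(values, 3, 2, 1));
```

Since no ordering of topics is guaranteed, the topic index is only meaningful within a single fitted model.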

transform(dataset) → {module:eclairjs/sql.DataFrame}

Transforms the input dataset. WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.