Class: LDAModel

Constructor

new LDAModel()

Source:

ml/clustering/LDAModel.js, line 33

Extends

module:eclairjs/ml.Model

Methods

copy(extra) → {object}

Parameters:

Name	Type	Description
`extra`	module:eclairjs/ml/param.ParamMap

Inherited From:

module:eclairjs/ml.Model#copy

Source:

ml/Model.js, line 91

Returns:

Type: object

describeTopics([maxTermsPerTopic]) → {module:eclairjs/sql.Dataset}

Return the topics described by their top-weighted terms.

Parameters:

Name	Type	Attributes	Description
`maxTermsPerTopic`	number	<optional>	Maximum number of terms to collect for each topic. Default value is 10.

Source:

ml/clustering/LDAModel.js, line 225

Returns:

Local Dataset with one topic per Row, with columns: "topic" (IntegerType): topic index; "termIndices" (ArrayType(IntegerType)): term indices, sorted in order of decreasing term importance; "termWeights" (ArrayType(DoubleType)): corresponding sorted term weights.

Type: module:eclairjs/sql.Dataset
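
To make the shape of the returned rows concrete, the following plain JavaScript sketch builds the same three fields from topics held in ordinary arrays. It does not call EclairJS: `describeTopicsLocal` and the sample weights are hypothetical and serve only as an illustration of the "topic", "termIndices" and "termWeights" layout.

```javascript
// Hypothetical helper, not part of EclairJS: mimics the row layout of
// describeTopics() for topics held in plain arrays.
// topics[t][i] is the weight of term i in topic t.
function describeTopicsLocal(topics, maxTermsPerTopic = 10) {
  return topics.map((weights, topic) => {
    // Pair each weight with its term index, then sort by decreasing weight.
    const ranked = weights
      .map((weight, index) => ({ index, weight }))
      .sort((a, b) => b.weight - a.weight)
      .slice(0, maxTermsPerTopic);
    return {
      topic,
      termIndices: ranked.map((r) => r.index),
      termWeights: ranked.map((r) => r.weight),
    };
  });
}

// Made-up weights for 2 topics over a 4-term vocabulary.
const sampleTopics = [
  [0.1, 0.6, 0.05, 0.25],
  [0.4, 0.1, 0.3, 0.2],
];
const rows = describeTopicsLocal(sampleTopics, 2);
console.log(JSON.stringify(rows));
```

With a real model the rows would come back inside a Dataset rather than an array, so they would still need to be collected before being inspected this way.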

estimatedDocConcentration() → {module:eclairjs/mllib/linalg.Vector}

Value for docConcentration estimated from data. If Online LDA was used and optimizeDocConcentration was set to false, then this returns the fixed (given) value for the docConcentration parameter.

Source:

ml/clustering/LDAModel.js, line 118

Returns:

Type: module:eclairjs/mllib/linalg.Vector

hasParent() → {Promise.<boolean>}

Indicates whether this Model has a corresponding parent.

Inherited From:

module:eclairjs/ml.Model#hasParent

Source:

ml/Model.js, line 76

Returns:

Type: Promise.<boolean>

isDistributed() → {Promise.<boolean>}

Indicates whether this instance is of type DistributedLDAModel.

Source:

ml/clustering/LDAModel.js, line 158

Returns:

Type: Promise.<boolean>
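
Because isDistributed() resolves asynchronously, a caller can await it before deciding whether to invoke a method that may collect a large matrix to the driver. The sketch below illustrates that guard with a hypothetical `stubModel` standing in for a real LDAModel; only the method names isDistributed() and topicsMatrix() are taken from this page, and `topicsMatrixIfLocal` is an illustrative helper, not part of the API.

```javascript
// Hypothetical stand-in for an LDAModel; a real model comes from fitting LDA.
// Only the method names isDistributed() and topicsMatrix() are taken from
// this page; the return values here are made up.
const stubModel = {
  isDistributed: () => Promise.resolve(true),
  topicsMatrix: () => ({ numRows: 4, numCols: 2 }),
};

// Returns the topics matrix only for a non-distributed model, otherwise null,
// so that a large collect to the driver is never triggered by accident.
async function topicsMatrixIfLocal(model) {
  const distributed = await model.isDistributed();
  return distributed ? null : model.topicsMatrix();
}

topicsMatrixIfLocal(stubModel).then((matrix) => {
  console.log(matrix === null ? 'skipped: model is distributed' : matrix);
});
```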

logLikelihood(dataset) → {Promise.<number>}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs/sql.Dataset	test corpus to use for calculating log likelihood

Source:

ml/clustering/LDAModel.js, line 181

Returns:

Variational lower bound on the log likelihood of the entire corpus.

Type: Promise.<number>

logPerplexity(dataset) → {Promise.<number>}

Calculates an upper bound on perplexity. (Lower is better.) See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs/sql.Dataset	test corpus to use for calculating perplexity

Source:

ml/clustering/LDAModel.js, line 203

Returns:

Variational upper bound on log perplexity per token.

Type: Promise.<number>
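
The lower bound from logLikelihood and the per-token upper bound from logPerplexity are closely related. As a rough sketch, assuming log perplexity is the negated log-likelihood bound divided by the total number of tokens in the corpus (this page does not state the formula, so treat it as an assumption), the conversion could look like this. `logPerplexityFromBound` and the sample numbers are hypothetical.

```javascript
// Assumption (not stated on this page): log perplexity per token equals
// -(log-likelihood bound) / (total number of tokens in the corpus).
// docs is an array of word-count arrays, one per document.
function logPerplexityFromBound(logLikelihoodBound, docs) {
  const tokenCount = docs.reduce(
    (total, counts) => total + counts.reduce((sum, c) => sum + c, 0),
    0
  );
  if (tokenCount === 0) {
    throw new Error('corpus contains no tokens');
  }
  return -logLikelihoodBound / tokenCount;
}

// Made-up numbers: a bound of -120 over a corpus of 40 tokens.
const corpus = [
  [10, 0, 5],
  [5, 15, 5],
];
console.log(logPerplexityFromBound(-120, corpus)); // 3
```

Under this assumption a higher (less negative) log-likelihood bound gives a lower log perplexity, which matches the "lower is better" note above.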

parent() → {module:eclairjs/ml.Estimator}

Inherited From:

module:eclairjs/ml.Model#parent

Source:

ml/Model.js, line 59

Returns:

Type: module:eclairjs/ml.Estimator

setFeaturesCol(value) → {module:eclairjs/mllib/clustering.LDAModel}

The features for LDA should be a Vector representing the word counts in a document. The vector should be of length vocabSize, with counts for each term (word).

Parameters:

Name	Type	Description
`value`	string

Source:

ml/clustering/LDAModel.js, line 47

Returns:

Type: module:eclairjs/mllib/clustering.LDAModel
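
As an illustration of the expected feature layout, the sketch below turns a tokenized document into a word-count array of length vocabSize. It is plain JavaScript: `toWordCounts` and the sample vocabulary are hypothetical, and wrapping the resulting array in an EclairJS Vector is left out.

```javascript
// Hypothetical helper: turns a tokenized document into a dense word-count
// array of length vocabSize. vocabulary maps each term to its index.
function toWordCounts(tokens, vocabulary) {
  const counts = new Array(vocabulary.size).fill(0);
  for (const token of tokens) {
    const index = vocabulary.get(token);
    // Terms outside the vocabulary are skipped.
    if (index !== undefined) {
      counts[index] += 1;
    }
  }
  return counts;
}

// Made-up vocabulary of three terms.
const vocabulary = new Map([
  ['spark', 0],
  ['topic', 1],
  ['model', 2],
]);
const counts = toWordCounts(['topic', 'model', 'topic', 'unknown'], vocabulary);
console.log(counts); // [ 0, 2, 1 ]
```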

setParent(parent) → {object}

Sets the parent of this model.

Parameters:

Name	Type	Description
`parent`	module:eclairjs/ml.Estimator

Inherited From:

module:eclairjs/ml.Model#setParent

Source:

ml/Model.js, line 44

Returns:

Type: object

setSeed(value) → {module:eclairjs/mllib/clustering.LDAModel}

Parameters:

Name	Type	Description
`value`	number

Source:

ml/clustering/LDAModel.js, line 62

Returns:

Type: module:eclairjs/mllib/clustering.LDAModel

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics. WARNING: If this model is actually a DistributedLDAModel instance produced by the Expectation-Maximization ("em") optimizer, then this method could involve collecting a large amount of data to the driver (on the order of vocabSize x k).

Source:

ml/clustering/LDAModel.js, line 141

Returns:

Type: module:eclairjs/mllib/linalg.Matrix
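
Since each column is a topic, reading one topic means slicing one column out of the matrix. The sketch below assumes the matrix values are available as a flat array in column-major order, which is how Spark dense matrices are laid out; how to obtain that array from an EclairJS Matrix is not covered on this page, and `topicColumn` is a hypothetical helper.

```javascript
// Assumption: `values` holds a vocabSize x k matrix in column-major order,
// so topic j occupies values[j * vocabSize] .. values[(j + 1) * vocabSize - 1].
function topicColumn(values, vocabSize, k, topic) {
  if (values.length !== vocabSize * k) {
    throw new Error('values length must equal vocabSize * k');
  }
  if (topic < 0 || topic >= k) {
    throw new RangeError('topic index out of range');
  }
  return values.slice(topic * vocabSize, (topic + 1) * vocabSize);
}

// Made-up matrix with vocabSize = 3 and k = 2.
const values = [0.7, 0.2, 0.1, 0.1, 0.3, 0.6];
console.log(topicColumn(values, 3, 2, 1)); // [ 0.1, 0.3, 0.6 ]
```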

transform(dataset) → {module:eclairjs/sql.Dataset}

Transforms the input dataset. WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.

Parameters:

Name	Type	Description
`dataset`	module:eclairjs/sql.Dataset

Overrides:

module:eclairjs/ml.Model#transform

Source:

ml/clustering/LDAModel.js, line 82

Returns:

Type: module:eclairjs/sql.Dataset

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

Parameters:

Name	Type	Description
`schema`	module:eclairjs/sql/types.StructType

Source:

ml/clustering/LDAModel.js, line 99

Returns:

Type: module:eclairjs/sql/types.StructType