Class: LDAModel

eclairjs/ml/clustering.LDAModel

Constructor

new LDAModel()

Source:

Extends

Methods

copy(extra) → {object}

Parameters:
Name Type Description
extra module:eclairjs/ml/param.ParamMap
Inherited From:
Source:
Returns:
Type
object

describeTopics([maxTermsPerTopic]) → {module:eclairjs/sql.Dataset}

Return the topics described by their top-weighted terms.
Parameters:
Name Type Attributes Description
maxTermsPerTopic number <optional>
Maximum number of terms to collect for each topic. Default value of 10.
Source:
Returns:
Local Dataset with one topic per Row, with columns:
- "topic": IntegerType: topic index
- "termIndices": ArrayType(IntegerType): term indices, sorted in order of decreasing term importance
- "termWeights": ArrayType(DoubleType): corresponding sorted term weights
Type
module:eclairjs/sql.Dataset
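
A result of this shape is usually read by mapping each row's termIndices back to vocabulary words. The following is a minimal sketch in plain JavaScript, not EclairJS API usage: the `vocabulary` array, the `topicRows` objects standing in for Dataset rows, and the `labelTopics` helper are all hypothetical, and collecting rows from the returned Dataset is not shown.

```javascript
// Hypothetical vocabulary: position i holds the term for term index i.
const vocabulary = ["spark", "topic", "model", "data", "cluster"];

// Plain objects standing in for rows with the "topic", "termIndices"
// and "termWeights" columns (made-up values).
const topicRows = [
  { topic: 0, termIndices: [1, 2], termWeights: [0.4, 0.3] },
  { topic: 1, termIndices: [3, 0, 4], termWeights: [0.5, 0.2, 0.1] }
];

// Pair each term index with its word and weight, keeping the sorted order.
function labelTopics(rows, vocab) {
  return rows.map(row => ({
    topic: row.topic,
    terms: row.termIndices.map((termIndex, i) => ({
      term: vocab[termIndex],
      weight: row.termWeights[i]
    }))
  }));
}

const labeled = labelTopics(topicRows, vocabulary);
console.log(JSON.stringify(labeled, null, 2));
```

Because termIndices and termWeights are sorted together, the first entry of each `terms` list is the topic's highest-weighted term.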

estimatedDocConcentration() → {module:eclairjs/mllib/linalg.Vector}

Value for docConcentration estimated from data. If Online LDA was used and optimizeDocConcentration was set to false, then this returns the fixed (given) value for the docConcentration parameter.
Source:
Returns:
Type
module:eclairjs/mllib/linalg.Vector

hasParent() → {Promise.<boolean>}

Indicates whether this Model has a corresponding parent.
Inherited From:
Source:
Returns:
Type
Promise.<boolean>

isDistributed() → {Promise.<boolean>}

Indicates whether this instance is of type DistributedLDAModel.
Source:
Returns:
Type
Promise.<boolean>

logLikelihood(dataset) → {Promise.<number>}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.Dataset test corpus to use for calculating log likelihood
Source:
Returns:
variational lower bound on the log likelihood of the entire corpus
Type
Promise.<number>

logPerplexity(dataset) → {Promise.<number>}

Calculates an upper bound on perplexity. (Lower is better.) See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.Dataset test corpus to use for calculating perplexity
Source:
Returns:
Variational upper bound on log perplexity per token.
Type
Promise.<number>
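
The per-token bound relates to the log-likelihood bound in a simple way: in the underlying Spark implementation, log perplexity is the negated log-likelihood bound divided by the total number of tokens in the corpus. The sketch below shows only that arithmetic in plain JavaScript. The helper names and the numbers are made up for illustration, and documents are represented as plain arrays of term counts rather than as a Dataset.

```javascript
// Sum every count in every document's term-count array.
function corpusTokenCount(docs) {
  return docs.reduce(
    (total, counts) => total + counts.reduce((sum, c) => sum + c, 0),
    0
  );
}

// Per-token log perplexity from a corpus-level log-likelihood bound.
function logPerplexityFrom(logLikelihoodBound, docs) {
  return -logLikelihoodBound / corpusTokenCount(docs);
}

// Three toy documents over a 3-term vocabulary: 3 + 4 + 3 = 10 tokens.
const documents = [[2, 0, 1], [0, 3, 1], [1, 1, 1]];

// Made-up log-likelihood bound, not the output of a real model.
const exampleLogLikelihood = -25;

const perToken = logPerplexityFrom(exampleLogLikelihood, documents);
console.log(perToken); // 2.5
```

Dividing by the token count makes the value comparable across test corpora of different sizes, which the raw log-likelihood bound is not.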

parent() → {module:eclairjs/ml.Estimator}

Inherited From:
Source:
Returns:
Type
module:eclairjs/ml.Estimator

setFeaturesCol(value) → {module:eclairjs/mllib/clustering.LDAModel}

The features for LDA should be a Vector representing the word counts in a document. The vector should be of length vocabSize, with counts for each term (word).
Parameters:
Name Type Description
value string
Source:
Returns:
Type
module:eclairjs/mllib/clustering.LDAModel
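
To make the expected feature layout concrete, here is a small sketch of how a tokenized document becomes a count vector of length vocabSize. It is plain JavaScript with a hypothetical `vocabulary` and `countVector` helper; wrapping the resulting array in an EclairJS Vector and placing it in the features column is not shown.

```javascript
// Hypothetical vocabulary; its length plays the role of vocabSize.
const vocabulary = ["apple", "banana", "cherry"];

// Count how often each vocabulary term occurs in a tokenized document.
function countVector(tokens, vocab) {
  const indexOf = new Map(vocab.map((term, i) => [term, i]));
  const counts = new Array(vocab.length).fill(0);
  for (const token of tokens) {
    const i = indexOf.get(token);
    // Tokens outside the vocabulary are skipped.
    if (i !== undefined) {
      counts[i] += 1;
    }
  }
  return counts;
}

const counts = countVector(["apple", "cherry", "apple", "durian"], vocabulary);
console.log(counts); // [ 2, 0, 1 ]
```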

setParent(parent) → {object}

Sets the parent of this model.
Parameters:
Name Type Description
parent module:eclairjs/ml.Estimator
Inherited From:
Source:
Returns:
Type
object

setSeed(value) → {module:eclairjs/mllib/clustering.LDAModel}

Parameters:
Name Type Description
value number
Source:
Returns:
Type
module:eclairjs/mllib/clustering.LDAModel

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics. WARNING: If this model is actually a DistributedLDAModel instance produced by the Expectation-Maximization ("em") optimizer, then this method could involve collecting a large amount of data to the driver (on the order of vocabSize x k).
Source:
Returns:
Type
module:eclairjs/mllib/linalg.Matrix
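
Since each column is a topic, reading one topic means slicing one column out of the vocabSize x k matrix. The sketch below assumes the matrix values are available as a flat array in column-major order (the layout Spark uses for dense matrices); obtaining such an array from the returned Matrix is not shown, and the helper names and weights are made up.

```javascript
// Return the term weights of one topic from a column-major flat array.
function topicColumn(values, vocabSize, topicIndex) {
  const start = topicIndex * vocabSize;
  return values.slice(start, start + vocabSize);
}

// Index of the largest weight in a topic column.
function topTermIndex(column) {
  return column.reduce((best, w, i) => (w > column[best] ? i : best), 0);
}

const vocabSize = 3;
const k = 2;

// Made-up 3 x 2 matrix stored column by column:
// topic 0 = [0.5, 0.3, 0.2], topic 1 = [0.1, 0.1, 0.8]
const values = [0.5, 0.3, 0.2, 0.1, 0.1, 0.8];

for (let topic = 0; topic < k; topic++) {
  const column = topicColumn(values, vocabSize, topic);
  console.log(topic, column, topTermIndex(column));
}
```

Because no ordering of topics is guaranteed, a topic index is only meaningful within a single fitted model.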

transform(dataset) → {module:eclairjs/sql.Dataset}

Transforms the input dataset. WARNING: If this model is an instance of DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.Dataset
Overrides:
Source:
Returns:
Type
module:eclairjs/sql.Dataset

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Source:
Returns:
Type
module:eclairjs/sql/types.StructType