Class: LocalLDAModel

eclairjs/ml/clustering.LocalLDAModel

Local (non-distributed) model fitted by module:eclairjs/ml/clustering.LDA. This model stores only the inferred topics; it does not store information about the training dataset.

Constructor

new LocalLDAModel()

Methods

(static) load(path) → {module:eclairjs/ml/clustering.LocalLDAModel}

Parameters:
Name Type Description
path string
Returns:
Type
module:eclairjs/ml/clustering.LocalLDAModel

(static) read() → {module:eclairjs/ml/util.MLReader}

Returns:
Type
module:eclairjs/ml/util.MLReader

copy(extra) → {module:eclairjs/ml/clustering.LocalLDAModel}

Parameters:
Name Type Description
extra module:eclairjs/ml/param.ParamMap
Overrides:
Returns:
Type
module:eclairjs/ml/clustering.LocalLDAModel

describeTopics(maxTermsPerTopic) → {module:eclairjs/sql.DataFrame}

Return the topics described by their top-weighted terms.
Parameters:
Name Type Attributes Description
maxTermsPerTopic number <optional> Maximum number of terms to collect for each topic. Default value of 10.
Inherited From:
Returns:
Local DataFrame with one topic per Row, with columns:
- "topic": IntegerType: topic index
- "termIndices": ArrayType(IntegerType): term indices, sorted in order of decreasing term importance
- "termWeights": ArrayType(DoubleType): corresponding sorted term weights
Type
module:eclairjs/sql.DataFrame
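
The "termIndices" and "termWeights" columns line up position by position, so a row can be turned into readable (term, weight) pairs by looking each index up in a vocabulary. The following is a minimal sketch in plain JavaScript. It assumes the rows have already been collected into ordinary objects, and the `vocabulary` array and the `labelTopic` helper are hypothetical names that are not part of this API.

```javascript
// Hypothetical stand-ins: rows shaped like the documented columns, and a
// vocabulary array mapping term index -> word. Neither comes from EclairJS.
var vocabulary = ["spark", "topic", "model", "cluster", "vector"];
var topicRows = [
  { topic: 0, termIndices: [2, 0, 4], termWeights: [0.5, 0.3, 0.2] },
  { topic: 1, termIndices: [3, 1], termWeights: [0.7, 0.3] }
];

// Pair each term index with its weight. Both arrays are sorted by decreasing
// term importance, so position i in one matches position i in the other.
function labelTopic(row, vocab) {
  return row.termIndices.map(function (termIndex, i) {
    return { term: vocab[termIndex], weight: row.termWeights[i] };
  });
}

var labeled = topicRows.map(function (row) {
  return { topic: row.topic, terms: labelTopic(row, vocabulary) };
});
```

How the rows are collected from the DataFrame and where the vocabulary comes from depend on the surrounding pipeline and are left out here.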

estimatedDocConcentration() → {module:eclairjs/mllib/linalg.Vector}

Value for docConcentration estimated from data. If Online LDA was used and optimizeDocConcentration was set to false, then this returns the fixed (given) value for the docConcentration parameter.
Inherited From:
Returns:
Type
module:eclairjs/mllib/linalg.Vector

hasParent() → {boolean}

Inherited From:
Returns:
Type
boolean

isDistributed() → {boolean}

Overrides:
Returns:
Type
boolean

logLikelihood(dataset) → {number}

Calculates a lower bound on the log likelihood of the entire corpus. See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.DataFrame test corpus to use for calculating log likelihood
Inherited From:
Returns:
variational lower bound on the log likelihood of the entire corpus
Type
number

logPerplexity(dataset) → {number}

Calculate an upper bound on perplexity. (Lower is better.) See Equation (16) in the Online LDA paper (Hoffman et al., 2010). WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.DataFrame test corpus to use for calculating perplexity
Inherited From:
Returns:
Variational upper bound on log perplexity per token.
Type
number
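
In the underlying Spark implementation, the two bounds are related: the log perplexity bound is the negated log-likelihood bound divided by the total number of tokens in the corpus. The sketch below shows only that arithmetic in plain JavaScript, on the assumption that EclairJS delegates to Spark unchanged. The function names and the sample numbers are illustrative and are not part of this API.

```javascript
// Term-count arrays for a tiny hypothetical corpus (one array per document).
var corpus = [
  [2, 0, 1],
  [0, 3, 1],
  [1, 1, 1]
];

// Total number of tokens in the corpus = sum of all term counts.
function tokenCount(docs) {
  return docs.reduce(function (total, counts) {
    return total + counts.reduce(function (a, b) { return a + b; }, 0);
  }, 0);
}

// Upper bound on log perplexity per token, derived from a lower bound
// on the log likelihood of the whole corpus.
function logPerplexityFromBound(logLikelihoodBound, docs) {
  return -logLikelihoodBound / tokenCount(docs);
}

var bound = -25.0; // made-up value standing in for logLikelihood(dataset)
var perToken = logPerplexityFromBound(bound, corpus);
```

Because of the sign flip, a higher (less negative) log-likelihood bound on the same corpus yields a lower perplexity bound, which is why lower is better.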

parent() → {module:eclairjs/ml.Estimator}

Inherited From:
Returns:
Type
module:eclairjs/ml.Estimator

setFeaturesCol(value) → {module:eclairjs/ml/clustering.LDAModel}

The features for LDA should be a module:eclairjs/mllib/linalg.Vector representing the word counts in a document. The vector should be of length vocabSize, with counts for each term (word).
Parameters:
Name Type Description
value string
Inherited From:
Returns:
Type
module:eclairjs/ml/clustering.LDAModel
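
To illustrate the expected feature layout, the sketch below builds a word-count array of length vocabSize for one tokenized document. It is plain JavaScript under stated assumptions: the `vocab` array and the `termCounts` helper are hypothetical, and tokens outside the vocabulary are simply dropped.

```javascript
// Hypothetical vocabulary; its length plays the role of vocabSize.
var vocab = ["spark", "topic", "model", "cluster"];

// Build a dense word-count array of length vocab.length for one document.
// Tokens that are not in the vocabulary are ignored.
function termCounts(tokens, vocabulary) {
  var indexOf = {};
  vocabulary.forEach(function (word, i) { indexOf[word] = i; });
  var counts = vocabulary.map(function () { return 0; });
  tokens.forEach(function (token) {
    if (Object.prototype.hasOwnProperty.call(indexOf, token)) {
      counts[indexOf[token]] += 1;
    }
  });
  return counts;
}

var counts = termCounts(["topic", "model", "topic", "unknown"], vocab);
```

The resulting array would still need to be wrapped in a module:eclairjs/mllib/linalg.Vector before being placed in the features column.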

setParent(parent) → {object}

Sets the parent of this model.
Parameters:
Name Type Description
parent module:eclairjs/ml.Estimator
Inherited From:
Returns:
Type
object

setSeed(value) → {module:eclairjs/ml/clustering.LDAModel}

Parameters:
Name Type Description
value number
Inherited From:
Returns:
Type
module:eclairjs/ml/clustering.LDAModel

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics. WARNING: If this model is actually a module:eclairjs/ml/clustering.DistributedLDAModel instance produced by the Expectation-Maximization ("em") optimizer, then this method could involve collecting a large amount of data to the driver (on the order of vocabSize x k).
Inherited From:
Returns:
Type
module:eclairjs/mllib/linalg.Matrix
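
Since each column is one topic, the top terms of a topic are the row indices with the largest weights in that column. The sketch below works on a plain JavaScript array and assumes the matrix values have been flattened column by column (column-major order). The sample values and the helper names are hypothetical and are not part of this API.

```javascript
// Hypothetical 4 x 2 matrix (vocabSize = 4, k = 2), flattened column by column.
var vocabSize = 4;
var k = 2;
var values = [
  0.1, 0.6, 0.2, 0.1, // column 0: topic 0
  0.3, 0.1, 0.1, 0.5  // column 1: topic 1
];

// Return the term weights of one topic, i.e. one column of the matrix.
function topicColumn(flat, numRows, topic) {
  return flat.slice(topic * numRows, (topic + 1) * numRows);
}

// Return the indices of the n largest weights, in decreasing order of weight.
function topTermIndices(weights, n) {
  return weights
    .map(function (w, i) { return { index: i, weight: w }; })
    .sort(function (a, b) { return b.weight - a.weight; })
    .slice(0, n)
    .map(function (entry) { return entry.index; });
}

// Top two term indices for every topic.
var topTerms = [];
for (var t = 0; t < k; t++) {
  topTerms.push(topTermIndices(topicColumn(values, vocabSize, t), 2));
}
```

Because no guarantees are given about topic ordering, the column index identifies a topic only within a single fitted model.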

transform(dataset) → {module:eclairjs/sql.DataFrame}

Transforms the input dataset. WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer is set to "em"), this involves collecting a large topicsMatrix to the driver. This implementation may be changed in the future.
Parameters:
Name Type Description
dataset module:eclairjs/sql.DataFrame
Inherited From:
Returns:
Type
module:eclairjs/sql.DataFrame

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Inherited From:
Returns:
Type
module:eclairjs/sql/types.StructType

write() → {module:eclairjs/ml/util.MLWriter}

Returns:
Type
module:eclairjs/ml/util.MLWriter