Class: LDA

eclairjs/ml/clustering.LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

Terminology:
- "term" = "word": an element of the vocabulary
- "token": instance of a term appearing in a document
- "topic": multinomial distribution over terms representing some concept
- "document": one piece of text, corresponding to one row in the input data

Original LDA paper (journal version): Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.

Input data (featuresCol): LDA is given a collection of documents as input data, via the featuresCol parameter. Each document is specified as a Vector of length vocabSize, where each entry is the count for the corresponding term (word) in the document. Feature transformers such as Tokenizer (org.apache.spark.ml.feature.Tokenizer) and CountVectorizer can be useful for converting text to word count vectors.
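The word-count representation described above can be illustrated with a small, library-free sketch. It is not the EclairJS or Spark implementation: the helper names (tokenize, buildVocabulary, toCountVector) are hypothetical, plain arrays stand in for Vector objects, and whitespace splitting is only an assumed stand-in for what Tokenizer and CountVectorizer would do.

```javascript
// Illustrative only: plain arrays stand in for Spark/EclairJS Vectors.

// Split text into lowercase tokens on whitespace (assumed, simplistic tokenizer).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Assign each distinct term an index in order of first appearance.
function buildVocabulary(tokenizedDocs) {
  const index = new Map();
  for (const tokens of tokenizedDocs) {
    for (const term of tokens) {
      if (!index.has(term)) index.set(term, index.size);
    }
  }
  return index;
}

// Produce a vector of length vocabSize whose entries are per-term counts.
function toCountVector(tokens, vocabulary) {
  const counts = new Array(vocabulary.size).fill(0);
  for (const term of tokens) {
    const i = vocabulary.get(term);
    if (i !== undefined) counts[i] += 1;
  }
  return counts;
}

const docs = ["the cat sat", "the dog sat on the cat"];
const tokenized = docs.map(tokenize);
const vocabulary = buildVocabulary(tokenized);
const vectors = tokenized.map((tokens) => toCountVector(tokens, vocabulary));
```

Each entry of vectors has length vocabulary.size (the vocabSize), one row per document, which is the shape the featuresCol column is expected to hold.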

Constructor

new LDA(uid)

Parameters:
Name Type Attributes Description
uid string <optional>

Methods

(static) load(path) → {LDA}

Parameters:
Name Type Description
path string
Returns:
Type
LDA

copy(extra) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
extra module:eclairjs/ml/param.ParamMap
Returns:
Type
module:eclairjs/ml/clustering.LDA

extractParamMap() → {module:eclairjs/ml/param.ParamMap}

Returns:
Type
module:eclairjs/ml/param.ParamMap

fit(dataset) → {module:eclairjs/ml/clustering.LDAModel}

Parameters:
Name Type Description
dataset module:eclairjs/sql.Dataset
Returns:
Type
module:eclairjs/ml/clustering.LDAModel

setCheckpointInterval(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setDocConcentration(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value Array.<number> | number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setFeaturesCol(value) → {module:eclairjs/ml/clustering.LDA}

Sets the name of the features column. The features for LDA should be a Vector representing the word counts in a document. The vector should be of length vocabSize, with counts for each term (word).
Parameters:
Name Type Description
value string
Returns:
Type
module:eclairjs/ml/clustering.LDA

setK(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setKeepLastCheckpoint(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value boolean
Returns:
Type
module:eclairjs/ml/clustering.LDA

setLearningDecay(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setLearningOffset(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setMaxIter(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setOptimizeDocConcentration(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value boolean
Returns:
Type
module:eclairjs/ml/clustering.LDA

setOptimizer(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value string
Returns:
Type
module:eclairjs/ml/clustering.LDA

setSeed(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setSubsamplingRate(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setTopicConcentration(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value number
Returns:
Type
module:eclairjs/ml/clustering.LDA

setTopicDistributionCol(value) → {module:eclairjs/ml/clustering.LDA}

Parameters:
Name Type Description
value string
Returns:
Type
module:eclairjs/ml/clustering.LDA

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Returns:
Type
module:eclairjs/sql/types.StructType

uid() → {Promise.<string>}

An immutable unique ID for the object and its derivatives.
Returns:
Type
Promise.<string>