new LDA()
Constructs an LDA instance with default parameters.
Methods
getAlpha() → {Promise.<number>}
Alias for getDocConcentration
Returns:
- Type
- Promise.<number>
getAsymmetricAlpha() → {module:eclairjs/mllib/linalg.Vector}
Alias for getAsymmetricDocConcentration
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
getAsymmetricDocConcentration() → {module:eclairjs/mllib/linalg.Vector}
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This is the parameter to a Dirichlet distribution.
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
getBeta() → {Promise.<number>}
Alias for getTopicConcentration
Returns:
- Type
- Promise.<number>
getCheckpointInterval() → {Promise.<number>}
Period (in iterations) between checkpoints.
Returns:
- Type
- Promise.<number>
getDocConcentration() → {Promise.<number>}
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This method assumes the Dirichlet distribution is symmetric and can be described by a single
Double parameter. It should fail if docConcentration is asymmetric.
Returns:
- Type
- Promise.<number>
getK() → {Promise.<number>}
Number of topics to infer, i.e., the number of soft cluster centers.
Returns:
- Type
- Promise.<number>
getMaxIterations() → {Promise.<number>}
Maximum number of iterations for learning.
Returns:
- Type
- Promise.<number>
getSeed() → {Promise.<number>}
Random seed.
Returns:
- Type
- Promise.<number>
getTopicConcentration() → {Promise.<number>}
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper
by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
Returns:
- Type
- Promise.<number>
run(documents) → {module:eclairjs/mllib/clustering.LDAModel}
Learn an LDA model using the given dataset.
Parameters:
Name | Type | Description |
---|---|---|
documents | module:eclairjs/rdd.RDD | RDD of documents, which are term (word) count vectors paired with IDs. The term count vectors are "bags of words" with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0. |
Returns:
Inferred LDA model
- Type
- module:eclairjs/mllib/clustering.LDAModel
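As the parameter description above notes, each document pairs a unique non-negative ID with a fixed-length term-count vector. A minimal plain-JavaScript sketch of that input shape (the example data is hypothetical, and the eclairjs RDD wrapper itself is omitted; only the data constraints are illustrated):

```javascript
// Vocabulary of 4 terms; each document is [id, counts], where counts[i]
// is the number of occurrences of term i ("bag of words").
const vocabSize = 4;
const documents = [
  [0, [2, 0, 1, 0]], // doc 0: term 0 twice, term 2 once
  [1, [0, 3, 0, 1]],
  [2, [1, 1, 1, 1]],
];

// Check the documented constraints: IDs unique and >= 0, and a
// fixed vocabulary size (vector length) across all documents.
const ids = documents.map(([id]) => id);
const idsValid = new Set(ids).size === ids.length && ids.every((id) => id >= 0);
const lengthsValid = documents.every(([, counts]) => counts.length === vocabSize);
console.log(idsValid && lengthsValid); // true
```

In eclairjs this data would be wrapped in an RDD (e.g. via a SparkContext) before being passed to run().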
setAlphawithnumber(alpha)
Alias for setDocConcentration()
Parameters:
Name | Type | Description |
---|---|---|
alpha | number | |
setAlphawithVector(alpha)
Alias for setDocConcentration()
Parameters:
Name | Type | Description |
---|---|---|
alpha | module:eclairjs/mllib/linalg.Vector | |
setBeta(beta)
Alias for setTopicConcentration()
Parameters:
Name | Type | Description |
---|---|---|
beta | number | |
setCheckpointInterval(checkpointInterval)
Period (in iterations) between checkpoints (default = 10). Checkpointing helps with recovery
(when nodes fail). It also helps with eliminating temporary shuffle files on disk, which can be
important when LDA is run for many iterations. If the checkpoint directory is not set in
SparkContext, this setting is ignored.
Parameters:
Name | Type | Description |
---|---|---|
checkpointInterval | number | |
- See: org.apache.spark.SparkContext#setCheckpointDir
setDocConcentrationwithnumber(docConcentration)
Replicates a Double docConcentration to create a symmetric prior.
Parameters:
Name | Type | Description |
---|---|---|
docConcentration | number | |
setDocConcentrationwithVector(docConcentration)
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This is the parameter to a Dirichlet distribution, where larger values mean more smoothing
(more regularization).
If set to a singleton vector Vector(-1), then docConcentration is set automatically. If set to
singleton vector Vector(t) where t != -1, then t is replicated to a vector of length k during
LDAOptimizer.initialize(). Otherwise, the docConcentration vector must be length k.
(default = Vector(-1) = automatic)
Optimizer-specific parameter settings:
- EM
- Currently only supports symmetric distributions, so all values in the vector should be
the same.
- Values should be > 1.0
- default = uniformly (50 / k) + 1, where 50/k is common in LDA libraries and +1 follows
from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
- Online
- Values should be >= 0
- default = uniformly (1.0 / k), following the implementation from
https://github.com/Blei-Lab/onlineldavb.
Parameters:
Name | Type | Description |
---|---|---|
docConcentration | module:eclairjs/mllib/linalg.Vector | |
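The optimizer-specific docConcentration defaults above reduce to simple formulas in the symmetric case. The helper below is a hypothetical illustration restating those documented formulas; it is not a function in the eclairjs API:

```javascript
// Default symmetric docConcentration ("alpha") per optimizer, as
// documented above (k = number of topics).
function defaultAlpha(optimizer, k) {
  if (optimizer === "em") return 50 / k + 1; // EM: (50 / k) + 1
  if (optimizer === "online") return 1.0 / k; // Online: 1.0 / k
  throw new Error("unknown optimizer: " + optimizer);
}

console.log(defaultAlpha("em", 10)); // 6
console.log(defaultAlpha("online", 10)); // 0.1
```

Note how the EM default grows smoother (larger alpha) for small k, while the online default shrinks with k.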
setK(k) → {module:eclairjs/mllib/clustering.LDA}
Number of topics to infer, i.e., the number of soft cluster centers.
(default = 10)
Parameters:
Name | Type | Description |
---|---|---|
k | number | |
Returns:
- Type
- module:eclairjs/mllib/clustering.LDA
setMaxIterations(maxIterations)
Maximum number of iterations for learning.
(default = 20)
Parameters:
Name | Type | Description |
---|---|---|
maxIterations | number | |
setOptimizer(optimizerName)
Set the LDAOptimizer used to perform the actual calculation by algorithm name.
Currently "em" and "online" are supported.
Parameters:
Name | Type | Description |
---|---|---|
optimizerName | string | |
setSeed(seed)
Random seed.
Parameters:
Name | Type | Description |
---|---|---|
seed | number | |
setTopicConcentration(topicConcentration)
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper
by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
If set to -1, then topicConcentration is set automatically.
(default = -1 = automatic)
Optimizer-specific parameter settings:
- EM
- Value should be > 1.0
- default = 0.1 + 1, where 0.1 gives a small amount of smoothing and +1 follows
Asuncion et al. (2009), who recommend a +1 adjustment for EM.
- Online
- Value should be >= 0
- default = (1.0 / k), following the implementation from
https://github.com/Blei-Lab/onlineldavb.
Parameters:
Name | Type | Description |
---|---|---|
topicConcentration | number | |
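As with docConcentration, the topicConcentration defaults above are simple formulas. defaultBeta below is a hypothetical helper restating them, not part of the eclairjs API:

```javascript
// Default topicConcentration ("beta"/"eta") per optimizer, as
// documented above (k = number of topics).
function defaultBeta(optimizer, k) {
  if (optimizer === "em") return 0.1 + 1; // small smoothing + 1 for EM
  if (optimizer === "online") return 1.0 / k; // Online: 1.0 / k
  throw new Error("unknown optimizer: " + optimizer);
}

console.log(defaultBeta("em", 10)); // ≈ 1.1 (independent of k)
console.log(defaultBeta("online", 20)); // 0.05
```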