new LDA() → (nullable) {?}
Constructs an LDA instance with default parameters.
- Source:
Returns:
- Type
- ?
Methods
getAlpha() → {number}
Alias for getDocConcentration
- Source:
Returns:
- Type
- number
getAsymmetricAlpha() → {module:eclairjs/mllib/linalg.Vector}
Alias for getAsymmetricDocConcentration
- Source:
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
getAsymmetricDocConcentration() → {module:eclairjs/mllib/linalg.Vector}
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This is the parameter to a Dirichlet distribution.
- Source:
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
getBeta() → {number}
Alias for getTopicConcentration
- Source:
Returns:
- Type
- number
getCheckpointInterval() → {number}
Period (in iterations) between checkpoints.
- Source:
Returns:
- Type
- number
getDocConcentration() → {number}
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This method assumes the Dirichlet distribution is symmetric and can be described by a single
Double parameter. It should fail if docConcentration is asymmetric.
- Source:
Returns:
- Type
- number
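The symmetric/asymmetric contract above can be illustrated with a small sketch. This helper is hypothetical (not part of EclairJS or Spark); it shows how a getter could recover the single Double alpha from a symmetric docConcentration vector and fail when the vector is asymmetric:

```javascript
// Hypothetical helper: recover the single alpha of a symmetric
// docConcentration vector; throw if the vector is asymmetric.
function symmetricAlpha(docConcentration) {
  var first = docConcentration[0];
  for (var i = 1; i < docConcentration.length; i++) {
    if (docConcentration[i] !== first) {
      throw new Error("docConcentration is asymmetric");
    }
  }
  return first; // the single Double describing the symmetric Dirichlet
}

console.log(symmetricAlpha([0.5, 0.5, 0.5])); // 0.5
```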
getK() → {integer}
Number of topics to infer, i.e., the number of soft cluster centers.
- Source:
Returns:
- Type
- integer
getMaxIterations() → {number}
Maximum number of iterations for learning.
- Source:
Returns:
- Type
- number
getSeed() → {number}
Random seed
- Source:
Returns:
- Type
- number
getTopicConcentration() → {number}
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper
by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
- Source:
Returns:
- Type
- number
run(documents) → {LDAModel}
Learn an LDA model using the given dataset.
Parameters:
Name | Type | Description
---|---|---
documents | module:eclairjs.RDD \| PairRDD | RDD of documents, which are term (word) count vectors paired with IDs. The term count vectors are "bags of words" with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0.
- Source:
Returns:
Inferred LDA model
- Type
- LDAModel
setAlphawithnumber(alpha)
Alias for setDocConcentration()
Parameters:
Name | Type | Description
---|---|---
alpha | number |
- Source:
Returns:
setAlphawithVector(alpha)
Alias for setDocConcentration()
Parameters:
Name | Type | Description
---|---|---
alpha | module:eclairjs/mllib/linalg.Vector |
- Source:
Returns:
setBeta(beta)
Alias for setTopicConcentration()
Parameters:
Name | Type | Description
---|---|---
beta | number |
- Source:
Returns:
setCheckpointInterval(checkpointInterval)
Period (in iterations) between checkpoints (default = 10). Checkpointing helps with recovery
(when nodes fail). It also helps with eliminating temporary shuffle files on disk, which can be
important when LDA is run for many iterations. If the checkpoint directory is not set in
SparkContext, this setting is ignored.
Parameters:
Name | Type | Description
---|---|---
checkpointInterval | number |
- Source:
- See:
- org.apache.spark.SparkContext#setCheckpointDir
Returns:
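The checkpointing rule described above can be sketched as a predicate. This is illustrative only (not the EclairJS implementation), modeling "checkpoint directory set in SparkContext" as a boolean flag:

```javascript
// Checkpoint every `checkpointInterval` iterations, but only when a
// checkpoint directory has been set; otherwise the setting is ignored.
function shouldCheckpoint(iteration, checkpointInterval, checkpointDirSet) {
  if (!checkpointDirSet || checkpointInterval <= 0) {
    return false;
  }
  return iteration % checkpointInterval === 0;
}
```

With the default interval of 10, iterations 10, 20, 30, ... would checkpoint, and nothing checkpoints when no directory is set.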
setDocConcentrationwithnumber(docConcentration)
Replicates a Double docConcentration to create a symmetric prior.
Parameters:
Name | Type | Description
---|---|---
docConcentration | number |
- Source:
Returns:
setDocConcentrationwithVector(docConcentration)
Concentration parameter (commonly named "alpha") for the prior placed on documents'
distributions over topics ("theta").
This is the parameter to a Dirichlet distribution, where larger values mean more smoothing
(more regularization).
If set to a singleton vector Vector(-1), then docConcentration is set automatically. If set to
singleton vector Vector(t) where t != -1, then t is replicated to a vector of length k during
LDAOptimizer.initialize(). Otherwise, the docConcentration vector must have length k.
(default = Vector(-1) = automatic)
Optimizer-specific parameter settings:
- EM
  - Currently only supports symmetric distributions, so all values in the vector should be the same.
  - Values should be > 1.0.
  - Default = uniformly (50 / k) + 1, where 50/k is common in LDA libraries and the +1 follows Asuncion et al. (2009), who recommend a +1 adjustment for EM.
- Online
  - Values should be >= 0.
  - Default = uniformly (1.0 / k), following the implementation at https://github.com/Blei-Lab/onlineldavb.
Parameters:
Name | Type | Description
---|---|---
docConcentration | module:eclairjs/mllib/linalg.Vector |
- Source:
Returns:
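The resolution rules above (automatic default, singleton replication, explicit length-k vector) can be sketched in plain JavaScript. `resolveDocConcentration` is a hypothetical name, not an EclairJS API, and vectors are modeled as plain arrays:

```javascript
// Resolve a docConcentration vector to a length-k prior:
//   [-1]         -> optimizer-specific default, replicated k times
//   [t], t != -1 -> t replicated k times
//   otherwise    -> must already have length k
function resolveDocConcentration(docConcentration, k, optimizer) {
  if (docConcentration.length === 1) {
    var t = docConcentration[0];
    if (t === -1) {
      // automatic defaults: EM uses (50 / k) + 1, online uses 1.0 / k
      t = optimizer === "em" ? 50 / k + 1 : 1.0 / k;
    }
    var alpha = [];
    for (var i = 0; i < k; i++) {
      alpha.push(t); // replicate to a symmetric prior of length k
    }
    return alpha;
  }
  if (docConcentration.length !== k) {
    throw new Error("docConcentration vector must have length k");
  }
  return docConcentration;
}
```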
setK(k) → {LDA}
Number of topics to infer, i.e., the number of soft cluster centers.
(default = 10)
Parameters:
Name | Type | Description
---|---|---
k | integer |
- Source:
Returns:
- Type
- LDA
setMaxIterations(maxIterations)
Maximum number of iterations for learning.
(default = 20)
Parameters:
Name | Type | Description
---|---|---
maxIterations | number |
- Source:
Returns:
setOptimizer(optimizerName)
Set the LDAOptimizer used to perform the actual calculation by algorithm name.
Currently, "em" and "online" are supported.
Parameters:
Name | Type | Description
---|---|---
optimizerName | string |
- Source:
Returns:
setSeed(seed)
Random seed
Parameters:
Name | Type | Description
---|---|---
seed | number |
- Source:
Returns:
setTopicConcentration(topicConcentration)
Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics'
distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper
by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
If set to -1, then topicConcentration is set automatically.
(default = -1 = automatic)
Optimizer-specific parameter settings:
- EM
  - Value should be > 1.0.
  - Default = 0.1 + 1, where 0.1 gives a small amount of smoothing and the +1 follows Asuncion et al. (2009), who recommend a +1 adjustment for EM.
- Online
  - Value should be >= 0.
  - Default = (1.0 / k), following the implementation at https://github.com/Blei-Lab/onlineldavb.
Parameters:
Name | Type | Description
---|---|---
topicConcentration | number |
- Source:
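The automatic-default rule above can be sketched the same way. `resolveTopicConcentration` is a hypothetical illustration of the stated behavior, not the EclairJS implementation:

```javascript
// Resolve topicConcentration: -1 means "automatic", with
// optimizer-specific defaults (EM: 0.1 + 1, online: 1.0 / k).
function resolveTopicConcentration(topicConcentration, k, optimizer) {
  if (topicConcentration !== -1) {
    return topicConcentration; // user-supplied value, used as-is
  }
  return optimizer === "em" ? 0.1 + 1 : 1.0 / k;
}
```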