Class: LDA

eclairjs/mllib/clustering.LDA

new LDA()

Constructs an LDA instance with default parameters.

Methods

getAlpha() → {Promise.<number>}

Alias for getDocConcentration().
Returns: Promise.<number>

getAsymmetricAlpha() → {module:eclairjs/mllib/linalg.Vector}

Alias for getAsymmetricDocConcentration().
Returns: module:eclairjs/mllib/linalg.Vector

getAsymmetricDocConcentration() → {module:eclairjs/mllib/linalg.Vector}

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution.
Returns: module:eclairjs/mllib/linalg.Vector

getBeta() → {Promise.<number>}

Alias for getTopicConcentration().
Returns: Promise.<number>

getCheckpointInterval() → {Promise.<number>}

Period (in iterations) between checkpoints.
Returns: Promise.<number>

getDocConcentration() → {Promise.<number>}

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This method assumes the Dirichlet distribution is symmetric and can be described by a single Double parameter. It should fail if docConcentration is asymmetric.
Returns: Promise.<number>

getK() → {Promise.<number>}

Number of topics to infer, i.e., the number of soft cluster centers.
Returns: Promise.<number>

getMaxIterations() → {Promise.<number>}

Maximum number of iterations for learning.
Returns: Promise.<number>

getSeed() → {Promise.<number>}

Random seed.
Returns: Promise.<number>

getTopicConcentration() → {Promise.<number>}

Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution. Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
Returns: Promise.<number>

run(documents) → {module:eclairjs/mllib/clustering.LDAModel}

Learn an LDA model using the given dataset.
Parameters:
documents (module:eclairjs/rdd.RDD): RDD of documents, which are term (word) count vectors paired with IDs. The term count vectors are "bags of words" with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0.
Returns: Inferred LDA model (module:eclairjs/mllib/clustering.LDAModel)
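To make the expected input shape concrete, the following plain-JavaScript sketch builds the fixed-vocabulary term-count vectors ("bags of words") that run() consumes. Names such as `vocabulary` and `corpus` are illustrative, not part of the EclairJS API; in a real EclairJS program each counts array would be wrapped in an mllib.linalg dense Vector and the pairs parallelized into an RDD through the SparkContext before being passed to run().

```javascript
// Build bag-of-words term-count vectors over a fixed vocabulary.
// Every vector has the same length (the vocabulary size), and each
// document gets a unique ID >= 0, as run() requires.
var vocabulary = ["spark", "lda", "topic", "model"];

function termCounts(tokens) {
  var counts = vocabulary.map(function () { return 0; });
  tokens.forEach(function (t) {
    var i = vocabulary.indexOf(t);
    if (i >= 0) counts[i] += 1; // out-of-vocabulary tokens are dropped
  });
  return counts;
}

// [id, counts] pairs; ids are unique and non-negative.
var corpus = [
  [0, termCounts(["spark", "lda", "lda"])],
  [1, termCounts(["topic", "model", "spark"])]
];

console.log(corpus[0][1]); // [ 1, 2, 0, 0 ]
```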

setAlphawithnumber(alpha)

Alias for setDocConcentration().
Parameters:
alpha (number)

setAlphawithVector(alpha)

Alias for setDocConcentration().
Parameters:
alpha (module:eclairjs/mllib/linalg.Vector)

setBeta(beta)

Alias for setTopicConcentration().
Parameters:
beta (number)

setCheckpointInterval(checkpointInterval)

Period (in iterations) between checkpoints (default = 10). Checkpointing helps with recovery (when nodes fail). It also helps with eliminating temporary shuffle files on disk, which can be important when LDA is run for many iterations. If the checkpoint directory is not set in SparkContext, this setting is ignored.
Parameters:
checkpointInterval (number)
See:
  • org.apache.spark.SparkContext#setCheckpointDir

setDocConcentrationwithnumber(docConcentration)

Replicates a Double docConcentration to create a symmetric prior.
Parameters:
docConcentration (number)

setDocConcentrationwithVector(docConcentration)

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution, where larger values mean more smoothing (more regularization).

If set to a singleton vector Vector(-1), then docConcentration is set automatically. If set to a singleton vector Vector(t) where t != -1, then t is replicated to a vector of length k during LDAOptimizer.initialize(). Otherwise, the docConcentration vector must be length k. (default = Vector(-1) = automatic)

Optimizer-specific parameter settings:
  • EM
    - Currently only supports symmetric distributions, so all values in the vector should be the same.
    - Values should be > 1.0.
    - Default = uniformly (50 / k) + 1, where 50/k is common in LDA libraries and +1 follows from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
  • Online
    - Values should be >= 0.
    - Default = uniformly (1.0 / k), following the implementation from https://github.com/Blei-Lab/onlineldavb.
Parameters:
docConcentration (module:eclairjs/mllib/linalg.Vector)
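The automatic defaults described above are easy to compute directly. This plain-JavaScript sketch uses hypothetical helper names (not part of the EclairJS API) to show the symmetric alpha each optimizer picks when docConcentration is left at Vector(-1):

```javascript
// Default symmetric docConcentration (alpha) per optimizer;
// k is the number of topics.
function defaultAlphaEM(k) {
  // 50/k is common in LDA libraries; +1 follows Asuncion et al. (2009) for EM
  return 50 / k + 1;
}

function defaultAlphaOnline(k) {
  // follows the onlineldavb reference implementation
  return 1.0 / k;
}

console.log(defaultAlphaEM(10));     // 6
console.log(defaultAlphaOnline(10)); // 0.1
```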

setK(k) → {module:eclairjs/mllib/clustering.LDA}

Number of topics to infer, i.e., the number of soft cluster centers. (default = 10)
Parameters:
k (number)
Returns: module:eclairjs/mllib/clustering.LDA

setMaxIterations(maxIterations)

Maximum number of iterations for learning. (default = 20)
Parameters:
maxIterations (number)

setOptimizer(optimizerName)

Set the LDAOptimizer used to perform the actual calculation, by algorithm name. Currently, "em" and "online" are supported.
Parameters:
optimizerName (string)

setSeed(seed)

Random seed.
Parameters:
seed (number)

setTopicConcentration(topicConcentration)

Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution.

Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.

If set to -1, then topicConcentration is set automatically. (default = -1 = automatic)

Optimizer-specific parameter settings:
  • EM
    - Value should be > 1.0.
    - Default = 0.1 + 1, where 0.1 gives a small amount of smoothing and +1 follows Asuncion et al. (2009), who recommend a +1 adjustment for EM.
  • Online
    - Value should be >= 0.
    - Default = (1.0 / k), following the implementation from https://github.com/Blei-Lab/onlineldavb.
Parameters:
topicConcentration (number)
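As with docConcentration, the automatic beta defaults can be sketched in plain JavaScript (hypothetical helper names, not part of the EclairJS API):

```javascript
// Default topicConcentration (beta/eta) per optimizer when left at -1;
// k is the number of topics.
function defaultBetaEM() {
  // 0.1 gives a small amount of smoothing; +1 per Asuncion et al. (2009) for EM
  return 0.1 + 1;
}

function defaultBetaOnline(k) {
  // follows the onlineldavb reference implementation
  return 1.0 / k;
}

console.log(defaultBetaEM());       // 1.1
console.log(defaultBetaOnline(20)); // 0.05
```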