Class: DistributedLDAModel

eclairjs/mllib/clustering.DistributedLDAModel

new DistributedLDAModel()

Distributed LDA model. This model stores the inferred topics, the full training dataset, and the topic distributions.

Extends: module:eclairjs/mllib/clustering.LDAModel

Methods

(static) load(sc, path) → {module:eclairjs/mllib/clustering.DistributedLDAModel}

Load a model previously saved to the given path.
Parameters:
Name Type Description
sc module:eclairjs.SparkContext Spark context used for loading model files.
path string Path specifying the directory to which the model was saved.
Returns:
Type
module:eclairjs/mllib/clustering.DistributedLDAModel

describeTopics(maxTermsPerTopic) → {Array.<module:eclairjs.Tuple2>}

Return the topics, each described by its most heavily weighted terms.
Parameters:
Name Type Description
maxTermsPerTopic number Maximum number of terms to collect for each topic.
Overrides: module:eclairjs/mllib/clustering.LDAModel#describeTopics
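Returns:
Array over topics. Each topic is represented as a pair of matching arrays: (term indices, term weights in the topic). Each topic's terms are sorted in order of decreasing weight.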
Type
Array.<module:eclairjs.Tuple2>
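The ordering that describeTopics promises can be illustrated without a cluster. The following is a minimal sketch in plain JavaScript, not part of the EclairJS API: it assumes each topic is available as an ordinary array of term weights indexed by term, and the helper describeTopicsLocal is hypothetical. Each result is a two-element array standing in for the Tuple2 of (term indices, term weights).

```javascript
// Hypothetical local illustration of describeTopics(maxTermsPerTopic).
// Assumption: topics[k][w] is the weight of term w in topic k (plain arrays, not EclairJS objects).
function describeTopicsLocal(topics, maxTermsPerTopic) {
  return topics.map(function (weights) {
    // Pair each weight with its term index, sort by decreasing weight, keep the top terms.
    var ranked = weights
      .map(function (weight, term) { return { term: term, weight: weight }; })
      .sort(function (a, b) { return b.weight - a.weight; })
      .slice(0, maxTermsPerTopic);
    // Pair of matching arrays: (term indices, term weights).
    return [
      ranked.map(function (r) { return r.term; }),
      ranked.map(function (r) { return r.weight; })
    ];
  });
}

var topics = [
  [0.1, 0.6, 0.3], // topic 0
  [0.5, 0.2, 0.3]  // topic 1
];
console.log(JSON.stringify(describeTopicsLocal(topics, 2)));
// [[[1,2],[0.6,0.3]],[[0,2],[0.5,0.3]]]
```

The sketch only sorts and truncates the weights it is given; it does not attempt to reproduce how the distributed implementation derives those weights.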

docConcentration() → {module:eclairjs/mllib/linalg.Vector}

Concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta"). This is the parameter to a Dirichlet distribution.
Inherited From: module:eclairjs/mllib/clustering.LDAModel#docConcentration
Returns:
Type
module:eclairjs/mllib/linalg.Vector
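Because alpha parameterizes a Dirichlet prior, its entries act as relative pseudo-counts: for theta ~ Dir(alpha), the prior mean share of topic k is alpha_k divided by the sum of all entries. The minimal sketch below is plain JavaScript and assumes the vector's values have been copied into an ordinary array; the helper expectedTopicProportions is hypothetical.

```javascript
// Hypothetical helper: prior mean of a Dirichlet(alpha) distribution.
// For theta ~ Dir(alpha), E[theta_k] = alpha_k / sum_j alpha_j.
function expectedTopicProportions(alpha) {
  var total = alpha.reduce(function (sum, a) { return sum + a; }, 0);
  return alpha.map(function (a) { return a / total; });
}

console.log(JSON.stringify(expectedTopicProportions([1, 1, 2])));
// [0.25,0.25,0.5]
```

When all entries are equal the prior mean is uniform whatever their size; the overall magnitude instead controls concentration, with larger values favoring documents that mix topics evenly and values below 1 favoring documents dominated by a few topics.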

javaTopicDistributions() → {module:eclairjs.PairRDD}

Java-friendly version of topicDistributions.
Returns:
Type
module:eclairjs.PairRDD

javaTopTopicsPerDocument(k) → {module:eclairjs.RDD}

Java-friendly version of topTopicsPerDocument
Parameters:
Name Type Description
k number Number of top-weighted topics to return for each document.
Returns:
Type
module:eclairjs.RDD

k() → {number}

Number of topics.
Inherited From: module:eclairjs/mllib/clustering.LDAModel#k
Returns:
Type
number

save(sc, path)

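Save this model to the given path. The saved model can be loaded again with DistributedLDAModel.load(sc, path).
Parameters:
Name Type Description
sc module:eclairjs.SparkContext Spark context used to save model data.
path string Path specifying the directory in which to save this model.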

toLocal() → {module:eclairjs/mllib/clustering.LocalLDAModel}

Convert model to a local model. The local model stores the inferred topics but not the topic distributions for training documents.
Returns:
Type
module:eclairjs/mllib/clustering.LocalLDAModel

topDocumentsPerTopic(maxDocumentsPerTopic) → {Array.<module:eclairjs.Tuple2>}

Return the top documents for each topic.
Parameters:
Name Type Description
maxDocumentsPerTopic number Maximum number of documents to collect for each topic.
Returns:
Array over topics. Each element is represented as a pair of matching arrays: (IDs for the documents, weights of the topic in these documents). For each topic, documents are sorted in order of decreasing topic weights.
Type
Array.<module:eclairjs.Tuple2>
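The per-topic ranking can be sketched locally. This is plain JavaScript, not the EclairJS API: it assumes the per-document topic weights are available as an array of [docId, weights] pairs, and the helpers rankDocumentsForTopic and topDocumentsPerTopicLocal are hypothetical. The sketch ranks by whatever per-topic weight it is given and does not claim to reproduce how the distributed model computes those weights.

```javascript
// Hypothetical local illustration of topDocumentsPerTopic(maxDocumentsPerTopic).
// Assumption: docs is an array of [docId, topicWeights] pairs.

// Rank all documents by their weight for a single topic.
function rankDocumentsForTopic(docs, topic, maxDocumentsPerTopic) {
  var ranked = docs
    .map(function (doc) { return { id: doc[0], weight: doc[1][topic] }; })
    .sort(function (a, b) { return b.weight - a.weight; })
    .slice(0, maxDocumentsPerTopic);
  // Pair of matching arrays: (document IDs, weights of the topic in those documents).
  return [
    ranked.map(function (r) { return r.id; }),
    ranked.map(function (r) { return r.weight; })
  ];
}

// Build the array over topics, one ranked pair per topic.
function topDocumentsPerTopicLocal(docs, numTopics, maxDocumentsPerTopic) {
  var result = [];
  for (var topic = 0; topic < numTopics; topic++) {
    result.push(rankDocumentsForTopic(docs, topic, maxDocumentsPerTopic));
  }
  return result;
}

var docs = [
  [10, [0.7, 0.3]],
  [11, [0.2, 0.8]],
  [12, [0.5, 0.5]]
];
console.log(JSON.stringify(topDocumentsPerTopicLocal(docs, 2, 2)));
// [[[10,12],[0.7,0.5]],[[11,12],[0.8,0.5]]]
```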

topicConcentration() → {number}

Concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms. This is the parameter to a symmetric Dirichlet distribution. Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
Inherited From: module:eclairjs/mllib/clustering.LDAModel#topicConcentration
Returns:
Type
number
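In the standard LDA generative model, the two concentration parameters enter as Dirichlet priors. Written out with V = vocabSize and K = k (this notation is assumed here for illustration, not taken from the EclairJS API), the document prior uses the vector alpha and the topic prior uses the single scalar eta:

```latex
\theta_d \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K), \qquad
\phi_k \sim \mathrm{Dir}(\underbrace{\eta, \dots, \eta}_{V}), \qquad
\mathbb{E}[\phi_{k,w}] = \frac{\eta}{V\eta} = \frac{1}{V}
```

Because the prior on each topic is symmetric, its mean is uniform over the vocabulary for any eta; eta only controls how tightly sampled topics cluster around that mean.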

topicDistributions() → {module:eclairjs.RDD}

For each document in the training set, return the distribution over topics for that document ("theta_doc").
Returns:
RDD of (document ID, topic distribution) pairs
Type
module:eclairjs.RDD

topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}

Inferred topics, where each topic is represented by a distribution over terms. This is a matrix of size vocabSize x k, where each column is a topic. No guarantees are given about the ordering of the topics.
Inherited From: module:eclairjs/mllib/clustering.LDAModel#topicsMatrix
Returns:
Type
module:eclairjs/mllib/linalg.Matrix
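Spark's local dense matrices store their values in column-major order, so each topic (column) of a vocabSize x k matrix occupies a contiguous run of vocabSize entries. The minimal sketch below is plain JavaScript and assumes the matrix values have already been copied into a flat column-major array; the helper topicColumn is hypothetical.

```javascript
// Hypothetical helper: extract topic j (column j) from a vocabSize x k matrix
// whose values are stored column-major in a flat array.
function topicColumn(values, vocabSize, k, j) {
  if (values.length !== vocabSize * k) {
    throw new Error("expected " + vocabSize * k + " values, got " + values.length);
  }
  if (j < 0 || j >= k) {
    throw new RangeError("topic index out of range: " + j);
  }
  // Column j starts at offset j * vocabSize and spans vocabSize entries.
  return values.slice(j * vocabSize, (j + 1) * vocabSize);
}

// vocabSize = 3 terms, k = 2 topics: column 0 first, then column 1.
var values = [0.2, 0.5, 0.3, 0.6, 0.1, 0.3];
console.log(JSON.stringify(topicColumn(values, 3, 2, 1)));
// [0.6,0.1,0.3]
```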

topTopicsPerDocument(k) → {module:eclairjs.RDD}

For each document, return the top k weighted topics for that document and their weights.
Parameters:
Name Type Description
k number Number of top-weighted topics to return for each document.
Returns:
RDD of (doc ID, topic indices, topic weights)
Type
module:eclairjs.RDD
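The shape of each result row can be sketched locally. This is plain JavaScript, not the EclairJS API: it assumes each document's topic distribution is available as an array inside [docId, distribution] pairs, and the helper topTopicsPerDocumentLocal is hypothetical. The sketch orders each document's topics by decreasing weight; the source above does not state the ordering used by the distributed method.

```javascript
// Hypothetical local illustration of topTopicsPerDocument(k).
// Assumption: docs is an array of [docId, topicDistribution] pairs.
// Returns [docId, topicIndices, topicWeights] triples.
function topTopicsPerDocumentLocal(docs, k) {
  return docs.map(function (doc) {
    // Pair each weight with its topic index, sort by decreasing weight, keep k.
    var ranked = doc[1]
      .map(function (weight, topic) { return { topic: topic, weight: weight }; })
      .sort(function (a, b) { return b.weight - a.weight; })
      .slice(0, k);
    return [
      doc[0],
      ranked.map(function (r) { return r.topic; }),
      ranked.map(function (r) { return r.weight; })
    ];
  });
}

var docs = [
  [10, [0.1, 0.7, 0.2]],
  [11, [0.5, 0.1, 0.4]]
];
console.log(JSON.stringify(topTopicsPerDocumentLocal(docs, 2)));
// [[10,[1,2],[0.7,0.2]],[11,[0,2],[0.5,0.4]]]
```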

vocabSize() → {number}

Vocabulary size (number of terms in the vocabulary).
Inherited From: module:eclairjs/mllib/clustering.LDAModel#vocabSize
Returns:
Type
number