Extends
Methods
(static) load(path) → {module:eclairjs/mllib/clustering.LocalLDAModel}
Parameters:
Name | Type | Description |
---|---|---|
path | string | Input path to read the saved model from. |
Returns:
- Type
- module:eclairjs/mllib/clustering.LocalLDAModel
(static) read() → {module:eclairjs/ml/util.MLReader}
Returns:
- Type
- module:eclairjs/ml/util.MLReader
copy(extra) → {module:eclairjs/mllib/clustering.LocalLDAModel}
Parameters:
Name | Type | Description |
---|---|---|
extra | module:eclairjs/ml/param.ParamMap | Param values that overwrite params in the copy. |
- Overrides:
Returns:
- Type
- module:eclairjs/mllib/clustering.LocalLDAModel
describeTopics(maxTermsPerTopicopt) → {module:eclairjs/sql.DataFrame}
Return the topics described by their top-weighted terms.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
maxTermsPerTopic | number | <optional> | Maximum number of terms to collect for each topic. Defaults to 10. |
- Inherited From:
Returns:
Local DataFrame with one topic per Row, with columns:
Column | Type | Description |
---|---|---|
topic | IntegerType | Topic index |
termIndices | ArrayType(IntegerType) | Term indices, sorted in order of decreasing term importance |
termWeights | ArrayType(DoubleType) | Corresponding sorted term weights |
- Type
- module:eclairjs/sql.DataFrame
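To make the three output columns ("topic", "termIndices", "termWeights") concrete, here is a minimal plain-JavaScript sketch, not the EclairJS API, of how one row per topic could be derived from a topics matrix: each column is normalized into a distribution over terms, term indices are ranked by decreasing weight, and only the first maxTermsPerTopic are kept. The helper name describeTopicsLocal and the nested-array input (topics[term][topic]) are hypothetical, and the per-column normalization is an assumption about what the library does internally.

```javascript
// Hypothetical helper (not part of EclairJS): mimics the shape of describeTopics() output.
// `topics` is a plain vocabSize x k array of rows, topics[term][topic],
// standing in for the model's topics matrix.
function describeTopicsLocal(topics, maxTermsPerTopic = 10) {
  const vocabSize = topics.length;
  const k = vocabSize > 0 ? topics[0].length : 0;
  const rows = [];
  for (let topic = 0; topic < k; topic++) {
    // Extract column `topic` and normalize it into a distribution over terms (assumption).
    const column = topics.map((row) => row[topic]);
    const total = column.reduce((sum, w) => sum + w, 0);
    const weights = column.map((w) => (total > 0 ? w / total : 0));
    // Rank term indices by decreasing weight and keep the top maxTermsPerTopic.
    const ranked = weights
      .map((weight, term) => ({ term, weight }))
      .sort((a, b) => b.weight - a.weight)
      .slice(0, maxTermsPerTopic);
    rows.push({
      topic,
      termIndices: ranked.map((r) => r.term),
      termWeights: ranked.map((r) => r.weight),
    });
  }
  return rows;
}

// Usage: a toy matrix with 3 terms and 2 topics, keeping the top 2 terms per topic.
const rows = describeTopicsLocal([[1, 2], [3, 0], [0, 2]], 2);
console.log(JSON.stringify(rows));
```

Because Array.prototype.sort is stable in current JavaScript engines, terms with equal weight keep their original index order in this sketch.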
estimatedDocConcentration() → {module:eclairjs/mllib/linalg.Vector}
Value for docConcentration estimated from data.
If Online LDA was used and optimizeDocConcentration was set to false,
then this returns the fixed (given) value for the docConcentration parameter.
- Inherited From:
Returns:
- Type
- module:eclairjs/mllib/linalg.Vector
hasParent() → {boolean}
- Inherited From:
Returns:
- Type
- boolean
isDistributed() → {boolean}
- Overrides:
Returns:
- Type
- boolean
logLikelihood(dataset) → {number}
Calculates a lower bound on the log likelihood of the entire corpus.
See Equation (16) in the Online LDA paper (Hoffman et al., 2010).
WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer
is set to "em"), this involves collecting a large topicsMatrix to the driver.
This implementation may be changed in the future.
Parameters:
Name | Type | Description |
---|---|---|
dataset | module:eclairjs/sql.DataFrame | Test corpus to use for calculating log likelihood |
- Inherited From:
Returns:
Variational lower bound on the log likelihood of the entire corpus.
- Type
- number
logPerplexity(dataset) → {number}
Calculates an upper bound on perplexity (lower is better).
See Equation (16) in the Online LDA paper (Hoffman et al., 2010).
WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer
is set to "em"), this involves collecting a large topicsMatrix to the driver.
This implementation may be changed in the future.
Parameters:
Name | Type | Description |
---|---|---|
dataset | module:eclairjs/sql.DataFrame | Test corpus to use for calculating perplexity |
- Inherited From:
Returns:
Variational upper bound on log perplexity per token.
- Type
- number
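The two evaluation methods are linked. In Spark, which EclairJS wraps, the log perplexity is the negated log-likelihood lower bound divided by the total number of tokens in the corpus; negating a lower bound is what makes the result an upper bound, and why lower is better. The sketch below shows only that arithmetic in plain JavaScript; the helper name is hypothetical, the input is a plain array of term-count arrays rather than a DataFrame, and the -20 log likelihood is a made-up illustrative number, not a real result.

```javascript
// Hypothetical helper (not part of EclairJS). Assumes the Spark definition:
// logPerplexity = -logLikelihood / (total token count of the corpus).
function logPerplexityFromLowerBound(logLikelihood, termCountVectors) {
  // Total tokens = sum of every term count in every document.
  const tokenCount = termCountVectors.reduce(
    (total, counts) => total + counts.reduce((s, c) => s + c, 0),
    0
  );
  if (tokenCount === 0) {
    throw new Error("corpus has no tokens");
  }
  // Negating a lower bound on log likelihood gives an upper bound on log perplexity.
  return -logLikelihood / tokenCount;
}

// Two documents over a 3-term vocabulary: 3 + 2 = 5 tokens in total.
const corpus = [[1, 2, 0], [0, 1, 1]];
const bound = logPerplexityFromLowerBound(-20, corpus);
console.log(bound, Math.exp(bound)); // per-token log perplexity, then perplexity
```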
parent() → {module:eclairjs/ml.Estimator}
- Inherited From:
Returns:
- Type
- module:eclairjs/ml.Estimator
setFeaturesCol(value) → {module:eclairjs/mllib/clustering.LDAModel}
The features for LDA should be a module:eclairjs/mllib/linalg.Vector representing the word counts in a document.
The vector should be of length vocabSize, with counts for each term (word).
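As an illustration of the expected input, here is a minimal sketch that turns one tokenized document into a dense array of length vocabSize holding one count per term. The helper name toTermCountVector and the choice to drop out-of-vocabulary tokens are assumptions for illustration; in an actual pipeline these counts would be stored as module:eclairjs/mllib/linalg.Vector values in the column named by setFeaturesCol.

```javascript
// Hypothetical helper (not an EclairJS API): builds a term-count array
// of length vocabulary.length for one tokenized document.
function toTermCountVector(tokens, vocabulary) {
  // Map each vocabulary term to its index (0 .. vocabSize - 1).
  const indexOf = new Map(vocabulary.map((term, i) => [term, i]));
  const counts = new Array(vocabulary.length).fill(0);
  for (const token of tokens) {
    const i = indexOf.get(token);
    // Assumption: tokens outside the vocabulary are ignored.
    if (i !== undefined) {
      counts[i] += 1;
    }
  }
  return counts;
}

const vocabulary = ["apple", "banana", "cherry"];
const counts = toTermCountVector(["apple", "cherry", "apple", "durian"], vocabulary);
console.log(counts);
```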
Parameters:
Name | Type | Description |
---|---|---|
value | string | Name of the features column. |
- Inherited From:
Returns:
- Type
- module:eclairjs/mllib/clustering.LDAModel
setParent(parent) → {object}
Sets the parent of this model.
Parameters:
Name | Type | Description |
---|---|---|
parent | module:eclairjs/ml.Estimator | Estimator to set as the parent of this model. |
- Inherited From:
Returns:
- Type
- object
setSeed(value) → {module:eclairjs/mllib/clustering.LDAModel}
Parameters:
Name | Type | Description |
---|---|---|
value | number | Random seed. |
- Inherited From:
Returns:
- Type
- module:eclairjs/mllib/clustering.LDAModel
topicsMatrix() → {module:eclairjs/mllib/linalg.Matrix}
Inferred topics, where each topic is represented by a distribution over terms.
This is a matrix of size vocabSize x k, where each column is a topic.
No guarantees are given about the ordering of the topics.
WARNING: If this model is actually a module:eclairjs/ml/clustering.DistributedLDAModel instance produced by
the Expectation-Maximization ("em") optimizer, then this method could involve
collecting a large amount of data to the driver (on the order of vocabSize x k).
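To read a single topic out of this vocabSize x k layout, and to gauge the "order of vocabSize x k" cost mentioned in the warning, here is a minimal plain-JavaScript sketch. It assumes the matrix entries are available as a column-major array of doubles, which is the default layout of Spark's DenseMatrix; how EclairJS exposes those values is not covered here, and both helper names are hypothetical. At 8 bytes per double, a 1,000,000-term vocabulary with 100 topics comes to about 800 MB.

```javascript
// Hypothetical helpers (not EclairJS APIs). They assume the entries of a
// numRows x numCols matrix are stored column-major in a flat array.
function topicColumn(values, numRows, topicIndex) {
  // Column j occupies values[j * numRows] .. values[j * numRows + numRows - 1].
  const start = topicIndex * numRows;
  return values.slice(start, start + numRows);
}

// Approximate size of a dense vocabSize x k matrix of 8-byte doubles.
function approxDenseMatrixBytes(vocabSize, k) {
  return vocabSize * k * 8;
}

// A 3-term x 2-topic matrix, column-major: topic 0 = [1, 3, 0], topic 1 = [2, 0, 2].
const values = [1, 3, 0, 2, 0, 2];
console.log(topicColumn(values, 3, 1));
console.log(approxDenseMatrixBytes(1000000, 100));
```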
- Inherited From:
Returns:
- Type
- module:eclairjs/mllib/linalg.Matrix
transform(dataset) → {module:eclairjs/sql.DataFrame}
Transforms the input dataset.
WARNING: If this model is an instance of module:eclairjs/ml/clustering.DistributedLDAModel (produced when optimizer
is set to "em"), this involves collecting a large topicsMatrix to the driver.
This implementation may be changed in the future.
Parameters:
Name | Type | Description |
---|---|---|
dataset | module:eclairjs/sql.DataFrame | Input dataset to transform. |
- Inherited From:
Returns:
- Type
- module:eclairjs/sql.DataFrame
transformSchema(schema) → {module:eclairjs/sql/types.StructType}
Parameters:
Name | Type | Description |
---|---|---|
schema | module:eclairjs/sql/types.StructType | Input schema. |
- Inherited From:
Returns:
- Type
- module:eclairjs/sql/types.StructType