Constructor
new CountVectorizerModel(vocabulary, uid)
Parameters:
Name | Type | Attributes | Description
vocabulary | Array.<string> | | An array of terms. Only the terms in the vocabulary will be counted.
uid | string | <optional> |
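To illustrate what "only the terms in the vocabulary will be counted" means, here is a minimal plain-JavaScript sketch. The helper `countWithVocabulary` is hypothetical and is not part of EclairJS; it only mirrors the counting idea and makes no claim about how the model stores its output vector.

```javascript
// Hypothetical helper, not an EclairJS API: counts only in-vocabulary terms.
function countWithVocabulary(vocabulary, tokens) {
  // Map each vocabulary term to its position in the output vector.
  var index = new Map();
  vocabulary.forEach(function (term, i) {
    index.set(term, i);
  });

  // One slot per vocabulary term, all starting at zero.
  var counts = new Array(vocabulary.length).fill(0);
  tokens.forEach(function (token) {
    if (index.has(token)) {
      counts[index.get(token)] += 1;
    }
    // Tokens outside the vocabulary are ignored.
  });
  return counts;
}

var counts = countWithVocabulary(['a', 'b', 'c'], ['a', 'c', 'a', 'd']);
console.log(counts); // [ 2, 0, 1 ]
```

The token `d` is dropped because it does not appear in the vocabulary passed to the helper.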
Methods
Parameters:
Name | Type | Description
path | string |
Returns:
Type: module:eclairjs/ml/feature.CountVectorizerModel
Returns:
Type: module:eclairjs/ml/util.MLReader
Parameters:
Returns:
Type: module:eclairjs/ml/feature.CountVectorizerModel
getMinDF() → {float}
Returns:
Type: float
getMinTF() → {float}
Returns:
Type: float
getVocabSize() → {integer}
Returns:
Type: integer
Specifies the minimum number of different documents a term must appear in to be included in the vocabulary.
If this is an integer >= 1, this specifies the number of documents the term must appear in;
if this is a double in [0,1), then this specifies the fraction of documents.
Returns:
Type: module:eclairjs/ml/param.DoubleParam
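The two readings of this parameter (absolute document count versus fraction of documents) can be sketched in plain JavaScript. Both helpers below are hypothetical and not part of EclairJS, and the inclusive comparison (`>=`) is an assumption made for illustration.

```javascript
// Hypothetical helper: turns a minDF setting into a minimum document count.
function resolveMinDocCount(minDF, numDocs) {
  // Values >= 1 are read as an absolute number of documents;
  // values in [0, 1) are read as a fraction of all documents.
  return minDF >= 1 ? minDF : minDF * numDocs;
}

// Hypothetical helper: keeps terms whose document frequency meets the threshold.
// Assumption: the comparison is inclusive.
function filterByMinDF(docFreq, minDF, numDocs) {
  var threshold = resolveMinDocCount(minDF, numDocs);
  return Object.keys(docFreq).filter(function (term) {
    return docFreq[term] >= threshold;
  });
}

// Document frequencies over a corpus of 4 documents.
var docFreq = { spark: 4, eclair: 2, rare: 1 };
console.log(filterByMinDF(docFreq, 2, 4));    // [ 'spark', 'eclair' ]
console.log(filterByMinDF(docFreq, 0.75, 4)); // [ 'spark' ]
```

With `minDF = 0.75` and 4 documents the threshold becomes 3 documents, so only `spark` survives.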
Parameters:
Name | Type | Description
value | string |
Returns:
Type: module:eclairjs/ml/feature.CountVectorizerModel
Parameters:
Name | Type | Description
value | float |
Returns:
Type: module:eclairjs/ml/feature.CountVectorizerModel
Parameters:
Name | Type | Description
value | string |
Returns:
Type: module:eclairjs/ml/feature.CountVectorizerModel
Parameters:
Name | Type | Description
dataset | module:eclairjs/sql.DataFrame |
Returns:
Type: module:eclairjs/sql.DataFrame
Parameters:
Returns:
Type: module:eclairjs/sql/types.StructType
uid() → {string}
An immutable unique ID for the object and its derivatives.
Returns:
Type: string
Validates and transforms the input schema.
Parameters:
Returns:
Type: module:eclairjs/sql/types.StructType
Maximum size of the vocabulary. CountVectorizer builds a vocabulary that contains only the top vocabSize terms, ordered by term frequency across the corpus.
Default: 2^18 (262,144)
Returns:
Type: module:eclairjs/ml/param.IntParam
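The "top vocabSize terms ordered by term frequency" rule can be sketched in plain JavaScript as follows. `buildVocabulary` is a hypothetical helper, not an EclairJS API, and the alphabetical tie-break is an assumption added only to make the result deterministic.

```javascript
// Hypothetical helper: keeps the vocabSize most frequent terms in a corpus.
function buildVocabulary(documents, vocabSize) {
  // Total occurrences of each term across all documents.
  var totals = new Map();
  documents.forEach(function (tokens) {
    tokens.forEach(function (term) {
      totals.set(term, (totals.get(term) || 0) + 1);
    });
  });

  return Array.from(totals.entries())
    .sort(function (a, b) {
      // Higher frequency first; ties broken alphabetically (assumption).
      if (b[1] !== a[1]) { return b[1] - a[1]; }
      return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
    })
    .slice(0, vocabSize)
    .map(function (entry) { return entry[0]; });
}

var docs = [['a', 'b', 'a'], ['b', 'c', 'a'], ['d']];
console.log(buildVocabulary(docs, 2)); // [ 'a', 'b' ]
```

Here `a` occurs three times and `b` twice, so a vocabulary of size 2 drops `c` and `d`.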
vocabulary() → {Array.<string>}
Returns:
Type: Array.<string>
Returns:
Type: module:eclairjs/ml/util.MLWriter