Class: CountVectorizerModel

eclairjs/ml/feature.CountVectorizerModel

:: Experimental :: Converts a text document to a sparse vector of token counts.

Constructor

new CountVectorizerModel(vocabulary, [uid])

Parameters:
Name Type Attributes Description
vocabulary Array.<string> An array of terms. Only terms in the vocabulary are counted.
uid string <optional> Unique ID for this model.
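To illustrate what the model does with its vocabulary, here is a minimal plain-JavaScript sketch that runs outside Spark. It assumes each document arrives already tokenized as an array of strings and that a term's index is its position in the vocabulary array. The helper `toTokenCounts` and its `{size, indices, values}` return shape are hypothetical and are not part of the EclairJS API; the real model writes a sparse vector column into a DataFrame, and this sketch ignores the minTF filter.

```javascript
// Hypothetical helper, not part of the EclairJS API: a plain-JavaScript sketch of
// what a CountVectorizerModel does to one tokenized document.
function toTokenCounts(vocabulary, tokens) {
  // Map each vocabulary term to its index (assumed: its position in the array).
  const index = new Map();
  vocabulary.forEach((term, i) => index.set(term, i));

  // Count occurrences of in-vocabulary terms; out-of-vocabulary tokens are skipped.
  const counts = new Map();
  for (const token of tokens) {
    const i = index.get(token);
    if (i !== undefined) {
      counts.set(i, (counts.get(i) || 0) + 1);
    }
  }

  // Sparse vector: total size plus parallel arrays of sorted indices and their counts.
  const indices = [...counts.keys()].sort((a, b) => a - b);
  return {
    size: vocabulary.length,
    indices,
    values: indices.map((i) => counts.get(i)),
  };
}

const vec = toTokenCounts(["a", "b", "c"], ["a", "c", "c", "d"]);
console.log(JSON.stringify(vec)); // {"size":3,"indices":[0,2],"values":[1,2]}
```

The token "d" is not in the vocabulary, so it contributes nothing; "b" is in the vocabulary but absent from the document, so it has no entry in the sparse vector.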

Methods

(static) load(path) → {module:eclairjs/ml/feature.CountVectorizerModel}

Parameters:
Name Type Description
path string Path of a previously saved model to load.
Returns:
Type
module:eclairjs/ml/feature.CountVectorizerModel

(static) read() → {module:eclairjs/ml/util.MLReader}

Returns:
Type
module:eclairjs/ml/util.MLReader

copy(extra) → {module:eclairjs/ml/feature.CountVectorizerModel}

Parameters:
Name Type Description
extra module:eclairjs/ml/param.ParamMap Extra parameters to apply to the copy.
Returns:
Type
module:eclairjs/ml/feature.CountVectorizerModel

getMinDF() → {float}

Returns:
Type
float

getMinTF() → {float}

Returns:
Type
float

getVocabSize() → {integer}

Returns:
Type
integer

minDF() → {module:eclairjs/ml/param.DoubleParam}

Specifies the minimum number of distinct documents a term must appear in to be included in the vocabulary. If the value is an integer >= 1, it is an absolute number of documents; if it is a double in [0, 1), it is a fraction of the total number of documents.
Returns:
Type
module:eclairjs/ml/param.DoubleParam
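The minDF rule can be sketched in plain JavaScript as follows. The helpers `minDocFreqThreshold` and `termsPassingMinDF` are hypothetical, not EclairJS functions; documents are assumed to be pre-tokenized arrays of strings, and a term is counted at most once per document.

```javascript
// Hypothetical helper (not an EclairJS function): turn a minDF setting into an
// absolute document-count threshold.
function minDocFreqThreshold(minDF, numDocs) {
  // minDF >= 1 is an absolute number of documents;
  // minDF in [0, 1) is a fraction of the total number of documents.
  return minDF >= 1 ? minDF : minDF * numDocs;
}

// Hypothetical helper: keep only the terms whose document frequency meets the threshold.
// docs is an array of tokenized documents (arrays of strings).
function termsPassingMinDF(docs, minDF) {
  const docFreq = new Map();
  for (const doc of docs) {
    // A Set counts each term at most once per document.
    for (const term of new Set(doc)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const threshold = minDocFreqThreshold(minDF, docs.length);
  return [...docFreq.keys()].filter((term) => docFreq.get(term) >= threshold).sort();
}

const docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]];
console.log(JSON.stringify(termsPassingMinDF(docs, 2)));   // ["a","b"]
console.log(JSON.stringify(termsPassingMinDF(docs, 0.9))); // ["a"]
```

With three documents, minDF = 0.9 resolves to a threshold of 2.7 documents, so only "a" (which appears in all three) survives.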

setInputCol(value) → {module:eclairjs/ml/feature.CountVectorizerModel}

Parameters:
Name Type Description
value string
Returns:
Type
module:eclairjs/ml/feature.CountVectorizerModel

setMinTF(value) → {module:eclairjs/ml/feature.CountVectorizerModel}

Parameters:
Name Type Description
value float
Returns:
Type
module:eclairjs/ml/feature.CountVectorizerModel

setOutputCol(value) → {module:eclairjs/ml/feature.CountVectorizerModel}

Parameters:
Name Type Description
value string
Returns:
Type
module:eclairjs/ml/feature.CountVectorizerModel

transform(dataset) → {module:eclairjs/sql.DataFrame}

Parameters:
Name Type Description
dataset module:eclairjs/sql.DataFrame
Returns:
Type
module:eclairjs/sql.DataFrame

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Returns:
Type
module:eclairjs/sql/types.StructType

uid() → {string}

An immutable unique ID for the object and its derivatives.
Returns:
Type
string

validateAndTransformSchema(schema) → {module:eclairjs/sql/types.StructType}

Validates and transforms the input schema.
Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Returns:
Type
module:eclairjs/sql/types.StructType

vocabSize() → {module:eclairjs/ml/param.IntParam}

Maximum size of the vocabulary. CountVectorizer builds a vocabulary that considers only the top vocabSize terms, ordered by term frequency across the corpus. Default: 2^18 (262,144).
Returns:
Type
module:eclairjs/ml/param.IntParam
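A minimal plain-JavaScript sketch of the vocabSize selection. `buildVocabulary` is a hypothetical helper, not an EclairJS function; it assumes pre-tokenized documents, ignores minDF for brevity, and breaks ties alphabetically, which is a choice of this sketch rather than documented behavior.

```javascript
// Hypothetical helper (not part of EclairJS): pick at most vocabSize terms,
// ordered by total term frequency across the corpus (most frequent first).
function buildVocabulary(docs, vocabSize) {
  // Total number of occurrences of each term over all documents.
  const termFreq = new Map();
  for (const doc of docs) {
    for (const term of doc) {
      termFreq.set(term, (termFreq.get(term) || 0) + 1);
    }
  }
  return [...termFreq.entries()]
    // Highest count first; ties broken alphabetically so the output is deterministic.
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, vocabSize)
    .map(([term]) => term);
}

const corpus = [["a", "b", "b"], ["b", "c"], ["c", "b", "d"]];
console.log(JSON.stringify(buildVocabulary(corpus, 2))); // ["b","c"]
```

Here "b" occurs four times and "c" twice across the corpus, so with vocabSize = 2 the rarer terms "a" and "d" are dropped.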

vocabulary() → {Array.<string>}

Returns:
Type
Array.<string>

write() → {module:eclairjs/ml/util.MLWriter}

Returns:
Type
module:eclairjs/ml/util.MLWriter