Class: CountVectorizer

eclairjs/ml/feature.CountVectorizer

:: Experimental :: Extracts a vocabulary from a collection of documents and generates a CountVectorizerModel, which converts each document into a vector of term counts over that vocabulary.
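The two steps in the description (extract a vocabulary, then count terms against it) can be sketched in plain JavaScript. This is a conceptual sketch, not the EclairJS API. `buildVocabulary` and `countVector` are hypothetical helpers, each document is assumed to be an array of string tokens, and dense arrays stand in for vectors. Breaking ties between equally frequent terms alphabetically is an assumption of this sketch.

```javascript
// Conceptual sketch only: plain JavaScript, not the EclairJS API.
// A "document" is an array of string tokens; a corpus is an array of documents.

// Fit step (hypothetical helper): distinct terms ordered by total count across
// the corpus, most frequent first. The alphabetical tie-break is an assumption.
function buildVocabulary(corpus) {
  const totals = new Map();
  for (const doc of corpus) {
    for (const term of doc) {
      totals.set(term, (totals.get(term) || 0) + 1);
    }
  }
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .map(([term]) => term);
}

// Transform step (hypothetical helper): entry i is the number of times
// vocabulary[i] occurs in the document; out-of-vocabulary terms are ignored.
function countVector(vocabulary, doc) {
  const index = new Map(vocabulary.map((term, i) => [term, i]));
  const counts = new Array(vocabulary.length).fill(0);
  for (const term of doc) {
    if (index.has(term)) counts[index.get(term)] += 1;
  }
  return counts;
}

const corpus = [["a", "b", "c"], ["a", "b", "b", "c", "a"]];
const vocabulary = buildVocabulary(corpus);
console.log(JSON.stringify(vocabulary)); // ["a","b","c"]
console.log(JSON.stringify(corpus.map((doc) => countVector(vocabulary, doc)))); // [[1,1,1],[2,2,1]]
```

In the library itself, fitting also applies the minDF and vocabSize limits, and the model emits sparse vectors rather than dense arrays.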

Constructor

new CountVectorizer([uid])

Parameters:
Name  Type    Attributes  Description
uid   string  <optional>  Unique ID for this instance; if omitted, one is generated.

Methods

(static) load(path) → {module:eclairjs/ml/feature.CountVectorizer}

Loads a CountVectorizer that was previously saved to the given path.
Parameters:
Name  Type    Description
path  string  Path to the saved CountVectorizer.
Returns: module:eclairjs/ml/feature.CountVectorizer

copy(extra) → {module:eclairjs/ml/feature.CountVectorizer}

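Creates a copy of this instance with the same UID, applying the parameter values in extra on top of the current ones.
Parameters:
Name   Type                               Description
extra  module:eclairjs/ml/param.ParamMap  Extra parameter values to set on the copy.
Returns: module:eclairjs/ml/feature.CountVectorizer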

fit(dataset) → {module:eclairjs/ml/feature.CountVectorizerModel}

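Fits a CountVectorizerModel to the dataset by extracting a vocabulary from the input column.
Parameters:
Name     Type                         Description
dataset  module:eclairjs/sql.Dataset  Input data; the input column must hold tokenized documents (arrays of strings).
Returns: module:eclairjs/ml/feature.CountVectorizerModel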

getMinDF() → {float}

Returns: float

getMinTF() → {float}

Returns: float

getVocabSize() → {integer}

Returns: integer

minDF() → {module:eclairjs/ml/param.DoubleParam}

Specifies the minimum number of distinct documents a term must appear in to be included in the vocabulary. If this is an integer >= 1, it specifies the number of documents the term must appear in; if it is a double in [0, 1), it specifies the fraction of documents. Default: 1.0.
Returns: module:eclairjs/ml/param.DoubleParam
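The two interpretations of minDF can be made concrete with a short sketch in plain JavaScript (not the EclairJS API). It resolves a minDF value into an absolute document-count threshold and then keeps only the terms that meet it. `minDocCount` and `termsPassingMinDF` are hypothetical names; each term is counted at most once per document, matching the "distinct documents" wording, and the alphabetical order of the result is only for readability.

```javascript
// Sketch only: plain JavaScript, not the EclairJS API.

// Resolve minDF into an absolute threshold on document frequency.
// minDF >= 1: a number of documents. 0 <= minDF < 1: a fraction of the corpus.
function minDocCount(minDF, numDocs) {
  if (!(minDF >= 0)) throw new RangeError("minDF must be >= 0");
  return minDF >= 1 ? minDF : minDF * numDocs;
}

// Hypothetical helper: terms whose document frequency meets the threshold,
// sorted alphabetically for readability.
function termsPassingMinDF(corpus, minDF) {
  const docFreq = new Map();
  for (const doc of corpus) {
    // A Set counts each term at most once per document.
    for (const term of new Set(doc)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const threshold = minDocCount(minDF, corpus.length);
  return [...docFreq.keys()]
    .filter((term) => docFreq.get(term) >= threshold)
    .sort();
}

const corpus = [["a", "b"], ["a", "c"], ["a", "b", "b"], ["d"]];
console.log(JSON.stringify(termsPassingMinDF(corpus, 2)));    // ["a","b"]
console.log(JSON.stringify(termsPassingMinDF(corpus, 0.75))); // ["a"]  (0.75 * 4 documents = 3)
console.log(JSON.stringify(termsPassingMinDF(corpus, 1)));    // ["a","b","c","d"]
```

Note that "b" occurs three times in the corpus but in only two documents, so a threshold of 3 documents excludes it.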

setBinary(value) → {module:eclairjs/ml/feature.CountVectorizer}

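Sets the binary toggle. If true, every nonzero count (after the minTF filter is applied) is set to 1, which is useful for discrete probabilistic models of binary events rather than integer counts. Default: false.
Parameters:
Name   Type     Description
value  boolean  Whether to output 0/1 values instead of counts.
Returns: module:eclairjs/ml/feature.CountVectorizer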
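A minimal sketch of the binary toggle's effect on one document's counts, written in plain JavaScript rather than against the EclairJS API. It assumes the semantics of the Spark class this wraps: with binary set to true, every nonzero count (after any minTF filtering) becomes 1, and the default is false. `applyBinary` is a hypothetical helper, and a dense array stands in for the output vector.

```javascript
// Sketch only: plain JavaScript, not the EclairJS API.
// counts: term counts for one document, assumed to be already filtered by minTF.
function applyBinary(counts, binary = false) {
  // binary = true: record presence (1) or absence (0) instead of raw counts.
  return binary ? counts.map((c) => (c > 0 ? 1 : 0)) : counts.slice();
}

const counts = [2, 0, 3, 1];
console.log(JSON.stringify(applyBinary(counts)));       // [2,0,3,1]
console.log(JSON.stringify(applyBinary(counts, true))); // [1,0,1,1]
```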

setInputCol(value) → {module:eclairjs/ml/feature.CountVectorizer}

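Sets the name of the input column, which must hold the tokenized documents as arrays of strings.
Parameters:
Name   Type    Description
value  string  Name of the input column.
Returns: module:eclairjs/ml/feature.CountVectorizer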

setMinDF(value) → {module:eclairjs/ml/feature.CountVectorizer}

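Sets minDF, the minimum number (if >= 1) or fraction (if in [0, 1)) of documents a term must appear in to enter the vocabulary.
Parameters:
Name   Type   Description
value  float  The new minDF value.
Returns: module:eclairjs/ml/feature.CountVectorizer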

setMinTF(value) → {module:eclairjs/ml/feature.CountVectorizer}

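Sets minTF, a filter that ignores rare terms within each document: terms whose count in a document is below the threshold are dropped for that document. If the value is >= 1, it is a count; if it is in [0, 1), it is a fraction of the document's token count. This parameter is used only when a CountVectorizerModel transforms data and does not affect fitting. Default: 1.0.
Parameters:
Name   Type   Description
value  float  The new minTF value.
Returns: module:eclairjs/ml/feature.CountVectorizer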
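The minTF threshold applies per document when a fitted CountVectorizerModel transforms data. The sketch below is plain JavaScript, not the EclairJS API, and assumes the semantics of the Spark class this wraps: a value >= 1 is a minimum count, a value in [0, 1) is a fraction of the document's total token count (out-of-vocabulary tokens included), and counts below the threshold are dropped. `countsAfterMinTF` is a hypothetical helper that returns a dense array.

```javascript
// Sketch only: plain JavaScript, not the EclairJS API.
function countsAfterMinTF(vocabulary, doc, minTF = 1) {
  const index = new Map(vocabulary.map((term, i) => [term, i]));
  const counts = new Array(vocabulary.length).fill(0);
  for (const term of doc) {
    if (index.has(term)) counts[index.get(term)] += 1;
  }
  // minTF >= 1: absolute count. 0 <= minTF < 1: fraction of ALL tokens in the
  // document, including tokens that are not in the vocabulary.
  const threshold = minTF >= 1 ? minTF : minTF * doc.length;
  return counts.map((c) => (c >= threshold ? c : 0));
}

const vocabulary = ["a", "b", "c"];
const doc = ["a", "a", "a", "b", "c", "c", "z"]; // 7 tokens; "z" is out of vocabulary
console.log(JSON.stringify(countsAfterMinTF(vocabulary, doc, 1)));   // [3,1,2]
console.log(JSON.stringify(countsAfterMinTF(vocabulary, doc, 2)));   // [3,0,2]
console.log(JSON.stringify(countsAfterMinTF(vocabulary, doc, 0.4))); // [3,0,0]  (0.4 * 7 tokens = 2.8)
```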

setOutputCol(value) → {module:eclairjs/ml/feature.CountVectorizer}

Sets the name of the output column that will hold the term-count vectors.
Parameters:
Name   Type    Description
value  string  Name of the output column.
Returns: module:eclairjs/ml/feature.CountVectorizer

setVocabSize(value) → {module:eclairjs/ml/feature.CountVectorizer}

Sets vocabSize, the maximum number of terms kept in the vocabulary.
Parameters:
Name   Type     Description
value  integer  The new maximum vocabulary size.
Returns: module:eclairjs/ml/feature.CountVectorizer

transformSchema(schema) → {module:eclairjs/sql/types.StructType}

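Checks the input schema and returns the output schema, which adds the output column (see validateAndTransformSchema).
Parameters:
Name    Type                                  Description
schema  module:eclairjs/sql/types.StructType  Schema of the input dataset.
Returns: module:eclairjs/sql/types.StructType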

uid() → {string}

An immutable unique ID for the object and its derivatives.
Returns: string

validateAndTransformSchema(schema) → {module:eclairjs/sql/types.StructType}

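Validates the input schema and returns the transformed schema. The input column must be of type array<string>, and the output column, which must not already exist, is appended with a vector type.
Parameters:
Name    Type                                  Description
schema  module:eclairjs/sql/types.StructType  Schema of the input dataset.
Returns: module:eclairjs/sql/types.StructType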
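Schema validation of this kind can be pictured with a plain JavaScript sketch. It does not use the EclairJS StructType API: a schema is assumed to be an array of `{ name, type }` objects with type names written as strings, and `validateAndTransformSchemaSketch` is a hypothetical name. The rules it enforces (the input column holds arrays of strings; the output column must not exist yet and is appended as a vector column) follow the Spark implementation this class wraps.

```javascript
// Sketch only: a schema is modeled as an array of { name, type } objects with
// string type names. This is not the EclairJS StructType API.
function validateAndTransformSchemaSketch(fields, inputCol, outputCol) {
  const input = fields.find((f) => f.name === inputCol);
  if (!input) {
    throw new Error(`Input column "${inputCol}" does not exist`);
  }
  // The input column must hold tokenized documents.
  if (input.type !== "array<string>") {
    throw new TypeError(`Column "${inputCol}" must be array<string>, found ${input.type}`);
  }
  // The output column is appended, so it must not exist yet.
  if (fields.some((f) => f.name === outputCol)) {
    throw new Error(`Output column "${outputCol}" already exists`);
  }
  return [...fields, { name: outputCol, type: "vector" }];
}

const schema = [
  { name: "id", type: "int" },
  { name: "words", type: "array<string>" },
];
console.log(JSON.stringify(validateAndTransformSchemaSketch(schema, "words", "features")));
// [{"name":"id","type":"int"},{"name":"words","type":"array<string>"},{"name":"features","type":"vector"}]
```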

vocabSize() → {module:eclairjs/ml/param.IntParam}

Maximum size of the vocabulary. CountVectorizer builds a vocabulary that keeps only the top vocabSize terms, ordered by term frequency across the corpus. Default: 2^18 (262144).
Returns: module:eclairjs/ml/param.IntParam
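The vocabSize cut-off can be sketched in plain JavaScript (not the EclairJS API): count every term across the corpus, order terms by that total, and keep the first vocabSize of them. `topTerms` is a hypothetical helper; it leaves out the minDF filter, and the alphabetical tie-break among equally frequent terms is an assumption of this sketch, since the description does not specify one.

```javascript
// Sketch only: plain JavaScript, not the EclairJS API.
const DEFAULT_VOCAB_SIZE = 2 ** 18; // 262144, the documented default

// Hypothetical helper: the vocabSize most frequent terms across the corpus.
function topTerms(corpus, vocabSize = DEFAULT_VOCAB_SIZE) {
  const totals = new Map();
  for (const doc of corpus) {
    for (const term of doc) {
      totals.set(term, (totals.get(term) || 0) + 1);
    }
  }
  // Descending by corpus-wide count; alphabetical tie-break (assumption).
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .slice(0, vocabSize)
    .map(([term]) => term);
}

const corpus = [["a", "b", "c"], ["a", "b", "a"], ["d"]];
console.log(JSON.stringify(topTerms(corpus, 2))); // ["a","b"]
console.log(JSON.stringify(topTerms(corpus, 3))); // ["a","b","c"]
console.log(JSON.stringify(topTerms(corpus)));    // ["a","b","c","d"]
```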