new DecisionTree(strategy)
Parameters:
Name | Type | Description |
---|---|---|
strategy | module:eclairjs/mllib/tree/configuration.Strategy | Configuration parameters for the tree algorithm. |
Methods
(static) train0(input, strategy) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model.
The method supports binary and multiclass classification and regression.
Note: Using DecisionTree.trainClassifier and DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers. |
strategy | module:eclairjs/mllib/tree/configuration.Strategy | The configuration parameters for the tree algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc. |
Returns:
DecisionTreeModel that can be used for prediction
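The label requirements for the input RDD can be checked before training. The following is a minimal sketch, not part of EclairJS. It assumes the labels have already been extracted from the LabeledPoint records into a plain array. The helper names `validateClassificationLabels` and `validateRegressionLabels` are hypothetical.

```javascript
// Hypothetical helper (not an EclairJS API): every classification label must be
// an integer class index in {0, 1, ..., numClasses - 1}.
function validateClassificationLabels(labels, numClasses) {
  for (const label of labels) {
    if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: regression labels only need to be finite real numbers.
function validateRegressionLabels(labels) {
  return labels.every((label) => Number.isFinite(label));
}

console.log(validateClassificationLabels([0, 1, 2, 1], 3)); // true
console.log(validateClassificationLabels([0, 1, 3], 3));    // false: 3 >= numClasses
console.log(validateRegressionLabels([1.5, -0.25, 42]));    // true
```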
(static) train1(input, algo, impurity, maxDepth) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model.
The method supports binary and multiclass classification and regression.
Note: Using DecisionTree.trainClassifier and DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers. |
algo | Algo | Algorithm type: classification or regression. |
impurity | Impurity | Impurity criterion used for information gain calculation. |
maxDepth | number | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
Returns:
DecisionTreeModel that can be used for prediction
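The maxDepth examples above generalize to a simple bound on tree size. The sketch below assumes binary splits, where every internal node has exactly two children. It gives the size of a full tree of the given depth. That size is an upper bound, since training may stop splitting earlier. The function name `maxTreeSize` is hypothetical.

```javascript
// Upper bounds on tree size for a given maxDepth, assuming binary splits.
function maxTreeSize(maxDepth) {
  const leaves = 2 ** maxDepth;     // a full binary tree of depth d has 2^d leaves
  const internalNodes = leaves - 1; // and 2^d - 1 internal nodes
  return { leaves, internalNodes, totalNodes: leaves + internalNodes };
}

console.log(maxTreeSize(0)); // { leaves: 1, internalNodes: 0, totalNodes: 1 }
console.log(maxTreeSize(1)); // { leaves: 2, internalNodes: 1, totalNodes: 3 }
console.log(maxTreeSize(5)); // { leaves: 32, internalNodes: 31, totalNodes: 63 }
```

The first two results match the depth-0 and depth-1 cases given in the maxDepth description.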
(static) train2(input, algo, impurity, maxDepth, numClasses) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model.
The method supports binary and multiclass classification and regression.
Note: Using DecisionTree.trainClassifier and DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers. |
algo | Algo | Algorithm type: classification or regression. |
impurity | Impurity | Impurity criterion used for information gain calculation. |
maxDepth | number | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
numClasses | number | Number of classes for classification. Default: 2. |
Returns:
DecisionTreeModel that can be used for prediction
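The impurity and numClasses parameters combine when a candidate split is scored. The following is a simplified sketch, not the library's implementation. It counts labels into numClasses buckets (defaulting to 2, as documented) and uses Gini impurity. Information gain is then the parent impurity minus the size-weighted impurity of the two children. The names `classCounts`, `gini`, and `informationGain` are hypothetical.

```javascript
// Count labels per class 0..numClasses-1 (default 2, mirroring the documented default).
function classCounts(labels, numClasses = 2) {
  const counts = new Array(numClasses).fill(0);
  for (const label of labels) counts[label] += 1;
  return counts;
}

// Gini impurity: 1 - sum over classes of p_i^2.
function gini(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  return 1 - counts.reduce((acc, c) => acc + (c / total) ** 2, 0);
}

// Information gain of splitting a node's labels into `left` and `right`.
function informationGain(left, right, numClasses = 2) {
  const parent = left.concat(right);
  const n = parent.length;
  const weighted =
    (left.length / n) * gini(classCounts(left, numClasses)) +
    (right.length / n) * gini(classCounts(right, numClasses));
  return gini(classCounts(parent, numClasses)) - weighted;
}

console.log(informationGain([0, 0], [1, 1])); // 0.5 (perfect split)
console.log(informationGain([0, 1], [0, 1])); // 0 (no improvement)
```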
(static) train3(input, algo, impurity, maxDepth, numClasses, maxBins, quantileCalculationStrategy, categoricalFeaturesInfo) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model.
The method supports binary and multiclass classification and regression.
Note: Using DecisionTree.trainClassifier and DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers. |
algo | Algo | Algorithm type: classification or regression. |
impurity | Impurity | Impurity criterion used for information gain calculation. |
maxDepth | number | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. |
numClasses | number | Number of classes for classification. Default: 2. |
maxBins | number | Maximum number of bins used for splitting features. |
quantileCalculationStrategy | QuantileStrategy | Algorithm for calculating quantiles. |
categoricalFeaturesInfo | Map | Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}. |
Returns:
DecisionTreeModel that can be used for prediction
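The (n -> k) convention for categoricalFeaturesInfo can be illustrated with a plain JavaScript Map. The sketch assumes feature indices not present in the map are treated as continuous. It does not show how EclairJS converts a JavaScript Map for the underlying Spark call. The helper `checkCategoricalValues` is hypothetical. It checks that each categorical value in a feature vector is a valid category index.

```javascript
// Feature 0 is categorical with 3 categories {0, 1, 2};
// feature 4 is categorical with 2 categories {0, 1}.
// Feature indices not in the map are assumed to be continuous.
const categoricalFeaturesInfo = new Map([
  [0, 3],
  [4, 2],
]);

// Hypothetical helper: each categorical value must be an integer in {0, ..., k - 1}.
function checkCategoricalValues(features, info) {
  for (const [index, arity] of info) {
    const value = features[index];
    if (!Number.isInteger(value) || value < 0 || value >= arity) {
      return false;
    }
  }
  return true;
}

console.log(checkCategoricalValues([2, 0.7, 1.3, 5.0, 1], categoricalFeaturesInfo)); // true
console.log(checkCategoricalValues([3, 0.7, 1.3, 5.0, 1], categoricalFeaturesInfo)); // false: 3 >= arity 3
```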
(static) trainClassifier(input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model for binary or multiclass classification.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}. |
numClasses | number | Number of classes for classification. |
categoricalFeaturesInfo | object | Object used as a key-value map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}. |
impurity | string | Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy". |
maxDepth | number | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 5) |
maxBins | number | Maximum number of bins used for splitting features. (suggested value: 32) |
Returns:
DecisionTreeModel that can be used for prediction
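The two supported classification impurity values, "gini" and "entropy", can be compared on per-class label counts. This is a minimal sketch. It uses base-2 logarithms for entropy and treats 0 * log(0) as 0. The lookup object `impurities` is hypothetical. It only mirrors the string names accepted by the impurity parameter.

```javascript
// Gini impurity: 1 - sum_i p_i^2
function gini(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  return 1 - counts.reduce((acc, c) => acc + (c / total) ** 2, 0);
}

// Entropy: -sum_i p_i * log2(p_i), skipping empty classes.
function entropy(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  let h = 0;
  for (const c of counts) {
    if (c === 0) continue;
    const p = c / total;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical lookup keyed by the impurity strings "gini" and "entropy".
const impurities = { gini, entropy };

console.log(impurities.gini([5, 5]), impurities.entropy([5, 5]));   // 0.5 1
console.log(impurities.gini([10, 0]), impurities.entropy([10, 0])); // 0 0
```

Both measures are 0 for a pure node and largest for an even class mix. They differ in scale, so their values are not directly comparable across criteria.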
(static) trainRegressor(input, categoricalFeaturesInfo, impurity, maxDepth, maxBins) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model for regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of LabeledPoint. Labels are real numbers. |
categoricalFeaturesInfo | object | Object used as a key-value map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}. |
impurity | string | Criterion used for information gain calculation. Supported values: "variance". |
maxDepth | number | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 5) |
maxBins | number | Maximum number of bins used for splitting features. (suggested value: 32) |
Returns:
DecisionTreeModel that can be used for prediction
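For regression, the only supported impurity is "variance". A split is scored by how much it reduces variance. This is a simplified sketch, not the library's code. It assumes population variance (dividing by n) and weights each child by its share of the samples. The names `variance` and `varianceReduction` are hypothetical.

```javascript
// Variance impurity of real-valued labels (population variance).
function variance(labels) {
  const n = labels.length;
  if (n === 0) return 0;
  const mean = labels.reduce((a, b) => a + b, 0) / n;
  return labels.reduce((acc, y) => acc + (y - mean) ** 2, 0) / n;
}

// Reduction in variance achieved by splitting labels into `left` and `right`.
function varianceReduction(left, right) {
  const n = left.length + right.length;
  const weighted =
    (left.length / n) * variance(left) + (right.length / n) * variance(right);
  return variance(left.concat(right)) - weighted;
}

console.log(variance([1, 3]));                  // 1
console.log(varianceReduction([1, 1], [3, 3])); // 1 (each child is constant)
```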
run(input) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}
Method to train a decision tree model over an RDD, using the strategy passed to the constructor.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training data: RDD of LabeledPoint. |
Returns:
DecisionTreeModel that can be used for prediction