Class: DecisionTree

eclairjs/mllib/tree.DecisionTree

new DecisionTree(strategy)

Parameters:
Name Type Description
strategy module:eclairjs/mllib/tree/configuration.Strategy

Methods

(static) train0(input, strategy) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Trains a decision tree model. Supports binary and multiclass classification as well as regression. Note: using trainClassifier or trainRegressor (see org.apache.spark.mllib.tree.DecisionTree) is recommended, because they keep classification and regression clearly separated.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
strategy module:eclairjs/mllib/tree/configuration.Strategy The configuration parameters for the tree algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel
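
The label requirement for classification (values in {0, 1, ..., numClasses-1}) can be checked before training. The following is a minimal sketch in plain JavaScript; findInvalidLabels is a hypothetical helper written for illustration and is not part of EclairJS. It works on an ordinary array of label values rather than on an RDD.

```javascript
// Hypothetical helper, not part of EclairJS: returns the labels that are not
// integers in {0, 1, ..., numClasses - 1}, the range classification expects.
function findInvalidLabels(labels, numClasses) {
  return labels.filter(function (label) {
    return !Number.isInteger(label) || label < 0 || label >= numClasses;
  });
}

var labels = [0, 1, 2, 1, 0];
var invalidFor3 = findInvalidLabels(labels, 3); // no invalid labels
var invalidFor2 = findInvalidLabels(labels, 2); // label 2 is out of range
console.log(invalidFor3, invalidFor2);
```

For regression no such check applies, since labels may be any real numbers.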

(static) train1(input, algo, impurity, maxDepth) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Trains a decision tree model. Supports binary and multiclass classification as well as regression. Note: using trainClassifier or trainRegressor (see org.apache.spark.mllib.tree.DecisionTree) is recommended, because they keep classification and regression clearly separated.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
algo Algo Algorithm type: classification or regression
impurity Impurity Criterion used for information gain calculation
maxDepth number Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel
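
The maxDepth description (depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes) generalizes to an upper bound for any depth, since the tree is binary. The sketch below is plain JavaScript; maxNodeCounts is a hypothetical helper, not an EclairJS function, and a trained tree may contain fewer nodes if splitting stops early.

```javascript
// Hypothetical helper: upper bounds on node counts for a binary tree
// whose depth is limited to maxDepth.
function maxNodeCounts(maxDepth) {
  var leaves = Math.pow(2, maxDepth); // a full tree doubles its leaves per level
  var internal = leaves - 1;          // a full binary tree has leaves - 1 internal nodes
  return { internal: internal, leaves: leaves, total: internal + leaves };
}

console.log(maxNodeCounts(0)); // matches "depth 0 means 1 leaf node"
console.log(maxNodeCounts(1)); // matches "1 internal node + 2 leaf nodes"
console.log(maxNodeCounts(5));
```

Because the bound grows exponentially with depth, small increases in maxDepth can noticeably increase model size.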

(static) train2(input, algo, impurity, maxDepth, numClasses) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Trains a decision tree model. Supports binary and multiclass classification as well as regression. Note: using trainClassifier or trainRegressor (see org.apache.spark.mllib.tree.DecisionTree) is recommended, because they keep classification and regression clearly separated.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
algo Algo Algorithm type: classification or regression
impurity Impurity Criterion used for information gain calculation
maxDepth number Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
numClasses number Number of classes for classification. Defaults to 2.
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel

(static) train3(input, algo, impurity, maxDepth, numClasses, maxBins, quantileCalculationStrategy, categoricalFeaturesInfo) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Trains a decision tree model. Supports binary and multiclass classification as well as regression. Note: using trainClassifier or trainRegressor (see org.apache.spark.mllib.tree.DecisionTree) is recommended, because they keep classification and regression clearly separated.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
algo Algo classification or regression
impurity Impurity criterion used for information gain calculation
maxDepth number Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
numClasses number Number of classes for classification. Defaults to 2.
maxBins number maximum number of bins used for splitting features
quantileCalculationStrategy QuantileStrategy algorithm for calculating quantiles
categoricalFeaturesInfo Map Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel
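
The categoricalFeaturesInfo convention (an entry n -> k means feature n takes values in {0, 1, ..., k-1}) can be illustrated with a small check on a single feature vector. This is a sketch under assumptions: the map is written as a plain JavaScript object for illustration, the feature vector is an ordinary array, and isValidFeatureVector is a hypothetical helper that does not exist in EclairJS.

```javascript
// Assumed representation for illustration: feature index -> number of categories.
// Feature 0 is binary; feature 4 has three categories. Other features are continuous.
var categoricalFeaturesInfo = { 0: 2, 4: 3 };

// Hypothetical helper: true if every categorical feature holds an integer
// category index in {0, 1, ..., k - 1}.
function isValidFeatureVector(features, info) {
  return Object.keys(info).every(function (index) {
    var value = features[Number(index)];
    var arity = info[index];
    return Number.isInteger(value) && value >= 0 && value < arity;
  });
}

var ok = isValidFeatureVector([1, 0.5, 7.2, -3.0, 2], categoricalFeaturesInfo);
var bad = isValidFeatureVector([2, 0.5, 7.2, -3.0, 0], categoricalFeaturesInfo);
console.log(ok, bad);
```

Features that do not appear in the map are treated as continuous, so their values are not constrained by this check.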

(static) trainClassifier(input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Method to train a decision tree model for binary or multiclass classification.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
numClasses number number of classes for classification.
categoricalFeaturesInfo object Key/value map storing the arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
impurity string Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
maxDepth number Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 5)
maxBins number maximum number of bins used for splitting features (suggested value: 32)
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel
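
To make the two supported impurity values concrete, the sketch below computes Gini impurity and entropy for the labels that reach a node, using the standard textbook definitions (Gini: 1 minus the sum of squared class proportions; entropy: the negative sum of p * log2(p)). It is plain JavaScript on ordinary arrays; the function names are hypothetical and this is not the EclairJS or Spark implementation.

```javascript
// Fraction of labels belonging to each class 0 .. numClasses - 1.
function classProportions(labels, numClasses) {
  var counts = new Array(numClasses).fill(0);
  labels.forEach(function (label) { counts[label] += 1; });
  return counts.map(function (count) { return count / labels.length; });
}

// Gini impurity: 1 - sum(p_i^2). Zero for a pure node.
function gini(labels, numClasses) {
  return classProportions(labels, numClasses).reduce(function (acc, p) {
    return acc - p * p;
  }, 1);
}

// Entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes.
function entropy(labels, numClasses) {
  return classProportions(labels, numClasses).reduce(function (acc, p) {
    return p > 0 ? acc - p * Math.log2(p) : acc;
  }, 0);
}

var mixed = [0, 0, 1, 1]; // evenly split between two classes
var pure = [1, 1, 1];     // a single class
console.log(gini(mixed, 2), entropy(mixed, 2));
console.log(gini(pure, 2), entropy(pure, 2));
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why either can serve as the criterion for choosing splits.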

(static) trainRegressorwithnumber(input, categoricalFeaturesInfo, impurity, maxDepth, maxBins) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Method to train a decision tree model for regression.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training dataset: RDD of LabeledPoint. Labels are real numbers.
categoricalFeaturesInfo Map Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
impurity string Criterion used for information gain calculation. Supported values: "variance".
maxDepth number Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 5)
maxBins number maximum number of bins used for splitting features (suggested value: 32)
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel
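
For regression the impurity of a node is the variance of the labels that reach it. The sketch below shows the standard population variance on an ordinary array; variance is a hypothetical helper written for illustration and is not the EclairJS or Spark implementation.

```javascript
// Population variance of the labels at a node: mean squared deviation
// from the mean. Zero when all labels are identical.
function variance(labels) {
  var mean = labels.reduce(function (acc, y) { return acc + y; }, 0) / labels.length;
  return labels.reduce(function (acc, y) {
    return acc + (y - mean) * (y - mean);
  }, 0) / labels.length;
}

var spread = variance([1, 2, 3, 4]);
var constant = variance([3, 3, 3]);
console.log(spread, constant);
```

A split is useful for regression when the child nodes have lower variance than the parent, meaning their labels cluster more tightly around their own means.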

(static) trainRegressorwithnumber(input, categoricalFeaturesInfo, impurity, maxDepth, maxBins) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Java-friendly API for org.apache.spark.mllib.tree.DecisionTree.trainRegressor.
Parameters:
Name Type Description
input JavaRDD
categoricalFeaturesInfo Map
impurity string
maxDepth number
maxBins number
Returns:
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel

run(input) → {module:eclairjs/mllib/tree/model.DecisionTreeModel}

Trains a decision tree model over an RDD, using the strategy passed to the constructor.
Parameters:
Name Type Description
input module:eclairjs/rdd.RDD Training data: RDD of LabeledPoint
Returns:
DecisionTreeModel that can be used for prediction
Type
module:eclairjs/mllib/tree/model.DecisionTreeModel