Constructor
new RandomForest(strategy, numTrees, featureSubsetStrategy, seed)
A class that implements a [Random Forest](http://en.wikipedia.org/wiki/Random_forest)
learning algorithm for classification and regression.
It supports both continuous and categorical features.
Parameters:
Name | Type | Description |
---|---|---|
strategy | module:eclairjs/mllib/tree/configuration.Strategy | Configuration parameters for the random forest algorithm, which specify the algorithm type (classification, regression, etc.), feature type (continuous, categorical), tree depth, quantile calculation strategy, etc. |
numTrees | Number | Number of trees in the forest. If 1, no bootstrapping is used. If > 1, bootstrapping is done. |
featureSubsetStrategy | String | Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is chosen based on numTrees: if numTrees == 1, it is set to "all"; if numTrees > 1 (forest), it is set to "sqrt" for classification and to "onethird" for regression. |
seed | Number | Random seed for bootstrapping and choosing feature subsets. |
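The "auto" rule for featureSubsetStrategy described above can be written out as a small helper. This is only a sketch of the documented rule, not the library's implementation: the function name `resolveFeatureSubsetStrategy` and the `isClassification` flag are hypothetical (in the real API the algorithm type comes from the Strategy object).

```javascript
// Sketch of the documented "auto" rule. The function name and the
// isClassification flag are hypothetical helpers for illustration only.
function resolveFeatureSubsetStrategy(featureSubsetStrategy, numTrees, isClassification) {
  var supported = ["auto", "all", "sqrt", "log2", "onethird"];
  if (supported.indexOf(featureSubsetStrategy) === -1) {
    throw new Error("Unsupported featureSubsetStrategy: " + featureSubsetStrategy);
  }
  if (!(numTrees >= 1)) {
    throw new Error("numTrees must be at least 1");
  }
  if (featureSubsetStrategy !== "auto") {
    return featureSubsetStrategy; // explicit choices pass through unchanged
  }
  if (numTrees === 1) {
    return "all"; // a single tree considers every feature
  }
  // A forest (numTrees > 1): "sqrt" for classification, "onethird" for regression.
  return isClassification ? "sqrt" : "onethird";
}
```

For example, `resolveFeatureSubsetStrategy("auto", 10, false)` yields "onethird", while any explicit value such as "log2" is returned as given.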
Methods
(static) trainClassifier(input, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
Trains a random forest model for binary or multiclass classification.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}. |
numClasses | Int | Number of classes for classification. |
categoricalFeaturesInfo | Object | Map storing the arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}. |
numTrees | Int | Number of trees in the random forest. |
featureSubsetStrategy | String | Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is chosen based on numTrees: if numTrees == 1, it is set to "all"; if numTrees > 1 (forest), it is set to "sqrt". |
impurity | String | Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy". |
maxDepth | Int | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4) |
maxBins | Int | Maximum number of bins used for splitting features. (suggested value: 100) |
seed | Int | Random seed for bootstrapping and choosing feature subsets. |
Returns:
a random forest model that can be used for prediction
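A call to trainClassifier with the nine parameters listed above might look like the following sketch. The wrapper name `trainExampleClassifier` is hypothetical, and the `RandomForest` object and the training RDD are passed in by the caller, so the sketch assumes nothing about how EclairJS modules are loaded or how the RDD is built. The concrete values (two classes, three trees, the seed) are illustrative choices, not recommendations from the library.

```javascript
// Hypothetical wrapper: RandomForest and trainingData are supplied by the
// caller. Argument order follows the trainClassifier signature above.
function trainExampleClassifier(RandomForest, trainingData) {
  var numClasses = 2;                 // labels must then be 0 or 1
  var categoricalFeaturesInfo = {};   // empty map: all features are continuous
  var numTrees = 3;                   // illustrative value
  var featureSubsetStrategy = "auto"; // "sqrt" is used because numTrees > 1
  var impurity = "gini";              // recommended criterion per the table
  var maxDepth = 4;                   // suggested value per the table
  var maxBins = 100;                  // suggested value per the table
  var seed = 12345;                   // arbitrary illustrative seed

  return RandomForest.trainClassifier(
    trainingData,
    numClasses,
    categoricalFeaturesInfo,
    numTrees,
    featureSubsetStrategy,
    impurity,
    maxDepth,
    maxBins,
    seed
  );
}
```

Fixing the seed makes bootstrapping and feature subset selection repeatable between runs, which is convenient when comparing parameter settings.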
(static) trainRegressor(input, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) → {module:eclairjs/mllib/tree/model.RandomForestModel}
Trains a random forest model for regression.
Parameters:
Name | Type | Description |
---|---|---|
input | module:eclairjs.RDD | Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers. |
categoricalFeaturesInfo | Object | Map storing the arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}. |
numTrees | Number | Number of trees in the random forest. |
featureSubsetStrategy | String | Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is chosen based on numTrees: if numTrees == 1, it is set to "all"; if numTrees > 1 (forest), it is set to "onethird". |
impurity | String | Criterion used for information gain calculation. Supported values: "variance". |
maxDepth | Int | Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4) |
maxBins | Int | Maximum number of bins used for splitting features. (suggested value: 100) |
seed | Number | Random seed for bootstrapping and choosing feature subsets. |
Returns:
a random forest model that can be used for prediction
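Both training methods take a categoricalFeaturesInfo map in which an entry (n -> k) means feature n takes values in {0, 1, ..., k-1}. The sketch below checks one feature vector against such a map before training. It is a plain JavaScript illustration: the function name `findCategoricalViolations` is hypothetical, and the feature vector is assumed to be an ordinary array of numbers rather than an EclairJS vector type.

```javascript
// Hypothetical pre-training check, not part of the EclairJS API.
// features: plain array of numbers; categoricalFeaturesInfo: { n: k, ... }.
function findCategoricalViolations(features, categoricalFeaturesInfo) {
  var problems = [];
  Object.keys(categoricalFeaturesInfo).forEach(function (key) {
    var n = Number(key);                 // feature index
    var k = categoricalFeaturesInfo[key]; // number of categories
    var value = features[n];
    // A categorical value must be an integer category index in [0, k-1].
    var valid = Number.isInteger(value) && value >= 0 && value < k;
    if (!valid) {
      problems.push(
        "feature " + n + " has value " + value +
        ", expected an integer in [0, " + (k - 1) + "]"
      );
    }
  });
  return problems;
}
```

With the map `{0: 3, 2: 4}`, feature 0 must be 0, 1, or 2 and feature 2 must be 0 through 3, while feature 1 is treated as continuous and is not checked.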