Class: RandomForest

eclairjs/mllib/tree.RandomForest

The settings for featureSubsetStrategy are based on the following references:
- log2: tested in Breiman (2001)
- sqrt: recommended by the Breiman manual for random forests
- The defaults of sqrt (classification) and onethird (regression) match the R randomForest package.

References:
- Breiman (2001): http://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
- Breiman manual for random forests: http://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf

Constructor

new RandomForest(strategy, numTrees, featureSubsetStrategy, seed)

A class that implements a Random Forest (http://en.wikipedia.org/wiki/Random_forest) learning algorithm for classification and regression. It supports both continuous and categorical features.

Parameters:

Name	Type	Description
`strategy`	module:eclairjs/mllib/tree/configuration.Strategy	The configuration parameters for the random forest algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
`numTrees`	Number	Number of trees in the random forest. If 1, no bootstrapping is used; if greater than 1, bootstrapping is done.
`featureSubsetStrategy`	string	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest), set to "sqrt" for classification and to "onethird" for regression.
`seed`	number	Random seed for bootstrapping and choosing feature subsets.
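
The "auto" rule and the named strategies in the table above can be sketched as a small standalone function. This is an illustration only, not EclairJS code: the helper name `featuresPerNode`, its `algo` argument, and the rounding (ceiling, with a minimum of one feature) are assumptions. The table itself specifies only which strategy "auto" resolves to.

```javascript
// Hypothetical helper, not part of EclairJS.
// Resolves "auto" as described in the parameter table, then converts the
// strategy name into a feature count. The rounding rules are assumptions.
function featuresPerNode(strategy, numTrees, algo, numFeatures) {
  var resolved = strategy;
  if (strategy === "auto") {
    if (numTrees === 1) {
      resolved = "all";                       // single tree: use every feature
    } else {
      resolved = (algo === "classification") ? "sqrt" : "onethird";
    }
  }

  var count;
  switch (resolved) {
    case "all":      count = numFeatures; break;
    case "sqrt":     count = Math.ceil(Math.sqrt(numFeatures)); break;
    case "log2":     count = Math.ceil(Math.log2(numFeatures)); break;
    case "onethird": count = Math.ceil(numFeatures / 3); break;
    default:
      throw new Error("Unsupported featureSubsetStrategy: " + strategy);
  }

  // Never consider fewer than one feature (assumed lower bound).
  return { strategy: resolved, count: Math.max(1, count) };
}

var singleTree = featuresPerNode("auto", 1, "classification", 16);
var forestClassifier = featuresPerNode("auto", 10, "classification", 16);
var forestRegressor = featuresPerNode("auto", 10, "regression", 16);
```

With 16 features, a single tree considers all of them, while a forest resolves to "sqrt" for classification and "onethird" for regression.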

Source:

mllib/tree/RandomForest.js, line 55

Methods

(static) trainClassifier(input, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) → {RandomForestModel}

Trains a random forest model for binary or multiclass classification.

Parameters:

Name	Type	Description
`input`	module:eclairjs/rdd.RDD	Training dataset: RDD of LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
`numClasses`	number	Number of classes for classification.
`categoricalFeaturesInfo`	Map	Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
`numTrees`	number	Number of trees in the random forest.
`featureSubsetStrategy`	string	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "sqrt".
`impurity`	string	Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
`maxDepth`	number	Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
`maxBins`	number	Maximum number of bins used for splitting features. (suggested value: 100)
`seed`	number	Random seed for bootstrapping and choosing feature subsets.
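
Because trainClassifier takes nine positional arguments, it can help to assemble them from a named options object first. The sketch below is not part of EclairJS: `classifierArgs` is a hypothetical helper, `trainingData` is a placeholder for an RDD of LabeledPoint, and passing `categoricalFeaturesInfo` as a plain object is an assumption. It checks only the string values that the table above lists as supported.

```javascript
// Hypothetical helper, not part of EclairJS.
// Validates the string options against the documented values and returns
// the arguments in the order of the trainClassifier signature.
function classifierArgs(input, opts) {
  var strategies = ["auto", "all", "sqrt", "log2", "onethird"];
  var impurities = ["gini", "entropy"];

  if (strategies.indexOf(opts.featureSubsetStrategy) < 0) {
    throw new Error("Unsupported featureSubsetStrategy: " + opts.featureSubsetStrategy);
  }
  if (impurities.indexOf(opts.impurity) < 0) {
    throw new Error("Unsupported impurity for classification: " + opts.impurity);
  }

  return [
    input,
    opts.numClasses,
    opts.categoricalFeaturesInfo,
    opts.numTrees,
    opts.featureSubsetStrategy,
    opts.impurity,
    opts.maxDepth,
    opts.maxBins,
    opts.seed
  ];
}

// Placeholder standing in for an RDD of LabeledPoint.
var trainingData = {};

var args = classifierArgs(trainingData, {
  numClasses: 2,
  // Feature 0 has 2 categories, feature 3 has 5; all others are continuous.
  categoricalFeaturesInfo: { 0: 2, 3: 5 },
  numTrees: 10,
  featureSubsetStrategy: "auto",
  impurity: "gini",
  maxDepth: 4,     // suggested value from the table
  maxBins: 100,    // suggested value from the table
  seed: 12345
});

// With the RandomForest class loaded, the call would then look like:
//   var model = RandomForest.trainClassifier.apply(RandomForest, args);
```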

Source:

mllib/tree/RandomForest.js, line 89

Returns:

a random forest model that can be used for prediction

Type: RandomForestModel

(static) trainRegressor(input, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) → {RandomForestModel}

Trains a random forest model for regression.

Parameters:

Name	Type	Description
`input`	module:eclairjs/rdd.RDD	Training dataset: RDD of LabeledPoint. Labels are real numbers.
`categoricalFeaturesInfo`	Map	Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
`numTrees`	number	Number of trees in the random forest.
`featureSubsetStrategy`	string	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "onethird".
`impurity`	string	Criterion used for information gain calculation. Supported values: "variance".
`maxDepth`	number	Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
`maxBins`	number	Maximum number of bins used for splitting features. (suggested value: 100)
`seed`	number	Random seed for bootstrapping and choosing feature subsets.
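
The `maxDepth` convention (depth 0 is a single leaf; depth 1 is one internal node plus two leaves) implies an upper bound on the size of each tree. The sketch below works that arithmetic out, assuming every split is binary and the tree is grown fully to `maxDepth`. The helper name `maxTreeSize` is hypothetical, and trees trained in practice may be smaller.

```javascript
// Hypothetical helper, not part of EclairJS.
// Upper bound on node counts for one tree, assuming binary splits and a
// tree that is completely filled down to maxDepth.
function maxTreeSize(maxDepth) {
  var leaves = Math.pow(2, maxDepth);   // the deepest level doubles with each extra depth
  return {
    internal: leaves - 1,               // every level above the leaves
    leaves: leaves,
    total: 2 * leaves - 1
  };
}

var depth0 = maxTreeSize(0);   // 0 internal nodes, 1 leaf
var depth1 = maxTreeSize(1);   // 1 internal node, 2 leaves
var depth4 = maxTreeSize(4);   // 15 internal nodes, 16 leaves (suggested maxDepth)
```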

Source:

mllib/tree/RandomForest.js, line 126

Returns:

a random forest model that can be used for prediction

Type: RandomForestModel