
Class: RandomForest

eclairjs/mllib/tree.RandomForest

The settings for featureSubsetStrategy are based on the following references:
- log2: tested in Breiman (2001)
- sqrt: recommended by the Breiman manual for random forests
- The defaults of sqrt (classification) and onethird (regression) match the R randomForest package.

References:
- Breiman (2001): http://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
- Breiman manual for random forests: http://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf
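To make the strategies concrete, the sketch below computes how many features each one would consider at a node. The helper featuresPerNode is hypothetical and not part of EclairJS; the formulas reflect our understanding of how Spark MLlib sizes the subsets (rounding up, with log2 never going below one feature) and should be treated as an assumption. "auto" is omitted because it first resolves to one of these values depending on numTrees.

```javascript
// Hypothetical helper, not part of the EclairJS API: returns how many
// features a tree would consider at each node for a given strategy.
// Assumed to mirror Spark MLlib's sizing rules.
function featuresPerNode(strategy, numFeatures) {
  switch (strategy) {
    case "all":
      return numFeatures;
    case "sqrt":
      return Math.ceil(Math.sqrt(numFeatures));
    case "log2":
      // Never drop below one feature, even when numFeatures is 1.
      return Math.max(1, Math.ceil(Math.log2(numFeatures)));
    case "onethird":
      return Math.ceil(numFeatures / 3);
    default:
      throw new Error("Unsupported featureSubsetStrategy: " + strategy);
  }
}

// Compare the strategies for a dataset with 100 features.
["all", "sqrt", "log2", "onethird"].forEach(function (s) {
  console.log(s, featuresPerNode(s, 100));
});
// Prints (one per line): all 100, sqrt 10, log2 7, onethird 34
```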

Constructor

new RandomForest(strategy, numTrees, featureSubsetStrategy, seed)

A class that implements a Random Forest (http://en.wikipedia.org/wiki/Random_forest) learning algorithm for classification and regression. It supports both continuous and categorical features.
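A random forest predicts by combining the outputs of its individual trees: a vote for classification and an average for regression. The sketch below illustrates that aggregation step in plain JavaScript. The functions majorityVote and averagePrediction are hypothetical, not EclairJS code, and the tie-breaking rule (first label to reach the top count wins) is our own choice rather than documented library behaviour.

```javascript
// Illustrative sketch, not EclairJS code: how a forest might combine
// the individual predictions of its trees.

// Classification: majority vote over the predicted class labels.
function majorityVote(treePredictions) {
  var counts = {};
  var best = null;
  var bestCount = 0;
  treePredictions.forEach(function (label) {
    counts[label] = (counts[label] || 0) + 1;
    // Ties go to the label that reached the winning count first.
    if (counts[label] > bestCount) {
      best = label;
      bestCount = counts[label];
    }
  });
  return best;
}

// Regression: arithmetic mean of the predicted values.
function averagePrediction(treePredictions) {
  var sum = treePredictions.reduce(function (acc, v) { return acc + v; }, 0);
  return sum / treePredictions.length;
}

console.log(majorityVote([1, 0, 1, 2, 1]));      // 1
console.log(averagePrediction([2.0, 3.0, 4.0])); // 3
```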

Parameters:

Name	Type	Description
`strategy`	module:eclairjs/mllib/tree/configuration.Strategy	The configuration parameters for the random forest algorithm which specify the type of algorithm (classification, regression, etc.), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
`numTrees`	Number	Number of trees in the random forest. If 1, no bootstrapping is used; if greater than 1, bootstrapping is applied.
`featureSubsetStrategy`	String	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest), set to "sqrt" for classification and to "onethird" for regression.
`seed`	Number	Random seed for bootstrapping and choosing feature subsets.
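A sketch of the "auto" rule and the bootstrapping behaviour described in the parameter table. The names resolveSubsetStrategy, usesBootstrap, bootstrapIndices and mulberry32 are hypothetical and not part of EclairJS, and the `algo` argument ("classification" or "regression") is an assumption standing in for the algorithm type carried by the Strategy object. The seeded generator is an arbitrary choice for illustration; Spark uses its own internal sampler, so these indices will not match a real training run.

```javascript
// Hypothetical helper: resolves "auto" as the parameter table describes.
// `algo` is assumed to be "classification" or "regression".
function resolveSubsetStrategy(featureSubsetStrategy, numTrees, algo) {
  if (featureSubsetStrategy !== "auto") {
    return featureSubsetStrategy; // explicit settings are kept as-is
  }
  if (numTrees === 1) {
    return "all";
  }
  return algo === "classification" ? "sqrt" : "onethird";
}

// Bootstrapping is only applied when more than one tree is grown.
function usesBootstrap(numTrees) {
  return numTrees > 1;
}

// Small seeded PRNG (mulberry32), chosen only for this illustration.
function mulberry32(seed) {
  var a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    var t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Draws n row indices with replacement (a bootstrap sample) from n rows.
// The same seed always yields the same sample.
function bootstrapIndices(n, seed) {
  var rand = mulberry32(seed);
  var indices = [];
  for (var i = 0; i < n; i++) {
    indices.push(Math.floor(rand() * n));
  }
  return indices;
}

console.log(resolveSubsetStrategy("auto", 1, "classification")); // all
console.log(resolveSubsetStrategy("auto", 50, "regression"));    // onethird
console.log(bootstrapIndices(5, 42));
```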

Source:

eclairjs/mllib/tree/RandomForest.js, line 51

Methods

(static) trainClassifier(input, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) → {module:eclairjs/mllib/tree/model.RandomForestModel}

Method to train a random forest model for binary or multiclass classification.

Parameters:

Name	Type	Description
`input`	module:eclairjs.RDD	Training dataset: RDD of LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint). Labels should take values {0, 1, ..., numClasses-1}.
`numClasses`	Int	Number of classes for classification.
`categoricalFeaturesInfo`	Object	Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
`numTrees`	Int	Number of trees in the random forest.
`featureSubsetStrategy`	String	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "sqrt".
`impurity`	String	Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
`maxDepth`	Int	Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
`maxBins`	Int	Maximum number of bins used for splitting features (suggested value: 100)
`seed`	Number	Random seed for bootstrapping and choosing feature subsets.
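The `impurity` parameter selects how candidate splits are scored. The sketch below shows the textbook definitions of Gini impurity, entropy (in bits) and the information gain of a binary split, on plain arrays of class labels. All function names are hypothetical and this is not the EclairJS or Spark implementation, only an illustration of the quantities being compared.

```javascript
// Illustrative sketch, not EclairJS code. Labels are class indices.

// Returns the count of each distinct label.
function classCounts(labels) {
  var counts = {};
  labels.forEach(function (y) { counts[y] = (counts[y] || 0) + 1; });
  return Object.keys(counts).map(function (k) { return counts[k]; });
}

// Gini impurity: 1 - sum_k p_k^2
function gini(labels) {
  var n = labels.length;
  return 1 - classCounts(labels).reduce(function (acc, c) {
    var p = c / n;
    return acc + p * p;
  }, 0);
}

// Entropy in bits: -sum_k p_k * log2(p_k)
function entropy(labels) {
  var n = labels.length;
  return classCounts(labels).reduce(function (acc, c) {
    var p = c / n;
    return acc - p * Math.log2(p);
  }, 0);
}

// Information gain of splitting `parent` into `left` and `right`:
// parent impurity minus the size-weighted impurity of the children.
function informationGain(impurityFn, parent, left, right) {
  var n = parent.length;
  return impurityFn(parent)
    - (left.length / n) * impurityFn(left)
    - (right.length / n) * impurityFn(right);
}

var parent = [0, 0, 1, 1];
console.log(gini(parent));                                  // 0.5
console.log(entropy(parent));                               // 1
console.log(informationGain(gini, parent, [0, 0], [1, 1])); // 0.5
```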

Source:

eclairjs/mllib/tree/RandomForest.js, line 127

Returns:

a random forest model that can be used for prediction

Type: module:eclairjs/mllib/tree/model.RandomForestModel

(static) trainRegressor(input, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed) → {module:eclairjs/mllib/tree/model.RandomForestModel}

Method to train a random forest model for regression.

Parameters:

Name	Type	Description
`input`	module:eclairjs.RDD	Training dataset: RDD of LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint). Labels are real numbers.
`categoricalFeaturesInfo`	Object	Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
`numTrees`	Number	Number of trees in the random forest.
`featureSubsetStrategy`	String	Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest), set to "onethird".
`impurity`	String	Criterion used for information gain calculation. Supported values: "variance".
`maxDepth`	Int	Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
`maxBins`	Int	Maximum number of bins used for splitting features (suggested value: 100)
`seed`	Number	Random seed for bootstrapping and choosing feature subsets.
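For regression the only supported impurity is "variance". The sketch below shows the population variance of the labels at a node and the variance reduction achieved by a binary split, which is the quantity a regression tree tries to maximise. The functions variance and varianceReduction are hypothetical illustrations, not EclairJS or Spark code.

```javascript
// Illustrative sketch, not EclairJS code. Labels are real numbers.

// Variance impurity: mean squared deviation from the mean label.
function variance(labels) {
  var n = labels.length;
  var mean = labels.reduce(function (a, y) { return a + y; }, 0) / n;
  return labels.reduce(function (a, y) {
    return a + (y - mean) * (y - mean);
  }, 0) / n;
}

// Variance reduction of splitting `parent` into `left` and `right`:
// parent variance minus the size-weighted variance of the children.
function varianceReduction(parent, left, right) {
  var n = parent.length;
  return variance(parent)
    - (left.length / n) * variance(left)
    - (right.length / n) * variance(right);
}

console.log(variance([1, 2, 3, 4]));                           // 1.25
console.log(varianceReduction([1, 2, 3, 4], [1, 2], [3, 4])); // 1
```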

Source:

eclairjs/mllib/tree/RandomForest.js, line 78

Returns:

a random forest model that can be used for prediction

Type: module:eclairjs/mllib/tree/model.RandomForestModel