
A sklearn.ensemble.RandomForestRegressor is a Random Forest Regression System within the sklearn.ensemble module.

1) Import the Random Forest Regression System from scikit-learn: from sklearn.ensemble import RandomForestRegressor
2) Create a design matrix X and a response vector y
3) Create a Random Forest Regressor object: RFR=RandomForestRegressor(n_estimators=10, criterion='mse'[, max_depth=None, min_samples_split=2, ...])
4) Choose method(s):
  • apply(X), applies the trees in the forest to X and returns the leaf indices.
  • decision_path(X), returns the decision path in the forest
  • fit(X, y[, sample_weight]), builds a forest of trees from the training set (X, y).
  • get_params([deep]), gets parameters for this estimator.
  • predict(X), predicts the regression target for X.
  • score(X, y[, sample_weight]), returns the coefficient of determination R^2 of the prediction.
  • set_params(**params), sets the parameters of this estimator.
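The steps above can be sketched end to end as follows. This is a minimal example, assuming synthetic data from make_regression and a train/test split (both illustrative choices, not part of the original steps). The criterion argument is left at its default because its name depends on the scikit-learn version: 'mse' in the 2017 release quoted below, 'squared_error' in newer releases.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Step 2 (illustrative data): a design matrix X and a response vector y
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 3: create the regressor; criterion is left at its version-dependent default
rfr = RandomForestRegressor(n_estimators=10, max_depth=None,
                            min_samples_split=2, random_state=0)

# Step 4: fit the forest, then call the other methods
rfr.fit(X_train, y_train)
y_pred = rfr.predict(X_test)          # one predicted value per test row
r2 = rfr.score(X_test, y_test)        # coefficient of determination R^2
leaves = rfr.apply(X_test)            # leaf index per (sample, tree)
indicator, n_nodes_ptr = rfr.decision_path(X_test)  # sparse node indicator, per-tree offsets
params = rfr.get_params()             # dict of constructor parameters
rfr.set_params(n_estimators=20)       # takes effect on the next call to fit

print(y_pred.shape, leaves.shape, round(r2, 3))
```

With the default 25% test split, the 200 samples leave 50 test rows, so leaves has one column per tree (10 here).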



  • (Scikit Learn, 2017) ⇒ Retrieved: 2017-10-22.
    • QUOTE: class sklearn.ensemble.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False)

      A random forest regressor.

      A random forest is a meta estimator that fits a number of classifying decision trees on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).

      Read more in the User Guide.
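The averaging described in this quote can be checked directly. The sketch below assumes a fitted forest exposes its individual trees through the estimators_ attribute and uses make_regression data chosen only for illustration. It compares the forest's prediction with the plain mean of its trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative data; bootstrap=True draws each tree's sample with replacement
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=1)
forest = RandomForestRegressor(n_estimators=25, bootstrap=True, random_state=1).fit(X, y)

# Stack the predictions of every fitted tree: shape (n_trees, n_samples)
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's regression output is the average of its trees' outputs
print(np.allclose(manual_average, forest.predict(X)))
```

Comparing with np.allclose rather than exact equality allows for differences in floating-point summation order.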


  • (Scikit Learn, 2017) ⇒ Retrieved: 2017-10-22.
    • QUOTE: In random forests (see RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually slightly increases (with respect to the bias of a single non-random tree) but, due to averaging, its variance also decreases, usually more than compensating for the increase in bias, hence yielding an overall better model.

      In contrast to the original publication B2001, the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class.
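The soft-voting behaviour described in the last paragraph can be illustrated with RandomForestClassifier. This sketch assumes the iris dataset as example data. It sets max_features="sqrt" so that each split considers only a random subset of the features, and then checks that the forest's class probabilities equal the mean of the individual trees' predict_proba outputs.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# max_features="sqrt": each split searches a random subset of sqrt(n_features) features
clf = RandomForestClassifier(n_estimators=15, max_features="sqrt", random_state=0).fit(X, y)

# Average the per-tree probability estimates instead of counting hard votes
avg_proba = np.mean([tree.predict_proba(X) for tree in clf.estimators_], axis=0)
print(np.allclose(avg_proba, clf.predict_proba(X)))

# predict() returns the class with the highest averaged probability
print(np.array_equal(clf.classes_[clf.predict_proba(X).argmax(axis=1)], clf.predict(X)))
```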


  • (Wikipedia, 2017) ⇒ Retrieved: 2017-10-22.
    • Random forests or random decision forests[1][2] are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created by Tin Kam Ho[1] using the random subspace method,[2] which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.[3] [4] [5]

      An extension of the algorithm was developed by Leo Breiman[6] and Adele Cutler,[7] and "Random Forests" is their trademark. [8] The extension combines Breiman's “bagging” idea and random selection of features, introduced first by Ho[1] and later independently by Amit and Geman[9] in order to construct a collection of decision trees with controlled variance.
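The combination of bagging and random feature selection in this quote can be sketched from scratch on top of a single decision tree. The helpers fit_bagged_forest and predict_mean are hypothetical names introduced for illustration. This is a simplified sketch of the idea, not scikit-learn's internal implementation. Each tree is fit on a bootstrap sample of the same size as the training set, max_features="sqrt" randomizes the features tried at each split, and the regression output is the mean over trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_forest(X, y, n_trees=20, max_features="sqrt", seed=0):
    """Hypothetical helper: bagging plus per-split random feature subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_mean(trees, X):
    """Hypothetical helper: regression output is the mean of the trees' predictions."""
    return np.mean([t.predict(X) for t in trees], axis=0)

X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)
trees = fit_bagged_forest(X, y)
y_hat = predict_mean(trees, X)
print(y_hat[:3])
```

For classification, the mean would be replaced by the mode of the trees' predicted classes, following the description above.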


  1. Ho, Tin Kam (1995). Random Decision Forests (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282.
  2. Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601.
  3. Kleinberg, Eugene (1996). "An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition" (PDF). Annals of Statistics. 24 (6): 2319–2349. MR 1425956. doi:10.1214/aos/1032181157.
  4. Kleinberg, Eugene (2000). "On the Algorithmic Implementation of Stochastic Discrimination" (PDF). IEEE Transactions on PAMI. 22 (5)
  5. Kleinberg, Eugene. "Stochastic Discrimination and its Implementation".
  6. Breiman, Leo (2001). "Random Forests". Machine Learning. 45 (1): 5–32. doi:10.1023/A:1010933404324.
  7. Liaw, Andy (16 October 2012). "Documentation for R package randomForest" (PDF). Retrieved 15 March 2013.
  8. U.S. trademark registration number 3185828, registered 2006/12/19.
  9. Amit, Yali; Geman, Donald (1997). "Shape quantization and recognition with randomized trees" (PDF). Neural Computation. 9 (7): 1545–1588. doi:10.1162/neco.1997.9.7.1545.