Random Forests Training Algorithm

Context:
- It can be implemented by a Random Forests Training System (that solves an RF training task to produce an RF model).
- It can take hyperparameters such as: RF mtry, ...
Example(s):
- (Robnik-Šikonja, 2004).
- (Breiman, 2001).
- …
Counter-Example(s):
- a Gradient Boosted Trees Training Algorithm.
- an AdaBoost Training Algorithm.
See: Boosting Algorithm, Decision Tree Training Algorithm.

References

(Patel et al., 2015) ⇒ Ankit B. Patel, Tan Nguyen, and Richard G. Baraniuk. (2015). “A Probabilistic Theory of Deep Learning.” In: arXiv preprint arXiv:1504.00641.
- QUOTE: Furthermore, by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests (RDFs), providing insights into their successes and shortcomings as well as a principled route to their improvement.

http://en.wikipedia.org/wiki/Random_forest
- Random forests are an ensemble learning method for classification (and regression) that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by individual trees. The algorithm for inducing a random forest was developed by Leo Breiman^[1] and Adele Cutler, and "Random Forests" is their trademark. The term came from random decision forests, which were first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman's “bagging” idea and the random selection of features, introduced independently by Ho^[2]^[3] and Amit and Geman^[4], in order to construct a collection of decision trees with controlled variation.
  The selection of a random subset of features is an example of the random subspace method, which, in Ho's formulation, is a way to implement stochastic discrimination^[5] proposed by Eugene Kleinberg.

↑ Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32. doi:10.1023/A:1010933404324.
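↑ Ho, Tin Kam (1995). "Random Decision Forests". Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC. pp. 278–282.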
↑ Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8): 832–844. doi:10.1109/34.709601. http://cm.bell-labs.com/cm/cs/who/tkh/papers/df.pdf.
↑ Amit, Yali; Geman, Donald (1997). "Shape quantization and recognition with randomized trees". Neural Computation 9 (7): 1545–1588. doi:10.1162/neco.1997.9.7.1545. http://www.cis.jhu.edu/publications/papers_in_database/GEMAN/shape.pdf.
↑ Kleinberg, Eugene (1996). "An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition". Annals of Statistics 24 (6): 2319–2349. doi:10.1214/aos/1032181157. MR 1425956. http://kappa.math.buffalo.edu/aos.pdf.
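
The description quoted above (bootstrap "bagging", a random subset of features at each split, and a mode vote over the trees) can be illustrated with a minimal sketch in plain Python. This is not Breiman and Cutler's reference implementation: the function names (`train_forest`, `grow_tree`, `best_split`, `predict_forest`), the use of Gini impurity as the split criterion, the exhaustive threshold search, and the toy data are all assumptions made for illustration. The `mtry` parameter plays the role of the number of features sampled per split, and class labels are assumed not to be tuples.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, mtry, rng):
    """Search a random subset of `mtry` features for the lowest-impurity split."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        # Skip the largest value so that both sides of the split are non-empty.
        for t in sorted({X[i][f] for i in rows})[:-1]:
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(X, y, rows, mtry, rng):
    """Grow an unpruned tree; a leaf is the majority class of its rows."""
    labels = [y[i] for i in rows]
    split = None if len(set(labels)) == 1 else best_split(X, y, rows, mtry, rng)
    if split is None:  # pure node, or no usable split among the sampled features
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t, grow_tree(X, y, left, mtry, rng), grow_tree(X, y, right, mtry, rng))

def predict_tree(tree, x):
    """Follow internal nodes (tuples) down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def train_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree(X, y, rows, mtry, rng))
    return forest

def predict_forest(forest, x):
    """Return the mode of the classes predicted by the individual trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Toy, hand-made data: two well-separated classes in two features.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
y = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(X, y, n_trees=25, mtry=1, seed=0)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [1.0, 1.0]))
```

Sampling the feature subset at every node, rather than once per tree, follows the "random selection of features to split each node" wording in (Breiman, 2001); Ho's random subspace method instead draws one feature subset per tree.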

(Sammut & Webb, 2011) ⇒ Claude Sammut, and Geoffrey I. Webb. (2011). “Random Forests.” In: (Sammut & Webb, 2011) p.828
(Verikas et al., 2011) ⇒ Antanas Verikas, Adas Gelzinis, and Marija Bacauskiene. (2011). “Mining Data with Random Forests: A Survey and Results of New Tests.” In: Pattern Recognition Journal, 44(2). doi:10.1016/j.patcog.2010.08.011

(Bühlmann, 2005) ⇒ Peter Bühlmann. (2005). “16.1 An Introduction to Ensemble Methods.” website
- QUOTE: Random forests (Breiman, 2001) is a very different ensemble method than bagging or boosting. The earliest random forest proposal is from Amit and Geman (Amit & Geman, 1997). From the perspective of prediction, random forests is about as good as boosting, and often better than bagging. For further details about random forests we refer to (Breiman, 2001).

(Robnik-Šikonja, 2004) ⇒ Marko Robnik-Šikonja. (2004). “Improving Random Forests.” In: Proceedings of the 15th European Conference on Machine Learning (ECML 2004).

(Breiman, 2001) ⇒ Leo Breiman. (2001). “Random Forests.” In: Machine Learning, 45(1). doi:10.1023/A:1010933404324
- QUOTE: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost ...
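
The dependence on strength and correlation mentioned in this quote can be written out compactly. The block below restates, as we recall them from (Breiman, 2001), the limiting generalization error and the upper bound in terms of the strength $s$ and the mean correlation $\bar{\rho}$; the notation ($h(\mathbf{X},\Theta)$ for a tree built from the random vector $\Theta$, $PE^{*}$ for the generalization error, $mr$ for the margin function) is assumed to follow that paper and should be checked against it before being relied on.

```latex
% Limit of the generalization error as the number of trees grows (almost surely):
PE^{*} \;\longrightarrow\;
  P_{\mathbf{X},Y}\!\Big( P_{\Theta}\big(h(\mathbf{X},\Theta)=Y\big)
  \;-\; \max_{j \neq Y} P_{\Theta}\big(h(\mathbf{X},\Theta)=j\big) \;<\; 0 \Big)

% Margin function and strength of the set of classifiers:
mr(\mathbf{X},Y) \;=\; P_{\Theta}\big(h(\mathbf{X},\Theta)=Y\big)
  \;-\; \max_{j \neq Y} P_{\Theta}\big(h(\mathbf{X},\Theta)=j\big),
\qquad
s \;=\; E_{\mathbf{X},Y}\, mr(\mathbf{X},Y)

% Upper bound in terms of strength s > 0 and mean correlation \bar{\rho}:
PE^{*} \;\le\; \frac{\bar{\rho}\,\big(1 - s^{2}\big)}{s^{2}}
```

Read this way, the bound tightens when individual trees are stronger (larger $s$) or less correlated (smaller $\bar{\rho}$), which is the trade-off that the number of features sampled per split is meant to control.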