Decision Tree Ensemble Learning Algorithm

From GM-RKB

A Decision Tree Ensemble Learning Algorithm is an ensemble learning algorithm that can be applied by a Decision Tree Ensemble Learning System.



References

2017a

2017b

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Decision_tree_learning#Decision_tree_types Retrieved:2017-10-15.
    • … Some techniques, often called ensemble methods, construct more than one decision tree:
      • Boosted trees - incrementally building an ensemble by training each new instance to emphasize the training instances previously mis-modeled. A typical example is AdaBoost. These can be used for regression-type and classification-type problems. [1] [2]
      • Bootstrap aggregated (or bagged) decision trees, an early ensemble method, build multiple decision trees by repeatedly resampling training data with replacement, and voting the trees for a consensus prediction (a minimal bagging sketch follows this quoted passage). [3]
      • Rotation forest - in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features. [4]

        A special case of a decision tree is a decision list, which is a one-sided decision tree, so that every internal node has exactly 1 leaf node and exactly 1 internal node as a child (except for the bottommost node, whose only child is a single leaf node). While less expressive, decision lists are arguably easier to understand than general decision trees due to their added sparsity; they permit non-greedy learning methods and allow monotonic constraints to be imposed.
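The bagging entry above is easy to make concrete. Below is a minimal, illustrative sketch (an editorial addition, not part of the quoted source), assuming scikit-learn's DecisionTreeClassifier as the base learner and a toy dataset: each tree is trained on a bootstrap resample of the rows, and the trees are voted for a consensus prediction.

```python
# Minimal sketch of bootstrap-aggregated (bagged) decision trees.
# Assumptions (not from the source text): scikit-learn base learner,
# integer class labels, and arbitrary choices for n_trees and the toy data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_trees=25, random_state=0):
    """Fit each tree on a bootstrap resample (rows drawn with replacement)."""
    rng = np.random.default_rng(random_state)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))     # resample with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_bagged_trees(trees, X):
    """Vote the trees for a consensus (majority) prediction."""
    votes = np.stack([t.predict(X) for t in trees])    # shape: (n_trees, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((200, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
ensemble = fit_bagged_trees(X, y)
print(predict_bagged_trees(ensemble, X[:5]))
```

Resampling the rows with replacement is the only source of diversity here; voting (or averaging) the resulting trees mainly reduces variance, which is why bagging helps most with high-variance learners such as unpruned decision trees.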

2017c

  • (Brown, 2017) ⇒ Gavin Brown. (2017). "Ensemble Learning." In: "Encyclopedia of Machine Learning and Data Mining" (Editors: Claude Sammut, Geoffrey I. Webb), pp. 393-402.
    • QUOTE: Algorithms for Learning a Set of Models

      If we had a committee of people taking decisions, it is self-evident that we would not want them all to make the same bad judgments at the same time. With a committee of learning models, the same intuition applies: we will have no gain from combining a set of identical models. We wish the models to exhibit a certain element of “diversity” in their group behavior, though still retaining good performance individually.

      We therefore make a distinction between two types of ensemble learning algorithms, those which encourage diversity implicitly, and those which encourage it explicitly. The vast majority of ensemble methods are implicit, in that they provide different random subsets of the training data to each learner. Diversity is encouraged “implicitly” by random sampling of the data space: at no point is a measurement taken to ensure diversity will emerge. The random differences between the datasets might be in the selection of examples (the Bagging algorithm), the selection of features (Random Subspace Method, Ho (1998) or Rotation Forests, Rodriguez et al. 2006), or combinations of the two (the Random Forests algorithm, Breiman 2001). Many other “randomization” schemes are of course possible.
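As an illustration of this "implicit" route, here is a minimal sketch of the Random Subspace Method (Ho 1998) under the same illustrative assumptions as the bagging sketch above (scikit-learn decision trees; the feature-subset size is an arbitrary choice): each member sees a different random subset of the feature columns, and diversity is never measured, only induced by the random sampling.

```python
# Minimal sketch of the Random Subspace Method: implicit diversity via
# random feature subsets. Base learner and subset size are illustrative
# assumptions, not prescribed by the quoted text.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_subspace(X, y, n_trees=25, subset_size=3, random_state=0):
    rng = np.random.default_rng(random_state)
    members = []
    for _ in range(n_trees):
        feats = rng.choice(X.shape[1], size=subset_size, replace=False)
        tree = DecisionTreeClassifier().fit(X[:, feats], y)
        members.append((feats, tree))      # remember which columns this tree uses
    return members

def predict_random_subspace(members, X):
    votes = np.stack([tree.predict(X[:, feats]) for feats, tree in members])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```

Random Forests combine this idea with bagging (random rows plus random feature subsets at each split), and Rotation Forests instead apply PCA to random feature groups before training each tree.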

      An alternative is to explicitly encourage diversity, constructing each ensemble member with some measurement ensuring that it is substantially different from the other members. Boosting algorithms achieve this by altering the distribution of training examples for each learner such that it is encouraged to make more accurate predictions where previous predictors have made errors. The DECORATE algorithm (Melville and Mooney 2005) explicitly alters the distribution of class labels, such that successive models are forced to learn different answers to the same problem. Negative correlation learning (see Brown 2004; Brown et al. 2005) includes a penalty term when learning each ensemble member, explicitly managing the accuracy-diversity trade-off.
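For the "explicit" route, the following is a minimal, simplified AdaBoost-style sketch (an editorial illustration, not Brown's pseudocode): example weights are updated after each round so that the next member is trained to be accurate where previous members erred. It assumes binary labels encoded as +1/-1 and uses depth-1 scikit-learn trees (stumps); the number of rounds is arbitrary.

```python
# Minimal sketch of boosting-style explicit diversity (simplified binary
# AdaBoost). Assumes labels in {-1, +1}; stump depth and round count are
# illustrative choices.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_boosted_stumps(X, y, n_rounds=20):
    n = len(X)
    w = np.full(n, 1.0 / n)                 # weight distribution over examples
    members = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)          # this member's vote weight
        w *= np.exp(-alpha * y * pred)                 # up-weight misclassified points
        w /= w.sum()
        members.append((alpha, stump))
    return members

def predict_boosted(members, X):
    score = sum(alpha * stump.predict(X) for alpha, stump in members)
    return np.sign(score)
```

Unlike the implicit schemes above, the weight update is an explicit mechanism that forces each new member to differ from its predecessors on exactly the examples they handled poorly.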

      In general, ensemble methods constitute a large class of algorithms – some based on heuristics, and some on sound learning-theoretic principles. The three algorithms that have received the most attention in the literature are reviewed here. It should be noted that we present only the most basic form of each; numerous modifications have been proposed for a variety of learning scenarios. For further study, the reader is referred to the many comprehensive surveys of the field (Brown et al. 2005; Kuncheva 2004b; Polikar 2006).

1997

  • (Amit & Geman, 1997) ⇒ Yali Amit, and Donald Geman. (1997). “Shape Quantization and Recognition with Randomized Trees." Neural Computation, 9(7). doi:10.1162/neco.1997.9.7.1545
    • http://scholar.google.com/scholar?q=%22Shape+Quantization+and+Recognition+with+Randomized+Trees%22+1997
    • QUOTE: We explore a new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity. …

  1. Friedman, J. H. (1999). Stochastic Gradient Boosting. Stanford University.
  2. Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer Verlag.
  3. Breiman, L. (1996). Bagging Predictors. Machine Learning, 24: 123-140.
  4. Rodriguez, J. J., Kuncheva, L. I., & Alonso, C. J. (2006). Rotation Forest: A New Classifier Ensemble Method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10): 1619-1630.