# Decision Tree Ensemble Learning System

A Decision Tree Ensemble Learning System is an ensemble-based learning system that applies a decision tree ensemble learning algorithm to solve a decision tree ensemble learning task (to produce a decision tree ensemble.

## References

### 2017b

• (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Decision_tree_learning#Decision_tree_types Retrieved:2017-10-15.
• Decision trees used in data mining are of two main types:
• Classification tree analysis is when the predicted outcome is the class to which the data belongs.
• Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
• The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the above procedures, first introduced by Breiman et al.[1] Trees used for regression and trees used for classification have some similarities - but also some differences, such as the procedure used to determine where to split.[1]

Some techniques, often called ensemble methods, construct more than one decision tree:

• Boosted trees Incrementally building an ensemble by training each new instance to emphasize the training instances previously mis-modeled. A typical example is AdaBoost. These can be used for regression-type and classification-type problems. [2] [3]
• Bootstrap aggregated (or bagged) decision trees, an early ensemble method, builds multiple decision trees by repeatedly resampling training data with replacement, and voting the trees for a consensus prediction. [4]
• Rotation forest - in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features. [5]

A special case of a decision tree is a decision list, which is a one-sided decision tree, so that every internal node has exactly 1 leaf node and exactly 1 internal node as a child (except for the bottommost node, whose only child is a single leaf node). While less expressive, decision lists are arguably easier to understand than general decision trees due to their added sparsity, permit non-greedy learning methods and monotonic constraints to be imposed.

### 2017c

• (Brown, 2017) ⇒ Gavin Brown. (2017). "Ensemble Learning." In: "Encyclopedia of Machine Learning and Data Mining"(Editors: Claude Sammut, Geoffrey I. Webb) pp 393-402
• QUOTE: Algorithms for Learning a Set of Models

If we had a committee of people taking decisions, it is self-evident that we would not want them all to make the same bad judgments at the same time. With a committee of learning models, the same intuition applies: we will have no gain from combining a set of identical models. We wish the models to exhibit a certain element of “diversity” in their group behavior, though still retaining good performance individually.

We therefore make a distinction between two types of ensemble learning algorithms, those which encourage diversity implicitly, and those which encourage it explicitly. The vast majority of ensemble methods are implicit, in that they provide different random subsets of the training data to each learner. Diversity is encouraged “implicitly” by random sampling of the data space: at no point is a measurement taken to ensure diversity will emerge. The random differences between the datasets might be in the selection of examples (the Bagging algorithm), the selection of features (Random Subspace Method, Ho (1998) or Rotation Forests, Rodriguez et al. 2006), or combinations of the two (the Random Forests algorithm, Breiman 2001). Many other “randomization” schemes are of course possible.

An alternative is to explicitly encourage diversity, constructing each ensemble member with some measurement ensuring that it is substantially different from the other members. Boosting algorithms achieve this by altering the distribution of training examples for each learner such that it is encouraged to make more accurate predictions where previous predictors have made errors. The DECORATE algorithm (Melville and Mooney 2005) explicitly alters the distribution of class labels, such that successive models are forced to learn different answers to the same problem. Negative correlation learning (see Brown 2004; Brown et al. 2005), includes a penalty term when learning each ensemble member, explicitly managing the accuracy-diversity trade-off.

In general, ensemble methods constitute a large class of algorithms – some based on heuristics, and some on sound learning-theoretic principles. The three algorithms that have received the most attention in the literature are reviewed here. It should be noted that we present only the most basic form of each; numerous modifications have been proposed for a variety of learning scenarios. As further study the reader is referred to the many comprehensive surveys of the field (Brown et al. 2005; Kuncheva 2004b; Polikar 2006).

1. Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
2. Friedman, J. H. (1999). Stochastic gradient boosting. Stanford University.
3. Hastie, T., Tibshirani, R., Friedman, J. H. (2001). The elements of statistical learning : Data mining, inference, and prediction. New York: Springer Verlag.
4. Breiman, L. (1996). Bagging Predictors. "Machine Learning, 24": pp. 123-140.
5. Rodriguez, J.J. and Kuncheva, L.I. and Alonso, C.J. (2006), Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630.