Bagged Trees System

A Bagged Trees System is a decision tree ensemble learning system that applies a Bagged Trees Algorithm to solve a Bagged Trees Task.




  • (Sammut & Webb, 2017) ⇒ "Ensemble Learning". In: "Encyclopedia of Machine Learning and Data Mining" (Editors: Claude Sammut, Geoffrey I. Webb), pp. 393-402.
    • QUOTE: In the Bagging algorithm (Breiman 1996) each member of the ensemble is constructed from a different training dataset, and the predictions combined either by uniform averaging or voting over class labels. Each dataset is generated by sampling from the total N data examples, choosing N items uniformly at random with replacement. Each sample is known as a bootstrap; the name Bagging is an acronym derived from Bootstrap AGGregatING. Since a bootstrap samples N items uniformly at random with replacement, the probability of any individual data item not being selected is [math]p = (1 - 1/N)^N[/math]. Therefore with large N, a single bootstrap is expected to contain approximately 63.2% of the original set, while 36.8% of the originals are not selected.

      Like many ensemble methods, Bagging works best with unstable models, that is those that produce differing generalization behavior with small changes to the training data. These are also known as high variance models, examples of which are decision trees and neural networks. Bagging therefore tends not to work well with very simple models. In effect, Bagging samples randomly from the space of possible models to make up the ensemble – with very simple models the sampling produces almost identical (low diversity) predictions.

      Despite its apparent capability for variance reduction, situations have been demonstrated where Bagging can converge without affecting variance (see Brown et al. 2005). Several other explanations have been proposed for Bagging’s success, including links to Bayesian model averaging. In summary, it seems that several years from its introduction, despite its apparent simplicity, Bagging is still not fully understood.
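
The 63.2% / 36.8% split quoted above follows from the limit of (1 - 1/N)^N, which approaches 1/e ≈ 0.368 as N grows. The sketch below is a minimal illustration using only the Python standard library; the function names `miss_probability` and `bootstrap_unique_fraction`, the seed, and the value of N are assumptions chosen for the example and do not come from the quoted source.

```python
import random


def miss_probability(n):
    """Probability that one given item is absent from a bootstrap of size n."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_unique_fraction(n, rng):
    """Draw n indices uniformly with replacement and return the
    fraction of the n original items that appear at least once."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


# Illustrative settings (assumptions, not from the source).
rng = random.Random(0)
n = 10_000

# Closed-form value; approaches 1/e (about 0.368) for large n.
print("P(item not selected):", miss_probability(n))

# Empirical check on a single bootstrap; expected to be near 0.632.
print("Fraction of originals in one bootstrap:", bootstrap_unique_fraction(n, rng))
```

The items left out of a given bootstrap are the "out-of-bag" examples for that ensemble member, which is why roughly a third of the data is available to evaluate each member without a separate hold-out set.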


  • (Scikit Learn, 2017) ⇒ Retrieved: 2017-10-22.
    • QUOTE: In ensemble algorithms, bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction. These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it. In many cases, bagging methods constitute a very simple way to improve with respect to a single model, without making it necessary to adapt the underlying base algorithm. As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees).

      Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:

      • When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [B1999].
      • When samples are drawn with replacement, then the method is known as Bagging [B1996].
      • When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [H1998].
      • Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [LG2012].

      In scikit-learn, bagging methods are offered as a unified BaggingClassifier meta-estimator (resp. BaggingRegressor), taking as input a user-specified base estimator along with parameters specifying the strategy to draw random subsets. In particular, max_samples and max_features control the size of the subsets (in terms of samples and features), while bootstrap and bootstrap_features control whether samples and features are drawn with or without replacement. When using a subset of the available samples the generalization accuracy can be estimated with the out-of-bag samples by setting oob_score=True. As an example, the snippet below illustrates how to instantiate a bagging ensemble of KNeighborsClassifier base estimators, each built on random subsets of 50% of the samples and 50% of the features.
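
The snippet referred to in the quote is not present in this page. A minimal sketch of what such an instantiation might look like is given below, assuming a scikit-learn installation. The synthetic dataset, the variable names, the `random_state` values, and the second ensemble of decision trees are assumptions added for illustration; only the KNeighborsClassifier configuration (50% of samples, 50% of features) is described in the quoted text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely illustrative (not from the source).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Ensemble described in the quote: KNN base estimators, each built on
# a random 50% of the samples and a random 50% of the features.
knn_bagging = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,
    max_features=0.5,
    random_state=0,
)
knn_bagging.fit(X, y)

# A bagged-trees variant (assumed example): fully grown decision trees
# trained on bootstrap samples, with an out-of-bag accuracy estimate.
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    oob_score=True,
    random_state=0,
)
bagged_trees.fit(X, y)

print("Out-of-bag accuracy estimate:", bagged_trees.oob_score_)
```

Under this reading of the parameters, `bootstrap=False` with `max_samples` below 1.0 would correspond to Pasting, `bootstrap=True` to Bagging, `max_features` below 1.0 with all samples to Random Subspaces, and subsetting both samples and features to Random Patches.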