Bagging Algorithm
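A bagging algorithm is an ensemble algorithm that fits a base model to each of several bootstrap samples of the training set and combines their predictions by voting (classification) or averaging (regression).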
- AKA: Bagging, Bootstrap Aggregating.
- Context:
- It can be a Bagging Classification Algorithm or a Bagging Regression Algorithm.
- It can reduce variance relative to the base algorithm while only slightly increasing bias (Domingos, 2012).
- Counter-Example(s):
- See: Wagging Algorithm; Bootstrapping.
References
- http://dsg.harvard.edu/courses/hst951/ppt/Bagging.ppt
- http://www.umiacs.umd.edu/~shaohua/enee698a_f03/bagging.ppt
- http://faculty.washington.edu/fxia/courses/LING572/bagging.ppt
2012
- (Domingos, 2012) ⇒ Pedro Domingos. (2012). "A Few Useful Things to Know About Machine Learning." In: Communications of the ACM Journal, 55(10). doi:10.1145/2347736.2347755
- ... In the simplest technique, called bagging, we simply generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias.
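The procedure in the quote (resample the training set, learn a classifier on each resample, combine by voting) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the base learner `fit_stump` (a one-dimensional threshold classifier), the toy data `train`, and all function names are assumptions made for this example and do not come from the cited article.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Hypothetical base learner: a 1-D threshold classifier.

    Tries every midpoint between consecutive distinct x values, in both
    orientations, and keeps the rule with the fewest training errors.
    """
    xs = sorted({x for x, _ in sample})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging_fit(data, n_models, seed=0):
    """Fit one base model per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy training set of (x, label) pairs, assumed for illustration only.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]

models = bagging_fit(train, n_models=25)
print(bagging_predict(models, 0.0), bagging_predict(models, 5.0))
```

Using an odd number of models avoids tied votes in the two-class case; how ties should be broken otherwise is a design choice this sketch leaves open.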
2011
- (Sammut & Webb, 2011) ⇒ Claude Sammut (editor), and Geoffrey I. Webb (editor). (2011). "Bagging." In: (Sammut & Webb, 2011) p.73
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Bootstrap_aggregating
- Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm to improve machine learning of classification and regression models in terms of stability and classification accuracy. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.
Given a standard training set D of size n, bagging generates m new training sets \(D_i\), each of size \(n' \le n\), by sampling examples from D uniformly and with replacement. By sampling with replacement, it is likely that some examples will be repeated in each \(D_i\). If \(n' = n\), then for large n the set \(D_i\) is expected to have 63.2% of the unique examples of D, the rest being duplicates. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
Since the method averages several predictors, it is not useful for improving linear models. Similarly, bagging does not improve very stable models like k nearest neighbors.
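The 63.2% figure quoted above can be checked directly. When a bootstrap sample has the same size n as the training set, a given example is missed by all n draws with probability (1 - 1/n)^n, which tends to 1/e for large n, so the expected fraction of distinct examples tends to 1 - 1/e. The sketch below compares that expression with a small simulation; the function names, the choice of n, and the number of trials are assumptions made for this example.

```python
import math
import random


def unique_fraction(n, rng):
    """Fraction of distinct original indices in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return len(drawn) / n


def expected_unique_fraction(n):
    """Exact expectation: each example is missed with probability (1 - 1/n)**n."""
    return 1 - (1 - 1 / n) ** n


rng = random.Random(0)
n = 10_000
trials = 20

# Average the observed fraction over a few independent bootstrap samples.
empirical = sum(unique_fraction(n, rng) for _ in range(trials)) / trials

print(f"empirical: {empirical:.3f}")
print(f"exact:     {expected_unique_fraction(n):.3f}")
print(f"limit:     {1 - math.exp(-1):.3f}")
```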
2005
- (Bühlmann, 2005) ⇒ Peter Bühlmann. (2005). "16.2 Bagging and Related Methods." website
- QUOTE: Bagging (Breiman, 1996), a sobriquet for bootstrap aggregating, is an ensemble method for improving unstable estimation or classification schemes. Breiman (Breiman, 1996) motivated bagging as a variance reduction technique for a given base procedure, such as decision trees or methods that do variable selection and fitting in a linear model. It has attracted much attention, probably due to its implementational simplicity and the popularity of the bootstrap methodology. At the time of its invention, only heuristic arguments were presented why bagging would work. Later, it has been shown in (Bühlmann & Yu, 2002) that bagging is a smoothing operation which turns out to be advantageous when aiming to improve the predictive performance of regression or classification trees. In case of decision trees, the theory in (Bühlmann & Yu, 2002) confirms Breiman's intuition that bagging is a variance reduction technique, reducing also the mean squared error (MSE). The same also holds for subagging (subsample aggregating), defined in Sect. 16.2.3, which is a computationally cheaper version than bagging. However, for other (even complex) base procedures, the variance and MSE reduction effect of bagging is not necessarily true; this has also been shown in (Buja & Stuetzle, 2002) for the simple case where the estimator is a \(U\)-statistics.
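The quote mentions subagging without reproducing its definition (Sect. 16.2.3 of the cited source is not included here). The sketch below assumes the common reading: each base model is fitted to a subsample drawn without replacement and smaller than the training set, and the fitted models are averaged. The regression stump `fit_regression_stump`, the ramp-shaped toy data, the subsample size, and all names are assumptions made for this example. Comparing the single stump with the aggregate also gives a rough feel for the smoothing effect the quote describes: one stump can output only two values, while the average varies more gradually with x.

```python
import random
from statistics import fmean


def fit_regression_stump(sample):
    """Hypothetical base learner: one split on x, predicting the mean y per side."""
    pts = sorted(sample)
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold fits between equal x values
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        mean_l, mean_r = fmean(left), fmean(right)
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pts[i - 1][0] + pts[i][0]) / 2
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # all x equal: fall back to a constant prediction
        constant = fmean(y for _, y in pts)
        return lambda x: constant
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x <= threshold else mean_r


def subagging_fit(data, n_models, subsample_size, seed=0):
    """Fit one stump per subsample; rng.sample draws WITHOUT replacement."""
    rng = random.Random(seed)
    return [fit_regression_stump(rng.sample(data, subsample_size))
            for _ in range(n_models)]


def aggregate_predict(models, x):
    """Combine the individual predictions by averaging."""
    return fmean(model(x) for model in models)


# Toy data on a ramp y = x over [0, 1], assumed for illustration only.
train = [(i / 20, i / 20) for i in range(21)]

single = fit_regression_stump(train)
models = subagging_fit(train, n_models=50, subsample_size=10)

for x in (0.1, 0.4, 0.6, 0.9):
    print(x, round(single(x), 3), round(aggregate_predict(models, x), 3))
```

Replacing `rng.sample(data, subsample_size)` with a draw of `len(data)` items with replacement would turn the same skeleton into ordinary bagging.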
2003
- (Chang et al., 2003) ⇒ E.Y. Chang, B. Li, G. Wu, and K. Goh. (2003). "Statistical Learning for Effective Visual Information Retrieval." In: Proceedings 2003 International IEEE Conference on Image Processing (ICIP 2003).
- QUOTE: Bagging subsamples training data into a number of bags, trains each bag, and aggregates the decisions of the bags to make final class predictions.
2002
- (Buja & Stuetzle, 2002) ⇒ Andreas Buja, and Werner Stuetzle. (2002). "Observations on Bagging." Preprint (2002). Available from http://ljsavage.wharton.upenn.edu/~buja See: (Buja & Stuetzle, 2006).
- (Bühlmann & Yu, 2002) ⇒ Peter Bühlmann, and B. Yu. (2002). "Analyzing Bagging." In: Annals of Statistics 30.
1999
- (Bauer & Kohavi, 1999) ⇒ Eric Bauer, and Ron Kohavi. (1999). "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants." In: Machine Learning, 36(1-2).
1996
- (Breiman, 1996) ⇒ Leo Breiman. (1996). "Bagging Predictors." In: Machine Learning, 24.