Boosting Learning Algorithm

A Boosting Learning Algorithm is a nonparametric learning algorithm/ensemble learning algorithm that makes repeated use of a weak learner to create a Strong Learner (ensemble model).

AKA: Boosted Algorithm.
Context:
- It can range from being a Boosted Classification Algorithm to being a Boosted Ranking Algorithm to being a Boosted Estimation Algorithm.
- It can range from being a One-Pass Boosting Algorithm to being a Iterative Boosting Algorithm.
- …
Example(s):
- a Boosted Trees Algorithm, such as AdaBoost.
- LogitBoost Algorithm.
- NormalBoost Algorithm.
- Boosted Lasso Algorithm.
- Gradient Boosting Algorithm, such as iterative gradient boosting.
- …
Counter-Example(s):
- a Bagging Algorithm, such as Random Forests.
- a Stacking Algorithm.
See: Bootstrapping Algorithm; Ensemble Algorithm; Large-Margin Algorithm; Additive Model, Meta-Algorithm.

References

2015

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Boosting_(machine_learning) Retrieved:2015-2-9.
- Boosting is a machine learning ensemble meta-algorithm for reducing bias primarily and also variance in supervised learning, and a family of machine learning algorithms which convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant (1988, 1989) :^[1] Can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification. Robert Schapire's affirmative answer in a 1990 paper to the question of Kearns and Valiant has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting. When first introduced, the hypothesis boosting problem simply referred to the process of turning a weak learner into a strong learner. “Informally, [the hypothesis boosting] problem asks whether an efficient learning algorithm […] that outputs a hypothesis whose performance is only slightly better than random guessing [i.e. a weak learner] implies the existence of an efficient algorithm that outputs an hypothesis of arbitrary accuracy [i.e. a strong learner]." Algorithms that achieve hypothesis boosting quickly became simply known as "boosting". Freund and Schapire's arcing (Adapt[at]ive Resampling and Combining), ^[2] as a general technique, is more or less synonymous with boosting. ^[3]

↑ Michael Kearns(1988); Thoughts on Hypothesis Boosting, Unpublished manuscript (Machine Learning class project, December 1988)
↑ Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139
↑ Leo Breiman (1998); Arcing Classifier (with Discussion and a Rejoinder by the Author), Annals of Statistics, vol. 26, no. 3, pp. 801-849: "The concept of weak learning was introduced by Kearns and Valiant (1988 , 1989 ), who left open the question of whether weak and strong learnability are equivalent. The question was termed the boosting problem since [a solution must] boost the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A boosting algorithm is a method that takes a weak learner and converts it into a strong learner. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting.

2012

(Domingos, 2012) ⇒ Pedro Domingos. (2012). “A Few Useful Things to Know About Machine Learning.” In: Communications of the ACM Journal, 55(10). doi:10.1145/2347736.2347755
- QUOTE: … In boosting, training examples have weights, and these are varied so that each new classifier focuses on the examples the previous ones tended to get wrong.

2011

(Sammut & Webb, 2011) ⇒ Claude Sammut (editor), and Geoffrey I. Webb (editor). (2011). “Boosting.” In: (Sammut & Webb, 2011) p.136

2009

(Freund, 2009) ⇒ Yoav Freund (2009). “Statistical Machine Learning (Boosting). Course: UC San Diego, CSE254, Winter 2009. http://seed.ucsd.edu/mediawiki/index.php/CSE254

2004

(Zhao & Yu, 2004) ⇒ Peng Zhao, and Bin Yu. (2004). “Boosted Lasso." Tech Report, Statistics Department, U. C. Berkeley.
- Another vastly popular and recent idea is Boosting. Since its inception in 1990 (Schapire, 1990; Freund, 1995; Freund and Schapire, 1996), it has become one of the most successful machine learning methods and is now understood as an iterative gradient descent method leading to an additive model (Breiman, 1998; Mason et al., 1999; Friedman et al., 2000).
(Melli, 2004) ⇒ Gabor Melli. (2004). Review of: Yoav Freund, and Robert E. Schapire, “A Short Introduction to Boosting”
- Kearns & Valiant (1989) proved that learners performing only slightly better than random, can be combined to form an arbitrarily good ensemble hypothesis.
- Schapire (1990) provided the first polynomial time Boosting algorithm.
- Freund (1995) “Boosting a weak learning algorithm by majority”
- Freund & Schapire (1995) AdaBoost. Solved many practical problems of boosting algorithms. “Ada” stands for adaptive.

2000

(Webb, 2000) ⇒ Geoffrey I. Webb. (2000). “MultiBoosting: A Technique for Combining Boosting and Wagging.” Machine Learning. 40(2).

1999

(Bauer & Kohavi, 1999) ⇒ Eric Bauer, and Ron Kohavi. (1999). “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants.” In: Machine Learning, 36(1-2).
- Wagging Algorithm, Bagging Algorithm, Boosting Algorithm.
(Freund & Schapire, 1999) ⇒ Yoav Freund, and Robert E. Schapire. (1999). “A Short Introduction to Boosting.” In: Journal of Japanese Society for Artificial Intelligence, 14(5).
(Abney et al., 1999) ⇒ Steven Abney, Robert E. Schapire, and Yoram Singer. (1999) "Boosting Applied to Tagging and PP Attachment.” In: Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP-VLC 1999)

1996

(Freund & Schapire, 1996) ⇒ Yoav Freund, and Robert E. Schapire. (1996). “Experiments with a New Boosting Algorithm.” In: Proceedings of the Thirteenth International Conference on Machine Learning (ICML 1996).

1995

(Freund, 1995) ⇒ Yoav Freund. (1995). “Boosting a Weak Learning Algorithm by Majority.” Information and Computation, 121.

1990

(Shapire, 1990) ⇒ Robert E. Schapire. (1990). “The Strength of Weak Learnability.” In: Machine Learning, 5(2).

1988

(Kearns, 1988) ⇒ Michael Kearns. (1988). “Thoughts on Hypothesis Boosting.” Unpublished manuscript.

[Kearns88-1] Michael Kearns(1988); Thoughts on Hypothesis Boosting, Unpublished manuscript (Machine Learning class project, December 1988)

[2] Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139

[3] Leo Breiman (1998); Arcing Classifier (with Discussion and a Rejoinder by the Author), Annals of Statistics, vol. 26, no. 3, pp. 801-849: "The concept of weak learning was introduced by Kearns and Valiant (1988 , 1989 ), who left open the question of whether weak and strong learnability are equivalent. The question was termed the boosting problem since [a solution must] boost the low accuracy of a weak learner to the high accuracy of a strong learner. Schapire (1990) proved that boosting is possible. A boosting algorithm is a method that takes a weak learner and converts it into a strong learner. Freund and Schapire (1997) proved that an algorithm similar to arc-fs is boosting.

[1]

[2]

[3]

Boosting Learning Algorithm

References

2015

2012

2011

2009

2004

2000

1999

1996

1995

1990

1988

Navigation menu

Search