AdaBoost System


An AdaBoost System is an iterative boosting system that implements an AdaBoost algorithm to solve an AdaBoost Task.



References

2017a

  • (Wikipedia, 2017a) ⇒ https://en.wikipedia.org/wiki/AdaBoost Retrieved: 2017-10-22.
    • AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire who won the Gödel Prize in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.

      Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to be adjusted before it achieves optimal performance on a dataset. AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-growing algorithm, such that later trees tend to focus on harder-to-classify examples.
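
The mechanics described above (a weighted vote of weak learners, with example weights shifted toward previously misclassified instances) can be made concrete with a short sketch. The following is a minimal, illustrative R implementation of discrete AdaBoost with decision stumps, not code from any of the cited sources; labels are assumed to be coded as -1/+1, and all function and variable names are made up for the example.

      # Minimal sketch of discrete AdaBoost with rpart decision stumps.
      # Assumes y is a numeric vector of -1/+1 labels; names are illustrative.
      library(rpart)

      adaboost_stumps <- function(X, y, iter = 50) {
        n <- nrow(X)
        w <- rep(1 / n, n)                        # start from uniform example weights
        train <- data.frame(X, label = factor(y))
        stumps <- vector("list", iter)
        alpha  <- numeric(iter)
        for (t in seq_len(iter)) {
          stumps[[t]] <- rpart(label ~ ., data = train, weights = w, method = "class",
                               control = rpart.control(maxdepth = 1, cp = 0))
          h   <- as.numeric(as.character(predict(stumps[[t]], train, type = "class")))
          err <- min(max(sum(w * (h != y)), 1e-10), 1 - 1e-10)  # weighted error of this stump
          alpha[t] <- 0.5 * log((1 - err) / err)  # stump weight in the final vote
          w <- w * exp(-alpha[t] * y * h)         # up-weight misclassified examples
          w <- w / sum(w)
        }
        list(stumps = stumps, alpha = alpha)
      }

      predict_adaboost_stumps <- function(model, X) {
        newdata <- data.frame(X)
        votes <- mapply(function(s, a) {
          a * as.numeric(as.character(predict(s, newdata, type = "class")))
        }, model$stumps, model$alpha)
        sign(rowSums(votes))                      # sign of the weighted sum
      }

Because each stump's weight alpha_t grows as its weighted error drops below 0.5, any weak learner that is only slightly better than random guessing still contributes positively to the final weighted vote.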

2017c

  • (Brown, 2017) ⇒ Gavin Brown. (2017). "Ensemble Learning." In: "Encyclopedia of Machine Learning and Data Mining" (Editors: Claude Sammut, Geoffrey I. Webb), pp. 393-402.
    • QUOTE: Adaboost (Freund and Schapire 1996) is the most well known of the Boosting family of algorithms (Schapire 2003). The algorithm trains models sequentially, with a new model trained at each round. At the end of each round, mis-classified examples are identified and have their emphasis increased in a new training set which is then fed back into the start of the next round, and a new model is trained. The idea is that subsequent models should be able to compensate for errors made by earlier models.

      Adaboost occupies somewhat of a special place in the history of ensemble methods. Though the procedure seems heuristic, the algorithm is in fact grounded in a rich learning-theoretic body of literature. Schapire (1990) addressed a question posed by Kearns and Valiant (1988) on the nature of two complexity classes of learning problems. The two classes are strongly learnable and weakly learnable problems. Schapire showed that these classes were equivalent; this had the corollary that a weak model, performing only slightly better than random guessing, could be “boosted” into an arbitrarily accurate strong model. The original Boosting algorithm was a proof by construction of this equivalence, though it had a number of impractical assumptions built in. The Adaboost algorithm (Freund and Schapire 1996) was the first practical Boosting method. The authoritative historical account of the development can be found in Schapire (1999), including discussion of numerous variants and interpretations of the algorithm. The procedure is shown in Algorithm 2. Some similarities with Bagging are evident; a key difference is that at each round t, Bagging has a uniform distribution D_t, while Adaboost adapts a nonuniform distribution.
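
For concreteness, the nonuniform distribution that Adaboost adapts at each round can be written out as follows (standard notation for labels y_i in {-1,+1} and weak hypothesis h_t; this is the usual textbook formulation and may differ in minor details from the Algorithm 2 referenced in the quote):

      \epsilon_t = \sum_{i=1}^{m} D_t(i)\,\mathbf{1}[h_t(x_i) \neq y_i], \qquad \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}

      D_{t+1}(i) = \frac{D_t(i)\,\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}, \qquad H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)

where Z_t is the normalizing constant that keeps D_{t+1} a probability distribution. Bagging, by contrast, keeps D_t uniform at every round.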

2014

  • http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/adaboost#Implementation
    • Adaboost is part of the ada package. This section provides more information about installing and using it in the R environment. Type the following commands in the R console to install and load the ada package:
      install.packages("ada")
      library("rpart")
      library("ada")

      The function used to execute the adaboost algorithm is:
      ada(x, y, test.x, test.y=NULL, loss=c("exponential","logistic"), type=c("discrete","real","gentle"), iter=50, nu=0.1, bag.frac=0.5, model.coef=TRUE, bag.shift=FALSE, max.iter=20, delta=10^(-10), verbose=FALSE, ..., na.action=na.rpart)
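
A minimal usage sketch of ada(), assuming a two-class subset of the built-in iris data (ada fits binary classifiers); the data preparation below is illustrative rather than taken from the wikibook:

      # Illustrative call to ada() on a two-class subset of iris.
      library(rpart)
      library(ada)

      two_class <- droplevels(subset(iris, Species != "setosa"))
      x <- two_class[, 1:4]            # numeric predictors
      y <- two_class$Species           # two-level factor response

      fit <- ada(x, y, loss = "exponential", type = "discrete", iter = 50, nu = 0.1)
      fit                              # prints a summary of the fitted boosted model
      predict(fit, newdata = x[1:5, ])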


  1. Y. Freund and R. Schapire. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”, 1997.
  2. T. Hastie, R. Tibshirani, and J. Friedman. “The Elements of Statistical Learning”, 2nd Edition, Springer, 2009.
  3. J. Zhu, H. Zou, S. Rosset, and T. Hastie. “Multi-class AdaBoost”, 2009.
  4. H. Drucker. “Improving Regressors using Boosting Techniques”, 1997.