2001 ReducingMulticlassToBinary


Subject Headings: Multiclass Classification Task.

Notes

Cited By

~792 citations (Google Scholar): http://scholar.google.com/scholar?cites=1569736950810018871

Quotes

Abstract

  • We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a margin-based binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class is compared against all others, or in which all pairs of classes are compared to each other, or in which output codes with error-correcting properties are used. We propose a general method for combining the classifiers generated on the binary problems, and we prove a general empirical multiclass loss bound given the empirical loss of the individual binary learning algorithms. The scheme and the corresponding bounds apply to many popular classification learning algorithms including support-vector machines, AdaBoost, regression, logistic regression and decision-tree algorithms. We also give a multiclass generalization error analysis for general output codes with AdaBoost as the binary learner. Experimental results with SVM and AdaBoost show that our scheme provides a viable alternative to the most commonly used multiclass algorithms.
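
The reduction described in the abstract can be sketched concretely. The following Python fragment is a minimal editorial illustration, not the authors' implementation: a ternary coding matrix M over {-1, 0, +1} turns a k-class problem into one binary problem per column, a margin-based learner is fit to each column, and loss-based decoding predicts the class whose row of M incurs the smallest total loss on the learners' margins. The use of scikit-learn's LogisticRegression as the base learner and of the exponential loss are illustrative assumptions; the framework admits any margin-based binary learner and any convex loss.

    # Illustrative sketch of output-code reduction with loss-based decoding
    # (assumed base learner and loss; not code from the paper).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def one_vs_all_matrix(k):
        # k x k coding matrix: column r labels class r as +1, all others as -1.
        return (2 * np.eye(k) - 1).astype(int)

    def train_output_code(X, y, M):
        # Fit one binary learner per column of M; examples whose class has a
        # 0 entry in that column are omitted (the ternary-code convention).
        learners = []
        for r in range(M.shape[1]):
            col = M[y, r]              # relabel each example by its class's code bit
            mask = col != 0
            learners.append(LogisticRegression().fit(X[mask], col[mask]))
        return learners

    def predict_loss_based(X, M, learners, loss=lambda z: np.exp(-z)):
        # Loss-based decoding: choose the class c minimizing
        # sum over columns r of L(M[c, r] * f_r(x)), where f_r is column r's margin.
        F = np.column_stack([clf.decision_function(X) for clf in learners])  # (n, ell)
        total = loss(M[None, :, :] * F[:, None, :]).sum(axis=2)              # (n, k)
        return total.argmin(axis=1)

    # Toy usage with assumed shapes: X is (n, d) floats, y is (n,) ints in [0, k).
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 5)), rng.integers(0, 4, size=200)
    M = one_vs_all_matrix(4)
    preds = predict_loss_based(X, M, train_output_code(X, y, M))

With one_vs_all_matrix this recovers the one-against-all reduction; an all-pairs or error-correcting coding matrix slots into the same two functions, and replacing the loss with z -> (1 - sign(z))/2 yields Hamming decoding.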

References

  • 1. Bartlett, P. L. (1998). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525-536.
  • 2. Breiman, L. (1997a). Arcing the edge. Tech. rep. 486, Statistics Department, University of California at Berkeley.
  • 3. Breiman, L. (1997b). Prediction games and arcing classifiers. Tech. rep. 504, Statistics Department, University of California at Berkeley.
  • 4. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth & Brooks.
  • 5. Michael Collins, Robert E. Schapire, Yoram Singer, Logistic Regression, AdaBoost and Bregman Distances, Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, p.158-169, June 28-July 01, 2000
  • 6. Corinna Cortes, Vladimir Vapnik, Support-Vector Networks, Machine Learning, v.20 n.3, p.273-297, Sept. 1995 doi:10.1023/A:1022627411411
  • 7. Koby Crammer, Yoram Singer, On the Learnability and Design of Output Codes for Multiclass Problems, Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, p.35-46, June 28-July 01, 2000
  • 8. Csiszár, I., & Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1, 205-237.
  • 9. Stephen Della Pietra, Vincent Della Pietra, John Lafferty, Inducing Features of Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.19 n.4, p.380-393, April 1997 doi:10.1109/34.588021
  • 10. Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263-286.
  • 11. Yoav Freund, An adaptive version of the boost by majority algorithm, Proceedings of the twelfth annual conference on Computational learning theory, p.102-113, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307419
  • 12. Yoav Freund, Robert E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, v.55 n.1, p.119-139, Aug. 1997 doi:10.1006/jcss.1997.1504
  • 13. Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 28(2), 337-407.
  • 14. Venkatesan Guruswami, Amit Sahai, Multiclass learning, boosting, and error-correcting codes, Proceedings of the twelfth annual conference on Computational learning theory, p.145-155, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307429
  • 15. Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. The Annals of Statistics, 26(2), 451-471.
  • 16. Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13-30.
  • 17. Klaus-Uwe Höffgen, Hans Ulrich Simon, Robust trainability of single neurons, Proceedings of the fifth annual workshop on Computational learning theory, p.428-439, July 27-29, 1992, Pittsburgh, Pennsylvania, United States doi:10.1145/130385.130431
  • 18. Michael Kearns, Yishay Mansour, On the boosting ability of top-down decision tree learning algorithms, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.459-468, May 22-24, 1996, Philadelphia, Pennsylvania, United States doi:10.1145/237814.237994
  • 19. John Lafferty, Additive models, boosting, and inference for generalized divergences, Proceedings of the twelfth annual conference on Computational learning theory, p.125-133, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307422
  • 20. Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Smola, A. J., Bartlett, P. J., Schölkopf, B., & Schuurmans, D. (Eds.), Advances in Large Margin Classifiers. MIT Press.
  • 21. Merz, C. J., & Murphy, P. M. (1998). UCI repository of machine learning databases. www.ics.uci.edu/~mlearn/MLRepository.html.
  • 22. J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1993
  • 23. G. Rätsch, T. Onoda, K.-R. Müller, Soft Margins for AdaBoost, Machine Learning, v.42 n.3, p.287-320, March 2001 doi:10.1023/A:1007618119488
  • 24. D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, MIT Press, Cambridge, MA, 1986
  • 25. Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651-1686.
  • 26. Robert E. Schapire, Yoram Singer, Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning, v.37 n.3, p.297-336, Dec. 1999 doi:10.1023/A:1007614523901
  • 27. Schölkopf, B., Smola, A., Williamson, R., & Bartlett, P. (1998). New support vector algorithms. Tech. rep. NC2-TR-1998-053, NeuroColt2.
  • 28. Vladimir N. Vapnik, The nature of statistical learning theory, Springer-Verlag New York, Inc., New York, NY, 1995.


Author: Robert E. Schapire, Yoram Singer, Erin L. Allwein
Title: Reducing Multiclass to Binary: a unifying approach for margin classifiers
Journal: The Journal of Machine Learning Research
URL: http://www.ai.mit.edu/projects/jmlr/papers/volume1/allwein00a/allwein00a0.pdf
DOI: 10.1162/15324430152733133
Year: 2001