2001 ReducingMulticlassToBinary


Subject Headings: Multiclass Classification Task.

Notes

Cited By

~792 citations (Google Scholar): http://scholar.google.com/scholar?cites=1569736950810018871

Quotes

Abstract

  • We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a margin-based binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class is compared against all others, or in which all pairs of classes are compared to each other, or in which output codes with error-correcting properties are used. We propose a general method for combining the classifiers generated on the binary problems, and we prove a general empirical multiclass loss bound given the empirical loss of the individual binary learning algorithms. The scheme and the corresponding bounds apply to many popular classification learning algorithms including support-vector machines, AdaBoost, regression, logistic regression and decision-tree algorithms. We also give a multiclass generalization error analysis for general output codes with AdaBoost as the binary learner. Experimental results with SVM and AdaBoost show that our scheme provides a viable alternative to the most commonly used multiclass algorithms.
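
The reduction described in the abstract can be sketched concretely. The following Python fragment is a minimal editorial illustration, not the authors' implementation: a ternary coding matrix M over {-1, 0, +1} turns a k-class problem into one binary problem per column, a margin-based learner is fit to each column, and loss-based decoding predicts the class whose row of M incurs the smallest total loss on the learners' margins. The use of scikit-learn's LogisticRegression as the base learner and of the exponential loss are illustrative assumptions; the framework admits any margin-based binary learner and any convex loss.

    # Illustrative sketch of output-code reduction with loss-based decoding
    # (assumed base learner and loss; not code from the paper).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def one_vs_all_matrix(k):
        # k x k coding matrix: column r labels class r as +1, all others as -1.
        return (2 * np.eye(k) - 1).astype(int)

    def train_output_code(X, y, M):
        # Fit one binary learner per column of M; examples whose class has a
        # 0 entry in that column are omitted (the ternary-code convention).
        learners = []
        for r in range(M.shape[1]):
            col = M[y, r]              # relabel each example by its class's code bit
            mask = col != 0
            learners.append(LogisticRegression().fit(X[mask], col[mask]))
        return learners

    def predict_loss_based(X, M, learners, loss=lambda z: np.exp(-z)):
        # Loss-based decoding: choose the class c minimizing
        # sum over columns r of L(M[c, r] * f_r(x)), where f_r is column r's margin.
        F = np.column_stack([clf.decision_function(X) for clf in learners])  # (n, ell)
        total = loss(M[None, :, :] * F[:, None, :]).sum(axis=2)              # (n, k)
        return total.argmin(axis=1)

    # Toy usage with assumed shapes: X is (n, d) floats, y is (n,) ints in [0, k).
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 5)), rng.integers(0, 4, size=200)
    M = one_vs_all_matrix(4)
    preds = predict_loss_based(X, M, train_output_code(X, y, M))

With one_vs_all_matrix this recovers the one-against-all reduction; an all-pairs or error-correcting coding matrix slots into the same two functions, and replacing the loss with z -> (1 - sign(z))/2 yields Hamming decoding.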

References

  • 1. Bartlett, P. L. (1998). The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2), 525-536.
  • 2. Breiman, L. (1997a). Arcing the edge. Tech. rep. 486, Statistics Department, University of California at Berkeley.
  • 3. Breiman, L. (1997b). Prediction games and arcing classifiers. Tech. rep. 504, Statistics Department, University of California at Berkeley.
  • 4. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth & Brooks.
  • 5. Michael Collins, Robert E. Schapire, Yoram Singer, Logistic Regression, AdaBoost and Bregman Distances, Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, p.158-169, June 28-July 01, 2000
  • 6. Corinna Cortes, Vladimir Vapnik, Support-Vector Networks, Machine Learning, v.20 n.3, p.273-297, Sept. 1995 doi:10.1023/A:1022627411411
  • 7. Koby Crammer, Yoram Singer, On the Learnability and Design of Output Codes for Multiclass Problems, Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, p.35-46, June 28-July 01, 2000
  • 8. Csiszár, I., & Tusnády, G. (1984). Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1, 205-237.
  • 9. Stephen Della Pietra, Vincent Della Pietra, John Lafferty, Inducing Features of Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.19 n.4, p.380-393, April 1997 doi:10.1109/34.588021
  • 10. Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263-286.
  • 11. Yoav Freund, An adaptive version of the boost by majority algorithm, Proceedings of the twelfth annual conference on Computational learning theory, p.102-113, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307419
  • 12. Yoav Freund, Robert E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, v.55 n.1, p.119-139, Aug. 1997 doi:10.1006/jcss.1997.1504
  • 13. Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 28(2), 337-407.
  • 14. Venkatesan Guruswami, Amit Sahai, Multiclass learning, boosting, and error-correcting codes, Proceedings of the twelfth annual conference on Computational learning theory, p.145-155, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307429
  • 15. Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. The Annals of Statistics, 26(2), 451-471.
  • 16. Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13-30.
  • 17. Klaus-Uwe Höffgen, Hans Ulrich Simon, Robust trainability of single neurons, Proceedings of the fifth annual workshop on Computational learning theory, p.428-439, July 27-29, 1992, Pittsburgh, Pennsylvania, United States doi:10.1145/130385.130431
  • 18. Michael Kearns, Yishay Mansour, On the boosting ability of top-down decision tree learning algorithms, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.459-468, May 22-24, 1996, Philadelphia, Pennsylvania, United States doi:10.1145/237814.237994
  • 19. John Lafferty, Additive models, boosting, and inference for generalized divergences, Proceedings of the twelfth annual conference on Computational learning theory, p.125-133, July 07-09, 1999, Santa Cruz, California, United States doi:10.1145/307400.307422
  • 20. Mason, L., Baxter, J., Bartlett, P., & Frean, M. (1999). Functional gradient techniques for combining hypotheses. In Smola, A. J., Bartlett, P. J., Schölkopf, B., & Schuurmans, D. (Eds.), Advances in Large Margin Classifiers. MIT Press.
  • 21. Merz, C. J., & Murphy, P. M. (1998). UCI repository of machine learning databases. www.ics.uci.edu/~mlearn/MLRepository.html.
  • 22. J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1993
  • 23. G. Rätsch, T. Onoda, K.-R. Müller, Soft Margins for AdaBoost, Machine Learning, v.42 n.3, p.287-320, March 2001 doi:10.1023/A:1007618119488
  • 24. D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, MIT Press, Cambridge, MA, 1986
  • 25. Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651-1686.
  • 26. Robert E. Schapire, Yoram Singer, Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning, v.37 n.3, p.297-336, Dec. 1999 doi:10.1023/A:1007614523901
  • 27. Schölkopf, B., Smola, A., Williamson, R., & Bartlett, P. (1998). New support vector algorithms. Tech. rep. NC2-TR-1998-053, NeuroColt2.
  • 28. Vladimir N. Vapnik, The nature of statistical learning theory, Springer-Verlag New York, Inc., New York, NY, 1995.


Author: Robert E. Schapire, Yoram Singer, Erin L. Allwein
Title: Reducing Multiclass to Binary: a unifying approach for margin classifiers
Journal: The Journal of Machine Learning Research
URL: http://www.ai.mit.edu/projects/jmlr/papers/volume1/allwein00a/allwein00a0.pdf
DOI: 10.1162/15324430152733133
Year: 2001