2008 BoostingSupVectMachforImbDataSets

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Abstract

Real world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. Then we use a boosting algorithm to get an ensemble classifier that has lower error than a single classifier. We found that this ensemble of SVMs makes an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.
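The combination described in the abstract, a soft-margin SVM used as the base learner inside a boosting loop on an imbalanced data set, can be sketched in a few lines. The snippet below is a minimal illustration of that general idea, not the paper's exact algorithm; it assumes a recent scikit-learn, and the synthetic data, class weighting, and all parameter values are illustrative choices rather than values from the paper.

<pre>
# Minimal sketch (not the paper's exact algorithm): boosting soft-margin SVMs
# on an imbalanced data set, assuming a recent scikit-learn (>= 1.2).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical imbalanced data set: roughly 5% minority class.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Soft-margin SVM as the base learner; C controls the margin/error trade-off,
# and class_weight="balanced" raises the misclassification cost of the
# minority class (an illustrative choice, in the spirit of cost-sensitive
# soft-margin SVMs).
base_svm = SVC(kernel="rbf", C=1.0, class_weight="balanced")

# Discrete SAMME boosting only needs hard predictions from the base learner,
# so SVC can be used directly without probability calibration.
ensemble = AdaBoostClassifier(estimator=base_svm, n_estimators=10,
                              algorithm="SAMME")

ensemble.fit(X_train, y_train)
print(classification_report(y_test, ensemble.predict(X_test)))
</pre>

The per-class precision and recall printed at the end are one way to check whether the ensemble improves on the minority class rather than only on overall accuracy.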

References

  • Akbani, R., Kwek, S., & Japkowicz, N. (2004). Applying support vector machines to imbalanced datasets. Proceedings of the 2004 European Conference on Machine Learning (ECML’2004).
  • Amari, S., & Wu, S. (1999). Improving support vector machine classifiers by modifying kernel functions. Neural Networks, 12, 783-789.
  • Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, W. P. (2000). SMOTE: Synthetic minority over-sampling technique. International Conference on Knowledge Based Computer Systems.
  • Chawla, N., Lazarevic, A., Hall, L., & Bowyer, K. (2003). SMOTEBoost: Improving prediction of the minority class in boosting. 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 107-119.
  • Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn unbalanced data. Technical Report 666, Statistics Department, University of California at Berkeley.
  • Fan, W., Stolfo, S., Zhang, J., & Chan, P. (1999). AdaCost: Misclassification cost-sensitive boosting. Proceedings of the 16th International Conference on Machine Learning, Slovenia.
  • Guo, H., & Viktor, H. L. (2004). Learning from imbalanced data sets with boosting and data generation: The DataBoost-IM approach. ACM SIGKDD Explorations, 6(1), 30-39.
  • Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429-450.
  • Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. Proceedings of the Fourteenth International Conference on Machine Learning, 179-186.
  • Morik, K., Brockhausen, P., & Joachims, T. (1999). Combining statistical learning with a knowledge-based approach - A case study in intensive care monitoring. ICML, 268-277.
  • Shawe-Taylor, J., & Cristianini, N. (1999). Further results on the margin distribution. Proceedings of the 12th Conference on Computational Learning Theory.
  • Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
  • Veropoulos, K., Campbell, C., & Cristianini, N. (1999). Controlling the sensitivity of support vector machines. Proceedings of the International Joint Conference on Artificial Intelligence, 55-60.
  • Wu, G., & Chang, E. (2003). Adaptive feature-space conformal transformation for imbalanced data learning. Proceedings of the 20th International Conference on Machine Learning.


Author: Benjamin X. Wang, Nathalie Japkowicz
Title: Boosting Support Vector Machines for Imbalanced Data Sets
Year: 2008
URL: http://www.site.uottawa.ca/~nat/Papers/29-Wang.pdf