1995 SupportVectorNetworks


Subject Headings: Support Vector Machine Classifier, Kernel Function, Radial Basis Function, Machine Learning Algorithm

Notes

Cited By

  • ~6,719 …


Quotes

Abstract

The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data.

High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

1. Introduction

More than 60 years ago R. A. Fisher (Fisher, 1936) suggested the first algorithm for pattern recognition. He considered a model of two normally distributed populations, [math]\displaystyle{ N(m_1, \Sigma_1) }[/math] and [math]\displaystyle{ N(m_2, \Sigma_2) }[/math], of [math]\displaystyle{ n }[/math]-dimensional vectors [math]\displaystyle{ \mathbf{x} }[/math] with mean vectors [math]\displaystyle{ m_1 }[/math] and [math]\displaystyle{ m_2 }[/math] and co-variance matrices [math]\displaystyle{ \Sigma_1 }[/math] and [math]\displaystyle{ \Sigma_2 }[/math], and showed that the optimal (Bayesian) solution is a quadratic decision function: …
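For reference, a sketch of that quadratic decision rule in its standard form (assuming equal class priors; the notation [math]\displaystyle{ F_{sq} }[/math] and the sign convention are illustrative, not quoted from the paper):

[math]\displaystyle{ F_{sq}(\mathbf{x}) = \operatorname{sign}\left[ (\mathbf{x}-m_2)^{\top}\Sigma_2^{-1}(\mathbf{x}-m_2) - (\mathbf{x}-m_1)^{\top}\Sigma_1^{-1}(\mathbf{x}-m_1) + \ln\frac{|\Sigma_2|}{|\Sigma_1|} \right] }[/math]

where a positive sign assigns [math]\displaystyle{ \mathbf{x} }[/math] to the first population.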

However, even if the optimal hyperplane generalizes well, the technical problem of how to treat the high-dimensional feature space remains. In 1992 it was shown (Boser, Guyon, & Vapnik, 1992) that the order of operations for constructing a decision function can be interchanged: instead of making a non-linear transformation of the input vectors followed by dot-products with support vectors in feature space, one can first compare two vectors in input space (e.g. by taking their dot-product or some distance measure), and then make a non-linear transformation of the value of the result (see Fig. 4). This enables the construction of rich classes of decision surfaces, for example polynomial decision surfaces of arbitrary degree. We will call this type of learning machine a support-vector network.
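As an illustration of this order-of-operations swap, the sketch below (a hedged example, not code from the paper) uses a degree-2 polynomial kernel on two-dimensional inputs: the dot product of explicitly mapped feature vectors equals the kernel evaluated directly in input space, and a support-vector decision function only needs such kernel evaluations. The feature map phi, the kernel, and the alphas/labels/bias in the decision function are illustrative assumptions, not the paper's trained values.

    # A minimal sketch of the kernel trick described above (illustrative, not the paper's code).
    import numpy as np

    def phi(u):
        """Explicit degree-2 polynomial feature map for a 2-D vector u,
        chosen so that phi(u) . phi(v) == (u . v + 1)**2."""
        u1, u2 = u
        s = np.sqrt(2.0)
        return np.array([1.0, s * u1, s * u2, u1 ** 2, u2 ** 2, s * u1 * u2])

    def poly_kernel(u, v, degree=2):
        """Compare two vectors in input space first (dot product), then apply
        the non-linear transformation to the scalar result."""
        return (np.dot(u, v) + 1.0) ** degree

    u = np.array([0.3, -1.2])
    v = np.array([1.5, 0.7])

    # The two routes agree: dot product in feature space == kernel in input space.
    assert np.isclose(np.dot(phi(u), phi(v)), poly_kernel(u, v))

    def decision(x, support_vectors, alphas, labels, b, kernel=poly_kernel):
        """Support-vector decision function f(x) = sign(sum_i alpha_i * y_i * K(s_i, x) + b).
        The alphas, labels, and bias passed in below are hypothetical, not trained values."""
        total = sum(a * y * kernel(s, x) for a, y, s in zip(alphas, labels, support_vectors))
        return np.sign(total + b)

    print(decision(u, [u, v], alphas=[0.5, 0.5], labels=[+1, -1], b=0.1))

Polynomial kernels of other degrees, radial basis functions, and similar kernels follow the same pattern, which is why the feature space never has to be represented explicitly.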

References

  • Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837.
  • Anderson, T.W., & Bahadur, R.R. (1966). Classification into two multivariate normal distributions with different covariance matrices. Ann. Math. Stat., 33:420-431.
  • B.E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. (1992). “A Training Algorithm for Optimal Margin Classifiers.” In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 5.
  • Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., LeCun, Y., Sackinger, E., Simard, P., Vapnik, V.N., & Müller, U.A. (1994). “Comparison of classifier methods: A case study in handwritten digit recognition.” In: Proceedings of the 12th International Conference on Pattern Recognition and Neural Networks.
  • Bromley, J., & Sackinger, E. (1991). Neural-network and k-nearest-neighbor classifiers. Technical Report 11359-910819-16TM, AT&T.
  • Courant, R., & Hilbert, D. (1953). Methods of Mathematical Physics, Interscience, New York.
  • Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7:111-132.
  • LeCun, Y. (1985). Une procédure d'apprentissage pour réseau à seuil asymétrique [A learning procedure for an asymmetric threshold network]. Cognitiva 85: À la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences, 599-604, Paris.
  • LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., & Jackel, L.D. (1990). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 2, 396-404, Morgan Kaufman.
  • Parker, D.B. (1985). Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA.
  • Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan Books, New York.
  • Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323:533-536.
  • David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. (1986). “Learning Internal Representations by Error Propagation.” In: David E. Rumelhart and James L. McClelland (editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press. ISBN:026268053X
  • Vladimir N. Vapnik. (1982). Estimation of Dependences Based on Empirical Data, Addendum 1, New York: Springer-Verlag.


Author: Vladimir N. Vapnik, Corinna Cortes
Title: Support Vector Networks
Journal: Machine Learning (ML) Subject Area
URL: http://cns.bu.edu/~ccwong/Literature/61.pdf
DOI: 10.1007/BF00994018
Year: 1995