Linear Kernel-based Support Vector Machine Algorithm


A Linear Kernel-based Support Vector Machine Algorithm is an SVM training algorithm that can be implemented by a linear SVM training system to solve a linear SVM training task (to produce a linear SVM based on a linear kernel).
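As a minimal sketch of what this definition amounts to in code (the dataset, parameter values, and estimator choices below are illustrative assumptions, not part of this page): training a kernel SVM restricted to the linear kernel k(x, x') = x·x' and training a dedicated linear SVM both yield a linear decision boundary of the form w·x + b = 0.

import numpy as np
from sklearn.svm import SVC, LinearSVC

# Illustrative, linearly structured data (assumed for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) > 0).astype(int)

# A kernel SVM solver restricted to the linear kernel k(x, x') = x . x'.
kernel_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# A dedicated linear SVM solver that works directly in the primal.
linear_svm = LinearSVC(C=1.0, dual=False).fit(X, y)

# Both learn a linear decision function w . x + b (the coefficients may differ
# slightly because the two solvers optimize slightly different objectives).
print(kernel_svm.coef_, kernel_svm.intercept_)
print(linear_svm.coef_, linear_svm.intercept_)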



References

2016

  • https://www.quora.com/Why-is-kernelized-SVM-much-slower-than-linear-SVM
    • QUOTE: Basically, a kernel-based SVM requires on the order of n^2 computation for training and on the order of n·d computation for classification, where n is the number of training examples and d the input dimension (and assuming that the number of support vectors ends up being a fraction of n, which is shown to be expected in theory and in practice). Instead, a 2-class linear SVM requires on the order of n·d computation for training (times the number of training iterations, which remains small even for large n) and on the order of d computations for classification.
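As a rough illustration of the quoted complexity claim, the sketch below (an assumption-laden example using scikit-learn, not taken from the quoted answer) times a dedicated linear SVM solver against a kernelized one on the same data; the gap widens quickly as n grows.

import time
import numpy as np
from sklearn.svm import SVC, LinearSVC

# Synthetic data with n examples in d dimensions (sizes chosen for illustration only).
rng = np.random.default_rng(0)
n, d = 5000, 50
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)

for name, clf in [("linear SVM (roughly n*d per pass)", LinearSVC(dual=False)),
                  ("RBF-kernel SVM (roughly n^2 training)", SVC(kernel="rbf"))]:
    start = time.time()
    clf.fit(X, y)
    print(f"{name}: {time.time() - start:.2f} s")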

2014

  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM Retrieved:2014-11-28.
    • Given some training data [math]\displaystyle{ \mathcal{D} }[/math], a set of n points of the form
      : [math]\displaystyle{ \mathcal{D} = \left\{ (\mathbf{x}_i, y_i)\mid\mathbf{x}_i \in \mathbb{R}^p,\, y_i \in \{-1,1\}\right\}_{i=1}^n }[/math]
      where each [math]\displaystyle{ y_i }[/math] is either 1 or −1, indicating the class to which the point [math]\displaystyle{ \mathbf{x}_i }[/math] belongs. Each [math]\displaystyle{ \mathbf{x}_i }[/math] is a p-dimensional real vector. We want to find the maximum-margin hyperplane that divides the points having [math]\displaystyle{ y_i=1 }[/math] from those having [math]\displaystyle{ y_i=-1 }[/math]. Any hyperplane can be written as the set of points [math]\displaystyle{ \mathbf{x} }[/math] satisfying
      : [math]\displaystyle{ \mathbf{w}\cdot\mathbf{x} - b=0,\, }[/math]
      where [math]\displaystyle{ \cdot }[/math] denotes the dot product and [math]\displaystyle{ {\mathbf{w}} }[/math] the (not necessarily normalized) normal vector to the hyperplane. The parameter [math]\displaystyle{ \tfrac{b}{\|\mathbf{w}\|} }[/math] determines the offset of the hyperplane from the origin along the normal vector [math]\displaystyle{ {\mathbf{w}} }[/math]. If the training data are linearly separable, we can select two hyperplanes in a way that they separate the data and there are no points between them, and then try to maximize their distance. The region bounded by them is called "the margin". These hyperplanes can be described by the equations
      : [math]\displaystyle{ \mathbf{w}\cdot\mathbf{x} - b=1\, }[/math]
      and
      : [math]\displaystyle{ \mathbf{w}\cdot\mathbf{x} - b=-1.\, }[/math]

      By using geometry, we find the distance between these two hyperplanes is [math]\displaystyle{ \tfrac{2}{\|\mathbf{w}\|} }[/math], so we want to minimize [math]\displaystyle{ \|\mathbf{w}\| }[/math]. As we also have to prevent data points from falling into the margin, we add the following constraint: for each [math]\displaystyle{ i }[/math], either
      : [math]\displaystyle{ \mathbf{w}\cdot\mathbf{x}_i - b \ge 1\qquad\text{ for }\mathbf{x}_i }[/math] of the first class, or
      : [math]\displaystyle{ \mathbf{w}\cdot\mathbf{x}_i - b \le -1\qquad\text{ for }\mathbf{x}_i }[/math] of the second.

      This can be rewritten as:  : [math]\displaystyle{ y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1, \quad \text{ for all } 1 \le i \le n.\qquad\qquad(1) }[/math]

       We can put this together to get the optimization problem:

      Minimize (in [math]\displaystyle{ {\mathbf{w},b} }[/math])  : [math]\displaystyle{ \|\mathbf{w}\| }[/math]

      subject to (for any [math]\displaystyle{ i = 1, \dots, n }[/math])  : [math]\displaystyle{ y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1. \, }[/math]
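The primal problem above can be handed directly to a general-purpose constrained optimizer. The sketch below is a minimal illustration only (the toy dataset and the SciPy solver choice are assumptions, not part of the quoted text); it minimizes ½‖w‖², which is equivalent to minimizing ‖w‖, subject to constraint (1).

import numpy as np
from scipy.optimize import minimize

# Linearly separable toy data in R^2 with labels in {-1, +1} (assumed for this sketch).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -0.5], [-1.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
n, p = X.shape

# The optimization variable theta packs (w_1, ..., w_p, b).
def objective(theta):
    w = theta[:p]
    return 0.5 * np.dot(w, w)  # minimize 1/2 ||w||^2, equivalent to minimizing ||w||

# Constraint (1): y_i (w . x_i - b) - 1 >= 0 for each training point.
constraints = [{"type": "ineq",
                "fun": lambda theta, i=i: y[i] * (np.dot(theta[:p], X[i]) - theta[p]) - 1.0}
               for i in range(n)]

result = minimize(objective, x0=np.zeros(p + 1), constraints=constraints, method="SLSQP")
w, b = result.x[:p], result.x[p]
print("w =", w, "b =", b)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("predictions:", np.sign(X @ w - b))  # should reproduce y for separable data

In practice, dedicated solvers (SMO on the dual, or coordinate/subgradient methods on the primal) are used instead of a generic optimizer; the sketch is only meant to mirror the formulation above.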
