# Supervised Linear Model-based Classification Algorithm


A Supervised Linear Model-based Classification Algorithm is a supervised model-based classification algorithm that fits a linear classification function.

**Context:**
- It can be applied by a Linear Classification System (to solve a linear classification task).
- It can range from being a Discriminative Linear Classification Algorithm to being a Generative Linear Classification Algorithm.

**Example(s):**
- a Linear SVM Algorithm, which maximizes the margin of a linear decision boundary (so that the distance from the boundary to the closest training examples is as large as possible).
- a Logistic Regression Algorithm, which minimizes the logistic loss (the negative log-likelihood of the training labels).
- a Perceptron Algorithm.
- …
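As a concrete illustration of the last example, here is a minimal perceptron sketch (an illustrative toy, not a reference implementation): it learns a linear decision rule [math]\vec{w}\cdot\vec{x} + b[/math] on hypothetical 2-D data, nudging the hyperplane toward each misclassified point.

```python
# Minimal perceptron sketch on toy 2-D data (illustrative only).
# Labels are +1 / -1; the data below is made up for the example.

def perceptron_train(samples, labels, epochs=20, lr=1.0):
    """samples: list of feature tuples; labels: +1 or -1."""
    d = len(samples[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score <= 0:  # misclassified: move the hyperplane toward x
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# Linearly separable toy data: class +1 lies above the line x1 + x2 = 1.
X = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (0.9, 1.2)]
y = [-1, 1, -1, 1]
w, b = perceptron_train(X, y)
print([predict(w, b, x) for x in X])  # → [-1, 1, -1, 1]
```

On linearly separable data like this, the perceptron update rule is guaranteed to converge to a separating hyperplane; the linear SVM differs in that it picks, among all such hyperplanes, the one with maximum margin.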

**Counter-Example(s):**

**See:** Fisher's Linear Discriminant, Linear Combination.

## References

### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/linear_classifier Retrieved:2015-7-8.
- QUOTE: In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A **linear classifier** achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.


- http://spark.apache.org/docs/latest/mllib-linear-methods.html
- QUOTE: Many standard machine learning methods can be formulated as a convex optimization problem, i.e. the task of finding a minimizer of a convex function [math]f[/math] that depends on a variable vector [math]w[/math] (called weights in the code), which has [math]d[/math] entries. Formally, we can write this as the optimization problem [math]\min_{w \in \mathbb{R}^d} f(w)[/math], where the objective function is of the form: [math]f(w) := \lambda R(w) + \frac{1}{n} \sum_{i=1}^n L(w; x_i, y_i). \qquad (1)[/math] Here the vectors [math]x_i \in \mathbb{R}^d[/math] are the training data examples, for [math]1 \le i \le n[/math], and [math]y_i \in \mathbb{R}[/math] are their corresponding labels, which we want to predict. We call the method linear if [math]L(w; x, y)[/math] can be expressed as a function of [math]w^T x[/math] and [math]y[/math]. Several of MLlib's classification and regression algorithms fall into this category, and are discussed here.
The objective function [math]f[/math] has two parts: the regularizer that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function [math]L(w; \cdot)[/math] is typically a convex function in [math]w[/math]. The fixed regularization parameter [math]\lambda \ge 0[/math] (regParam in the code) defines the trade-off between the two goals of minimizing the loss (i.e., training error) and minimizing model complexity (i.e., to avoid overfitting).
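The objective in equation (1) can be sketched directly in code. The snippet below is an illustrative evaluation of [math]f(w) = \lambda R(w) + \frac{1}{n}\sum_i L(w; x_i, y_i)[/math] using an L2 regularizer and the logistic loss; the function names and toy data are this example's own, not MLlib API.

```python
import math

# Sketch of the regularized objective f(w) = lambda*R(w) + (1/n) * sum_i L(w; x_i, y_i)
# with an L2 regularizer and the logistic loss (illustrative, not MLlib API).

def l2_regularizer(w):
    return 0.5 * sum(wj * wj for wj in w)

def logistic_loss(w, x, y):
    # y in {+1, -1}; the loss depends on w only through the margin y * (w . x),
    # which is exactly what makes the method "linear" in the sense above.
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    return math.log(1.0 + math.exp(-margin))

def objective(w, data, labels, reg_param):
    n = len(data)
    avg_loss = sum(logistic_loss(w, x, y) for x, y in zip(data, labels)) / n
    return reg_param * l2_regularizer(w) + avg_loss

# Toy data: at w = 0 every margin is 0, so the average loss is log(2).
X = [(1.0, 2.0), (-1.0, -0.5)]
y = [1, -1]
print(objective([0.0, 0.0], X, y, reg_param=0.1))  # → log(2) ≈ 0.693
```

A training algorithm (e.g. gradient descent) would minimize this scalar over [math]w[/math]; swapping `logistic_loss` for the hinge loss would give the linear SVM objective with the same regularizer.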


### 2011

- http://en.wikipedia.org/wiki/Linear_classifier
- … If the input feature vector to the classifier is a real vector [math]\vec x[/math], then the output score is [math]y = f(\vec{w}\cdot\vec{x}) = f\left(\sum_j w_j x_j\right),[/math] where [math]\vec w[/math] is a real vector of weights and [math]f[/math] is a function that converts the dot product of the two vectors into the desired output. (In other words, [math]\vec{w}[/math] is a one-form or linear functional mapping [math]\vec x[/math] onto **R**.) The weight vector [math]\vec w[/math] is learned from a set of labeled training samples. Often [math]f[/math] is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex [math]f[/math] might give the probability that an item belongs to a certain class.
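The two choices of [math]f[/math] described in the quote can be sketched side by side: the same linear score [math]\vec{w}\cdot\vec{x}[/math] can feed a hard threshold (class decision) or a logistic sigmoid (class probability). The weights, threshold, and class names below are made-up illustrative values.

```python
import math

# Sketch of the scoring rule y = f(w . x) with two choices of f
# (illustrative values; not taken from any particular library).

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def threshold_classifier(w, x, theta=0.0):
    # Simple f: scores above the threshold go to the first class,
    # all other values to the second class.
    return "class_1" if dot(w, x) > theta else "class_2"

def probabilistic_classifier(w, x):
    # More complex f: the logistic sigmoid turns the score into an
    # estimated probability that x belongs to the first class.
    return 1.0 / (1.0 + math.exp(-dot(w, x)))

w = [0.8, -0.3]
print(threshold_classifier(w, [2.0, 1.0]))  # score 1.3 > 0 → "class_1"
print(probabilistic_classifier(w, [2.0, 1.0]))  # sigmoid(1.3), between 0 and 1
```

Note that both classifiers share the same learned [math]\vec w[/math]; only the post-processing of the dot product differs.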


### 2009

- (Hastie et al., 2009) ⇒ Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. (2009). “The Elements of Statistical Learning: Data Mining, Inference, and Prediction; 2nd edition.” Springer-Verlag. ISBN:0387848576

### 2006

- (Bishop, 2006) ⇒ Christopher M. Bishop. (2006). “Pattern Recognition and Machine Learning.” Springer, Information Science and Statistics.

### 2004

- (Bouchard & Triggs, 2004) ⇒ Guillaume Bouchard, and Bill Triggs. (2004). “The Trade-off Between Generative and Discriminative Classifiers.” In: Proceedings of COMPSTAT 2004.

### 1995

- (Li, 1995) ⇒ Stan Z. Li. (1995). “Markov Random Field Modeling in Computer Vision.” Springer-Verlag. ISBN:4431701451
- 7.2.3 Linear Classification Function http://www.cbsr.ia.ac.cn/users/szli/mrf_book/Chapter_7/node120.html