# Discriminative Classification Algorithm

A Discriminative Classification Algorithm is a supervised classification algorithm that directly models the conditional distribution [math]\displaystyle{ p(y \vert x) }[/math] rather than the joint distribution [math]\displaystyle{ p(x,y) }[/math].

**Context:**
- It can be applied by a Discriminative Classification System.
- It can range from being a Fully-Supervised Discriminative Classification Algorithm to being a Semi-Supervised Discriminative Classification Algorithm.
- …

**Example(s):**
- a Logistic Regression Algorithm.
- …

**Counter-Example(s):**
- a Generative Classification Algorithm, such as a Naive Bayes Algorithm or a Linear Discriminant Analysis Algorithm.

**See:** Generative Classification Algorithm.

## References

### 2006

- (Kumar & Hebert, 2006) ⇒ Sanjiv Kumar, and Martial Hebert. (2006). “Discriminative Random Fields.” In: International Journal of Computer Vision, 68(2). doi:10.1007/s11263-006-7007-9

### 2004

- (Bouchard & Triggs, 2004) ⇒ Guillaume Bouchard, and Bill Triggs. (2004). “The Trade-off Between Generative and Discriminative Classifiers.” In: Proceedings of COMPSTAT 2004.
- QUOTE: In supervised classification, inputs [math]\displaystyle{ x }[/math] and their labels [math]\displaystyle{ y }[/math] arise from an unknown joint probability [math]\displaystyle{ p(x,y) }[/math]. If we can approximate [math]\displaystyle{ p(x,y) }[/math] using a parametric family of models [math]\displaystyle{ G = \{p_\theta(x,y), \theta \in \Theta\} }[/math], then a natural classifier is obtained by first estimating the class-conditional densities, then classifying each new data point to the class with highest posterior probability. This approach is called *generative* classification. However, if the overall goal is to find the classification rule with the smallest error rate, this depends only on the conditional density [math]\displaystyle{ p(y \vert x) }[/math].

 *Discriminative* methods directly model the conditional distribution, without assuming anything about the input distribution [math]\displaystyle{ p(x) }[/math]. Well-known generative-discriminative pairs include Linear Discriminant Analysis (LDA) vs. linear logistic regression and naive Bayes vs. Generalized Additive Models (GAM). Many authors have already studied these models, e.g. [5,6]. Under the assumption that the underlying distributions are Gaussian with equal covariances, it is known that LDA requires less data than its discriminative counterpart, linear logistic regression [3]. More generally, it is known that generative classifiers have a smaller variance than discriminative ones. Conversely, the generative approach converges to the best model for the joint distribution [math]\displaystyle{ p(x,y) }[/math], but the resulting conditional density is usually a biased classifier unless its [math]\displaystyle{ p_\theta(x) }[/math] part is an accurate model for [math]\displaystyle{ p(x) }[/math]. In real-world problems the assumed generative model is rarely exact, and asymptotically, a discriminative classifier should typically be preferred [9, 5]. The key argument is that the discriminative estimator converges to the conditional density that minimizes the negative log-likelihood classification loss against the true density [math]\displaystyle{ p(x,y) }[/math] [2]. For finite sample sizes, there is a bias-variance tradeoff and it is less obvious how to choose between generative and discriminative classifiers.
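The generative-discriminative pair named in the quote (LDA vs. linear logistic regression) can be sketched in one dimension: the generative side fits Gaussian class-conditional densities and classifies by highest posterior, while the discriminative side fits [math]\displaystyle{ p(y \vert x) }[/math] directly. This is a minimal illustrative sketch on synthetic data, not code from the cited paper; the equal-variance Gaussians and all parameter values are assumptions made for the example.

```python
import math
import random

random.seed(0)

# Synthetic 1-D data: two Gaussian classes with equal variance
# (exactly the assumption under which LDA is well-specified).
n = 500
x0 = [random.gauss(-1.0, 1.0) for _ in range(n)]  # class y = 0
x1 = [random.gauss(+1.0, 1.0) for _ in range(n)]  # class y = 1
xs = x0 + x1
ys = [0] * n + [1] * n

# --- Generative classifier: model p(x|y) and pick the highest posterior ---
mu0 = sum(x0) / n
mu1 = sum(x1) / n
# Pooled (shared) variance estimate, as in LDA.
var = (sum((x - mu0) ** 2 for x in x0)
       + sum((x - mu1) ** 2 for x in x1)) / (2 * n)
# With equal priors and equal variances, the posterior decision
# boundary is the midpoint of the two class means.
gen_boundary = (mu0 + mu1) / 2.0

# --- Discriminative classifier: logistic regression on p(y|x) ---
# Fit by batch gradient ascent on the conditional log-likelihood.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        gw += (y - p) * x
        gb += (y - p)
    w += lr * gw / len(xs)
    b += lr * gb / len(xs)
# Logistic regression's boundary is where p(y=1|x) = 0.5.
disc_boundary = -b / w

# On data matching the generative assumptions, both boundaries
# land near 0, halfway between the class means.
```

Both models agree here because the generative assumptions hold; the quote's point is that when they do not hold, the generative boundary stays biased while the discriminative one still converges to the best conditional model.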
