# Generative-Discriminative Relation

A **Generative-Discriminative Relation** is a model relation between a generative model family and a discriminative model family in which a model from one family can be directly transformed into a corresponding model from the other.
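The transformation in one direction can be made concrete via Bayes' rule: a generative model of the joint distribution induces exactly the conditional distribution that its discriminative counterpart models directly.

```latex
p(y \mid x) \;=\; \frac{p(x, y)}{p(x)}
           \;=\; \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
```

The reverse direction is not determined: a discriminative model of [math]p(y \vert x)[/math] says nothing about [math]p(x)[/math], which is why the quotes below treat the two approaches as trading off differently.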

**AKA:** Generative-Discriminative Pair.

**Example(s):** the naive Bayes vs. logistic regression pair; the Linear Discriminant Analysis (LDA) vs. linear logistic regression pair.

**See:** Conditional Probability, Posterior Probability.

## References

### 2007

- (Sutton & McCallum, 2007) ⇒ Charles Sutton, and Andrew McCallum. (2007). “An Introduction to Conditional Random Fields for Relational Learning.” In: (Getoor & Taskar, 2007).
- QUOTE: An important difference between naive Bayes and logistic regression is that naive Bayes is *generative*, meaning that it is based on a model of the joint distribution [math]p(y, \mathbf{x})[/math], while logistic regression is *discriminative*, meaning that it is based on a model of the conditional distribution [math]p(y \vert \mathbf{x})[/math]. In this section, we discuss the differences between generative and discriminative modeling, and the advantages of discriminative modeling for many tasks. For concreteness, we focus on the examples of naive Bayes and logistic regression, but the discussion in this section actually applies in general to the differences between generative models and conditional random fields.

    The main difference is that a conditional distribution [math]p(\mathbf{y} \vert \mathbf{x})[/math] does not include a model of [math]p(\mathbf{x})[/math], which is not needed for classification anyway. The difficulty in modeling [math]p(\mathbf{x})[/math] is that it often contains many highly dependent features, which are difficult to model. For example, in named-entity recognition, an HMM relies on only one feature, the word's identity. But many words, especially proper names, will not have occurred in the training set, so the word-identity feature is uninformative. To label unseen words, we would like to exploit other features of a word, such as its capitalization, its neighboring words, its prefixes and suffixes, its membership in predetermined lists of people and locations, and so on.
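The naive Bayes / logistic regression pair discussed in the quote can be sketched concretely. The following is a minimal illustration, assuming scikit-learn and a synthetic two-Gaussian dataset (the data and variable names are my own, not from the source): a generative classifier models [math]p(\mathbf{x} \vert y)[/math] and [math]p(y)[/math], while its discriminative counterpart models [math]p(y \vert \mathbf{x})[/math] directly.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, well-separated two-class data (illustrative assumption).
X0 = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))
X1 = rng.normal(loc=+2.0, scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Generative: fits class-conditional densities p(x|y) and priors p(y),
# then classifies via the posterior p(y|x) obtained by Bayes' rule.
gen = GaussianNB().fit(X, y)

# Discriminative: fits p(y|x) directly; never models p(x).
disc = LogisticRegression().fit(X, y)

acc_gen = gen.score(X, y)
acc_disc = disc.score(X, y)
```

On data this cleanly separated both members of the pair recover essentially the same linear decision boundary, which is the sense in which they form a pair: same family of decision rules, different estimation targets.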


### 2006

- (Lasserre et al., 2006) ⇒ Julia A. Lasserre, Christopher M. Bishop, and Thomas P. Minka. (2006). “Principled Hybrids of Generative and Discriminative Models.” In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006).

### 2004

- (Bouchard & Triggs, 2004) ⇒ Guillaume Bouchard, and Bill Triggs. (2004). “The Trade-off Between Generative and Discriminative Classifiers.” In: Proceedings of the Symposium on Computational Statistics (COMPSTAT 2004).
- QUOTE: In supervised classification, inputs [math]x[/math] and their labels [math]y[/math] arise from an unknown joint probability [math]p(x, y)[/math]. If we can approximate [math]p(x, y)[/math] using a parametric family of models [math]\mathcal{G} = \{p_\theta(x, y), \theta \in T\}[/math], then a natural classifier is obtained by first estimating the class-conditional densities, then classifying each new data point to the class with highest posterior probability. This approach is called *generative* classification.

    However, if the overall goal is to find the classification rule with the smallest error rate, this depends only on the conditional density [math]p(y \vert x)[/math]. Discriminative methods directly model the conditional distribution, without assuming anything about the input distribution [math]p(x)[/math]. Well known generative-discriminative pairs include Linear Discriminant Analysis (LDA) vs. linear logistic regression and naive Bayes vs. Generalized Additive Models (GAM). Many authors have already studied these models, e.g. [5, 6]. Under the assumption that the underlying distributions are Gaussian with equal covariances, it is known that LDA requires less data than its **discriminative counterpart**, linear logistic regression [3]. More generally, it is known that generative classifiers have a smaller variance than discriminative ones.

    Conversely, the generative approach converges to the best model for the joint distribution [math]p(x, y)[/math], but the resulting conditional density is usually a biased classifier unless its [math]p_\theta(x)[/math] part is an accurate model for [math]p(x)[/math]. In real world problems the assumed generative model is rarely exact, and asymptotically, a discriminative classifier should typically be preferred [9, 5]. The key argument is that the discriminative estimator converges to the conditional density that minimizes the negative log-likelihood classification loss against the true density [math]p(x, y)[/math] [2]. For finite sample sizes, there is a bias-variance tradeoff and it is less obvious how to choose between **generative and discriminative classifiers**.
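The finite-sample bias-variance tradeoff described in the quote can be probed empirically. This is a hedged sketch, not the paper's experiment: the synthetic Gaussian data, scikit-learn estimators, and training sizes are my assumptions, chosen only to show how one would compare a generative classifier against its discriminative counterpart at different sample sizes.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_data(n):
    """Two-class Gaussian mixture with class means at (+/-1.5, +/-1.5)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.5, -1.5)
    return X, y

# One large held-out test set shared across training sizes.
X_test, y_test = make_data(5000)

results = {}
for n_train in (20, 2000):
    X_tr, y_tr = make_data(n_train)
    # Generative model (lower variance, possibly biased) vs.
    # discriminative model (asymptotically preferred under misspecification).
    acc_nb = GaussianNB().fit(X_tr, y_tr).score(X_test, y_test)
    acc_lr = LogisticRegression().fit(X_tr, y_tr).score(X_test, y_test)
    results[n_train] = (acc_nb, acc_lr)
```

Averaging such runs over many random training sets would show the pattern the references argue for: the generative classifier tends to do relatively better at small [math]n[/math], the discriminative one as [math]n[/math] grows, echoing Ng & Jordan (2001).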


### 2001

- (Ng & Jordan, 2001) ⇒ Andrew Y. Ng, and Michael I. Jordan. (2001). “On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes.” In: Proceedings of the Conference on Neural Information Processing Systems (NIPS 2001).