Discriminative Learning Algorithm

AKA: Distribution-free ML Algorithm, Discriminative Training.
Context:
- It can range from (typically) being a Discriminative Classification Algorithm to being a Discriminative Estimation Algorithm.
- It (typically) involves a simpler Parameter Estimation and makes fewer assumptions than a Generative Algorithm.
- It results in Weights rather than Probability Functions.
Example(s):
Counter-Example(s):
- a Generative Model Training Algorithm, such as: Linear Discriminant Analysis (LDA), or a HMM Training Algorithm.
See: Discriminative Model Inferencing Algorithm; Generative References.

References

(Sammut & Webb, 2011) ⇒ Claude Sammut (editor), and Geoffrey I. Webb (editor). (2011). “Discriminative Learning.” In: (Sammut & Webb, 2011).
- Discriminative Learning - Definition: Discriminative learning refers to any classification learning process that classifies by using a model or estimate of the probability [math]\displaystyle{ P(y|x) }[/math] without reference to an explicit estimate of any of [math]\displaystyle{ P(x) }[/math], [math]\displaystyle{ P(y, x) }[/math], or [math]\displaystyle{ P(x|y) }[/math], where [math]\displaystyle{ y }[/math] is a class and [math]\displaystyle{ x }[/math] is a description of an object to be classified. Discriminative learning contrasts to generative learning which classifies by using an estimate of the joint probability [math]\displaystyle{ P(y, x) }[/math] or of the prior probability [math]\displaystyle{ P(y) }[/math] and the conditional probability [math]\displaystyle{ P(x|y) }[/math]. It is also common to categorize as discriminative any approaches that are directly based on a decision risk function (such as Support Vector Machines, Artificial Neural Networks, and Decision Trees), where the decision risk is minimized without estimation of [math]\displaystyle{ P(x) }[/math], [math]\displaystyle{ P(y, x) }[/math], or [math]\displaystyle{ P(x|y) }[/math].
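
The contrast drawn in this definition can be illustrated with a minimal sketch. It is not taken from the cited entry: the synthetic two-class data and all function names (`make_data`, `fit_logistic`, `fit_gaussian_nb`, and so on) are assumptions chosen for illustration. Logistic regression stands in for a discriminative learner that fits the class posterior directly, and Gaussian naive Bayes stands in for a generative learner that estimates the class prior and the class-conditional density and then applies Bayes' rule.

```python
import numpy as np

def make_data(n=400, seed=0):
    """Hypothetical toy data: two well-separated Gaussian classes in 2-D."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(n // 2, 2))
    x1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n // 2, 2))
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n // 2, dtype=int), np.ones(n // 2, dtype=int)])
    return X, y

def fit_logistic(X, y, lr=0.1, steps=500):
    """Discriminative: fit P(y|x) directly by gradient descent on the log loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # current estimate of P(y=1|x)
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient step on the mean log loss
    return w

def predict_logistic(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)            # P(y=1|x) > 0.5 iff the score is positive

def fit_gaussian_nb(X, y):
    """Generative: estimate P(y) and a per-class Gaussian P(x|y)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[int(c)] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    classes = sorted(params)
    scores = []
    for c in classes:
        prior, mu, var = params[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_lik)  # log P(y) + log P(x|y)
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]

X, y = make_data()
w = fit_logistic(X, y)          # the learned object is a weight vector
nb = fit_gaussian_nb(X, y)      # the learned object is a set of distribution parameters
acc_disc = float(np.mean(predict_logistic(w, X) == y))
acc_gen = float(np.mean(predict_gaussian_nb(nb, X) == y))
print("discriminative accuracy:", acc_disc)
print("generative accuracy:", acc_gen)
```

On data this cleanly separated both learners should classify nearly all points correctly; the difference of interest is what each one estimates, not the accuracy.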

(Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). “An Entity Based Model for Coreference Resolution.” In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- Statistical approaches to coreference resolution can be broadly placed into two categories: generative models, which model the joint probability, and discriminative models that model the conditional probability. These models can be either supervised (uses labeled coreference data for learning) or unsupervised (no labeled data is used). Our model falls into the category of discriminative and supervised.

(Minka, 2005) ⇒ Thomas P. Minka. (2005). “Discriminative Models, not Discriminative Training.” Technical Report MSR-TR-2005-144, Microsoft Research.
- QUOTE: By taking this view, you have a consistent approach to statistical inference: you always model all variables, and you always use joint likelihood. The only thing that changes is the model. You can also see clearly why discriminative training might work better than generative training. It must be because a model of the form (5) fits the data better than (1). In particular, (5) is necessarily more flexible than (1), because it removes the implicit constraint that [math]\displaystyle{ \theta=\theta' }[/math]. Removing constraints reduces the statistical bias, at the cost of greater parameter uncertainty.
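
Equations (1) and (5) are not reproduced in this excerpt. The following is a hedged reconstruction of the two forms the quote appears to contrast, not a verbatim copy of the report. It assumes a training set of inputs [math]\displaystyle{ X = \{x_i\} }[/math] with class labels [math]\displaystyle{ C = \{c_i\} }[/math]; these symbols and the exact factorization are assumptions made to match the quote's wording.

```latex
% Single-parameter model: one \theta governs both the class posterior and the inputs.
p(C, X, \theta) = p(\theta) \prod_i p(x_i, c_i \mid \theta)
                = p(\theta) \prod_i p(c_i \mid x_i, \theta)\, p(x_i \mid \theta)

% Decoupled model: the input distribution gets its own parameter \theta'.
q(C, X, \theta, \theta') = p(\theta)\, p(\theta')
                           \prod_i p(c_i \mid x_i, \theta)\, p(x_i \mid \theta')
```

Under this reading, maximizing the second form over [math]\displaystyle{ \theta }[/math] involves only the conditional terms [math]\displaystyle{ p(c_i \mid x_i, \theta) }[/math], which is what discriminative training optimizes. Imposing [math]\displaystyle{ \theta=\theta' }[/math] makes the likelihood terms coincide with those of the first form, which is the implicit constraint the quote refers to.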