Generative Learning Algorithm

A [[Generative Learning Algorithm]] is a [[probabilistic learning algorithm]] that produces a [[generative model]] by directly estimating the [[joint probability]] of the [[target class]] and the [[predictor variable]]s (typically via the class [[prior probability]] and the class-conditional probability of the predictors).
* <B>Context</B>:
** It can be slow or complicated when <math>x</math> and/or <math>y</math> are [[Complex High-Dimensional Random Object]]s.
** It can apply [[Bayes Rule]] to obtain the [[posterior probability]] of each [[target class]] (as in the sketch below).
** It can find the values of the weights that are most likely to account for the data that we have seen (the [[Maximum Likelihood]] estimate).
** It can be slow when it must sum over all possible states.
** It can range from being a [[Generative Classification Algorithm]] to being a [[Generative Estimation Algorithm]].
* <B>Example(s):</B>
** [[Naive Bayes Algorithm]].
** [[Hidden Markov Model Learning]].
* <B>Counter-Example(s):</B>
** any [[Discriminative Learning Algorithm]].
* <B>See:</B> [[Hidden Markov Model]].
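* <B>Sketch</B>: the following is a minimal illustrative sketch (not taken from any of the cited references; all function and variable names are hypothetical) of a simple [[Generative Classification Algorithm]]: it estimates the class [[prior probability|priors]] <math>p(y)</math> and per-feature Gaussian class-conditional densities <math>p(x \vert y)</math> by [[Maximum Likelihood]], and then applies [[Bayes Rule]] to classify new data points.
<source lang="python">
import numpy as np

class GaussianNaiveBayes:
    """Sketch of a generative classifier: fit p(y) and p(x|y) by maximum
    likelihood, then classify with Bayes' rule, p(y|x) proportional to p(x|y)p(y)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)      # MLE of the class prior p(y=c)
            self.means_[c] = Xc.mean(axis=0)        # MLE of the per-feature means
            self.vars_[c] = Xc.var(axis=0) + 1e-9   # MLE of the variances (+ small floor)
        return self

    def predict(self, X):
        # log p(y=c | x) = log p(y=c) + sum_j log N(x_j; mu_cj, var_cj) + constant
        scores = []
        for c in self.classes_:
            log_lik = -0.5 * np.sum(
                np.log(2 * np.pi * self.vars_[c])
                + (X - self.means_[c]) ** 2 / self.vars_[c], axis=1)
            scores.append(np.log(self.priors_[c]) + log_lik)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]

# Toy usage: two well-separated Gaussian classes in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(GaussianNaiveBayes().fit(X, y).predict(X[:5]))   # expected: mostly class 0
</source>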
----
----
== References ==
 
=== 2009 ===
* ([[2009_AnEntityBasedModelForCorefResolution|Wick et al., 2009]]) &rArr; [[Michael Wick]], [[Aron Culotta]], Khashayar Rohanimanesh, and [[Andrew McCallum]]. ([[2009]]). "[http://maroo.cs.umass.edu/pub/web/getpdf.php?id=862 An Entity Based Model for Coreference Resolution]." In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
** QUOTE: Statistical approaches to coreference resolution can be broadly placed into two categories: '''generative models''', which model the joint probability, and discriminative models, which model the conditional probability. These models can be either supervised (using labeled coreference data for learning) or unsupervised (no labeled data is used). Our model falls into the category of discriminative and supervised.
 
=== 2004 ===
* ([[2004_TheTradeOffBetweenGenAndDiscrClassifiers|Bouchard & Triggs, 2004]]) &rArr; Guillaume Bouchard, and Bill Triggs. (2004). "[http://lear.inrialpes.fr/pubs/2004/BT04/Bouchard-compstat04.pdf The Trade-off Between Generative and Discriminative Classifiers]." In: Proceedings of COMPSTAT 2004.
** QUOTE: ... In [[supervised classification]], inputs <math>x</math> and their labels <math>y</math> arise from an [[unknown]] [[joint probability]] <math>p(x,y)</math>. If we can [[joint probability estimation|approximate]] <math>p(x,y)</math> using a [[parametric family of models]] <math>G = \{p_\theta(x,y), \theta \in \Theta\}</math>, then a natural [[classifier]] is obtained by first estimating the [[Conditional Probability Function|class-conditional densities]], then classifying each new [[data point]] to the [[class]] with highest [[posterior probability]]. This approach is called [[Generative Classification|<i>generative</i> classification]].  <P>  However, if the overall goal is to find the classification rule with the smallest error rate, this depends only on the conditional density <math>p(y \vert x)</math>. <i>Discriminative</i> methods directly model the conditional distribution, without assuming anything about the input distribution <math>p(x)</math>. Well known generative-discriminative pairs include Linear Discriminant Analysis (LDA) vs. linear logistic regression and naive Bayes vs. Generalized Additive Models (GAM). Many authors have already studied these models e.g. [5,6]. Under the assumption that the underlying distributions are Gaussian with equal covariances, it is known that LDA requires less data than its discriminative counterpart, linear logistic regression [3]. More generally, it is known that generative classifiers have a smaller variance than discriminative classifiers.  <P> Conversely, the generative approach converges to the best model for the joint distribution ''p''(''x'',''y'') but the resulting conditional density is usually a biased classifier unless its ''p''<sub>θ</sub>(''x'') part is an accurate model for ''p''(''x''). In real world problems the assumed generative model is rarely exact, and asymptotically, a discriminative classifier should typically be preferred [9, 5]. The key argument is that the discriminative estimator converges to the conditional density that minimizes the negative log-likelihood classification loss against the true density <math>p(x,y)</math> [2]. For finite sample sizes, there is a bias-variance tradeoff and it is less obvious how to choose between generative and discriminative classifiers.
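** As a concrete illustration of the generative half of such a pair, the following sketch (not taken from the paper; the function names and details are illustrative assumptions) fits LDA-style Gaussian class-conditional densities with a shared covariance by [[Maximum Likelihood]] and then computes the class [[posterior probability]] with [[Bayes Rule]]:
<source lang="python">
import numpy as np

def fit_lda(X, y):
    """Generative fit behind LDA: Gaussian class-conditionals p(x|y) with one
    shared covariance, plus empirical class priors p(y); all estimates are MLE."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    shared_cov = centered.T @ centered / len(X)          # pooled covariance estimate
    return classes, priors, means, np.linalg.inv(shared_cov)

def lda_posterior(x, classes, priors, means, cov_inv):
    """Bayes' rule: p(y=c|x) is proportional to p(y=c) * N(x; mu_c, Sigma).
    With a shared covariance the resulting decision boundary is linear in x."""
    scores = np.array([np.log(p) - 0.5 * (x - m) @ cov_inv @ (x - m)
                       for p, m in zip(priors, means)])
    probs = np.exp(scores - scores.max())                # numerically stable normalization
    return probs / probs.sum()
</source>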
 
=== 1999 ===
* ([[1999_ExploitingGenerativeModelsinDis|Jaakkola & Haussler, 1999]]) &rArr; [[Tommi S. Jaakkola]], and [[David Haussler]]. ([[1999]]). "[http://www.uniroma2.it/didattica/BdDD/deposito/jaakkola98exploiting-haussler.pdf Exploiting Generative Models in Discriminative Classifiers]." In: [[Proceedings of the 1998 conference on Advances in neural information processing systems II]]. ISBN:0-262-11245-0
** QUOTE: [[Generative probability model]]s such as [[hidden Markov models]] provide a principled way of treating [[missing information]] and dealing with [[variable length sequence]]s. On the other hand, [[discriminative method]]s such as [[support vector machines]] enable us to construct flexible [[decision boundary|decision boundaries]] and often result in [[classification performance]] superior to that of the [[model based approach]]es.
 
----
 
__NOTOC__
[[Category:Concept]]
