Generative and Discriminative Learning

See: [[Generative Learning]], [[Discriminative Learning]], [[Evolutionary Feature Selection and Construction]].

== References ==
=== 2011 ===
* ([[Liu & Webb, 2011]]) ⇒ Bin Liu, and [[Geoffrey I. Webb]]. ([[2011]]). “Generative and Discriminative Learning.” In: ([[Sammut & Webb, 2011]]) p.454
** <i>[[Generative learning]]</i> refers alternatively to any [[classification learning process]] that classifies by using an [[joint probability estimate|estimate of the joint probability]] <math>P(y,\bf{x})</math>, or to any [[classification learning process]] that classifies by using [[prior probability estimate|estimates]] of the [[prior probability|prior probability <math>P(y)</math>]] and the [[conditional probability function|conditional probability <math>P(\bf{x}|y)</math>]] ([[Bishop, 2007]]; [[Jaakkola & Haussler, 1999]]; [[Jaakkola, Meila & Jebara, 1999]]; [[Lasserre, Bishop & Minka, 2006]]; [[Ng & Jordan, 2002]]), where <math>y</math> is a [[class variable|class]] and <math>\bf{x}</math> is a [[description of an object to be classified]]. [[Generative learning]] contrasts with <i>[[discriminative learning]]</i>, in which a [[conditional probability model|model]] or [[conditional probability function estimate|estimate]] of [[conditional target probability|<math>P(y|\bf{x})</math>]] is formed without reference to an explicit estimate of any of <math>P(y,\bf{x})</math>, <math>P(\bf{x})</math>, or <math>P(\bf{x}|y)</math>. <P> It is also common to categorize as discriminative those approaches that are based on a [[decision function]] that directly maps from the [[input]] <math>\bf{x}</math> onto the output <math>y</math> (such as [[support vector machines]], [[neural networks]], and [[decision trees]]), where the [[decision risk]] is [[minimized]] without [[estimation]] of <math>P(y,\bf{x})</math>, <math>P(\bf{x}|y)</math>, or <math>P(y|\bf{x})</math> ([[Jaakkola & Haussler, 1999]]). <P> The [[standard exemplar]] of [[generative learning]] is [[naïve Bayes]], and that of [[discriminative learning]] is [[logistic regression]]. Another important contrasting pair is the [[generative hidden Markov model]] and the [[discriminative conditional random field]]. It is widely accepted that [[generative learning]] works well when [[samples]] are rare, while [[discriminative learning]] has better [[asymptotic error performance]] ([[Ng & Jordan, 2002]]). <P> [[Efron (1975)]] provides an early examination of the [[generative]]/[[discriminative]] [[distinction]]. [[Efron]] performs an [[empirical comparison]] of the [[efficiency]] of [[generative linear discriminant analysis (LDA)]] and [[discriminative logistic regression]]. His results show that [[logistic regression]] is 30% less [[efficiency|efficient]] than [[LDA]], meaning that the [[discriminative approach]] is 30% slower to reach its [[asymptotic error]] than the [[generative approach]]. <P> [[Ng & Jordan, 2002|Ng and Jordan (2002)]] give a theoretical discussion of the efficiency of [[generative naïve Bayes]] and [[discriminative logistic regression]]. Their result shows that [[logistic regression]] converges towards its [[asymptotic error]] in order <math>n</math> [[sample]]s, while [[naïve Bayes]] converges in order <math>\log n</math> [[sample]]s. Although [[logistic regression]] converges much more slowly than [[naïve Bayes]], [[logistic regression|it]] has a lower [[asymptotic error]] than [[naïve Bayes]]. These results suggest that it is desirable to use a [[generative approach]] when [[training data]] is scarce and to use a [[discriminative approach]] when there is [[large training dataset|enough training data]].
<P> Recent research into the [[generative/discriminative learning distinction]] has concentrated on [[hybrids of generative and discriminative learning]], as well as on [[generative learning]] and [[discriminative learning]] in [[structured data learning]] and [[semi-supervised learning]] contexts. <P> In [[hybrid learning approach|hybrid approach]]es, [[predictive modeling researcher|researcher]]s seek to obtain the merits of both [[generative learning]] and [[discriminative learning]]. Some examples include the [[Fisher kernel for discriminative learning]] ([[Jaakkola & Haussler, 1999]]), [[max-ent discriminative learning]] ([[Jaakkola, Meila & Jebara, 1999]]), and [[principled hybrids of generative and discriminative models]] ([[Lasserre, Bishop & Minka, 2006]]). In [[structured data learning]], the [[output target|output data]] have [[dependent relationship]]s. As an example of [[generative learning]], [[hidden Markov models]] are used in [[structured learning task data|structured data problem]]s that require [[sequential decision]]s; the [[discriminative]] [[analog]] is the [[conditional random field model]]. Another example of [[discriminatively structured learning|discriminative structured learning]] is the [[Max-margin Markov network]] ([[Taskar, Guestrin & Koller, 2004]]). <P> In [[semi-supervised learning]], [[co-training]] and [[multiview learning]] are usually applied to [[generative learning]] ([[Blum & Mitchell, 1998]]). It is less straightforward to apply [[semi-supervised learning]] to traditional [[discriminative learning]], since <math>P(y|\bf{x})</math> is estimated while ignoring <math>P(\bf{x})</math>. Examples of [[semi-supervised learning method]]s in [[discriminative learning]] include the [[transductive SVM]], [[Gaussian processes]], [[information regularization]], and [[graph-based method]]s ([[Chapelle, Scholkopf & Zien, 2006]]). The code sketches below illustrate these contrasts.
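To make the definitional contrast concrete, here is a minimal sketch of the two estimation routes, assuming a toy 1-D dataset with Gaussian class-conditional densities and using scikit-learn only for the discriminative side; all variable names are illustrative. The generative route estimates <math>P(y)</math> and <math>P(\bf{x}|y)</math> and classifies via Bayes rule; the discriminative route fits <math>P(y|\bf{x})</math> directly.
<pre>
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 1-D data: two Gaussian classes.
X0 = rng.normal(-1.0, 1.0, size=200)              # class y=0
X1 = rng.normal(+1.0, 1.0, size=200)              # class y=1
X = np.concatenate([X0, X1]).reshape(-1, 1)
y = np.concatenate([np.zeros(200), np.ones(200)]).astype(int)

# --- Generative route: estimate P(y) and P(x|y), classify via Bayes rule ---
prior = np.array([np.mean(y == k) for k in (0, 1)])   # P(y)
mu    = np.array([X[y == k].mean() for k in (0, 1)])  # class-conditional means
sigma = np.array([X[y == k].std()  for k in (0, 1)])  # class-conditional stds

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(x):
    # P(y|x) is proportional to P(y) * P(x|y); normalize over classes.
    joint = prior * gaussian_pdf(x, mu, sigma)        # the joint density p(x, y)
    return joint / joint.sum()

# --- Discriminative route: model P(y|x) directly, never touching P(x) ---
clf = LogisticRegression().fit(X, y)

x_new = 0.3
print("generative     P(y|x):", generative_posterior(x_new))
print("discriminative P(y|x):", clf.predict_proba([[x_new]])[0])
</pre>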
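The convergence behavior discussed by Ng and Jordan (2002) can also be illustrated empirically. The following is only an illustration under assumed conditions (scikit-learn's GaussianNB and LogisticRegression on synthetic data), not a reproduction of their experiments: it trains both exemplars on increasingly large samples and reports test error, where naïve Bayes typically approaches its asymptote quickly while logistic regression improves more slowly but further.
<pre>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# One large pool; hold out a fixed test set to estimate error.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_test, y_test = X[10000:], y[10000:]

def err(model):
    return 1 - model.score(X_test, y_test)

# Train both models on increasingly large samples and watch the errors.
for n in (20, 50, 100, 500, 2000, 10000):
    nb = GaussianNB().fit(X[:n], y[:n])
    lr = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])
    print(f"n={n:6d}  NB error={err(nb):.3f}  LR error={err(lr):.3f}")
</pre>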
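Finally, as a sketch of the graph-based semi-supervised methods mentioned above, the example below uses scikit-learn's LabelSpreading on two-moons data (one concrete graph-based algorithm chosen for illustration; the dataset and the number of labeled points are arbitrary assumptions). A handful of labels is propagated along a nearest-neighbor graph, so the resulting estimate of <math>P(y|\bf{x})</math> is informed by the unlabeled density <math>P(\bf{x})</math>.
<pre>
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Two-moons data: 200 points, of which only 6 carry labels; the rest are
# marked -1 (scikit-learn's convention for "unlabeled").
X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)
y = np.full(200, -1)
labeled = np.concatenate([np.where(y_true == k)[0][:3] for k in (0, 1)])
y[labeled] = y_true[labeled]

# Graph-based label propagation: labels diffuse along a k-NN graph of X.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
acc = (model.transduction_ == y_true).mean()
print(f"transductive accuracy with 6 labels: {acc:.2f}")
</pre>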


=== 1999 ===




