Generative and Discriminative Learning

See: [[Generative Learning]], [[Discriminative Learning]], [[Evolutionary Feature Selection and Construction]].

== References ==
=== 2011 ===
* ([[Liu & Webb, 2011]]) ⇒ Bin Liu, and [[Geoffrey I. Webb]]. ([[2011]]). “Generative and Discriminative Learning.” In: ([[Sammut & Webb, 2011]]) p.454
** <i>[[Generative learning]]</i> refers alternatively to any [[classification learning process]] that classifies by using an [[joint probability estimate|estimate of the joint probability]] <math>P(y,\bf{x})</math>, or to any [[classification learning process]] that classifies by using [[prior probability estimate|estimates]] of the [[prior probability|prior probability <math>P(y)</math>]] and the [[conditional probability function|conditional probability <math>P(\bf{x}|y)</math>]] ([[Bishop, 2007]]; [[Jaakkola & Haussler, 1999]]; [[Jaakkola, Meila & Jebara, 1999]]; [[Lasserre, Bishop & Minka, 2006]]; [[Ng & Jordan, 2002]]), where <math>y</math> is a [[class variable|class]] and <math>\bf{x}</math> is a [[description of an object to be classified]]. [[Generative learning]] contrasts with <i>[[discriminative learning]]</i>, in which a [[conditional probability model|model]] or [[conditional probability function estimate|estimate]] of [[conditional target probability|<math>P(y|\bf{x})</math>]] is formed without reference to an explicit estimate of any of <math>P(y,\bf{x})</math>, <math>P(\bf{x})</math>, or <math>P(\bf{x}|y)</math>. <P> It is also common to categorize as discriminative those approaches that are based on a [[decision function]] that directly maps from the [[input]] <math>\bf{x}</math> onto the output <math>y</math> (such as [[support vector machines]], [[neural networks]], and [[decision trees]]), where the [[decision risk]] is [[minimized]] without [[estimation]] of <math>P(y,\bf{x})</math>, <math>P(\bf{x}|y)</math>, or <math>P(y|\bf{x})</math> ([[Jaakkola & Haussler, 1999]]). <P> The [[standard exemplar]] of [[generative learning]] is [[naïve Bayes]], and that of [[discriminative learning]] is [[logistic regression]]. Another important contrasting pair is the [[generative hidden Markov model]] and the [[discriminative conditional random field]]. It is widely accepted that [[generative learning]] works well when [[samples]] are rare, while [[discriminative learning]] has better [[asymptotic error performance]] ([[Ng & Jordan, 2002]]). <P> [[Efron (1975)]] provides an early examination of the [[generative]]/[[discriminative]] [[distinction]]. [[Efron]] performs an [[empirical comparison]] of the [[efficiency]] of [[generative linear discriminant analysis (LDA)]] and [[discriminative logistic regression]]. His results show that [[logistic regression]] is 30% less [[efficiency|efficient]] than [[LDA]], meaning that the [[discriminative approach]] is 30% slower to reach its [[asymptotic error]] than the [[generative approach]]. <P> [[Ng & Jordan, 2002|Ng and Jordan (2002)]] give a theoretical discussion of the efficiency of [[generative naïve Bayes]] and [[discriminative logistic regression]]. Their result shows that [[logistic regression]] converges towards its [[asymptotic error]] in order <math>n</math> [[sample]]s, while [[naïve Bayes]] converges in order <math>\log n</math> [[sample]]s. Although [[logistic regression]] converges much more slowly than [[naïve Bayes]], [[logistic regression|it]] has a lower [[asymptotic error]] than [[naïve Bayes]]. These results suggest that it is desirable to use a [[generative approach]] when [[training data]] is scarce and to use a [[discriminative approach]] when there is [[large training dataset|enough training data]].
<P> Recent research into the [[generative/discriminative learning distinction]] has concentrated on [[hybrids of generative and discriminative learning]], as well as on [[generative learning]] and [[discriminative learning]] in [[structured data learning]] and [[semi-supervised learning]] contexts. <P> In [[hybrid learning approach|hybrid approach]]es, [[predictive modeling researcher|researcher]]s seek to obtain the merits of both [[generative learning]] and [[discriminative learning]]. Some examples include the [[Fisher kernel for discriminative learning]] ([[Jaakkola & Haussler, 1999]]), [[max-ent discriminative learning]] ([[Jaakkola, Meila & Jebara, 1999]]), and [[principled hybrids of generative and discriminative models]] ([[Lasserre, Bishop & Minka, 2006]]). In [[structured data learning]], the [[output target|output data]] have [[dependent relationship]]s. As an example of [[generative learning]], [[hidden Markov models]] are used in [[structured learning task data|structured data problem]]s that require [[sequential decision]]s; the [[discriminative]] [[analog]] is the [[conditional random field model]]. Another example of [[discriminatively structured learning|discriminative structured learning]] is the [[Max-margin Markov network]] ([[Taskar, Guestrin & Koller, 2004]]). <P> In [[semi-supervised learning]], [[co-training]] and [[multiview learning]] are usually applied to [[generative learning]] ([[Blum & Mitchell, 1998]]). It is less straightforward to apply [[semi-supervised learning]] to traditional [[discriminative learning]], since <math>P(y|\bf{x})</math> is estimated while ignoring <math>P(\bf{x})</math>. Examples of [[semi-supervised learning method]]s in [[discriminative learning]] include the [[transductive SVM]], [[Gaussian processes]], [[information regularization]], and [[graph-based method]]s ([[Chapelle, Scholkopf & Zien, 2006]]). The code sketches below illustrate these contrasts.
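To make the definitional contrast concrete, here is a minimal sketch of the two estimation routes, assuming a toy 1-D dataset with Gaussian class-conditional densities and using scikit-learn only for the discriminative side; all variable names are illustrative. The generative route estimates <math>P(y)</math> and <math>P(\bf{x}|y)</math> and classifies via Bayes rule; the discriminative route fits <math>P(y|\bf{x})</math> directly.
<pre>
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 1-D data: two Gaussian classes.
X0 = rng.normal(-1.0, 1.0, size=200)              # class y=0
X1 = rng.normal(+1.0, 1.0, size=200)              # class y=1
X = np.concatenate([X0, X1]).reshape(-1, 1)
y = np.concatenate([np.zeros(200), np.ones(200)]).astype(int)

# --- Generative route: estimate P(y) and P(x|y), classify via Bayes rule ---
prior = np.array([np.mean(y == k) for k in (0, 1)])   # P(y)
mu    = np.array([X[y == k].mean() for k in (0, 1)])  # class-conditional means
sigma = np.array([X[y == k].std()  for k in (0, 1)])  # class-conditional stds

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(x):
    # P(y|x) is proportional to P(y) * P(x|y); normalize over classes.
    joint = prior * gaussian_pdf(x, mu, sigma)        # the joint density p(x, y)
    return joint / joint.sum()

# --- Discriminative route: model P(y|x) directly, never touching P(x) ---
clf = LogisticRegression().fit(X, y)

x_new = 0.3
print("generative     P(y|x):", generative_posterior(x_new))
print("discriminative P(y|x):", clf.predict_proba([[x_new]])[0])
</pre>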
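The convergence behavior discussed by Ng and Jordan (2002) can also be illustrated empirically. The following is only an illustration under assumed conditions (scikit-learn's GaussianNB and LogisticRegression on synthetic data), not a reproduction of their experiments: it trains both exemplars on increasingly large samples and reports test error, where naïve Bayes typically approaches its asymptote quickly while logistic regression improves more slowly but further.
<pre>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# One large pool; hold out a fixed test set to estimate error.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_test, y_test = X[10000:], y[10000:]

def err(model):
    return 1 - model.score(X_test, y_test)

# Train both models on increasingly large samples and watch the errors.
for n in (20, 50, 100, 500, 2000, 10000):
    nb = GaussianNB().fit(X[:n], y[:n])
    lr = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])
    print(f"n={n:6d}  NB error={err(nb):.3f}  LR error={err(lr):.3f}")
</pre>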
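Finally, as a sketch of the graph-based semi-supervised methods mentioned above, the example below uses scikit-learn's LabelSpreading on two-moons data (one concrete graph-based algorithm chosen for illustration; the dataset and the number of labeled points are arbitrary assumptions). A handful of labels is propagated along a nearest-neighbor graph, so the resulting estimate of <math>P(y|\bf{x})</math> is informed by the unlabeled density <math>P(\bf{x})</math>.
<pre>
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Two-moons data: 200 points, of which only 6 carry labels; the rest are
# marked -1 (scikit-learn's convention for "unlabeled").
X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)
y = np.full(200, -1)
labeled = np.concatenate([np.where(y_true == k)[0][:3] for k in (0, 1)])
y[labeled] = y_true[labeled]

# Graph-based label propagation: labels diffuse along a k-NN graph of X.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
acc = (model.transduction_ == y_true).mean()
print(f"transductive accuracy with 6 labels: {acc:.2f}")
</pre>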


=== 1999 ===




