Maximum Entropy Model NLP Algorithm

See: [[NLP Algorithm]]; [[Maximum Entropy Model Algorithm]]; [[Maximum Entropy Markov Model]]; [[Log-linear Model]].

== References ==
=== 2017b ===
* ([[Ratnaparkhi, 2017]]) ⇒ [[Adwait Ratnaparkhi]] ([[2017]]). [https://link.springer.com/referenceworkentry/10.1007%2F978-1-4899-7687-1_525 "Maximum Entropy Models for Natural Language Processing"]. In: ([[Sammut & Webb, 2017]]).
** QUOTE: The term [[maximum entropy]] refers to an [[optimization framework]] in which the goal is to find the [[probability model]] that [[maximize]]s [[entropy]] over the [[set]] of [[model]]s that are consistent with the [[observed evidence]]. <P> The [[information-theoretic]] notion of [[entropy]] is a way to [[quantify]] the [[uncertainty]] of a [[probability model]]; higher [[entropy]] corresponds to more [[uncertainty]] in the [[probability distribution]]. The rationale for choosing the [[maximum entropy model]] – from the [[set]] of [[model]]s that meet the [[evidence]] – is that any other [[model]] assumes [[evidence]] that has not been [[observed]] ([[Jaynes 1957]]). <P> In most [[natural language processing]] problems, [[observed evidence]] takes the form of [[co-occurrence count]]s between some [[prediction]] of interest and some [[linguistic context]] of interest. These [[count]]s are derived from a large number of [[linguistically annotated example]]s, known as a [[corpus]]. For example, the [[frequency]] in a large [[corpus]] with which the word ''that'' co-occurs with the [[tag]] corresponding to [[determiner]], or [[DET]], is a piece of [[observed evidence]]. A [[probability model]] is consistent with the [[observed evidence]] if its calculated [[estimate]]s of the [[co-occurrence count]]s agree with the [[observed count]]s in the [[corpus]]. <P> The goal of the [[maximum entropy]] [[framework]] is to find a [[model]] that is consistent with the [[co-occurrence count]]s, but is otherwise [[maximally uncertain]]. It provides a way to combine many pieces of [[evidence]] into a single [[probability model]]. An [[iterative parameter estimation procedure]] is usually necessary in order to find the [[maximum entropy probability model]].
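The following is a minimal illustrative sketch, not code from Ratnaparkhi (2017): it fits a conditional maximum entropy model p(tag | word) on a toy tagged corpus, using indicator features over word–tag pairs. By maximum-entropy duality, adjusting each feature weight until the model's expected feature count matches its observed corpus count yields the maximum entropy model consistent with the evidence. The corpus, feature set, learning rate, and iteration count below are assumptions made for illustration.

<syntaxhighlight lang="python">
import math
from collections import Counter

# Toy annotated corpus of (word, tag) pairs, e.g. the word "that" tagged DET.
corpus = [("that", "DET"), ("that", "DET"), ("that", "PRON"),
          ("dog", "NOUN"), ("dog", "NOUN"), ("runs", "VERB")]
words = sorted({w for w, _ in corpus})
tags = sorted({t for _, t in corpus})

# Indicator features: f(w,t) fires iff the word is w and the tag is t.
features = [(w, t) for w in words for t in tags]
weights = {f: 0.0 for f in features}  # zero weights = uniform (maximum entropy) start

def p_tag_given_word(word):
    """Log-linear maxent model: p(t | w) is proportional to exp(active feature weight)."""
    scores = {t: math.exp(weights[(word, t)]) for t in tags}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

observed = Counter(corpus)  # observed co-occurrence counts from the corpus

# Iterative parameter estimation: gradient ascent on conditional log-likelihood,
# whose gradient for each feature is (observed count - expected count).
for _ in range(500):
    grads = {}
    for (w, t) in features:
        expected = sum(p_tag_given_word(cw)[t] for cw, _ in corpus if cw == w)
        grads[(w, t)] = observed[(w, t)] - expected
    for f in features:
        weights[f] += 0.1 * grads[f]

print(p_tag_given_word("that"))  # approaches DET: 2/3, PRON: 1/3
</syntaxhighlight>

At convergence, the model's expected count for each word–tag feature matches the corpus count (e.g. ''that''/DET twice out of three occurrences of ''that''), and the distribution is otherwise as uniform as those constraints allow.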


=== 1957 ===
* ([[Jaynes, 1957]]) ⇒ [[Edwin T. Jaynes]] ([[1957]]). "Information Theory and Statistical Mechanics". In: [[Physical Review]], 106(4).
