Maximum Entropy Model NLP Algorithm

See: [[NLP Algorithm]]; [[Maximum Entropy Model Algorithm]]; [[Maximum Entropy Markov Model]]; [[Log-linear Model]].

== References ==
=== 2017b ===
* ([[Ratnaparkhi, 2017]]) ⇒ [[Adwait Ratnaparkhi]] ([[2017]]). [https://link.springer.com/referenceworkentry/10.1007%2F978-1-4899-7687-1_525 "Maximum Entropy Models for Natural Language Processing"]. In: ([[Sammut & Webb, 2017]]).
** QUOTE: The term [[maximum entropy]] refers to an [[optimization framework]] in which the goal is to find the [[probability model]] that [[maximize]]s [[entropy]] over the [[set]] of [[model]]s that are consistent with the [[observed evidence]]. <P> The [[information-theoretic]] notion of [[entropy]] is a way to [[quantify]] the [[uncertainty]] of a [[probability model]]; higher [[entropy]] corresponds to more [[uncertainty]] in the [[probability distribution]]. The rationale for choosing the [[maximum entropy model]] – from the [[set]] of [[model]]s that meet the [[evidence]] – is that any other [[model]] assumes [[evidence]] that has not been [[observed]] ([[Jaynes 1957]]). <P> In most [[natural language processing]] problems, [[observed evidence]] takes the form of [[co-occurrence count]]s between some [[prediction]] of interest and some [[linguistic context]] of interest. These [[count]]s are derived from a large number of [[linguistically annotated example]]s, known as a [[corpus]]. For example, the [[frequency]] in a large [[corpus]] with which the word ''that'' co-occurs with the [[tag]] corresponding to [[determiner]], or [[DET]], is a piece of [[observed evidence]]. A [[probability model]] is consistent with the [[observed evidence]] if its calculated [[estimate]]s of the [[co-occurrence count]]s agree with the [[observed count]]s in the [[corpus]]. <P> The goal of the [[maximum entropy]] [[framework]] is to find a [[model]] that is consistent with the [[co-occurrence count]]s, but is otherwise [[maximally uncertain]]. It provides a way to combine many pieces of [[evidence]] into a single [[probability model]]. An [[iterative parameter estimation procedure]] is usually necessary in order to find the [[maximum entropy probability model]].
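The following is a minimal illustrative sketch, not code from Ratnaparkhi (2017): it fits a conditional maximum entropy model p(tag | word) on a toy tagged corpus, using indicator features over word–tag pairs. By maximum-entropy duality, adjusting each feature weight until the model's expected feature count matches its observed corpus count yields the maximum entropy model consistent with the evidence. The corpus, feature set, learning rate, and iteration count below are assumptions made for illustration.

<syntaxhighlight lang="python">
import math
from collections import Counter

# Toy annotated corpus of (word, tag) pairs, e.g. the word "that" tagged DET.
corpus = [("that", "DET"), ("that", "DET"), ("that", "PRON"),
          ("dog", "NOUN"), ("dog", "NOUN"), ("runs", "VERB")]
words = sorted({w for w, _ in corpus})
tags = sorted({t for _, t in corpus})

# Indicator features: f(w,t) fires iff the word is w and the tag is t.
features = [(w, t) for w in words for t in tags]
weights = {f: 0.0 for f in features}  # zero weights = uniform (maximum entropy) start

def p_tag_given_word(word):
    """Log-linear maxent model: p(t | w) is proportional to exp(active feature weight)."""
    scores = {t: math.exp(weights[(word, t)]) for t in tags}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

observed = Counter(corpus)  # observed co-occurrence counts from the corpus

# Iterative parameter estimation: gradient ascent on conditional log-likelihood,
# whose gradient for each feature is (observed count - expected count).
for _ in range(500):
    grads = {}
    for (w, t) in features:
        expected = sum(p_tag_given_word(cw)[t] for cw, _ in corpus if cw == w)
        grads[(w, t)] = observed[(w, t)] - expected
    for f in features:
        weights[f] += 0.1 * grads[f]

print(p_tag_given_word("that"))  # approaches DET: 2/3, PRON: 1/3
</syntaxhighlight>

At convergence, the model's expected count for each word–tag feature matches the corpus count (e.g. ''that''/DET twice out of three occurrences of ''that''), and the distribution is otherwise as uniform as those constraints allow.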


=== 1957 ===
* ([[Jaynes, 1957]]) ⇒ [[Edwin T. Jaynes]] ([[1957]]). "Information Theory and Statistical Mechanics". In: [[Physical Review]], 106(4).
