Pointwise Mutual Information (PMI) Measure


A Pointwise Mutual Information (PMI) Measure is a measure of association between two outcomes [math]\displaystyle{ x,y }[/math] of discrete random variables, based on the log of the ratio between their co-occurrence probability [math]\displaystyle{ p(x,y) }[/math] and the probability [math]\displaystyle{ p(x)p(y) }[/math] of observing [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] together by chance under independence.



References

2016

  • (Wikipedia, 2016) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information#Definition Retrieved:2016-2-10.
    • The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence. Mathematically: [math]\displaystyle{ \operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}. }[/math] The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution [math]\displaystyle{ p(x,y) }[/math]).

      The measure is symmetric ([math]\displaystyle{ \operatorname{pmi}(x;y)=\operatorname{pmi}(y;x) }[/math]). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected outcome over all joint events (MI) is positive. PMI maximizes when X and Y are perfectly associated (i.e. [math]\displaystyle{ p(x|y) }[/math] or [math]\displaystyle{ p(y|x)=1 }[/math]), yielding the following bounds: [math]\displaystyle{ -\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right] . }[/math]

      Finally, [math]\displaystyle{ \operatorname{pmi}(x;y) }[/math] will increase if [math]\displaystyle{ p(x|y) }[/math] is fixed but [math]\displaystyle{ p(x) }[/math] decreases.

      Here is an example to illustrate. The joint distribution [math]\displaystyle{ p(x,y) }[/math] of two binary random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is given by the following table: [math]\displaystyle{ \begin{array}{c|cc} p(x,y) & y=0 & y=1 \\ \hline x=0 & 0.1 & 0.7 \\ x=1 & 0.15 & 0.05 \end{array} }[/math]

      Using this table we can marginalize to get the following additional table for the individual distributions: [math]\displaystyle{ \begin{array}{c|cc} & p(x) & p(y) \\ \hline 0 & 0.8 & 0.25 \\ 1 & 0.2 & 0.75 \end{array} }[/math]

      With this example, we can compute four values for [math]\displaystyle{ \operatorname{pmi}(x;y) }[/math]. Using base-2 logarithms: [math]\displaystyle{ \begin{array}{lcl} \operatorname{pmi}(x=0;y=0) & = & -1 \\ \operatorname{pmi}(x=0;y=1) & \approx & 0.222392 \\ \operatorname{pmi}(x=1;y=0) & \approx & 1.584963 \\ \operatorname{pmi}(x=1;y=1) & \approx & -1.584963 \end{array} }[/math]

      (For reference, the mutual information [math]\displaystyle{ \operatorname{I}(X;Y) }[/math] would then be 0.214170945.)
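
The worked example above can be checked with a few lines of code. Below is a minimal Python sketch (an editorial illustration, not part of the quoted article) that recomputes the four PMI values and the mutual information from the joint table:

```python
import math

# Joint distribution p(x, y) from the example table above.
p_xy = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginal distributions obtained by summing over the other variable.
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    """Pointwise mutual information in bits (base-2 logarithm)."""
    return math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

for (x, y) in sorted(p_xy):
    print(f"pmi(x={x}; y={y}) = {pmi(x, y):.6f}")

# Mutual information = expected value of PMI under the joint distribution.
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())
print(f"I(X;Y) = {mi:.9f}")  # ~0.214170945
```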

2015

  • (Levy et al., 2015) ⇒ Omer Levy, Yoav Goldberg, and Ido Dagan. (2015). “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” In: Transactions of the Association for Computational Linguistics, 3.
    • QUOTE: … A popular measure of this association is pointwise mutual information (PMI) (Church and Hanks, 1990). PMI is defined as the log ratio between w and c’s joint probability and the product of their marginal probabilities, which can be estimated by: [math]\displaystyle{ PMI(w, c) = \log \frac{\hat{P}(w,c)}{\hat{P}(w) \hat{P}(c)} = \log \frac{\#(w,c) \cdot |D|}{\#(w) \cdot \#(c)} }[/math] The rows of [math]\displaystyle{ M^{PMI} }[/math] contain many entries of word-context pairs (w, c) that were never observed in the corpus, for which PMI(w, c) = log 0 = −∞. A common approach is thus to replace the [math]\displaystyle{ M^{PMI} }[/math] matrix with [math]\displaystyle{ M^{PMI}_0 }[/math], in which PMI(w, c) = 0 in cases where #(w, c) = 0. A more consistent approach is to use positive PMI (PPMI), in which all negative values are replaced by 0 ...
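
As a rough illustration of the PPMI construction described in this quote, here is a minimal Python sketch; the `cooc` dictionary of raw co-occurrence counts #(w, c) and the function name `ppmi_matrix` are illustrative assumptions, not code from the paper:

```python
import math
from collections import defaultdict

def ppmi_matrix(cooc):
    """Build a positive PMI (PPMI) table from raw co-occurrence counts.

    `cooc` maps (word, context) pairs to their count #(w, c); unobserved
    pairs are simply absent, so they end up with PPMI = 0, matching the
    M^{PMI}_0 / PPMI convention described above.
    """
    word_counts = defaultdict(int)     # #(w)
    context_counts = defaultdict(int)  # #(c)
    total = 0                          # |D|, total number of observed pairs
    for (w, c), n in cooc.items():
        word_counts[w] += n
        context_counts[c] += n
        total += n

    ppmi = {}
    for (w, c), n in cooc.items():
        pmi = math.log((n * total) / (word_counts[w] * context_counts[c]))
        ppmi[(w, c)] = max(pmi, 0.0)   # clip negative PMI values to 0
    return ppmi

# Example: a tiny set of (word, context) co-occurrence counts.
counts = {("dog", "bark"): 5, ("dog", "the"): 20,
          ("cat", "the"): 18, ("cat", "meow"): 4}
print(ppmi_matrix(counts))
```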

2011

  • (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information
    • Pointwise mutual information (PMI), or specific mutual information, is a measure of association used in information theory and statistics.

      The PMI of a pair of outcomes [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] belonging to discrete random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] quantifies the discrepancy between the probability of their coincidence given their joint distribution and the probability of their coincidence given only their individual distributions, assuming independence. Mathematically:
      [math]\displaystyle{ SI(x,y) = \log\frac{p(x,y)}{p(x)p(y)}. }[/math]

      The mutual information (MI) of the random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is the expected value of the PMI over all possible outcomes.

      The measure is symmetric ([math]\displaystyle{ SI(x,y)=SI(y,x) }[/math]). It can take on both negative and positive values but is zero if [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are independent, and equal to [math]\displaystyle{ -\log(p(x)) }[/math] if [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are perfectly associated. Finally, [math]\displaystyle{ SI(x,y) }[/math] will increase if [math]\displaystyle{ p(x|y) }[/math] is fixed but [math]\displaystyle{ p(x) }[/math] decreases.

2006

  • http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/MI/pmi.pm
    • Assume that the frequency count data associated with a bigram <word1><word2> is stored in a 2x2 contingency table: [math]\displaystyle{ \begin{array}{c|cc|c} & word_2 & \neg word_2 & \\ \hline word_1 & n_{11} & n_{12} & n_{1p} \\ \neg word_1 & n_{21} & n_{22} & n_{2p} \\ \hline & n_{p1} & n_{p2} & n_{pp} \end{array} }[/math] where [math]\displaystyle{ n_{11} }[/math] is the number of times <word1><word2> occur together, [math]\displaystyle{ n_{12} }[/math] is the number of times <word1> occurs with some word other than word2, and [math]\displaystyle{ n_{1p} }[/math] is the number of times in total that word1 occurs as the first word in a bigram.

      The expected values for the internal cells are calculated by taking the product of their associated marginals and dividing by the sample size, for example: [math]\displaystyle{ m_{11} = \frac {n_{p1} n_{1p}}{n_{pp}} }[/math]

      Pointwise Mutual Information (pmi) is defined as the log of the ratio between the observed frequency of a bigram ([math]\displaystyle{ n_{11} }[/math]) and its expected frequency under independence ([math]\displaystyle{ m_{11} }[/math]): [math]\displaystyle{ PMI = \log \Bigl( \frac{n_{11}}{m_{11}} \Bigr) }[/math] The Pointwise Mutual Information tends to overestimate bigrams with low observed frequency counts. To prevent this, sometimes a variation of pmi is used which increases the influence of the observed frequency: [math]\displaystyle{ PMI = \log \Bigl( \frac{(n_{11})^{exp}}{m_{11}} \Bigr) }[/math] The $exp is 1 by default, so by default the measure will compute the Pointwise Mutual Information for the given bigram. To use a variation of the measure, users can pass the $exp parameter using the --pmi_exp command line option in statistic.pl or by passing $exp to the initializeStatistic() method from their program.
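
The calculation described in this entry can be sketched in a few lines of Python (an illustrative re-implementation, not the Text::NSP Perl module itself; the function name and example counts are made up):

```python
import math

def bigram_pmi(n11, n1p, np1, npp, exp=1):
    """PMI of a bigram from its 2x2 contingency-table counts.

    n11: count of <word1><word2> occurring together
    n1p: count of word1 as the first word of any bigram
    np1: count of word2 as the second word of any bigram
    npp: total number of bigrams in the sample
    exp: optional exponent on n11 (the $exp variation described above)
    """
    m11 = (n1p * np1) / npp              # expected count under independence
    # Natural log; the source leaves the base unspecified, and the base
    # only rescales the score.
    return math.log((n11 ** exp) / m11)

# Example: the bigram occurs 50 times; word1 starts 300 bigrams,
# word2 ends 60 bigrams, in a sample of 10,000 bigrams.
print(bigram_pmi(50, 300, 60, 10_000))  # log(50 / 1.8) ≈ 3.32
```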
