# Pointwise Mutual Information (PMI) Measure

A Pointwise Mutual Information (PMI) Measure is a measure of association for a pair of outcomes $\displaystyle{ x,y }$ based on the ratio between their co-occurrence probability $\displaystyle{ P(x,y) }$ and the probability of observing $\displaystyle{ x }$ and $\displaystyle{ y }$ together by chance under independence, $\displaystyle{ P(x)P(y) }$.

## References

### 2016

• (Wikipedia, 2016) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information#Definition Retrieved:2016-2-10.
• The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence. Mathematically: $\displaystyle{ \operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}. }$ The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution $\displaystyle{ p(x,y) }$).

The measure is symmetric ($\displaystyle{ \operatorname{pmi}(x;y)=\operatorname{pmi}(y;x) }$). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected outcome over all joint events (MI) is positive. PMI maximizes when X and Y are perfectly associated (i.e. $\displaystyle{ p(x|y) }$ or $\displaystyle{ p(y|x)=1 }$), yielding the following bounds: $\displaystyle{ -\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right] . }$

Finally, $\displaystyle{ \operatorname{pmi}(x;y) }$ will increase if $\displaystyle{ p(x|y) }$ is fixed but $\displaystyle{ p(x) }$ decreases.

Here is an example to illustrate. Suppose the joint distribution of $\displaystyle{ x }$ and $\displaystyle{ y }$ is:

| $\displaystyle{ p(x,y) }$ | $\displaystyle{ y=0 }$ | $\displaystyle{ y=1 }$ |
|---|---|---|
| $\displaystyle{ x=0 }$ | 0.1 | 0.7 |
| $\displaystyle{ x=1 }$ | 0.15 | 0.05 |

Using this table we can marginalize to get the following additional table for the individual distributions:

| | $\displaystyle{ p(x) }$ | $\displaystyle{ p(y) }$ |
|---|---|---|
| 0 | 0.8 | 0.25 |
| 1 | 0.2 | 0.75 |

With this example, we can compute four values for $\displaystyle{ \operatorname{pmi}(x;y) }$. Using base-2 logarithms:

| | |
|---|---|
| $\displaystyle{ \operatorname{pmi}(x=0;y=0) }$ | $\displaystyle{ -1 }$ |
| $\displaystyle{ \operatorname{pmi}(x=0;y=1) }$ | $\displaystyle{ 0.222392 }$ |
| $\displaystyle{ \operatorname{pmi}(x=1;y=0) }$ | $\displaystyle{ 1.584963 }$ |
| $\displaystyle{ \operatorname{pmi}(x=1;y=1) }$ | $\displaystyle{ -1.584963 }$ |

(For reference, the mutual information $\displaystyle{ \operatorname{I}(X;Y) }$ would then be 0.214170945.)
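The figures in this worked example can be reproduced with a short script (a minimal sketch; the joint-distribution values are taken from the tables above):

```python
import math

# Joint distribution p(x, y) from the worked example above.
p_xy = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginalize to obtain the individual distributions p(x) and p(y).
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    """Base-2 pointwise mutual information of the outcome pair (x, y)."""
    return math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

# Mutual information: expected value of PMI under the joint distribution.
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())

for x in (0, 1):
    for y in (0, 1):
        print(f"pmi(x={x};y={y}) = {pmi(x, y):.6f}")
print(f"I(X;Y) = {mi:.9f}")
```

Note that two of the four PMI values are negative even though the resulting mutual information is positive, as the quote above points out.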

### 2015

• (Levy et al., 2015) ⇒ Omer Levy, Yoav Goldberg, and Ido Dagan. (2015). “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” In: Transactions of the Association for Computational Linguistics, 3.
• QUOTE: … A popular measure of this association is pointwise mutual information (PMI) (Church and Hanks, 1990). PMI is defined as the log ratio between w and c’s joint probability and the product of their marginal probabilities, which can be estimated by: $\displaystyle{ PMI(w, c) = \log \frac{\hat{P}(w,c)}{\hat{P}(w) \hat{P}(c)} = \log \frac{\#(w,c) \cdot |D|}{\#(w) \cdot \#(c)} }$ The rows of $\displaystyle{ M^{PMI} }$ contain many entries of word-context pairs (w, c) that were never observed in the corpus, for which PMI(w, c) = log 0 = −∞. A common approach is thus to replace the $\displaystyle{ M^{PMI} }$ matrix with $\displaystyle{ M^{PMI}_0 }$, in which PMI(w, c) = 0 in cases where #(w, c) = 0. A more consistent approach is to use positive PMI (PPMI), in which all negative values are replaced by 0 ...
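The count-based estimate in this quote can be sketched as follows (the corpus of (w, c) pairs is an invented toy example; `ppmi` implements the positive-PMI variant described above):

```python
import math
from collections import Counter

# Toy collection D of observed (word, context) pairs.
D = [("cat", "pet"), ("cat", "pet"), ("cat", "fur"),
     ("dog", "pet"), ("dog", "bone"), ("car", "road")]

pair_counts = Counter(D)             # #(w, c)
w_counts = Counter(w for w, _ in D)  # #(w)
c_counts = Counter(c for _, c in D)  # #(c)

def pmi(w, c):
    """PMI(w, c) = log( #(w,c) * |D| / (#(w) * #(c)) ); -inf for unseen pairs."""
    if pair_counts[(w, c)] == 0:
        return float("-inf")
    return math.log(pair_counts[(w, c)] * len(D) / (w_counts[w] * c_counts[c]))

def ppmi(w, c):
    """Positive PMI: negative values (including -inf) are clipped to 0."""
    return max(pmi(w, c), 0.0)

print(pmi("cat", "pet"))   # observed pair: finite PMI
print(pmi("car", "pet"))   # unseen pair: -inf
print(ppmi("car", "pet"))  # PPMI replaces it with 0
```

Clipping at zero is what makes the PPMI matrix sparse and well-defined even when most word-context pairs never co-occur.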

### 2011

• (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information
• Pointwise mutual information (PMI), or specific mutual information, is a measure of association used in information theory and statistics.

The PMI of a pair of outcomes $\displaystyle{ x }$ and $\displaystyle{ y }$ belonging to discrete random variables $\displaystyle{ X }$ and $\displaystyle{ Y }$ quantifies the discrepancy between the probability of their coincidence given their joint distribution and the probability of their coincidence given only their individual distributions, assuming independence. Mathematically:
$\displaystyle{ SI(x,y) = \log\frac{p(x,y)}{p(x)p(y)}. }$

The mutual information (MI) of the random variables $\displaystyle{ X }$ and $\displaystyle{ Y }$ is the expected value of the PMI over all possible outcomes.

The measure is symmetric ($\displaystyle{ SI(x,y)=SI(y,x) }$). It can take on both negative and positive values but is zero if $\displaystyle{ X }$ and $\displaystyle{ Y }$ are independent, and equal to $\displaystyle{ -\log(p(x)) }$ if $\displaystyle{ X }$ and $\displaystyle{ Y }$ are perfectly associated. Finally, $\displaystyle{ SI(x,y) }$ will increase if $\displaystyle{ p(x|y) }$ is fixed but $\displaystyle{ p(x) }$ decreases.

### 2006

• http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/MI/pmi.pm
• Assume that the frequency count data associated with a bigram <word1><word2> is stored in a 2x2 contingency table: $\displaystyle{ \begin{array}{c|cc|c} & word_2 & \neg word_2 & \\ \hline word_1 & n_{11} & n_{12} & n_{1p} \\ \neg word_1 & n_{21} & n_{22} & n_{2p} \\ \hline & n_{p1} & n_{p2} & n_{pp} \end{array} }$ where $\displaystyle{ n_{11} }$ is the number of times <word1><word2> occur together, $\displaystyle{ n_{12} }$ is the number of times <word1> occurs with some word other than <word2>, and $\displaystyle{ n_{1p} }$ is the total number of times <word1> occurs as the first word in a bigram.

The expected values for the internal cells are calculated by taking the product of their associated marginals and dividing by the sample size, for example: $\displaystyle{ m_{11} = \frac {n_{p1} n_{1p}}{n_{pp}} }$

Pointwise Mutual Information (pmi) is defined as the log of the deviation between the observed frequency of a bigram ($\displaystyle{ n_{11} }$) and the probability of that bigram if it were independent ($\displaystyle{ m_{11} }$): $\displaystyle{ PMI = \log \Bigl( \frac{n_{11}}{m_{11}} \Bigr) }$ Pointwise Mutual Information tends to overestimate the association of bigrams with low observed frequency counts. To prevent this, a variation of pmi is sometimes used which increases the influence of the observed frequency: $\displaystyle{ PMI = \log \Bigl( \frac{(n_{11})^{exp}}{m_{11}} \Bigr) }$ The `$exp` parameter is 1 by default, so by default the measure computes the Pointwise Mutual Information for the given bigram. To use a variation of the measure, users can pass the `$exp` parameter using the `--pmi_exp` command line option in `statistic.pl` or by passing `$exp` to the `initializeStatistic()` method from their program.
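The calculation described above can be sketched in Python (the contingency-table counts are invented for illustration; Text::NSP itself is a Perl module, and the `exp` argument here mirrors its `$exp` parameter):

```python
import math

# Invented 2x2 contingency-table counts for the bigram <word1><word2>.
n11 = 10     # word1 followed by word2
n12 = 20     # word1 followed by some word other than word2
n21 = 30     # some word other than word1, followed by word2
n22 = 940    # neither word1 nor word2

n1p = n11 + n12              # word1 as the first word in any bigram
np1 = n11 + n21              # word2 as the second word in any bigram
npp = n11 + n12 + n21 + n22  # total bigrams (sample size)

# Expected count under independence: product of marginals over sample size.
m11 = n1p * np1 / npp

def pmi(exp=1):
    """log( n11**exp / m11 ); exp > 1 boosts the observed frequency."""
    return math.log(n11 ** exp / m11)

print(pmi())       # plain pointwise mutual information
print(pmi(exp=2))  # frequency-weighted variant
```

Raising $\displaystyle{ n_{11} }$ to a power greater than 1 inflates the numerator for frequent bigrams, which counteracts PMI's bias toward rare pairs.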