# Pointwise Mutual Information (PMI) Measure

A Pointwise Mutual Information (PMI) Measure is a measure of association for a pair of outcomes $x,y$ of two discrete random variables, based on the ratio between their co-occurrence probability $p(x,y)$ and the probability of observing them together by chance under independence, $p(x)p(y)$.

## References

### 2016

• (Wikipedia, 2016) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information#Definition Retrieved:2016-2-10.
• The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence. Mathematically: $\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.$ The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution $p(x,y)$).

The measure is symmetric ($\operatorname{pmi}(x;y)=\operatorname{pmi}(y;x)$). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected value over all joint events (the MI) is non-negative. PMI is maximized when X and Y are perfectly associated (i.e. $p(x|y)$ or $p(y|x)=1$), yielding the following bounds: $-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right].$
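To see the upper bound: when $p(x|y)=1$, $\operatorname{pmi}(x;y) = \log\frac{p(x|y)}{p(x)} = -\log p(x)$, and symmetrically $\operatorname{pmi}(x;y) = -\log p(y)$ when $p(y|x)=1$; taking the tighter of the two gives the $\min$ above.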

Finally, $\operatorname{pmi}(x;y)$ will increase if $p(x|y)$ is fixed but $p(x)$ decreases.
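For instance, with base-2 logarithms: if, say, $p(x|y)=0.5$ and $p(x)=0.25$, then $\operatorname{pmi}(x;y)=\log_2(0.5/0.25)=1$ bit; halving $p(x)$ to $0.125$ while keeping $p(x|y)$ fixed raises it to $\log_2(0.5/0.125)=2$ bits.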

Here is an example to illustrate. The joint distribution $p(x,y)$ of two binary random variables:

| $p(x,y)$ | $y = 0$ | $y = 1$ |
|----------|---------|---------|
| $x = 0$  | 0.1     | 0.7     |
| $x = 1$  | 0.15    | 0.05    |

Using this table we can marginalize to get the following additional table for the individual distributions:

|        | $= 0$ | $= 1$ |
|--------|-------|-------|
| $p(x)$ | 0.8   | 0.2   |
| $p(y)$ | 0.25  | 0.75  |

With this example, we can compute four values for $\operatorname{pmi}(x;y)$. Using base-2 logarithms:

$\operatorname{pmi}(x=0;y=0) = \log_2\frac{0.1}{0.8 \times 0.25} = -1$

$\operatorname{pmi}(x=0;y=1) = \log_2\frac{0.7}{0.8 \times 0.75} \approx 0.222392$

$\operatorname{pmi}(x=1;y=0) = \log_2\frac{0.15}{0.2 \times 0.25} \approx 1.584963$

$\operatorname{pmi}(x=1;y=1) = \log_2\frac{0.05}{0.2 \times 0.75} \approx -1.584963$

(For reference, the mutual information $\operatorname{I}(X;Y)$ would then be 0.214170945.)
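As a quick check, here is a minimal Python sketch (not part of the quoted article) that recomputes the four PMI values and the mutual information from the joint table above:

```python
from math import log2

# Joint probabilities p(x, y) from the example table above.
p_xy = {(0, 0): 0.1, (0, 1): 0.7,
        (1, 0): 0.15, (1, 1): 0.05}

# Marginalize to get the individual distributions p(x) and p(y).
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

mi = 0.0
for (x, y), p in sorted(p_xy.items()):
    pmi = log2(p / (p_x[x] * p_y[y]))  # pointwise mutual information, base 2
    mi += p * pmi                      # MI is the expectation of PMI under p(x,y)
    print(f"pmi(x={x}; y={y}) = {pmi:+.6f}")

print(f"I(X;Y) = {mi:.9f}")  # prints 0.214170945
```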

### 2011

• (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Pointwise_mutual_information
• Pointwise mutual information (PMI), or specific mutual information, is a measure of association used in information theory and statistics.

The PMI of a pair of outcomes $x$ and $y$ belonging to discrete random variables $X$ and $Y$ quantifies the discrepancy between the probability of their coincidence given their joint distribution and the probability of their coincidence given only their individual distributions, assuming independence. Mathematically:
$SI(x,y) = \log\frac{p(x,y)}{p(x)p(y)}.$

The mutual information (MI) of the random variables $X$ and $Y$ is the expected value of the PMI over all possible outcomes.

The measure is symmetric ($SI(x,y)=SI(y,x)$). It can take on both negative and positive values but is zero if $X$ and $Y$ are independent, and equal to $-\log(p(x))$ if $X$ and $Y$ are perfectly associated. Finally, $SI(x,y)$ will increase if $p(x|y)$ is fixed but $p(x)$ decreases.

### 2006

• http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/MI/pmi.pm
• Assume that the frequency count data associated with a bigram <word1><word2> is stored in a 2x2 contingency table: $\begin{array}{c|cc|c} & word_2 & \neg word_2 & \\ \hline word_1 & n_{11} & n_{12} & n_{1p} \\ \neg word_1 & n_{21} & n_{22} & n_{2p} \\ \hline & n_{p1} & n_{p2} & n_{pp} \end{array}$ where $n_{11}$ is the number of times <word1><word2> occur together, $n_{12}$ is the number of times <word1> occurs with some word other than word2, and $n_{1p}$ is the total number of times word1 occurs as the first word in a bigram.

The expected values for the internal cells are calculated by taking the product of their associated marginals and dividing by the sample size, for example: $m_{11} = \frac {n_{p1} n_{1p}}{n_{pp}}$

Pointwise Mutual Information (pmi) is defined as the log of the ratio between the observed frequency of a bigram ($n_{11}$) and its expected frequency if the two words were independent ($m_{11}$): $PMI = \log \Bigl( \frac{n_{11}}{m_{11}} \Bigr)$

The Pointwise Mutual Information tends to overestimate bigrams with low observed frequency counts. To prevent this, a variation of pmi is sometimes used which increases the influence of the observed frequency: $PMI = \log \Bigl( \frac{(n_{11})^{exp}}{m_{11}} \Bigr)$

The `$exp` parameter is 1 by default, so by default the measure will compute the Pointwise Mutual Information for the given bigram. To use a variation of the measure, users can set `$exp` via the `--pmi_exp` command-line option in `statistic.pl` or by passing `$exp` to the `initializeStatistic()` method from their program.
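To make the contingency-table recipe concrete, here is a minimal sketch in Python (so not the Text::NSP Perl code itself); the function name `pmi_from_counts` and the example counts are invented for illustration, while the computation follows the $n_{11}$/$m_{11}$ formulas above:

```python
from math import log

def pmi_from_counts(n11, n12, n21, n22, exp=1):
    """PMI of a bigram <word1><word2> from its 2x2 contingency counts."""
    n1p = n11 + n12               # word1 as the first word of any bigram
    np1 = n11 + n21               # word2 as the second word of any bigram
    npp = n11 + n12 + n21 + n22   # total number of bigrams in the sample
    m11 = n1p * np1 / npp         # expected count of <word1><word2> under independence
    # exp=1 gives plain PMI; exp > 1 boosts the influence of the observed
    # count, mirroring the module's $exp parameter.
    return log(n11 ** exp / m11)

# Invented counts: the bigram occurs 10 times in a sample of 1000 bigrams.
print(pmi_from_counts(n11=10, n12=20, n21=30, n22=940))  # log(10 / 1.2) ≈ 2.12
```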