# Normalized Mutual Information Metric

A normalized mutual information metric is a mutual information metric whose range is normalized to [math][0,1][/math].

**AKA:** Normalized Mutual Information, NMI.

**Context:**
- It can be defined as: [math]NMI(X,Y) = \frac{I(X,Y)}{\sqrt{H(X)H(Y)}}[/math], where [math]I()[/math] is the mutual information metric and [math]H()[/math] is the entropy metric.
- It can have the property: [math]NMI(X,X) = 1[/math].
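As an illustration, the geometric-mean definition above can be computed directly from two label sequences. This is a minimal sketch; the helper names `entropy`, `mutual_information`, and `nmi` are illustrative, not from any cited source:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(X) in nats of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """Mutual information I(X; Y) in nats between two equal-length label sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # I(X; Y) = sum_ab p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def nmi(x, y):
    """NMI(X, Y) = I(X, Y) / sqrt(H(X) H(Y)), in [0, 1]."""
    denom = math.sqrt(entropy(x) * entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0
```

For example, `nmi(x, x)` returns 1.0 for any labeling `x` with more than one cluster, matching the property above.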

**Counter-Example(s):**
- an F-Score Metric.

**See:** Text Clustering Task, Cluster Purity Metric, Normalized Function, Geometrical Mean.

## References

### 2009

- (Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
- QUOTE: Cluster quality is evaluated by three metrics, purity [14], F-score [10], and normalized mutual information (NMI) [15]. … NMI is an increasingly popular measure of clustering quality. It is defined as the mutual information between the cluster assignments and a pre-existing labeling of the dataset normalized by the arithmetic mean of the maximum possible entropies of the empirical marginals, i.e.
[math]NMI(X,Y) = \frac{I(X;Y)}{(\log k + \log c)/2}[/math], where [math]X[/math] is a random variable for cluster assignments, [math]Y[/math] is a random variable for the pre-existing labels on the same data, [math]k[/math] is the number of clusters, and [math]c[/math] is the number of pre-existing classes. A merit of NMI is that it does not necessarily increase when the number of clusters increases. All the three metrics range from 0 to 1, and the higher their value, the better the clustering quality is.
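The arithmetic-mean normalization described in this quote can be sketched as follows. This is a hypothetical helper (not code from the cited paper), assuming equal-length label sequences and more than one cluster and class, so the denominator is nonzero:

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information I(X; Y) in nats between two equal-length label sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def nmi_arithmetic(x, y):
    """NMI(X, Y) = I(X; Y) / ((log k + log c) / 2),
    where k is the number of clusters and c the number of classes."""
    k, c = len(set(x)), len(set(y))
    return mutual_information(x, y) / ((math.log(k) + math.log(c)) / 2)
```

Because the denominator is the maximum possible entropy rather than the actual entropies, this variant only reaches 1 when the clustering matches the labels and both are perfectly balanced.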


### 2002

- (Strehl & Ghosh, 2002b) ⇒ Alexander Strehl and Joydeep Ghosh. (2002). “Cluster Ensembles: a knowledge reuse framework for combining partitions.” In: Journal of Machine Learning Research, 3.
- QUOTE: Since [math]H(X) = I(X,X)[/math], we prefer the geometric mean because of the analogy with a normalized inner product in Hilbert space. Thus the normalized mutual information (NMI) used is: [math]NMI(X,Y) = \frac{I(X,Y)}{\sqrt{H(X)H(Y)}}[/math](2) … One can see that [math]NMI(X,X) = 1[/math], as desired. …

Our earlier work (Strehl and Ghosh, 2002a) used a slightly different normalization as only balanced clusters were desired: [math]NMI(X,Y) = \frac{2 \cdot I(X,Y)}{\log k^{(a)} + \log k^{(b)}}[/math], i.e., using arithmetic mean and assuming maximum entropy caused by perfect balancing.
