Normalized Mutual Information Metric

A normalized mutual information metric is a mutual information metric whose range is normalized to [math]\displaystyle{ [0,1] }[/math].

AKA: Normalized Mutual Information, NMI.
Context:
- It can be defined as: [math]\displaystyle{ NMI(X,Y) = \frac{I(X,Y)}{\sqrt{H(X)H(Y)}} }[/math], where [math]\displaystyle{ I() }[/math] is the mutual information metric and [math]\displaystyle{ H() }[/math] is the entropy metric.
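- It can have the property [math]\displaystyle{ NMI(X,X) = 1 }[/math] (under the geometric-mean normalization above, provided [math]\displaystyle{ H(X) \gt 0 }[/math]).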
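
The geometric-mean definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`entropy`, `mutual_information`, `nmi`) are hypothetical, probabilities are estimated from empirical label counts, natural logarithms are used, and returning 0 when either entropy is zero is an assumption made here to avoid division by zero.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(X) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X,Y) of two aligned label sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def nmi(xs, ys):
    """NMI(X,Y) = I(X,Y) / sqrt(H(X) H(Y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0  # assumption: treat the degenerate single-label case as 0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


clusters = [0, 0, 1, 1]
print(nmi(clusters, clusters))       # identical partitions
print(nmi(clusters, [1, 1, 0, 0]))   # same partition, labels swapped
print(nmi(clusters, [0, 1, 0, 1]))   # statistically independent partition
```

Because the score depends only on how the items are grouped, relabeling the clusters leaves it unchanged.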
Counter-Example(s):
- an F-Score Metric.
See: Text Clustering Task, Cluster Purity Metric, Normalized Function, Geometrical Mean.

References

(Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
- QUOTE: Cluster quality is evaluated by three metrics, purity [14], F-score [10], and normalized mutual information (NMI) [15]. … NMI is an increasingly popular measure of clustering quality. It is defined as the mutual information between the cluster assignments and a pre-existing labeling of the dataset normalized by the arithmetic mean of the maximum possible entropies of the empirical marginals, i.e. [math]\displaystyle{ NMI(X,Y) = \frac{I(X;Y)}{(\log k + \log c)/2} }[/math], where [math]\displaystyle{ X }[/math] is a random variable for cluster assignments, [math]\displaystyle{ Y }[/math] is a random variable for the pre-existing labels on the same data, [math]\displaystyle{ k }[/math] is the number of clusters, and [math]\displaystyle{ c }[/math] is the number of pre-existing classes. A merit of NMI is that it does not necessarily increase when the number of clusters increases. All the three metrics range from 0 to 1, and the higher their value, the better the clustering quality is.

(Strehl & Ghosh, 2002b) ⇒ Alexander Strehl and Joydeep Ghosh. (2002). “Cluster Ensembles: a knowledge reuse framework for combining partitions.” In: Journal of Machine Learning Research, 3.
- QUOTE: Since [math]\displaystyle{ H(X) = I(X,X) }[/math], we prefer the geometric mean because of the analogy with a normalized inner product in Hilbert space. Thus the normalized mutual information (NMI) used is: [math]\displaystyle{ NMI(X,Y) = \frac{I(X,Y)}{\sqrt{H(X)H(Y)}} }[/math] (2) … One can see that [math]\displaystyle{ NMI(X,X) = 1 }[/math], as desired. …
  Our earlier work (Strehl and Ghosh, 2002a) used a slightly different normalization as only balanced clusters were desired: [math]\displaystyle{ NMI(X,Y) = 2 \cdot I(X,Y) / (\log k^{(a)} + \log k^{(b)}) }[/math], i.e., using arithmetic mean and assuming maximum entropy caused by perfect balancing.
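
The difference between the two normalizations quoted above can be illustrated with a small sketch. It assumes empirical label counts and natural logarithms; the function names `nmi_geometric` and `nmi_max_entropy` are hypothetical, and returning 0 for a zero denominator is an assumption. It uses the identity [math]\displaystyle{ H(X) = I(X,X) }[/math] from the quote to obtain the entropies.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X,Y) of two aligned label sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def nmi_geometric(xs, ys):
    """I(X,Y) / sqrt(H(X) H(Y)), with H(X) computed as I(X,X)."""
    denom = math.sqrt(mutual_information(xs, xs) * mutual_information(ys, ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def nmi_max_entropy(xs, ys):
    """2 I(X,Y) / (log k + log c): arithmetic mean of the maximum entropies."""
    k, c = len(set(xs)), len(set(ys))
    denom = math.log(k) + math.log(c)
    return 2 * mutual_information(xs, ys) / denom if denom > 0 else 0.0


balanced = [0, 0, 1, 1]
skewed = [0, 0, 0, 1]
print(nmi_geometric(skewed, skewed), nmi_max_entropy(skewed, skewed))
print(nmi_geometric(balanced, balanced), nmi_max_entropy(balanced, balanced))
```

For perfectly balanced clusters the two variants agree. For a skewed partition compared with itself, the maximum-entropy variant falls below 1 because [math]\displaystyle{ H(X) \lt \log k }[/math], while the geometric-mean variant still gives 1.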