Jensen-Shannon Divergence

A Jensen-Shannon Divergence is a symmetric divergence measure that quantifies how similar two probability distributions are.



    • In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as the information radius (IRad)[1] or the total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with the notable (and useful) differences that it is symmetric and always takes a finite value. The square root of the Jensen–Shannon divergence is a metric.[3][4]

      Consider the set [math]M_+^1(A)[/math] of probability distributions, where A is a set provided with some σ-algebra of measurable subsets. In particular, we can take A to be a finite or countable set with all subsets being measurable.

      The Jensen–Shannon divergence (JSD) [math]M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty)[/math] is a symmetrized and smoothed version of the Kullback–Leibler divergence [math]D(P \parallel Q)[/math]. It is defined by

      :[math]JSD(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M),[/math]

      where [math]M=\frac{1}{2}(P+Q)[/math].

      If A is countable, a more general definition, allowing for the comparison of more than two distributions, is

      :[math]JSD(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i),[/math]

      where [math]\pi_1, \pi_2, \ldots, \pi_n[/math] are the weights for the probability distributions [math]P_1, P_2, \ldots, P_n[/math] and [math]H(P)[/math] is the Shannon entropy for distribution [math]P[/math]. For the two-distribution case described above,

      :[math]P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2}.[/math]
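
The two definitions above can be checked against each other numerically. The following is a minimal sketch for discrete distributions over a finite set, given as equal-length lists of probabilities. The function names (`kl_divergence`, `entropy`, `jsd`, `jsd_general`) and the choice of base-2 logarithms are assumptions made for illustration; the text above does not fix a logarithm base.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits (base-2 log is an assumption)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # convention: 0 * log(0 / q) = 0
        if qi == 0:
            return math.inf   # P puts mass where Q has none
        total += pi * math.log2(pi / qi)
    return total

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def jsd(p, q):
    """Two-distribution form: 1/2 D(P || M) + 1/2 D(Q || M), with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jsd_general(dists, weights):
    """Entropy form: H(sum_i pi_i P_i) - sum_i pi_i H(P_i)."""
    size = len(dists[0])
    mixture = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))

# Distributions with disjoint support: KL is infinite, JSD stays finite.
p = [1.0, 0.0]
q = [0.0, 1.0]
print(kl_divergence(p, q))                 # inf
print(jsd(p, q))                           # 1.0
print(jsd_general([p, q], [0.5, 0.5]))     # 1.0
```

Because M assigns positive probability wherever P or Q does, the terms D(P ∥ M) and D(Q ∥ M) never divide by zero, which is why the JSD remains finite even when D(P ∥ Q) does not. With equal weights of 1/2, the entropy form reduces to the two-distribution form.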

  1. Schütze, Hinrich; Manning, Christopher D. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304. ISBN 0-262-13360-1.
  2. Dagan, Ido; Lee, Lillian; Pereira, Fernando (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: pp. 56–63. Retrieved 2008-03-09.
  3. Endres, D. M.; Schindelin, J. E. (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): pp. 1858–1860. doi:10.1109/TIT.2003.813506.
  4. Österreicher, F.; Vajda, I. (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): pp. 639–653. doi:10.1007/BF02517812.