# Jensen-Shannon Divergence

## References

### 2011

• http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
• In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always a finite value. The square root of the Jensen–Shannon divergence is a metric.[3][4]

Consider the set $M_+^1(A)$ of probability distributions, where $A$ is a set provided with some σ-algebra of measurable subsets. In particular, we can take $A$ to be a finite or countable set with all subsets being measurable. The Jensen–Shannon divergence (JSD) $M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty)$ is a symmetrized and smoothed version of the Kullback–Leibler divergence $D(P \parallel Q)$. It is defined by

$JSD(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M)$

where $M=\frac{1}{2}(P+Q)$.

If $A$ is countable, a more general definition, allowing for the comparison of more than two distributions, is

$JSD(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i)$

where $\pi_1, \pi_2, \ldots, \pi_n$ are the weights for the probability distributions $P_1, P_2, \ldots, P_n$ and $H(P)$ is the Shannon entropy for distribution $P$. For the two-distribution case described above,

$P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2}.$
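The two definitions above can be checked against each other numerically. The following is a minimal sketch, assuming finite discrete distributions given as lists of probabilities and natural logarithms (nats). The function names `entropy`, `kl_divergence`, `jsd`, and `jsd_general` are illustrative choices, not taken from the source.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats.

    Returns infinity when P puts mass on an outcome where Q has none.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def jsd(p, q):
    """Two-distribution JSD: 0.5 * D(P || M) + 0.5 * D(Q || M), M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jsd_general(dists, weights):
    """General JSD: H(sum_i pi_i P_i) - sum_i pi_i H(P_i)."""
    n_outcomes = len(dists[0])
    mixture = [sum(w * d[k] for w, d in zip(weights, dists))
               for k in range(n_outcomes)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))

# Example distributions (assumed for illustration) with partly disjoint support.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

print(kl_divergence(p, q))                 # infinite: q has no mass where p does
print(jsd(p, q))                           # finite
print(jsd_general([p, q], [0.5, 0.5]))     # agrees with jsd(p, q) up to rounding
```

In this example the Kullback–Leibler divergence is infinite because `q` assigns zero probability to an outcome that `p` supports, while the JSD stays finite because each distribution is compared against the mixture $M$, which has mass wherever $P$ or $Q$ does.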

1. Hinrich Schütze; Christopher D. Manning (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304. ISBN 0-262-13360-1.
2. Dagan, Ido; Lillian Lee, Fernando Pereira (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: pp. 56–63. Retrieved 2008-03-09.
3. Endres, D. M.; J. E. Schindelin (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): pp. 1858–1860. doi:10.1109/TIT.2003.813506.
4. Österreicher, F.; I. Vajda (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): pp. 639–653. doi:10.1007/BF02517812.