# Jensen-Shannon Divergence (JSD) Metric


## References

### 2011

• http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
• In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it is always a finite value. The square root of the Jensen–Shannon divergence is a metric.[3][4]

Consider the set $\displaystyle{ M_+^1(A) }$ of probability distributions, where A is a set provided with some σ-algebra of measurable subsets. In particular, we can take A to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) $\displaystyle{ M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty{}) }$ is a symmetrized and smoothed version of the Kullback–Leibler divergence $\displaystyle{ D(P \parallel Q) }$. It is defined by

$\displaystyle{ JSD(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M) }$

where

$\displaystyle{ M=\frac{1}{2}(P+Q) }$

If A is countable, a more general definition, allowing for the comparison of more than two distributions, is:

$\displaystyle{ JSD(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i) }$

where $\displaystyle{ \pi_1, \pi_2, \ldots, \pi_n }$ are the weights for the probability distributions $\displaystyle{ P_1, P_2, \ldots, P_n }$ and $\displaystyle{ H(P) }$ is the Shannon entropy for distribution $\displaystyle{ P }$. For the two-distribution case described above,

$\displaystyle{ P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2} }$
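The two definitions above can be illustrated with a minimal Python sketch for distributions over a finite set. The function names (`kl_divergence`, `shannon_entropy`, `jsd_pair`, `jsd_general`) are hypothetical and not part of the quoted source. The base-2 logarithm is also an assumption, since the excerpt does not fix a base; it gives results in bits.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits (base 2 is an assumption).

    Terms with p_i == 0 contribute 0 by the usual convention.
    Raises ZeroDivisionError if some q_i == 0 while p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def shannon_entropy(p):
    """Shannon entropy H(P) in bits, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def jsd_pair(p, q):
    """JSD(P || Q) = 1/2 D(P || M) + 1/2 D(Q || M), with M = (P + Q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    # M is positive wherever P or Q is positive, so both KL terms are finite.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_general(dists, weights):
    """JSD(P_1, ..., P_n) = H(sum_i pi_i P_i) - sum_i pi_i H(P_i)."""
    size = len(dists[0])
    mixture = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]
    weighted_entropy = sum(w * shannon_entropy(d) for w, d in zip(weights, dists))
    return shannon_entropy(mixture) - weighted_entropy


# Illustrative distributions, chosen so that P and Q have different supports.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]

two_dist = jsd_pair(p, q)
general = jsd_general([p, q], [0.5, 0.5])
```

In this example `D(P || Q)` itself would be infinite, because Q assigns zero probability to an outcome that P does not, yet both `two_dist` and `general` are finite. With equal weights of 1/2, the general form agrees with the two-distribution form.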

1. Hinrich Schütze; Christopher D. Manning (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304. ISBN 0-262-13360-1.
2. Dagan, Ido; Lillian Lee; Fernando Pereira (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: pp. 56–63. Retrieved 2008-03-09.
3. Endres, D. M.; J. E. Schindelin (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): pp. 1858–1860. doi:10.1109/TIT.2003.813506.
4. Österreicher, F.; I. Vajda (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): pp. 639–653. doi:10.1007/BF02517812.