Kullback-Leibler (KL) Divergence Metric

(Redirected from Information Gain Measure)
Jump to: navigation, search

A Kullback-Leibler (KL) Divergence Metric is a non-symmetric similarity measure of the difference between two probability distributions [math]P[/math] and [math]Q[/math].

See: Jensen-Shannon Divergence, Information Gain Impurity Function, AIC Statistic.



  • http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    • In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P. Although it is often intuited as a metric or distance, the KL divergence is not a true metric — for example, it is not symmetric: the KL from P to Q is generally not the same as the KL from Q to P. KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence in calculus. However, the KL divergence can be derived from the Bregman divergence.

      For probability distributions P and Q of a discrete random variable their K–L divergence is defined to be :[math]D_{\mathrm{KL}}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}. \![/math] In words, it is the average of the logarithmic difference between the probabilities P and Q, where the average is taken using the probabilities P. The K-L divergence is only defined if P and Q both sum to 1 and if [math]Q(i)\gt 0[/math] for any i such that [math]P(i)\gt 0[/math]. If the quantity [math]0 \log 0[/math] appears in the formula, it is interpreted as zero. For distributions P and Q of a continuous random variable, KL-divergence is defined to be the integral:[4] :[math]D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \, {\rm d}x, \![/math] where p and q denote the densities of P and Q.

      More generally, if P and Q are probability measures over a set X, and Q is absolutely continuous with respect to P, then the Kullback–Leibler divergence from P to Q is defined as :[math] D_{\mathrm{KL}}(P\|Q) = -\int_X \log \frac{{\rm d}Q}{{\rm d}P} \,{\rm d}P, \![/math] where [math]\frac{{\rm d}Q}{{\rm d}P} [/math] is the Radon–Nikodym derivative of Q with respect to P, and provided the expression on the right-hand side exists.

  1. Kullback, S.; Leibler, R.A. (1951). "On Information and Sufficiency". Annals of Mathematical Statistics 22 (1): 79–86. doi:10.1214/aoms/1177729694. MR39968. 
  2. S. Kullback (1959) Information theory and statistics (John Wiley and Sons, NY).
  3. Kullback, S.; Burnham, K. P.; Laubscher, N. F.; Dallal, G. E.; Wilkinson, L.; Morrison, D. F.; Loyer, M. W.; Eisenberg, B. et al. (1987). "Letter to the Editor: The Kullback–Leibler distance". The American Statistician 41 (4): 340–341. JSTOR 2684769. 
  4. C. Bishop (2006). Pattern Recognition and Machine Learning. p. 55.