Information Entropy Metric

(Redirected from Shannon entropy)
Jump to navigation Jump to search

An Information Entropy Metric is a discrete random variable metric of the uncertainty associated with random variable [math]\displaystyle{ X }[/math].




  1. Ihara, Shunsuke (1993). Information theory for continuous systems. World Scientific. p. 2. ISBN 978-981-02-0985-8. 
  2. In this context, a 'message' means a specific realization of the random variable.
  3. Brillouin, Léon (2004). Science & Information Theory. Dover Publications. p. 293. ISBN 978-0-486-43918-1. 
  4. Shannon, Claude E. (July/October 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379–423.  (PDF)
  5. Goise, Francois & Olla, Stefano (2008). Entropy methods for the Boltzmann equation: lectures from a special semester at the Centre Émile Borel, Institut H. Poincaré, Paris, 2001. Springer. p. 14. ISBN 978-3-540-73704-9. 
  6. Schneier, B: Applied Cryptography, Second edition, page 234. John Wiley and Sons.
  7. Shannon, Claude E.: Prediction and entropy of printed English, The Bell System Technical Journal, 30:50–64, January 1951.
  8. Borda, Monica (2011). Fundamentals in Information Theory and Coding. Springer. p. 11. ISBN 978-3-642-20346-6. 
  9. Han, Te Sun & Kobayashi, Kingo (2002). Mathematics of Information and Coding. American Mathematical Society. pp. 19–20. ISBN 978-0-8218-4256-0. 
  10. Schneider, T.D, Information theory primer with an appendix on logarithms, National Cancer Institute, 14 April 2007.


    • In cryptanalysis, entropy is often roughly used as a measure of the unpredictability of a cryptographic key. For example, a 128-bit key that is randomly generated has 128 bits of entropy. It takes (on average) [math]\displaystyle{ 2^{128-1} }[/math] guesses to break by brute force. If the key's first digit is 0, and the others random, then the entropy is 127 bits, and it takes (on average) [math]\displaystyle{ 2^{127-1} }[/math] guesses.

      However, this measure fails if the possible keys are not of equal probability[clarification needed]. If the key is half the time "password" and half the time a true random 128-bit key, then the entropy is approximately 65 bits. Yet half the time the key may be guessed on the first try, if your first guess is "password", and on average, it takes around [math]\displaystyle{ 2^{126} }[/math] guesses (not [math]\displaystyle{ 2^{65-1} }[/math]) to break this password.

      Similarly, consider a 1000000-digit binary one-time pad. If the pad has 1000000 bits of entropy, it is perfect. If the pad has 999999 bits of entropy, evenly distributed (each individual bit of the pad having 0.999999 bits of entropy) it may still be considered very good. But if the pad has 999999 bits of entropy, where the first digit is fixed and the remaining 999999 digits are perfectly random, then the first digit of the ciphertext will not be encrypted at all.


  • (Wilson, 2008a) ⇒ Bill Wilson. (2008). “The Machine Learning Dictionary for COMP9414." University of New South Wales, Australia.
    • QUOTE: entropy: For our purposes, the entropy measure –Σi pilog2pi gives us the average amount of information in bits in some attribute of an instance. The information referred to is information about what class the instance belongs to, and pi is the probability that an instance belongs to class i. The rationale for this is as follows: –log2(p) is the amount of information in bits associated with an event of probability p - for example, with an event of probability ½, like flipping a fair coin, log2((p) is –log2(½) = 1, so there is one bit of information. This should coincide with our intuition of what a bit means (if we have one). If there is a range of possible outcomes with associated probabilities, then to work out the average number of bits, we need to multiply the number of bits for each outcome (–log2(p)) by the probability p and sum over all the outcomes. This is where the formula comes from. Entropy is used in the ID3 decision tree induction algorithm.