Zipf's Law


A Zipf's Law is an Empirical Law that describes settings in which an Empirical Distribution follows a Power Function.
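As a minimal sketch of that power-function form (using standard Zipfian notation rather than symbols from the sources below): if k is an item's rank, s is the exponent (close to 1 in Zipf's original formulation), and N is the number of distinct items, the normalized rank-frequency distribution can be written as:

```latex
f(k; s, N) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}
```

With s = 1 this reduces to the statement quoted in the references below: frequency is inversely proportional to rank.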



References

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Zipf%27s_law
    • Zipf's law, an empirical law formulated using mathematical statistics, refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions. The law is named after the linguist George Kingsley Zipf (pronounced /zɪf/) who first proposed it (Zipf 1935, 1949), though J.B. Estoup appears to have noticed the regularity before Zipf.
      Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852 occurrences). Only 135 vocabulary items are needed to account for half the Brown Corpus.
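The rank-frequency claim quoted above can be checked empirically. The following is a minimal sketch in Python; the file name corpus.txt and the simple regex tokenizer are illustrative assumptions, not part of the cited sources.

```python
from collections import Counter
import re

def rank_frequency(text, top_n=20):
    """Count word frequencies and compare each rank's observed count
    against the Zipfian prediction freq(rank) ~ freq(1) / rank."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    ranked = counts.most_common(top_n)
    f1 = ranked[0][1]  # count of the top-ranked word
    for rank, (word, freq) in enumerate(ranked, start=1):
        predicted = f1 / rank  # Zipf's law with exponent s = 1
        print(f"{rank:>3}  {word:<12} observed={freq:>8} "
              f"({freq/total:6.2%})  zipf-predicted={predicted:10.1f}")

if __name__ == "__main__":
    # 'corpus.txt' is a hypothetical plain-text corpus; any large English
    # text (e.g., a public-domain novel) should show the rank-frequency decay.
    with open("corpus.txt", encoding="utf-8") as f:
        rank_frequency(f.read())
```

Run on a corpus comparable to the Brown Corpus, the output should roughly reproduce the figures quoted above, with "the" near 7% of tokens and "of" near half of that share.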

1949

  • (Zipf, 1949) ⇒ George K. Zipf. (1949). “Human Behavior and the Principle of Least Effort.” Addison-Wesley.

1935

  • (Zipf, 1935) ⇒ George K. Zipf. (1935). “The Psychobiology of Language.” Houghton-Mifflin.