# Multinomial Probability Distribution Family

## References

### 2016

• (Wikipedia, 2016) ⇒ https://en.wikipedia.org/wiki/multinomial_distribution Retrieved:2016-9-14.
• In probability theory, the multinomial distribution is a generalization of the binomial distribution. For example, it models the probability of counts for rolling a k-sided die n times. For n independent trials, each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.

When n is 1 and k is 2, the multinomial distribution is the Bernoulli distribution. When k is 2 and the number of trials is greater than 1, it is the binomial distribution. When n is 1, it is the categorical distribution.
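These special cases can be checked numerically. The sketch below assumes the standard multinomial probability mass function, $P(X = x) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^k p_i^{x_i}$, which the quoted passage does not state. The name `multinomial_pmf` is illustrative and not taken from any library.

```python
from math import comb, factorial, isclose, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_k!), kept as an exact integer.
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    return coeff * prod(p ** x for p, x in zip(probs, counts))


# k = 2, n > 1: matches the binomial pmf.
n, x, p = 5, 3, 0.3
binomial = comb(n, x) * p ** x * (1 - p) ** (n - x)
assert isclose(multinomial_pmf([x, n - x], [p, 1 - p]), binomial)

# n = 1, k = 2: Bernoulli distribution, P(success) = p.
assert isclose(multinomial_pmf([1, 0], [p, 1 - p]), p)

# n = 1, general k: categorical distribution, P(category i) = p_i.
probs = [0.2, 0.5, 0.3]
assert isclose(multinomial_pmf([0, 1, 0], probs), probs[1])
```

The coefficient is computed with integer division so that it stays exact; only the final product with the probabilities is floating point.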

The Bernoulli distribution is the probability distribution of whether a Bernoulli trial is a success. In other words, it models the number of heads from flipping a coin one time. The binomial distribution generalizes this to the number of heads from n independent flips of the same coin. For the multinomial distribution, the analog of the Bernoulli distribution is the categorical distribution. Instead of flipping one coin, the categorical distribution models the roll of one k-sided die. So the multinomial distribution can model n independent rolls of a k-sided die.
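The die analogy translates directly into a simulation. The following is a minimal sketch using only Python's standard library; the helper name `roll_die_counts`, the seed, and the choice of a fair six-sided die are assumptions made for illustration.

```python
import random
from collections import Counter


def roll_die_counts(n, probs, rng):
    """Roll a k-sided die n times and return the count for each face."""
    k = len(probs)
    faces = range(1, k + 1)
    # Each roll is one categorical draw; the tally of n independent
    # rolls is a single sample from the multinomial distribution.
    rolls = rng.choices(faces, weights=probs, k=n)
    tally = Counter(rolls)
    return [tally[face] for face in faces]


rng = random.Random(0)  # fixed seed so the run is repeatable
fair_die = [1 / 6] * 6
counts = roll_die_counts(60, fair_die, rng)
print(counts)       # six counts, one per face
print(sum(counts))  # always 60, the number of rolls
```

Calling the helper with `n=1` gives a single categorical draw, and with two faces it gives a binomial count paired with its complement.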

Let k be a fixed finite number. Mathematically, we have k possible mutually exclusive outcomes, with corresponding probabilities $p_1, \dots, p_k$, and n independent trials. Since the k outcomes are mutually exclusive and one must occur, we have $p_i \geq 0$ for $i = 1, \dots, k$ and $\sum_{i=1}^k p_i = 1$. Then if the random variables $X_i$ indicate the number of times outcome number $i$ is observed over the n trials, the vector $X = (X_1, \dots, X_k)$ follows a multinomial distribution with parameters n and $p$, where $p = (p_1, \dots, p_k)$. While the trials are independent, the counts $X_1, \dots, X_k$ are dependent because they must sum to n.
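The dependence between the counts can be made concrete by enumerating every possible count vector for a small case. The sketch below assumes the standard multinomial probability mass function and the standard identity $\mathrm{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$, neither of which is stated in the passage. The values n = 4 and p = (0.2, 0.3, 0.5) and the helper names are illustrative.

```python
from itertools import product
from math import factorial, isclose, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in sum(counts) independent trials."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    return coeff * prod(p ** x for p, x in zip(probs, counts))


def support(n, k):
    """All count vectors (x_1, ..., x_k) of non-negative integers summing to n."""
    return [x for x in product(range(n + 1), repeat=k) if sum(x) == n]


n, probs = 4, [0.2, 0.3, 0.5]
outcomes = support(n, len(probs))
weights = [multinomial_pmf(x, probs) for x in outcomes]

# Exact expectations computed by summing over the whole support.
total = sum(weights)
mean_1 = sum(w * x[0] for w, x in zip(weights, outcomes))
mean_2 = sum(w * x[1] for w, x in zip(weights, outcomes))
cov_12 = sum(w * x[0] * x[1] for w, x in zip(weights, outcomes)) - mean_1 * mean_2

assert isclose(total, 1.0)                        # probabilities sum to 1
assert isclose(mean_1, n * probs[0])              # E[X_1] = n * p_1
assert isclose(cov_12, -n * probs[0] * probs[1])  # non-zero, so X_1 and X_2 are dependent
```

Brute-force enumeration is only practical for small n and k, but it avoids sampling noise: the negative covariance reflects that a larger count in one category leaves fewer trials for the others.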

Note that, in some fields, such as natural language processing, the categorical and multinomial distributions are conflated, and it is common to speak of a "multinomial distribution" when a categorical distribution is actually meant. This stems from the fact that it is sometimes convenient to express the outcome of a categorical distribution as a "1-of-K" vector (a vector with one element containing a 1 and all other elements containing a 0) rather than as an integer in the range $1 \dots K$; in this form, a categorical distribution is equivalent to a multinomial distribution over a single trial.
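The "1-of-K" encoding can be sketched in a few lines. The helper name `one_hot` is an assumption for illustration; it takes a category label in the range 1 to K, as in the passage, and returns the corresponding vector.

```python
def one_hot(category, k):
    """Encode a category label in 1..k as a 1-of-K vector."""
    if not 1 <= category <= k:
        raise ValueError("category must be in 1..k")
    return [1 if i == category else 0 for i in range(1, k + 1)]


vector = one_hot(3, 5)
print(vector)  # [0, 0, 1, 0, 0]

# The vector is a valid multinomial count vector for a single trial:
# its entries are non-negative integers that sum to n = 1.
assert sum(vector) == 1
# The original integer label is recoverable from the position of the 1.
assert vector.index(1) + 1 == 3
```

Summing the 1-of-K vectors of n independent categorical draws gives the multinomial count vector for those n trials, which is why the two distributions are so easily conflated.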