# Prior Probability Function

A prior probability function is a probability function representing prior knowledge.

**Context:**
- It can be an input to Bayes' Rule.

**Example(s):**

**Counter-Example(s):**

**See:** Bayesian Methods, Probability Mass, Local Smoothness Prior, Smoothness Prior, Conjugate Prior, Bayesian Probability, Latent Variable, Bayes' Theorem, Likelihood Function, Hyperparameter, Beta Distribution, Bernoulli Distribution.

## References

### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/prior_probability Retrieved:2015-6-23.
- QUOTE: In Bayesian statistical inference, a **prior probability distribution**, often called simply the prior, of an uncertain quantity is the probability distribution *p* that would express one's beliefs about this quantity before some evidence is taken into account. For example, *p* could be the probability distribution for the proportion of voters who will vote for a particular politician in a future election. It is meant to attribute uncertainty, rather than randomness, to the quantity. The unknown quantity may be a parameter or latent variable.
    One applies Bayes' theorem, multiplying the prior by the likelihood function and then normalizing, to get the *posterior probability distribution*, which is the conditional distribution of the uncertain quantity, given the data.
    A prior is often the purely subjective assessment of an experienced expert. Some will choose a *conjugate prior* when they can, to make calculation of the posterior distribution easier.
    Parameters of prior distributions are called *hyperparameters*, to distinguish them from parameters of the model of the underlying data. For instance, if one is using a beta distribution to model the distribution of the parameter *p* of a Bernoulli distribution, then: *p* is a parameter of the underlying system (Bernoulli distribution), and *α* and *β* are parameters of the prior distribution (beta distribution), hence *hyper*parameters.
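The beta–Bernoulli relationship described in the quote above can be sketched numerically: with a Beta(α, β) prior on the Bernoulli parameter *p*, observing *k* successes and *m* failures yields a Beta(α + k, β + m) posterior. The hyperparameter and data values below are hypothetical, chosen only for illustration.

```python
# Conjugate beta-Bernoulli update: a Beta(alpha, beta) prior on the
# Bernoulli parameter p, updated with observed trial outcomes.
# All numbers here are hypothetical, for illustration only.

def beta_bernoulli_update(alpha, beta, successes, failures):
    """Return the posterior hyperparameters after observing the data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hyperparameters of the prior (not parameters of the Bernoulli model).
alpha0, beta0 = 2.0, 2.0          # weakly informative prior, mean 0.5

# Hypothetical data: 7 successes and 3 failures in 10 Bernoulli trials.
alpha1, beta1 = beta_bernoulli_update(alpha0, beta0, 7, 3)

print(beta_mean(alpha0, beta0))   # prior mean: 0.5
print((alpha1, beta1))            # posterior: Beta(9.0, 5.0)
print(beta_mean(alpha1, beta1))   # posterior mean: 9/14, about 0.643
```

Because the beta prior is conjugate to the Bernoulli likelihood, the posterior stays in the beta family and the update reduces to adding observed counts to the hyperparameters, which is exactly the computational convenience the quote attributes to conjugate priors.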


### 2011

- (Webb, 2011k) ⇒ Geoffrey I. Webb. (2011). “Prior Probability.” In: (Sammut & Webb, 2011) p.782
- QUOTE: In Bayesian inference, a prior probability of a value x of a random variable X, P(X = x), is the probability of X assuming the value x in the absence of (or before obtaining) any additional information. It contrasts with the posterior probability, P(X = x | Y = y), the probability of X assuming the value x in the context of Y = y.
For example, it may be that the prevalence of a particular form of cancer, exoma, in the population is 0.1%, so the prior probability of exoma, P(exoma = true), is 0.001. However, assume 50% of people who have skin discolorations of greater than 1 cm width (sd > 1cm) have exoma. It follows that the posterior probability of exoma given sd > 1cm, P(exoma = true | sd > 1cm = true), is 0.500.
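The mechanics behind the quoted exoma example can be sketched with Bayes' theorem. The prior P(exoma) = 0.001 comes from the quote; the two likelihoods below are hypothetical values chosen so the computation roughly reproduces the quoted posterior of 0.5.

```python
# Bayes' theorem: posterior = likelihood * prior / evidence.
# Prior taken from the quoted example; both likelihoods are
# hypothetical, picked only to make the arithmetic come out near 0.5.

prior = 0.001                    # P(exoma = true), from the quote
p_sd_given_exoma = 0.9           # hypothetical P(sd > 1cm | exoma)
p_sd_given_no_exoma = 0.0009009  # hypothetical P(sd > 1cm | no exoma)

# Evidence P(sd > 1cm) via the law of total probability.
evidence = (p_sd_given_exoma * prior
            + p_sd_given_no_exoma * (1 - prior))

# Posterior P(exoma = true | sd > 1cm).
posterior = p_sd_given_exoma * prior / evidence
print(round(posterior, 3))       # approximately 0.5
```

The point of the contrast in the quote is visible here: the same quantity has probability 0.001 before the observation and roughly 0.5 after it, with the shift driven entirely by how much more likely the evidence is under exoma than without it.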


### 2006

- (Cox, 2006) ⇒ David R. Cox. (2006). “Principles of Statistical Inference.” Cambridge University Press. ISBN:9780521685672

### 2000

- (Valpola, 2000) ⇒ Harri Valpola. (2000). “Bayesian Ensemble Learning for Nonlinear Factor Analysis.” PhD Dissertation, Helsinki University of Technology.
- QUOTE: prior probability: Expresses the beliefs before making an observation. Sometimes referred to as the prior.