# Posterior Probability-based Inference Algorithm

A Posterior Probability-based Inference Algorithm is a statistical inference algorithm (to predict outcome probabilities) in which prior probabilities and Bayes' Rule are used.

**AKA:** Bayesian Inference Technique.

**Context:**

- It can range from being an Exact Bayesian Inference Algorithm to being an Approximate Bayesian Inference Algorithm.
- It can range from being a Parametric Bayesian Inference Algorithm to being a Non-Parametric Bayesian Inference Algorithm.
- It can be applied by a Bayesian Inference System (that can solve a Bayesian inference task).
- It can be based on a Probabilistic Graphical Model (Bayesian Graphical Model).
- It can use the Marginal Probability Function over Latent Variables (rather than make Point Estimates).
- It can (typically) be useful in cases where one has strong prior knowledge.
- It can (typically) be unhelpful in cases where there is no consensus about what the prior probabilities should be.
- It can (often) support Statistical Parameter Estimation and Statistical Hypothesis Testing.
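
As a minimal sketch of an exact, parametric case from the list above, the following assumes a Beta prior over a Bernoulli success probability (a conjugate pair, chosen here only for illustration). The function names and the prior and data values are hypothetical. It contrasts a summary of the full posterior (its mean) with a point estimate (its mode, the MAP estimate).

```python
from fractions import Fraction


def beta_binomial_posterior(alpha, beta, successes, failures):
    """Return the Beta posterior parameters after observing Bernoulli outcomes.

    With a Beta(alpha, beta) prior, the posterior is
    Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


def beta_mode(alpha, beta):
    """Mode of a Beta(alpha, beta) distribution (valid for alpha, beta > 1)."""
    return Fraction(alpha - 1, alpha + beta - 2)


# Illustrative values (assumptions): a Beta(2, 2) prior, then 7 successes and 3 failures.
prior_params = (2, 2)
posterior_params = beta_binomial_posterior(*prior_params, successes=7, failures=3)

print(posterior_params)              # (9, 5)
print(beta_mean(*posterior_params))  # 9/14
print(beta_mode(*posterior_params))  # 2/3
```

The posterior here is available in closed form, which is what makes this an exact rather than an approximate inference.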

**Example(s):**

**Counter-Example(s):**

**See:** Statistical Hypothesis Testing, Bayesian Probability, Inference Task, Posterior Probability, Bayesian Probability Theory, Maximum a Posteriori Estimate, Bayesian Network, Probabilistic Reasoning, Bayesianism.

## References

### 2015

- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Bayesian_inference Retrieved:2015-7-20.
**Bayesian inference** is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as evidence is acquired. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called “Bayesian probability”. Bayesian probability provides a rational method for updating beliefs. However, non-Bayesian updating rules are compatible with rationality, according to philosophers Ian Hacking and Bas van Fraassen.[1][2]

- ↑ Stanford encyclopedia of philosophy; Bayesian Epistemology; http://plato.stanford.edu/entries/epistemology-bayesian
- ↑ Gillies, Donald (2000); "Philosophical Theories of Probability"; Routledge; Chapter 4 "The subjective theory"
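
The sequential updating described in the passage above can be sketched as follows. This is an illustration under stated assumptions, not a method taken from the quoted source: the three coin-bias hypotheses, the uniform prior, and the flip sequence are all hypothetical. Each posterior is reused as the prior for the next observation, and the result is compared with a single batch update.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def update(prior, likelihood):
    """One application of Bayes' rule over a finite set of hypotheses."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


def coin_likelihood(flip, biases):
    """P(flip | hypothesis) for each hypothesised heads probability."""
    return {h: (p if flip == "H" else 1.0 - p) for h, p in biases.items()}


# Hypothetical hypotheses and data (assumptions for illustration only).
biases = {"tails-heavy": 0.3, "fair": 0.5, "heads-heavy": 0.8}
prior = {h: 1.0 / len(biases) for h in biases}
flips = ["H", "H", "T", "H"]

# Sequential: yesterday's posterior is today's prior.
belief = dict(prior)
for flip in flips:
    belief = update(belief, coin_likelihood(flip, biases))

# Batch: multiply all likelihoods first, then update once.
joint_likelihood = {h: 1.0 for h in biases}
for flip in flips:
    step = coin_likelihood(flip, biases)
    joint_likelihood = {h: joint_likelihood[h] * step[h] for h in biases}
batch = update(prior, joint_likelihood)

print(belief)
print(batch)
```

For independent observations the two routes agree up to floating-point rounding, which is why data can be processed as it arrives.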

### 2014

- (Klarreich, 2014) ⇒ Erica Klarreich. (2014). “In Search of Bayesian Inference.” In: Communications of the ACM Journal, 58(1). doi:10.1145/2686734
- QUOTE: … generate a probability map for the airplane's location using Bayesian inference, a statistical approach to combining prior beliefs and experiences with new evidence. Metron started by constructing a probability map based on the initial data about the flight's disappearance, then used Bayes' Law to incorporate the evidence provided by the failures of the various search attempts. ...
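
A toy sketch of the kind of update the quote describes, in which a failed search lowers the probability of the searched area and raises it elsewhere. This is not Metron's actual model: the three-cell map, the prior values, and the detection probability are hypothetical, and the function name is an assumption.

```python
def update_after_failed_search(prior_map, searched, p_detect):
    """Update a probability map after searching one cell and finding nothing.

    prior_map: list of P(object is in cell i)
    searched:  index of the cell that was searched
    p_detect:  P(search finds the object | object is in the searched cell)
    """
    # Likelihood of "not found": (1 - p_detect) for the searched cell, 1 elsewhere.
    unnormalized = [
        p * (1.0 - p_detect) if i == searched else p
        for i, p in enumerate(prior_map)
    ]
    total = sum(unnormalized)  # P(search fails)
    return [u / total for u in unnormalized]


# Hypothetical three-cell map (assumed values).
prior_map = [0.5, 0.3, 0.2]
posterior_map = update_after_failed_search(prior_map, searched=0, p_detect=0.8)
print(posterior_map)  # roughly [0.167, 0.5, 0.333]
```

The searched cell keeps some probability because a search can miss; repeated failures in the same cell push it lower each time.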

### 2013

- http://en.wikipedia.org/wiki/Bayesian_inference
- In statistics, **Bayesian inference** is a method of inference in which Bayes' rule is used to update the probability estimate for a hypothesis as additional evidence is learned. Bayesian updating is an important technique throughout statistics, and especially in mathematical statistics. For some cases, exhibiting a Bayesian derivation for a statistical method automatically ensures that the method works as well as any competing method. Bayesian updating is especially important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a range of fields including science, engineering, philosophy, medicine, and law. In the philosophy of decision theory, Bayesian inference is closely related to discussions of subjective probability, often called “Bayesian probability”. Bayesian probability provides a rational method for updating beliefs; however, non-Bayesian updating rules are compatible with rationality, according to philosophers Ian Hacking and Bas van Fraassen.[1][2]

- ↑ Stanford encyclopedia of philosophy; Bayesian Epistemology; http://plato.stanford.edu/entries/epistemology-bayesian
- ↑ Gillies, Donald (2000); "Philosophical Theories of Probability"; Routledge; Chapter 4 "The subjective theory"

- http://en.wikipedia.org/wiki/Bayesian_inference#Formal
- Bayesian inference derives the posterior probability as a consequence of two antecedents, a prior probability and a “likelihood function” derived from a probability model for the data to be observed. Bayesian inference computes the posterior probability according to Bayes' rule:

  $$P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}$$

  where
  - $|$ means *given*.
  - $H$ stands for any *hypothesis* whose probability may be affected by data (called *evidence* below). Often there are competing hypotheses, from which one chooses the most probable.
  - the *evidence* $E$ corresponds to data that were not used in computing the prior probability.
  - $P(H)$, the *prior probability*, is the probability of $H$ *before* $E$ is observed. This indicates one's preconceived beliefs about how likely different hypotheses are, absent evidence regarding the instance under study.
  - $P(H|E)$, the *posterior probability*, is the probability of $H$ *given* $E$, i.e., *after* $E$ is observed. This tells us what we want to know: the probability of a hypothesis *given* the observed evidence.
  - $P(E|H)$, the probability of observing $E$ *given* $H$, is also known as the *likelihood*. It indicates the compatibility of the evidence with the given hypothesis.
  - $P(E)$ is sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered. (This can be seen by the fact that the hypothesis $H$ does not appear anywhere in the symbol, unlike for all the other factors.) This means that this factor does not enter into determining the relative probabilities of different hypotheses.
- Note that what affects the value of $P(H|E)$ for different values of $H$ is only the factors $P(H)$ and $P(E|H)$, which both appear in the numerator, and hence the posterior probability is proportional to both. In words:
  - (more exactly) *The posterior probability of a hypothesis is determined by a combination of the inherent likeliness of a hypothesis (the prior) and the compatibility of the observed evidence with the hypothesis (the likelihood).*
  - (more concisely) *Posterior is proportional to prior times likelihood.*
- Note that Bayes' rule can also be written as follows:

  $$P(H|E) = \frac{P(E|H)}{P(E)} \cdot P(H)$$

  where the factor $\frac{P(E|H)}{P(E)}$ represents the impact of $E$ on the probability of $H$.
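
Bayes' rule as stated above can be sketched numerically. The example assumes a finite set of mutually exclusive and exhaustive hypotheses, so that P(E) can be obtained by summing P(E|H) · P(H) over them (the law of total probability, which the quoted text does not spell out). The function name and all probability values are hypothetical.

```python
def posterior_with_evidence(priors, likelihoods):
    """Apply Bayes' rule over a finite, exhaustive set of hypotheses.

    priors:      {hypothesis: P(H)}
    likelihoods: {hypothesis: P(E|H)}
    Returns ({hypothesis: P(H|E)}, P(E)).
    """
    # P(E) is the same for every hypothesis: sum of likelihood times prior.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}
    return posteriors, evidence


# Hypothetical numbers: a rare condition and an imperfect test that came back positive.
priors = {"condition": 0.01, "no condition": 0.99}
likelihoods = {"condition": 0.95, "no condition": 0.05}

posteriors, evidence = posterior_with_evidence(priors, likelihoods)

# The factor P(E|H) / P(E) measures the impact of E on the probability of H.
impact = likelihoods["condition"] / evidence

print(evidence)                 # about 0.059
print(posteriors["condition"])  # about 0.161
print(impact)                   # about 16.1
```

Even with a strong likelihood, the small prior keeps the posterior modest, which illustrates that the posterior is proportional to prior times likelihood.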

### 2005

- (Geweke & Whiteman, 2005) ⇒ John Geweke, and Charles Whiteman. (2005). “Chapter 1. Bayesian Forecasting.” In: Handbook of Economic Forecasting, 1. doi:10.1016/S1574-0706(05)01001-3
- ABSTRACT: Bayesian forecasting is a natural product of a Bayesian approach to inference. The Bayesian approach in general requires explicit formulation of a model, and conditioning on known quantities, in order to draw inferences about unknown ones.

### 2004

- (Ghahramani, 2004) ⇒ Zoubin Ghahramani. (2004). “Bayesian methods in machine learning.” Seminar Talk, Oct 18 2004 at University of Birmingham.
- Bayesian methods can be applied to a wide range of probabilistic models commonly used in machine learning and pattern recognition. The challenge is to discover approximate inference methods that can deal with complex models and large scale data sets in reasonable time. In the past few years Variational Bayesian (VB) approximations have emerged as an alternative to MCMC methods. I will review VB methods and demonstrate applications to clustering, dimensionality reduction, time series modelling with hidden Markov and state-space models, independent components analysis (ICA) and learning the structure of probabilistic graphical models. Time permitting, I will discuss current and future directions in the machine learning community, including non-parametric Bayesian methods (e.g. Gaussian processes, Dirichlet processes, and extensions).
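
For orientation only, here is a minimal sketch of the MCMC family that the abstract names as the baseline for approximate inference. It is a random-walk Metropolis sampler, not a variational method and not anything presented in the talk. The coin-bias model, the uniform prior, the step size, the seed, and the data are all assumptions, chosen so the result can be checked against a known closed form.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of a coin bias under an assumed uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler targeting the posterior of the coin bias."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); the symmetric
        # proposal means no correction term is needed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Hypothetical data: 7 heads and 3 tails.
samples = metropolis(heads=7, tails=3, n_samples=20000)
kept = samples[2000:]  # discard an assumed burn-in period
estimate = sum(kept) / len(kept)
print(estimate)  # should be close to the exact posterior mean, 8/12
```

Only the unnormalized posterior is needed, so the sampler never computes the marginal likelihood; that is the property that makes such methods usable when exact inference is intractable.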

### 2003

- (Beal, 2003) ⇒ Matthew J. Beal. (2003). “Variational Algorithms for Approximate Bayesian Inference.” Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.

### 1989

- (Kass & Steffey, 1989) ⇒ R. Kass, and D. Steffey. (1989). “Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (parametric empirical Bayes models).” In: Journal of the American Statistical Association, 84(407).

### 1979

- (Bernardo, 1979) ⇒ Jose M. Bernardo. (1979). “Reference posterior distributions for Bayesian inference.” In: Journal of the Royal Statistical Society. Series B.
- ABSTRACT: A procedure is proposed to derive reference posterior distributions which approximately describe the inferential content of the data without incorporating any other information. More explicitly, operational priors, derived from information-theoretical considerations, are used to obtain reference posteriors which may be expected to approximate the posteriors which would have been obtained with the use of proper priors describing vague initial states of knowledge. The results obtained unify and generalize some previous work and seem to overcome criticisms to which this has been subject.