Posterior Probability-based Inference Algorithm


A Posterior Probability-based Inference Algorithm is a statistical inference algorithm that predicts outcome probabilities by combining prior probabilities with observed evidence through Bayes' Rule.



  1. Stanford Encyclopedia of Philosophy; "Bayesian Epistemology"
  2. Gillies, Donald (2000); "Philosophical Theories of Probability"; Routledge; Chapter 4 "The subjective theory"



    • Bayesian inference derives the posterior probability as a consequence of two antecedents, a prior probability and a “likelihood function” derived from a probability model for the data to be observed. Bayesian inference computes the posterior probability according to Bayes' rule: [math]\displaystyle{ P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)} }[/math] where
      • [math]\displaystyle{ \textstyle | }[/math] means given.
      • [math]\displaystyle{ \textstyle H }[/math] stands for any hypothesis whose probability may be affected by data (called evidence below). Often there are competing hypotheses, from which one chooses the most probable.
      • the evidence [math]\displaystyle{ \textstyle E }[/math] corresponds to data that were not used in computing the prior probability.
      • [math]\displaystyle{ \textstyle P(H) }[/math], the prior probability, is the probability of [math]\displaystyle{ \textstyle H }[/math] before [math]\displaystyle{ \textstyle E }[/math] is observed. This indicates one's preconceived beliefs about how likely different hypotheses are, absent evidence regarding the instance under study.
      • [math]\displaystyle{ \textstyle P(H|E) }[/math], the posterior probability, is the probability of [math]\displaystyle{ \textstyle H }[/math] given [math]\displaystyle{ \textstyle E }[/math], i.e., after [math]\displaystyle{ \textstyle E }[/math] is observed. This tells us what we want to know: the probability of a hypothesis given the observed evidence.
      • [math]\displaystyle{ \textstyle P(E|H) }[/math], the probability of observing [math]\displaystyle{ \textstyle E }[/math] given [math]\displaystyle{ \textstyle H }[/math], is also known as the likelihood. It indicates the compatibility of the evidence with the given hypothesis.
      • [math]\displaystyle{ \textstyle P(E) }[/math] is sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered. (This can be seen by the fact that the hypothesis [math]\displaystyle{ \textstyle H }[/math] does not appear anywhere in the symbol, unlike for all the other factors.) This means that this factor does not enter into determining the relative probabilities of different hypotheses.
    • Note that what affects the value of [math]\displaystyle{ \textstyle P(H|E) }[/math] for different values of [math]\displaystyle{ \textstyle H }[/math] is only the factors [math]\displaystyle{ \textstyle P(H) }[/math] and [math]\displaystyle{ \textstyle P(E|H) }[/math], which both appear in the numerator, and hence the posterior probability is proportional to both. In words:
      • (more exactly) The posterior probability of a hypothesis is determined by a combination of the inherent likeliness of a hypothesis (the prior) and the compatibility of the observed evidence with the hypothesis (the likelihood).
      • (more concisely) Posterior is proportional to prior times likelihood.

        Note that Bayes' rule can also be written as follows: [math]\displaystyle{ P(H|E) = \frac{P(E|H)}{P(E)} \cdot P(H) }[/math] where the factor [math]\displaystyle{ \textstyle \frac{P(E|H)}{P(E)} }[/math] represents the impact of [math]\displaystyle{ E }[/math] on the probability of [math]\displaystyle{ H }[/math].
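
        The rule above can be illustrated with a minimal Python sketch. It is not taken from the quoted source: the function name `posterior`, the two coin hypotheses ("fair" and "biased"), and all numbers are hypothetical choices made for illustration. The sketch also assumes that the hypotheses are mutually exclusive and exhaustive, so that P(E) can be obtained by summing P(E|H) · P(H) over all of them.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule to a finite set of competing hypotheses.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E|H)
    Returns (dict mapping hypothesis -> P(H|E), P(E)).
    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    if priors.keys() != likelihoods.keys():
        raise ValueError("priors and likelihoods must cover the same hypotheses")

    # Numerator of Bayes' rule: likelihood times prior, per hypothesis.
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}

    # P(E): the same for every hypothesis, so it only rescales the numerators.
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")

    return {h: u / evidence for h, u in unnormalized.items()}, evidence


# Hypothetical example: is a coin fair, or biased towards heads?
priors = {"fair": 0.5, "biased": 0.5}       # P(H), before any toss
likelihoods = {"fair": 0.5, "biased": 0.8}  # P(E|H), where E = "one toss came up heads"

post, evidence = posterior(priors, likelihoods)

# The factor P(E|H) / P(E): how much E raises or lowers the probability of H.
factors = {h: likelihoods[h] / evidence for h in priors}

for h in priors:
    print(h, "posterior:", round(post[h], 4), "update factor:", round(factors[h], 4))
```

        In this sketch the update factor exceeds 1 for the hypothesis that explains the evidence better than average and falls below 1 for the other, and multiplying each prior by its factor reproduces the posterior.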



  • (Ghahramani, 2004) ⇒ Zoubin Ghahramani. (2004). “Bayesian methods in machine learning.” Seminar Talk, Oct 18 2004 at University of Birmingham.
    • Bayesian methods can be applied to a wide range of probabilistic models commonly used in machine learning and pattern recognition. The challenge is to discover approximate inference methods that can deal with complex models and large-scale data sets in reasonable time. In the past few years Variational Bayesian (VB) approximations have emerged as an alternative to MCMC methods. I will review VB methods and demonstrate applications to clustering, dimensionality reduction, time series modelling with hidden Markov and state-space models, independent components analysis (ICA) and learning the structure of probabilistic graphical models. Time permitting, I will discuss current and future directions in the machine learning community, including non-parametric Bayesian methods (e.g. Gaussian processes, Dirichlet processes, and extensions).



  • (Kass & Steffey, 1989) ⇒ R. Kass, and D. Steffey. (1989). “Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (parametric empirical Bayes models).” In: Journal of the American Statistical Association, 84(407).


  • (Bernardo, 1979) ⇒ Jose M. Bernardo. (1979). “Reference posterior distributions for Bayesian inference.” In: Journal of the Royal Statistical Society. Series B.
    • ABSTRACT: A procedure is proposed to derive reference posterior distributions which approximately describe the inferential content of the data without incorporating any other information. More explicitly, operational priors, derived from information-theoretical considerations, are used to obtain reference posteriors which may be expected to approximate the posteriors which would have been obtained with the use of proper priors describing vague initial states of knowledge. The results obtained unify and generalize some previous work and seem to overcome criticisms to which this has been subject.