Bayes Rule

A Bayes Rule is a probability update rule that states that the posterior probability of a belief equals its prior probability multiplied by the probability of the evidence given that the belief is true, divided by the probability of the evidence regardless of whether the belief is true.

AKA: Bayes Theorem.
Context:
- It can be stated as: if [math]\displaystyle{ E_1,E_2,\dots,E_n }[/math] are mutually disjoint events with a priori probabilities [math]\displaystyle{ P(E_i)\neq 0 \ (i=1,2,\dots,n) }[/math], then for any event [math]\displaystyle{ A }[/math] that is a subset of [math]\displaystyle{ \displaystyle\bigcup_{i=1}^{n}E_i }[/math] such that [math]\displaystyle{ P(A)\gt 0 }[/math], the posterior probabilities are [math]\displaystyle{ P(E_i|A)=\frac{P(E_i)P(A|E_i)}{\displaystyle\sum_{j=1}^{n}P(E_j)P(A|E_j)}, \ i=1,2,\dots,n. }[/math] Here [math]\displaystyle{ P(A|E_i), \ i=1,2,\dots,n }[/math] are called likelihoods.
- It can be used by a Bayesian Inference Algorithm.
- It can be proved by application of the Product Rule.
- It can be used as a Decision Rule based on minimizing Average Loss.
- It can be restated as “The plausibility of your belief depends on the degree to which your belief -- and only your belief -- explains the evidence for it. The more alternative explanations there are for the evidence, the less plausible your belief is.”[1]
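The partition form of the rule stated in the Context list can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `posterior` and its list-based interface are assumptions introduced here, and the events are assumed to be mutually disjoint with A contained in their union.

```python
from typing import List, Sequence


def posterior(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Return P(E_i | A) for each i, given P(E_i) and P(A | E_i).

    Hypothetical helper: assumes the E_i are mutually disjoint and that
    A is a subset of their union, so P(A) = sum_j P(E_j) * P(A | E_j).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of Bayes rule: P(E_i) * P(A | E_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(A), the normalizing constant
    evidence = sum(joint)
    if evidence <= 0:
        raise ValueError("P(A) must be positive")
    return [j / evidence for j in joint]
```

Calling `posterior([0.5, 0.5], [1.0, 0.25])` returns values close to 0.8 and 0.2, and the returned posteriors always sum to 1 because they share the same denominator.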
Example(s):
- Posterior Probability = The Prior × Normalized Likelihood.
- [math]\displaystyle{ Pr(A \vert B) = \frac{Pr(B \vert A)}{1} \times \frac{Pr(A)}{Pr(B)} }[/math].
- In answering a question on a multiple choice test, a student either knows the answer (with probability [math]\displaystyle{ p }[/math]) or guesses (with probability [math]\displaystyle{ 1-p }[/math]). Assume that the probability of answering a question correctly is unity for a student who knows the answer and [math]\displaystyle{ \frac{1}{m} }[/math] for a student who guesses, where [math]\displaystyle{ m }[/math] is the number of multiple choice alternatives. Suppose a student answers a question correctly. The probability that the student really knows the answer can be found by using the Bayes theorem as follows. Let [math]\displaystyle{ E_1= }[/math] "the student knows the answer", [math]\displaystyle{ E_2= }[/math] "the student guesses the answer", and [math]\displaystyle{ A= }[/math] "the student answers correctly". Then [math]\displaystyle{ P(E_1)=p, \ P(E_2)=1-p, \ P(A|E_1)=1 }[/math] and [math]\displaystyle{ P(A|E_2)=\frac{1}{m} }[/math]. Now using the Bayes theorem, the probability that a student really knows the answer given that the student answers it correctly is [math]\displaystyle{ P(E_1|A)=\frac{P(E_1)P(A|E_1)}{P(E_1)P(A|E_1)+P(E_2)P(A|E_2)}=\frac{p \cdot 1}{p \cdot 1+(1-p) \cdot \frac{1}{m}}=\frac{mp}{1+(m-1)p} }[/math].
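The multiple choice example can be checked numerically with a short Python sketch. The function name `prob_knows_given_correct` is an assumption introduced for illustration; the computation follows the two-event form of the rule used in the example.

```python
def prob_knows_given_correct(p: float, m: int) -> float:
    """P(student knows the answer | student answered correctly).

    Hypothetical helper: p is the prior probability that the student knows
    the answer, and m is the number of multiple choice alternatives.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if m < 1:
        raise ValueError("m must be at least 1")
    # P(E_1) * P(A | E_1): the student knows the answer and is always correct
    knows_and_correct = p * 1.0
    # P(E_2) * P(A | E_2): the student guesses and is correct with chance 1/m
    guesses_and_correct = (1.0 - p) * (1.0 / m)
    return knows_and_correct / (knows_and_correct + guesses_and_correct)


def closed_form(p: float, m: int) -> float:
    """The simplified expression m*p / (1 + (m - 1)*p) from the example."""
    return m * p / (1.0 + (m - 1) * p)
```

For p = 0.5 and m = 4, both functions evaluate to 0.8 up to floating-point rounding: a correct answer raises the probability that the student knows the material from 0.5 to 0.8.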
Counter-Example(s):
- a Minimax Decision Rule (which minimizes maximum loss).
- Falsifiability.
See: Belief Revision; Probability Theory; Bayesian Network; Naive-Bayes Model; Naive-Bayes Classifier; Bayesian Probability; Bayesianist; Bayesian Methods; Bayesian Model Selection; Bayes Networks; Statistical Proof.

References

2015

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Bayes'_rule Retrieved:2015-9-15.
- In probability theory and applications, Bayes' rule relates the odds of event [math]\displaystyle{ A_1 }[/math] to the odds of event [math]\displaystyle{ A_2 }[/math], before (prior to) and after (posterior to) conditioning on another event [math]\displaystyle{ B }[/math]. The odds on [math]\displaystyle{ A_1 }[/math] to event [math]\displaystyle{ A_2 }[/math] is simply the ratio of the probabilities of the two events. The prior odds is the ratio of the unconditional or prior probabilities, the posterior odds is the ratio of conditional or posterior probabilities given the event [math]\displaystyle{ B }[/math]. The relationship is expressed in terms of the likelihood ratio or Bayes factor, [math]\displaystyle{ \Lambda }[/math]. By definition, this is the ratio of the conditional probabilities of the event [math]\displaystyle{ B }[/math] given that [math]\displaystyle{ A_1 }[/math] is the case or that [math]\displaystyle{ A_2 }[/math] is the case, respectively. The rule simply states: posterior odds equals prior odds times Bayes factor (Gelman et al., 2005, Chapter 1).
  When arbitrarily many events [math]\displaystyle{ A }[/math] are of interest, not just two, the rule can be rephrased as posterior is proportional to prior times likelihood, [math]\displaystyle{ P(A|B)\propto P(A) P(B|A) }[/math] where the proportionality symbol means that the left hand side is proportional to (i.e., equals a constant times) the right hand side as [math]\displaystyle{ A }[/math] varies, for fixed or given [math]\displaystyle{ B }[/math] (Lee, 2012; Bertsch McGrayne, 2012). In this form it goes back to Laplace (1774) and to Cournot (1843); see Fienberg (2005).
  Bayes' rule is an equivalent way to formulate Bayes' theorem. If we know the odds for and against [math]\displaystyle{ A }[/math] we also know the probabilities of [math]\displaystyle{ A }[/math]. It may be preferred to Bayes' theorem in practice for a number of reasons.
  Bayes' rule is widely used in statistics, science and engineering, for instance in model selection, probabilistic expert systems based on Bayes networks, statistical proof in legal proceedings, email spam filters, and so on (Rosenthal, 2005; Bertsch McGrayne, 2012). As an elementary fact from the calculus of probability, Bayes' rule tells us how unconditional and conditional probabilities are related whether we work with a frequentist interpretation of probability or a Bayesian interpretation of probability. Under the Bayesian interpretation it is frequently applied in the situation where [math]\displaystyle{ A_1 }[/math] and [math]\displaystyle{ A_2 }[/math] are competing hypotheses, and [math]\displaystyle{ B }[/math] is some observed evidence. The rule shows how one's judgement on whether [math]\displaystyle{ A_1 }[/math] or [math]\displaystyle{ A_2 }[/math] is true should be updated on observing the evidence [math]\displaystyle{ B }[/math] (Gelman et al., 2003).
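The odds form described in this entry (posterior odds equals prior odds times Bayes factor) can be sketched in Python. This is an illustrative sketch only: all function names are assumptions introduced here, and the conversions between odds and probability assume that A_1 and A_2 are complementary, so that exactly one of them holds.

```python
def odds_from_prob(prob: float) -> float:
    """Odds of A_1 against A_2 when P(A_1) = prob and P(A_2) = 1 - prob."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    return prob / (1.0 - prob)


def prob_from_odds(odds: float) -> float:
    """Inverse of odds_from_prob: P(A_1) recovered from the odds on A_1."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def bayes_factor(lik_a1: float, lik_a2: float) -> float:
    """Likelihood ratio P(B | A_1) / P(B | A_2)."""
    if lik_a2 <= 0.0:
        raise ValueError("P(B | A_2) must be positive")
    return lik_a1 / lik_a2


def posterior_odds(prior_odds: float, factor: float) -> float:
    """Posterior odds = prior odds * Bayes factor."""
    return prior_odds * factor
```

For example, with a prior probability of 0.5 for A_1 and likelihoods of 1 and 0.25 for B under A_1 and A_2, the prior odds are 1, the Bayes factor is 4, the posterior odds are 4, and the posterior probability of A_1 is 0.8.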

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Bayes'_theorem Retrieved:2015-9-15.
- In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on conditions that might be related to the event. For example, suppose one is interested in whether Addison has cancer. Furthermore, suppose that Addison is age 65. If cancer is related to age, information about Addison's age can be used to more accurately assess his or her chance of having cancer using Bayes' Theorem.
  When applied, the probabilities involved in Bayes' theorem may have different interpretations. In one of these interpretations, the theorem is used directly as part of a particular approach to statistical inference. In particular, with the Bayesian interpretation of probability, the theorem expresses how a subjective degree of belief should rationally change to account for evidence: this is Bayesian inference, which is fundamental to Bayesian statistics. However, Bayes' theorem has applications in a wide range of calculations involving probabilities, not just in Bayesian inference.
  Bayes' theorem is named after Rev. Thomas Bayes (1701–1761), who first showed how to use new evidence to update beliefs. It was further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 Théorie analytique des probabilités. Sir Harold Jeffreys put Bayes' algorithm and Laplace's formulation on an axiomatic basis. Jeffreys wrote that Bayes' theorem "is to the theory of probability what Pythagoras's theorem is to geometry".

2011

(Webb, 2011b) ⇒ Geoffrey I. Webb. (2011). “Bayes Rule.” In: (Sammut & Webb, 2011) p.74
- QUOTE: Bayes rule provides a decomposition of a conditional probability that is frequently used in a family of learning techniques collectively called Bayesian Learning. Bayes rule is the equality: [math]\displaystyle{ P(w|z) = \frac{P(w) P(z|w)}{P(z)} }[/math]. P(w) is called the prior probability, P(w|z) is called the posterior probability, and P(z|w) is called the likelihood.
  Bayes rule is used for two purposes. The first is Bayesian update. In this context, z represents some new information that has become available since an estimate P(w) was formed of some hypothesis w. The application of Bayes’ rule enables a new estimate of the probability of w (the posterior probability) to be calculated from estimates of the prior probability, the likelihood and P(z).
  The second common application of Bayes’ rule is for estimating posterior probabilities in probabilistic learning, ...
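The Bayesian update use of the rule can be sketched in Python as a repeated step in which each posterior becomes the prior for the next piece of information. The sketch is illustrative only: the function `update`, the two hypotheses ("fair" and "biased" coin), and their head probabilities are all hypothetical, and the observations are assumed to be conditionally independent given the hypothesis.

```python
from typing import Dict


def update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """One Bayesian update: return P(w | z) from P(w) and P(z | w).

    Hypothetical helper: both dicts are keyed by hypothesis w.
    """
    unnormalized = {w: prior[w] * likelihood[w] for w in prior}
    p_z = sum(unnormalized.values())  # P(z), the normalizing constant
    if p_z <= 0:
        raise ValueError("P(z) must be positive")
    return {w: value / p_z for w, value in unnormalized.items()}


# Hypothetical setup: a coin is either fair or biased toward heads.
prior = {"fair": 0.5, "biased": 0.5}
heads = {"fair": 0.5, "biased": 0.75}  # P(heads | w), assumed values

# Observe heads twice, reusing each posterior as the next prior.
after_one = update(prior, heads)
after_two = update(after_one, heads)

# Equivalent single update with the likelihood of both observations at once.
two_heads = {w: heads[w] ** 2 for w in heads}
batch = update(prior, two_heads)
```

Under the independence assumption, `after_two` and `batch` agree up to rounding, which is why sequential updating and updating on all the evidence at once give the same posterior.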

(Buntine, 2011) ⇒ Wray Buntine. (2011). “Bayesian Methods.” In: (Sammut & Webb, 2011) p.75
- QUOTE: The two most important concepts used in Bayesian modeling are probability and utility. Probabilities are used to model our belief about the state of the world and utilities are used to model the value to us of different outcomes, thus to model costs and benefits. Probabilities are represented in the form of p(x|C), where C is the current known context and x is some event(s) of interest from a space χ. The left and right arguments of the probability function are in general propositions (in the logical sense). Probabilities are updated based on new evidence or outcomes by using Bayes rule, which takes the form: [math]\displaystyle{ p(x|C,y) = \frac{p(x|C)p(y|x,C)}{p(y|C)} = \frac{p(x|C)p(y|x,C)}{\sum_{x' \in \chi} p(x'|C)p(y|x',C)}, }[/math] where χ is the discrete domain of x. More generally, any measurable set can be used for the domain χ. An integral or mixed sum and integral can replace the sum. For a utility function u(x) of some event x, for instance the benefit of a particular outcome, the expected value of u() is …

2009

(Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Bayes'_theorem
- "In some interpretations of probability, Bayes' theorem tells how to update or revise beliefs in light of new evidence a posteriori."
- Each term in Bayes' theorem has a conventional name:
  - P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
  - P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
  - P(B|A) is the conditional probability of B given A.
  - P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

2002

(Gabor Melli, 2002) ⇒ Gabor Melli. (2002). “PredictionWorks' Data Mining Glossary.” PredictionWorks.
- Bayes Theorem: Describes a useful relationship between the likelihood of a future event (posteriors) and the likelihood of a prior event (priors). Given a hypothesis [math]\displaystyle{ h }[/math] and a dataset [math]\displaystyle{ D }[/math] the likelihood that the hypothesis is correct for the dataset P(h|D) can be expressed as P(D|h)P(h)/P(D). The use of P(h), "the prior", is the source of some debate among statisticians. The theorem can be proved by application of the product rule P(h∧D)=P(h|D)P(D)=P(D|h)P(h). See: Naive-Bayes Classifier.
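The product-rule proof mentioned in this entry can be written out step by step. The derivation below is a standard sketch in the entry's notation (hypothesis h, dataset D) and assumes P(D) > 0.

```latex
\begin{align*}
P(h \wedge D) &= P(h \mid D)\,P(D) && \text{(product rule)} \\
P(h \wedge D) &= P(D \mid h)\,P(h) && \text{(product rule, roles of $h$ and $D$ exchanged)} \\
\Rightarrow\quad P(h \mid D)\,P(D) &= P(D \mid h)\,P(h) \\
\Rightarrow\quad P(h \mid D) &= \frac{P(D \mid h)\,P(h)}{P(D)} && \text{(dividing by } P(D) > 0\text{)}
\end{align*}
```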