Bayesian Model Averaging Algorithm


A Bayesian Model Averaging Algorithm is a Statistical Modeling Algorithm that seeks to approximate the Bayes Optimal Classifier by averaging the individual predictions of all candidate classifiers in the hypothesis space, each weighted by how well it explains the training data (its likelihood) and by its prior probability.
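
In symbols, the averaged prediction for a new input can be written as follows (the symbols [math]y[/math] for the predicted label, [math]x[/math] for the input, [math]T[/math] for the training data, [math]h[/math] for a hypothesis, and [math]H[/math] for the hypothesis space are introduced here only for illustration):

[math]P(y \mid x, T) = \sum_{h \in H} P(y \mid x, h) \, P(h \mid T) \propto \sum_{h \in H} P(y \mid x, h) \, P(T \mid h) \, P(h)[/math]

so each hypothesis contributes in proportion to its likelihood [math]P(T \mid h)[/math] times its prior [math]P(h)[/math].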



References

2013

  • http://en.wikipedia.org/wiki/Ensemble_learning#Bayesian_model_averaging
    • Bayesian model averaging is an ensemble technique that seeks to approximate the Bayes Optimal Classifier by sampling hypotheses from the hypothesis space, and combining them using Bayes' law.[1] Unlike the Bayes optimal classifier, Bayesian model averaging can be practically implemented. Hypotheses are typically sampled using a Monte Carlo sampling technique such as MCMC. For example, Gibbs sampling may be used to draw hypotheses that are representative of the distribution [math]\displaystyle{ P(T|H) }[/math]. It has been shown that under certain circumstances, when hypotheses are drawn in this manner and averaged according to Bayes' law, this technique has an expected error that is bounded to be at most twice the expected error of the Bayes optimal classifier.[2] Despite the theoretical correctness of this technique, however, it has a tendency to promote over-fitting, and does not perform as well empirically as simpler ensemble techniques such as bagging.[3]
  1. Template:Cite jstor
  2. David Haussler, Michael Kearns, and Robert E. Schapire. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Machine Learning, 14:83–113, 1994
  3. Template:Cite conference
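
The excerpt above approximates the average by drawing hypotheses from the posterior rather than enumerating the whole hypothesis space. Below is a minimal Python sketch of just the prediction step under that scheme; the sampler itself is not shown, and the sampled_models list and predict_proba interface are illustrative assumptions. The point is that hypotheses drawn in proportion to their posterior probability are combined with equal weight, because the sampling frequency already carries the posterior weighting.

  def bma_predict_from_samples(sampled_models, x):
      # sampled_models: hypotheses already drawn (e.g., by Gibbs sampling) in
      # proportion to their posterior probability.
      # predict_proba: assumed per-model interface returning a list of class
      # probabilities for the input x.
      votes = [m.predict_proba(x) for m in sampled_models]
      n_classes = len(votes[0])
      # Equal-weight average: the draw frequencies already encode the posterior
      # weighting, so no extra per-model weights are needed here.
      return [sum(v[c] for v in votes) / len(votes) for c in range(n_classes)]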


  function train_bayesian_model_averaging(T)
      z = -infinity   (z tracks the largest log-likelihood seen so far, for numerical stability)
      For each model, m, in the ensemble:
          Train m, typically using a random subset of the [[training data]], T.
          Let prior[m] be the prior probability that m is the generating hypothesis.
              Typically, [[uniform prior]]s are used, so prior[m] = 1
              (any constant will do, because the weights are normalized at the end).
          Let x be the [[predictive accuracy]] (from 0 to 1) of m for predicting the labels in T.
          Use x to estimate log_likelihood[m]. Often, this is computed as
              log_likelihood[m] = |T| * (x * log(x) + (1 - x) * log(1 - x)),
              where |T| is the number of training patterns in T.
          z = max(z, log_likelihood[m])
      For each model, m, in the ensemble:
          weight[m] = prior[m] * exp(log_likelihood[m] - z)
              (subtracting z before exponentiating prevents floating-point underflow)
      Normalize all the model weights to sum to 1.
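
For concreteness, the following is a short, self-contained Python sketch of the weighting computation in the pseudocode above. The function name, the clipping of accuracies away from 0 and 1, and the example values are illustrative choices rather than part of any reference implementation.

  import math

  def bma_weights(accuracies, n_train, priors=None):
      """Turn per-model training accuracies into normalized model weights."""
      # accuracies: predictive accuracy of each model on the training data (0..1).
      # n_train:    number of training patterns |T|.
      # priors:     prior probability of each model; a constant (uniform) prior
      #             is used if omitted, and the constant cancels on normalization.
      if priors is None:
          priors = [1.0] * len(accuracies)

      def log_likelihood(x):
          # log L(m) = |T| * (x log x + (1 - x) log(1 - x)), with x clipped away
          # from 0 and 1 so that log() is defined.
          x = min(max(x, 1e-12), 1.0 - 1e-12)
          return n_train * (x * math.log(x) + (1.0 - x) * math.log(1.0 - x))

      log_liks = [log_likelihood(x) for x in accuracies]
      z = max(log_liks)  # subtract the maximum before exponentiating (log-sum-exp trick)
      unnormalized = [p * math.exp(ll - z) for p, ll in zip(priors, log_liks)]
      total = sum(unnormalized)
      return [w / total for w in unnormalized]

  # Hypothetical usage: three models with different training accuracies on a
  # training set of 100 patterns; the most accurate model dominates the weights.
  print(bma_weights([0.72, 0.80, 0.65], n_train=100))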
