Naïve Bayes (NB) Classification Algorithm

Jump to: navigation, search

A Naïve Bayes (NB) classification algorithm is probabilistic learning algorithm that is a linear generative classification algorithm (i.e. that makes the conditional independence assumption).




  • (Wikipedia, 2011) ⇒
    • A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model".

      In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

      Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.

      In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, analysis of the Bayesian classification problem has shown that there are some theoretical reasons for the apparently unreasonable efficacy of naive Bayes classifiers.[1] Still, a comprehensive comparison with other classification methods in 2006 showed that Bayes classification is outperformed by more current approaches, such as boosted trees or random forests.[2]

      An advantage of the naive Bayes classifier is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.

  1. Harry Zhang "The Optimality of Naive Bayes". FLAIRS2004 conference. (available online: PDF)
  2. Caruana, R. and Niculescu-Mizil, A.: "An empirical comparison of supervised learning algorithms". Proceedings of the 23rd International Conference on Machine learning, 2006. (available online PDF)




  • (Hand & Yu, 2001) ⇒ David J. Hand, and Keming Yu. (2001). "Idiot's Bayes - not so stupid after all?." In: International Statistical Review, 69(3).
    • QUOTE:Folklore has it that a very simple supervised classification rule, based on the typically false assumption that the predictor variables are independent, can be highly effective, and often more effective than sophisticated rules. We examine the evidence For this, both empirical, as observed in real data applications, and theoretical, summarising explanations for why this simple rule might be effective. … In this paper, following almost all of the work on the idiot’s Bayes method, we adopt a frequentist interpretation. … The phenomenon is not limited to medicine. Other studies which found that the independence Bayes method performed very well, often better than the alternatives, include Cestnik. Kononenko & Bratko (1987). Clark & Niblett (1989). Cestnik (1990), Langley, Iba & Thompson (1992). Pazzani, Muramatsu & Billsus (1996), Friedman, Geiger & Goldszmidt (1997). and Domingos & Pazzani (1997).
  • (Rich, 2001) ⇒ Irina Rish. (2001). "An Empirical Study of the Maive Bayes Classifier." In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence.