Naïve Bayes Classification Algorithm
A Naïve Bayes classification algorithm is probabilistic learning algorithm that is a linear generative classification algorithm (i.e. that makes the conditional independence assumption).
- AKA: Naive Bayes, NB, Bayesian Classifier.
- Context:
- It can be implemented by a Naïve Bayes Classification System.
- It can range from being a Binomial Naïve Bayes to being a Multinomial Naïve Bayes.
- It can range from being a Univariate Naïve Bayes to being a Multivariate Naïve Bayes.
- It can produce a Naive-Bayes Classification Model.
- It assumes Conditional Independence between Features.
- It can be robust to Noisy Data.
- It can naturally handle Missing Values.
- It can be robust to Irrelevant Features.
- Its Performance can degrade when there are correlated features (when the Conditional Independence assumption does not hold).
- Example(s):
- \(P(c_i, s) = P(c_i) P(s|c_i)\).
- \(\operatorname{max}...\)
- Counter-Example(s):
- See: Learning Algorithm; Generative Statistical Model; Bayes Rule; Bayesian Method; Bayesian Networks; Semi-Naïve Bayesian Learning.
References
2011
- (Webb, 2011g) ⇒ Geoffrey I. Webb. (2011). "Naïve Bayes." In: (Sammut & Webb, 2011) p.713
- http://en.wikipedia.org/wiki/Linear_classifier#Generative_models_vs._discriminative_models
- ... Naive Bayes classifier: assumes independent binomial conditional density models.
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Naive_Bayes_classifier
- A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.
Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.
In spite of their naive design and apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, analysis of the Bayesian classification problem has shown that there are some theoretical reasons for the apparently unreasonable efficacy of naive Bayes classifiers.^{[1]} Still, a comprehensive comparison with other classification methods in 2006 showed that Bayes classification is outperformed by more current approaches, such as boosted trees or random forests.^{[2]}
An advantage of the naive Bayes classifier is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.
- A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".
- ↑ Harry Zhang "The Optimality of Naive Bayes". FLAIRS2004 conference. (available online: PDF)
- ↑ Caruana, R. and Niculescu-Mizil, A.: "An empirical comparison of supervised learning algorithms". Proceedings of the 23rd international conference on Machine learning, 2006. (available online PDF)
2009
- (Hand, 2009) ⇒ David J. Hand. (2009). "Naïve Bayes." In: (Wu & Kumar, 2009).
2008
- (Wu, Kumar & al, 2008) ⇒ Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu and Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. (2008). "Top 10 Algorithms in Data Mining." In: Knowledge and Information Systems, 14(1). doi:10.1007/s10115-007-0114-2
- QUOTE: One very important one is the naive Bayes method — also called idiot’s Bayes, simple Bayes, and independence Bayes. This method is important for several reasons. It is very easy to construct, not needing any complicated iterative parameter estimation schemes. This means it may be readily applied to huge data sets. It is easy to interpret, so users unskilled in classiﬁer technology can understand why it is making the classiﬁcation it makes. And ﬁnally, it often does surprisingly well: it may not be the best possible classiﬁer in any particular application, but it can usually be relied on to be robust and to do quite well. General discussion of the naive Bayes method and its merits are given in (Domingos & Pazzani, 1997; Hand & Yu, 2001).
2001
- (Hand & Yu, 2001) ⇒ David J. Hand, and Keming Yu. (2001). "Idiot's Bayes - not so stupid after all?." In: International Statistical Review, 69(3).
- QUOTE: Folklore has it that a very simple supervised classification rule, based on the typically false assumption that the predictor variables are independent, can be highly effective, and often more effective than sophisticated rules. We examine the evidence For this, both empirical, as observed in real data applications, and theoretical, summarising explanations for why this simple rule might be effective. ... In this paper, following almost all of the work on the idiot’s Bayes method, we adopt a frequentist interpretation. ... The phenomenon is not limited to medicine. Other studies which found that the independence Bayes method performed very well, often better than the alternatives, include Cestnik. Kononenko & Bratko (1987). Clark & Niblett (1989). Cestnik (1990), Langley, Iba & Thompson (1992). Pazzani, Muramatsu & Billsus (1996), Friedman, Geiger & Goldszmidt (1997). and Domingos & Pazzani (1997).
- (Rich, 2001) ⇒ Irina Rish. (2001). "An Empirical Study of the Maive Bayes Classifier." In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence.
1998
- (McCallum & Niham, 1998) ⇒ Andrew McCallum, and Kamal Nigam. (1998). "A Comparison of Event Models for Naive Bayes Text Classification." In: AAAI/ICML-98 Workshop on Learning for Text Categorization.
1997
- (Mitchell, 1997) ⇒ Tom M. Mitchell. (1997). "Machine Learning." McGraw-Hill.
- (Domingos & Pazzani, 1997) ⇒ Pedro Domingos, and Michael J. Pazzani. (1997). "On the Optimality of the Simple Bayesian Classifier Under Zero-One Loss." In: Machine Learning, 29.
1995
- (John & Langley, 1995) ⇒ George H. John, and Pat Langley. (1995). "Estimating Continuous Distributions in Bayesian Classifiers." In: Proceedings of the eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995).