Multinomial Naïve Bayes Algorithm

From GM-RKB
Jump to navigation Jump to search

A Multinomial Naïve Bayes Algorithm is a naïve Bayes algorithm that is a multiclass classification algorithm.



References

2013

  • (de Fortuny Enric et al., 2013) ⇒ Junqué de Fortuny Enric, Martens David, and Provost Foster. (2013). “Predictive Modeling With Big Data: Is Bigger Really Better?.” In: Big Data, 1(4): 215-226. doi:10.1089/big.2013.0037.
    • QUOTE: … We introduce a version of Naïve Bayes with a multivariate event model that can scale up efficiently to massive, sparse datasets. Specifically, this version of the commonly used multivariate Bernoulli Naïve Bayes only needs to consider the “active” elements of the dataset — those that are present or non-zero — which can be a tiny fraction of the elements in the matrix for massive, sparse data. This means that predictive modelers wanting to work with the very convenient Naïve Bayes algorithm are not forced to use the multinomial event model simply because it is more scalable. This article thereby makes a small but important addition to the cumulative answer to a current open research question17: How can we learn predictive models from lots of data? …

      … There are a several variants of Naïve Bayes in common use, which differ in their calculation of the conditional probabilities. Each of these variants assumes a different event model, which postulates how the features from the data were generated.30 The two most popular choices are the multivariate and the multinomial event models. The reader is referred to the appendix and to the literature30 for details, but the main idea behind the multinomial event model is that each input sample results from a series of independent draws from the collection of all features. The main advantage of this model is that it requires only computing over those features that have a count that is nonzero. For a sparse dataset, this results in a huge reduction in run time since all the zeros can be ignored. Although results have shown this multinomial Naïve Bayes to be superior to the multivariate event model (described next) for text mining, the multinomial model makes the generative assumption that the features are drawn with replacement, which is not well specified for most binary feature data (although the multinomial model is used commonly in practice for binary data). The multinomial model further makes the implicit assumption that the number of draws in the generative model is independent of the class, which roughly implies in our setting that each instance contains on average the same number of active features (behaviors) per class, which might not be true for the datasets that we are considering. Therefore, it is helpful to consider the multivariate model as well, even though the traditional formulation is not efficient for massive datasets.

      The multivariate event model takes the point-of-view that the features are generated according to an independent Bernoulli process with probability parameters θj for class C (see the appendix and the literature30 for technical details).

      Unfortunately, the usual multivariate formulation does not have the same attractive sparsity properties as the multinomial formulation. In the Appendix we present an alternative, equivalent formulation that circumvents the need to consider the massive number of inactive elements for binary classification problems containing only Boolean-valued features (which is the case in all our datasets).

2000