2006 TopicModelingBeyongBoW
- (Wallach, 2006) ⇒ Hanna M. Wallach. (2006). “Topic Modeling: beyond bag-of-words.” In: Proceedings of the 23rd ICML Conference (ICML 2006) doi:10.1145/1143844.1143967
Subject Headings: Topic Modeling Algorithm
Notes
Cited By
Quotes
Abstract
- Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. The model hyperparameters are inferred using a Gibbs EM algorithm. On two data sets, each of 150 documents, the new model exhibits better predictive accuracy than either a hierarchical Dirichlet bigram language model or a unigram topic model. Additionally, the inferred topics are less dominated by function words than are topics discovered using unigram statistics, potentially making them more meaningful.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2006 TopicModelingBeyongBoW | | | | Topic Modeling: beyond bag-of-words | | | http://www.cs.umass.edu/~wallach/publications/wallach06beyond.pdf | 10.1145/1143844.1143967 | | |