2004 FindingScientificTopics


Subject Headings: Probabilistic Topic Model, Topic Modeling Algorithm, Markov Chain Monte Carlo Algorithm, Gibbs Sampling.

Notes

Cited By

2006

Quotes

Abstract

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by [[2003_LatentDirichletAllocation|Blei, Ng, and Jordan]] [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003). J. Machine Learn. Res. 3, 993–1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying “hot topics” by examining temporal dynamics and tagging abstracts to illustrate semantic content.
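The Markov chain Monte Carlo algorithm referred to above is a collapsed Gibbs sampler: the per-document topic proportions and per-topic word distributions are integrated out under symmetric Dirichlet priors (alpha and beta), and each word's topic assignment z_i is resampled from its full conditional given all other assignments. The Python sketch below is a minimal illustration of that update rule, not the authors' implementation; the function name, the corpus encoding (documents as lists of integer word ids), and the default hyperparameter values are assumptions made for this example.

import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.1, n_iter=200, seed=0):
    # docs: list of documents, each a list of integer word ids in [0, vocab_size).
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    nwt = np.zeros((vocab_size, n_topics))  # word-topic counts
    ndt = np.zeros((n_docs, n_topics))      # document-topic counts
    nt = np.zeros(n_topics)                 # total words assigned to each topic
    # Initialize topic assignments uniformly at random and fill the counts.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current assignment, then resample from the full
                # conditional P(z_i = t | z_-i, w), proportional to
                # (n_wt + beta) / (n_t + W*beta) * (n_dt + alpha).
                nwt[w, t] -= 1; ndt[d, t] -= 1; nt[t] -= 1
                p = (nwt[w] + beta) / (nt + vocab_size * beta) * (ndt[d] + alpha)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    # Posterior-mean estimates of the topic-word distributions (phi) and
    # document-topic proportions (theta) from the final sweep.
    phi = (nwt + beta) / (nt + vocab_size * beta)
    theta = (ndt + alpha) / (ndt.sum(axis=1, keepdims=True) + n_topics * alpha)
    return z, phi, theta

For example, gibbs_lda([[0, 1, 2, 1], [2, 3, 3, 0]], n_topics=2, vocab_size=4) runs the sampler on a two-document toy corpus. Note that the document-length term in the document-topic factor is constant across topics, so it is dropped before normalization.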

References

  • 1. (Blei, Ng & Jordan, 2003) ⇒ David M. Blei, Andrew Y. Ng, and Michael I. Jordan. (2003). “Latent Dirichlet Allocation.” In: The Journal of Machine Learning Research, 3, 993–1022.
  • 2. Hofmann, T. (2001) Machine Learn. J. 42, 177–196.
  • 3. Cohn, D. & Hofmann, T. (2001) in Advances in Neural Information Processing Systems 13 (MIT Press, Cambridge, MA), pp. 430–436.
  • 4. Iyer, R. & Ostendorf, M. (1996) in Proceedings of the International Conference on Spoken Language Processing (Applied Science & Engineering Laboratories, Alfred I. duPont Inst., Wilmington, DE), Vol. 1, pp. 236–239.
  • 5. Bigi, B., De Mori, R., El-Beze, M. & Spriet, T. (1997) in 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings (IEEE, Piscataway, NJ), pp. 535–542.
  • 6. Ueda, N. & Saito, K. (2003) in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA), Vol. 15.
  • 7. Erosheva, E. A. (2003) in Bayesian Statistics (Oxford Univ. Press, Oxford), Vol. 7.
  • 8. Arthur P. Dempster, Laird, N. M. & Rubin, D. B. (1977) J. R. Stat. Soc. B 39, 1–38.
  • 9. Minka, T. & Lafferty, J. (2002) Expectation-propagation for the generative aspect model, in Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (Elsevier, New York).
  • 10. Newman, M. E. J. & Barkema, G. T. (1999) Monte Carlo Methods in Statistical Physics (Oxford Univ. Press, Oxford).
  • 11. Gilks, W. R., Richardson, S. & Spiegelhalter, D. J. (1996) Markov Chain Monte Carlo in Practice (Chapman & Hall, New York).
  • 12. Liu, J. S. (2001) Monte Carlo Strategies in Scientific Computing (Springer, New York).
  • 13. Geman, S. & Geman, D. (1984) IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741.
  • 14. Christopher D. Manning & Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing (MIT Press, Cambridge, MA).
  • 15. Kass, R. E. & Raftery, A. E. (1995) J. Am. Stat. Assoc. 90, 773–795.
  • 16. Kuhn, T. S. (1970) The Structure of Scientific Revolutions (Univ. of Chicago Press, Chicago), 2nd Ed.
  • 17. Salmon, W. (1990) in Scientific Theories, Minnesota Studies in the Philosophy of Science, ed. Savage, C. W. (Univ. of Minnesota Press, Minneapolis).


 Author: Thomas L. Griffiths, Mark Steyvers
 Title: Finding Scientific Topics
 Title URL: http://www.pnas.org/content/101/suppl.1/5228.full.pdf+html
 DOI: 10.1073/pnas.0307752101
 Year: 2004