2008 FastCollapsedGibbsSamplingforLatentDirichletAllocation

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
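The O(K) cost the abstract refers to comes from the per-token update of the standard collapsed Gibbs sampler, which must evaluate the full conditional distribution over all K topics before drawing a new assignment. A minimal sketch of that baseline update (not the FastLDA shortcut itself, whose details are not given on this page; variable names are illustrative):

```python
import numpy as np

def collapsed_gibbs_step(doc, word, topic, n_wt, n_dt, n_t, alpha, beta, V, rng):
    """One standard collapsed Gibbs update for a single token: O(K) work.

    n_wt[w, k]: count of word w assigned to topic k (V x K)
    n_dt[d, k]: count of tokens in document d assigned to topic k (D x K)
    n_t[k]:     total tokens assigned to topic k (length K)
    """
    # Remove the token's current assignment from the sufficient statistics.
    n_wt[word, topic] -= 1
    n_dt[doc, topic] -= 1
    n_t[topic] -= 1
    # Full conditional over all K topics -- this loop over K is the cost
    # that FastLDA reduces by exploiting the skewness of these counts.
    p = (n_wt[word] + beta) / (n_t + V * beta) * (n_dt[doc] + alpha)
    new_topic = rng.choice(len(p), p=p / p.sum())
    # Add the token back under its newly sampled topic.
    n_wt[word, new_topic] += 1
    n_dt[doc, new_topic] += 1
    n_t[new_topic] += 1
    return new_topic

# Toy usage on uniform counts (hypothetical sizes for illustration).
rng = np.random.default_rng(0)
K, V, D = 4, 10, 2
n_wt = np.ones((V, K))
n_dt = np.ones((D, K))
n_t = n_wt.sum(axis=0)
t = collapsed_gibbs_step(0, 3, 2, n_wt, n_dt, n_t,
                         alpha=0.1, beta=0.01, V=V, rng=rng)
```

Because the update only ever decrements and re-increments one entry per count array, the totals are invariant across steps, which is a useful sanity check when implementing samplers like this.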

References

Ian Porteous, David Newman, Alexander Ihler, Arthur Asuncion, Padhraic Smyth, and Max Welling. (2008). "Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation." In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008). doi:10.1145/1401890.1401960