2015 LinearTimeSamplersforSupervised

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Topic models are effective probabilistic tools for processing large collections of unstructured data. With the exponential growth of modern industrial data, and consequently with our ambition to explore much bigger models, there is a pressing need to scale up topic modeling algorithms significantly. Much previous work has taken up this need, culminating in the recent fast Markov chain Monte Carlo sampling algorithms in [10, 23] for the unsupervised latent Dirichlet allocation (LDA) formulations.

In this work we extend the recent sampling advances for unsupervised LDA models to supervised tasks. We focus on the Gibbs MedLDA model [27], which is able to simultaneously discover latent structures and make accurate predictions. By combining a set of sampling techniques, we are able to reduce the $O(K^3 + DK^2 + DNK)$ complexity in [27] to $O(DK + DN)$ when there are K topics and D documents with average length N. To the best of our knowledge, this is the first linear time sampling algorithm for supervised topic models. Our algorithm requires minimal modifications to incorporate most loss functions in a variety of supervised tasks, and we observe in our experiments an order of magnitude speedup over the current state-of-the-art implementation, while achieving similar prediction performance.

The open-source C++ implementation of the proposed algorithm is available at https://github.com/xunzheng/light_medlda.
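
To make the complexity reduction stated in the abstract concrete, the following minimal sketch compares the leading terms of the two bounds, $O(K^3 + DK^2 + DNK)$ and $O(DK + DN)$. It is only an arithmetic illustration: the function names and the example values of K, D, and N are hypothetical, constant factors are ignored, and the numbers are not measured running times of either implementation.

```python
def cost_terms(K, D, N):
    """Return the leading terms of the two bounds quoted in the abstract.

    K: number of topics, D: number of documents, N: average document length.
    Constant factors are ignored, so these are term counts, not runtimes.
    """
    baseline = K**3 + D * K**2 + D * N * K  # O(K^3 + DK^2 + DNK), as stated for [27]
    proposed = D * K + D * N                # O(DK + DN), the linear-time bound
    return baseline, proposed


def term_ratio(K, D, N):
    """Ratio of the baseline term count to the linear-time term count."""
    baseline, proposed = cost_terms(K, D, N)
    return baseline / proposed


if __name__ == "__main__":
    # Hypothetical corpus sizes, chosen only for illustration.
    for K in (10, 100, 1000):
        baseline, proposed = cost_terms(K, D=10_000, N=100)
        print(f"K={K:5d}  baseline={baseline:,}  proposed={proposed:,}  "
              f"ratio={baseline / proposed:.1f}")
```

Under these assumed sizes the gap between the two term counts widens as K grows, because the baseline bound has terms cubic and quadratic in K while the proposed bound is linear in K. The order-of-magnitude speedup reported in the abstract is an experimental observation and should not be read off from these counts.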

References

Page: 2015 LinearTimeSamplersforSupervised
Author: Eric P. Xing, Xun Zheng, Yaoliang Yu
Title: Linear Time Samplers for Supervised Topic Models Using Compositional Proposals
DOI: 10.1145/2783258.2783371
Year: 2015