2015 DiversifyingRestrictedBoltzmann

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

The Restricted Boltzmann Machine (RBM) has shown great effectiveness in document modeling. It uses hidden units to discover latent topics and can learn compact semantic representations for documents, which greatly facilitate document retrieval, clustering and classification. The popularity (or frequency) of topics in text corpora usually follows a power-law distribution, where a few dominant topics occur very frequently while most topics (in the long-tail region) have low probabilities. Due to this imbalance, RBM tends to learn multiple redundant hidden units to best represent the dominant topics and to ignore those in the long-tail region, which renders the learned representations redundant and non-informative. To solve this problem, we propose Diversified RBM (DRBM), which diversifies the hidden units so that they cover not only the dominant topics but also those in the long-tail region. We define a diversity metric and use it as a regularizer to encourage the hidden units to be diverse. Since the diversity metric is hard to optimize directly, we instead optimize its lower bound and prove that maximizing the lower bound with projected gradient ascent can increase the diversity metric. Experiments on document retrieval and clustering demonstrate that diversification greatly improves the document modeling power of DRBM.
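The abstract does not spell out the diversity metric, so the following is only an illustrative sketch of what an angle-based diversity score over hidden units could look like. It assumes each hidden unit is represented by a non-zero weight vector and scores a set of units by the mean of their pairwise angles minus the variance of those angles. The function names `pairwise_angles` and `diversity_score`, the example weight vectors, and the exact form of the score are assumptions made for illustration, not details taken from this page.

```python
import math
from itertools import combinations


def pairwise_angles(weights):
    """Return the non-obtuse angle between every pair of weight vectors."""
    angles = []
    for u, v in combinations(weights, 2):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        # abs() folds the angle into [0, pi/2]; min() guards against rounding.
        cosine = min(1.0, abs(dot) / (norm_u * norm_v))
        angles.append(math.acos(cosine))
    return angles


def diversity_score(weights):
    """Assumed metric: mean pairwise angle minus variance of pairwise angles."""
    angles = pairwise_angles(weights)
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    return mean - variance


# Hypothetical hidden-unit weight vectors (one list per hidden unit).
redundant = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.05, 0.0]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(diversity_score(redundant) < diversity_score(spread))
```

Under this assumed score, nearly parallel weight vectors (units that duplicate one dominant topic) score low, while mutually orthogonal vectors score close to pi/2. A regularizer of this kind would be added to the RBM training objective with a weighting coefficient; the lower-bound optimization and the projected gradient ascent procedure mentioned in the abstract are not reproduced here.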

References


2015 DiversifyingRestrictedBoltzmann
Author: Eric P. Xing, Pengtao Xie, Yuntian Deng
Title: Diversifying Restricted Boltzmann Machine for Document Modeling
DOI: 10.1145/2783258.2783264
Year: 2015