Pachinko Allocation Model
A Pachinko Allocation Model is a topic model that improves on latent Dirichlet allocation (LDA) by modeling correlations between topics in addition to the word correlations which constitute topics.
- AKA: PAM.
- See: Topic Model, Bioinformatics, Pachinko Machine.
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Pachinko_allocation Retrieved:2014-5-12.
- In machine learning and natural language processing, the pachinko allocation model (PAM) is a topic model. Topic models are a suite of algorithms to uncover the hidden thematic structure of a collection of documents. The algorithm improves upon earlier topic models such as latent Dirichlet allocation (LDA) by modeling correlations between topics in addition to the word correlations which constitute topics. PAM provides more flexibility and greater expressive power than latent Dirichlet allocation. While first described and implemented in the context of natural language processing, the algorithm may have applications in other fields such as bioinformatics. The model is named for pachinko machines — a game popular in Japan, in which metal balls bounce down around a complex collection of pins until they land in various bins at the bottom.
- (Mimno et al., 2007) ⇒ David Mimno, Wei Li, and Andrew McCallum. (2007). “Mixtures of Hierarchical Topics with Pachinko Allocation.” In: Proceedings of the 24th International Conference on Machine learning. ISBN:978-1-59593-793-3 doi:10.1145/1273496.1273576
- QUOTE: Another approach to representing the organization of topics is the pachinko allocation model (PAM) (Li & McCallum, 2006). PAM is a family of generative models in which words are generated by a directed acyclic graph (DAG) consisting of distributions over words and distributions over other nodes. A simple example of the PAM framework, four-level PAM, is described in Li and McCallum (2006). There is a single node at the top of the DAG that defines a distribution over nodes in the second level, which we refer to as super-topics. Each node in the second level defines a distribution over all nodes in the third level, or sub-topics. Each sub-topic maps to a single distribution over the vocabulary. Only the sub-topics, therefore, actually produce words. The super-topics represent clusters of topics that frequently cooccur.
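The four-level generative process quoted above (root → super-topics → sub-topics → words) can be sketched as a simple forward-sampling routine. This is a minimal illustration, not the paper's implementation: the sizes, symmetric Dirichlet priors, and variable names are hypothetical, and real PAM draws the root and super-topic distributions per document from learned (possibly asymmetric) Dirichlets and fits them by inference rather than sampling them once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
n_super, n_sub, vocab_size, doc_len = 3, 5, 20, 50

# Per-document multinomials mirroring the four-level DAG:
# a single root node, super-topics, then sub-topics.
theta_root = rng.dirichlet(np.ones(n_super))               # root's distribution over super-topics
theta_super = rng.dirichlet(np.ones(n_sub), size=n_super)  # each super-topic's distribution over sub-topics
phi = rng.dirichlet(np.ones(vocab_size) * 0.1, size=n_sub) # each sub-topic's distribution over the vocabulary

doc = []
for _ in range(doc_len):
    s = rng.choice(n_super, p=theta_root)    # choose a super-topic from the root
    t = rng.choice(n_sub, p=theta_super[s])  # choose a sub-topic from that super-topic
    w = rng.choice(vocab_size, p=phi[t])     # only sub-topics emit words
    doc.append(w)
```

Because every word's sub-topic is reached through a super-topic, sub-topics that share a high-probability super-topic tend to co-occur within documents, which is exactly the topic-correlation structure LDA cannot express.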