Text Topic Modeling Task


A text topic modeling task is a topic modeling task whose input is a text corpus, which also makes it a text corpus modeling task.



  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Topic_model Retrieved:2015-4-25.
    • In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document's balance of topics is.

      Although topic models were first described and implemented in the context of natural language processing, they have applications in other fields such as bioinformatics.
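
The mixture intuition in the passage above (a document that is 10% about cats and 90% about dogs) can be illustrated with a minimal sketch. The per-topic word probabilities and the names `TOPICS` and `expected_word_probs` are hypothetical, chosen only for illustration; they do not come from the cited source. The sketch runs in the forward direction only: it computes the word frequencies expected from known topics and a known topic mix, whereas a topic model must infer the topics and the mix from observed word statistics.

```python
# Hypothetical per-topic word distributions (illustrative values, not fitted to data).
# "the" and "is" get the same probability in both topics, as in the quoted example.
TOPICS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "the": 0.2, "is": 0.1},
    "cats": {"cat": 0.4, "meow": 0.3, "the": 0.2, "is": 0.1},
}


def expected_word_probs(topic_mix):
    """Weight each topic's word distribution by the document's topic proportions."""
    probs = {}
    for topic, weight in topic_mix.items():
        for word, p in TOPICS[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs


# A document that is 10% about cats and 90% about dogs.
doc_mix = {"cats": 0.1, "dogs": 0.9}
probs = expected_word_probs(doc_mix)

# Probability mass on topic-specific words.
dog_mass = probs["dog"] + probs["bone"]
cat_mass = probs["cat"] + probs["meow"]
ratio = dog_mass / cat_mass

print(f"dog words : cat words = {ratio:.1f} : 1")
```

Under these assumed distributions the dog-specific words carry about nine times the probability mass of the cat-specific words, while "the" and "is" keep the same probability whatever the mix.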