2011 Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams


Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

We are interested in the problem of tracking broad topics such as “baseball” and “fashion” in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating "foreground" models (to capture recency) and "background" models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed using a normalized extension of stupid backoff and a simple queue for history retention perform well on the task.
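The following Python sketch illustrates the general shape of such a tracker, under stated assumptions: a per-topic unigram "foreground" model built from recent hashtag-labeled tweets held in a simple bounded queue, backoff to a static "background" model for words unseen in the foreground, and a perplexity threshold for filtering the stream. The class `SmoothedUnigramLM`, the backoff mass `beta`, the add-one smoothing of the background model, the queue size, and the threshold value are all illustrative choices, not the paper's exact formulation of normalized stupid backoff.

```python
import math
from collections import Counter, deque


class SmoothedUnigramLM:
    """Per-topic unigram LM that backs off from a recent "foreground"
    model to a static "background" model. The normalization below is
    one simple way to make backoff scores sum to one; beta and
    history_size are illustrative parameters."""

    def __init__(self, background_counts, beta=0.4, history_size=10000):
        self.bg = Counter(background_counts)
        self.bg_total = sum(self.bg.values())
        self.beta = beta          # probability mass reserved for backoff
        self.fg = Counter()       # foreground counts (capture recency)
        self.fg_total = 0
        self.history = deque()    # simple queue for history retention
        self.history_size = history_size

    def observe(self, tokens):
        """Add a hashtag-labeled tweet to the foreground model,
        evicting the oldest tweet once the queue is full."""
        self.history.append(tokens)
        self.fg.update(tokens)
        self.fg_total += len(tokens)
        while len(self.history) > self.history_size:
            old = self.history.popleft()
            self.fg.subtract(old)
            self.fg_total -= len(old)

    def _bg_prob(self, w):
        # Add-one smoothed background probability (combats sparsity).
        return (self.bg[w] + 1) / (self.bg_total + len(self.bg) + 1)

    def prob(self, w):
        if self.fg_total == 0:
            return self._bg_prob(w)
        if self.fg[w] > 0:
            # Seen in the foreground: discounted relative frequency.
            return (1.0 - self.beta) * self.fg[w] / self.fg_total
        # Unseen in the foreground: back off to the background model,
        # renormalized over the words the foreground has not seen.
        seen_mass = sum(self._bg_prob(v) for v, c in self.fg.items() if c > 0)
        z = max(1.0 - seen_mass, 1e-12)
        return self.beta * self._bg_prob(w) / z


def perplexity(lm, tokens):
    """Per-token perplexity of a tweet under a topic model."""
    logp = sum(math.log(lm.prob(w)) for w in tokens)
    return math.exp(-logp / max(len(tokens), 1))


def on_topic(lm, tokens, threshold=500.0):
    # Keep a tweet if the topic model finds it sufficiently
    # unsurprising; the threshold is a hypothetical, tunable value.
    return perplexity(lm, tokens) < threshold
```

In this setup, `observe` would be fed tweets carrying the topic's hashtag while the perplexity filter is applied to the full stream; a more efficient implementation would maintain the renormalization mass incrementally rather than recomputing it on each lookup.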

References

Rion Snow, Jimmy Lin, and William Morgan. (2011). "Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams." In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2011). doi:10.1145/2020408.2020476