2011 EmpiricalEvaluationandCombinati

Subject Headings: Neural Network Language Model, Structured Language Model, Class-based Language Model, Cache-based Language Model, Maximum Entropy Language Model, Random Forest Language Model, Good-Turing Trigram Language Model, Kneser-Ney Smoothed 5-Gram Language Model.

Notes

Cited By

Quotes

Abstract

We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model, and several types of neural network based language models. We show results obtained after combining all these models by linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The obtained perplexity reductions are over 50% against a Good-Turing trigram baseline and over 40% against a modified Kneser-Ney smoothed 5-gram baseline.
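The linear interpolation mentioned in the abstract is a weighted average of the component models' conditional distributions, with non-negative weights that sum to one (typically tuned on held-out data). Below is a minimal sketch of this combination step, assuming a hypothetical predict(word, history) interface for each component model; it is an illustration, not the authors' implementation.

    def interpolate(models, weights, word, history):
        """Linearly interpolate conditional probabilities from several LMs.

        models  -- component language models, each exposing a hypothetical
                   predict(word, history) -> probability method
        weights -- interpolation weights, non-negative and summing to 1
        """
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(w * m.predict(word, history)
                   for w, m in zip(weights, models))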

1. Introduction

In this paper, we deal with statistical approaches to language modeling that are motivated by information theory; this allows us to compare different techniques fairly. It is assumed that the model that best predicts words given their context is the closest to the true model of the language. Thus, the measure we aim to minimize is the cross entropy of the test data given the language model. The cross entropy is equal to the [math]\displaystyle{ \log_2 }[/math] of the perplexity (PPL). The per-word perplexity is defined as

Eq. 1 [math]\displaystyle{ PPL = \sqrt[K]{ \prod^K_{i=1} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})} } }[/math]

It is important to note that perplexity does not depend only on the quality of the model, but also on the nature of the training and test data. For difficult tasks, when small amounts of training data are available and a large vocabulary is used (so the model has to choose between many variants), the perplexity can reach values over 1000, while on easy tasks it is common to observe values below 100.
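As a quick check on Eq. 1, the sketch below computes per-word perplexity directly from the conditional probabilities a model assigns to the test words, using the identity PPL = 2^H, where H is the base-2 cross entropy; a uniform model over a 100-word vocabulary, for instance, has perplexity exactly 100. This is a generic illustration, not code from the paper.

    import math

    def perplexity(probs):
        """Per-word perplexity of a test sequence.

        probs -- conditional probabilities P(w_i | w_1, ..., w_{i-1})
                 that the language model assigns to each test word.
        """
        K = len(probs)
        cross_entropy = -sum(math.log2(p) for p in probs) / K  # bits per word
        return 2.0 ** cross_entropy                            # PPL = 2**H

    # A uniform model over a 100-word vocabulary has perplexity 100.
    print(perplexity([0.01] * 50))  # -> 100.0 (up to floating-point rounding)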

Another difficulty that arises when using perplexity as a measure of progress is that improvements are often reported as relative (percentage) reductions. It can be seen that a constant relative reduction of entropy results in a variable relative reduction of perplexity. …

…
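To make the remark above concrete: since cross entropy is the log2 of perplexity, reducing the entropy by a fixed fraction r changes the perplexity from PPL to PPL^(1-r), so the same relative entropy gain translates into a larger relative perplexity reduction on harder (higher-perplexity) tasks. The numbers in the snippet below are illustrative, not taken from the paper.

    def ppl_after_entropy_reduction(baseline_ppl, entropy_reduction):
        """Perplexity after cutting cross entropy by a fixed fraction.

        Since H = log2(PPL), reducing H by a fraction r gives PPL**(1 - r).
        """
        return baseline_ppl ** (1.0 - entropy_reduction)

    # A 10% entropy reduction cuts a perplexity of 1000 by about 50%,
    # but a perplexity of 100 by only about 37%.
    print(ppl_after_entropy_reduction(1000, 0.10))  # ~501.2
    print(ppl_after_entropy_reduction(100, 0.10))   # ~63.1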

References

BibTeX

@inproceedings{2011_EmpiricalEvaluationandCombinati,
  author    = {Tomas Mikolov and
               Anoop Deoras and
               Stefan Kombrink and
               Lukas Burget and
               Jan Cernocky},
  title     = {Empirical Evaluation and Combination of Advanced Language Modeling
               Techniques},
  booktitle = {Proceedings of the 12th Annual Conference of the International Speech
               Communication Association (INTERSPEECH 2011)},
  pages     = {605--608},
  publisher = {ISCA},
  year      = {2011},
  url       = {http://www.isca-speech.org/archive/interspeech\_2011/i11\_0605.html},
}


Author: Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký, Tomáš Mikolov
Title: Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Year: 2011