2011 EmpiricalEvaluationandCombinati

Subject Headings: Neural Network Language Model, Structured Language Model, Class-based Language Model, Cache-based Language Model, Maximum Entropy Language Model, Random Forest Language Model, Good-Turing Trigram Language Model, Kneser-Ney Smoothed 5-Gram Language Model.

Notes

Cited By

Quotes

Abstract

We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model, and several types of neural network based language models. We show results obtained after combining all these models by linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The obtained perplexity reductions are over 50% against a Good-Turing trigram baseline and over 40% against a modified Kneser-Ney smoothed 5-gram baseline.
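The linear interpolation mentioned in the abstract is a weighted average of the component models' conditional distributions, with non-negative weights that sum to one (typically tuned on held-out data). Below is a minimal sketch of this combination step, assuming a hypothetical predict(word, history) interface for each component model; it is an illustration, not the authors' implementation.

    def interpolate(models, weights, word, history):
        """Linearly interpolate conditional probabilities from several LMs.

        models  -- component language models, each exposing a hypothetical
                   predict(word, history) -> probability method
        weights -- interpolation weights, non-negative and summing to 1
        """
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(w * m.predict(word, history)
                   for w, m in zip(weights, models))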

1. Introduction

In this paper, we deal with statistical approaches to language modeling that are motivated by information theory; this allows us to compare different techniques fairly. It is assumed that the model that best predicts words given their context is the closest to the true model of the language. Thus, the measure we aim to minimize is the cross entropy of the test data given the language model. The cross entropy is equal to the [math]\displaystyle{ \log_2 }[/math] of the perplexity (PPL). The per-word perplexity is defined as

Eq. 1 [math]\displaystyle{ PPL = \sqrt[K]{ \prod^K_{i=1} \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1})} } }[/math]

It is important to note that perplexity does not depend only on the quality of the model, but also on the nature of the training and test data. For difficult tasks, when small amounts of training data are available and a large vocabulary is used (so the model has to choose between many variants), the perplexity can reach values over 1000, while on easy tasks it is common to observe values below 100.
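As a quick check on Eq. 1, the sketch below computes per-word perplexity directly from the conditional probabilities a model assigns to the test words, using the identity PPL = 2^H, where H is the base-2 cross entropy; a uniform model over a 100-word vocabulary, for instance, has perplexity exactly 100. This is a generic illustration, not code from the paper.

    import math

    def perplexity(probs):
        """Per-word perplexity of a test sequence.

        probs -- conditional probabilities P(w_i | w_1, ..., w_{i-1})
                 that the language model assigns to each test word.
        """
        K = len(probs)
        cross_entropy = -sum(math.log2(p) for p in probs) / K  # bits per word
        return 2.0 ** cross_entropy                            # PPL = 2**H

    # A uniform model over a 100-word vocabulary has perplexity 100.
    print(perplexity([0.01] * 50))  # -> 100.0 (up to floating-point rounding)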

Another difficulty that arises when using perplexity as a measure of progress is that improvements are often reported as relative (percentage) reductions. It can be seen that a constant relative reduction of entropy results in a variable relative reduction of perplexity. …

…
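To make the remark above concrete: since cross entropy is the log2 of perplexity, reducing the entropy by a fixed fraction r changes the perplexity from PPL to PPL^(1-r), so the same relative entropy gain translates into a larger relative perplexity reduction on harder (higher-perplexity) tasks. The numbers in the snippet below are illustrative, not taken from the paper.

    def ppl_after_entropy_reduction(baseline_ppl, entropy_reduction):
        """Perplexity after cutting cross entropy by a fixed fraction.

        Since H = log2(PPL), reducing H by a fraction r gives PPL**(1 - r).
        """
        return baseline_ppl ** (1.0 - entropy_reduction)

    # A 10% entropy reduction cuts a perplexity of 1000 by about 50%,
    # but a perplexity of 100 by only about 37%.
    print(ppl_after_entropy_reduction(1000, 0.10))  # ~501.2
    print(ppl_after_entropy_reduction(100, 0.10))   # ~63.1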

References

BibTeX

@inproceedings{2011_EmpiricalEvaluationandCombinati,
  author    = {Tomas Mikolov and
               Anoop Deoras and
               Stefan Kombrink and
               Lukas Burget and
               Jan Cernocky},
  title     = {Empirical Evaluation and Combination of Advanced Language Modeling
               Techniques},
  booktitle = {Proceedings of the 12th Annual Conference of the International Speech
               Communication Association (INTERSPEECH 2011)},
  pages     = {605--608},
  publisher = {ISCA},
  year      = {2011},
  url       = {http://www.isca-speech.org/archive/interspeech\_2011/i11\_0605.html},
}


Author: Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Černocký, Tomáš Mikolov
Title: Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Year: 2011