Maximum Likelihood Estimation (MLE)-based Language Model


A Maximum Likelihood Estimation (MLE)-based Language Model is a language model whose probability distribution is estimated by maximum likelihood, i.e., directly from relative frequencies observed in a training corpus.
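For illustration, below is a minimal sketch (in Python; the function name, corpus, and sentence markers are hypothetical) of what MLE means for a bigram model: each conditional probability is simply the relative frequency of a bigram given its one-word context.

```python
from collections import defaultdict

def train_mle_bigram_lm(sentences):
    """Estimate P(w | h) = count(h, w) / count(h) by maximum likelihood."""
    bigram_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            bigram_counts[(h, w)] += 1
            context_counts[h] += 1
    # Normalize each context's counts into a conditional distribution.
    return {(h, w): c / context_counts[h] for (h, w), c in bigram_counts.items()}

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_mle_bigram_lm(corpus)
print(model[("the", "cat")])  # 0.5: "cat" follows "the" in one of two cases
```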



References

2016

  • (Kuznetsov et al., 2016) ⇒ Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, and Brian Roark. (2016). “Learning N-Gram Language Models from Uncertain Data”. In: INTERSPEECH, pp. 2323-2327.
    • ABSTRACT: We present a new algorithm for efficiently training n-gram language models on uncertain data, and illustrate its use for semi-supervised language model adaptation. We compute the probability that an n-gram occurs k times in the sample of uncertain data, and use the resulting histograms to derive a generalized Katz backoff model. We compare semi-supervised adaptation of language models for YouTube video speech recognition in two conditions: when using full lattices with our new algorithm versus just the 1-best output from the baseline speech recognizer. Unlike 1-best methods, the new algorithm provides models that yield solid improvements over the baseline on the full test set, and, further, achieves these gains without hurting performance on any of the set of channels. We show that channels with the most data yielded the largest gains. The algorithm was implemented via a new semiring in the OpenFst library and will be released as part of the OpenGrm ngram library.
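As a rough illustration of the count-histogram idea described in the abstract (not the authors' OpenFst semiring implementation), the sketch below computes the probability that an n-gram occurs exactly k times, under the simplifying assumption that each candidate occurrence has an independent posterior probability; the function name and example probabilities are hypothetical.

```python
def count_histogram(occurrence_probs):
    """Return hist, where hist[k] = P(the n-gram occurs exactly k times),
    given an independent occurrence probability for each candidate position
    (a Poisson-binomial distribution computed by dynamic programming)."""
    hist = [1.0]  # before considering any position, the count is 0
    for p in occurrence_probs:
        new_hist = [0.0] * (len(hist) + 1)
        for k, prob in enumerate(hist):
            new_hist[k] += prob * (1.0 - p)   # the n-gram is absent here
            new_hist[k + 1] += prob * p       # the n-gram is present here
        hist = new_hist
    return hist

# e.g., three lattice positions where an n-gram has these posterior probabilities
print(count_histogram([0.9, 0.5, 0.1]))  # probabilities of counts 0, 1, 2, 3
```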

2003

  • (Croft & Lafferty, 2003) ⇒ W. Bruce Croft, and John D. Lafferty, editors. (2003). “Language Modeling for Information Retrieval”. Kluwer Academic.
    • QUOTE: A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories.

      In the past several years a new framework for information retrieval has emerged that is based on statistical language modeling. The approach differs from traditional probabilistic approaches in interesting and subtle ways, and is fundamentally different from vector space methods. It is striking that the language modeling approach to information retrieval was not proposed until the late 1990s; however, until recently the information retrieval and language modeling research communities were somewhat isolated.
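To make the language-modeling approach to retrieval concrete, here is a hedged sketch of query-likelihood scoring: each document is scored by the probability that its (smoothed MLE) unigram model assigns to the query. The interpolation weight, function names, and toy data are illustrative assumptions, not taken from the book.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """Score a document by log P(query | document): the document's MLE unigram
    model, linearly interpolated with a collection-wide MLE model so that
    query terms missing from the document do not zero out the score.
    Assumes every query term occurs somewhere in the collection."""
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)           # MLE on the document
        p_coll = coll_counts[term] / len(collection)  # MLE on the collection
        score += math.log(lam * p_doc + (1.0 - lam) * p_coll)
    return score

docs = [["the", "cat", "sat"], ["the", "dog", "barked"]]
collection = [t for d in docs for t in d]
ranked = sorted(docs, key=lambda d: query_likelihood(["cat"], d, collection), reverse=True)
print(ranked[0])  # ['the', 'cat', 'sat'] ranks first for the query "cat"
```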

1996

  • (Chen & Goodman, 1996) ⇒ Stanley F. Chen, and Joshua Goodman. (1996). “An Empirical Study of Smoothing Techniques for Language Modeling”. In: Proceedings of the 34th Annual Meeting of the ACL (ACL 1996).
    • QUOTE: We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
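As an illustrative sketch of the simplest family of techniques the paper compares, the code below implements a Jelinek-Mercer-style linear interpolation of MLE bigram and unigram estimates, together with the cross-entropy evaluation the abstract mentions. The interpolation weight and toy corpus are assumptions, and a closed test vocabulary is assumed to keep the example short.

```python
import math
from collections import Counter

def interpolated_prob(h, w, bigrams, unigrams, total, lam=0.7):
    """Jelinek-Mercer-style smoothing: interpolate the MLE bigram estimate
    with the MLE unigram estimate so unseen bigrams keep nonzero probability."""
    p_bigram = bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
    p_unigram = unigrams[w] / total
    return lam * p_bigram + (1.0 - lam) * p_unigram

def cross_entropy(test_tokens, bigrams, unigrams, total, lam=0.7):
    """Per-token cross-entropy (in bits) of test data under the smoothed model;
    assumes every test token appears in the training vocabulary."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_prob = sum(math.log2(interpolated_prob(h, w, bigrams, unigrams, total, lam))
                   for h, w in pairs)
    return -log_prob / len(pairs)

train = ["<s>", "the", "cat", "sat", "</s>", "<s>", "the", "dog", "sat", "</s>"]
unigrams, bigrams, total = Counter(train), Counter(zip(train, train[1:])), len(train)
test = ["<s>", "the", "dog", "sat", "</s>"]
print(cross_entropy(test, bigrams, unigrams, total))
```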

  1. For probabilistic models, normalizing means dividing by some total count so that the resulting probabilities fall legally between 0 and 1.