# Text-String Probability Function Training Task


A Text-String Probability Function Training Task is a probability function generation task that requires the creation of a text-string probability function structure.

**AKA:** Statistical Language Modeling, LM.

**Context:**
- Performance: a Perplexity Measure, ...
- It can range from (typically) being a Data-Driven Language Modeling Task to being a Heuristic Language Modeling Task.
- It can range from being a Character-level Language Modeling Task to being a Word-level Language Modeling Task.
- It can be solved by a Language Modeling System (that implements a language modeling algorithm).
- It can include a Language Model Evaluation Task.
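A common definition of the Perplexity Measure works as follows. Take a test set of sentences s_1, ..., s_m containing M tokens in total. Perplexity is 2^(-l), where l = (1/M) Σ_i log2 p(s_i). Lower values are better, and a model that spreads probability uniformly over K symbols scores exactly K. The minimal sketch below assumes the model has already produced the sentence probabilities p(s_i). The function name `perplexity` and the toy numbers are hypothetical, chosen for illustration only.

```python
import math

def perplexity(sentence_probs, num_tokens):
    """Perplexity 2 ** (-l), where l = (1/M) * sum_i log2 p(s_i).

    sentence_probs: model probabilities p(s_i) of the test sentences (hypothetical input).
    num_tokens: M, the total number of tokens in the test set.
    """
    if any(p == 0.0 for p in sentence_probs):
        return math.inf  # a zero-probability test sentence makes perplexity infinite
    log2_total = sum(math.log2(p) for p in sentence_probs)
    return 2 ** (-log2_total / num_tokens)

# Hypothetical sanity check: a model that gives every token probability 1/10
# (uniform over a 10-symbol vocabulary) should have perplexity 10.
lengths = [3, 5]                        # token counts of two toy test sentences
probs = [0.1 ** n for n in lengths]     # p(s_i) = product of per-token probabilities
print(perplexity(probs, sum(lengths)))  # close to 10, up to floating-point rounding
```

Normalizing by M rather than by the number of sentences makes scores comparable across test sets with different sentence lengths. The same formula applies whether the tokens are characters or words. However, the resulting numbers are not directly comparable between character-level and word-level models.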

**Counter-Example(s):**

**See:** n-Gram, Word Embedding Task.

## References

### 2013

- (Collins, 2013a) ⇒ Michael Collins. (2013). “Chapter 1 - Language Modeling." Course notes for NLP by Michael Collins, Columbia University.
- QUOTE: Definition 1 (Language Model) A language model consists of a finite set [math]\mathcal{V}[/math], and a function [math]p(x_1, x_2, \ldots, x_n)[/math] such that:
- For any [math]\langle x_1 \ldots x_n \rangle \in \mathcal{V}^{\dagger}, \; p(x_1, x_2, \ldots, x_n) \ge 0[/math]
- In addition, [math]\sum_{\langle x_1 \ldots x_n \rangle \in \mathcal{V}^{\dagger}} p(x_1, x_2, \ldots, x_n) = 1[/math]

- Hence [math]p(x_1, x_2, \ldots, x_n)[/math] is a probability distribution over the sentences in [math]\mathcal{V}^{\dagger}[/math].

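[math]\mathcal{V}^{\dagger}[/math] contains sentences of every length, so it is infinite. The second condition therefore holds only as a limit over ever-longer sentences. The minimal sketch below illustrates this. It assumes a hypothetical vocabulary {the, dog}, Collins's special STOP symbol ending every sentence, and a toy model that simply multiplies independent per-token probabilities. The numbers and helper names are illustrative assumptions, not from the course notes. For this toy model, the mass of all sentences up to length N is 1 - 0.8^N, which approaches 1.

```python
from itertools import product
import math

# Hypothetical toy model (assumption): words "the" and "dog" plus the STOP symbol
# that, as in Collins's notes, ends every sentence in V-dagger.
Q = {"the": 0.4, "dog": 0.4, "STOP": 0.2}
WORDS = [w for w in Q if w != "STOP"]

def unigram_stop_prob(sentence):
    """p(x_1 ... x_n) for the toy model: product of independent per-token probabilities."""
    assert sentence[-1] == "STOP" and "STOP" not in sentence[:-1]
    return math.prod(Q[x] for x in sentence)

def mass_up_to(max_len):
    """Sum p over every sentence in V-dagger with at most max_len tokens (STOP included)."""
    total = 0.0
    for n in range(1, max_len + 1):
        for prefix in product(WORDS, repeat=n - 1):  # all (n-1)-word prefixes
            p = unigram_stop_prob(list(prefix) + ["STOP"])
            assert p >= 0  # first condition: non-negativity
            total += p
    return total

for n in (1, 5, 12):
    print(n, round(mass_up_to(n), 6))  # partial sums match 1 - 0.8 ** n
```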

### 2003

- (Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. (2003). “A Neural Probabilistic Language Model.” In: The Journal of Machine Learning Research, 3.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language.

### 2001

- (Goodman, 2001) ⇒ Joshua T. Goodman. (2001). “A Bit of Progress in Language Modeling.” In: Computer Speech & Language, 15(4). doi:10.1006/csla.2001.0174
- QUOTE: The goal of a language model is to determine the probability of a word sequence [math]w_1 \ldots w_n[/math], [math]P(w_1 \ldots w_n)[/math]. This probability is typically broken down into its component probabilities:
  [math]P(w_1 \ldots w_i) = P(w_1) \times P(w_2 \mid w_1) \times \cdots \times P(w_i \mid w_1 \ldots w_{i-1})[/math]
  Since it may be difficult to compute a probability of the form [math]P(w_i \mid w_1 \ldots w_{i-1})[/math] for large [math]i[/math], we typically assume that the probability of a word depends on only the two previous words, the trigram assumption:
  [math]P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-2} w_{i-1})[/math]
  which has been shown to work well in practice.
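The chain-rule decomposition and the trigram assumption quoted above can be sketched with a count-based model. This is a minimal sketch under stated assumptions. It uses unsmoothed maximum-likelihood estimates, P(w_i | w_{i-2} w_{i-1}) = count(w_{i-2} w_{i-1} w_i) / count(w_{i-2} w_{i-1}). It also uses hypothetical boundary symbols `<s>` and `</s>` and a made-up three-sentence corpus. Goodman's paper is largely about improvements such as smoothing, which this sketch deliberately omits.

```python
from collections import Counter

# Assumed boundary symbols: two start tokens supply the history for the first
# two words, and an end token closes each sentence.
BOS, EOS = "<s>", "</s>"

def train_trigram_mle(corpus):
    """Count trigrams and their two-word histories in a tokenized corpus."""
    trigrams, histories = Counter(), Counter()
    for sentence in corpus:
        tokens = [BOS, BOS] + sentence + [EOS]
        for i in range(2, len(tokens)):
            history = (tokens[i - 2], tokens[i - 1])
            trigrams[history + (tokens[i],)] += 1
            histories[history] += 1
    return trigrams, histories

def trigram_sentence_prob(sentence, trigrams, histories):
    """Chain rule under the trigram assumption, with unsmoothed MLE estimates:
    P(w_1 ... w_n) ~= prod_i P(w_i | w_{i-2} w_{i-1})."""
    tokens = [BOS, BOS] + sentence + [EOS]
    prob = 1.0
    for i in range(2, len(tokens)):
        history = (tokens[i - 2], tokens[i - 1])
        if histories[history] == 0:
            return 0.0  # unseen history: the unsmoothed estimate is undefined, treat as 0
        prob *= trigrams[history + (tokens[i],)] / histories[history]
    return prob

corpus = [["the", "dog", "barks"], ["the", "cat", "barks"], ["the", "dog", "sleeps"]]
tri, hist = train_trigram_mle(corpus)
print(trigram_sentence_prob(["the", "dog", "barks"], tri, hist))   # 1 * 2/3 * 1/2 * 1
print(trigram_sentence_prob(["the", "cat", "sleeps"], tri, hist))  # 0.0: unseen trigram
```

On this toy corpus, each training sentence receives probability 1/3, so their probabilities sum to 1. The plausible but unseen sentence "the cat sleeps" gets probability 0. This kind of data sparsity is what smoothing methods are meant to address.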