Out-Of-Vocabulary (OOV) Word


An Out-Of-Vocabulary (OOV) Word is a Linguistic Unit (token) that does not appear in the training vocabulary or in the source document.
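For illustration, the following is a minimal sketch (assuming a toy whitespace tokenizer and a hypothetical fixed training vocabulary; neither is tied to any specific system) of how tokens can be flagged as OOV:

```python
# Hypothetical fixed training vocabulary (illustrative only).
training_vocab = {"the", "match", "ended", "in", "a", "draw"}

def find_oov_tokens(tokens, vocab):
    """Return the tokens that do not appear in the training vocabulary."""
    return [tok for tok in tokens if tok not in vocab]

sentence = "the match ended 2-0".split()  # toy whitespace tokenization
print(find_oov_tokens(sentence, training_vocab))  # ['2-0'] is out-of-vocabulary
```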



References

2017f. (See et al., 2017) ⇒ Abigail See, Peter J. Liu, and Christopher D. Manning. (2017). “Get To The Point: Summarization with Pointer-Generator Networks.” In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).

Note that if $w$ is an out-of-vocabulary (OOV) word, then $P_{\text{vocab}}(w)$ is zero; similarly if $w$ does not appear in the source document, then $\sum_{i:w_i=w} a^t_i$ is zero. The ability to produce OOV words is one of the primary advantages of pointer-generator models; by contrast models such as our baseline are restricted to their pre-set vocabulary.
The loss function is as described in equations (6) and (7), but with respect to our modified probability distribution $P(w)$ given in equation (9).

Figure 3: Pointer-generator model. For each decoder timestep a generation probability $p_{gen} \in [0,1]$ is calculated, which weights the probability of generating words from the vocabulary, versus copying words from the source text. The vocabulary distribution and the attention distribution are weighted and summed to obtain the final distribution, from which we make our prediction. Note that out-of-vocabulary article words such as 2-0 are included in the final distribution. Best viewed in color.
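To make the combined distribution in equation (9) concrete, here is a minimal Python sketch of the mixing step, assuming dictionary-valued distributions and a toy source document; the function name and all numbers are illustrative and are not taken from the cited paper:

```python
def final_distribution(p_gen, p_vocab, attention, source_tokens, vocab):
    """Mix the vocabulary distribution with the attention (copy) distribution:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum over {i: w_i = w} of a^t_i.
    """
    # Extended vocabulary: the fixed vocabulary plus all source words, so that
    # OOV source words such as "2-0" can still receive probability mass.
    extended = set(vocab) | set(source_tokens)
    p_final = {w: p_gen * p_vocab.get(w, 0.0) for w in extended}  # P_vocab(w) = 0 for OOV w
    for a_i, w_i in zip(attention, source_tokens):
        p_final[w_i] += (1.0 - p_gen) * a_i  # copy term: attention summed over repeated words
    return p_final

# Hypothetical toy example (all numbers invented for illustration).
vocab = ["the", "match", "ended"]
p_vocab = {"the": 0.5, "match": 0.3, "ended": 0.2}   # vocabulary distribution P_vocab
attention = [0.1, 0.1, 0.2, 0.6]                     # attention weights a^t_i over the source
source = ["the", "match", "ended", "2-0"]            # "2-0" is an OOV source word
dist = final_distribution(0.7, p_vocab, attention, source, vocab)
print(round(dist["2-0"], 3))  # 0.18 = (1 - 0.7) * 0.6, so the OOV word can be produced
```

If $p_{gen}$ were 1, the copy term would vanish and the model could only output in-vocabulary words, which is exactly the restriction of the baseline described above.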
