Distributional Word Model Training Algorithm
A Distributional Word Model Training Algorithm is a distributional text-item model training algorithm that can be implemented by a distributional word model training system (to solve a distributional word model training task).
- AKA: Word Embeddings Algorithm.
- See: Distributional Word Mapping.
References
2003
- (Bengio et al., 2003) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. (2003). "A Neural Probabilistic Language Model." In: The Journal of Machine Learning Research, 3.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
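The quoted passage describes learning dense, distributed word vectors jointly with a next-word prediction objective. The following is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch, a hypothetical TinyNPLM class, and a toy corpus, and uses a shared embedding table, a tanh hidden layer, and a softmax over the vocabulary in the spirit of the Bengio et al. architecture.

```python
# Minimal sketch (illustrative, not the original code) of a Bengio-style
# neural probabilistic language model, assuming PyTorch and toy data.
import torch
import torch.nn as nn

class TinyNPLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, context_size=2, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # learned word vectors
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)          # scores over the vocabulary

    def forward(self, context_ids):                           # context_ids: (batch, context_size)
        e = self.embed(context_ids).flatten(start_dim=1)      # concatenate context embeddings
        h = torch.tanh(self.hidden(e))
        return self.out(h)                                    # logits for the next word

# Hypothetical toy corpus and (context -> next word) training pairs.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
pairs = [([idx[corpus[i]], idx[corpus[i + 1]]], idx[corpus[i + 2]])
         for i in range(len(corpus) - 2)]

model = TinyNPLM(len(vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    ctx = torch.tensor([c for c, _ in pairs])
    tgt = torch.tensor([t for _, t in pairs])
    opt.zero_grad()
    loss = loss_fn(model(ctx), tgt)
    loss.backward()
    opt.step()

# After training, model.embed.weight holds one dense vector per vocabulary word.
print(model.embed.weight[idx["cat"]])
```

Because the embedding table is shared across all context positions, every training example updates the vectors of the words involved, which is what lets one observed sentence inform the model about many semantically neighboring sentences.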
1986
- (Hinton, 1986) ⇒ Geoffrey E. Hinton. (1986). "Learning Distributed Representations of Concepts." In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society.
- QUOTE: Concepts can be represented by distributed patterns of activity in networks of neuron-like units. One advantage of this kind of representation is that it leads to automatic generalization.