Distributional Word Model Training Algorithm

AKA: Distributional Word Representation Algorithm.
Context:
- It can range from being a Continuous Distributional Word Model Training Algorithm to being a Discrete Distributional Word Model Training Algorithm.
- It can range from being a Sparse Distributional Word Model Training Algorithm to being a Dense Distributional Word Model Training Algorithm.
- …
Example(s):
- (Levy & Goldberg, 2014).
- (Mikolov et al., 2013b).
- a GloVe Algorithm.
- …
Counter-Example(s):
- a Distributional Phrase Model Training Algorithm.
- a Next-Word Model Training Algorithm.
- a Language Model Training Algorithm.
- a Sentence Embedding Algorithm, such as (Le & Mikolov, 2014).
- a Document Embedding Algorithm, such as (Le & Mikolov, 2014).
See: Embedding Algorithm, SGNS Algorithm, Word Embeddings.
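
Illustration: The sparse-to-dense range noted under Context can be sketched with a count-based pipeline. The following is a minimal sketch, not the method of any work cited on this page. It assumes a toy tokenized corpus, a symmetric context window, positive PMI weighting for the sparse representation, and a truncated SVD for the dense one. The function names (cooccurrence_counts, ppmi, dense_vectors) and the corpus are hypothetical and chosen only for illustration.

```python
import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word-context co-occurrences within a fixed window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1.0
    return vocab, counts


def ppmi(counts):
    """Positive pointwise mutual information: a sparse, high-dimensional representation."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs get zero instead of -inf or nan
    return np.maximum(pmi, 0.0)


def dense_vectors(matrix, dim):
    """Truncated SVD: a dense, low-dimensional representation.

    Scaling by sqrt of the singular values is one common choice (assumption).
    """
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy corpus, already tokenized.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a dog".split(),
]
vocab, counts = cooccurrence_counts(corpus)
sparse = ppmi(counts)                 # one row per word, one column per context word
dense = dense_vectors(sparse, dim=3)  # one 3-dimensional vector per word
```

Each row of sparse or dense is the vector for the word at the same position in vocab. A prediction-based trainer such as an SGNS Algorithm would instead learn the dense vectors directly by stochastic optimization.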

References

(Levy & Goldberg, 2014) ⇒ Omer Levy, and Yoav Goldberg. (2014). “Neural Word Embedding As Implicit Matrix Factorization.” In: Advances in Neural Information Processing Systems.
(Goldberg & Levy, 2014) ⇒ Yoav Goldberg, and Omer Levy. (2014). “word2vec Explained: Deriving Mikolov Et Al.'s Negative-sampling Word-embedding Method.” In: arXiv preprint arXiv:1402.3722.
(Mikolov et al., 2013b) ⇒ Tomáš Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. (2013). “Distributed Representations of Words and Phrases and their Compositionality.” In: Advances in Neural Information Processing Systems, 26.

(Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. (2003). “A Neural Probabilistic Language Model.” In: The Journal of Machine Learning Research, 3.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

(Hinton, 1986) ⇒ Geoffrey E. Hinton. (1986). “Learning Distributed Representations of Concepts.” In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society.
- QUOTE: Concepts can be represented by distributed patterns of activity in networks of neuron-like units. One advantage of this kind of representation is that it leads to automatic generalization.