Dense Distributional Word Model Training Algorithm
A Dense Distributional Word Model Training Algorithm is a distributional word model training algorithm that can be implemented by a dense distributional word model training system (to solve a dense distributional word model training task).
- It can range from (typically) being a Continuous Dense Distributional Word Model Training Algorithm to being a Discrete Dense Distributional Word Model Training Algorithm.
- See: Embedding Algorithm, SGNS Algorithm, Word Embeddings.
- (Levy & Goldberg, 2014) ⇒ Omer Levy, and Yoav Goldberg. (2014). “Neural Word Embedding As Implicit Matrix Factorization.” In: Advances in Neural Information Processing Systems.
- (Goldberg & Levy, 2014) ⇒ Yoav Goldberg, and Omer Levy. (2014). “word2vec Explained: Deriving Mikolov Et Al.'s Negative-sampling Word-embedding Method.” In: arXiv preprint arXiv:1402.3722.
- (Mikolov et al., 2013c) ⇒ Tomáš Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. (2013). “Distributed Representations of Words and Phrases and their Compositionality.” In: Advances in Neural Information Processing Systems, 26.
- (Chelba et al., 2013) ⇒ Ciprian Chelba, Tomáš Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, and Tony Robinson. (2013). “One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling.” Technical Report, Google Research.
- (Mikolov et al., 2013a) ⇒ Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013). “Efficient Estimation of Word Representations in Vector Space.” CoRR, abs/1301.3781, 2013.
- (Mikolov et al., 2013b) ⇒ Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. (2013). “Linguistic Regularities in Continuous Space Word Representations.” In: HLT-NAACL.
- (Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. (2003). “A Neural Probabilistic Language Model.” In: The Journal of Machine Learning Research, 3.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
- (Hinton, 1986) ⇒ Geoffrey E. Hinton. (1986). “Learning Distributed Representations of Concepts.” In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society.
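As a rough illustration of the kind of algorithm this page defines, the following is a minimal sketch of a continuous dense word vector trainer in the style of the SGNS Algorithm listed above. It is not the reference word2vec implementation: the function name `train_sgns`, all hyperparameter values, the toy corpus, and the uniform sampling of negative contexts are assumptions made for brevity (word2vec draws negatives from a smoothed unigram distribution, and a sampled negative may occasionally coincide with the observed context here).

```python
import numpy as np

def train_sgns(corpus, dim=8, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; returns (vocab, word_vectors)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    W = rng.normal(scale=0.1, size=(n, dim))  # dense word vectors
    C = np.zeros((n, dim))                    # dense context vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for sent in corpus:
            ids = [index[w] for w in sent]
            for pos, w in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for cpos in range(lo, hi):
                    if cpos == pos:
                        continue
                    # One observed context (label 1) plus uniformly sampled
                    # negative contexts (label 0); uniform sampling is an assumption.
                    targets = [(ids[cpos], 1.0)]
                    targets += [(int(rng.integers(n)), 0.0) for _ in range(negatives)]
                    grad_w = np.zeros(dim)
                    for c, label in targets:
                        # Gradient-ascent step on the log-likelihood of the label.
                        g = lr * (label - sigmoid(W[w] @ C[c]))
                        grad_w += g * C[c]
                        C[c] += g * W[w]
                    W[w] += grad_w
    return vocab, W

# Hypothetical two-sentence corpus, used only to show the call shape.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab, W = train_sgns(corpus)
print(W.shape)  # (4, 8): one 8-dimensional dense vector per vocabulary word
```

Each word ends up as a short real-valued vector rather than a sparse count row, which is what makes the resulting model dense; a corpus this small is far too little data for the vectors to be meaningful.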