Negative-Sampling Algorithm
- (Baroni et al., 2014) ⇒ Marco Baroni, Georgiana Dinu, and Germán Kruszewski. (2014). "Don't Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors." In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014).
- QUOTE: Hierarchical softmax is a computationally efficient way to estimate the overall probability distribution using an output layer that is proportional to log(unigram.perplexity(W)) instead of W (for W the vocabulary size). As an alternative, negative sampling estimates the probability of an output word by learning to distinguish it from draws from a noise distribution. The number of these draws (number of negative samples) is given by a parameter k. We test both hierarchical softmax and negative sampling with k values of 5 and 10. Very frequent words such as "the" or "a" are not very informative as context features.
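The quoted procedure can be sketched as follows. This is a minimal, illustrative implementation of skip-gram training with negative sampling, not the word2vec reference code: for each (center, context) pair, the pair's score is pushed up and the scores of k draws from a noise distribution (unigram counts raised to the 0.75 power, as in word2vec) are pushed down via a logistic loss. The function name `train_sgns`, the hyperparameter defaults, and the choice to exclude the true context word from the negative draws are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sgns(pairs, vocab_size, counts, dim=16, k=5, lr=0.05,
               epochs=50, seed=0):
    """Skip-gram with negative sampling (illustrative sketch).

    pairs      : list of (center, context) word-index pairs
    counts     : unigram counts per word index, used for the noise distribution
    k          : number of negative samples per positive pair
    Returns (W_in, W_out, losses): input/output embeddings and per-epoch loss.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, 0.1, (vocab_size, dim))   # center-word vectors
    W_out = rng.normal(0.0, 0.1, (vocab_size, dim))  # context-word vectors
    # Noise distribution: unigram counts raised to 0.75, then normalized.
    noise = counts.astype(float) ** 0.75
    noise /= noise.sum()
    losses = []
    for _ in range(epochs):
        total = 0.0
        for c, o in pairs:
            # Draw k negatives; here the true context o is excluded for
            # stability (word2vec itself does not bother to exclude it).
            p = noise.copy()
            p[o] = 0.0
            p /= p.sum()
            negs = rng.choice(vocab_size, size=k, p=p)
            targets = np.concatenate(([o], negs))
            labels = np.zeros(k + 1)
            labels[0] = 1.0  # positive pair gets label 1, negatives label 0
            v_c = W_in[c]
            scores = sigmoid(W_out[targets] @ v_c)        # (k+1,)
            errs = scores - labels                        # gradient of loss
            grad_c = errs @ W_out[targets]                # (dim,)
            W_out[targets] -= lr * errs[:, None] * v_c
            W_in[c] -= lr * grad_c
            # Logistic loss: -log s for the positive, -log(1-s) for negatives.
            eps = 1e-10
            total += -np.log(scores[0] + eps)
            total += -np.sum(np.log(1.0 - scores[1:] + eps))
        losses.append(total)
    return W_in, W_out, losses
```

Note the contrast with hierarchical softmax: instead of normalizing over the full vocabulary (or a log-depth tree), each update touches only the positive word and k noise words, so the per-pair cost is O(k · dim).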