word2vec-like System
Revision as of 20:45, 23 December 2019
A word2vec-like System is a distributional word embedding training system that applies a word2vec algorithm (based on work by Tomáš Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, et al.[1]).
- Context:
- It can train a word2vec Model Instance (that defines a word2vec model space).
- It can require billions of words to train a good Word Embedding.
- It can have source code available at https://code.google.com/p/word2vec/source/checkout
- Example(s):
- the original release http://code.google.com/p/word2vec/
- Gensim's https://radimrehurek.com/gensim/models/word2vec.html
- Counter-Example(s):
- rnnlm System.
- SemanticVectors System.
- GloVe-based System (using the GloVe algorithm).
- word2phrase.
- See: Bag-of-Words Representation, Word Context Vectors.
References
2015
- (Rothe & Schütze, 2015) ⇒ Sascha Rothe, and Hinrich Schütze. (2015). “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes.” In: arXiv preprint arXiv:1507.01127.
- QUOTE: ... Unsupervised methods for word embeddings (also called “distributed word representations”) have become popular in natural language processing (NLP). These methods only need very large corpora as input to create sparse representations (e.g., based on local collocations) and project them into a lower dimensional dense vector space. Examples for word embeddings are SENNA (Collobert and Weston, 2008), the hierarchical log-bilinear model (Mnih and Hinton, 2009), word2vec (Mikolov et al., 2013c) and GloVe (Pennington et al., 2014).
2014
- Dec-23-2014 http://radimrehurek.com/2014/12/making-sense-of-word2vec/
- QUOTE: Tomáš Mikolov (together with his colleagues at Google) ... releasing word2vec, an unsupervised algorithm for learning the meaning behind words. ...
... Using large amounts of unannotated plain text, word2vec learns relationships between words automatically. The output are vectors, one vector per word, with remarkable linear relationships that allow us to do things like vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”), or vec(“Montreal Canadiens”) – vec(“Montreal”) + vec(“Toronto”) resembles the vector for “Toronto Maple Leafs”. ...
... Basically, where GloVe precomputes the large word x word co-occurrence matrix in memory and then quickly factorizes it, word2vec sweeps through the sentences in an online fashion, handling each co-occurrence separately. So, there is a tradeoff between taking more memory (GloVe) vs. taking longer to train (word2vec). Also, once computed, GloVe can re-use the co-occurrence matrix to quickly factorize with any dimensionality, whereas word2vec has to be trained from scratch after changing its embedding dimensionality.
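The linear analogy arithmetic described in the quote above (vec("king") – vec("man") + vec("woman") ≈ vec("queen")) can be illustrated with plain NumPy. The four-word vocabulary and its 3-dimensional vectors below are fabricated for the illustration and carry no real semantics:

```python
import numpy as np

# Fabricated "embeddings", chosen so the king->queen offset mirrors
# the man->woman offset, as the quoted blog post describes.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.8, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# vec("king") - vec("man") + vec("woman") should land near vec("queen").
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```

Real systems do the same search over a vocabulary of hundreds of thousands of trained vectors rather than a hand-made dictionary.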
- (Rei & Briscoe, 2014) ⇒ Marek Rei, and Ted Briscoe. (2014). “Looking for Hyponyms in Vector Space.” In: Proceedings of CoNLL-2014.
- QUOTE: Word2vec: We created word representations using the word2vec toolkit[1]. The tool is based on a feedforward neural network language model, with modifications to make representation learning more efficient (Mikolov et al., 2013a). We make use of the skip-gram model, which takes each word in a sequence as an input to a log-linear classifier with a continuous projection layer, and predicts words within a certain range before and after the input word. The window size was set to 5 and vectors were trained with both 100 and 500 dimensions.
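The skip-gram input/output pairing described in the quote (each word predicts its neighbours within a certain range before and after it) can be sketched as a training-pair generator; the function name and token sequence below are invented for illustration:

```python
def skipgram_pairs(tokens, window=5):
    """Yield (input_word, context_word) training pairs, where the
    context word lies within `window` positions of the input word."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

tokens = ["we", "created", "word", "representations"]
pairs = list(skipgram_pairs(tokens, window=1))
print(pairs)
# [('we', 'created'), ('created', 'we'), ('created', 'word'),
#  ('word', 'created'), ('word', 'representations'), ('representations', 'word')]
```

In the actual model, each such pair becomes one prediction step of the log-linear classifier; the quote's setting corresponds to `window=5`.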
2013
- https://code.google.com/p/word2vec/
- This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
...
The word2vec tool takes a text corpus as input and produces the word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representation of words. The resulting word vector file can be used as features in many natural language processing and machine learning applications.
A simple way to investigate the learned representations is to find the closest words for a user-specified word. The distance tool serves that purpose. For example, if you enter 'france', distance will display the most similar words and their distances to 'france', which should look like ...
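The `distance` tool's behaviour (ranking the vocabulary by cosine similarity to a query word) can be mimicked in a few lines of NumPy. The vocabulary and vectors below are fabricated stand-ins for vectors loaded from a trained model file:

```python
import numpy as np

# Fabricated stand-ins for vectors loaded from a trained model.
vocab = ["france", "spain", "belgium", "banana"]
vectors = np.array([
    [0.9, 0.8, 0.1],   # france
    [0.8, 0.9, 0.1],   # spain
    [0.9, 0.7, 0.2],   # belgium
    [0.1, 0.1, 0.9],   # banana
])

def closest(query, k=2):
    """Return the k words most similar to `query` by cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = unit[vocab.index(query)]
    scores = unit @ q
    order = np.argsort(-scores)
    return [(vocab[i], float(scores[i])) for i in order if vocab[i] != query][:k]

print(closest("france"))
```

With real trained vectors, querying 'france' would surface other country names near the top of the ranking, as the tool's documentation describes.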
2013a
- (Mikolov et al., 2013a) ⇒ Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013). “Efficient Estimation of Word Representations in Vector Space.” In: Proceedings of International Conference of Learning Representations Workshop.
2013b
- (Mikolov et al., 2013b) ⇒ Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. (2013). “Linguistic Regularities in Continuous Space Word Representations.” In: HLT-NAACL.