word2vec-like System
Revision as of 20:45, 23 December 2019
A word2vec-like System is a distributional word embedding training system that applies a word2vec algorithm (based on the work of Tomáš Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, et al.[1]).
- Context:
- It can train a word2vec Model Instance (that defines a word2vec model space).
- It can require billions of words to train a good Word Embedding.
- It can have source code available at https://code.google.com/p/word2vec/source/checkout
- Example(s):
- the original release http://code.google.com/p/word2vec/
- Gensim's https://radimrehurek.com/gensim/models/word2vec.html
- Counter-Example(s):
- rnnlm System.
- SemanticVectors System.
- GloVe-based System (using the GloVe algorithm).
- word2phrase.
- See: Bag-of-Words Representation, Word Context Vectors.
References
2015
- (Rothe & Schütze, 2015) ⇒ Sascha Rothe, and Hinrich Schütze. (2015). “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes.” In: arXiv preprint arXiv:1507.01127.
- QUOTE: ... Unsupervised methods for word embeddings (also called “distributed word representations”) have become popular in natural language processing (NLP). These methods only need very large corpora as input to create sparse representations (e.g., based on local collocations) and project them into a lower dimensional dense vector space. Examples for word embeddings are SENNA (Collobert and Weston, 2008), the hierarchical log-bilinear model (Mnih and Hinton, 2009), word2vec (Mikolov et al., 2013c) and GloVe (Pennington et al., 2014).
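The "sparse representations (e.g., based on local collocations)" mentioned in the quote can be illustrated by counting which words co-occur within a short window. The sketch below (the helper name `cooccurrence_counts` is invented for illustration, not part of any cited system) only shows the counting step; the cited methods then project such counts into a lower-dimensional dense vector space.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count local collocations: for each word, how often each other
    word appears within `window` positions of it."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

toks = "the quick brown fox jumps over the lazy dog".split()
c = cooccurrence_counts(toks)
print(c["the"]["quick"], c["fox"]["brown"])  # → 1 1
```

Each row of such a count table is a sparse, high-dimensional representation of a word; the embedding methods above compress these rows into short dense vectors.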
2014
- Dec-23-2014 http://radimrehurek.com/2014/12/making-sense-of-word2vec/
- QUOTE: Tomáš Mikolov (together with his colleagues at Google) ... releasing word2vec, an unsupervised algorithm for learning the meaning behind words. ...
... Using large amounts of unannotated plain text, word2vec learns relationships between words automatically. The output are vectors, one vector per word, with remarkable linear relationships that allow us to do things like vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”), or vec(“Montreal Canadiens”) – vec(“Montreal”) + vec(“Toronto”) resembles the vector for “Toronto Maple Leafs”. ...
... Basically, where GloVe precomputes the large word x word co-occurrence matrix in memory and then quickly factorizes it, word2vec sweeps through the sentences in an online fashion, handling each co-occurrence separately. So, there is a tradeoff between taking more memory (GloVe) vs. taking longer to train (word2vec). Also, once computed, GloVe can re-use the co-occurrence matrix to quickly factorize with any dimensionality, whereas word2vec has to be trained from scratch after changing its embedding dimensionality.
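The vector arithmetic in the quote can be sketched with hand-picked toy vectors. The 3-dimensional values below are illustrative assumptions, not real word2vec output (real vectors are learned from a corpus and have hundreds of dimensions), but they show the mechanics: compute vec("king") – vec("man") + vec("woman") and find the nearest remaining word by cosine similarity.

```python
import math

# Toy vectors chosen by hand so the analogy works; purely illustrative.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.5, 0.8, 0.1],
    "woman": [0.5, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.2],  # distractor
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# vec("king") - vec("man") + vec("woman")
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]

# Nearest word, excluding the three inputs, should be "queen".
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, vecs[w]))
print(best)  # → queen
```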
- (Rei & Briscoe, 2014) ⇒ Marek Rei, and Ted Briscoe. (2014). “Looking for Hyponyms in Vector Space.” In: Proceedings of CoNLL-2014.
- QUOTE: Word2vec: We created word representations using the word2vec toolkit[1]. The tool is based on a feedforward neural network language model, with modifications to make representation learning more efficient (Mikolov et al., 2013a). We make use of the skip-gram model, which takes each word in a sequence as an input to a log-linear classifier with a continuous projection layer, and predicts words within a certain range before and after the input word. The window size was set to 5 and vectors were trained with both 100 and 500 dimensions.
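The skip-gram setup described in the quote — each input word predicts the words within a certain range before and after it — amounts to enumerating (input, context) training pairs. A minimal sketch follows (the function name `skipgram_pairs` is hypothetical; the real toolkit additionally subsamples frequent words and samples the effective window size per position):

```python
def skipgram_pairs(tokens, window=5):
    """Enumerate (input_word, context_word) pairs as in the skip-gram
    model: each word is paired with every word up to `window` positions
    before and after it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sent = "the cat sat on the mat".split()
print(skipgram_pairs(sent, window=2)[:4])
# → [('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat')]
```

Each pair is then fed to the log-linear classifier described in the quote: the input word's projection is trained to predict the context word.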
2013
- https://code.google.com/p/word2vec/
- This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
...
The word2vec tool takes a text corpus as input and produces the word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representation of words. The resulting word vector file can be used as features in many natural language processing and machine learning applications.
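The first stage described above — constructing a vocabulary from the training text — can be sketched as a frequency count with a minimum-count cutoff (the C tool exposes a similar cutoff as its -min-count option; the helper below is a simplified, hypothetical illustration):

```python
from collections import Counter

def build_vocab(tokens, min_count=2):
    """Build a vocabulary from training text, discarding words that
    occur fewer than min_count times (rare words give poor vectors)."""
    freq = Counter(tokens)
    return {w: c for w, c in freq.items() if c >= min_count}

corpus = "to be or not to be that is the question".split()
print(build_vocab(corpus))  # → {'to': 2, 'be': 2}
```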
A simple way to investigate the learned representations is to find the closest words for a user-specified word. The distance tool serves that purpose. For example, if you enter 'france', distance will display the most similar words and their distances to 'france', which should look like ...
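The behaviour of the distance tool can be mimicked with a cosine-similarity ranking over a small embedding table. The vectors and vocabulary below are invented for illustration; a real run would load the word vector file produced by training:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy embedding table standing in for a trained word-vector file.
emb = {
    "france": [0.90, 0.10, 0.30],
    "spain":  [0.80, 0.20, 0.30],
    "italy":  [0.85, 0.15, 0.25],
    "banana": [0.10, 0.90, 0.60],
}

def distance(query, topn=3):
    """Rank all other words by cosine similarity to the query word,
    mimicking the word2vec distance tool."""
    q = emb[query]
    ranked = sorted(((w, cosine(q, emb[w])) for w in emb if w != query),
                    key=lambda t: -t[1])
    return ranked[:topn]

for word, sim in distance("france"):
    print(f"{word}\t{sim:.3f}")
```

With real vectors, entering 'france' would surface countries and related terms at the top of the ranking, as the quote describes.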
2013a
- (Mikolov et al., 2013a) ⇒ Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013). “Efficient Estimation of Word Representations in Vector Space.” In: Proceedings of the International Conference on Learning Representations (ICLR) Workshop.
2013b
- (Mikolov et al., 2013b) ⇒ Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. (2013). “Linguistic Regularities in Continuous Space Word Representations.” In: HLT-NAACL.