word2vec-like System

A word2vec-like System is a distributional word embedding training system that applies a word2vec algorithm (based on work by Tomáš Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, et al.[1]).

== References ==
=== 2015 ===
* ([[2015_AutoExtendExtendingWordEmbeddin|Rothe & Schütze, 2015]]) ⇒ [[Sascha Rothe]], and [[Hinrich Schütze]]. ([[2015]]). “[http://arxiv.org/pdf/1507.01127.pdf AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes].” In: arXiv preprint arXiv:1507.01127.
** QUOTE: ... [[Unsupervised methods for word embeddings]] (also called “[[distributed word representation]]s”) have become popular in [[natural language processing (NLP)]]. </s> [[These method]]s only need [[very large corpora]] as input to create [[sparse representation]]s (e.g., based on [[local collocation]]s) and project them into a [[lower dimensional dense vector space]]. </s> Examples for [[word embedding]]s are [[SENNA]] ([[Collobert and Weston, 2008]]), the [[hierarchical log-bilinear model]] ([[Mnih and Hinton, 2009]]), [[word2vec-like System|word2vec]] ([[Mikolov et al., 2013c]]) and [[GloVe]] ([[Pennington et al., 2014]]). </s>
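** NOTE: a minimal, illustrative sketch (not from the quoted paper) of the pipeline the quote describes: counting sparse local co-occurrences from a corpus and projecting them into a lower-dimensional dense vector space, here via a truncated SVD; the toy corpus, window size, and dimensionality are assumptions for illustration only.
<pre>
# Sketch: sparse co-occurrence counts -> dense low-dimensional word vectors via truncated SVD.
# The toy corpus, window size, and embedding dimensionality are illustrative assumptions.
import numpy as np

corpus = [["the", "king", "rules", "the", "land"],
          ["the", "queen", "rules", "the", "land"]]
window = 2
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Sparse representation: word-by-word co-occurrence counts within a local window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

# Projection: a truncated SVD yields one dense, lower-dimensional vector per word.
dim = 2
U, S, _ = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :dim] * S[:dim]
print({w: embeddings[idx[w]].round(2) for w in vocab})
</pre>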


=== 2014 ===
* Dec-23-2014 http://radimrehurek.com/2014/12/making-sense-of-word2vec/
** QUOTE: [[Tomáš Mikolov]] (together with his colleagues at Google) ... releasing [[word2vec-like System|word2vec]], an [[unsupervised algorithm]] for [[learning the meaning behind words]]. ...        <P>        ... Using [[large amounts of unannotated plain text]], [[word2vec-like System|word2vec]] [[learns relationships between words automatically]]. The output are [[vector]]s, one [[Word Vector|vector per word]], with remarkable [[linear relationship]]s that allow us to do things like vec(“king”) – vec(“man”) + vec(“woman”) =~ vec(“queen”), or vec(“Montreal Canadiens”) – vec(“Montreal”) + vec(“Toronto”) [[resemble]]s the [[word vector|vector]] for “Toronto Maple Leafs”. ...        <P>        ... Basically, where [[GloVe]] precomputes the large [[word x word co-occurrence matrix]] in memory and then quickly factorizes it, [[word2vec-like System|word2vec]] sweeps through the sentences in an online fashion, handling each co-occurrence separately. So, there is a tradeoff between taking more memory ([[GloVe]]) vs. taking longer to train ([[word2vec-like System|word2vec]]). Also, once computed, [[GloVe]] can re-use the [[word-word co-occurrence matrix|co-occurrence matrix]] to quickly factorize with any dimensionality, whereas [[word2vec-like System|word2vec]] has to be trained from scratch after changing its [[embedding dimensionality]].
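** NOTE: a minimal, illustrative sketch (not from the quoted post) of the vec(“king”) – vec(“man”) + vec(“woman”) ≈ vec(“queen”) arithmetic, using gensim's Word2Vec implementation (gensim ≥ 4.0 API assumed); the toy corpus is a placeholder, and the linear relationships the post describes only emerge on large corpora.
<pre>
# Sketch: word-vector analogy arithmetic with gensim's Word2Vec (gensim >= 4.0 API assumed).
# The tiny corpus below is only a placeholder; meaningful analogies need large corpora.
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"],
             ["a", "man", "and", "a", "woman"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# vec("king") - vec("man") + vec("woman") ~= vec("queen") on a sufficiently large corpus.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
</pre>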


=== 2014 ===
* ([[Rei & Briscoe, 2014]]) ⇒ [[Marek Rei]], and [[Ted Briscoe]]. ([[2014]]). “[http://www.aclweb.org/anthology/W14-1608 Looking for Hyponyms in Vector Space].” In: Proceedings of CoNLL-2014.
** QUOTE: [[Word2vec]]: [[We]] created word representations using the [[word2vec-like System|word2vec toolkit]]<ref>https://code.google.com/p/word2vec/</ref>. </s> The tool is based on a [[feedforward neural network language model]], with modifications to make [[representation learning]] more efficient ([[Mikolov et al., 2013a]]).  </s> [[We]] make use of the [[skip-gram model]], which takes each [[word in a sequence]] as an input to a [[log-linear classifier]] with a [[continuous projection layer]], and [[predicts word]]s within a [[text window|certain range before and after the input word]]. </s> The [[text window size|window size]] was set to 5 and [[vector]]s were trained with both [[100]] and 500 dimensions. </s>
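** NOTE: a minimal, illustrative sketch of training skip-gram vectors with the settings the quote describes (window size 5; 100- and 500-dimensional vectors); gensim's Word2Vec (sg=1 selects skip-gram, gensim ≥ 4.0 API assumed) stands in here for the original word2vec toolkit, and the toy corpus is a placeholder.
<pre>
# Sketch: skip-gram training with window=5 and 100/500-dimensional vectors, as in the quote.
# gensim's Word2Vec (sg=1 selects skip-gram) stands in for the original word2vec C toolkit;
# the toy corpus is a placeholder assumption.
from gensim.models import Word2Vec

corpus = [["looking", "for", "hyponyms", "in", "vector", "space"],
          ["word", "representations", "learned", "from", "plain", "text"]]

for dim in (100, 500):
    model = Word2Vec(corpus, sg=1, window=5, vector_size=dim, min_count=1)
    print(dim, model.wv["vector"].shape)  # one dense vector per word, with `dim` components
</pre>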


=== 2013 ===
