Doc2Vec Algorithm

From GM-RKB

Revision as of 19:26, 4 December 2019

A Doc2Vec Algorithm is a document embedding algorithm that ...
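As a rough illustration of what learning a document embedding can look like, the sketch below trains one vector per document so that it predicts the words the document contains, in the spirit of the distributed bag-of-words (PV-DBOW) variant with negative sampling. It is a toy under stated assumptions, not the reference implementation and not the code released with any cited paper: the function names `train_pv_dbow` and `cosine`, the hyper-parameter values, and the four example documents are all hypothetical.

```python
import math
import random


def train_pv_dbow(docs, dim=16, epochs=200, lr=0.1, negatives=3, seed=0):
    """Toy PV-DBOW-style trainer (hypothetical helper): one vector per tokenized document."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    # Output-side word vectors start at zero; document vectors start small and random.
    word_out = {word: [0.0] * dim for word in vocab}
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for doc_vec, doc in zip(doc_vecs, docs):
            for word in doc:
                # One positive target (the observed word) plus random negative words.
                targets = [(word, 1.0)]
                targets += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                doc_grad = [0.0] * dim
                for target, label in targets:
                    if label == 0.0 and target == word:
                        continue  # skip a negative that collides with the positive
                    out = word_out[target]
                    pred = sigmoid(sum(a * b for a, b in zip(doc_vec, out)))
                    step = lr * (label - pred)
                    for i in range(dim):
                        doc_grad[i] += step * out[i]  # accumulate update for the document
                        out[i] += step * doc_vec[i]   # update the word vector in place
                for i in range(dim):
                    doc_vec[i] += doc_grad[i]
    return doc_vecs


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical toy corpus: two "pet" documents and two "finance" documents.
docs = [
    "cat dog pet fur paw".split(),
    "dog cat paw pet fur".split(),
    "stock bond market trade price".split(),
    "market price stock trade bond".split(),
]
doc_vecs = train_pv_dbow(docs)
```

Comparing `cosine(doc_vecs[0], doc_vecs[1])` with `cosine(doc_vecs[0], doc_vecs[2])` gives a quick sanity check that documents sharing vocabulary end up closer together. The sketch omits the distributed-memory (PV-DM) variant, subsampling, and inference of vectors for unseen documents.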



References

2016

  • (Lau & Baldwin, 2016) ⇒ Jey Han Lau, and Timothy Baldwin. (2016). “An Empirical Evaluation of Doc2vec with Practical Insights Into Document Embedding Generation.” In: Proceedings of the 1st Workshop on Representation Learning for NLP, pp. 78-86.
    • QUOTE: Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general purpose applications, and release source code to induce document embeddings using our trained doc2vec models.