Document Word Vector: Difference between revisions

From GM-RKB
A [[Document Word Vector]] is a [[Word Vector]] that is a [[Document Vector]] (representation of a [[Document]]).
* <U>AKA</U>: [[Text Vector]], [[Vectorized Document]], [[Document Term Vector]], [[Document Indexing Vector]].
* <B><U>Context</U>:</B>
** It can be used by:

Revision as of 21:34, 17 August 2014

A Document Word Vector is a Word Vector that is a Document Vector (representation of a Document).



References

2007

  • (Recupero, 2007) ⇒ Diego Reforgiato Recupero. (2007). "A New Unsupervised Method for Document Clustering by using WordNet Lexical and Conceptual Relations." In: Information Retrieval (2007) 10:563–579.
    • Many well-known methods of text clustering make use of a long list of words as vector space which is often unsatisfactory for a couple of reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms like synonyms or antonyms. Our unsupervised method solves both problems by using ANNIE and WordNet lexical categories and WordNet ontology in order to create a well structured document vector space whose low dimensionality allows common clustering algorithms to perform well.
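The dimensionality reduction described in the (Recupero, 2007) excerpt can be illustrated with a small sketch. The word-to-category table below is a hand-written, hypothetical stand-in for WordNet lexical categories (it does not call WordNet or ANNIE), and the whitespace tokenizer is likewise an assumption. The point is only that related words collapse onto one shared dimension, so the vector has one entry per category rather than one per word.

```python
from collections import Counter

# Hypothetical word -> lexical-category table (illustrative assumption only;
# a real system would look these up in WordNet).
WORD_TO_CATEGORY = {
    "car": "noun.artifact",
    "automobile": "noun.artifact",
    "truck": "noun.artifact",
    "dog": "noun.animal",
    "puppy": "noun.animal",
    "run": "verb.motion",
    "walk": "verb.motion",
}

# One vector dimension per category, in a fixed (sorted) order.
CATEGORIES = sorted(set(WORD_TO_CATEGORY.values()))


def category_vector(text):
    """Count how many tokens of `text` fall into each lexical category."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(
        WORD_TO_CATEGORY[tok] for tok in tokens if tok in WORD_TO_CATEGORY
    )
    return [counts[cat] for cat in CATEGORIES]


doc_a = category_vector("the car and the truck")
doc_b = category_vector("an automobile")
```

With seven known words the table yields only three dimensions, and `doc_a` and `doc_b` are non-zero in the same dimension even though they share no word.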

2001

  • (StephensPMR, 2001) ⇒ M. Stephens, M. Palakal, S. Mukhopadhyay, and R. Raje. (2001). "Detecting gene relations from MEDLINE abstracts." In: Proc. Sixth Annual Pacific Symposium on Biocomputing, pages 483–496.
    • The document representation step converts text documents into structures that can be efficiently processed without the loss of vital content. At the core of this process is a thesaurus, an array T of atomic tokens (e.g., a single term) each identified by a unique numeric identifier culled from authoritative sources or automatically. ... The purpose of the document representation step is to convert each document to a weight vector whose dimension is the same as the number of terms in the thesaurus.
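The conversion described in the (StephensPMR, 2001) excerpt, from a document to a weight vector with one dimension per thesaurus term, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the thesaurus contents, the whitespace tokenizer, and the use of raw term counts as weights are all assumptions, since the excerpt does not state the weighting scheme.

```python
from collections import Counter


def build_thesaurus(terms):
    """Map each atomic token to a unique numeric identifier (its index)."""
    return {term: idx for idx, term in enumerate(terms)}


def document_to_weight_vector(text, thesaurus):
    """Return a vector with one weight per thesaurus term.

    Weights are raw term counts (an assumption for this sketch).
    Tokens that are not in the thesaurus are ignored.
    """
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(tok for tok in tokens if tok in thesaurus)
    vector = [0.0] * len(thesaurus)
    for term, count in counts.items():
        vector[thesaurus[term]] = float(count)
    return vector


# Hypothetical four-term thesaurus, chosen only for illustration.
thesaurus = build_thesaurus(["gene", "protein", "binds", "inhibits"])
vector = document_to_weight_vector(
    "Gene A binds protein B and protein C", thesaurus
)
```

The vector length always equals the thesaurus size, regardless of document length, which is what lets documents of different sizes be compared in the same space.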

1975