Document Word Vector: Difference between revisions

Revision as of 21:34, 17 August 2014

(Recupero, 2007) ⇒ Diego Reforgiato Recupero. (2007). "A New Unsupervised Method for Document Clustering by using WordNet Lexical and Conceptual Relations." In: Information Retrieval, 10:563–579.
- Many well-known methods of text clustering make use of a long list of words as vector space, which is often unsatisfactory for a couple of reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms like synonyms or antonyms. Our unsupervised method solves both problems by using ANNIE and WordNet lexical categories and WordNet ontology in order to create a well-structured document vector space whose low dimensionality allows common clustering algorithms to perform well.
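
The dimensionality argument in this excerpt can be illustrated with a minimal sketch. It is not the method from the paper: the `CATEGORY_OF` table is a hypothetical, hand-written stand-in for a WordNet lexical-category lookup, and `category_vector` is an illustrative name. It only shows how mapping synonyms to a shared category yields one dimension per category rather than one per word.

```python
from collections import Counter

# Hypothetical hand-written map from words to lexical categories.
# A real system would obtain these from WordNet; this table is an assumption.
CATEGORY_OF = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "person", "physician": "person",
}

def category_vector(text, categories):
    """Count how often each category occurs in the text (unknown words are skipped)."""
    words = text.lower().split()
    counts = Counter(CATEGORY_OF[w] for w in words if w in CATEGORY_OF)
    return [counts[c] for c in categories]

# Two dimensions (one per category) instead of five (one per word).
categories = sorted(set(CATEGORY_OF.values()))
vec = category_vector("The physician parked her automobile near the truck", categories)
```

Here "automobile" and "truck" contribute to the same dimension, so synonyms and near-synonyms no longer occupy separate axes.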

(StephensPMR, 2001) ⇒ M. Stephens, M. Palakal, S. Mukhopadhyay, and R. Raje. (2001). "Detecting gene relations from MEDLINE abstracts." In: Proc. Sixth Annual Pacific Symposium on Biocomputing, pages 483–496.
- The document representation step converts text documents into structures that can be efficiently processed without the loss of vital content. At the core of this process is a thesaurus, an array T of atomic tokens (e.g., a single term) each identified by a unique numeric identifier culled from authoritative sources or automatically. ... The purpose of the document representation step is to convert each document to a weight vector whose dimension is the same as the number of terms in the thesaurus.
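
The representation step described in this excerpt can be sketched roughly as follows. The function names (`build_thesaurus`, `document_vector`) are hypothetical, and raw term frequency is used as the weight only as an assumption, since the excerpt does not state the weighting scheme.

```python
from collections import Counter

def build_thesaurus(terms):
    """Assign each distinct term a unique numeric identifier (its position in T)."""
    thesaurus = {}
    for term in terms:
        if term not in thesaurus:
            thesaurus[term] = len(thesaurus)
    return thesaurus

def document_vector(text, thesaurus):
    """Return a weight vector with one entry per thesaurus term.

    Weights are raw term counts here (an assumption, not the paper's scheme).
    """
    counts = Counter(text.lower().split())
    vector = [0] * len(thesaurus)
    for term, idx in thesaurus.items():
        vector[idx] = counts[term]
    return vector

thesaurus = build_thesaurus(["gene", "protein", "binds", "inhibits"])
vec = document_vector("Gene A binds protein B and protein C", thesaurus)
```

The resulting vector always has the same length as the thesaurus, so words outside the thesaurus (such as "A" or "and" above) are simply ignored.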

@@ Line 1: / Line 1: @@
 A [[Document Word Vector]] is a [[Word Vector]] that is a [[Document Vector]] (representation of a [[Document]]).
-* <B><U>AKA</U>:</B> [[Text Vector]], [[Vectorized Document]], [[Document Term Vector]], [[Document Indexing Vector]].
+* <U>AKA</U>: [[Text Vector]], [[Vectorized Document]], [[Document Term Vector]], [[Document Indexing Vector]].
 * <B><U>Context</U>:</B>
 ** It can be used by: