Text Clustering Algorithm: Difference between revisions

Revision as of 07:43, 13 January 2015

AKA: Document Clustering Algorithm, Text Document Clustering Algorithm.
Context:
- It can involve the mapping of a Document into a Document Vector.
- It can make use of an Ontology (e.g. an Informal Ontology such as WordNet or Wikipedia).
Counter-Example(s):
See: Information Retrieval Algorithm, Text Classification Task.
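
The document-to-vector mapping mentioned under Context can be sketched as follows. This is a minimal illustration, not a method taken from any of the references below: the whitespace tokenizer, the TF-IDF weighting, the helper names (`tokenize`, `tfidf_vectors`, `cosine`) and the four toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Map each document to a sparse TF-IDF vector (dict: term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: two biomedical and two financial snippets.
docs = [
    "gene protein expression cell",
    "protein cell gene pathway",
    "stock market price trading",
    "market trading price index",
]
vecs = tfidf_vectors(docs)
same_topic = cosine(vecs[0], vecs[1])
cross_topic = cosine(vecs[0], vecs[2])
```

A clustering algorithm would then group documents whose vectors are close under a measure such as this cosine similarity; here the two biomedical snippets are more similar to each other than to either financial snippet.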

(Hu & al, 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). "Exploiting Wikipedia as External Knowledge for Document Clustering." In: Proceedings of ACM SIGKDD Conference (KDD 2009). doi:10.1145/1557019.1557066

(Yoo & al, 2006) ⇒ Illhoi Yoo, Xiaohua Hu, and Il-Yeol Song. (2006). "Integration of Semantic-based Bipartite Graph Representation and Mutual Refinement Strategy for Biomedical Literature Clustering." In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006).

(Ferragina & Gulli, 2005) ⇒ Paolo Ferragina, and Antonio Gulli. (2005). "A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering." In: Proceedings of International World Wide Web Conference (WWW 2005).
(Surdeanu & al, 2005) ⇒ Mihai Surdeanu, Jordi Turmo, and Alicia Ageno. (2005). "A Hybrid Unsupervised Approach for Document Clustering." In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD 2005).
(Zhong & Ghosh, 2005) ⇒ S. Zhong, and Joydeep Ghosh. (2005). "Generative Model-based Document Clustering: A comparative study." In: Journal of Knowledge and Information Systems, 8(3).

(Sedding & Kazakov, 2004) ⇒ Julian Sedding, and Dimitar Kazakov. (2004). "Wordnet-based Text Document Clustering." In: COLING-2004 Workshop on Robust Methods in Analysis of Natural Language Data (ROMAND).

(Hotho & al, 2001) ⇒ Andreas Hotho, Alexander Maedche, and Steffen Staab. (2001). "Ontology-based Text Clustering." In: Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision.
(Zhao & Karypis, 2001) ⇒ Ying Zhao, and George Karypis. (2001). "Criterion Functions for Document Clustering: Experiments and analysis." Technical Report TR #01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN.

(Steinbach & al, 2000) ⇒ Michael Steinbach, George Karypis, and Vipin Kumar. (2000). "A Comparison of Document Clustering Techniques." In: Proceedings of Workshop at KDD 2000 on Text Mining.
- We use two metrics for evaluating cluster quality: entropy, which provides a measure of “goodness” for un-nested clusters or for the clusters at one level of a hierarchical clustering, and the F-measure, which measures the effectiveness of a hierarchical clustering. (The F measure was recently extended to document hierarchies in [5].)
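
The two metrics named in the quote above can be sketched for a flat (un-nested) clustering as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: entropy is taken as the size-weighted average of per-cluster class entropies, and the overall F-measure as the class-size-weighted maximum F value over clusters. For a hierarchical clustering the maximum would range over the clusters at every level of the tree; this sketch only considers one level. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels, clusters):
    """Size-weighted average entropy of the class distribution in each cluster."""
    n = len(labels)
    members = {}
    for label, cluster in zip(labels, clusters):
        members.setdefault(cluster, []).append(label)
    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels).values()
        entropy = -sum((k / size) * math.log(k / size) for k in counts)
        total += (size / n) * entropy
    return total


def overall_f_measure(labels, clusters):
    """For each class take its best-matching cluster's F value, weighted by class size."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            recall = n_ij / n_i
            precision = n_ij / n_j
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical gold classes and two candidate cluster assignments.
gold = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]
mixed = [0, 1, 0, 1]
```

Lower entropy and higher F-measure indicate better agreement with the gold classes: the `perfect` assignment yields entropy 0 and F-measure 1, while `mixed` splits every class evenly across both clusters and scores worse on both.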

(Larsen & Aone, 1999) ⇒ Bjornar Larsen, and Chinatsu Aone. (1999). "Fast and Effective Text Mining Using Linear-time Document Clustering." In: Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1999). doi:10.1145/312129.312186

(Schütze & Silverstein, 1997) ⇒ Hinrich Schütze, and Craig Silverstein. (1997). "Projections for Efficient Document Clustering." In: ACM SIGIR Forum.
(Zamir & al, 1997) ⇒ O. Zamir, Oren Etzioni, O. Madani, and R. Karp. (1997). "Fast and Intuitive Clustering of Web Documents." In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining.

@@ Line 15: / Line 15: @@
 ===2009===
-* ([[2009_ExploitingWikipediaAsExter|Hu & al, 1999]]) &rArr; [[Xiaohua Hu]], Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. ([[2009]]). "Exploiting Wikipedia as External Knowledge for Document Clustering." In: Proceedings of [[ACM SIGKDD]] Conference ([[KDD 2009]]). [http://dx.doi.org/10.1145/1557019.1557066 doi:10.1145/1557019.1557066]
+* ([[2009_ExploitingWikipediaAsExternalKn|Hu & al, 1999]]) &rArr; [[Xiaohua Hu]], Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. ([[2009]]). "Exploiting Wikipedia as External Knowledge for Document Clustering." In: Proceedings of [[ACM SIGKDD]] Conference ([[KDD 2009]]). [http://dx.doi.org/10.1145/1557019.1557066 doi:10.1145/1557019.1557066]
 ===2008===