2001 OntologyBasedTextClustering

(Hotho et al., 2001) ⇒ Andreas Hotho, Alexander Maedche, Steffen Staab. (2001). “Ontology-based Text Clustering.” In: Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision.

Subject Headings: Text Clustering Algorithm, Document Vector, Word Vector.

Notes

See their subsequent work (Hotho et al., 2003) ⇒ Andreas Hotho, Steffen Staab, and Gerd Stumme. (2003). “WordNet Improves Text Document Clustering.” In: Proceedings of the SIGIR 2003 Semantic Web Workshop.

Cited By

~134 http://scholar.google.com/scholar?cites=12879009525856628385

Quotes

Abstract

Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We built various views basing our selection of text features on a heterarchy of concepts. Based on these aggregations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
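The pipeline the abstract describes can be sketched as follows. Words are mapped to ontology concepts, a "view" is built by selecting concepts from the heterarchy, and the documents are clustered with K-Means. This is a minimal illustration under stated assumptions and not the paper's own concept-selection procedure. The toy heterarchy `SUPER`, the lexicon `LEXICON`, the helper names, and the hand-picked `view` are all hypothetical. Each word occurrence is also credited to every superconcept, which is one plausible aggregation strategy. K-Means uses a deterministic farthest-first initialization so the run is reproducible.

```python
import numpy as np

# Hypothetical toy heterarchy: a concept may have several superconcepts.
SUPER = {
    "hotel": ["accommodation"],
    "hostel": ["accommodation"],
    "accommodation": ["tourism"],
    "beach": ["location", "tourism"],  # two parents: a heterarchy, not a tree
    "city": ["location"],
    "location": ["root"],
    "tourism": ["root"],
    "root": [],
}

# Hypothetical lexicon mapping surface words to concepts.
LEXICON = {"hotel": "hotel", "hotels": "hotel", "hostel": "hostel",
           "beach": "beach", "beaches": "beach", "city": "city"}

def ancestors(concept):
    """Return the concept together with all of its transitive superconcepts."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUPER[c])
    return seen

def concept_vector(tokens, view):
    """Count concept hits, crediting each hit to all superconcepts as well,
    then keep only the concepts selected for this view."""
    counts = dict.fromkeys(view, 0.0)
    for tok in tokens:
        concept = LEXICON.get(tok)
        if concept is None:
            continue  # words with no concept are dropped in this sketch
        for c in ancestors(concept):
            if c in counts:
                counts[c] += 1.0
    return np.array([counts[c] for c in view])

def kmeans(X, k, iters=20):
    """Lloyd's K-Means on L2-normalised rows, farthest-first initialisation."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    centers = [X[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Usage: four toy documents clustered under one coarse view.
docs = [["hotel", "hotels", "city"], ["hostel", "hotel"],
        ["beach", "beaches"], ["beach", "city", "beach"]]
view = ["accommodation", "location"]
X = np.vstack([concept_vector(d, view) for d in docs])
labels = kmeans(X, k=2)
print(labels)  # [0 0 1 1]
```

Choosing a different `view`, for instance the finer concepts `["hotel", "hostel", "beach", "city"]`, yields a different feature space and therefore possibly a different clustering. Each result can then be described by the concepts that defined its view.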

References

Rakesh Agrawal, J. Gehrke, D. Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Washington, June 1998. ACM Press, 1998.
K. Beyer, Jonathan Goldstein, R. Ramakrishnan, and U. Shaft. When is ‘nearest neighbor’ meaningful? In: Proceedings of ICDT-1999, Jerusalem, Israel, pages 217–235, 1999.
P. Bradley, Usama M. Fayyad, and C. Reina. Scaling clustering algorithms to large databases. In: Proceedings of KDD-1998, New York, NY, USA, August 1998, pages 9–15. AAAI Press, Menlo Park, CA, USA, 1998.
J. Fuernkranz, Tom M. Mitchell, and Ellen Riloff. A Case Study in Using Linguistic Phrases for Text Categorization on the WWW. In: Proceedings of the AAAI/ICML Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press, 1998.
A. Hinneburg, C. Aggarwal, and D.A. Keim. What is the nearest neighbor in high dimensional spaces? In: Proceedings of VLDB-2000, Cairo, Egypt, September 2000, pages 506–515. Morgan Kaufmann, 2000.
A. Hinneburg and D.A. Keim. Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. In: Proceedings of VLDB-1999, Edinburgh, Scotland, September 1999. Morgan Kaufmann, 1999.
A. Hinneburg, M. Wawryniuk, and D.A. Keim. Visual mining of high-dimensional data. Computer Graphics & Applications Journal, September 1999.
L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York, 1990.
S. A. Macskassy, A. Banerjee, B.D. Davison, and H. Hirsh. Human performance on clustering web pages: a preliminary study. In: Proceedings of KDD-1998, New York, NY, USA, August 1998, pages 264–268. AAAI Press, Menlo Park, CA, USA, 1998.
Alexander Maedche and Steffen Staab. Ontology learning for the semantic web. IEEE Intelligent Systems, 16(2), 2001.
George A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
G. Neumann, R. Backofen, J. Baur, M. Becker, and C. Braun. An information extraction core system for real world German text processing. In: Proceedings of the Conference on Applied Natural Language Processing (ANLP-1997), pages 208–215, Washington, USA, 1997.
M. Devaney and A. Ram. Efficient feature selection in conceptual clustering. In: Proceedings of ICML-1997, Nashville, TN, 1997. Morgan Kaufmann, 1997.
H. Schuetze and C. Silverstein. Projections for efficient document clustering. In: Proceedings of SIGIR-1997, Philadelphia, PA, July 1997, pages 74–81. Morgan Kaufmann, 1997.


Title: Ontology-based Text Clustering
Authors: Andreas Hotho, Alexander Maedche, Steffen Staab
Venue: Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision
Year: 2001
URL: http://www.uni-koblenz.de/~staab/Research/Publications/hothoetal-ijcaiws2001.pdf