2006 AComprehensiveComparStudyOfDocClustForBiomedDigiLibMEDLINE

(Yoo & Hu) ⇒ Illhoi Yoo, Xiaohua Hu. (2006). “A Comprehensive Comparison Study of Document Clustering for a Biomedical Digital Library MEDLINE.” In: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries.

Subject Headings:

Notes

Cited By

~24 http://scholar.google.com/scholar?cites=188131744891586522

Quotes

Abstract

Document clustering has been used for better document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of various document clustering approaches such as three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and Suffix Tree Clustering in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as synonym/hypernym/hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics such as misclassification index, F-measure, cluster purity, and entropy on very large article sets from MEDLINE, the largest digital library in biomedicine.
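
Two of the evaluation metrics named in the abstract, cluster purity and entropy, can be illustrated with a minimal Python sketch. It assumes the common textbook definitions (purity as the fraction of documents falling in the majority class of their cluster, entropy as the size-weighted average of per-cluster class entropy in bits, without normalization); the paper's exact formulations may differ. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def _class_counts_per_cluster(labels_true, labels_pred):
    """Map each cluster id to a Counter of the true classes it contains."""
    clusters = defaultdict(Counter)
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters[cluster_id][true_label] += 1
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted average of per-cluster class entropy, in bits (lower is better)."""
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        cluster_entropy = -sum(
            (c / size) * math.log2(c / size) for c in counts.values()
        )
        total += (size / n) * cluster_entropy
    return total


# Toy example (hypothetical labels): six documents, two classes, two clusters.
true_classes = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 1, 1, 1, 1]
print(purity(true_classes, cluster_ids))
print(entropy(true_classes, cluster_ids))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes, so the two values are usually read together.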

References

Aggarwal, C. C., Wolf, J. L., Yu, P. S., Procopiuc, C., and Park, J. S. Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of data, 1999, 61-72.
Beil, F., Ester, M. and Xu, X. Frequent Term-based Text Clustering, In: Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, 436-442.
Beyer, K., Jonathan Goldstein, Ramakrishnan, R., and Shaft, U. When is nearest neighbor meaningful?. Proceedings of 7th International Conference on Database Theory, 1999, 217-235.
Buckley, C., Gerard M. Salton, Allen, J. and Singhal, A. Automatic query expansion using SMART: TREC-3. In: D. K. Harman (ed.), The Third Text Retrieval Conference (TREC-3). U.S. Department of Commerce, 1995, 69-80.
Buckley, C. and Lewit, A. F. Optimization of inverted vector searches. In: Proceedings of SIGIR-85, 1985, 97-110.
Cutting, D., Karger, D., Pedersen, J. and Tukey, J. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, In: Proceedings of SIGIR ’92, 1992, 318-329.
Ghosh, J. Scalable clustering methods for data mining. In N. Ye (Ed.), Handbook of data mining. Lawrence Erlbaum, 2003.
Gruber, T.R. Towards Principles for the Design of Ontologies used for Knowledge Sharing. International Journal of Human-Computer Studies, 43, 1995, 907-928.
Hearst, M. A. and Pedersen, J. O. Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. In: Proceedings of SIGIR-96, 1996, 76–84.
Hotho, A., Maedche A., and Staab S. Text Clustering Based on Good Aggregations. Künstliche Intelligenz (KI), 16, 4, 2002, 48-54.
Hu, X. Mining Novel Connections from Large Online Digital Library Using Biomedical Ontologies, Library Management Journal, 26, 4/5, 2005, 261-270.
Kaufman, L., and Rousseeuw, P.J. Finding Groups in Data: an Introduction to Cluster Analysis, 1999, John Wiley & Sons.
Koller, D. and Sahami, M. Hierarchically classifying documents using very few words. In: Proceedings of ICML-97, 1997, 170–176.
Larsen, B. and Aone, C. Fast and Effective Text Mining Using Linear-time Document Clustering, KDD-99, San Diego, California, 1999, 16-22.
Li, T., Ma, S., and Ogihara, M. Document clustering via adaptive subspace iteration. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of data, 2004, 218-225.
Patrick Pantel and Dekang Lin. Document clustering with committees. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of data, 2002, 199-206.
Steinbach, M., Karypis, G., and Vipin Kumar. A Comparison of Document Clustering Techniques. Technical Report #00-034. Department of Computer Science and Engineering, University of Minnesota, 2000.
van Rijsbergen, C. J. Information Retrieval, 2nd edition, London: Butterworths, 1979. http://www.dcs.gla.ac.uk/Keith/Preface.html
Wang, B.B., McKay, R I., Abbass, H.A., Barlow M. Learning Text Classifier using the Domain Concept Hierarchy. In: Proceedings of International Conference on Communications, Circuits and Systems 2002, China.
Willett, P. Recent trends in hierarchical document clustering: A critical review. Information Processing & Management, 24, 5, 1988, 577-597.
Xu, W. and Gong, Y. Document clustering by concept factorization. Proceedings of SIGIR-04, 2004, 202-209.
Zamir O., and Etzioni O. Web Document Clustering: A Feasibility Demonstration, In: Proceedings of SIGIR 98, 1998, 46-54.
Zeng, Y., Tang, J., Garcia-Frias, J. and Gao, G.R. An Adaptive Meta-Clustering Approach: Combining The Information From Different Clustering Results, IEEE Computer Society Bioinformatics Conference (CSB2002), 2002, 276-287.
Zhao, Y., and Karypis, G. Criterion functions for document clustering: Experiments and analysis, Technical Report, Department of Computer Science, University of Minnesota, 2002.
Zhao, Y., and Karypis, G. Evaluation of Hierarchical Clustering Algorithms for Document Datasets, Technical Report, Department of Computer Science, University of Minnesota, 2002.
Zhong, S., and Ghosh, J. A comparative study of generative models for document clustering. Proceedings of the workshop on Clustering High Dimensional Data and Its Applications in SIAM Data Mining Conference, 2003.
zu Eissen, S.M., Stein, B, Potthast, M. The Suffix Tree Document Model Revisited, In: Proceedings of the 5th International Conference on Knowledge Management, 2005, 596-603.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2006 AComprehensiveComparStudyOfDocClustForBiomedDigiLibMEDLINE	Xiaohua Hu Illhoi Yoo			A Comprehensive Comparison Study of Document Clustering for a Biomedical Digital Library MEDLINE		Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries	http://www.mendeley.com/research/a-comprehensive-comparison-study-of-document-clustering-for-a-biomedical-digital-library-medline/			2006
