2002 DiscoveringWordSensesFromText
- (Pantel & Lin, 2002b) ⇒ Patrick Pantel, and Dekang Lin. (2002). “Discovering Word Senses from Text.” In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). doi:10.1145/775047.775138
See: Word Sense-based Word Form Clustering, Clustering By Committee Algorithm, Hybrid Clustering Algorithm
Notes
Cited by
- (Kaji, 2003) ⇒ Hiroyuki Kaji. (2003). “Word Sense Acquisition from Bilingual Comparable Corpora.” In: Proceedings of NAACL Conference (NAACL 2003).
- To the best of our knowledge, there are two preceding research papers on word sense acquisition (Fukumoto and Tsujii, 1994; Pantel and Lin, 2002). Both proposed distributional word clustering algorithms that are characterized by their capabilities to produce overlapping clusters. According to their algorithms, a polysemous word is assigned to multiple clusters, each of which represents one of its senses.
Quotes
Author Keywords
Word sense discovery, clustering, evaluation, machine learning.
Abstract
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
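The assignment phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the committees are taken as given (their discovery is not shown), cosine similarity over raw counts stands in for whatever similarity measure the paper uses, and the names `assign_senses`, `threshold`, and the toy feature vectors are hypothetical. The sketch shows only the two steps the abstract states: assign a word to its most similar cluster, then remove the features that overlap with that cluster so that a less frequent sense can surface.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average the members' vectors; used as the cluster's feature vector."""
    total = {}
    for vec in vectors:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(vectors) for f, w in total.items()}


def assign_senses(word_vec, centroids, threshold=0.5):
    """Assign a word to clusters one at a time (threshold is an assumed cutoff).

    After each assignment, the features shared with the chosen cluster are
    removed from the word, so the next round sees only the residual features.
    """
    remaining = dict(word_vec)
    senses = []
    while remaining:
        candidates = {n: c for n, c in centroids.items() if n not in senses}
        if not candidates:
            break
        best = max(candidates, key=lambda n: cosine(remaining, candidates[n]))
        if cosine(remaining, candidates[best]) < threshold:
            break
        senses.append(best)
        # Remove the overlapping features from the element.
        remaining = {f: w for f, w in remaining.items()
                     if f not in candidates[best]}
    return senses


# Toy data (invented for illustration): two committees and one ambiguous word.
centroids = {
    "flora": centroid([
        {"grow": 3, "water": 2, "leaf": 3},
        {"grow": 2, "water": 2, "leaf": 1, "petal": 2},
    ]),
    "facility": centroid([
        {"build": 2, "worker": 3, "operate": 2},
        {"build": 1, "worker": 2, "operate": 3, "oil": 2},
    ]),
}
plant = {"grow": 4, "water": 3, "leaf": 2, "build": 1, "worker": 1, "operate": 1}

print(assign_senses(plant, centroids))
```

In this toy data the full vector for `plant` is dominated by its "flora" features, so its similarity to the "facility" centroid falls below the assumed threshold. Only after the overlapping "flora" features are removed does the second, less frequent sense become the best match.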
References
- 1. Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey, Scatter/Gather: a cluster-based approach to browsing large document collections, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, p.318-329, June 21-24, 1992, Copenhagen, Denmark doi:10.1145/133160.133214
- 2. Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, ROCK: A Robust Clustering Algorithm for Categorical Attributes, Proceedings of the 15th International Conference on Data Engineering, p.512, March 23-26, 1999
- 3. Harris, Z. 1985. Distributional structure. In: Katz, J. J. (ed.) The Philosophy of Linguistics. New York: Oxford University Press. pp. 26--47.
- 4. Donald Hindle, Noun classification from predicate-argument structures, Proceedings of the 28th annual meeting on Association for Computational Linguistics, p.268-275, June 06-09, 1990, Pittsburgh, Pennsylvania doi:10.3115/981823.981857
- 5. Hutchins, J. and Somers, H. (1992). Introduction to Machine Translation. Academic Press.
- 6. A. K. Jain, M. N. Murty, P. J. Flynn, Data clustering: a review, ACM Computing Surveys (CSUR), v.31 n.3, p.264-323, Sept. 1999 doi:10.1145/331499.331504
- 7. George Karypis, Eui-Hong (Sam) Han, Vipin Kumar, Chameleon: Hierarchical Clustering Using Dynamic Modeling, Computer, v.32 n.8, p.68-75, August 1999 doi:10.1109/2.781637
- 8. Thomas K. Landauer, and Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104:211--240.
- 9. Landes, S.; Leacock, C.; and Tengi, R. I. (1998). Building semantic concordances. In WordNet: An Electronic Lexical Database, edited by C. Fellbaum. pp. 199--216. MIT Press.
- 10. Dekang Lin. (1994). Principar - an efficient, broad-coverage, principle-based parser. In: Proceedings of COLING-94. pp. 42--48. Kyoto, Japan.
- 11. Dekang Lin. (1997). Using syntactic dependency as local context to resolve word sense ambiguity. In: Proceedings of ACL-97. pp. 64--71. Madrid, Spain.
- 12. Dekang Lin. (1998). Automatic retrieval and clustering of similar words. In: Proceedings of COLING/ACL-98. pp. 768--774. Montreal, Canada.
- 13. Dekang Lin, Patrick Pantel, Induction of semantic classes from natural language text, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p.317-322, August 26-29, 2001, San Francisco, California doi:10.1145/502512.502558
- 14. Christopher D. Manning, Hinrich Schütze, Foundations of statistical natural language processing, MIT Press, Cambridge, MA, 1999
- 15. George A. Miller 1990. WordNet: An online lexical database. International Journal of Lexicography, 1990.
- 16. Marius A. Paşca, Sandra M. Harabagiu, High performance question/answering, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.366-374, September 2001, New Orleans, Louisiana, United States doi:10.1145/383952.384025
- 17. Gerard Salton, Michael J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, Inc., New York, NY, 1986
- 18. W. M. Shaw, Jr., Robert Burgin, Patrick Howell, Performance standards and evaluations in IR test collections: cluster-based retrieval models, Information Processing and Management: an International Journal, v.33 n.1, p.1-14, Jan 1, 1997 doi:10.1016/S0306-4573(96)00043-X
- 19. Steinbach, M.; Karypis, G.; and Vipin Kumar 2000. A comparison of document clustering techniques, Technical Report #00-034. Department of Computer Science and Engineering, University of Minnesota.
- 20. Ellen Voorhees. (1998). Using WordNet for text retrieval. In WordNet: An Electronic Lexical Database, edited by C. Fellbaum. pp. 285--303. MIT Press.
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2002 DiscoveringWordSensesFromText | Dekang Lin, Patrick Pantel | | | Discovering Word Senses from Text | | | http://www.patrickpantel.com/cgi-bin/Web/Tools/getfile.pl?type=paper&id=2002/kdd02.pdf | 10.1145/775047.775138 | | 2002 |