2004 WordNetSimilarityMeasuringRelatedness

Subject Headings: Lexical Semantic Similarity Function, WordNet-based WSD Algorithm, WordNet-Similarity System.

Quotes

Abstract

  • WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity and relatedness between a pair of concepts (or synsets). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.

1 Introduction

  • WordNet::Similarity implements measures of similarity and relatedness that are all in some way based on the structure and content of WordNet.
  • Measures of similarity use information found in an is–a hierarchy of concepts (or synsets), and quantify how much concept A is like (or is similar to) concept B. For example, such a measure might show that an automobile is more like a boat than it is a tree, due to the fact that automobile and boat share vehicle as an ancestor in the WordNet noun hierarchy.
  • WordNet is particularly well suited for similarity measures, since it organizes nouns and verbs into hierarchies of is–a relations. In version 2.0, there are nine separate noun hierarchies that include 80,000 concepts, and 554 verb hierarchies that are made up of 13,500 concepts. Is–a relations in WordNet do not cross part-of-speech boundaries, so similarity measures are limited to making judgments between noun pairs (e.g., cat and dog) and verb pairs (e.g., run and walk). While WordNet also includes adjectives and adverbs, these are not organized into is–a hierarchies, so similarity measures cannot be applied to them.
  • However, concepts can be related in many ways beyond being similar to each other. For example, a wheel is a part of a car, night is the opposite of day, snow is made up of water, a knife is used to cut bread, and so forth. As such, WordNet provides relations beyond is–a, including has–part, is–made–of, and is–an–attribute–of. In addition, each concept is defined by a short gloss that may include an example usage. All of this information can be brought to bear in creating measures of relatedness. As a result, these measures tend to be more flexible, and allow for relatedness values to be assigned across parts of speech (e.g., the verb murder and the noun gun). This paper continues with an overview of the measures supported in WordNet::Similarity, and then provides a brief description of how the package can be used. We close with a summary of research that has employed WordNet::Similarity.
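
The distinction drawn above between similarity (based on an is–a hierarchy) and relatedness (based on other information such as glosses) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `IS_A` hierarchy and the `GLOSS` texts are invented rather than taken from WordNet, and the two scoring functions are simple stand-ins (an inverse path-length score and a shared-word count), not the measures implemented in WordNet::Similarity, which is a Perl package.

```python
# Hypothetical toy is-a hierarchy (child -> parent). This is NOT WordNet's
# actual structure; it only mirrors the automobile/boat/tree example.
IS_A = {
    "automobile": "vehicle",
    "boat": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
    "tree": "plant",
    "plant": "organism",
    "organism": "object",
}

# Hypothetical glosses, invented for this sketch (not WordNet's wording).
GLOSS = {
    "murder": "kill a person intentionally and unlawfully",
    "gun": "a weapon used to shoot and kill a person or animal",
    "tree": "a tall woody plant with a trunk and branches",
}

STOPWORDS = {"a", "an", "and", "or", "the", "to", "with", "of"}


def ancestors(concept):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_similarity(a, b):
    """Similarity stand-in: 1 / (number of nodes on the shortest is-a path)."""
    up_a = ancestors(a)
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(up_a):
        if concept in steps_from_b:
            # First shared ancestor found walking up from a.
            edges = steps_from_a + steps_from_b[concept]
            return 1.0 / (edges + 1)
    return 0.0  # no shared ancestor: the concepts are in separate hierarchies


def gloss_overlap(a, b):
    """Relatedness stand-in: count of content words shared by two glosses."""
    words_a = set(GLOSS[a].split()) - STOPWORDS
    words_b = set(GLOSS[b].split()) - STOPWORDS
    return len(words_a & words_b)


if __name__ == "__main__":
    # Automobile and boat share the nearby ancestor "vehicle".
    print(path_similarity("automobile", "boat"))
    print(path_similarity("automobile", "tree"))
    # A verb and a noun can still be related through their glosses.
    print(gloss_overlap("murder", "gun"))
    print(gloss_overlap("murder", "tree"))
```

In this toy setting, automobile scores higher against boat than against tree because the shared ancestor is closer, while the gloss-based score can compare the verb murder with the noun gun even though no is–a path connects them.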

Citation

  • Authors: Siddharth Patwardhan, Ted Pedersen, Jason Michelizzi
  • Title: WordNet::Similarity - Measuring the Relatedness of Concepts
  • URL: http://acl.ldc.upenn.edu/N/N04/N04-3012.pdf
  • Year: 2004