2003 UsingMeasuresOfSemRelatednessForWSD

(Patwardhan et al., 2003) ⇒ Siddharth Patwardhan, Satanjeev Banerjee, Ted Pedersen. (2003). “Using Measures of Semantic Relatedness for Word Sense Disambiguation.” In: Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2003). doi:10.1007/3-540-36456-0_24

Subject Headings: Word Sense Disambiguation Algorithm, WordNet, Lesk Algorithm, Lexical Semantic Similarity Function.

Notes

Cited By

Quotes

Abstract

This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps which can be viewed as a measure of semantic relatedness. We evaluate a variety of measures of semantic relatedness when applied to word sense disambiguation by carrying out experiments using the English lexical sample data of Senseval-2. We find that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.

{{#ifanon:|

1. Introduction

Word sense disambiguation is the process of assigning a meaning to a word based on the context in which it occurs. The most appropriate meaning for a word is selected from a predefined set of possibilities, usually known as a sense inventory.

In this paper we present a class of dictionary-based methods that follow from the Adapted Lesk Algorithm of Banerjee and Pedersen [2]. The original Lesk algorithm [9] disambiguates a target word by selecting the sense whose gloss (or definition) has the largest number of words that overlap (or match) with the glosses of neighboring words. Banerjee and Pedersen extend the concept of a gloss overlap to include the glosses of words that are related to the target word and its neighbors according to the concept hierarchies provided in the lexical database WordNet [4]. This paper takes the view that gloss overlaps are just another measure of semantic relatedness, a point previously noted by Resnik [13]. We therefore evaluate several additional measures of semantic relatedness for word sense disambiguation within the general framework provided by the Adapted Lesk Algorithm.
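Viewing gloss overlap as one relatedness measure among several suggests a framework in which the measure is a pluggable function. The sketch below is only an illustration under stated assumptions: the sense labels, the toy scores, and the rule that each context word contributes its best-matching sense are hypothetical choices, not necessarily the exact combination rule used in this paper.

```python
from typing import Callable, List

# A relatedness measure maps two sense labels to a score (higher = more related).
# Any measure could be plugged in here; this sketch does not implement a real one.
Relatedness = Callable[[str, str], float]


def disambiguate(target_senses: List[str],
                 context_senses: List[List[str]],
                 relatedness: Relatedness) -> str:
    """Return the target sense with the highest total relatedness to the context.

    Assumption: each context word contributes the score of its best-matching
    sense; summing over every sense pair would be an equally simple alternative.
    """
    def score(sense: str) -> float:
        return sum(max(relatedness(sense, other) for other in senses)
                   for senses in context_senses if senses)

    return max(target_senses, key=score)


# Hypothetical toy scores; the labels are illustrative, not WordNet sense keys.
TOY_SCORES = {
    ("bank#finance", "interest#money"): 2.0,
    ("bank#finance", "rate#price"): 1.0,
    ("bank#river", "interest#hobby"): 0.5,
}


def toy_relatedness(a: str, b: str) -> float:
    """Look up a toy score; unlisted pairs are treated as unrelated."""
    return TOY_SCORES.get((a, b), 0.0)


context = [["interest#money", "interest#hobby"], ["rate#price", "rate#speed"]]
print(disambiguate(["bank#river", "bank#finance"], context, toy_relatedness))
```

Here bank#finance scores 2.0 + 1.0 = 3.0 against 0.5 for bank#river. Swapping `toy_relatedness` for a different measure changes nothing else in the procedure, which is the point of treating gloss overlap as just one relatedness function.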

Supervised learning algorithms also assign meanings to words from a sense inventory, but take a very different approach. A human manually annotates examples of a word with tags that indicate the intended sense in each context. These examples become training data for a learning algorithm that induces rules, which are then used to assign meanings to other occurrences of the word. In supervised methods, the human uses the information in the dictionary to decide which sense tag should be assigned to an example, and a learning algorithm then finds clues in the context of that word that allow it to generalize rules of disambiguation. Note that the learning algorithm simply views the sense inventory as a set of categories; it is the human who has absorbed the information in the dictionary and combined it with their own knowledge of words to manually sense-tag the training examples. The objective of a dictionary-based approach, by contrast, is to give a disambiguation algorithm the contents of a dictionary and have it make inferences about the meanings of words in context from that information. Here we extract information about semantic relatedness from the lexical database WordNet (sometimes augmented by corpus statistics) in order to make such inferences.

This paper begins with an overview of the original Lesk algorithm and the adaptation of Banerjee and Pedersen. We review five other measures of semantic relatedness that are included in this study. These include measures by Resnik (1995), Jiang and Conrath (1997), Lin (1997), Leacock and Chodorow (1998), and Hirst and St. Onge (1998). We go on to describe our experimental methodology and results. We close with an analysis and discussion, as well as a brief review of related work.

2. The Lesk Algorithm

The original Lesk algorithm [9] disambiguates a target word by comparing its gloss with those of its surrounding words. The target word is assigned the sense whose gloss has the most overlapping or shared words with the glosses of its neighboring words.

There are two hypotheses that underlie this approach. The first is that words that appear together in a sentence can be disambiguated by assigning to them the senses that are most closely related to the senses of their neighboring words. This follows from the intuition that words that appear together in a sentence must inevitably be related in some way, since they normally work together to communicate some idea. The second hypothesis is that related senses can be identified by finding overlapping words in their definitions. The intuition here is equally reasonable: words that are related will often be defined using the same words, and may even refer to each other in their definitions.

For example, in "The rate of interest at my bank is...", a human reader knows that bank refers to a financial institution rather than a river shore, since rate, interest, and bank each have a financial sense. In WordNet the glosses of the financial senses of these three words overlap: the glosses of interest and bank share the words money and mortgage, and the glosses of interest and rate share the word charge.
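The overlap counting in this example can be sketched as follows. This is a simplified reading of the original algorithm, not Lesk's implementation: the glosses are hypothetical stand-ins written to share money, mortgage, and charge as described above (they are not the actual WordNet glosses), the stop list is a small assumed one, and each target sense is compared against every gloss of every neighbor.

```python
import re
from typing import Dict, List, Set

# A deliberately small, hypothetical stop list.
STOP_WORDS = {"a", "an", "and", "about", "for", "of", "on", "or", "some", "that", "the", "to"}

# Hypothetical glosses written for this example; they are NOT the WordNet glosses.
BANK = {
    "bank#finance": "financial institution that accepts money deposits and lends money for a mortgage",
    "bank#river": "sloping land beside a river",
}
INTEREST = {
    "interest#finance": "fixed charge for borrowing money on a mortgage or loan",
    "interest#curiosity": "a feeling of curiosity about something",
}
RATE = {
    "rate#finance": "amount of a charge relative to some basis",
    "rate#speed": "magnitude relative to a unit of time",
}


def content_words(gloss: str) -> Set[str]:
    """Lower-case the gloss, split it into alphabetic words, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOP_WORDS}


def gloss_overlap(gloss_a: str, gloss_b: str) -> int:
    """Number of distinct content words the two glosses share."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def lesk(target: Dict[str, str], neighbors: List[Dict[str, str]]) -> str:
    """Pick the target sense whose gloss overlaps most with all glosses of the neighbors."""
    def score(sense: str) -> int:
        return sum(gloss_overlap(target[sense], gloss)
                   for neighbor in neighbors for gloss in neighbor.values())

    return max(target, key=score)


print(lesk(BANK, [INTEREST, RATE]))
```

With these toy glosses the financial sense of bank overlaps the financial sense of interest in two words (money, mortgage), while the river sense overlaps nothing, so bank#finance is chosen; interest and rate share one word (charge).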

The main limitation of this approach is that dictionary glosses are often quite brief and may not include enough vocabulary to identify related senses. Banerjee and Pedersen suggest an adaptation based on WordNet. Rather than considering only the glosses of the surrounding words in the sentence, it exploits the concept hierarchy of WordNet so that the glosses of word senses related to the words in the context are compared as well. In effect, the glosses of the surrounding words in the text are expanded to include the glosses of those words to which they are related through WordNet relations. Banerjee and Pedersen also suggest a variation in the scoring of overlaps, such that a match of n consecutive words in two glosses is weighted more heavily than a set of n single-word matches.
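The text only says that a run of n consecutive matching words should outweigh n single-word matches. A minimal sketch, assuming a score of n squared per matched run (our understanding of the weighting in Banerjee and Pedersen (2002), stated here as an assumption), repeatedly finds the longest shared run of tokens, adds its squared length, and blanks the matched tokens so they are not counted twice. Unlike the adapted algorithm, this sketch does not discard overlaps made up only of function words.

```python
import re
from typing import List, Optional, Tuple


def tokens(gloss: str) -> List[Optional[str]]:
    """Lower-cased alphabetic tokens; matched tokens are later replaced by None."""
    return list(re.findall(r"[a-z]+", gloss.lower()))


def longest_common_run(a: List[Optional[str]], b: List[Optional[str]]) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest run of consecutive tokens shared by a and b."""
    best = (0, 0, 0)
    # run[i][j] = length of the shared run ending at a[i - 1] and b[j - 1].
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best[0]:
                    best = (run[i][j], i - run[i][j], j - run[i][j])
    return best


def phrasal_overlap_score(gloss_a: str, gloss_b: str) -> int:
    """Sum n**2 over shared phrases, longest first (assumed squared weighting)."""
    a, b = tokens(gloss_a), tokens(gloss_b)
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        # Blank out the matched tokens so they cannot be counted a second time.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n


print(phrasal_overlap_score("financial institution", "a financial institution"))  # one 2-word phrase
print(phrasal_overlap_score("financial institution", "institution financial"))    # two 1-word matches
```

The two calls share the same two words, but the first matches them as one consecutive phrase (score 4) and the second only as two separate words (score 2), which is exactly the preference the adapted scoring is meant to express.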

Suppose that bark is the target word and it is surrounded by dog and tail. The original Lesk algorithm checks for overlaps in the glosses of the senses of dog with the glosses of bark. Then it checks for overlaps in the glosses of bark and tail. The sense of bark with the maximum number of overlaps with dog and tail is selected. The adaptation of the Lesk algorithm considers these same overlaps and adds to them the overlaps of the glosses of the senses of concepts that are semantically or lexically related to dog, bark and tail according to WordNet.
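The effect of expanding glosses through related concepts can be illustrated with the bark, dog, and tail example. In the sketch below the glosses and the RELATED links are hypothetical toy data standing in for WordNet, each neighbor has a single sense for brevity, and the related glosses are simply pooled into one bag of words. The adapted algorithm instead compares glosses relation by relation and uses phrasal scoring, so this is a simplification, not Banerjee and Pedersen's procedure.

```python
import re
from typing import Callable, Dict, List, Set

STOP_WORDS = {"a", "an", "as", "of", "or", "that", "the", "with"}

# Hypothetical toy glosses and relations, NOT taken from WordNet.
GLOSSES: Dict[str, str] = {
    "bark#sound": "the sharp cry of a dog",
    "bark#tree": "the outer covering of a tree trunk",
    "dog#animal": "a domesticated canine kept as a pet",
    "tail#animal": "the flexible rear appendage of an animal",
    "howl#sound": "a long loud cry of a canine",
    "canine#animal": "a carnivorous mammal with sharp teeth that may cry or howl",
    "trunk#tree": "the main stem of a tree",
}
# Each sense maps to the senses it is linked to (standing in for WordNet relations).
RELATED: Dict[str, List[str]] = {
    "bark#sound": ["howl#sound"],
    "bark#tree": ["trunk#tree"],
    "dog#animal": ["canine#animal"],
}


def content_words(gloss: str) -> Set[str]:
    """Lower-case the gloss, split it into alphabetic words, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOP_WORDS}


def own_words(sense: str) -> Set[str]:
    """Content words of the sense's own gloss (original Lesk)."""
    return content_words(GLOSSES[sense])


def extended_words(sense: str) -> Set[str]:
    """Own gloss words plus the words of every related sense's gloss (simplified adaptation)."""
    words = own_words(sense)
    for other in RELATED.get(sense, []):
        words |= own_words(other)
    return words


def score(target: str, neighbors: List[str], words: Callable[[str], Set[str]]) -> int:
    """Total overlap between the target sense and each neighbor sense."""
    target_words = words(target)
    return sum(len(target_words & words(n)) for n in neighbors)


NEIGHBORS = ["dog#animal", "tail#animal"]
for sense in ("bark#sound", "bark#tree"):
    print(sense, score(sense, NEIGHBORS, own_words), score(sense, NEIGHBORS, extended_words))
```

With the bare glosses both senses of bark score 0 and cannot be told apart. Once the related glosses are pooled in, the sound sense of bark shares sharp, cry, and canine with the expanded gloss of dog (score 3), while the tree sense still scores 0.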

3. WordNet

WordNet [4] is a freely-avail

References

Eneko Agirre and G. Rigau. Word sense disambiguation using conceptual density. In: Proceedings of the 16th International Conference on Computational Linguistics, pages 16–22, Copenhagen, 1996.
(Banerjee and Pedersen, 2002) ⇒ Satanjeev Banerjee, and Ted Pedersen. (2002). “An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet.” In: Proceedings of CICLing (2002). Lecture Notes In Computer Science; Vol. 2276.
A. Budanitsky and Graeme Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, June 2001.
C. Fellbaum, editor. WordNet: An electronic lexical database. MIT Press, 1998.
W. Francis and H. Kucera. Frequency Analysis of English Usage: Lexicon and Grammar. Houghton Mifflin, 1982.
Graeme Hirst and D. St. Onge. Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 305–332. MIT Press, 1998.
(Jiang and Conrath) ⇒ Jay J. Jiang, and David W. Conrath. (1997). “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy.” In: Proceedings of the International Conference on Research in Computational Linguistics.
C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 265–283. MIT Press, 1998.
(Lesk, 1986) ⇒ Michael Lesk. (1986). “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to tell a pine cone from an ice cream cone.” In: Proceedings of SIGDOC-1986.
Dekang Lin. Using syntactic dependency as a local context to resolve word sense ambiguity. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pages 64–71, Madrid, July 1997.
M. Marcus, B. Santorini, and M. Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17–30, 1989.
Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, August 1995.
Philip Resnik. WordNet and class-based probabilities. In C. Fellbaum, editor, WordNet: An electronic lexical database, pages 239–263. MIT Press, 1998.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2003 UsingMeasuresOfSemRelatednessForWSD	Satanjeev Banerjee Siddharth Patwardhan Ted Pedersen			Using Measures of Semantic Relatedness for Word Sense Disambiguation		Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics	http://www-2.cs.cmu.edu/~banerjee/Publications/pedersen3.pdf	10.1007/3-540-36456-0_24		2003