2002 AnAdaptedLeskAlgForWSDUsingWordNet

From GM-RKB

Subject Headings: Word-Sense Disambiguation Algorithm, Lesk Algorithm, WordNet Database, BP Adapted Lesk Algorithm.

Notes

  • Reimplements the Lesk Algorithm but uses WordNet as the Sense Inventory Database.
  • It assigns the candidate sense combination with the highest score; if there is a tie, it chooses the most familiar word sense.
  • Tests performance on the SENSEVAL-2 benchmark task.
  • Achieves 32% accuracy, improving on the 16% and 23% accuracy of previously reported Lesk-based results.
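
The tie-breaking rule above can be sketched as follows. This is a minimal illustration, not the paper's code: the (sense, score) pairs are hypothetical, and the list order stands in for WordNet's sense ordering, which lists more familiar (more frequent) senses first.

```python
# Sketch of the selection rule: pick the highest-scoring sense, and on a
# tie prefer the earlier-listed sense, since WordNet lists more familiar
# senses first. The (sense, score) pairs below are hypothetical.

def pick_sense(scored_senses):
    """Return the sense with the highest score; ties go to the sense
    that appears earlier in the list (the more familiar one)."""
    best_sense, best_score = None, float("-inf")
    for sense, score in scored_senses:  # strict '>' keeps the first on ties
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

senses = [("interest#1", 12), ("interest#2", 12), ("interest#3", 7)]
print(pick_sense(senses))  # interest#1 -- tie broken by familiarity order
```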

Cited By

Quotes

Abstract

  • This paper presents an adaptation of Lesk’s dictionary-based word sense disambiguation algorithm. Rather than using a standard dictionary as the source of glosses for our approach, the lexical database WordNet is employed. This provides a rich hierarchy of semantic relations that our algorithm can exploit. This method is evaluated using the English lexical sample data from the Senseval-2 word sense disambiguation exercise, and attains an overall accuracy of 32%. This represents a significant improvement over the 16% and 23% accuracy attained by variations of the Lesk algorithm used as benchmarks during the SENSEVAL-2 comparative exercise among word sense disambiguation systems.

1 Introduction

  • Most words in natural languages are polysemous, that is, they have multiple possible meanings or senses. For example, interest can mean a charge for borrowing money, or a sense of concern and curiosity. When using language, humans rarely stop and consider which sense of a word is intended. For example, in I have an interest in the arts, a human reader immediately knows from the surrounding context that interest refers to an appreciation, and not a charge for borrowing money.

  • However, computer programs do not have the benefit of a human’s vast experience of the world and language, so automatically determining the correct sense of a polysemous word is a difficult problem. This process is called word sense disambiguation, and has long been recognized as a significant component in language processing applications such as information retrieval, machine translation, speech recognition, etc.
  • In recent years corpus–based approaches to word sense disambiguation have become quite popular. In general these rely on the availability of manually created sense–tagged text, where a human has gone through a corpus of text, and labeled each occurrence of a word with a tag that refers to the definition of the word that the human considers most appropriate for that context. This sense–tagged text serves as training examples for a supervised learning algorithm that can induce a classifier that can then be used to assign a sense–tag to previously unseen occurrences of a word. The main difficulty of this approach is that sense–tagged text is expensive to create, and even once it exists the classifiers learned from it are only applicable to text written about similar subjects and for comparable audiences.
  • Approaches that do not depend on the existence of manually created training data are an appealing alternative. An idea that actually pre–dates most work in corpus–based approaches is to take advantage of the information available in machine readable dictionaries. The Lesk algorithm [3] is the prototypical approach, and is based on detecting shared vocabulary between the definitions of words. We adapt this algorithm to WordNet [2], which is a lexical database structured as a semantic network.
  • This paper continues with a description of the original Lesk algorithm and an overview of WordNet. This is followed by a detailed presentation of our algorithm, and a discussion of our experimental results.

2 The Lesk Algorithm

  • The original Lesk algorithm [3] disambiguates words in short phrases. The definition, or gloss, of each sense of a word in a phrase is compared to the glosses of every other word in the phrase. A word is assigned the sense whose gloss shares the largest number of words in common with the glosses of the other words. For example, in time flies like an arrow, the algorithm compares the glosses of time to all the glosses of fly and arrow. Next it compares the glosses of fly with those of time and arrow, and so on. The algorithm begins anew for each word and does not utilize the senses it previously assigned.
  • The original Lesk algorithm relies on glosses found in traditional dictionaries such as Oxford Advanced Learner’s. We modify Lesk’s basic approach to take advantage of the highly inter–connected set of relations among synonyms that WordNet offers. While Lesk’s algorithm restricts its comparisons to the glosses of the words being disambiguated, our approach is able to compare the glosses of words that are related to the words to be disambiguated. This provides a richer source of information and improves overall disambiguation accuracy. We also introduce a novel scoring mechanism that weighs longer sequences of matches more heavily than single words.
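
The gloss-overlap idea, together with the scoring mechanism that weighs an n-word overlap as n², can be sketched as follows. The tiny sense inventory is hypothetical and stands in for WordNet glosses; the greedy longest-run matching is one plausible reading of "longer sequences of matches count more", not the authors' exact implementation.

```python
# Sketch of Lesk-style gloss-overlap disambiguation with sequence-weighted
# scoring: an overlap of n consecutive words contributes n**2 to the score.
# The INVENTORY below is a hypothetical stand-in for WordNet glosses.

def longest_common_run(a, b):
    """Find the longest run of consecutive identical words shared by the
    word lists a and b; return (length, start_in_a, start_in_b)."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best

def overlap_score(gloss_a, gloss_b):
    """Greedily extract common word runs, scoring each run of n words
    as n**2, so longer matches count far more than scattered words."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            break
        score += n * n
        del a[i:i + n]   # remove the matched run so it is not reused
        del b[j:j + n]
    return score

INVENTORY = {  # hypothetical mini sense inventory (stands in for WordNet)
    "interest": {
        "interest#finance": "a fixed charge for borrowing money",
        "interest#curiosity": "a sense of concern and curiosity about something",
    },
    "arts": {
        "arts#creative": "a sense of concern and curiosity about creative works",
    },
}

def disambiguate(target, context, inventory):
    """Pick the sense of `target` whose gloss overlaps most with the
    glosses of every sense of every context word (Lesk's criterion)."""
    best_sense, best_score = None, -1
    for sense, gloss in inventory[target].items():
        score = sum(
            overlap_score(gloss, ctx_gloss)
            for word in context if word in inventory
            for ctx_gloss in inventory[word].values()
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate("interest", ["arts"], INVENTORY))  # interest#curiosity
```

The adapted algorithm additionally compares glosses of WordNet-related senses (e.g. hypernyms) rather than only the glosses of the context words themselves, which would enter here as extra `ctx_gloss` candidates.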

4.2 Processing

8 Conclusions

  • This paper presents an adaptation of the Lesk algorithm for word sense disambiguation. While the original algorithm relies upon finding overlaps in the glosses of neighboring words, this adaptation extends the comparisons to include the glosses of words that are related to the words in the text being disambiguated. These relationships are defined by the lexical database WordNet. We have evaluated this approach on the English Senseval-2 lexical sample data and find that it attains an overall accuracy of 32%, which doubles the accuracy of a more traditional Lesk approach. The authors have made their Perl implementation of this algorithm freely available on their web sites.

References


  • Satanjeev Banerjee, and Ted Pedersen (2002). "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet." In: Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2002). URL: http://www.d.umn.edu/~tpederse/Pubs/cicling2002-b.pdf DOI: 10.1007/3-540-45715-1_11