Lesk Algorithm


A Lesk algorithm is an unsupervised word sense disambiguation algorithm that resembles the one proposed in (Lesk, 1986).



  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Lesk_algorithm
    • The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. The Lesk algorithm is based on the assumption that words in a given neighbourhood will tend to share a common topic. A naive implementation of the Lesk algorithm would be:
      • 1. choose pairs of ambiguous words within a neighbourhood;
      • 2. check their definitions in a dictionary;
      • 3. choose the senses so as to maximise the number of common terms in the definitions of the chosen words.
    • Accuracy on Pride and Prejudice and selected papers of the Associated Press was found to be in the 50% to 70% range.
    • A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighbourhood.
    • Versions have been adapted to WordNet.
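
The simplified version described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the two-sense inventory for "bank" is hypothetical, standing in for a real machine-readable dictionary such as WordNet.

```python
# Simplified Lesk (sketch): pick the sense of an ambiguous word whose
# dictionary gloss shares the most words with the surrounding context.

def simplified_lesk(word, context, sense_inventory):
    """Return the sense of `word` whose gloss has the largest word
    overlap with the context (a list of tokens)."""
    best_sense, best_overlap = None, -1
    context_words = set(context)
    for sense, gloss in sense_inventory[word].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical two-sense inventory for "bank".
senses = {
    "bank": {
        "bank#1": "financial institution that accepts deposits and lends money",
        "bank#2": "sloping land beside a body of water such as a river",
    }
}

context = "he sat on the bank of the river and watched the water".split()
print(simplified_lesk("bank", context, senses))  # → bank#2
```

Here "bank#2" wins because its gloss shares "of", "water", and "river" with the context, while the financial sense shares only "and". A practical system would also remove stop words and lemmatise before counting overlaps.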
  • (Gentile et al., 2009) ⇒ Anna L. Gentile, Pierpaolo Basile, and Giovanni Semeraro. (2009). “WibNED Wikipedia Based Named Entity Disambiguation.” In: Proceedings of the 5th Italian Research Conference on Digital Libraries (IRCDL 2009).



  • (Patwardhan et al., 2003) ⇒ Siddharth Patwardhan, Satanjeev Banerjee, and Ted Pedersen. (2003). “Using Measures of Semantic Relatedness for Word Sense Disambiguation.” In: Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2003).
    • QUOTE: The original Lesk algorithm [9] disambiguates a target word by comparing its gloss with those of its surrounding words. The target word is assigned the sense whose gloss has the most overlapping or shared words with the glosses of its neighboring words.
    • There are two hypotheses that underlie this approach. The first is that words that appear together in a sentence can be disambiguated by assigning to them the senses that are most closely related to their neighboring words. This follows from the intuition that words that appear together in a sentence must inevitably be related in some way, since they are normally working together to communicate some idea. The second hypothesis is that related senses can be identified by finding overlapping words in their definitions. The intuition here is equally reasonable, in that words that are related will often be defined using the same words, and in fact may refer to each other in their definitions.
    • The main limitation to this approach is that dictionary glosses are often quite brief, and may not include sufficient vocabulary to identify related senses. Banerjee and Pedersen suggest an adaptation based on the use of WordNet.
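
The original pairwise gloss-overlap idea quoted above can be sketched as follows. The gloss inventory uses Lesk's classic "pine cone" example, but the glosses themselves and the scoring details (summing, per neighbour, the best overlap over that neighbour's senses) are illustrative assumptions, not the paper's exact procedure.

```python
# Original Lesk (sketch): assign the target word the sense whose gloss
# overlaps most with the glosses of its neighbouring words' senses.

def overlap(gloss_a, gloss_b):
    """Number of word types shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def original_lesk(target, neighbours, glosses):
    """glosses maps word -> {sense: gloss}. Score each sense of the
    target by its best overlap against each neighbour's senses."""
    best_sense, best_score = None, -1
    for sense, gloss in glosses[target].items():
        score = sum(
            max(overlap(gloss, g) for g in glosses[n].values())
            for n in neighbours
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical glosses for Lesk's classic "pine cone" example.
glosses = {
    "pine": {
        "pine#1": "kind of evergreen tree with needle shaped leaves",
        "pine#2": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#1": "solid body which narrows to a point",
        "cone#2": "fruit of certain evergreen trees",
    },
}

print(original_lesk("pine", ["cone"], glosses))  # → pine#1
```

The tree sense of "pine" is chosen because its gloss shares "evergreen" (and "of") with the fruit sense of "cone", whereas the "waste away" sense shares nothing. The brevity of such glosses is exactly the limitation noted above, which the WordNet-based adaptations address by also comparing glosses of related senses.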