Word Sense Disambiguation (WSD) Task


A Word Sense Disambiguation (WSD) Task is a word mention to word sense resolution task that is restricted to the use of a word sense inventory.



  • (Wikipedia, 2012) ⇒ http://en.wikipedia.org/wiki/Word_sense_disambiguation
    • QUOTE: In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings (polysemy). The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, inference et cetera.

      Research has progressed steadily to the point where WSD systems achieve sufficiently high levels of accuracy on a variety of word types and ambiguities. A rich variety of techniques have been researched, from dictionary-based methods that use the knowledge encoded in lexical resources, to supervised machine learning methods in which a classifier is trained for each distinct word on a corpus of manually sense-annotated examples, to completely unsupervised methods that cluster occurrences of words, thereby inducing word senses. Among these, supervised learning approaches have been the most successful algorithms to date.

      Current accuracy is difficult to state without a host of caveats. In English, accuracy at the coarse-grained (homograph) level is routinely above 90%, with some methods on particular homographs achieving over 96%. On finer-grained sense distinctions, top accuracies from 59.1% to 69.0% have been reported in recent evaluation exercises (SemEval-2007, Senseval-2), where the baseline accuracy of the simplest possible algorithm of always choosing the most frequent sense was 51.4% and 57%, respectively.
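
The most-frequent-sense baseline mentioned in the quote takes only a few lines to compute. The Python sketch below uses a small, hypothetical sense-tagged sample: the words, the sense labels such as `bank%finance`, and the counts are invented for illustration and are not drawn from SemEval-2007 or Senseval-2. It learns each word's majority sense from training pairs and scores that single guess on held-out pairs.

```python
from collections import Counter, defaultdict


def train_mfs(tagged):
    """Map each word to its most frequent sense in a list of (word, sense) pairs."""
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1
    # most_common(1) returns [(sense, count)]; equal counts keep first-seen order
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(mfs, gold):
    """Fraction of gold (word, sense) pairs whose sense equals the word's MFS guess."""
    hits = sum(1 for word, sense in gold if mfs.get(word) == sense)
    return hits / len(gold)


# Hypothetical sense-annotated data, invented for illustration only.
train = [("bank", "bank%finance"), ("bank", "bank%finance"), ("bank", "bank%river"),
         ("bass", "bass%fish"), ("bass", "bass%music"), ("bass", "bass%music")]
test = [("bank", "bank%finance"), ("bank", "bank%river"),
        ("bass", "bass%music"), ("bass", "bass%music")]

mfs = train_mfs(train)
print(mfs)                  # {'bank': 'bank%finance', 'bass': 'bass%music'}
print(accuracy(mfs, test))  # 0.75
```

Because every test occurrence of a word receives the same label, the baseline's accuracy is simply the share of test occurrences whose gold sense matches that word's training majority, here 3 of 4.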




  • http://www.cse.unsw.edu.au/~billw/nlpdict.html#wordsenseambig
    • QUOTE: A kind of ambiguity where what is in doubt is what sense of a word is intended. One classic example is in the sentence "John shot some bucks". Here there are (at least) two readings - one corresponding to interpreting "bucks" as meaning male deer, and "shot" meaning to kill, wound or damage with a projectile weapon (gun or arrow), and the other corresponding to interpreting "shot" as meaning "waste", and "bucks" as meaning dollars. Other readings (such as damaging some dollars) are possible but semantically implausible. Notice that all readings mentioned have the same syntactic structure, as in each case, "shot" is a verb and "bucks" is a noun.
    • See also structural ambiguity and referential ambiguity.
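
To make the dictionary-based approach concrete for this example, here is a minimal Python sketch of a simplified Lesk-style overlap score: each candidate sense of "bucks" is scored by how many content words its gloss shares with the sentence. The two glosses and the stopword list are hypothetical stand-ins written for this illustration, not entries from any real dictionary or from WordNet.

```python
# Hypothetical stopword list and glosses, written for this example only.
STOPWORDS = {"a", "an", "the", "of", "or", "in", "on", "for", "to", "some", "his", "and", "often"}

GLOSSES = {
    "bucks": {
        "male_deer": "the adult male of a deer often hunted in the forest",
        "dollars": "informal term for dollars or money in general",
    }
}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(target, sentence, glosses=GLOSSES):
    """Return the sense whose gloss overlaps most with the sentence, plus all scores."""
    context = content_words(sentence) - {target}
    scores = {sense: len(context & content_words(gloss))
              for sense, gloss in glosses[target].items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best, scores


print(simplified_lesk("bucks", "John shot some bucks in the forest"))
# ('male_deer', {'male_deer': 1, 'dollars': 0})
print(simplified_lesk("bucks", "John shot some bucks of his money on lottery tickets"))
# ('dollars', {'male_deer': 0, 'dollars': 1})
```

On the bare sentence "John shot some bucks" neither gloss overlaps the context, so both scores are 0 and `max` just returns the first sense listed. Without more context the sketch cannot resolve the ambiguity, which is the point of the example.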


  • (LingPipe WSD Tutorial, 2009) ⇒ LingPipe. (2009). “LingPipe: Word Sense Tutorial.” LingPipe Homepage.
    • QUOTE: Word sense disambiguation (WSD) is the task of determining which meaning of a polysemous word is intended in a given context.

      Some words, such as English "run", are highly ambiguous. The American Heritage Dictionary, 4th Edition lists 28 intransitive verb senses, 31 transitive verb senses, 30 nominal senses and 46 adjectival senses. The word "gallop" has a mere 4 nominal senses, and the word "subroutine" only 1 nominal sense.

      Where Do Senses Come From?

      It would be convenient if we could trust dictionaries as the arbiter of word senses. Unfortunately, language presents harder problems than that. Words are fluid, living things that change meanings through metaphor, extension, adaptation, and just plain randomness. Attempting to carve the meaning of a word into a set of discrete categories with well-defined boundaries is doomed to fail for a number of reasons.

      • Words do not have well-defined boundaries between their senses. Dictionary definitions attempt to distinguish a discrete set of meanings with examples and definitions, which are themselves vague. Luckily, humans deal with vagueness in their language quite well, so this is not so much a problem with humans using dictionaries.
      • A related problem with dictionaries is that they don't agree. A quick glance at more than one dictionary (follow the link for "run", for example) will show that disagreement is not only possible, it's the norm. There is often overlap of meanings with subtle distinctions at the boundaries, which, in practice, are actually vague.
      • Another problem with dictionaries is that they are incomplete. Today's newspaper or e-mail is likely to contain words or word senses that are not present in today's dictionary.

      In practice, dictionaries can be useful. They might be good enough for practical purposes even if there are tail-of-the-distribution or boundary cases they don't adequately capture.

      Supervised vs. Unsupervised WSD

      • We will assume for the rest of this tutorial that the words we care about will have finitely many disjoint senses. If we have training data, word sense disambiguation reduces to a classification problem. Additional training data may be supplied in the form of dictionary definitions, ontologies such as Medical Subject Headings (MeSH), or lexical resources like WordNet.
      • If there is no training data, word sense disambiguation is a clustering problem. Hierarchical clusterings may make sense; the dictionaries cited above break meanings of the word "run" down into senses and sub-senses.
      • For this demo, we will be doing supervised word sense disambiguation. That is, we will have training data consisting of examples of words in context and their meanings. We will compare several LingPipe classifiers on this task.
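
Treated as classification, supervised WSD trains one classifier per target word on contexts labelled with senses. The tutorial compares LingPipe's own classifiers; the Python sketch below is not LingPipe code but a plain multinomial naive Bayes classifier over bag-of-words contexts with add-one smoothing, an assumed choice made for illustration. The training contexts for "run" and the sense labels `run/exercise` and `run/manage` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial naive Bayes over bag-of-words contexts for one target word."""

    def fit(self, examples):
        # examples: list of (context string, sense label) pairs
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            tokens = context.lower().split()
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.sense_counts.values())
        return self

    def log_score(self, tokens, sense):
        # log P(sense) + sum of log P(token | sense) with add-one (Laplace) smoothing
        counts = self.word_counts[sense]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.sense_counts[sense] / self.total)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, context):
        tokens = context.lower().split()
        return max(self.sense_counts, key=lambda s: self.log_score(tokens, s))


# Hypothetical training contexts for the target word "run".
train = [
    ("went for a run in the park", "run/exercise"),
    ("run a marathon in the morning", "run/exercise"),
    ("run the company from the office", "run/manage"),
    ("run a business and its office", "run/manage"),
]

clf = NaiveBayesWSD().fit(train)
print(clf.predict("run in the park"))  # run/exercise
print(clf.predict("run the office"))   # run/manage
```

The classifier can only return sense labels seen in training, which matches the tutorial's assumption of finitely many disjoint senses.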



  • (JoshiPPMC, 2006) ⇒ Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Richard Maclin, and Christopher Chute. (2006). “An End-to-end Supervised Target-Word Sense Disambiguation System.” In: Proceedings of AAAI-2006 (Intelligent System Demonstration).
    • QUOTE: Word Sense Disambiguation (WSD) is the task of automatically deciding the sense of an ambiguous word based on its surrounding context. The correct sense is usually chosen from a predefined set of senses, known as the sense inventory. In target-word sense disambiguation the scope is limited to assigning meaning to occurrences of a few predefined target words in the given corpus of text.
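
Target-word disambiguation as described here can be sketched as a loop that skips every token that is not a predefined target and, for each target occurrence, passes a window of surrounding words to a per-word classifier whose answer must come from that word's sense inventory. In the Python sketch below, the inventory for "cold", the cue-word classifier, and the three-token window are all hypothetical placeholders; a supervised system would plug in trained classifiers instead.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical sense inventory: only words listed here are disambiguated.
SENSE_INVENTORY: Dict[str, List[str]] = {
    "cold": ["cold/temperature", "cold/illness"],
}


def make_cue_classifier(cues: Dict[str, Set[str]], default: str) -> Callable[[List[str]], str]:
    """Toy stand-in for a trained classifier: pick the sense with the most cue words in the window."""
    def classify(window: List[str]) -> str:
        scores = {sense: sum(tok in words for tok in window) for sense, words in cues.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else default
    return classify


def disambiguate_targets(tokens: List[str],
                         classifiers: Dict[str, Callable[[List[str]], str]],
                         window: int = 3) -> List[Tuple[int, str, str]]:
    """Assign a sense to each occurrence of a target word; all other tokens are skipped."""
    results = []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word not in classifiers:
            continue
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        sense = classifiers[word]([t.lower() for t in context])
        if sense not in SENSE_INVENTORY[word]:
            raise ValueError(f"{sense!r} is not in the inventory for {word!r}")
        results.append((i, tok, sense))
    return results


classifiers = {
    "cold": make_cue_classifier(
        {"cold/illness": {"caught", "fever", "patient"},
         "cold/temperature": {"room", "weather", "ice"}},
        default="cold/temperature",
    )
}
tokens = "the patient caught a cold and had a fever while the room was cold".split()
print(disambiguate_targets(tokens, classifiers))
# [(4, 'cold', 'cold/illness'), (13, 'cold', 'cold/temperature')]
```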