2009 LingPipeWSD

From GM-RKB

Subject Headings: Word Sense Classification Task, Word Sense Discrimination Task.

Notes

Quotes

What is Word Sense Disambiguation?

  • Word sense disambiguation (WSD) is the task of determining which meaning of a polysemous word is intended in a given context.
  • Some words, such as English "run", are highly ambiguous. The American Heritage Dictionary, 4th Edition lists 28 intransitive verb senses, 31 transitive verb senses, 30 nominal senses and 46 adjectival senses. The word "gallop" has a mere 4 nominal senses, and the word "subroutine" only 1 nominal sense.
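The sense counts quoted above give a rough feel for how hard the task is. The sketch below stores only the counts mentioned in the quote and computes the accuracy of guessing a sense uniformly at random. The dictionary layout and the function names `total_senses` and `chance_accuracy` are illustrative assumptions, not part of LingPipe or of the tutorial.

```python
# Sense counts exactly as quoted above (American Heritage Dictionary, 4th Edition).
# Only the counts mentioned in the quote are recorded; the layout is hypothetical.
SENSE_COUNTS = {
    "run": {
        "intransitive verb": 28,
        "transitive verb": 31,
        "noun": 30,
        "adjective": 46,
    },
    "gallop": {"noun": 4},
    "subroutine": {"noun": 1},
}


def total_senses(word, pos=None):
    """Number of quoted senses for a word, optionally for one part of speech."""
    counts = SENSE_COUNTS[word]
    if pos is not None:
        return counts.get(pos, 0)
    return sum(counts.values())


def chance_accuracy(word, pos=None):
    """Expected accuracy of picking one of the quoted senses uniformly at random."""
    n = total_senses(word, pos)
    if n == 0:
        raise ValueError("no senses recorded for this word and part of speech")
    return 1.0 / n


if __name__ == "__main__":
    for w in SENSE_COUNTS:
        print(w, total_senses(w), round(chance_accuracy(w), 4))
```

Under this uniform-guess assumption, "subroutine" is trivially disambiguated, while "run" leaves a random guesser with well under a 1% chance of picking the intended sense. Real baselines usually pick the most frequent sense instead, which needs frequency data the quote does not provide.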

Where Do Senses Come From?

  • It would be convenient if we could trust dictionaries as the arbiter of word senses. Unfortunately, language presents harder problems than that. Words are fluid, living things that change meanings through metaphor, extension, adaptation, and just plain randomness. Attempting to carve the meaning of a word into a set of discrete categories with well-defined boundaries is doomed to fail for a number of reasons.
    • Words do not have well-defined boundaries between their senses. Dictionary definitions attempt to distinguish a discrete set of meanings with examples and definitions, which are themselves vague. Luckily, humans deal with vagueness in their language quite well, so this is not so much a problem with humans using dictionaries.
    • A related problem with dictionaries is that they don't agree. A quick glance at more than one dictionary (compare the entries for "run", for example) will show that disagreement is not only possible, it's the norm. There is often overlap of meanings with subtle distinctions at the boundaries, which, in practice, are actually vague.
    • Another problem with dictionaries is that they are incomplete. Today's newspaper or e-mail is likely to contain words or word senses that are not present in today's dictionary.
  • In practice, dictionaries can be useful. They might be good enough for practical purposes even if there are tail-of-the-distribution or boundary cases they don't adequately capture.

Supervised vs. Unsupervised WSD

  • We will assume for the rest of this tutorial that the words we care about will have finitely many disjoint senses. If we have training data, word sense disambiguation reduces to a classification problem. Additional training data may be supplied in the form of dictionary definitions, ontologies such as Medical Subject Headings (MeSH), or lexical resources like WordNet.
  • If there is no training data, word sense disambiguation is a clustering problem. Hierarchical clusterings may make sense; the dictionaries cited above break meanings of the word "run" down into senses and sub-senses.
  • For this demo, we will be doing supervised word sense disambiguation. That is, we will have training data consisting of examples of words in context and their meanings. We will compare several LingPipe classifiers on this task.
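To make the "WSD as classification" framing concrete, here is a minimal sketch of a supervised sense classifier. It assumes a bag-of-words multinomial naive Bayes model with add-one smoothing, which is one common choice and not necessarily one of the classifiers the tutorial compares. The class `NaiveBayesWSD`, the sense labels "jog" and "execute", and the toy training sentences are all hypothetical. It uses plain Python rather than the LingPipe API.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial naive Bayes over bag-of-words contexts, add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()            # sense -> number of training examples
        self.word_counts = defaultdict(Counter)  # sense -> context word -> count
        self.vocab = set()                       # all context words seen in training

    @staticmethod
    def tokenize(text):
        # Deliberately naive: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, context, sense):
        self.sense_counts[sense] += 1
        for tok in self.tokenize(context):
            self.word_counts[sense][tok] += 1
            self.vocab.add(tok)

    def log_scores(self, context):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + vocab_size
            score = math.log(n / total)  # log prior of the sense
            for tok in self.tokenize(context):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[sense][tok] + 1) / denom)
            scores[sense] = score
        return scores

    def classify(self, context):
        scores = self.log_scores(context)
        return max(scores, key=scores.get)


# Toy, hand-made training data (hypothetical; not from the LingPipe tutorial).
TRAIN = [
    ("she went for a run in the park before breakfast", "jog"),
    ("a long run along the river trail", "jog"),
    ("he finished the morning run in record time", "jog"),
    ("the program failed on the first run of the test suite", "execute"),
    ("each run of the script writes a log file", "execute"),
    ("a batch run of the program on the server", "execute"),
]

clf = NaiveBayesWSD()
for context, sense in TRAIN:
    clf.train(context, sense)

if __name__ == "__main__":
    print(clf.classify("a run of the test script on the server"))
    print(clf.classify("a run in the park along the river"))
```

Each training item pairs a context with a sense label, matching the "examples of words in context and their meanings" described above. With no labels available, the same context vectors would instead be grouped by a clustering algorithm, and the resulting clusters would stand in for senses.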

  • Key: 2009 LingPipeWSD
  • Author: LingPipe NLP Toolkit
  • Title: LingPipe: Word Sense Tutorial
  • URL: http://alias-i.com/lingpipe/demos/tutorial/wordSense/read-me.html