1992 OneSensePerDiscourse


Subject Headings: One Sense per Discourse, Word Sense, Word Sense Disambiguation, Polysemous Word, One Sense Per Discourse Heuristic.


It is well known that there are polysemous words like "sentence" whose "meaning" or "sense" depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). As this work was nearing completion, we observed a very strong discourse effect. That is, if a polysemous word such as "sentence" appears two or more times in a well-written discourse, it is extremely likely that all of its occurrences will share the same sense. This paper describes an experiment which confirmed this hypothesis and found that the tendency to share sense in the same discourse is extremely strong (98%). This result can be used as an additional source of constraint for improving the performance of the word-sense disambiguation algorithm. In addition, it could also be used to help evaluate disambiguation algorithms that did not make use of the discourse constraint.
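One way the discourse constraint might be applied is as a post-processing step: take the senses a classifier assigned to each occurrence of a word within one discourse and relabel every occurrence with the most frequent sense. The sketch below is only an illustration under that assumption, not the procedure used in the paper; the function name `smooth_senses` and the sense labels are hypothetical.

```python
from collections import Counter


def smooth_senses(predicted_senses):
    """Relabel all occurrences of one word in one discourse with the
    majority sense (an assumed way to apply the discourse constraint).

    predicted_senses: list of sense labels, one per occurrence.
    """
    if not predicted_senses:
        return []
    # most_common(1) returns the single most frequent label and its count.
    majority_sense, _ = Counter(predicted_senses).most_common(1)[0]
    return [majority_sense] * len(predicted_senses)


# Hypothetical per-occurrence predictions for "sentence" in one document.
labels = ["judicial", "judicial", "grammatical", "judicial"]
smoothed = smooth_senses(labels)
```

With a same-sense tendency as strong as the one reported, a minority label inside a discourse is more likely to be a classifier error than a genuine change of sense, which is the motivation for overriding it.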

2.2. Bayesian Discrimination

Surprisingly good results can be achieved using Bayesian discrimination methods which have been used very successfully in many other applications, especially author identification (Mosteller and Wallace, 1964) and information retrieval (IR) (Salton, 1989, section 10.3). Our word-sense disambiguation algorithm uses the words in a 100-word context surrounding the polysemous word very much like the other two applications use the words in a test document.
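The general idea of Bayesian discrimination over context words can be sketched as a small naive Bayes classifier: each sense is scored by its prior times the product of the probabilities of the context words given that sense. This is a minimal sketch, not the authors' implementation. The add-one smoothing, the function names `train` and `classify`, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (context_words, sense) training pairs."""
    word_counts = defaultdict(Counter)  # sense -> word -> count
    sense_counts = Counter()            # sense -> number of examples
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return word_counts, sense_counts, vocab


def classify(context_words, model):
    """Return the sense with the highest log posterior score."""
    word_counts, sense_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        # Add-one smoothing (an assumption); the extra +1 reserves
        # probability mass for words never seen in training.
        denom = sum(word_counts[sense].values()) + len(vocab) + 1
        for w in context_words:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy contexts for the word "sentence".
toy_examples = [
    (["judge", "prison", "years"], "judicial"),
    (["court", "judge", "served"], "judicial"),
    (["noun", "verb", "grammar"], "grammatical"),
    (["clause", "verb", "word"], "grammatical"),
]
model = train(toy_examples)
```

Working in log space avoids numerical underflow when the product runs over a context of about 100 words.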

It is common to use very small contexts (e.g., 5 words) based on the observation that people do not need very much context in order to perform the disambiguation task. In contrast, we use much larger contexts (e.g., 100 words). Although people may be able to make do with much less context, we believe the machine needs all the help it can get, and we have found that the larger context makes the task much easier. In fact, we have been able to measure information at extremely large distances (10,000 words away from the polysemous word in question), though obviously most of the useful information appears relatively near the polysemous word (e.g., within the first 100 words or so). Needless to say, our 100-word contexts are considerably larger than the 5-word windows that one normally finds in the literature.
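Extracting such a context is straightforward once the text is tokenized. The sketch below collects the words around a target position. It assumes the 100 words are split evenly between the left and right sides, which the text does not specify, and the function name `context_window` is hypothetical.

```python
def context_window(tokens, index, width=100):
    """Return up to `width` words surrounding tokens[index].

    Assumption: the window is split evenly, width // 2 words on each
    side, and the target word itself is excluded.
    """
    half = width // 2
    left = tokens[max(0, index - half):index]
    right = tokens[index + 1:index + 1 + half]
    return left + right


# Hypothetical token stream; a real one would come from a tokenizer.
tokens = [str(i) for i in range(200)]
wide = context_window(tokens, 100)              # 100-word context
narrow = context_window(tokens, 100, width=5)   # small window for comparison
```

Near the start or end of a document the window is simply truncated, so it may contain fewer than `width` words.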


Author: William A. Gale, Kenneth W. Church, David Yarowsky
Title: One Sense per Discourse
Journal: Proceedings of the DARPA Speech and Natural Language Workshop
URL: http://www.coli.uni-saarland.de/~schulte/Teaching/ESSLLI-06/Referenzen/Senses/gale-et-al-1992.pdf
Year: 1992