- (Gale et al., 1992) ⇒ William A. Gale, Kenneth W. Church, and David Yarowsky. (1992). “One Sense per Discourse.” In: Proceedings of the DARPA Speech and Natural Language Workshop.
- (Yarowsky, 1993) ⇒ David Yarowsky. (1993). “One Sense per Collocation.” In: Proceedings of the Workshop on Human Language Technology. doi:10.3115/1075671.1075731
It is well known that there are polysemous words like sentence whose "meaning" or "sense" depends on the context of use. We have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the Canadian Hansards) and the other trained on monolingual material (Roget's Thesaurus and Grolier's Encyclopedia). As this work was nearing completion, we observed a very strong discourse effect. That is, if a polysemous word such as sentence appears two or more times in a well-written discourse, it is extremely likely that all of its occurrences will share the same sense. This paper describes an experiment which confirmed this hypothesis and found that the tendency to share a sense within the same discourse is extremely strong (98%). This result can be used as an additional source of constraint for improving the performance of the word-sense disambiguation algorithm. In addition, it could also be used to help evaluate disambiguation algorithms that did not make use of the discourse constraint.
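The one-sense-per-discourse constraint can be sketched as a simple post-processing step over a tagger's initial per-occurrence labels. This is a minimal illustration, not the paper's exact procedure; the function name and the majority-vote relabeling policy are assumptions:

```python
from collections import Counter

def apply_discourse_constraint(token_senses):
    """Given initial sense labels for each occurrence of one polysemous
    word within a single discourse, relabel every occurrence with the
    majority sense, enforcing one sense per discourse."""
    if not token_senses:
        return []
    majority, _ = Counter(token_senses).most_common(1)[0]
    return [majority] * len(token_senses)
```

For example, if a tagger labels three occurrences of "sentence" as legal, legal, grammar, the constraint would relabel all three as legal.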
2.2. Bayesian Discrimination
Surprisingly good results can be achieved using Bayesian discrimination methods, which have been used very successfully in many other applications, especially author identification (Mosteller and Wallace, 1964) and information retrieval (IR) (Salton, 1989, section 10.3). Our word-sense disambiguation algorithm uses the words in a 100-word context surrounding the polysemous word much as the other two applications use the words in a test document.
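A Bayesian sense discriminator of this kind can be sketched as a naive Bayes classifier over the context words: pick the sense maximizing log P(sense) plus the sum of log P(word | sense). This is a minimal sketch under common assumptions (add-one smoothing, word independence), not the authors' exact estimation procedure:

```python
import math
from collections import Counter

def train_sense_model(labeled_contexts):
    """Estimate sense priors and per-sense word log-probabilities from
    (sense, context_words) training pairs, with add-one smoothing."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in labeled_contexts:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    total = sum(sense_counts.values())
    priors = {s: math.log(c / total) for s, c in sense_counts.items()}
    V = len(vocab)
    logprobs = {
        sense: {w: math.log((wc[w] + 1) / (sum(wc.values()) + V)) for w in vocab}
        for sense, wc in word_counts.items()
    }
    return priors, logprobs, vocab

def disambiguate(context_words, priors, logprobs, vocab):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense),
    ignoring context words not seen in training."""
    best, best_score = None, float("-inf")
    for sense in priors:
        score = priors[sense]
        score += sum(logprobs[sense][w] for w in context_words if w in vocab)
        if score > best_score:
            best, best_score = sense, score
    return best
```

In use, the classifier is trained on labeled 100-word contexts and then scores the context surrounding each new occurrence of the polysemous word.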
It is common to use very small contexts (e.g., 5 words) based on the observation that people do not need very much context in order to perform the disambiguation task. In contrast, we use much larger contexts (e.g., 100 words). Although people may be able to make do with much less context, we believe the machine needs all the help it can get, and we have found that the larger context makes the task much easier. In fact, we have been able to measure information at extremely large distances (10,000 words away from the polysemous word in question), though obviously most of the useful information appears relatively near the polysemous word (e.g., within the first 100 words or so). Needless to say, our 100-word contexts are considerably larger than the 5-word windows that one normally finds in the literature.
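Extracting such a context window from a tokenized discourse is straightforward; a minimal sketch (the symmetric-split policy and the exclusion of the target word itself are assumptions for illustration):

```python
def context_window(tokens, index, width=100):
    """Return up to `width` tokens surrounding tokens[index], excluding
    the target word itself: width // 2 tokens on each side, truncated
    at the discourse boundaries when fewer are available."""
    half = width // 2
    left = tokens[max(0, index - half):index]
    right = tokens[index + 1:index + 1 + half]
    return left + right
```

With `width=5` this approximates the small windows common in the literature, while `width=100` gives the larger contexts used here.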