1993 OneSensePerCollocation

Subject Headings: Collocation, Word Sense Disambiguation Task, One Sense Per Collocation Heuristic.


  • Suggests that in a Collocation each component Word has a single Word Sense.
    • For example "foreign aid" is more likely than "foreign aide", while "presidential aide" is more likely than "presidential aid".

Previous work (Gale, Church and Yarowsky, 1992) showed that with high probability a polysemous word has one sense per discourse. In this paper we show that for certain definitions of collocation, a polysemous word exhibits essentially only one sense per collocation. We test this empirical hypothesis for several definitions of sense and collocation, and discover that it holds with 90-99% accuracy for binary ambiguities. We utilize this property in a disambiguation algorithm that achieves precision of 92% using combined models of very local context.


The traditional definition of word sense is "One of several meanings assigned to the same orthographic string". As meanings can always be partitioned into multiple refinements, senses are typically organized in a tree, such as one finds in a dictionary. In the extreme case, one could continue making refinements until a word has a slightly different sense every time it is used; if so, the title of this paper is a tautology. However, the studies in this paper focus on the sense distinctions at the top of the tree. A good working definition of the distinctions considered is: those meanings which are not typically translated to the same word in a foreign language.

Therefore, one natural type of sense distinction to consider is those words in English which indeed have multiple translations in a language such as French.


Collocation means the co-occurrence of two words in some defined relationship. We look at several such relationships, including direct adjacency and first word to the left or right having a certain part-of-speech. We also consider certain direct syntactic relationships, such as verb/object, subject/verb, and adjective/noun pairs. It appears that content words (nouns, verbs, adjectives, and adverbs) behave quite differently from function words (other parts of speech); we make use of this distinction in several definitions of collocation.

We will attempt to quantify the validity of the one-sense-per collocation hypothesis for these different collocation types.
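The collocational distributions behind this quantification can be collected straightforwardly. The sketch below (hypothetical toy data, not the paper's corpus) tallies, for each word immediately to the left of the ambiguous target, how often each sense occurs — the raw counts from which tables like Table 2 below are built.

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged examples: (tokens, index of target word, sense).
# The collocation type here is "word immediately to the left".
examples = [
    (["foreign", "aid", "arrived"], 1, "aid"),
    (["foreign", "aid", "stalled"], 1, "aid"),
    (["presidential", "aide", "resigned"], 1, "aide"),
]

# counts[collocate][sense] -> frequency in the training data
counts = defaultdict(Counter)
for tokens, i, sense in examples:
    if i > 0:  # the word-to-the-left collocation exists for this context
        counts[tokens[i - 1]][sense] += 1

print(dict(counts["foreign"]))  # -> {'aid': 2}
```

The same loop works for any of the other collocation types (word to the right, verb/object, etc.); only the line that picks out the collocate changes.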

4.2 Measuring Entropies

Table 2: A typical collocational distribution for the homophone ambiguity aid/aide.
| Collocation | Aid | Aide |
|---|---|---|
| foreign | 718 | 1 |
| federal | 297 | 0 |
| western | 146 | 0 |
| provide | 88 | 0 |
| covert | 26 | 0 |
| oppose | 13 | 0 |
| future | 9 | 0 |
| similar | 6 | 0 |
| presidential | 0 | 63 |
| chief | 0 | 40 |
| longtime | 0 | 26 |
| AIDS-infected | 0 | 2 |
| sleepy | 0 | 1 |
| disaffected | 0 | 1 |
| indispensable | 2 | 1 |
| practical | 2 | 0 |
| squander | 1 | 0 |
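The skew in such a distribution can be summarized by its entropy: a collocation that selects one sense almost deterministically has entropy near zero bits. A minimal sketch, using a few counts from Table 2:

```python
import math

# A few (aid, aide) counts from Table 2 above.
table = {
    "foreign": (718, 1),
    "presidential": (0, 63),
    "indispensable": (2, 1),
}

def sense_entropy(aid, aide):
    """Entropy in bits of the aid/aide distribution for one collocation."""
    total = aid + aide
    h = 0.0
    for n in (aid, aide):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

for word, (a, b) in table.items():
    print(f"{word}: H = {sense_entropy(a, b):.3f} bits")
```

"foreign" (718 vs. 1) yields roughly 0.015 bits, close to the deterministic 0.0 of "presidential", whereas an even 50/50 split would give the maximum of 1.0 bit.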


The sense disambiguation algorithm used is quite straightforward. When it is based on a single collocation type, such as the object of the verb or the word immediately to the left, the procedure is very simple. One checks whether this collocation type exists in the novel context and whether the specific words found are listed in the table of probability distributions (as computed above). If so, we return the sense which was most frequent for that collocation in the training data; if not, we return the sense which is most frequent overall.


This paper has examined some of the basic distributional properties of lexical ambiguity in the English language. Our experiments have shown that for several definitions of sense and collocation, an ambiguous word has only one sense in a given collocation with a probability of 90-99%. We showed how this claim is influenced by part-of-speech, distance, and sample frequency. We discussed the implications of these results for data set creation and algorithm design, identifying potential weaknesses in the common "bag of words" approach to disambiguation. Finally, we showed that models of local collocation can be combined in a disambiguation algorithm that achieves overall precision of 92%.



  • (Yarowsky, 1993) ⇒ David Yarowsky. (1993). "One Sense per Collocation." In: Proceedings of the Workshop on Human Language Technology. http://acl.ldc.upenn.edu/H/H93/H93-1052.pdf doi:10.3115/1075671.1075731