Subject Headings: Compound Noun, Noun Compound Bracketing, Noun Compound Parsing.
- It suggests that Compound Noun analysis involves three steps:
- The senses of the constituent words have to be identified.
- The structure, i.e. syntactic bracketing, of the group has to be determined.
- The semantic relations linking the words have to be established.
- It suggests that Compound Nouns can be divided into: established, non-established, or novel (as Warren 1978 does).
- It uses the sample sentence “We need to do the decorations and flowers and things now: I'll do the [verandah table sprays].”
- For the designer of an automatic language-processing program, non-lexicalised compound nouns are a problem. Even in specialised domains, where de facto lexicalisation may be fairly rampant, compound nouns are a problem; and they are much more so in ordinary discourse.
- The first part of this paper briefly reviews the properties of compound nouns; the second considers what these imply for automatic language processing in general; and the third discusses the particular issues which arise in handling compound nouns in automatic speech processing. The object of the paper is to examine an important problem: it does not pretend to solve it; but while the problems of compound noun interpretation are largely bypassed in today's domain-specific language-processing programs, they will have to be tackled if more powerful and comprehensive programs are to be built.
1. Properties of compound nouns
- It is well known that, in English at least, compound nouns can be freely constructed, generating units of, in some cases, surprising length. Certainly pairs of nouns are very common (e.g. "basket lid"), triples occur frequently ("staff tearoom pinboard"), and in 'technical' contexts especially even longer compounds are not unusual ("satellite radio link transmitter", "horse race apprentice training establishment"). When proper names figure, compounds may reach the amazing length of the Gleitmans' "Volume Feeding Management Success Formula Award" (Gleitman and Gleitman 1970; "Volume Feeding Management" is a name).
- It is not in fact possible to maintain a principled distinction between lexicalised and non-lexicalised compounds, even within specialised universes of discourse. Some compounds are clearly lexicalised, as may be shown in their becoming single words ("tearoom"), developing meaning extensions having no reference to their underlying structure, etc. However, even those compounds canonised by entries in lexicons differ in the extent to which they are established, and are properly regarded only as representing one end of a spectrum from the firmly established to the totally novel. It may be convenient for the purposes of linguistic discussion to group compounds, and at the same time more satisfactory to label them as established, non-established, or novel (as Warren 1978 does) rather than as simply lexicalised or non-lexicalised. However, from both the formal and the programming points of view, a compound is either supplied with an explicit characterisation, as a unit, in a lexicon, or it is not. The problem of compound nouns is that, as compounding is a highly productive process, any individual compound may not figure in any particular lexicon, so its meaning has to be constructed by the reader/hearer. From this point of view, a compound recalled as familiar by a human being, though it does not figure in an official lexicon, must be treated as lexicalised; and equally, a second occurrence of a novel compound in a text may be treated as lexicalised with respect to a lexicon generated by that text. For the purposes of this paper, therefore, I shall simply divide compounds into the lexicalised and the non-lexicalised, in order to focus on the interpretation problems presented in the latter. (Of course a non-lexicalised compound may have a lexicalised constituent: but this will then be assumed to function like a single word.)
- Interpreting compound nouns, i.e. providing a meaning representation for them, has three elements. The senses of the constituent words have to be identified; the structure, i.e. syntactic bracketing, of the group has to be determined; and the semantic relations linking the words have to be established. For example, given the noun string "verandah table sprays" in the particular context, say:
- We need to do the decorations and flowers and things now: I'll do the verandah table sprays.
- it is necessary to identify the appropriate sense of each word, for example 'supportive piece of furniture' rather than 'list of numbers' as the meaning of "table"; to determine that ((verandah table) sprays) rather than (verandah (table sprays)) is the syntactic bracketing of the group, i.e. that "verandah" modifies "table" and "verandah table" modifies "sprays", rather than "table" modifying "sprays" and "verandah" modifying "table sprays"; and to establish the underlying relationships between the words as 'sprays FOR tables IN verandah'. However, as this example, pushed a little further, shows, the general problem of compound nouns is that, like single words, they are typically ambiguous in isolation, in constituent senses and/or bracketing and/or linking relations, and that quite extensive contextual information may be required to disambiguate them.
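The bracketing step described above can be made concrete with a small sketch. The Python below is an illustration only, not the paper's method: it enumerates every binary bracketing of a noun string and records one candidate interpretation of "verandah table sprays" using the sense gloss and linking relations quoted above. The names `bracketings`, `show`, and `Interpretation` are hypothetical, and choosing among the candidates (the hard part, which needs context) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Tuple, Union

# A bracketing is either a single noun or a pair of sub-bracketings.
Tree = Union[str, Tuple["Tree", "Tree"]]


def bracketings(words: Sequence[str]) -> Iterator[Tree]:
    """Yield every binary bracketing of a non-empty sequence of nouns."""
    if len(words) == 1:
        yield words[0]
        return
    # Try every split point, then bracket each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield (left, right)


def show(tree: Tree) -> str:
    """Render a bracketing in the parenthesised notation used in the text."""
    if isinstance(tree, str):
        return tree
    return "(" + show(tree[0]) + " " + show(tree[1]) + ")"


@dataclass
class Interpretation:
    """Hypothetical container for the three elements of an interpretation."""
    senses: Dict[str, str]                 # word -> chosen sense gloss
    bracketing: Tree                       # syntactic structure
    relations: List[Tuple[str, str, str]]  # (head, RELATION, modifier)


nouns = ["verandah", "table", "sprays"]
candidates = [show(t) for t in bracketings(nouns)]
# candidates == ["(verandah (table sprays))", "((verandah table) sprays)"]

# The reading the text selects for the sample sentence.
reading = Interpretation(
    senses={"table": "supportive piece of furniture"},
    bracketing=(("verandah", "table"), "sprays"),
    relations=[("sprays", "FOR", "table"), ("table", "IN", "verandah")],
)
```

A three-noun string has only two bracketings, but the count grows quickly with length (five for a four-noun string such as "satellite radio link transmitter"), which is one reason longer compounds are harder to disambiguate without context.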
- Beatrice Warren. (1978) Semantic Patterns of Noun-Noun Compounds. Gothenburg Studies in English 41.