A compound word is a content lexical item composed of two or more lexical items (a complex word).




  • (Crystal, 2008) ⇒ David Crystal. (2008). “A Dictionary of Linguistics and Phonetics, 6th edition." Blackwell Publishing.
    • compound (n.) A term used widely in DESCRIPTIVE LINGUISTIC studies to refer to a linguistic UNIT which is composed of ELEMENTS that function independently in other circumstances. Of particular currency are the notions of compound found in 'compound WORDS' (consisting of two or more free MORPHEMES, as in such 'compound NOUNS' as bedroom, rainfall, and washing machine) and 'compound SENTENCES' (consisting of two or more main CLAUSES); but other applications of the term exist, as in 'compound VERBS' (e.g. come in), 'compound TENSES' (those consisting of an AUXILIARY + LEXICAL verb), 'compound SUBJECTS/OBJECTS', etc. (where the clause elements consist of more than one noun PHRASE or PRONOUN, as in the boys and the girls shouted) and 'compound PREPOSITIONS' (e.g. in accordance with). See also BAHUVRIHI, DVANDVA.


  • (Nesselhauf, 2005) ⇒ Nadja Nesselhauf. (2005). “Collocations in a Learner Corpus." John Benjamins Publishing.
  • (Villavicencio, 2005) ⇒ Aline Villavicencio, Francis Bond, Anna Korhonen, and Diana McCarthy. (2005). “Introduction to the Special Issue on Multiword Expressions: Having a crack at a hard nut.” In: Special Issue on Multiword Expressions, Computer Speech & Language, 19(4). doi:10.1016/j.csl.2005.05.001
    • The term “Multiword Expression” has been defined slightly differently by different researchers. (footnote 1: Other terms used to refer to MWEs include “multiwords”, “multiword units” (Dias et al., 2004) and “fixed expressions and idioms” (Moon, 1998).) Calzolari et al. (2002) gives a general definition as “a sequence of words that acts as a single unit at some level of linguistic analysis”, which in addition must exhibit (some of) the following characteristics to a smaller or greater extent:
      • (1) reduced syntactic and semantic transparency;
      • (2) reduced or lack of compositionality;
      • (3) more or less frozen or fixed status;
      • (4) possible violation of some otherwise general syntactic patterns or rules;
      • (5) a high degree of lexicalisation (depending on pragmatic factors);
      • (6) a high degree of conventionality.





  • (Manning and Schütze, 1999) ⇒ Christopher D. Manning and Hinrich Schütze. (1999). “Foundations of Statistical Natural Language Processing." The MIT Press.
    • The major types of morphological processes are inflection, derivation, and compounding.
    • Compounding refers to the merging of two or more words into a new word. English has many noun-noun compounds, nouns that are combinations of two other nouns. Examples are tea kettle, disk drive, or college degree. While these are (usually) written as separate words, they are pronounced as a single word, and denote a single semantic concept, which one would normally wish to list in the lexicon. There are also other compounds that involve parts of speech such as adjectives, verbs, and prepositions, such as downmarket, (to) overtake, and mad cow disease.
    • While maintaining most word spaces, in German compound nouns are written as single words, for example Lebensversicherungsgesellschaftsangestellter 'life insurance company employee.' In many ways this makes linguistic sense, as compounds are a single word, at least phonologically. But for processing purposes one may wish to divide such a compound, or at least to be aware of the internal structure of the word, and this becomes a limited word segmentation task. While not the rule, joining of compounds sometimes also happens in English, especially when they are common and have a specialized meaning. We noted above that one finds both data base and database. As another example, while hard disk is more common, one sometimes finds harddisk in the computer press.
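
The "limited word segmentation task" mentioned above can be illustrated with a small dictionary-based splitter. This is only a sketch under stated assumptions, not a method taken from Manning and Schütze: the function name `split_compound`, the toy lexicon, and the treatment of "s" as the only linking element are all hypothetical choices made for the example. Real German compounds use further linking elements and need a much larger lexicon.

```python
from functools import lru_cache


def split_compound(word, lexicon, linkers=("", "s")):
    """Split `word` into parts found in `lexicon`, or return None.

    `linkers` lists the strings allowed between two parts ("" means the
    parts are simply concatenated). Both names are illustrative
    assumptions, not part of any library.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the word means everything was segmented.
        if start == len(word):
            return ()
        # Prefer longer parts first, backtracking when the rest fails.
        for end in range(len(word), start, -1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in linkers:
                nxt = end + len(linker)
                # A linking element must be followed by another part.
                if linker and nxt >= len(word):
                    continue
                if word.startswith(linker, end):
                    rest = solve(nxt)
                    if rest is not None:
                        return (part,) + rest
        return None

    return solve(0)


# Toy lexicon (assumed for illustration only).
german = {"leben", "versicherung", "gesellschaft", "angestellter"}
print(split_compound("Lebensversicherungsgesellschaftsangestellter", german))
# → ('leben', 'versicherung', 'gesellschaft', 'angestellter')

english = {"hard", "disk", "data", "base"}
print(split_compound("harddisk", english))  # → ('hard', 'disk')
print(split_compound("database", english))  # → ('data', 'base')
```

The splitter returns the first segmentation it finds, trying longer parts first. It makes no attempt to rank alternative analyses, which a practical system would usually need to do.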


  • (Moon, 1998) ⇒ Rosamund Moon. (1998). “Fixed Expressions and Idioms in English: A Corpus-based Approach." Oxford University Press.
    • Anomalous collocations: lexicogrammatically marked
      • (syntactically) ill-formed collocations: (at all, by and large)
      • cranberry collocations: idiosyncratic lexical component -- one or more words found only in that collocation (in retrospect, kith and kin)
      • defective collocations: idiosyncratic meaning component (in effect, foot the bill)
      • phraseological collocations: semi-productive constructions, occurring in paradigms (in/into/out of action, on show/display)
    • Formulae: pragmatically marked
      • simple formulae/sayings: compositional strings with a special discourse function (alive and well, a horse, a horse, my kingdom for a horse)
      • metaphorical/literal proverbs: (you can't have your cake and eat it, enough is enough)
      • similes (as good as gold)
    • Metaphors: semantically marked (non-compositional)
      • transparent metaphors: (behind someone's back, pack one's bags)
      • semi-transparent metaphors: (on an even keel, pecking order)
      • opaque metaphors: (bite the bullet, kick the bucket)
    • Collocations: compositional word co-occurrence of markedly high frequency
      • semantic collocations: co-occurrence preferences/priming effects (jam with FOOD)
      • lexico-semantic collocations: collocation paradigms (rancid butter/fat, face the truth/facts/problem)
      • syntactic collocations: fully-productive phraseological collocations (too … to ...)


  • (Kjellmer, 1987) ⇒ G. Kjellmer. (1987). “Aspects of English Collocations.” In: Proceedings of the Seventh International Conference on English Language Research on Computerised Corpora.
    • “A sequence of words that occurs more than once in identical form and is grammatically well-structured"
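
The first half of Kjellmer's criterion (a sequence that "occurs more than once in identical form") can be approximated by counting word n-grams. The sketch below is an assumption-laden illustration rather than Kjellmer's procedure: the function name `recurrent_sequences` and the example sentence are invented here, and the second half of the criterion (grammatical well-structuredness) is not checked at all.

```python
from collections import Counter


def recurrent_sequences(tokens, n=2):
    """Return the n-grams that occur more than once, with their counts.

    Only the recurrence requirement is tested; whether a sequence is
    grammatically well-structured would need a parser or tagger.
    """
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    counts = Counter(grams)
    return {gram: count for gram, count in counts.items() if count > 1}


tokens = "the hard disk failed and the hard disk was replaced".split()
print(recurrent_sequences(tokens, n=2))
# → {('the', 'hard'): 2, ('hard', 'disk'): 2}
```

Note that ('the', 'hard') passes the recurrence test but is not a well-formed unit, which shows why the grammatical filter in the definition matters.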