1999 ThreePrincipledMethodsofAutomat


Subject Headings: Morpheme.





For the computer, word forms in an online text are simply letter sequences between blanks. A rule-based automatic language analysis presupposes, however, that the computer can recognize the individual word forms. This includes assigning the base form (lemmatization) and determining the morphosyntactic properties (categorization). It is shown that there are three principled methods of automatic word form recognition, based on word forms, morphemes, and allomorphs, respectively. After these methods are described using the notions of traditional morphology, they are compared with regard to their handling of neologisms, time efficiency, and space requirements.

1 Morphemes and allomorphs

In traditional morphology, word forms are analyzed by disassembling them into their elementary parts. These are called morphemes and defined as the smallest meaningful units of language. In contrast to the number of potential words, the number of morphemes is finite.

The notion of a morpheme is a linguistic abstraction which is manifested concretely in the form of finitely many allomorphs. The word allomorph is of Greek origin and means “alternative shape.” For example, the morpheme wolf is realized as the two allomorphs wolf and wolv.

Just as the elementary parts of the syntax are really the word forms (and not the words), the elementary parts of morphology are really the allomorphs. A morpheme may be defined as naming the set of associated allomorphs.


morpheme =def {associated analyzed allomorphs}

Allomorphs are formally analyzed here as ordered triples, consisting of the surface, the category and the semantic representation. The following examples, based on the English noun wolf, are intended to demonstrate these basic concepts of morphology as simply as possible.


2 Irregular words

The number and variation of allomorphs of a given morpheme determine the degree of regularity of the morpheme and – in the case of a free morpheme – the associated word. An example of a regular word is the verb to learn, the morpheme of which is defined as a set containing only one allomorph.


morpheme allomorphs
learn =def {[learn (N ... V) learn]}

An irregular word, on the other hand, is the verb to swim, the morpheme of which has four allomorphs, namely swim, swimm,[1] swam, and swum. The change of the stem vowel may be found also in other verbs, e.g., sing, sang, sung, and is called ablaut.


morpheme allomorphs
swim =def {[swim (N ... V1) swim],
[swimm (... B) swim],
[swam (N ... V2) swim],
[swum (N ... V) swim]}

In 2.2, the allomorph of the base form is used as the name of the morpheme. Thus, we may say that swam is an allomorph of the morpheme swim.
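The definitions above can be written down as a small data structure. The following Python sketch is illustrative only: the names `Allomorph`, `MORPHEMES`, and `is_regular` are assumptions rather than part of the paper, the category segments are copied as opaque strings, and treating "exactly one allomorph" as the test for regularity simplifies the paper's notion of a degree of regularity.

```python
from typing import NamedTuple


class Allomorph(NamedTuple):
    """An analyzed allomorph as an ordered triple."""
    surface: str    # letter sequence as it occurs in a word form
    category: str   # category segment, kept as an uninterpreted string
    semantics: str  # semantic representation, here the morpheme name


# Morpheme name -> set of associated analyzed allomorphs.
# The entries follow the examples for learn and swim.
MORPHEMES = {
    "learn": {Allomorph("learn", "(N ... V)", "learn")},
    "swim": {
        Allomorph("swim", "(N ... V1)", "swim"),
        Allomorph("swimm", "(... B)", "swim"),
        Allomorph("swam", "(N ... V2)", "swim"),
        Allomorph("swum", "(N ... V)", "swim"),
    },
}


def is_regular(morpheme: str) -> bool:
    """Simplifying assumption: regular means exactly one allomorph."""
    return len(MORPHEMES[morpheme]) == 1
```

Under this assumption, `is_regular("learn")` holds and `is_regular("swim")` does not.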

Cases where there is no similarity at all between the allomorphs of a given morpheme are called suppletion.


morpheme allomorphs
good =def {[good (ADV IR) good],
[bett (CAD IR) good],
[b (SAD IR) good]}

While the regular comparation in, e.g.,

fast, fast/er, fast/est

uses only one allomorph for the stem, the irregular comparation in, e.g.,

good, bett/er, b/est

uses several allomorphs.[2] Even in a suppletive form like bett, the associated morpheme good is readily available as the third element of the ordered triple analysis.
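The point that the morpheme is available as the third element of the triple can be illustrated with a plain lookup table. This is a minimal sketch, assuming an allomorph lexicon keyed by surface; `ALLOMORPH_LEXICON` and `morpheme_of` are hypothetical names, and the entries are taken from the example for good.

```python
# Surface -> (category, morpheme name), following the example for good.
ALLOMORPH_LEXICON = {
    "good": ("(ADV IR)", "good"),
    "bett": ("(CAD IR)", "good"),
    "b": ("(SAD IR)", "good"),
}


def morpheme_of(surface):
    """Return the morpheme name stored with an allomorph surface, or None."""
    entry = ALLOMORPH_LEXICON.get(surface)
    return None if entry is None else entry[1]
```

Looking up the suppletive stem `bett` returns `good`, even though the two surfaces share no letters.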

In structuralism, the morphemes of the open and closed classes are called free morphemes, in contradistinction to bound morphemes. A morpheme is free if it can occur as an independent word form, e.g. book. Bound morphemes, on the other hand, are affixes such as the prefixes un-, pre-, dis-, etc., and the suffixes -s, -ed, -ing, etc., which can occur only in combination with free morphemes.

The following example represents the English plural morpheme, which has been claimed to arise in such different forms as book/s, wolv/es, ox/en and sheep/#.

2.4 EXAMPLE OF A BOUND MORPHEME (hypothetical)

morpheme allomorphs
-s =def {[s (PL1) plural],
[es (PL2) plural],
[en (PL3) plural],
[# (PL4) plural]}

In bound morphemes, the choice of the morpheme name, here -s, and the base form of the allomorph, here ‘plural’, is quite artificial. Also, postulating the ‘zero allomorph’ # is in violation of the principle of surface compositionality.

3 Categorization and lemmatization

The morphological analysis of an unknown word form surface consists in principle of the following three steps. The unanalyzed word form surface is (i) disassembled into its basic elements (segmentation), (ii) the basic elements are analyzed in terms of their grammatical definitions (lexical look-up), and (iii) the analyzed basic elements are reassembled based on rules, whereby the overall analysis of the word form is derived (concatenation). Thereby concatenation applies to the surface, the category, and the semantic representation simultaneously. Depending on the approach, the basic elements of word forms are either the allomorphs or the morphemes.
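The three steps can be sketched for the allomorph-based case as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the lexicon, the stem/suffix labels, and the function names are hypothetical, and the combination rule (one stem followed by any number of suffixes) is a placeholder. The sketch combines only surfaces and base forms, whereas the text requires concatenation to apply to the category as well.

```python
# Hypothetical toy lexicon: surface -> (kind, base form).
LEXICON = {
    "swim": ("stem", "swim"),
    "swimm": ("stem", "swim"),
    "learn": ("stem", "learn"),
    "fast": ("stem", "fast"),
    "bett": ("stem", "good"),
    "ing": ("suffix", "ing"),
    "er": ("suffix", "er"),
}


def segment(surface):
    """Step (i): yield every split of the surface into known lexicon surfaces."""
    if not surface:
        yield []
        return
    for end in range(1, len(surface) + 1):
        head = surface[:end]
        if head in LEXICON:
            for rest in segment(surface[end:]):
                yield [head] + rest


def look_up(parts):
    """Step (ii): attach the lexical definition to each segment."""
    return [(part, *LEXICON[part]) for part in parts]


def concatenate(analyzed):
    """Step (iii): placeholder rule accepting a stem followed by suffixes only."""
    if not analyzed or analyzed[0][1] != "stem":
        return None
    if any(kind != "suffix" for _, kind, _ in analyzed[1:]):
        return None
    return {
        "segmentation": "/".join(surface for surface, _, _ in analyzed),
        "base_form": analyzed[0][2],
        "suffixes": [base for _, _, base in analyzed[1:]],
    }


def recognize(surface):
    """Run the three steps and collect every analysis the rule accepts."""
    results = []
    for parts in segment(surface):
        result = concatenate(look_up(parts))
        if result is not None:
            results.append(result)
    return results
```

With this lexicon, `recognize("swimming")` yields the single segmentation `swimm/ing` with base form `swim`, and `recognize("better")` yields `bett/er` with base form `good`. A surface that cannot be split into known segments returns an empty list.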


  1. This allomorph is used in the progressive swimm/ing, avoiding the concatenative insertion of the gemination letter.
  2. For practical purposes, one may analyze good, better, best as basic allomorphs without concatenation.



Author: Ronald Hausser
Title: Three Principled Methods of Automatic Word Form Recognition
Year: 1999