1999 ThreePrincipledMethodsofAutomat


Subject Headings: Morpheme.

Quotes

Abstract

For the computer, word forms in an online text are simply letter sequences between blanks. A rule-based automatic language analysis presupposes, however, that the computer can recognize the individual word forms. This includes assigning the base form (lemmatization) and determining the morphosyntactic properties (categorization). It is shown that there are three principled methods of automatic word form recognition, based on word forms, morphemes, and allomorphs, respectively. After these methods are described using the notions of traditional morphology, they are compared with regard to their handling of neologisms, time efficiency, and space requirements.

1 Morphemes and allomorphs

In traditional morphology, word forms are analyzed by disassembling them into their elementary parts. These are called morphemes and defined as the smallest meaningful units of language. In contrast to the number of potential words, the number of morphemes is finite.

The notion of a morpheme is a linguistic abstraction which is manifested concretely in the form of finitely many allomorphs. The word allomorph is of Greek origin and means “alternative shape.” For example, the morpheme wolf is realized as the two allomorphs wolf and wolv.

Just as the elementary parts of the syntax are really the word forms (and not the words), the elementary parts of morphology are really the allomorphs. A morpheme may be defined as naming the set of associated allomorphs.

1.1 DEFINITION OF THE NOTION morpheme

$\text{morpheme} =_{\text{def}} \{\text{associated analyzed allomorphs}\}$

Allomorphs are formally analyzed here as ordered triples, consisting of the surface, the category and the semantic representation. The following examples, based on the English noun wolf, are intended to demonstrate these basic concepts of morphology as simply as possible.[1]

1.2 FORMAL ANALYSIS OF THE MORPHEME wolf

morpheme    allomorphs
wolf =def { [wolf (SN SR) wolf],
            [wolv (PN SR) wolf] }

The different allomorphs wolf and wolv are shown to belong to the same morpheme by the common semantic representation in the third position. As (the name of) the semantic representation we use the base form of the allomorph, i.e. wolf, which is also used as the name of the associated morpheme.
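
The triple analysis in 1.2 maps directly onto a small data structure. The following Python sketch is illustrative only: the names `Allomorph`, `WOLF`, and `morpheme_name`, as well as the field names, are assumptions of this example and do not appear in the paper.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Allomorph(NamedTuple):
    """An analyzed allomorph as an ordered triple (field names are hypothetical)."""
    surface: str               # letter sequence found in the word form
    category: Tuple[str, ...]  # category segments, copied from 1.2
    semantics: str             # base form, which also names the morpheme


# The morpheme wolf from 1.2, modelled as the set of its analyzed allomorphs.
WOLF: FrozenSet[Allomorph] = frozenset({
    Allomorph("wolf", ("SN", "SR"), "wolf"),
    Allomorph("wolv", ("PN", "SR"), "wolf"),
})


def morpheme_name(allomorphs: FrozenSet[Allomorph]) -> str:
    """Return the semantic representation shared by all allomorphs in the set."""
    names = {a.semantics for a in allomorphs}
    if len(names) != 1:
        raise ValueError("allomorphs of one morpheme must share one semantic representation")
    return next(iter(names))
```

Under these assumptions, `morpheme_name(WOLF)` yields the name of the morpheme from the third position of the triples, mirroring the way the common semantic representation ties the allomorphs together.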

Some surfaces such as wolf can be analyzed alternatively as an allomorph, a morpheme (name), a word form, or a word (name).[2]

2 Irregular words

The number and variation of allomorphs of a given morpheme determine the degree of regularity of the morpheme and – in the case of a free morpheme – the associated word. An example of a regular word is the verb to learn, the morpheme of which is defined as a set containing only one allomorph.

2.1 THE REGULAR MORPHEME learn

morpheme     allomorphs
learn =def { [learn (N ... V) learn] }

An irregular word, on the other hand, is the verb to swim, the morpheme of which has four allomorphs, namely swim, swimm,[3] swam, and swum. The change of the stem vowel may be found also in other verbs, e.g., sing, sang, sung, and is called ablaut.

2.2 THE IRREGULAR MORPHEME swim

morpheme    allomorphs
swim =def { [swim (N ... V1) swim],
            [swimm (... B) swim],
            [swam (N ... V2) swim],
            [swum (N ... V) swim] }

In 2.2, the allomorph of the base form is used as the name of the morpheme. Thus, we may say that swam is an allomorph of the morpheme swim.
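
Read as data, 2.1 and 2.2 support two simple operations: counting allomorphs as a rough indicator of regularity, and mapping any allomorph surface back to its morpheme. The following is a minimal Python sketch of both. The category strings are kept as opaque strings with their elisions left unexpanded, and the names `MORPHEMES`, `is_regular`, `SURFACE_INDEX`, and `morpheme_of` are assumptions of this illustration.

```python
from typing import Optional

# Allomorph triples (surface, category, semantics) taken from 2.1 and 2.2.
# The "..." inside a category is part of the source notation and is not expanded.
MORPHEMES = {
    "learn": [("learn", "N ... V", "learn")],
    "swim": [
        ("swim", "N ... V1", "swim"),
        ("swimm", "... B", "swim"),
        ("swam", "N ... V2", "swim"),
        ("swum", "N ... V", "swim"),
    ],
}


def is_regular(name: str) -> bool:
    """A morpheme counts as regular here if it has exactly one allomorph."""
    return len(MORPHEMES[name]) == 1


# Reverse index from allomorph surface to morpheme name (third triple element).
SURFACE_INDEX = {
    surface: semantics
    for triples in MORPHEMES.values()
    for surface, _category, semantics in triples
}


def morpheme_of(surface: str) -> Optional[str]:
    """Return the morpheme an allomorph surface belongs to, or None if unknown."""
    return SURFACE_INDEX.get(surface)
```

With these definitions, `is_regular("learn")` holds while `is_regular("swim")` does not, and `morpheme_of("swam")` returns `"swim"`.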

Cases where there is no similarity at all between the allomorphs of a given morpheme are called suppletion.

2.3 AN EXAMPLE OF SUPPLETION

morpheme    allomorphs
good =def { [good (ADV IR) good],
            [bett (CAD IR) good],
            [b (SAD IR) good] }

While the regular comparation in, e.g.,

fast, fast/er, fast/est

uses only one allomorph for the stem, the irregular comparation in, e.g.,

good, bett/er, b/est

uses several allomorphs.[4] Even in a suppletive form like bett, the associated morpheme good is readily available as the third element of the ordered triple analysis.
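
One crude way to make the difference between ablaut and suppletion tangible is to compare the longest common prefix of the allomorph surfaces. This is only a heuristic assumed for illustration, not a criterion stated in the text, and `shared_prefix` is a hypothetical helper.

```python
import os.path
from typing import Iterable


def shared_prefix(surfaces: Iterable[str]) -> str:
    """Return the longest letter sequence with which all surfaces begin."""
    # os.path.commonprefix compares character by character, not path by path.
    return os.path.commonprefix(list(surfaces))


ABLAUT = ["swim", "swimm", "swam", "swum"]   # allomorph surfaces from 2.2
SUPPLETION = ["good", "bett", "b"]           # allomorph surfaces from 2.3
```

Here `shared_prefix(ABLAUT)` gives `"sw"`, whereas `shared_prefix(SUPPLETION)` gives the empty string, which matches the observation that the allomorphs of good have no surface material in common. The morpheme is recoverable in both cases only because each triple carries it in third position.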

In structuralism, the morphemes of the open and closed classes are called free morphemes, in contradistinction to bound morphemes. A morpheme is free if it can occur as an independent word form, e.g. book. Bound morphemes, on the other hand, are affixes such as the prefixes un-, pre-, dis-, etc., and the suffixes -s, -ed, -ing, etc., which can occur only in combination with free morphemes.

The following example represents the English plural morpheme, which has been claimed to arise in such different forms as book/s, wolv/es, ox/en and sheep/#.

2.4 EXAMPLE OF A BOUND MORPHEME (hypothetical)

morpheme  allomorphs
-s =def { [s (PL1) plural],
          [es (PL2) plural],
          [en (PL3) plural],
          [# (PL4) plural] }

In bound morphemes, the choice of the morpheme name, here -s, and of the base form of the allomorph, here ‘plural’, is quite artificial. Also, postulating the ‘zero allomorph’ # violates the principle of surface compositionality.

3 Categorization and lemmatization

The morphological analysis of an unknown word form surface consists in principle of the following three steps. The unanalyzed word form surface is (i) disassembled into its basic elements (segmentation), (ii) the basic elements are analyzed in terms of their grammatical definitions (lexical look-up), and (iii) the analyzed basic elements are reassembled based on rules, whereby the overall analysis of the word form is derived (concatenation). Thereby concatenation applies to the surface, the category, and the semantic representation simultaneously. Depending on the approach, the basic elements of word forms are either the allomorphs or the morphemes.
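
The three steps can be sketched for the allomorph-based variant as follows. This is a minimal Python illustration under stated assumptions, not the paper's algorithm: the lexicon reuses the triples from 1.2 and two entries from 2.4, while the combination rule `COMBINES`, the choice to let the result inherit the stem's category, and all function names are hypothetical.

```python
# Allomorph lexicon: surface -> (category, semantics), from 1.2 and 2.4.
LEXICON = {
    "wolf": (("SN", "SR"), "wolf"),
    "wolv": (("PN", "SR"), "wolf"),
    "s": (("PL1",), "plural"),
    "es": (("PL2",), "plural"),
}

# Hypothetical rule: which first category segments may be concatenated.
COMBINES = {("PN", "PL2")}


def segment(surface):
    """Step (i): yield every split of the surface into known allomorph surfaces."""
    if not surface:
        yield []
        return
    for end in range(1, len(surface) + 1):
        head = surface[:end]
        if head in LEXICON:
            for rest in segment(surface[end:]):
                yield [head] + rest


def look_up(segments):
    """Step (ii): replace each segment by its analyzed triple."""
    return [(s, LEXICON[s][0], (LEXICON[s][1],)) for s in segments]


def concatenate(triples):
    """Step (iii): combine surface, category and semantics simultaneously."""
    if not triples:
        return None
    result = triples[0]
    for nxt in triples[1:]:
        if (result[1][0], nxt[1][0]) not in COMBINES:
            return None  # rule does not license this combination
        # Assumption: the result keeps the stem's category.
        result = (result[0] + nxt[0], result[1], result[2] + nxt[2])
    return result


def recognize(surface):
    """Return all analyses licensed by the lexicon and the rule."""
    analyses = []
    for segments in segment(surface):
        result = concatenate(look_up(segments))
        if result is not None:
            analyses.append(result)
    return analyses
```

Under these assumptions, `recognize("wolves")` segments the surface into wolv and es and returns one analysis whose semantics is `("wolf", "plural")`, while `recognize("wolfs")` returns no analysis because the rule does not license wolf followed by s. The sketch does not check whether a result is a complete word form, so a bare stem such as wolv is also returned as an analysis.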

Footnotes

  1. Nouns of English ending in -lf, such as calf, shelf, self, etc. form their plural in general as -lves. One might prefer for practical purposes to treat forms like wolves, calves, or shelves as elementary allomorphic forms, rather than combining an allomorphic noun stem ending in -lv with the plural allomorph es. This, however, would prevent us from explaining the interaction of concatenation and allomorphy with an example from English.
  2. Inasmuch as the medium of realization influences the representation of allomorphs (types), there is the distinction between allographs in written and allophones in spoken language. Allographs are, e.g., happy and happi-, allophones the present vs. past tense pronunciation of read.
  3. This allomorph is used in the progressive swimming, avoiding the concatenative insertion of the gemination letter.
  4. For practical purposes, one may analyze good, better, best as basic allomorphs without concatenation.

References

Ronald Hausser (1999). Three Principled Methods of Automatic Word Form Recognition.