Morphological Parsing Task

A Morphological Parsing Task is a word-level morphological analysis task that requires the decomposition of a surface word form into its underlying Word Constituents (morphs, morphemes, lexemes, word roots, and morphological rules).



  • (Wikipedia, 2015) ⇒ Retrieved:2015-4-15.
    • In linguistics, morphology is the identification, analysis, and description of the structure of a given language's morphemes and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context. In contrast, morphological typology is the classification of languages according to their use of morphemes, while lexicology is the study of those words forming a language's wordstock.

      While words, along with clitics, are generally accepted as being the smallest units of syntax, in most languages, if not all, many words can be related to other words by rules that collectively describe the grammar for that language. For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", only found bound to nouns. Speakers of English, a fusional language, recognize these relations from their tacit knowledge of English's rules of word formation. They infer intuitively that dog is to dogs as cat is to cats; and, in similar fashion, dog is to dog catcher as dish is to dishwasher. Languages such as Classical Chinese, however, also use unbound morphemes ("free" morphemes) and depend on post-phrase affixes and word order to convey meaning. (Most words in modern Standard Chinese ("Mandarin"), however, are compounds and most roots are bound.) These are understood as grammars that represent the morphology of the language. The rules understood by a speaker reflect specific patterns or regularities in the way words are formed from smaller units in the language they are using and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages.

      Polysynthetic languages, such as Chukchi, have words composed of many morphemes. The Chukchi word "təmeyŋəlevtpəγtərkən", for example, meaning "I have a fierce headache", is composed of eight morphemes t-ə-meyŋ-ə-levt-pəγt-ə-rkən that may be glossed. The morphology of such languages allows for each consonant and vowel to be understood as morphemes, while the grammar of the language indicates the usage and understanding of each morpheme.

      The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology.


  • (Wikipedia, 2015) ⇒ Retrieved:2015-4-16.
    • Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. It must be able to distinguish between orthographic rules and morphological rules. For example, the word 'foxes' can be decomposed into 'fox' (the stem), and 'es' (a suffix indicating plurality).

      The generally accepted approach to morphological parsing is through the use of a finite state transducer (FST), which inputs words and outputs their stem and modifiers. The FST is initially created through algorithmic parsing of some word source, such as a dictionary, complete with modifier markups.

      Another approach is through the use of an indexed lookup method, which uses a constructed radix tree. This is not an often-taken route because it breaks down for morphologically complex languages.
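
      The distinction between orthographic rules and morphological rules in the 'foxes' example can be illustrated with a minimal Python sketch. This is a hand-written, rule-based approximation under stated assumptions, not a compiled finite state transducer and not the method of any particular toolkit: the lexicon `STEMS`, the tag strings `+N+Sg` and `+N+Pl`, and the function name `parse_noun` are all hypothetical and chosen only for illustration.

```python
# Toy lexicon of noun stems (hypothetical stand-in for a real dictionary).
STEMS = {"fox", "cat", "dog", "dish", "bus"}

# Stem endings after which English spelling inserts "e" before plural "-s".
E_INSERTION_ENDINGS = ("s", "x", "z", "ch", "sh")


def parse_noun(surface):
    """Return a list of (stem, features) analyses for a surface word form."""
    analyses = []

    # Bare stem: singular noun.
    if surface in STEMS:
        analyses.append((surface, "+N+Sg"))

    # Orthographic rule: plural is spelled "es" only after a sibilant ending.
    if surface.endswith("es"):
        stem = surface[:-2]
        if stem in STEMS and stem.endswith(E_INSERTION_ENDINGS):
            analyses.append((stem, "+N+Pl"))

    # Morphological rule: plain "-s" plural where no e-insertion applies.
    if surface.endswith("s"):
        stem = surface[:-1]
        if stem in STEMS and not stem.endswith(E_INSERTION_ENDINGS):
            analyses.append((stem, "+N+Pl"))

    return analyses


if __name__ == "__main__":
    for word in ("foxes", "dogs", "bus", "foxs"):
        print(word, parse_noun(word))
```

      In this sketch 'foxes' is analyzed as the stem 'fox' plus a plural feature, while the misspelling 'foxs' receives no analysis because the spelling rule requires the inserted 'e'. A real FST-based parser would encode the lexicon and the spelling rules as transducers and compose them, which also allows the same machine to run in the generation direction.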





  • (Beesley & Karttunen, 2003) ⇒ Kenneth R. Beesley, and Lauri Karttunen. (2003). “Finite State Morphology.” CSLI Publications.
  • (Karttunen, 2003) ⇒ Lauri Karttunen. (2003). “Computing with Realizational Morphology.” In: CICLing-2003, A. Gelbukh (ed.), Lecture Notes in Computer Science. Springer Verlag. 2003.
    • ABSTRACT: The theory of realizational morphology presented by Stump in his influential book Inflectional Morphology (2001) describes the derivation of inflected surface forms from underlying lexical forms by means of ordered blocks of realization rules. The theory presents a rich formalism for expressing generalizations about phenomena commonly found in the morphological systems of natural languages.

      This paper demonstrates that, in spite of the apparent complexity of Stump’s formalism, the system as a whole is no more powerful than a collection of regular relations. Consequently, a Stump-style description of the morphology of a particular language such as Lingala or Bulgarian can be compiled into a finite-state transducer that maps the underlying lexical representations directly into the corresponding surface form or forms, and vice versa, yielding a single lexical transducer. For illustration we will present an explicit finite-state implementation of an analysis of Lingala based on Stump’s description and other sources.


  • (Soon et al., 2001) ⇒ Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. (2001). “A Machine Learning Approach to Coreference Resolution of Noun Phrases.” In: Computational Linguistics, Vol. 27, No. 4.
    • QUOTE: A prerequisite for coreference resolution is to obtain most, if not all, of the possible markables in a raw input text. To determine the markables, a pipeline of natural language processing (NLP) modules is used, as shown in Figure 1. They consist of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination.



  • (Lezius et al., 1998) ⇒ Wolfgang Lezius, Reinhard Rapp, and Manfred Wettler. (1998). “A Freely Available Morphological Analyzer, Disambiguator and Context Sensitive Lemmatizer for German.” In: Proceedings of the 17th International Conference on Computational Linguistics.
    • ABSTRACT: In this paper we present Morphy, an integrated tool for German morphology, part-of-speech tagging and context-sensitive lemmatization. Its large lexicon of more than 320,000 word forms plus its ability to process German compound nouns guarantee a wide morphological coverage. Syntactic ambiguities can be resolved with a standard statistical part-of-speech tagger. By using the output of the tagger, the lemmatizer can determine the correct root even for ambiguous word forms. The complete package is freely available and can be downloaded from the World Wide Web.


  • (Kaplan & Kay, 1994) ⇒ Ronald M. Kaplan, and Martin Kay. (1994). “Regular Models of Phonological Rule Systems.” In: Computational Linguistics, 20(3).
    • ABSTRACT: This paper presents a set of mathematical and computational tools for manipulating and reasoning about regular languages and regular relations and argues that they provide a solid basis for computational phonology. It shows in detail how this framework applies to ordered sets of context-sensitive rewriting rules and also to grammars in Koskenniemi's two-level formalism. This analysis provides a common representation of phonological constraints that supports efficient generation and recognition by a single simple interpreter.


  • (Koskenniemi, 1983) ⇒ Kimmo Koskenniemi. (1983). “Two-level Morphology: A General Computational Model for Word-Form Recognition and Production.” Department of General Linguistics, University of Helsinki, Helsinki, Finland.


  • (Johnson, 1972) ⇒ C. Douglas Johnson. (1972). “Formal Aspects of Phonological Description.” Mouton, The Hague.


  • (Matthews, 1972) ⇒ Peter H. Matthews. (1972). “Inflectional morphology: a theoretical study based on aspects of Latin verb conjugation.” Cambridge: Cambridge University Press.