Morph Segmentation Task
- AKA: Word Decompounding, Decompounding Task, MST, Word Decompounding Task.
- MST("dogs") ⇒ (dog, s).
- MST("wolves") ⇒ (wolv, es).
- MST("arachnophobia") ⇒ (arachno, phobia).
- MST("Fliegangst" ~ "fear of flying") ⇒ (flieg, angst).
- MST("Lebensversicherungsgesellschaft" ~ "life insurance company") ⇒ (Lebens, versicher, ungs, gesell, schaft).
- MST("The wolves' den was empty.") ⇒ ([The],[wolv],[es],['],[den],[was],[empty]), is an Allomorph Segmentation Task.
- a Morphological Parsing Task.
- an Orthographic Word Segmentation Task.
- \(f\)("vliegangst") ⇒ (vliegen, angst), likely a Word Lemmatisation Task.
- \(f\)("日文章魚怎麼說") ⇒ (日文, 章魚, 怎麼, 說), likely a Word Segmentation Task.
- \(f\)("日文章魚怎麼說", English) ⇒ "How do you say octopus in Japanese?", likely a Machine Translation Task.
- See: Word Lemmatisation Task, Morpheme.
- (Kraaij, 2004) ⇒ Wessel Kraaij. (2004). "Variations on Language Modeling for Information Retrieval." PhD Thesis, University of Twente, June 2004.
- Compound analysis (also called decompounding or compound splitting) is an additional normalization technique for Germanic languages, since these have a productive compounding capacity. This means that new words can be formed by concatenating existing words. Decomposition of these compound words into their constituting morphological base forms is important for IR, since these compounds can usually be paraphrased by a noun-phrase construction, e.g., “vliegangst” and “angst om te vliegen” (fear of flying). Normalization of compounds will enable a match between both forms of the same composite concept and partial matches with related words after compound splitting, e.g., ’luchtvervuiling’ will match with ’vervuiling’. Several algorithms have been proposed for compound splitting. They either use a lexicon (e.g. Vosse, 1994) or a corpus (e.g. Hollink et al., 2003) as a resource for the identification of candidate base forms which can form compounds. We will discuss the results of several comparative studies concerning stemming algorithms in the rest of this section.
- Hollink, V., Kamps, J., Monz, C., & de Rijke, M. (2003). Monolingual document retrieval for European languages. Information Retrieval.
- Vosse, T. G. (1994). The Word Connection. PhD thesis, Rijksuniversiteit Leiden, Neslia Paniculata Uitgeverij, Enschede.
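
The lexicon-based approach mentioned in the quoted passage can be illustrated with a minimal sketch. This is not the algorithm of Vosse (1994) or Hollink et al. (2003); it is a toy dictionary lookup under stated assumptions: the lexicon `LEXICON`, the linking elements in `LINKERS`, the minimum part length `MIN_PART`, and the function name `split_compound` are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would load a full word list.
LEXICON = frozenset({
    "vlieg", "angst", "lucht", "vervuiling",
    "leben", "versicherung", "gesellschaft",
})
# Linking elements allowed between parts (assumption: only "" and "s").
LINKERS = ("", "s")
# Assumed minimum length of a compound part, to avoid spurious tiny splits.
MIN_PART = 3


def split_compound(word, lexicon=LEXICON):
    """Split `word` into lexicon entries, preferring the fewest parts.

    Returns the list of parts, or [word] when no full cover is found.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Best (shortest) tuple of parts covering word[start:], or None.
        if start == len(word):
            return ()
        best = None
        for end in range(start + MIN_PART, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in LINKERS:
                if not word.startswith(linker, end):
                    continue
                next_start = end + len(linker)
                # A linking element may only appear between two parts.
                if linker and next_start >= len(word):
                    continue
                rest = solve(next_start)
                if rest is None:
                    continue
                candidate = (part,) + rest
                if best is None or len(candidate) < len(best):
                    best = candidate
        return best

    parts = solve(0)
    return list(parts) if parts else [word]
```

With this toy lexicon, `split_compound("luchtvervuiling")` yields the parts `lucht` and `vervuiling`, so a query for "vervuiling" can partially match the compound, which is the retrieval benefit the passage describes. Note that the sketch splits at word boundaries only (for example `Lebensversicherungsgesellschaft` into `leben`, `versicherung`, `gesellschaft`) and does not segment further into affixes such as `ungs` or `schaft`.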