2006 WordNormaliAndDecompInMonoBilingIR

From GM-RKB

Subject Headings: Lemmatisation Task, Word Decompounding Task, Morphological Parsing Task.

Notes

Quotes

  • Keywords: Monolingual information retrieval - bilingual information retrieval - lemmatization - stemming - decompounding

Abstract

  • The present research studies the impact of decompounding and two different word normalization methods, stemming and lemmatization, on monolingual and bilingual retrieval. The languages in the monolingual runs are English, Finnish, German and Swedish. The source language of the bilingual runs is English, and the target languages are Finnish, German and Swedish. In the monolingual runs, retrieval in a lemmatized compound index gives almost as good results as retrieval in a decompounded index, but in the bilingual runs differences are found: retrieval in a lemmatized decompounded index performs better than retrieval in a lemmatized compound index. The reason for the poorer performance of indexes without decompounding in bilingual retrieval is the difference between the source language and target languages: phrases are used in English, while compounds are used instead of phrases in Finnish, German and Swedish. No remarkable performance differences could be found between stemming and lemmatization.
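The contrast between a compound index and a decompounded index can be illustrated with a minimal sketch. This is not the method used in the paper: it assumes a simple dictionary-based splitter, and the lexicon, the function names `decompound` and `index_terms`, and the `min_len` parameter are all hypothetical choices made for illustration. Linking elements (such as the German "s" between components) are not handled.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon of base forms; a real system would rely on a
# full morphological lexicon or a lemmatizer.
LEXICON: Set[str] = {"atom", "kraft", "werk"}


def decompound(word: str, lexicon: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split a compound into lexicon words, or return None if no split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest possible head first, then split the remainder recursively.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    return None


def index_terms(word: str, lexicon: Set[str], decompounded: bool) -> List[str]:
    """Return the index keys for a word under either indexing strategy."""
    if not decompounded:
        return [word.lower()]  # compound index: the whole word is one key
    parts = decompound(word, lexicon)
    return parts if parts else [word.lower()]


compound_keys = index_terms("Atomkraftwerk", LEXICON, decompounded=False)
split_keys = index_terms("Atomkraftwerk", LEXICON, decompounded=True)
```

In this sketch a query word translated from an English phrase component, such as "kraft", matches only the decompounded keys and not the single compound key, which is the kind of mismatch the abstract attributes to phrases in English versus compounds in the target languages.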


Bibliographic data:

  • Key: 2006 WordNormaliAndDecompInMonoBilingIR
  • Author: Eija Airio
  • Title: Word Normalization and Decompounding in Mono- and Bilingual IR
  • Journal: Journal of Information Retrieval
  • URL: http://dx.doi.org/10.1007/s10791-006-0884-2
  • DOI: 10.1007/s10791-006-0884-2
  • Year: 2006