Word Stemming Task

From GM-RKB

A Word Stemming Task is a text processing task that requires stripping the lexical suffix from a word mention to recover its stem.
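
The task can be illustrated with a minimal sketch. This is a toy rule-based stripper, not the Porter (1980) algorithm: the suffix list, the minimum stem length, and the function names `strip_suffix` and `stem_terms` are all assumptions made for the example.

```python
# Toy suffix stripper: illustrative only, NOT the Porter (1980) algorithm.
# The suffix list and minimum stem length below are assumptions for this sketch.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")  # checked in this order
MIN_STEM_LEN = 3  # refuse to strip if the remaining stem would be shorter


def strip_suffix(word: str) -> str:
    """Lowercase the word and remove the first suffix that leaves a long enough stem."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix):
            stem = lowered[: len(lowered) - len(suffix)]
            if len(stem) >= MIN_STEM_LEN:
                return stem
    return lowered


def stem_terms(terms):
    """Map each term to its stem so that related word forms conflate."""
    return [strip_suffix(term) for term in terms]


if __name__ == "__main__":
    print(stem_terms(["connect", "connected", "connecting", "connections"]))
```

Under these assumed rules the four word forms in the usage line all map to the same stem, while short words such as "sing" are left alone by the length check. A list this crude also over-strips (for example, "news" becomes "new"), which is one reason practical stemmers use more careful conditions.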



References

1999

  • (Manning and Schütze, 1999) ⇒ Christopher D. Manning and Hinrich Schütze. (1999). “Foundations of Statistical Natural Language Processing.” The MIT Press.
    • QUOTE: Extensive empirical research within the Information Retrieval (IR) community has shown that doing stemming does not help the performance of classic IR systems when performance is measured as an average over queries (Salton 1989; Hull 1996). There are always some queries for which stemming helps a lot. But there are others where performance goes down. This is a somewhat surprising result, especially from the viewpoint of linguistic intuition, and so it is important to understand why that is. There are three main reasons for this.

1980

  • (Porter, 1980) ⇒ Martin F. Porter. (1980). “An Algorithm for Suffix Stripping.” In: Program, 14(3):130–137.
    • QUOTE: Removing suffixes by automatic means is an operation which is especially useful in the field of information retrieval. In a typical IR environment, one has a collection of documents, each described by the words in the document title and possibly by words in the document abstract. Ignoring the issue of precisely where the words originate, we can say that a document is represented by a vector of words, or terms. Terms with a common stem will usually have similar meanings, ...

      In any suffix stripping program for IR work, two points must be borne in mind. Firstly, the suffixes are being removed simply to improve IR performance, and not as a linguistic exercise. This means that it would not be at all obvious under what circumstances a suffix should be removed, even if we could exactly determine the suffixes of a word by automatic means.