1980 AnAlgorithmForSuffixStripping

(Porter, 1980) ⇒ Martin F. Porter. (1980). “An Algorithm for Suffix Stripping.” In: Program, 14(3):130–137.

Subject Headings: Word Stem, Word Stemming Task, Word Stemming Algorithm, Porter Stemmer.

Quotes

1. Introduction

Removing suffixes by automatic means is an operation which is especially useful in the field of information retrieval. In a typical IR environment, one has a collection of documents, each described by the words in the document title and possibly by words in the document abstract. Ignoring the issue of precisely where the words originate, we can say that a document is represented by a vector of words, or terms. Terms with a common stem will usually have similar meanings, for example:

       CONNECT
       CONNECTED
       CONNECTING
       CONNECTION
       CONNECTIONS

Frequently, the performance of an IR system will be improved if term groups such as this are conflated into a single term. This may be done by removal of the various suffixes -ED, -ING, -ION, -IONS to leave the single term CONNECT. In addition, the suffix stripping process will reduce the total number of terms in the IR system, and hence reduce the size and complexity of the data in the system, which is always advantageous.
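The conflation described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: the suffix list holds only the four suffixes named in the example, and the names `SUFFIXES`, `strip_suffix`, `term_vector` and the three-letter minimum stem length are assumptions made for illustration.

```python
from collections import Counter

# Illustrative suffix list taken from the CONNECT example only.
# Longest suffix first, so that -IONS is tried before -ION.
# This is NOT the Porter rule set.
SUFFIXES = ("IONS", "ION", "ING", "ED")

# Assumed guard: never leave a stem shorter than this many letters.
MIN_STEM_LENGTH = 3


def strip_suffix(term: str) -> str:
    """Remove the first matching suffix, if the remaining stem is long enough."""
    upper = term.upper()
    for suffix in SUFFIXES:
        if upper.endswith(suffix) and len(upper) - len(suffix) >= MIN_STEM_LENGTH:
            return upper[: -len(suffix)]
    return upper


def term_vector(words):
    """Represent a document as counts of its conflated terms."""
    return Counter(strip_suffix(word) for word in words)


terms = ["CONNECT", "CONNECTED", "CONNECTING", "CONNECTION", "CONNECTIONS"]
vector = term_vector(terms)
print(vector)  # all five terms fall under the single key CONNECT
```

Five distinct terms collapse into one entry, which is the reduction in vocabulary size the paragraph refers to. A rule this crude also over-strips (NATION becomes NAT), so a practical stemmer needs conditions on the stem before a suffix is removed.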

The nature of the task will vary considerably depending on whether a stem dictionary is being used, whether a suffix list is being used, and of course on the purpose for which the suffix stripping is being done.

In any suffix stripping program for IR work, two points must be borne in mind. Firstly, the suffixes are being removed simply to improve IR performance, and not as a linguistic exercise. This means that it would not be at all obvious under what circumstances a suffix should be removed, even if we could exactly determine the suffixes of a word by automatic means.


Citation metadata:
Author: Martin F. Porter
Title: An Algorithm for Suffix Stripping
Journal: Program
Year: 1980
URL: http://tartarus.org/~martin/PorterStemmer/def.txt
