1992 Techniques for Automatically Correcting Words in Text

Subject Headings: Text Error Correction System, Spelling Error Correction (SEC) System.

Notes

Cited By

Quotes

Keywords

Abstract

Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the second problem, a variety of general and application-specific spelling correction techniques have been developed. Some of them were based on detailed studies of spelling error patterns. In response to the third problem, a few experiments using natural-language-processing tools or statistical-language models have been carried out. This article surveys documented findings on spelling error patterns, provides descriptions of various nonword detection and isolated-word error correction techniques, reviews the state of the art of context-dependent word correction techniques, and discusses research issues related to all three areas of automatic error correction in text.
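
As an illustration of the n-gram analysis idea mentioned in the abstract, the following minimal Python sketch flags a string as a probable nonword when it contains a character trigram that never occurs in the word list. The function names (`char_ngrams`, `build_ngram_table`, `looks_like_nonword`), the `#` boundary marker, and the toy lexicon are assumptions made for this example; they are not taken from the article.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_table(lexicon, n=3):
    """Collect every n-gram that occurs in at least one lexicon word."""
    table = set()
    for word in lexicon:
        table |= char_ngrams(word, n)
    return table


def looks_like_nonword(token, table, n=3):
    """Flag a token if any of its n-grams is absent from the table."""
    return not char_ngrams(token, n) <= table


# Hypothetical toy lexicon, for illustration only.
LEXICON = ["the", "there", "then", "other", "another"]
TABLE = build_ngram_table(LEXICON)

print(looks_like_nonword("there", TABLE))  # every trigram is attested
print(looks_like_nonword("tqe", TABLE))    # contains unattested trigrams
print(looks_like_nonword("ther", TABLE))   # not in the lexicon, yet every trigram is attested
```

The last call shows the main limitation of this kind of check: a string built entirely from attested n-grams passes even though it is not in the word list, so a full lexicon lookup remains the stricter test.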

Introduction

...

A distinction must be made between the tasks of error detection and error correction. Efficient techniques have been devised for detecting strings that do not appear in a given word list, dictionary, or lexicon [1]. But correcting a misspelled string is a much harder problem. Not only is the task of locating and ranking candidate words a challenge, but as Bentley (1985) points out: given the morphological productivity of the English language (e.g., almost any noun can be verbified) and the rate at which words enter and leave the lexicon (e.g., catwomanhood, balkanization), some even question the wisdom of attempts at automatic correction.
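
The gap between detection and correction can be made concrete with a small Python sketch. Detection here is a single set-membership test, while correction has to score every lexicon entry and rank the results. Levenshtein edit distance is used only as one plausible similarity measure; the passage above does not prescribe it. The names `is_nonword` and `rank_candidates` and the toy lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_nonword(token, lexicon):
    """Detection: the token is an error candidate if it is not in the lexicon."""
    return token.lower() not in lexicon


def rank_candidates(token, lexicon, k=3):
    """Correction: return the k lexicon words closest to the token (ties broken alphabetically)."""
    target = token.lower()
    ranked = sorted(lexicon, key=lambda w: (edit_distance(target, w), w))
    return ranked[:k]


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"correct", "correction", "collect", "connect", "context"}

if is_nonword("corect", LEXICON):
    print(rank_candidates("corect", LEXICON))
```

Returning the top `k` candidates corresponds to a checker that leaves the final choice to the user; a fully automatic corrector would have to commit to the first entry alone. Note also that a newly coined but legitimate word would be flagged by `is_nonword`, which is the concern about lexicon turnover raised above.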

Many existing spelling correctors exploit task-specific constraints. For example, interactive command line spelling correctors exploit the small size of a command language lexicon to achieve quick response times. Alternatively, longer response times are tolerated for noninteractive-mode manuscript preparation applications. Both of the foregoing spelling correction applications tolerate lower first-guess accuracy by returning multiple guesses and allowing the user to make the final choice of intended word. In contrast, some future applications, such as text-to-speech synthesis, will require a system to perform fully automatic, real-time word recognition and error correction for vocabularies of many thousands of words and names. The contrast between the first two examples and this last one highlights the distinction between interactive spelling checkers and automatic correction. The latter task is much more demanding, and it is not clear how far existing spelling correction techniques can go toward fully automatic word correction.

...

  1. The terms “word list,” “dictionary,” and “lexicon” are used interchangeably in the literature. We prefer the use of the term lexicon because its connotation of “a list of words relevant to a particular subject, field, or class” seems best suited to spelling correction applications, but we adopt the terms “dictionary” and “word list” to describe research in which other authors have used them exclusively.

1. Nonword Error Detection Research

1.1 N-Gram Analysis Techniques

1.2 Dictionary Lookup Techniques

1.3 Dictionary Construction Issues

1.4 The Word Boundary Problem

1.5 Summary Of Non-word Error Detection Work

2. Isolated-Word Error Correction Research

2.1 Spelling Error Patterns

2.2 Techniques For Isolated-Word Error Correction

3. Context-Dependent Word Correction Research

3.1 Real-Word Errors Frequency And Classification

3.2 NLP Prototypes For Handling Ill-Formed Input

3.3 Statistically Based Error Detection And Correction Experiments

3.4 Summary Of Context-Dependent Word Correction Work

Future Directions

References

Author: Karen Kukich
Title: Techniques for Automatically Correcting Words in Text
DOI: 10.1145/146370.146380
Year: 1992