2004 TextInducedSpellingCorrection

From GM-RKB


(Reynaert, 2004) ⇒ Martin Reynaert. (2004). “Text Induced Spelling Correction.” In: Proceedings of the 20th International Conference on Computational Linguistics. doi:10.3115/1220355.1220475

Subject Headings: Text Error Correction (TEC) System; Text Induced Spelling Correction (TISC) System.

Notes

Cited By

Google Scholar: 59 Citations ⇒ http://scholar.google.com/scholar?q=%222004%22+Text+Induced+Spelling+Correction Retrieved: 2019-07-21.
ACM DL: 4 Citations ⇒ http://dl.acm.org/citation.cfm?id=1220355.1220475&preflayout=flat#citedby Retrieved: 2019-07-21
Semantic Scholar: 28 Citations ⇒ https://www.semanticscholar.org/paper/Text-Induced-Spelling-Correction-Reynaert/9d1eaee896dc8887fca94fbaf484a3aebd2bf393 Retrieved: 2019-07-21

Quotes

Abstract

We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word unigrams and word bigrams. It is stored in a novel representation based on a purpose-built hashing function, which provides a fast and computationally tractable way of checking whether a particular word form likely constitutes a spelling error and of retrieving correction candidates. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates when insufficient information for an unambiguous decision on a single correction is available. We describe the implemented prototype and evaluate it on English and Dutch text, containing real-world errors in more or less limited contexts. The results are compared with those of the isolated word spelling checking programs ISPELL and the Microsoft Proofing Tools (MPT).
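The abstract names three ingredients: an unsupervised lexicon of word unigrams and bigrams, a hash-based representation that supports both error checking and candidate retrieval, and context-based ranking of a limited number of candidates. The sketch below shows one way these pieces can fit together. It is not the paper's purpose-built hashing function: it assumes an anagram-style key (the sum of each character's code point raised to the fifth power), so that single-character deletions, insertions and substitutions become additions and subtractions on the key, and it assumes ranking by left-bigram count and then unigram frequency. All function names (`char_value`, `anagram_key`, `build_lexicon`, `candidates`, `rank`) are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: lowercase Latin letters only


def char_value(c: str) -> int:
    # Assumed character weight: code point to the 5th power, so that
    # different letter multisets are very unlikely to share a sum.
    return ord(c) ** 5


def anagram_key(word: str) -> int:
    # Order-independent hash: every permutation of a word gets the same key.
    return sum(char_value(c) for c in word)


def build_lexicon(tokens):
    """Derive unigram/bigram counts and a key -> word-forms index from raw text."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    index = {}
    for w in unigrams:
        index.setdefault(anagram_key(w), set()).add(w)
    return unigrams, bigrams, index


def candidates(word, index, alphabet=ALPHABET):
    """Lexicon words whose key differs from `word`'s key by one character
    deletion, insertion or substitution (transpositions keep the key)."""
    key = anagram_key(word)
    keys = {key}
    for c in set(word):
        keys.add(key - char_value(c))                      # delete c
        for a in alphabet:
            keys.add(key - char_value(c) + char_value(a))  # substitute c -> a
    for a in alphabet:
        keys.add(key + char_value(a))                      # insert a
    found = set()
    for k in keys:
        found |= index.get(k, set())
    found.discard(word)
    return found


def rank(cands, left, unigrams, bigrams, limit=3):
    """Order candidates by bigram count with the left neighbour, then by
    unigram frequency (alphabetical as a final tie-break); keep `limit`."""
    return sorted(cands, key=lambda c: (-bigrams[(left, c)], -unigrams[c], c))[:limit]


if __name__ == "__main__":
    tokens = "the cat sat then the dog sat then the cat ran".split()
    unigrams, bigrams, index = build_lexicon(tokens)
    cands = candidates("tne", index)                # {'the', 'then'}
    print(rank(cands, "sat", unigrams, bigrams))    # ['then', 'the']
    print(rank(cands, "cat", unigrams, bigrams))    # ['the', 'then']
```

The same misspelling receives a different top candidate depending on its left neighbour, which is the kind of context sensitivity the abstract describes. Because the key ignores letter order, the retrieved set may also contain anagrams of true one-edit neighbours, so a practical system would filter it further, for example by edit distance.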

1 Introduction

The automatic detection and correction of errors is an important problem in text recognition. Textual errors arise mainly during the recognition process and are known as edit errors: insertion, deletion, or substitution errors. In text recognition systems, error correction is partly provided by a Contextual Postprocessing (CP) stage. Let [math]\displaystyle{ w = a_1\;a_2 \cdots a_m }[/math] be an observed word obtained from a previous stage of the system, where the characters [math]\displaystyle{ a_i \; (1 \leq i \leq m) }[/math] belong to an alphabet [math]\displaystyle{ \Sigma }[/math]. The objective of the CP is to estimate the word [math]\displaystyle{ \hat{w} }[/math] in a set of words [math]\displaystyle{ D }[/math] (a dictionary) that is the best choice for [math]\displaystyle{ w }[/math], for example the word that minimizes a distance function [math]\displaystyle{ d(\hat{w}, w) }[/math] or maximizes the a posteriori probability [math]\displaystyle{ P(\hat{w} \mid w) }[/math]. This is referred to as the text error correction problem.
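The distance-minimizing formulation above can be made concrete with the Levenshtein distance, which counts exactly the three edit operations named in the paragraph (insertion, deletion, substitution). The sketch below assumes the dictionary [math]\displaystyle{ D }[/math] is small enough to scan exhaustively and breaks ties alphabetically, an arbitrary choice; it illustrates only the [math]\displaystyle{ \arg\min_{\hat{w} \in D} d(\hat{w}, w) }[/math] criterion, not the probabilistic one, and the names `edit_distance` and `correct` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # turning a[:i] into "" needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(w: str, dictionary) -> str:
    """Return the dictionary word minimizing d(w_hat, w); ties go to the
    alphabetically first word because min() keeps the first minimum."""
    return min(sorted(dictionary), key=lambda cand: edit_distance(cand, w))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))                               # 3
    print(correct("dictionery", {"dictionary", "directory", "diction"}))   # dictionary
```

Scanning all of [math]\displaystyle{ D }[/math] costs one quadratic-time distance computation per dictionary word, which is why large-scale systems index the lexicon so that only a small candidate set needs to be scored.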

(...)

References

Author	Title	DOI	Year
Martin Reynaert	Text Induced Spelling Correction	10.3115/1220355.1220475	2004

