2019 SpellingCorrectionAsaForeignLan

We use beam search to obtain the final result from the model. The results are shown in Table 1. Although much simpler, our RNN-based model offers competitive performance compared to the previous methods. Interestingly, the BPE-based encoder and decoder performs best. The better performance may be attributable to the shorter resulting sequences compared to the character case, and possibly to the more semantically meaningful segments that sub-words provide compared to characters. Surprisingly, the character-based decoder performs quite well considering the complexity of the learning task. This demonstrates the benefit of end-to-end training and the robustness of the framework.

**Table 1:** Results on the test dataset with various methods. C-2-C denotes that the model uses a character-based encoder and decoder; W-2-W denotes that the model uses a BPE partial-word-based encoder and decoder; and C-2-W denotes that the model uses a character-based encoder and a BPE partial-word-based decoder.
Method	Accuracy
Hasan et al. [8]	62.0%
C-2-W RNN	59.9%
W-2-W RNN	62.5%
C-2-C RNN	55.1%
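
The experiments state that beam search is used for decoding but do not describe the procedure. The following is a minimal sketch of generic beam search, not the authors' implementation. The function name `beam_search`, the scoring callback `next_log_probs` (standing in for a trained decoder), the toy probability table, and the default beam width are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, total log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return hypotheses sorted by total log-probability, best first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp, score in candidates[:beam_width]:
            if hyp[-1] == eos:
                finished.append((hyp, score))
            else:
                beams.append((hyp, score))
        if not beams:
            break
    if not finished:
        # Nothing reached the end token within max_len; return partial beams.
        finished = beams
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Hypothetical toy "decoder": a fixed table of next-token probabilities.
TOY_TABLE = {
    (): {"c": 0.6, "k": 0.4},
    ("c",): {"a": 0.5, "o": 0.5},
    ("k",): {"i": 0.9, "e": 0.1},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Look up next-token log-probabilities; unknown prefixes end the sequence."""
    probs = TOY_TABLE.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


best, best_score = beam_search(toy_model, beam_width=2)[0]
print("".join(best[:-1]))  # ki
```

In this toy table, a beam width of 1 (greedy decoding) commits to the locally best first token `c` and ends with a sequence of probability 0.3, while a beam width of 2 keeps `k` alive and recovers the higher-probability sequence `ki` (0.36). This is the usual motivation for beam search over greedy decoding in sequence-to-sequence models.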

6 Conclusion

In this paper, we reformulated the spelling correction problem as a machine translation task under the encoder-decoder framework. The reformulation allows us to solve the problem with a single model that can be trained end-to-end. We demonstrate the effectiveness of this model using an internal dataset, where the training data is automatically obtained from user logs. Despite its simplicity, the model performs competitively with state-of-the-art methods that require a lot of feature engineering and human intervention.

References

2019a

(Gupta et al., 2019) ⇒ Jai Gupta, Zhen Qin, Michael Bendersky, and Donald Metzler. (2019, May). "Personalized Online Spell Correction for Personal Search". In: The World Wide Web Conference. ACM. DOI:10.1145/3308558.3313706

2019b

(Lu et al., 2019) ⇒ Chris J. Lu, Alan R. Aronson, Sonya E. Shooshan, and Dina Demner-Fushman (2019). "Spell Checker For Consumer Language (CSpell)". In: Journal of the American Medical Informatics Association (JAMIA) 26, 3. DOI:10.1093/jamia/ocy171

2016a

(Eger et al., 2016) ⇒ Steffen Eger, Tim vor der Bruck, and Alexander Mehler (2016). "A Comparison Of Four Character-Level String-To-String Translation Models For (OCR) Spelling Error Correction". The Prague Bulletin of Mathematical Linguistics 105, 1 (2016), 77–99.

2013

(Raaijmakers, 2013) ⇒ Stephan Raaijmakers (2013). "A Deep Graphical Model For Spelling Correction".

2012

(Li et al., 2012) ⇒ Yanen Li, Huizhong Duan, and ChengXiang Zhai (2012). "CloudSpeller: Query Spelling Correction By Using A Unified Hidden Markov Model With Web-Scale Resources". In: Proceedings of the 21st International Conference on World Wide Web. ACM, 561–562.

2010

(Gao et al., 2010) ⇒ Jianfeng Gao, Xiaolong Li, Daniel Micol, Chris Quirk, and Xu Sun (2010). "A Large Scale Ranker-Based System For Search Query Spelling Correction". In: Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 358–366.

2009

(Whitelaw et al., 2009) ⇒ Casey Whitelaw, Ben Hutchinson, Grace Y Chung, and Gerard Ellis (2009). "Using The Web For Language Independent Spellchecking And Autocorrection". In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2. Association for Computational Linguistics, 890–899.

2004

(Cucerzan & Brill, 2004) ⇒ Silviu Cucerzan and Eric Brill (2004). "Spelling Correction As An Iterative Process That Exploits The Collective Knowledge Of Web Users". In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 293–300.

2001

(Hochreiter et al., 2001) ⇒ Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jurgen Schmidhuber (2001). "Gradient Flow In Recurrent Nets: The Difficulty Of Learning Long-Term Dependencies".

2000

(Brill & Moore, 2000) ⇒ Eric Brill and Robert C Moore (2000). "An Improved Error Model For Noisy Channel Spelling Correction". In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 286–293.

1997

(Hochreiter & Schmidhuber, 1997) ⇒ Sepp Hochreiter and Jurgen Schmidhuber (1997). "Long Short-term Memory". In: Neural computation, 9(8). DOI:10.1162/neco.1997.9.8.1735.

1994

(Bengio et al., 1994) ⇒ Yoshua Bengio, Patrice Simard, and Paolo Frasconi (1994). "Learning Long-Term Dependencies With Gradient Descent Is Difficult". IEEE transactions on neural networks 5, 2 (1994), 157–166.

1990

(Kernighan et al., 1990) ⇒ Mark D. Kernighan, Kenneth W. Church, and William A. Gale (1990). "A Spelling Correction Program Based On A Noisy Channel Model". In: Proceedings of the 13th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 205–210.

BibTeX

@inproceedings{DBLP:conf/sigir/ZhouPK19,
  author    = {Yingbo Zhou and
               Utkarsh Porwal and
               Roberto Konow},
  title     = {Spelling Correction as a Foreign Language},
  booktitle = {Proceedings of the {SIGIR} 2019 Workshop on eCommerce, co-located
               with the 42nd International {ACM} {SIGIR} Conference on Research and
               Development in Information Retrieval, eCom@SIGIR 2019, Paris, France,
               July 25, 2019},
  year      = {2019},
  crossref  = {DBLP:conf/sigir/2019ecom},
  url       = {http://ceur-ws.org/Vol-2410/paper28.pdf},
  timestamp = {Fri, 30 Aug 2019 13:15:06 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/sigir/ZhouPK19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2019 SpellingCorrectionAsaForeignLan	Yingbo Zhou Utkarsh Porwal Roberto Konow			Spelling Correction As a Foreign Language						2019

Notes

Cited By

Quotes

Author Keywords

Abstract

1 Introduction

2 Related Work

3 Background And Preliminaries

4 Spelling Correction As A Foreign Language

5 Experiments