2016 SentenceLevelGrammaticalErrorId

From GM-RKB

Subject Headings: Attention-based Encoder-Decoder RNN Model, Sentence-Level NLP Task.

Notes

Cited By

Quotes

Abstract

We demonstrate that an attention-based encoder-decoder model can be used for sentence-level grammatical error identification for the Automated Evaluation of Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder models can be used for the generation of corrections, in addition to error identification, which is of interest for certain end-user applications. We show that a character-based encoder-decoder model is particularly effective, outperforming other results on the AESW Shared Task on its own, and showing gains over a word-based counterpart. Our final model -- a combination of three character-based encoder-decoder models, one word-based encoder-decoder model, and a sentence-level CNN -- is the highest performing system on the AESW 2016 binary prediction Shared Task.
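The following is a minimal PyTorch sketch of an attention-based encoder-decoder of the kind described in the abstract; it is not the authors' implementation, and the module layout, dimensions, and simple dot-product attention are illustrative assumptions only. Because the decoder predicts target symbols (which, in the sequence-to-sequence correction framing, can include edit tags), the same architecture supports both error identification and generation of corrections.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnEncoderDecoder(nn.Module):
    # Illustrative sizes; the paper's hyperparameters are not reproduced here.
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the source sequence (word or character tokens).
        enc_out, state = self.encoder(self.src_emb(src))          # (B, S, H)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)    # (B, T, H)
        # Dot-product attention over encoder states at each decoder step.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))      # (B, T, S)
        attn = F.softmax(scores, dim=-1)
        context = torch.bmm(attn, enc_out)                        # (B, T, H)
        # Predict target symbols, including correction/edit tags.
        return self.out(torch.cat([dec_out, context], dim=-1))    # (B, T, V)

# Toy usage: a batch of 2 source sequences of length 5, target length 4.
model = AttnEncoderDecoder(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 5))
tgt_in = torch.randint(0, 100, (2, 4))
logits = model(src, tgt_in)   # shape (2, 4, 100)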

1 Introduction

2 Background

Evaluation is at the sentence level, but the paragraph-level context for each sentence is also provided. The paragraphs themselves are shuffled so that full article context is not available. A coarse academic field category is also provided for each paragraph. Our models described below do not make use of the paragraph context or the field category, and they treat each sentence independently.

Further information about the task is available in the Shared Task report (Daudaravicius et al., 2016).

3 Related Work

While this is the first year for a shared task focusing on sentence-level binary error identification, previous work and shared tasks have focused on the related tasks of intra-sentence identification and correction of errors. Until recently, standard hand-annotated grammatical error datasets were not available, complicating comparisons and limiting the choice of methods used. Given the lack of a large hand-annotated corpus at the time, Park and Levy (2011) demonstrated the use of the EM algorithm for parameter learning of a noise model using error data without corrections, performing evaluation on a much smaller set of sentences hand-corrected by Amazon Mechanical Turk workers.

4 Models

5 Experiments

5.1 Data

The AESW task data differs from previous grammatical error datasets in terms of scale and genre. To the best of our knowledge, the AESW dataset is the first large-scale, publicly available professionally edited dataset of academic, scientific writing. The training set consists of 466,672 sentences with edits and 722,742 sentences without edits, and the development set contains 57,340 sentences with edits and 90,106 sentences without. The raw training and development datasets are provided as annotated sentences, t, from which the s sequences may be deterministically derived. There are 143,802 sentences in the Shared Task test set with hidden gold labels, which serve directly as s sequences.
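As an illustration of the deterministic derivation mentioned above, the sketch below recovers an s sequence from an annotated sentence t, assuming edits are marked with <ins>...</ins> and <del>...</del> spans as in the AESW data; the function name and regular expressions are our own, not part of the Shared Task tooling.

import re

def derive_source(t: str) -> str:
    # Drop editor insertions; the original (source) sentence did not contain them.
    s = re.sub(r"<ins>.*?</ins>", "", t)
    # Keep deleted (original) text but strip the deletion tags themselves.
    s = re.sub(r"</?del>", "", s)
    # Normalize any whitespace left behind by the removed spans.
    return re.sub(r"\s+", " ", s).strip()

t = "The results <del>was</del> <ins>were</ins> significant ."
print(derive_source(t))  # -> "The results was significant ."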

As part of pre-processing, we treat each sentence independently, discarding paragraph context (which sentences, if any, were present in the same paragraph) and domain information, which is a coarse grouping by the field of the original journal (Engineering, Mathematics, Chemistry, Physics, etc.). We generate Penn Treebank style tokenizations of the input. Case is maintained and digits are not replaced with holder symbols. The vocabulary is restricted to the 50,000 most common tokens, with remaining low frequency tokens replaced with a special <unk> token. The CHAR model can encode but not decode over open vocabularies and hence we do not have any <unk> tokens on the source side of those models. For all of the encoder-decoder models, we replace the low-frequency target symbols during inference as discussed above in Section 4.2.
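A small sketch (our own helper functions, not the paper's pre-processing code) of the vocabulary restriction and <unk> replacement described above:

from collections import Counter

def build_vocab(tokenized_sentences, max_size=50000):
    # Keep only the most frequent tokens; everything else maps to <unk>.
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return {tok for tok, _ in counts.most_common(max_size)}

def apply_unk(sentence, vocab):
    return [tok if tok in vocab else "<unk>" for tok in sentence]

corpus = [["The", "results", "was", "significant", "."],
          ["The", "results", "were", "significant", "."]]
vocab = build_vocab(corpus, max_size=4)
print(apply_unk(corpus[0], vocab))  # rare tokens such as "was" map to <unk>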

References

Allen Schmaltz, Yoon Kim, Alexander M. Rush, and Stuart Shieber. (2016). "Sentence-Level Grammatical Error Identification As Sequence-to-Sequence Correction."