2015 Teaching Machines to Read and Comprehend

From GM-RKB

Subject Headings: Attention-Based QA-LSTM; Attention-Based QA-LSTM-CNN; Attention Mechanism; Deep LSTM Reader; Machine Reading System; Neural Natural Language Processing System; CNN/Daily Mail Dataset.

Notes

Cited By

Quotes

Abstract

Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work, we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

1. Introduction

Progress on the path from shallow bag-of-words information retrieval algorithms to machines capable of reading and understanding documents has been slow. Traditional approaches to machine reading and comprehension have been based on either hand engineered grammars (Riloff & Thelen, 2000), or information extraction methods of detecting predicate argument triples that can later be queried as a relational database (Poon et al., 2010). Supervised machine learning approaches have largely been absent from this space due to both the lack of large scale training datasets, and the difficulty in structuring statistical models flexible enough to learn to exploit document structure.

While obtaining supervised natural language reading comprehension data has proved difficult, some researchers have explored generating synthetic narratives and queries (Weston et al., 2015; Sukhbaatar et al., 2015). Such approaches allow the generation of almost unlimited amounts of supervised data and enable researchers to isolate the performance of their algorithms on individual simulated phenomena. Work on such data has shown that neural network based models hold promise for modelling reading comprehension, something that we will build upon here. Historically, however, many similar approaches in Computational Linguistics have failed to manage the transition from synthetic data to real environments, as such closed worlds inevitably fail to capture the complexity, richness, and noise of natural language (Winograd, 1972).

In this work we seek to directly address the lack of real natural language training data by introducing a novel approach to building a supervised reading comprehension data set. We observe that summary and paraphrase sentences, with their associated documents, can be readily converted to context–query–answer triples using simple entity detection and anonymisation algorithms. Using this approach we have collected two new corpora of roughly a million news stories with associated queries from the CNN and Daily Mail websites.
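The entity detection and anonymisation step described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration rather than the authors' released pipeline: it assumes the entity spans have already been identified by some coreference/NER system, replaces them with shuffled @entityN markers, and turns the summary sentence into a cloze query by masking the answer entity with @placeholder.

```python
import re
import random

def anonymise(document, query, answer, entities):
    """Replace each entity string with an abstract @entityN marker.

    `entities` is assumed to come from an upstream coreference/NER step
    (not shown here); marker ids are shuffled per example so a model
    cannot rely on world knowledge about a specific named entity.
    """
    ids = list(range(len(entities)))
    random.shuffle(ids)                                   # per-example relabelling
    mapping = {ent: f"@entity{i}" for ent, i in zip(entities, ids)}

    def substitute(text):
        # Replace longer entity strings first so "New York City" is not
        # partially rewritten by the shorter entity "New York".
        for ent in sorted(mapping, key=len, reverse=True):
            text = re.sub(re.escape(ent), mapping[ent], text)
        return text

    context = substitute(document)
    # The query is a cloze: the answer entity is blanked out with @placeholder.
    cloze = substitute(query).replace(mapping[answer], "@placeholder")
    return context, cloze, mapping[answer]

# Toy usage with hypothetical strings:
doc = "The BBC producer allegedly struck by Jeremy Clarkson will not press charges."
qry = "Producer will not press charges against Jeremy Clarkson, his lawyer says."
ctx, q, a = anonymise(doc, qry, answer="Jeremy Clarkson",
                      entities=["BBC", "Jeremy Clarkson"])
print(ctx)   # e.g. "The @entity1 producer allegedly struck by @entity0 will not press charges."
print(q, a)  # cloze query with @placeholder, and the anonymised answer marker
```

Shuffling the marker assignment per example is what forces a model to answer from the document itself rather than from prior knowledge about the named entities involved.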

We demonstrate the efficacy of our new corpora by building novel deep learning models for reading comprehension. These models draw on recent developments for incorporating attention mechanisms into recurrent neural network architectures (Bahdanau et al., 2015; Mnih et al., 2014; Gregor et al., 2015; Sukhbaatar et al., 2015). This allows a model to focus on the aspects of a document that it believes will help it answer a question, and also allows us to visualise its inference process. We compare these neural models to a range of baselines and heuristic benchmarks based upon a traditional frame semantic analysis provided by a state-of-the-art natural language processing (NLP) pipeline. Our results indicate that the neural models achieve a higher accuracy, and do so without any specific encoding of the document or query structure.
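As a rough illustration of the attention mechanism referred to above, the following is a simplified, hypothetical reader in PyTorch, not the authors' exact Attentive Reader architecture or hyperparameters: bidirectional LSTM encoders for the document and query, a learned scoring layer that attends over document token states conditioned on a query summary, and a linear output layer over the anonymised entity markers. All class names, dimensions, and the single-vector query summary are illustrative simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveReaderSketch(nn.Module):
    """Illustrative attention-based reader (hypothetical re-implementation)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_entities=500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.doc_rnn = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.qry_rnn = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(4 * hidden_dim, 1)            # scores each (token, query) pair
        self.out = nn.Linear(2 * hidden_dim, num_entities)  # scores the @entityN candidates

    def forward(self, doc_ids, qry_ids):
        d, _ = self.doc_rnn(self.embed(doc_ids))             # (B, Ld, 2H) document token states
        q, _ = self.qry_rnn(self.embed(qry_ids))              # (B, Lq, 2H) query token states
        u = q[:, -1, :]                                       # simple fixed query summary
        # Attention: one unnormalised score per document token, conditioned on u.
        scores = self.attn(torch.cat([d, u.unsqueeze(1).expand_as(d)], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)                     # (B, Ld) attention weights
        r = torch.bmm(alpha.unsqueeze(1), d).squeeze(1)       # attention-weighted document reading
        return self.out(r)                                    # logits over entity markers

# Toy forward pass with random token ids (shapes only; no trained weights):
model = AttentiveReaderSketch(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 50)), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 500])
```

In the paper's full models the query encoding and the combination of the attended document reading with the query are more elaborate (e.g., the Attentive and Impatient Readers); this sketch only shows the core attend-then-classify pattern.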

2. Supervised Training Data for Reading Comprehension

3. Models

4. Empirical Evaluation

5. Conclusion

References

2014a

  • (Das et al., 2014) ⇒ Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, and Noah A. Smith (2014). "Frame-semantic Parsing". In: Computational Linguistics, v.40 n.1.

2014d

  • (Mnih et al., 2014) ⇒ Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu (2014). "Recurrent Models of Visual Attention". In: Advances in Neural Information Processing Systems 27 (NIPS 2014).

2012b

  • (Tieleman & Hinton, 2012) ⇒ Tijmen Tieleman, and Geoffrey Hinton (2012). Lecture 6.5—RmsProp: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning.

1972

  • (Winograd, 1972) ⇒ Terry Winograd (1972). "Understanding Natural Language". Academic Press, Inc., Orlando, FL, USA.

BibTeX

@inproceedings{2015_TeachingMachinestoReadandCompre,
  author    = {Karl Moritz Hermann and
               Tomas Kocisky and
               Edward Grefenstette and
               Lasse Espeholt and
               Will Kay and
               Mustafa Suleyman and
               Phil Blunsom},
  editor    = {Corinna Cortes and
               Neil D. Lawrence and
               Daniel D. Lee and
               Masashi Sugiyama and
               Roman Garnett},
  title     = {Teaching Machines to Read and Comprehend},
  booktitle = {Advances in Neural Information Processing Systems 28: Annual Conference
               on Neural Information Processing Systems 2015},
  month     = {December},
  address   = {Montreal, Quebec, Canada},
  pages     = {1693--1701},
  year      = {2015},
  url       = {http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend}
}

