Question-Answer (QA) Benchmark Dataset


A Question-Answer (QA) Benchmark Dataset is a reading comprehension NLP benchmark dataset used to train and evaluate QA systems.
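As a concrete illustration of what such a dataset contains, below is a minimal sketch of one extractive (span-based) QA record, following the SQuAD v1.1 JSON field names; the context, question, and answer values are hypothetical examples, not taken from the actual dataset.

```python
# One extractive QA record in the SQuAD v1.1 style: the answer is a span
# of the context, located by character offset. Field names follow the
# public SQuAD JSON schema; the values are illustrative only.
qa_record = {
    "context": "The Amazon rainforest covers much of the Amazon basin of South America.",
    "question": "What does the Amazon rainforest cover?",
    "answers": [
        {"text": "much of the Amazon basin of South America",
         "answer_start": 29},  # character offset of the span in the context
    ],
}

# Sanity check: the stored offset must recover the answer text verbatim.
ans = qa_record["answers"][0]
start = ans["answer_start"]
assert qa_record["context"][start:start + len(ans["text"])] == ans["text"]
```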



References

2023

2019a

2019b

| Dataset | Conversational | Answer Type | Domain |
| --- | --- | --- | --- |
| MCTest (Richardson et al., 2013) | ✗ | Multiple choice | Children’s stories |
| CNN/Daily Mail (Hermann et al., 2015) | ✗ | Spans | News |
| Children's book test (Hill et al., 2016) | ✗ | Multiple choice | Children’s stories |
| SQuAD (Rajpurkar et al., 2016) | ✗ | Spans | Wikipedia |
| MS MARCO (Nguyen et al., 2016) | ✗ | Free-form text, Unanswerable | Web Search |
| NewsQA (Trischler et al., 2017) | ✗ | Spans | News |
| SearchQA (Dunn et al., 2017) | ✗ | Spans | Jeopardy |
| TriviaQA (Joshi et al., 2017) | ✗ | Spans | Trivia |
| RACE (Lai et al., 2017) | ✗ | Multiple choice | Mid/High School Exams |
| Narrative QA (Kocisky et al., 2018) | ✗ | Free-form text | Movie Scripts, Literature |
| SQuAD 2.0 (Rajpurkar et al., 2018) | ✗ | Spans, Unanswerable | Wikipedia |
| CoQA (this work) | ✓ | Free-form text, Unanswerable; each answer comes with a text span rationale | Children’s Stories, Literature, Mid/High School Exams, News, Wikipedia, Reddit, Science |

Table 1: Comparison of CoQA with existing reading comprehension datasets.
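The CoQA row above is distinctive in two ways: answers are free-form text rather than extracted spans, and each answer is paired with a text span rationale from the passage. A minimal sketch of one such conversational turn follows; field names and values are illustrative, not the exact CoQA release schema.

```python
# Sketch of a CoQA-style conversational QA turn: a free-form answer plus
# an extractive rationale span, per the table above. Field names and
# values are illustrative, not the exact CoQA release schema.
coqa_turn = {
    "story": ("Jessica went to sit in her rocking chair. "
              "Today was her birthday and she was turning 80."),
    "turn_id": 1,
    "question": "How old is Jessica?",
    "answer": {
        "input_text": "80",                 # free-form answer written by a human
        "span_text": "she was turning 80",  # rationale span from the story
        "span_start": 69,                   # character offsets of the rationale
        "span_end": 87,
    },
}

# The rationale offsets must point at the quoted span in the story.
a = coqa_turn["answer"]
assert coqa_turn["story"][a["span_start"]:a["span_end"]] == a["span_text"]
```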

2018a

| Dataset | Documents | Questions | Answers |
| --- | --- | --- | --- |
| MCTest (Richardson et al., 2013) | 660 short stories, grade school level | 2640 human generated, based on the document | multiple choice |
| CNN/Daily Mail (Hermann et al., 2015) | 93K+220K news articles | 387K+997K Cloze-form, based on highlights | entities |
| Children’s Book Test (CBT) (Hill et al., 2016) | 687K of 20 sentence passages from 108 children’s books | Cloze-form, from the 21st sentence | multiple choice |
| BookTest (Bajgar et al., 2016) | 14.2M, similar to CBT | Cloze-form, similar to CBT | multiple choice |
| SQuAD (Rajpurkar et al., 2016) | 23K paragraphs from 536 Wikipedia articles | 108K human generated, based on the paragraphs | spans |
| NewsQA (Trischler et al., 2016) | 13K news articles from the CNN dataset | 120K human generated, based on headline, highlights | spans |
| MS MARCO (Nguyen et al., 2016) | 1M passages from 200K+ documents retrieved using the queries | 100K search queries | human generated, based on the passages |
| SearchQA (Dunn et al., 2017) | 6.9M passages retrieved from a search engine using the queries | 140K human generated Jeopardy! questions | human generated Jeopardy! answers |
| NarrativeQA (this paper) | 1,572 stories (books, movie scripts) & human generated summaries | 46,765 human generated, based on summaries | human generated, based on summaries |

Table 1: Comparison of datasets.
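Several rows above use Cloze-form questions, where a word is deleted from a held-out sentence and the system must restore it; CBT, for instance, forms the query from the 21st sentence of each 20-sentence passage. Below is a simplified sketch of that construction; the real CBT restricts the blank to specific word classes and supplies 10 candidate answers, which is omitted here.

```python
import random
import re

def make_cloze_item(sentences, rng=random.Random(0)):
    """Build a CBT-style cloze item: the first 20 sentences form the
    passage, and one word is blanked out of the 21st sentence.

    Simplified sketch: the real CBT restricts the blank to named
    entities, common nouns, verbs, or prepositions, and pairs the
    query with 10 candidate answers drawn from the passage.
    """
    passage, query_sentence = sentences[:20], sentences[20]
    words = re.findall(r"\w+", query_sentence)
    answer = rng.choice(words)
    query = query_sentence.replace(answer, "XXXXX", 1)  # blank the word
    return {"passage": passage, "query": query, "answer": answer}
```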

2018b

  1. SQuADRUn: "SQuAD with adveRsarial Unanswerable questions", the code name for the SQuAD 2.0 dataset.
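In the released SQuAD 2.0 JSON, these adversarial unanswerable questions are marked with a per-question is_impossible flag. A small sketch that tallies answerable versus unanswerable questions, assuming the standard train-v2.0.json layout:

```python
import json

def count_unanswerable(path):
    """Tally answerable vs. unanswerable questions in a SQuAD 2.0 file.

    Assumes the standard release layout (data -> paragraphs -> qas),
    where each question carries an is_impossible flag."""
    answerable = unanswerable = 0
    with open(path) as f:
        squad = json.load(f)
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    unanswerable += 1
                else:
                    answerable += 1
    return answerable, unanswerable
```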

2018c

2016