2013 TowardstheMachineComprehensionofTextAnEssay

From GM-RKB
  • (Burges, 2013) ⇒ Christopher J.C. Burges. (2013). “Towards the Machine Comprehension of Text: An Essay.” In: Microsoft Research Technical Report, MSR-TR-2013-125.

Subject Headings: Text Comprehension.

Notes

Cited By

Quotes

Introduction

The Machine Comprehension of Text (MCT) [1] has been a central goal of Artificial Intelligence for over fifty years. How does one even define “machine comprehension”? Researchers often invoke the Turing test to this end (a machine attains human-level intelligence if its responses in a dialog with a human are indistinguishable from those of another human (Turing, 1950)), but as Levesque (2013) recently pointed out, this definition has resulted in workers focusing on the wrong task, namely fooling humans, rather than achieving machine intelligence. But even if researchers could be persuaded to focus on the AI part of the Turing test, the test is still a false goal, in the sense that the typical user would be happy to know that she is having a dialog with a machine, if this were a result of her knowing that no human could possibly be that smart. Perhaps shoehorning the research to meet the goal of appearing human-like is a red herring. Levesque also suggests multiple choice tests that require world knowledge (for example, to solve the anaphora problem) as a suitable replacement for the Turing test. We will return to multiple choice tests below.

But this still leaves the definition of machine comprehension tied to the data used to construct the tests. It seems useful to define the task more generally, but still operationally, and to this end we suggest the following: a machine comprehends a passage of text if, for any question regarding that text that can be answered correctly by a majority of native speakers, that machine can provide a string which those speakers would agree both answers that question and does not contain information irrelevant to that question. Thus we can define machine comprehension in terms of Question Answering in its most general form.

Much has changed since the early days, and we can hope that recent advances, such as the emergence of large, distantly labeled datasets (e.g., text on the Web), the availability of orders of magnitude more computing power, and the development of powerful and principled mathematical models, will lead to real progress. The goal of this essay is to examine what might be needed to solve the problem of the machine comprehension of text.

[1] We prefer this term as more precise than other terms such as Machine Reading (but machines have been reading since the days of punch cards) and Natural Language Understanding (which, as a challenge, can equally apply to people).
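The operational definition above can be read as an evaluation protocol. The sketch below is one illustrative formalization, not anything given in the essay: the callables (machine_answer, majority_can_answer, majority_accepts) are hypothetical stand-ins for the system under test and for majority votes by native speakers.

```python
# Illustrative reading of the operational definition of machine
# comprehension as an evaluation protocol. The callables are hypothetical
# stand-ins for human majority judgments and for the system under test;
# they are not defined in the essay.
from typing import Callable, Iterable


def comprehends(passage: str,
                questions: Iterable[str],
                machine_answer: Callable[[str, str], str],
                majority_can_answer: Callable[[str, str], bool],
                majority_accepts: Callable[[str, str, str], bool]) -> bool:
    """True if, for every question that a majority of native speakers can
    answer correctly, the machine returns a string those speakers agree
    both answers the question and contains nothing irrelevant to it."""
    for question in questions:
        if not majority_can_answer(passage, question):
            continue  # only questions humans can answer correctly count
        answer = machine_answer(passage, question)
        if not majority_accepts(passage, question, answer):
            return False
    return True
```

In this reading, comprehension is judged entirely by human acceptance of the machine's answer strings, which is what ties the definition to Question Answering in its most general form.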

1.1 How To Measure Progress

Levesque (2013) suggests multiple-choice question answering as a better alternative to the Turing test. To spur research in this direction we have made available a dataset of 660 fictional short stories, created using crowdsourcing and aimed at the reading level of a typical 7-year-old (Richardson et al., 2013). Each story is accompanied by four multiple-choice questions.

The Winograd Schema Test proposal (Levesque, 2013) uses questions that require significant expertise to generate, since the question/answer pairs are carefully designed to require background knowledge (for example, in “The ball fell through the table because it was made of paper”, to what does “it” refer?). Using crowdsourcing to generate the data, on the other hand, has the significant advantage of scalability. We also have some control over the difficulty of the task by restricting the available vocabulary. If progress is rapid and the dataset turns out to be too easy, we can increase the vocabulary from the current 8,000 words, incorporate non-fictional writing, and, if necessary, change the task definition by, for example, (1) not requiring that exactly one of the four alternative answers always be correct, but instead allowing more than one, one, or no correct answers per question, or (2) requiring that more answers require reasoning over several sentences (for the current set, workers were asked to make at least two questions answerable only by combining information from at least two sentences; this could be tightened further by requiring that the two sentences be separated in the text).

It is interesting that, while random guessing will get 25% of the questions correct, a simple token-based baseline achieves approximately 60% correct, and early results using a modern textual entailment system are similar (Richardson et al., 2013).
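The essay does not spell out the token-based baseline, so the sketch below is only one plausible variant: a bag-of-words overlap scorer that picks the answer sharing the most tokens with the story. The tokenization and all function names (tokenize, score_answer, choose_answer) are illustrative assumptions; the baseline reported in Richardson et al. (2013) is a sliding-window variant and differs in details.

```python
# Minimal sketch of a token-overlap baseline for multiple-choice reading
# comprehension. Names and tokenization are illustrative; the actual
# baseline in Richardson et al. (2013) uses a sliding window.
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def score_answer(story: str, question: str, answer: str) -> int:
    """Count how many question+answer tokens also appear in the story."""
    story_tokens = set(tokenize(story))
    hypothesis = tokenize(question) + tokenize(answer)
    return sum(1 for tok in hypothesis if tok in story_tokens)


def choose_answer(story: str, question: str, answers: List[str]) -> int:
    """Return the index of the candidate answer with the highest overlap."""
    scores = [score_answer(story, question, a) for a in answers]
    return max(range(len(answers)), key=lambda i: scores[i])


if __name__ == "__main__":
    story = ("James went to the store. He bought a red ball. "
             "Then he played with his dog in the park.")
    question = "What did James buy?"
    answers = ["a red ball", "a blue kite", "a new hat", "a bone"]
    print(answers[choose_answer(story, question, answers)])  # -> a red ball
```

Because such a scorer rewards only lexical overlap with the story, it suggests why roughly 60% of the current questions can be answered this way, and why requiring more answers to combine information from sentences separated in the text would make the task harder for baselines of this kind.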

2 Desiderata and some Recent Work

4 Discussion

Rule-building has been around since the beginning of AI, and researchers are understandably wary about relying too heavily upon it. The main reason that expert systems did not solve AI is that, with rare exceptions, updating rules manually is not a feasible large-scale solution. However, techniques that require significant up-front manual labor, but which are then largely automated, can be scalable, as NELL has shown; perhaps some form of scalable rule building and rule learning can be made to work.

The use of machine learning, although clearly very powerful in many scenarios, can also act as a brake on progress, since such systems are typically not interpretable and so are difficult to improve further without yet more labeled data. They also currently lack the other key properties we describe above. However, such powerful tools should clearly be used when appropriate. One approach may be to limit their use to modeling the uncertainty that remains in the data after all structure in the data has been maximally leveraged. The fact that text is usually unambiguous to humans also suggests that, for people at least, the modeling of uncertainty is not the central problem being solved. Saving our current machine learning algorithms (deep or otherwise) for those situations where uncertainty must be modeled and where labels are extremely simple, and instead searching for different approaches designed for richly structured yet unambiguous text, is not meant to suggest that we abandon the rich mathematical foundations upon which machine learning rests; the same search for disciplined methods should serve us well here, too.

References

  • (Turing, 1950) ⇒ Alan M. Turing. (1950). “Computing Machinery and Intelligence.” In: Mind, 59(236).
  • (Levesque, 2013) ⇒ Hector J. Levesque. (2013). “On Our Best Behaviour.” In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI 2013).
  • (Richardson et al., 2013) ⇒ Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. (2013). “MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text.” In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).