2002 LearningSurfaceTextPatternsForAQASystem

(Ravichandran & Hovy, 2002) ⇒ Deepak Ravichandran, Eduard Hovy. (2002). “Learning Surface Text Patterns for a Question Answering System.” In: [[journal::Proceedings of the 40th Annual Meeting on Association for Computational Linguistics]] (ACL 2002).

Subject Headings: Pattern-based Semantic Relation Recognition, Semi-Supervised Learning Algorithm, Question Answering Task.

Notes

It reports on several open challenges and opportunities.
- Named entity recognition and word sense would help to disqualify some patterns.
- Surface patterns cannot handle long-distance dependencies, thus reducing recall.
- It would be helpful to predict the size of the entity mention.
- It would be helpful to distinguish between upper-case and lower-case letters, e.g. micron vs. Micron.
- It would be helpful to canonicalize entity mentions, such as the dates “1869”, “Oct. 2, 1869”, “2nd October 1869”, “October 2 1869” and the names “Mahatma Gandhi”, “Mohandas Karamchand Gandhi”.

Cited By

Quotes

Abstract

In this paper we explore the power of surface text patterns for open-domain question answering systems. In order to obtain an optimal set of patterns, we have developed a method for learning such patterns automatically. A tagged corpus is built from the Internet in a bootstrapping process by providing a few hand-crafted examples of each question type to Altavista. Patterns are then automatically extracted from the returned documents and standardized. We calculate the precision of each pattern, and the average precision for each question type. These patterns are then applied to find answers to new questions. Using the TREC-10 question set, we report results for two cases: answers determined from the TREC-10 corpus and from the web.

1 Introduction

…

However, at the recent TREC-10 QA evaluation (Voorhees, 01), the winning system used just one resource: a fairly extensive list of surface patterns (Soubbotin and Soubbotin, 01). The apparent power of such patterns surprised many. We therefore decided to investigate their potential by acquiring patterns automatically and to measure their accuracy.

It has been noted in several QA systems that certain types of answer are expressed using characteristic phrases (Lee et al., 01; Wang et al., 01). For example, for BIRTHDATEs (with questions like “When was X born?”), typical answers are

“Mozart was born in 1756.”
“Gandhi (1869–1948)…”

These examples suggest that phrases like

“<NAME> was born in <BIRTHDATE>”
“<NAME> (<BIRTHDATE>–”

when formulated as regular expressions, can be used to locate the correct answer.
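As an illustration that is not taken from the paper, the two phrases above could be turned into regular expressions roughly as follows. The helper names `pattern_to_regex` and `find_birthdate`, and the assumption that a birth date is a four-digit year, are choices made for this sketch only.

```python
import re

# The two surface phrases quoted above.
PATTERNS = ["<NAME> was born in <BIRTHDATE>", "<NAME> (<BIRTHDATE>–"]


def pattern_to_regex(pattern, name):
    """Build a regex from a surface phrase for one concrete question term."""
    pieces = []
    for part in re.split(r"(<NAME>|<BIRTHDATE>)", pattern):
        if part == "<NAME>":
            pieces.append(re.escape(name))
        elif part == "<BIRTHDATE>":
            # Assumption for this sketch: the birth date is a 4-digit year.
            pieces.append(r"(?P<answer>\d{4})")
        else:
            pieces.append(re.escape(part))  # literal text of the phrase
    return re.compile("".join(pieces))


def find_birthdate(name, sentence):
    """Return the first year captured by any pattern, or None."""
    for pattern in PATTERNS:
        match = pattern_to_regex(pattern, name).search(sentence)
        if match:
            return match.group("answer")
    return None
```

Under these assumptions, `find_birthdate("Mozart", "Mozart was born in 1756.")` and `find_birthdate("Gandhi", "Gandhi (1869–1948)…")` each return the year that follows the name.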

2 Learning of Patterns

We describe the pattern-learning algorithm with an example. A table of patterns is constructed for each individual question type by the following procedure (Algorithm 1).

1. Select an example for a given question type. Thus for BIRTHYEAR questions we select “Mozart 1756” (we refer to “Mozart” as the question term and “1756” as the answer term).
2. Submit the question and the answer term as queries to a search engine. Thus, we give the query +“Mozart” +“1756” to AltaVista (http://www.altavista.com).
3. Download the top 1000 web documents provided by the search engine.
4. Apply a sentence breaker to the documents.
5. Retain only those sentences that contain both the question and the answer term. Tokenize the input text, smooth variations in white space characters, and remove html and other extraneous tags, to allow simple regular expression matching tools such as egrep to be used.
6. Pass each retained sentence through a suffix tree constructor. This finds all substrings, of all lengths, along with their counts. For example consider the sentences “The great composer Mozart (1756–1791) achieved fame at a young age”, “Mozart (1756–1791) was a genius”, and “The whole world would always be indebted to the great music of Mozart (1756–1791)”. The longest matching substring for all 3 sentences is “Mozart (1756–1791)”, which the suffix tree would extract as one of the outputs along with the score of 3.
7. Pass each phrase in the suffix tree through a filter to retain only those phrases that contain both the question and the answer term. For the example, we extract only those phrases from the suffix tree that contain the words “Mozart” and “1756”.
8. Replace the word for the question term by the tag “<NAME>” and the word for the answer term by the tag “<ANSWER>”.
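The counting, filtering, and tagging steps of this procedure can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it enumerates token n-grams by brute force instead of building a suffix tree, it counts the number of sentences that contain each phrase, and the names `tokenize`, `phrase_counts`, and `learn_patterns` are hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Split into words and single punctuation marks (a simplifying assumption)."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def phrase_counts(sentences):
    """For every token n-gram, count how many sentences contain it.

    A brute-force stand-in for the suffix tree described above.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        ngrams = {
            " ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens) + 1)
        }
        counts.update(ngrams)
    return counts


def learn_patterns(sentences, question_term, answer_term):
    """Keep phrases containing both terms and replace the terms by tags."""
    patterns = Counter()
    for phrase, count in phrase_counts(sentences).items():
        if question_term in phrase and answer_term in phrase:
            tagged = phrase.replace(question_term, "<NAME>")
            tagged = tagged.replace(answer_term, "<ANSWER>")
            patterns[tagged] += count
    return patterns


sentences = [
    "The great composer Mozart (1756–1791) achieved fame at a young age",
    "Mozart (1756–1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756–1791)",
]
patterns = learn_patterns(sentences, "Mozart", "1756")
```

With the three example sentences, the phrase shared by all of them yields the pattern `<NAME> ( <ANSWER> –` with a count of 3, while longer phrases that occur in only one sentence get a count of 1.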

3 Experiments

From our Webclopedia QA Typology (Hovy et al., 2002a) we selected 6 different question types: BIRTHDATE, LOCATION, INVENTOR, DISCOVERER, DEFINITION, WHY-FAMOUS. The pattern table for each of these question types was constructed using Algorithm 1.

Some of the patterns obtained along with their precision are as follows

BIRTHYEAR
1.00 <NAME> (<ANSWER> - )
0.85 <NAME> was born on <ANSWER>,
0.60 <NAME> was born in <ANSWER>
0.59 <NAME> was born <ANSWER>
0.53 <ANSWER> <NAME> was born
0.50 - <NAME> (<ANSWER>
0.36 <NAME> (<ANSWER> -
0.32 <NAME> (<ANSWER> ),
0.28 born in <ANSWER>, <NAME>
0.20 of <NAME> (<ANSWER>

INVENTOR
1.0 <ANSWER> invents <NAME>
1.0 the <NAME> was invented by <ANSWER>
1.0 <ANSWER> invented the <NAME> in
1.0 <ANSWER> ' s invention of the <NAME>
1.0 <ANSWER> invents the <NAME> .
1.0 <ANSWER> ' s <NAME> was
1.0 <NAME>, invented by <ANSWER>
1.0 <ANSWER> ' s <NAME> and
1.0 that <ANSWER> ' s <NAME>
1.0 <NAME> was invented by <ANSWER>,

DISCOVERER
1.0 when <ANSWER> discovered <NAME>
1.0 <ANSWER> ' s discovery of <NAME>
1.0 <ANSWER>, the discoverer of <NAME>
1.0 <ANSWER> discovers <NAME> .
1.0 <ANSWER> discover <NAME>
1.0 <ANSWER> discovered <NAME>, the
1.0 discovery of <NAME> by <ANSWER>.
0.95 <NAME> was discovered by <ANSWER>
0.91 of <ANSWER> ' s <NAME>
0.9 <NAME> was discovered by <ANSWER> in

DEFINITION
1.0 <NAME> and related <ANSWER>s
1.0 <ANSWER> (<NAME>,
1.0 <ANSWER>, <NAME> .
1.0 , a <NAME> <ANSWER>,
1.0 (<NAME> <ANSWER> ),
1.0 form of <ANSWER>, <NAME>
1.0 for <NAME>, <ANSWER> and
1.0 cell <ANSWER>, <NAME>
1.0 and <ANSWER> > <ANSWER> > <NAME>
0.94 as <NAME>, <ANSWER> and

WHY-FAMOUS
1.0 <ANSWER> <NAME> called
1.0 laureate <ANSWER> <NAME>
1.0 by the <ANSWER>, <NAME>,
1.0 <NAME> - the <ANSWER> of
1.0 <NAME> was the <ANSWER> of
0.84 by the <ANSWER> <NAME>,
0.8 the famous <ANSWER> <NAME>,
0.73 the famous <ANSWER> <NAME>
0.72 <ANSWER> > <NAME>
0.71 <NAME> is the <ANSWER> of

LOCATION
1.0 <ANSWER> ' s <NAME> .
1.0 regional : <ANSWER> : <NAME>
1.0 to <ANSWER> ' s <NAME>,
1.0 <ANSWER> ' s <NAME> in
1.0 in <ANSWER> ' s <NAME>,
1.0 of <ANSWER> ' s <NAME>,
1.0 at the <NAME> in <ANSWER>
0.96 the <NAME> in <ANSWER>,
0.92 from <ANSWER> ' s <NAME>
0.92 near <NAME> in <ANSWER>
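To make the use of such a table concrete, the sketch below applies a few of the BIRTHYEAR patterns listed above to candidate sentences and ranks the extracted answers by pattern precision. It is an illustration under stated assumptions rather than the system described in the paper: `compile_pattern` and `rank_answers` are hypothetical names, the answer is assumed to be a single word, each answer keeps the highest precision among the patterns that found it, and the example sentences are invented.

```python
import re

# (precision, pattern) pairs copied from the BIRTHYEAR table above.
BIRTHYEAR_TABLE = [
    (0.60, "<NAME> was born in <ANSWER>"),
    (0.36, "<NAME> (<ANSWER> -"),
    (0.20, "of <NAME> (<ANSWER>"),
]


def compile_pattern(pattern, name):
    """Escape the literal text, then substitute the name and an answer group."""
    regex = re.escape(pattern)
    regex = regex.replace(re.escape("<NAME>"), re.escape(name))
    # Assumption: the answer is exactly one word.
    regex = regex.replace(re.escape("<ANSWER>"), r"(?P<answer>\w+)")
    return re.compile(regex)


def rank_answers(name, sentences, table):
    """Return (answer, precision) pairs, best precision first."""
    best = {}
    for precision, pattern in table:
        regex = compile_pattern(pattern, name)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                answer = match.group("answer")
                best[answer] = max(best.get(answer, 0.0), precision)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Invented example sentences for the question term "Mozart".
candidates = [
    "Mozart was born in 1756.",
    "A biography of Mozart (1756 - 1791) appeared.",
    "Haydn was born in 1732.",
    "the operas of Mozart (Figaro and others)",
]
ranked = rank_answers("Mozart", candidates, BIRTHYEAR_TABLE)
```

Here the low-precision pattern `of <NAME> (<ANSWER>` also extracts a wrong answer, which then ranks below the answer found by the higher-precision pattern.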

4 Shortcoming and Extensions

No external knowledge has been added to these patterns. We frequently observe the need for matching part of speech and/or semantic types, however. For example, the question: “Where are the Rocky Mountains located?” is answered by “Denver’s new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty”, because the system picked the answer “the background” using the pattern “the <NAME> in <ANSWER>,”. Using a named entity tagger and/or an ontology would enable the system to use the knowledge that “background” is not a location. DEFINITION questions pose a related problem. Frequently the system’s patterns match a term that is too general, though correct technically. For “what is nepotism?” the pattern “<ANSWER>, <NAME>” matches “…in the form of widespread bureaucratic abuses: graft, nepotism…”; for “what is sonar?” the pattern “<NAME> and related <ANSWER>s” matches “…while its sonar and related underseas systems are built…”.
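A type check of the kind suggested here could look like the following sketch. The gazetteer is a toy, hypothetical stand-in for a named entity tagger or ontology, and `normalize` and `filter_locations` are names introduced for illustration only.

```python
# Hypothetical toy gazetteer standing in for a named entity tagger or ontology.
KNOWN_LOCATIONS = {"north america", "colorado", "denver"}


def normalize(candidate):
    """Lower-case the candidate and drop a leading article."""
    text = candidate.strip().lower()
    for article in ("the ", "a ", "an "):
        if text.startswith(article):
            text = text[len(article):]
            break
    return text


def filter_locations(candidates, gazetteer=KNOWN_LOCATIONS):
    """Keep only candidate answers that the gazetteer knows as locations."""
    return [c for c in candidates if normalize(c) in gazetteer]
```

With this toy list, `filter_locations(["the background", "North America"])` would discard “the background” and keep “North America”.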

The patterns cannot handle long-distance dependencies. For example, for “Where is London?” the system cannot locate the answer in “London, which has one of the most busiest airports in the world, lies on the banks of the river Thames” due to the explosive danger of unrestricted wildcard matching, as would be required in the pattern “<QUESTION>, (<any_word>)*, lies on <ANSWER>”. This is one of the reasons why the system performs very well on certain types of questions from the web but performs poorly with documents obtained from the TREC corpus. The abundance and variation of data on the Internet allow the system to find an instance of its patterns without losing answers to long-term dependencies. The TREC corpus, on the other hand, typically contains fewer candidate answers for a given question and many of the answers present may match only long-term dependency patterns.

More information needs to be added to the text patterns regarding the length of the answer phrase to be expected. The system searches in the range of 50 bytes of the answer phrase to capture the pattern. It fails to perform under certain conditions as exemplified by the question “When was Lyndon B. Johnson born?”. The system selects the sentence “Tower gained national attention in 1960 when he lost to democratic Sen. Lyndon B. Johnson, who ran for both reelection and the vice presidency” using the pattern “<NAME> <ANSWER> –”. The system lacks the information that the <ANSWER> tag should be replaced exactly by one word. Simple extensions could be made to the system so that instead of searching in the range of 50 bytes for the answer phrase it could search for the answer in the range of 1–2 chunks (basic phrases in English such as simple NP, VP, PP, etc.).
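The effect of the answer-length restriction can be illustrated with the pattern “<NAME> <ANSWER> –”. In the sketch below, `LOOSE` approximates the 50-byte window with up to 50 characters and `STRICT` allows exactly one whitespace-delimited token; both names, the `extract` helper, and the two tokenized example sentences are assumptions made for illustration, not material from the paper.

```python
import re

LOOSE = r".{1,50}?"  # up to 50 characters, approximating the 50-byte window
STRICT = r"\S+"      # exactly one whitespace-delimited token


def extract(name, sentence, answer_regex):
    """Apply the pattern "<NAME> <ANSWER> –" with the given answer sub-pattern."""
    regex = re.compile(re.escape(name) + " (?P<answer>" + answer_regex + ") –")
    match = regex.search(sentence)
    return match.group("answer") if match else None


name = "Lyndon B. Johnson"
# Hypothetical, already tokenized sentences.
good = "Lyndon B. Johnson 1908 – 1973"
bad = "Lyndon B. Johnson , who ran for reelection and the vice presidency – lost"
```

On `good`, both variants capture the single token after the name. On `bad`, the loose variant captures the whole clause between the name and the dash, while the strict variant finds no match.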

A more serious limitation is that the present framework can handle only one anchor point (the question term) in the candidate answer sentence. It cannot work for types of question that require multiple words from the question to be in the answer sentence, possibly apart from each other. For example, in “Which county does the city of Long Beach lie?”, the answer “Long Beach is situated in Los Angeles County” requires the pattern “<QUESTION_TERM_1> situated in <ANSWER> <QUESTION_TERM_2>”, where <QUESTION_TERM_1> and <QUESTION_TERM_2> represent the terms “Long Beach” and “county” respectively. The performance of the system depends significantly on there being only one anchor word, which allows a single word match between the question and the candidate answer sentence. The presence of multiple anchor words would help to eliminate many of the candidate answers by simply using the condition that all the anchor words from the question must be present in the candidate answer sentence.
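The condition that all anchor words must be present can be expressed as a simple filter. This is a minimal sketch with case-insensitive substring matching; the function names and the second example sentence are invented for illustration.

```python
def contains_all_anchors(sentence, anchors):
    """True if every anchor term occurs in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return all(anchor.lower() in lowered for anchor in anchors)


def filter_candidates(sentences, anchors):
    """Discard candidate sentences that miss any anchor term."""
    return [s for s in sentences if contains_all_anchors(s, anchors)]


anchors = ["Long Beach", "county"]
sentences = [
    "Long Beach is situated in Los Angeles County",
    "Long Beach has a large container port",  # invented example
]
kept = filter_candidates(sentences, anchors)
```

Only the first sentence survives, because the second one lacks the anchor “county”.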

The system does not classify or make any distinction between upper and lower case letters. For example, “What is micron?” is answered by “In Boise, Idaho, a spokesman for Micron, a maker of semiconductors, said Simms are ‘ a very high volume product for us …’ ”. The answer returned by the system would have been perfect if the word “micron” had been capitalized in the question. Canonicalization of words is also an issue. While giving examples in the bootstrapping procedure, say, for BIRTHDATE questions, the answer term could be written in many ways (for example, Gandhi’s birth date can be written as “1869”, “Oct. 2, 1869”, “2nd October 1869”, “October 2 1869”, and so on). Instead of listing all the possibilities, a date tagger could be used to cluster all the variations and tag them with the same term. The same idea could also be extended for smoothing out the variations in the question term for names of persons (Gandhi could be written as “Mahatma Gandhi”, “Mohandas Karamchand Gandhi”, etc.).
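A date tagger of the kind suggested here might map every variant to one canonical value. The following is a deliberately small sketch that only covers the four spellings quoted above; `canonical_date` is a hypothetical helper, not part of the described system, and a real tagger would need far broader coverage.

```python
import re

MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def canonical_date(text):
    """Return (year, month, day); month and day are None when absent."""
    year = re.search(r"\b(\d{4})\b", text)
    if year is None:
        return None
    month = None
    for word in re.findall(r"[A-Za-z]+", text):
        # Match month names by their first three letters ("Oct", "October").
        month = MONTHS.get(word[:3].lower(), month)
    # Look for a day number in the text that remains once the year is removed.
    rest = text[:year.start()] + text[year.end():]
    day = re.search(r"\b(\d{1,2})(?:st|nd|rd|th)?\b", rest)
    return (int(year.group(1)), month, int(day.group(1)) if day else None)


variants = ["Oct. 2, 1869", "2nd October 1869", "October 2 1869"]
canonical = {canonical_date(v) for v in variants}
```

All three full-date variants collapse to the same tuple, so patterns learned from any one spelling could share a single answer tag; the bare year “1869” maps to a tuple with the month and day left empty.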

Conclusion

The web results easily outperform the TREC results. This suggests that there is a need to integrate the outputs of the Web and the TREC corpus. Since the output from the Web contains many correct answers among the top ones, a simple word count could help in eliminating many unlikely answers. This would work well for question types like BIRTHDATE or LOCATION but is not clear for question types like DEFINITION.
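The simple word count mentioned above can be sketched as a frequency vote over candidate answers. The `min_count` threshold and the function name `vote` are assumptions for illustration, and the candidate list is invented.

```python
from collections import Counter


def vote(candidates, min_count=2):
    """Count normalized candidate answers and keep the frequent ones."""
    counts = Counter(c.strip().lower() for c in candidates)
    return [(answer, n) for answer, n in counts.most_common() if n >= min_count]


# Invented candidate answers, e.g. for "When was Mozart born?"
web_candidates = ["1756", "1756", "1791", "1756", "Salzburg"]
frequent = vote(web_candidates)
```

Answers that appear only once are dropped. As the text notes, this kind of counting suits short factual answers better than free-form DEFINITION answers, where correct answers rarely repeat verbatim.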

The simplicity of this method makes it perfect for multilingual QA. Many tools required by sophisticated QA systems (named entity taggers, parsers, ontologies, etc.) are language specific and require significant effort to adapt to a new language. Since the answer patterns used in this method are learned using only a small number of manual training terms, one can rapidly learn patterns for new languages, assuming the web search engine is appropriately switched.

References

Brill, E., J. Lin, Michele Banko, S. Dumais, and A. Ng. (2001). Data-Intensive Question Answering. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 183--189.
Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Chapter 6: Linear Time construction of Suffix trees, 94--121.
Sanda M. Harabagiu, Dan Moldovan, Marius Paşca, Rada Mihalcea, Mihai Surdeanu, R. Bunescu, R. Gîrju, V. Rus and P. Morarescu. (2001). FALCON: Boosting Knowledge for Answer Engines. Proceedings of the 9th Text Retrieval Conference (TREC-9), NIST, 479--488.
Eduard Hovy, U. Hermjakob, and C.-Y. Lin. (2001). The Use of External Knowledge in Factoid QA. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 166--174.
Eduard Hovy, U. Hermjakob, and D. Ravichandran. 2002a. A Question/Answer Typology with Surface Text Patterns. Proceedings of the Human Language Technology (HLT) conference. San Diego, CA.
Eduard Hovy, Ulf Hermjakob, Chin-Yew Lin, Deepak Ravichandran, Using knowledge to facilitate factoid answer pinpointing, Proceedings of the 19th International Conference on Computational linguistics, p.1-7, August 24-September 01, 2002, Taipei, Taiwan doi:10.3115/1072228.1072270
Lee, G. G., J. Seo, S. Lee, H. Jung, B-H. Cho, C. Lee, B-K. Kwak, J. Cha, D. Kim, J-H. An, H. Kim, and K. Kim. (2001). SiteQ: Engineering High Performance QA System Using Lexico-Semantic Pattern Matching and Shallow NLP. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 437--446.
Chin-Yew Lin, The effectiveness of dictionary and web-based answer reranking, Proceedings of the 19th International Conference on Computational linguistics, p.1-7, August 24-September 01, 2002, Taipei, Taiwan doi:10.3115/1072228.1072254
Prager, J. and J. Chu-Carroll. (2001). Use of WordNet Hypernyms for Answering What-Is Questions. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 309--316.
Ellen Riloff (1996). Automatically Generating Extraction Patterns from Untagged Text. Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1044--1049.
Soubbotin, M. M. and S. M. Soubbotin. (2001). Patterns of Potential Answer Expressions as Clues to the Right Answer. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 175--182.
Rohini K Srihari, Wei Li, A question answering system supported by Information Extraction, Proceedings of the sixth Conference on Applied Natural Language Processing, p.166-172, April 29-May 04, 2000, Seattle, Washington doi:10.3115/974147.974170
Ellen Voorhees. (2001). Overview of the Question Answering Track. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 157--165.
Wang, B., H. Xu, Z. Yang, Y. Liu, X. Cheng, D. Bu, and S. Bai. (2001). TREC-10 Experiments at CAS-ICT: Filtering, Web, and QA. Proceedings of the TREC-10 Conference. NIST, Gaithersburg, MD, 229--241.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2002 LearningSurfaceTextPatternsForAQASystem	Eduard Hovy Deepak Ravichandran			Learning Surface Text Patterns for a Question Answering System			http://dx.doi.org/10.3115/1073083.1073092	10.3115/1073083.1073092		2002