1997 InformationExtraction

(Grishman, 1997) ⇒ Ralph Grishman. (1997). “Information extraction: Techniques and challenges.” In: Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, International Summer School, (SCIE 1997).

Subject Headings: Information Extraction Task

Notes

Cited By

(Agichtein, 2005) ⇒ Eugene Agichtein. (2005). “Scaling Information Extraction to Large Document Collections.” In: IEEE Data Eng. Bull., 28(4).
- The general information extraction process is outlined in Figure 1 (adapted from [15]).

Quotes

Abstract

Introduction

This volume takes a broad view of information extraction as any method for filtering information from large volumes of text. This includes the retrieval of documents from collections and the tagging of particular terms in text. In this paper we shall use a narrower definition: the identification of instances of a particular class of events or relationships in a natural language text, and the extraction of the relevant arguments of the event or relationship. Information extraction therefore involves the creation of a structured representation (such as a data base) of selected information drawn from the text.
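The narrow definition above has three parts: find instances of one class of event, pull out that event's arguments, and store them as structured rows. A toy Python sketch can illustrate it. The "appointment" event type, the single regular expression, and the names `Appointment` and `extract_appointments` are hypothetical. They are not taken from the paper. A real extractor uses the layered local and discourse analysis of the overall flow rather than one surface pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class Appointment:
    """One row of the output table: the arguments of an appointment event."""
    person: str
    position: str
    organization: str


# Hypothetical surface pattern: "<Person> was named <position> of <Organization>".
# It stands in for the much richer analysis a real extraction system performs.
APPOINTMENT_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was named "
    r"(?P<position>[a-z ]+?) of "
    r"(?P<organization>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)"
)


def extract_appointments(text: str) -> list[Appointment]:
    """Find every instance of the event and keep only its arguments."""
    return [
        Appointment(m["person"], m["position"], m["organization"])
        for m in APPOINTMENT_PATTERN.finditer(text)
    ]


doc = "Sam Smith was named chief executive officer of Acme Corp. Sales rose."
for row in extract_appointments(doc):
    print(row.person, "|", row.position, "|", row.organization)
```

Text that does not match the event class, such as "Sales rose.", contributes nothing to the table. This is the filtering aspect of the definition.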
The idea of reducing the information in a document to a tabular structure is not new. Its feasibility for sublanguage texts was suggested by Zellig Harris in the 1950s, and an early implementation for medical texts was done at New York University by Naomi Sager [20]. However, the specific notion of information extraction described here has received wide currency over the last decade through the series of Message Understanding Conferences [1, 2, 3, 4, 14]. We shall discuss these Conferences in more detail a bit later, and shall use simplified versions of extraction tasks from these Conferences as examples throughout this paper.

The Overall Flow

The process of information extraction has two major parts. First, the system extracts individual "facts" from the text of a document through local text analysis. Second, it integrates these facts, producing larger facts or new facts (through inference). As a final step after the facts are integrated, the pertinent facts are translated into the required output format.
Figure 1
- Document ⇒ Local Text Analysis (Lexical Analysis ⇒ Name Recognition ⇒ Partial Syntactic Analysis ⇒ Scenario Pattern Matching) ⇒ Discourse Analysis (Coreference Analysis ⇒ Inference) ⇒ Template Generation ⇒ Extracted Templates
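The flow in Figure 1 can be sketched as a chain of functions, each consuming the previous stage's output. This is a hypothetical toy in Python, not the system the paper describes. The example sentences, the two regular expressions, the "most recent full name" pronoun rule and all function names are assumptions made for illustration. Name recognition, partial syntactic analysis and scenario pattern matching are collapsed into a single regex step.

```python
import re

# Hypothetical scenario patterns. In this toy, one regex per event type stands in
# for name recognition, partial syntactic analysis and scenario pattern matching.
JOIN = re.compile(r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) joined (?P<org>[A-Z][a-z]+ [A-Z][a-z]+)")
NAMED = re.compile(r"(?P<person>He|She|[A-Z][a-z]+ [A-Z][a-z]+) was named (?P<position>[a-z ]+)")
PRONOUNS = {"He", "She"}


def lexical_analysis(text):
    """Crude sentence splitting on periods (no abbreviation handling)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def local_analysis(sentences):
    """Extract one fact per sentence that fully matches a scenario pattern."""
    facts = []
    for sentence in sentences:
        if m := JOIN.fullmatch(sentence):
            facts.append({"type": "join", **m.groupdict()})
        elif m := NAMED.fullmatch(sentence):
            facts.append({"type": "named", **m.groupdict()})
    return facts


def coreference(facts):
    """Resolve a pronoun to the most recently mentioned full name (or None)."""
    last_name = None
    resolved = []
    for fact in facts:
        fact = dict(fact)
        if fact["person"] in PRONOUNS:
            fact["person"] = last_name
        else:
            last_name = fact["person"]
        resolved.append(fact)
    return resolved


def inference(facts):
    """Combine a 'named' fact with an earlier 'join' fact about the same person."""
    employer = {}
    events = []
    for fact in facts:
        if fact["type"] == "join":
            employer[fact["person"]] = fact["org"]
        elif fact["type"] == "named":
            events.append({"person": fact["person"], "position": fact["position"],
                           "org": employer.get(fact["person"])})
    return events


def template_generation(events):
    """Translate integrated events into the output template format."""
    return [{"PERSON": e["person"], "POSITION": e["position"], "ORGANIZATION": e["org"]}
            for e in events]


def extract(text):
    facts = local_analysis(lexical_analysis(text))   # local text analysis
    events = inference(coreference(facts))           # discourse analysis
    return template_generation(events)


print(extract("Sam Smith joined Acme Industries. He was named president."))
```

Taken alone, the second sentence yields only a fact with an unresolved "He" and no organization. Coreference and inference combine the two local facts into one complete template. That combination is the integration step the overall flow describes.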

References

1. Proceedings of the Third Message Understanding Conference (MUC-3). Morgan Kaufmann, May 1991.
2. Proceedings of the Fourth Message Understanding Conference (MUC-4). Morgan Kaufmann, June 1992.
3. Proceedings of the Fifth Message Understanding Conference (MUC-5), Baltimore, MD, August 1993. Morgan Kaufmann.
4. Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD, November 1995. Morgan Kaufmann.
5. (Appelt et al., 1995) ⇒ Douglas E. Appelt, Jerry R. Hobbs, John Bear, David Israel, Megumi Kameyama, Andy Kehler, David Martin, Karen Meyers, and Mabry Tyson. (1995). “SRI International FASTUS system: MUC-6 test results and analysis.” In: Proceedings of the Sixth Message Understanding Conference (MUC-6).
6. (Appelt et al., 1993) ⇒ Douglas E. Appelt, Jerry R. Hobbs, John Bear, David J. Israel, and Mabry Tyson. (1993). “FASTUS: A Finite-state Processor for Information Extraction from Real-world Text.” In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93).
7. Amit Bagga and Alan Biermann. Analyzing the performance of message understanding systems. Technical Report CS-1997-01, Dept. of Computer Science, Duke University, 1997.
8. Daniel Bikel, Scott Miller, Richard Schwartz, and Ralph Weischedel. Nymble: a high-performance learning name-finder. In: Proceedings of the Fifth Applied Natural Language Processing Conference, Washington, DC, April 1997. Association for Computational Linguistics.
9. Michael Collins. A new statistical parser based on bigram lexical dependencies. In: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 184–191, Santa Cruz, CA, June 1996.
10. David Fisher, Stephen Soderland, Joseph McCarthy, Fangfang Feng, and Wendy Lehnert. Description of the UMass system as used for MUC-6. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD, November 1995. Morgan Kaufmann.
11. Ralph Grishman. The NYU system for MUC-6 or where's the syntax? In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD, November 1995. Morgan Kaufmann.
12. Ralph Grishman, Catherine Macleod, and Adam Meyers. Comlex Syntax: Building a computational lexicon. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING 94), pages 268–272, Kyoto, Japan, August 1994.
13. Ralph Grishman, Catherine Macleod, and John Sterling. New York University: Description of the Proteus System as used for MUC-4. In: Proceedings of the Fourth Message Understanding Conference (MUC-4), pages 233–241, McLean, VA, June 1992.
14. Ralph Grishman and Beth Sundheim. Message Understanding Conference-6: A brief history. In: Proceedings of the 16th International Conference on Computational Linguistics (COLING 96), Copenhagen, August 1996.
15. George Krupka. SRA: Description of the SRA system as used for MUC-6. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD, November 1995. Morgan Kaufmann.
16. W. Lehnert, C. Cardie, D. Fisher, J. McCarthy, E. Riloff, and S. Soderland. University of Massachusetts: MUC-4 test results and analysis. In: Proceedings of the Fourth Message Understanding Conference (MUC-4), McLean, VA, June 1992. Morgan Kaufmann.

Page	Author	Title	Title URL
1997 InformationExtraction	Ralph Grishman	Information extraction: Techniques and challenges	http://www.springerlink.com/index/k454643746325537.pdf