1999 InfExtr-AUserGuide

Subject Headings: Information Extraction from Text Task

Notes

Cited By

~64 http://scholar.google.com/scholar?cites=11563041363194547028

Quotes

Abstract

  • This technical memo describes Information Extraction from the point-of-view of a potential user of the technology. No knowledge of language processing is assumed. Information Extraction is a process which takes unseen texts as input and produces fixed-format, unambiguous data as output. This data may be used directly for display to users, or may be stored in a database or spreadsheet for later analysis, or may be used for indexing purposes in Information Retrieval applications.

Types of IE

  • Named Entity recognition (NE): Finds and classifies names, places etc.
  • Coreference Resolution (CO): Identifies identity relations between entities in texts.
  • Template Element construction (TE): Adds descriptive information to NE results (using CO).
  • Template Relation construction (TR): Finds relations between TE entities.
  • Scenario Template production (ST): Fits TE and TR results into specified event scenarios.
  • From a user point-of-view, NE, TE, TR and ST are the most relevant IE tasks (CO, as noted below, is necessary as an adjunct to the other tasks, but is of limited direct usefulness to the IE system user). NE, TE, TR and ST provide progressively higher-level information about texts.
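To make the layering of these tasks concrete, here is a minimal sketch in Python. It is a toy, not the method of any system described in the memo: the gazetteer, the surname-matching rule for coreference, and all function names are assumptions made for illustration. NE finds and classifies names, CO links a later short mention back to a full name, and TE folds that link into a descriptive record.

```python
import re

# Hypothetical gazetteer (an assumption of this sketch): surface string -> type.
GAZETTEER = {
    "Frederick J. Thompson": "person",
    "Jay Street Imports Inc.": "company",
    "New York": "location",
}

TEXT = (
    "New York police announced today the arrest of Frederick J. Thompson, "
    "head of Jay Street Imports Inc., on charges of drug smuggling. "
    "Thompson was taken from his Manhattan apartment in the early hours yesterday."
)

def named_entities(text):
    """NE: find gazetteer names in the text and classify them."""
    mentions = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            mentions.append({"start": match.start(), "text": name, "type": entity_type})
    return sorted(mentions, key=lambda m: m["start"])

def coreferent_surnames(text, mention):
    """CO (toy rule): a later bare surname refers to the same person."""
    if mention["type"] != "person":
        return []
    surname = mention["text"].split()[-1]
    end = mention["start"] + len(mention["text"])
    pattern = r"\b" + re.escape(surname) + r"\b"
    later = [m for m in re.finditer(pattern, text) if m.start() >= end]
    return [surname] if later else []

def template_elements(text):
    """TE: one record per NE mention, enriched with what CO found."""
    elements = []
    for mention in named_entities(text):
        record = {"name": mention["text"], "type": mention["type"]}
        aliases = coreferent_surnames(text, mention)
        if aliases:
            record["aliases"] = aliases
        elements.append(record)
    return elements

for element in template_elements(TEXT):
    print(element)
```

Dropping everything except the name and type of each record gives back the NE-level view, which is the sense in which the tasks provide progressively higher-level information.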

An Extended Example

  • When the system is specified, our imaginary analyst states that “the operational domains that user interests are centred around are... drug enforcement, money laundering, organised crime, terrorism, legislation”. The entities of interest within these domains are cited as “person, company, bank, financial entity, transportation means, locality, place, organisation, time, telephone, narcotics, legislation, activity”. A number of relations (or “links”) are also specified, for example between people, between people and companies, etc. These relations are not typed, i.e. the kind of relation involved is not specified. Some relations take the form of properties of entities - e.g. the location of a company - whilst others denote events - e.g. a person visiting a ship.
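A specification like this could be written down as a small data structure. The following Python sketch is one possible encoding, not a format taken from the memo: the `Link` class, its field names, and the mapping of the two examples onto entity types (a company's location as a "locality", a ship as a "transportation means") are assumptions made for illustration.

```python
from dataclasses import dataclass

# Entity types quoted from the analyst's specification.
ENTITY_TYPES = frozenset({
    "person", "company", "bank", "financial entity", "transportation means",
    "locality", "place", "organisation", "time", "telephone", "narcotics",
    "legislation", "activity",
})

# The two forms a relation can take according to the text.
LINK_FORMS = frozenset({"property", "event"})

@dataclass(frozen=True)
class Link:
    """An untyped relation: it says which entity types are linked, not how."""
    source: str
    target: str
    form: str  # "property" or "event"

    def __post_init__(self):
        # Reject anything outside the agreed vocabulary.
        for entity_type in (self.source, self.target):
            if entity_type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {entity_type!r}")
        if self.form not in LINK_FORMS:
            raise ValueError(f"unknown link form: {self.form!r}")

LINKS = [
    Link("company", "locality", "property"),          # e.g. the location of a company
    Link("person", "transportation means", "event"),  # e.g. a person visiting a ship
]
```

Validating against a fixed vocabulary at construction time means that a misspelt or unplanned entity type is caught when the specification is written rather than when extraction results are loaded.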


  • For example, consider the following text:

   Reuter - New York, Wednesday 12 July 1996.
    • New York police announced today the arrest of Frederick J. Thompson, head of Jay Street Imports Inc., on charges of drug smuggling. Thompson was taken from his Manhattan apartment in the early hours yesterday. His attorney, Robert Giuliani, issued a statement denying any involvement with narcotics on the part of his client. “No way did Fred ever have dealings with dope,” Giuliani said.
    • A Jay Street spokesperson said the company had ceased trading as of today. The company, a medium-sized import-export concern established in 1989, had been the main contractor in several collaborative transport ventures involving Latin-American produce. Several associates of the firm moved yesterday to distance themselves from the scandal, including the mid-western transportation company Downing-Jones.
    • Thompson is understood to be accused of importing heroin into the United States.

    • From this, IE might produce information such as the following (in some format to be determined according to user requirements, e.g. SQL statements addressing some database schema).


    • First, a list of entities and associated descriptive information. Relations of property type are made explicit. Each entity has an id, e.g. ENTITY-2, which can be used for cross-referencing between entities and for describing events involving entities. Each also has a type, or category, e.g. company, person. Additionally, various type-specific information is available, e.g., for dates, a normalisation giving the date in standard format.

   Reuter
           id:            ENTITY-1
           type:          company
           business:      news
   New York
           id:            ENTITY-2
           type:          location
           subtype:       city
           is_in:         US
   Wednesday 12 July 1996
           id:            ENTITY-3
           type:          date
           normalisation: 12/07/1996
   New York police
           id:            ENTITY-4
           type:          organisation
           location:      ENTITY-2
   Frederick J. Thompson
           id:            ENTITY-5
           type:          person
           aliases:       Thompson; Fred
           domicile:      ENTITY-7
           profession:    managing director
           employer:      ENTITY-6

(These results correspond to the combination of NE and TE tasks; if we removed all but the type slots we would be left with the NE data.) Second, relations of event type, or scenarios:

   narcotics-smuggling
           id:            EVENT-1
           destination:   ENTITY-13
           source:        unknown
           perpetrators:  ENTITY-5, ENTITY-6
           status:        on-trial
   joint-venture
           id:            EVENT-2
           type:          transport
           companies:     ENTITY-6, ENTITY-11
           status:        past

(These results correspond to the ST task.)
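The entity and event listings above can be handled as ordinary records. The Python sketch below transcribes a few of them, derives the NE-level view by keeping only the type slot, computes the date normalisation, and stores everything in a database so that events can be looked up by entity id. The one-table `slot` schema, the function names, and the representation of multi-valued slots as lists are assumptions of this sketch; the memo leaves the output format to user requirements.

```python
import sqlite3

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def normalise_date(text):
    """Turn 'Wednesday 12 July 1996' into 'dd/mm/yyyy'; the weekday is ignored."""
    _weekday, day, month, year = text.split()
    return f"{int(day):02d}/{MONTHS.index(month) + 1:02d}/{year}"

# A subset of the records listed above; multi-valued slots are lists.
ENTITIES = {
    "ENTITY-1": {"name": "Reuter", "type": "company", "business": "news"},
    "ENTITY-2": {"name": "New York", "type": "location", "subtype": "city", "is_in": "US"},
    "ENTITY-3": {"name": "Wednesday 12 July 1996", "type": "date",
                 "normalisation": normalise_date("Wednesday 12 July 1996")},
    "ENTITY-5": {"name": "Frederick J. Thompson", "type": "person",
                 "aliases": ["Thompson", "Fred"], "domicile": "ENTITY-7",
                 "profession": "managing director", "employer": "ENTITY-6"},
}
EVENTS = {
    "EVENT-1": {"scenario": "narcotics-smuggling", "destination": "ENTITY-13",
                "source": "unknown", "perpetrators": ["ENTITY-5", "ENTITY-6"],
                "status": "on-trial"},
    "EVENT-2": {"scenario": "joint-venture", "type": "transport",
                "companies": ["ENTITY-6", "ENTITY-11"], "status": "past"},
}

def ne_view(entities):
    """Keep only the type slot of each entity: the NE-level data."""
    return {slots["name"]: slots["type"] for slots in entities.values()}

def store(conn, records):
    """Write each slot as a (record_id, slot, value) row in a hypothetical schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS slot (record_id TEXT, slot TEXT, value TEXT)")
    for record_id, slots in records.items():
        for slot, value in slots.items():
            values = value if isinstance(value, list) else [value]
            conn.executemany("INSERT INTO slot VALUES (?, ?, ?)",
                             [(record_id, slot, v) for v in values])

def events_involving(conn, entity_id):
    """Return the ids of events that mention the given entity in any slot."""
    rows = conn.execute(
        "SELECT DISTINCT record_id FROM slot "
        "WHERE record_id LIKE 'EVENT-%' AND value = ? ORDER BY record_id",
        (entity_id,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
store(conn, ENTITIES)
store(conn, EVENTS)
print(ne_view(ENTITIES))
print(events_involving(conn, "ENTITY-5"))
```

Storing one row per slot value keeps the schema independent of which slots a given entity type happens to have, at the cost of losing column-level typing; a production schema would more likely use one table per entity or event type.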

  • Author: Hamish Cunningham
  • Title: Information Extraction - A User Guide (Second Edition)
  • URL: http://www.dcs.shef.ac.uk/~hamish/IE/
  • Year: 1999