An information extraction (IE) task is a data analysis task that requires populating a data structure with the information contained in an information artifact.
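Below is a minimal Python sketch of this definition. It makes three assumptions. The data structure is a flat record of named slots. The information artifact is a single free-text sentence. Each slot is filled by a hand-written regular expression. The slot names, the patterns, and the seminar-announcement sentence are hypothetical illustrations, not a standard IE schema.

```python
import re
from typing import Optional

# Hypothetical template for a seminar announcement: one regex per slot,
# and the slot value is the pattern's first capture group.
SLOT_PATTERNS: dict[str, re.Pattern] = {
    "speaker": re.compile(r"(Dr\. [A-Z]\w* [A-Z]\w*) will speak"),
    "location": re.compile(r"\bin (Room \d+)"),
    "time": re.compile(r"\bat (\d{1,2}:\d{2} [AP]M)"),
}


def fill_template(text: str, patterns: dict[str, re.Pattern]) -> dict[str, Optional[str]]:
    """Populate one record from an artifact; slots whose pattern does not match stay None."""
    record: dict[str, Optional[str]] = {}
    for slot, pattern in patterns.items():
        match = pattern.search(text)
        record[slot] = match.group(1) if match else None
    return record


print(fill_template("Dr. Ann Lee will speak in Room 101 at 3:30 PM.", SLOT_PATTERNS))
# {'speaker': 'Dr. Ann Lee', 'location': 'Room 101', 'time': '3:30 PM'}
```

If a sentence omits the time, that slot stays None. The result is still a valid, partially populated instance of the same data structure.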
References
2009
- (Wikipedia, 2009) http://en.wikipedia.org/wiki/Information_extraction
- In natural language processing, information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents. An example of information extraction is the extraction of instances of corporate mergers, more formally MergerBetween(company1,company2,date), from an online news sentence such as: "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences based on the logical content of the input data.
- The significance of IE is determined by the growing amount of information available in unstructured (i.e. without metadata) form, for instance on the Internet. This knowledge can be made more accessible by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with.
- Typical subtasks of IE are:
- Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
- Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
- Terminology extraction: finding the relevant terms for a given corpus.
- Relationship Extraction: identification of relations between entities, such as:
- PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.")
- PERSON located in LOCATION (extracted from the sentence "Bill is in France.")
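The MergerBetween(company1,company2,date) example can be sketched as pattern-based relation extraction. This is a toy that relies on several assumptions:
- company mentions are capitalized words ending in "Inc.", "Corp." or "Ltd.";
- a single lexical pattern, "X announced its/their acquisition of Y", signals the relation;
- a relative temporal expression such as "Yesterday" is resolved against a publication date that the caller supplies.

The names `COMPANY`, `MERGER_PATTERN`, `resolve_date` and `extract_mergers` are hypothetical. Practical systems often learn such patterns from annotated data instead of writing them by hand.

```python
import datetime
import re
from dataclasses import dataclass
from typing import Optional

# Toy company-mention pattern: capitalized words followed by a corporate suffix.
COMPANY = r"[A-Z]\w*(?: [A-Z]\w*)* (?:Inc|Corp|Ltd)\."
# One hand-written lexical pattern for the MergerBetween relation (hypothetical).
MERGER_PATTERN = re.compile(
    rf"(?P<acquirer>{COMPANY}) announced (?:its|their) acquisition of (?P<target>{COMPANY})"
)


@dataclass(frozen=True)
class MergerBetween:
    company1: str
    company2: str
    date: Optional[datetime.date]


def resolve_date(sentence: str, published: datetime.date) -> Optional[datetime.date]:
    """Anchor a sentence-initial relative temporal expression to the publication date."""
    lowered = sentence.lower()
    if lowered.startswith("yesterday"):
        return published - datetime.timedelta(days=1)
    if lowered.startswith("today"):
        return published
    return None  # absent or unsupported temporal expression


def extract_mergers(sentence: str, published: datetime.date) -> list[MergerBetween]:
    """Return one MergerBetween record per pattern match in the sentence."""
    when = resolve_date(sentence, published)
    return [
        MergerBetween(m.group("acquirer"), m.group("target"), when)
        for m in MERGER_PATTERN.finditer(sentence)
    ]


sentence = "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."
print(extract_mergers(sentence, datetime.date(2009, 3, 5)))
```

The publication date of 5 March 2009 is arbitrary. With it, the sentence yields a single record: company1 "Foo Inc.", company2 "Bar Corp.", date 4 March 2009. "New-York" is not taken as part of the acquirer's name because the hyphen breaks the capitalized-word pattern. The sketch touches three of the subtasks listed above: recognizing company names, handling a temporal expression, and extracting a relation.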
- (Wikipedia, 2009) http://en.wikipedia.org/wiki/Data_extraction
- Data extraction is the act or process of retrieving (binary) data out of (usually unstructured or badly structured) data sources for further data processing or data storage (data migration). The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.
- Usually, the term data extraction is applied when (experimental) data is first imported into a computer from primary sources, like measuring or recording devices. Today's electronic devices will usually present an electrical connector (e.g. USB) through which 'raw data' can be streamed into a personal computer.
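The sequence described in this entry can be sketched as three small functions: extraction, then transformation, then the addition of metadata before export. The raw device dump and its comma-separated "time, reading" layout are assumptions made for illustration, and so is the unit (seconds) implied by the `t_s` key. The metadata fields `source`, `extracted_at` and `n_records` are assumptions too. The final JSON string stands in for the export to the next stage of the data workflow.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw dump from a measuring device: "time, reading" pairs mixed with noise.
RAW = """# device log v1
0.0, 1.25
0.5,1.31
ERR sensor timeout
1.0 , 1.40
"""


def extract(raw: str) -> list[tuple[float, float]]:
    """Pull numeric (time, reading) pairs out of a badly structured text dump."""
    rows: list[tuple[float, float]] = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue  # comments, error messages, blank lines
        try:
            rows.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue  # two fields, but not both numeric
    return rows


def transform(rows: list[tuple[float, float]]) -> list[dict[str, float]]:
    """Reshape raw tuples into named records."""
    return [{"t_s": t, "reading": v} for t, v in rows]


def add_metadata(records: list[dict[str, float]], source: str, extracted_at: datetime) -> dict:
    """Attach provenance information before the data is handed to the next stage."""
    return {
        "source": source,
        "extracted_at": extracted_at.isoformat(),
        "n_records": len(records),
        "records": records,
    }


payload = add_metadata(transform(extract(RAW)), "device-01", datetime(2009, 1, 1, tzinfo=timezone.utc))
print(json.dumps(payload, indent=2))
```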
- http://biocreative.sourceforge.net/biocreative_glossary.html
- Information extraction (IE): IE systems perform natural language text analysis in order to identify information related to pre-defined types of entities (e.g. genes or proteins), relationships, facts or events.
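Identifying mentions of pre-defined entity types can be illustrated with the simplest possible approach: dictionary (gazetteer) lookup. Each match is marked up with an inline XML tag, one of the options mentioned in the Wikipedia excerpt. The two-entry lexicon and the GENE/PROTEIN tag names are hypothetical. Biomedical IE generally needs more than exact string matching because of synonyms, spelling variants and ambiguity. Treat this as a sketch of the output format, not of a competitive method.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lexicon mapping surface strings to pre-defined entity types.
LEXICON: dict[str, str] = {"BRCA1": "GENE", "p53": "PROTEIN"}


def mark_up(text: str, lexicon: dict[str, str]) -> str:
    """Wrap every exact, whole-word lexicon match in an XML tag named after its type."""
    if not lexicon:
        return escape(text)
    # Try longer terms first, so a multi-word name beats a shorter term it starts with.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    pieces: list[str] = []
    last = 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # untagged text, XML-escaped
        label = lexicon[match.group(1)]
        pieces.append(f"<{label}>{escape(match.group(1))}</{label}>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


print(mark_up("We measured BRCA1 expression and p53 levels.", LEXICON))
```

The call prints `We measured <GENE>BRCA1</GENE> expression and <PROTEIN>p53</PROTEIN> levels.` The untagged text is escaped, so a character such as "<" in the input cannot break the markup.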
2008
- (Sarawagi, 2008) => Sunita Sarawagi. (2008). "Information Extraction." Foundations and Trends in Databases, 1(3).
- Information Extraction refers to the automatic extraction of structured information such as entities, relationships between entities, and attributes describing entities from unstructured sources. This enables much richer forms of queries on the abundant unstructured sources than possible with keyword searches alone. When structured and unstructured data co-exist, information extraction makes it possible to integrate the two types of sources and pose queries spanning them.
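Below is a sketch of one query that spans both kinds of source. "PERSON works for ORGANIZATION" pairs are extracted from free-text sentences into a table. That table is then joined with an existing structured table of companies in an in-memory SQLite database. The regex, the table layouts and the toy rows are assumptions for illustration only.

```python
import re
import sqlite3

# Hypothetical relation pattern for single-word names: "PERSON works for ORGANIZATION."
WORKS_FOR = re.compile(r"\b(?P<person>[A-Z]\w*) works for (?P<org>[A-Z]\w*)\b")


def extract_works_for(sentences: list[str]) -> list[tuple[str, str]]:
    """Turn unstructured sentences into (person, organization) tuples."""
    pairs: list[tuple[str, str]] = []
    for s in sentences:
        pairs.extend((m.group("person"), m.group("org")) for m in WORKS_FOR.finditer(s))
    return pairs


conn = sqlite3.connect(":memory:")
# Structured source: a pre-existing table (toy rows).
conn.execute("CREATE TABLE companies (name TEXT PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO companies VALUES (?, ?)", [("IBM", "Armonk"), ("Acme", "Springfield")])
# Unstructured source, after extraction into relational form.
conn.execute("CREATE TABLE works_for (person TEXT, org TEXT)")
conn.executemany(
    "INSERT INTO works_for VALUES (?, ?)",
    extract_works_for(["Bill works for IBM.", "Alice works for Acme."]),
)


def employees_in(city: str) -> list[str]:
    """A query that needs both the extracted relation and the structured attribute."""
    rows = conn.execute(
        "SELECT w.person FROM works_for AS w JOIN companies AS c ON w.org = c.name "
        "WHERE c.city = ? ORDER BY w.person",
        (city,),
    )
    return [row[0] for row in rows]


print(employees_in("Springfield"))
```

`employees_in("Springfield")` returns `['Alice']`. That answer needs the extracted relation (Alice works for Acme) and the structured attribute (Acme's city), and neither source provides both on its own.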
2007
2005
2003
1999
1997
1993