Relation Mention Recognition Task

A relation mention recognition task is a relation recognition task that is also a mention recognition task (it requires the identification and classification of semantic relation mentions within a document set).

AKA: RMR, Textual Relationship Recognition.
Context:
- Input: a Text Corpus.
  - optional: A set of one or more sought Semantic Relation Types (such as: IsA, HasA, HeadquarterLocation, ..., Semantic Relation Description).
  - optional: An Annotated Training Set with examples of the sought Semantic Relation Mentions.
- Output: The set of Semantic Relation Mentions from the Corpus.
  - optional: A Recognition Model/Semantic Relation Recognizer.
- Performance Metrics:
  - Output Correctness (e.g. Accuracy, Precision, Recall, F-Measure)
  - Algorithm Consumption (e.g. Computational Complexity and Space Complexity).
  - It can be measured against a Semantic Relation Mention Recognition Benchmark Task.
- It can be solved by a Relation Mention Recognition System (that implements a Relation Mention Recognition algorithm).
- It can be supported by a Recognition Model Training Task.
- It can be decomposed into a Semantic Relation Mention Detection Task and a Semantic Relation Classification Task.
- It can range from being a Simple Relation Mention Recognition Task to being a Complex Relation Mention Recognition Task.
- It can range from being a Unary Relation Mention Recognition Task to being a Binary Relation Mention Recognition Task to being an N-ary Relation Mention Recognition Task.
- It can support: Relation Mention Annotation, Relation Mention Extraction, Question Answering, Information Retrieval, Resolution Tasks, ...
Example(s):
- a Subsumption Relation Mention Recognition Task, such as: RMR("A cat is a mammal.") ⇒ TypeOf(cat, mammal). (of a subsumption relation mention)
- RMR("My cookie has chocolate chips.") ⇒ Contains(chocolate chips, cookie). (see: meronymy relation, quantification)
- RMR("Alexander went to Australia.") ⇒ RelocatedTo(Alexander, Australia).
- RMR("Microsoft is based in Redmond.") ⇒ CompanyHeadquarterLocation(Microsoft, Redmond). (see: domain specific relation)
- RMR("Albert's niece Ann got engaged to John.") ⇒ DaughterOfSibling(Albert, Ann) ^ Engaged(Ann, John). (see: appositive relation).
- RMR("The expression of mouse p53 inhibits simian virus 40 replication.") ⇒ OrganismComponent(mouse, p53). An Organism Component Semantic Relation Recognition Task.
- RMR("XyaA is one of E. coli’s proteins. It is found in the periplasmic space.") ⇒ SubcellularLocalization(E. coli, XyaA, periplasmic space), a Subcellular Localization Relation Recognition Task.
- RMR("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation.") ⇒ Complex: Clb2–Cdc28; and Phosphorylation: Clb2=>Swe1, Cdc28=>Swe1, and Cdc5=>Swe1, a Protein-Protein Interaction Recognition Task.
- RMR("He wouldn't accept anything of value from those he was writing about.") ⇒ [A0 He] [AM-MOD would] [AM-NEG n't] [V accept] [A1 anything of value] from [A2 those he was writing about] ., a Semantic Role Labeling Task.
- HeadquarterLocation(Organization, Location) (Snowball)
- SubcellularProteinLocalization(Organism, Protein, Location) (PPLRE Project).
- a Semantic Relation Mention Recognition Benchmark Task.
- …
Counter-Example(s):
See: Information Extraction Task, Word Sense Disambiguation Task.
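The decomposition noted under Context (a detection step followed by a classification step) can be illustrated with a minimal sketch. Everything named here is a hypothetical illustration and not part of any cited system: the RelationMention record, the capitalized-token heuristic used for detection, and the two-entry MIDDLE_TO_TYPE cue table used for classification. A real recognizer would use entity mention recognition and a trained model instead.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class RelationMention:
    """One recognized relation mention (hypothetical record layout)."""
    relation_type: str
    arguments: Tuple[str, ...]
    sentence: str


# Assumption: a capitalized token stands in for an entity mention.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")

# Assumption: a toy cue table mapping the words between two arguments
# to a sought relation type.
MIDDLE_TO_TYPE = {
    "is based in": "CompanyHeadquarterLocation",
    "went to": "RelocatedTo",
}


def detect_candidates(sentence: str) -> Iterator[Tuple[str, str, str]]:
    """Detection step: propose adjacent argument pairs and the text between them."""
    matches = list(CAPITALIZED.finditer(sentence))
    for left, right in zip(matches, matches[1:]):
        middle = sentence[left.end():right.start()].strip()
        yield left.group(), middle, right.group()


def classify(middle: str) -> Optional[str]:
    """Classification step: map a candidate's context to a relation type, or None."""
    return MIDDLE_TO_TYPE.get(middle)


def recognize(sentences: List[str]) -> List[RelationMention]:
    """Run detection, then classification, and keep the typed candidates."""
    mentions = []
    for sentence in sentences:
        for arg1, middle, arg2 in detect_candidates(sentence):
            relation_type = classify(middle)
            if relation_type is not None:
                mentions.append(RelationMention(relation_type, (arg1, arg2), sentence))
    return mentions


if __name__ == "__main__":
    for mention in recognize(["Microsoft is based in Redmond.",
                              "Alexander went to Australia."]):
        print(mention.relation_type, mention.arguments)
```

Keeping the two steps as separate functions mirrors the task decomposition: either step can be replaced (for example, by a learned classifier) without changing the other.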

References

2008

(Sarawagi, 2008) ⇒ Sunita Sarawagi. (2008). “Information extraction.” In: FnT Databases, 1(3).
- The problem of relationship extraction has been studied extensively on natural language text, including news articles [1], scientific publications [166], Blogs, emails [113], and sources like Wikipedia [196, 197] and the general web [4, 14].

2007

(Girju et al., 2007) ⇒ Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter D. Turney, and Deniz Yuret. (2007). “SemEval-2007 Task 04: Classification of Semantic Relations between Nominals.” In: Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval 2007).

2006

(McCallum, 2006) ⇒ Andrew McCallum. (2006). “Information Extraction, Data Mining and Joint Inference.” In: SIGKDD Proceedings (KDD-2006). (paper.pdf)
- QUOTE: "The task of relation extraction is to discover connections between entities in text."
- QUOTE: "Information Extraction = segmentation + classification + clustering + association"
(Culotta et al., 2006) ⇒ Aron Culotta, Andrew McCallum, and Jonathan Betz. (2006). “Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.” In: Proceedings of HLT-NAACL 2006.
- QUOTE: Relation extraction is the task of discovering semantic connections between entities. In text, this usually amounts to examining pairs of entities in a document and determining (from local language cues) whether a relation exists between them. Common approaches to this problem include ...

2005

(Bizer et al., 2005) ⇒ Christian Bizer, Ralf Heese, Malgorzata Mochol, Radoslaw Oldakowski, Robert Tolksdorf, and Rainer Eckstein. (2005). “The Impact of Semantic Web Technologies on Job Recruitment Processes.” 7. Internationale Tagung Wirtschaftsinformatik (WI 2005).

2004

(Culotta & Sorensen, 2004) ⇒ Aron Culotta, and Jeffrey S. Sorensen. (2004). “Dependency Tree Kernels for Relation Extraction.” In: Proceedings of ACL Conference (ACL 2004).

2003

Mihai Surdeanu, Sanda M. Harabagiu, J. Williams and P. Aarseth. (2003). Using Predicate-Argument Structures for Information Extraction. In: Proceedings of Assoc. for Computational Linguistics (ACL). http://acl.ldc.upenn.edu/acl2003/main/pdfs/Surdeanu.pdf
- Induce predicate-argument structures from parse trees with simple rules that map predicate arguments to domain-specific template slots.
2003_PARC_InformationSciencesAndTechnologiesLaboratory

2002

(Roth and Yih, 2002) ⇒ Dan Roth and W. Yih. (2002). “Probabilistic Reasoning for Entity & Relation Recognition.” In: the 20th International Conference on Computational Linguistics (COLING-2002). paper.pdf
(Laender et al., 2002) ⇒ Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, and Juliana S. Teixeira. (2002). “A Brief Survey of Web Data Extraction Tools.” In: ACM SIGMOD Record, 31(2). doi:10.1145/565117.565137

2001

Fabio Ciravegna. (2001). Adaptive information extraction from text by rule induction and generalization. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001).
(Park, 2001) ⇒ J. C. Park. (2001). Using Combinatory Categorical Grammar to Extract Biomedical Information. In: IEEE Intelligent Systems.
- applies parsing for automatic database curation from biomedical research papers.

2000

(Agichtein and Gravano, 2000) ⇒ Eugene Agichtein and L. Gravano. (2000). “Snowball: Extracting Relations from Large Plain-Text Collections.” In: Proceedings of the 5th ACM International Conference on Digital Libraries (DL-2000). (tech report.pdf)
(Miller et al., 2000) ⇒ Scott Miller, Heidi Fox, Lance Ramshaw, and Ralph Weischedel. (2000). “A Novel Use of Statistical Parsing to Extract Information from Text.” In: Proceedings of NAACL Conference (NAACL 2000).
- Add simple entity and relation annotations on top of syntax, and train a parser to recover both in parallel. Finished second in MUC-7.
(Nahm & Mooney, 2000) ⇒ U. Y. Nahm, and Raymond Mooney. (2000). “A Mutually Beneficial Integration of Data Mining and Information Extraction.” In: Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
- This paper describes a system called DiscoTEX, that combines IE and data mining methodologies to perform text mining as well as improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
(Cohen et al., 2000) ⇒ William W. Cohen, Andrew McCallum, and D. Quass. (2000). “Learning to Understand the Web.” In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
2000_SpeechAndLanguageProcessing.
- “Information Extraction tasks are characterized by two properties: the desired knowledge can be a relatively simple and fixed template, or frame, with slots that need to be filled in with material from the text, and only a small part of the information in the text is relevant for filling in its frame.”

1999

Collins & Singer. (1999). “Unsupervised models for named entity classification.”

1998

Dayne Freitag. (1998). Information Extraction from HTML: Application of a general learning approach. Proceedings of the Fifteenth Conference on Artificial Intelligence AAAI-98. http://citeseer.ist.psu.edu/freitag98information.html
(Giles et al., 1998) ⇒ C. Lee Giles, Kurt D. Bollacker, and Steve Lawrence. (1998). “CiteSeer: An automatic Citation Indexing System.” In: The Third ACM Conference on Digital Libraries (1998).
M. Craven, D. DiPasquo, Dayne Freitag, Andrew McCallum, Tom M. Mitchell, K. Nigam, and S. Slattery. (1998). Learning to extract symbolic knowledge from the world wide web. In: Proceedings of AAAI-98.

1997

(Khoo, 1997) ⇒ C. Khoo. 1997. The Use of Relation Matching in Information Retrieval. LIBRES: Library and Information Science Research Electronic Journal, 7(2). (paper.html)
N. Kushmerick, D. Weld and R. Doorenbos. (1997). “Wrapper induction for information extraction.” In: IJCAI (1997).
T. R. Leek. (1997). Information Extraction using Hidden Markov Models. Master's thesis, UC San Diego. http://citeseer.ist.psu.edu/leek97information.html
S. Soderland. (1997). “Learning to Extract Text-Based Information from the World Wide Web.”

1996

1996_SurveyOfInformationRetrievalAndFilteringMethods

1995

S. Soderland, D. Fisher, J. Aseltine, and W. Lehnert. (1995). Crystal: Inducing a Conceptual Dictionary. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. http://citeseer.ist.psu.edu/soderland95crystal.html

1993

Ellen Riloff. (1993). “Automatically constructing a dictionary for information extraction tasks.”

1992

(Hearst, 1992) ⇒ Marti Hearst. (1992). “Automatic Acquisition of Hyponyms from Large Text Corpora.” In: Proceedings of the 14th International Conference on Computational Linguistics (COLING 1992).

1991

L. Rau. (1991). “Extracting Company Names From Text.” In: Proceedings of the Sixth Conference on Artificial Intelligence Applications.

Notes

IE Task Open Issues

Integration of IE and TM [2003_ANoteOnUnifying...]
Allow for patterns to refer to generalized words. E.g. "to recognize as" <=> "to know as" by WordNet relationship (validate this example)
Weak theoretical underpinnings
The extraction of Long-Distance Dependency (LDD) and the mapping to shallow semantic representations is not always possible from the output of Syntactic Parsers.

Relation Types

Generic/Specific
- Generic: InstanceOf(entity, class), IsA(subclass, class), PartOf(part, thing),
- Specific: Cities(x), Elements(x), HeadquarterLocation(organization, location), DateOfBirth(person, date), Person(x)

IE Task Models, Summary

Text Surface Pattern.
- Hearst "x such as y"
- Snowball: [left words, EntityType1, middle words, <w> <w>.
- Part-of-Speech
Lexico-Syntactic Pattern.
- Dependency Grammars (Suchanek et al., 2006)
- Phrase-Structure Grammars (Bunescu & Mooney, 2006)
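As an illustration of the text surface pattern approach, the Hearst "x such as y" pattern can be sketched with a regular expression. This is a simplified sketch under stated assumptions: noun phrases are approximated by single words (the original pattern operates on noun phrases), and the names SUCH_AS, LIST_SEPARATOR and extract_is_a are hypothetical. Each match yields (hyponym, hypernym) pairs, i.e. IsA(y, x).

```python
import re
from typing import List, Tuple

# "x such as y1, y2, and y3": x is the hypernym, each y is a hyponym.
# Assumption: every noun phrase is a single word.
SUCH_AS = re.compile(
    r"(\w+),? such as ((?:\w+(?:, and |, or |, | and | or ))*\w+)"
)

# Separators allowed inside the hyponym list.
LIST_SEPARATOR = re.compile(r",? (?:and|or) |, ")


def extract_is_a(text: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs for every 'such as' match in text."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in LIST_SEPARATOR.split(match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "He read authors such as Herrick, Goldsmith, and Shakespeare."
    for hyponym, hypernym in extract_is_a(sentence):
        print(f"IsA({hyponym}, {hypernym})")
```

A pattern this naive over-generates (for instance when "and" starts a new clause), which is one reason the lexico-syntactic approaches listed above work from parser output rather than raw word sequences.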

IE Task Evaluation Metrics

An IE system is typically evaluated in terms of:
Precision:
- # of correct answers given by the system / total # of answers given
Recall:
- # of correct answers given by the system / total # of possible correct answers in the text
  - absolute
  - relative
Fallout:
- # of incorrect answers given by the system / # of spurious facts in the text
F-measure: ...
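The precision and recall definitions above can be computed directly over sets of relation mentions. The sketch below is illustrative only: the function name evaluate and the tuple encoding of a mention are assumptions, and the F-measure shown is the balanced F1 (the harmonic mean of precision and recall), which is one common choice rather than a definition taken from this page.

```python
from typing import Hashable, Iterable, Tuple


def evaluate(predicted: Iterable[Hashable],
             gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. gold relation mentions."""
    predicted_set, gold_set = set(predicted), set(gold)
    correct = len(predicted_set & gold_set)

    # Precision: correct answers / all answers given by the system.
    precision = correct / len(predicted_set) if predicted_set else 0.0
    # Recall: correct answers / all possible correct answers in the text.
    recall = correct / len(gold_set) if gold_set else 0.0
    # Balanced F1: harmonic mean of precision and recall (assumed variant).
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("HeadquarterLocation", "Microsoft", "Redmond"),
            ("RelocatedTo", "Alexander", "Australia")}
    predicted = {("HeadquarterLocation", "Microsoft", "Redmond"),
                 ("HeadquarterLocation", "Alexander", "Australia")}
    print(evaluate(predicted, gold))
```

Here a mention counts as correct only when its relation type and all arguments match a gold mention exactly; partial-match scoring would need a different comparison.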
