2004 CombLexSyntSemFeatWithMaxEntForRelEx

From GM-RKB

Subject Headings: Relation Recognition from Text Algorithm, ACE Benchmark Task, Poster Paper.

Notes

Cited By

2006

Quotes

Abstract

Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.

1. Introduction

Extraction of semantic relationships between entities can be very useful for applications such as biography extraction and question answering, e.g. to answer queries such as “Where is the Taj Mahal?”. Several prior approaches to relation extraction have focused on using syntactic parse trees. For the Template Relations task of MUC-7, BBN researchers (Miller et al., 2000) augmented syntactic parse trees with semantic information corresponding to entities and relations and built generative models for the augmented trees. More recently, (Zelenko et al., 2003) have proposed extracting relations by computing kernel functions between parse trees and (Culotta and Sorensen, 2004) have extended this work to estimate kernel functions between augmented dependency trees.

We build Maximum Entropy models for extracting relations that combine diverse lexical, syntactic and semantic features. Our results indicate that using a variety of information sources can result in improved recall and overall F-measure. Our approach can easily scale to include more features from a multitude of sources (e.g., WordNet, gazetteers, the output of other semantic taggers) that can be brought to bear on this task. In this paper, we present our general approach, describe the features we currently use and show the results of our participation in the ACE evaluation.

Automatic Content Extraction (ACE, 2004) is an evaluation conducted by NIST to measure Entity Detection and Tracking (EDT) and relation detection and characterization (RDC). The EDT task entails the detection of mentions of entities and chaining them together by identifying their coreference. In ACE vocabulary, entities are objects, mentions are references to them, and relations are explicitly or implicitly stated relationships among entities. Entities can be of five types: persons, organizations, locations, facilities, and geo-political entities (geographically defined regions that define a political boundary, e.g. countries, cities, etc.). Mentions have levels: they can be names, nominal expressions or pronouns.
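The paper does not prescribe data structures, but the EDT vocabulary above maps naturally onto a few simple types. A minimal sketch in Python; all names are hypothetical illustrations, not part of the ACE specification or the paper's system:

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical types mirroring the ACE EDT vocabulary described above;
# the paper itself does not prescribe any particular data structures.

class EntityType(Enum):
    PERSON = "PER"
    ORGANIZATION = "ORG"
    LOCATION = "LOC"
    FACILITY = "FAC"
    GPE = "GPE"  # geo-political entity

class MentionLevel(Enum):
    NAME = "NAM"      # e.g. "Thomas R. Reardon"
    NOMINAL = "NOM"   # e.g. "an Oregon general practitioner"
    PRONOUN = "PRO"   # e.g. "who"

@dataclass
class Mention:
    """One textual reference to an entity."""
    text: str
    start: int                # token offset where the mention begins
    end: int                  # token offset just past the mention
    level: MentionLevel
    entity_type: EntityType   # type of the entity this mention refers to

@dataclass
class Entity:
    """An object in the world; EDT chains coreferring mentions to it."""
    entity_type: EntityType
    mentions: list[Mention] = field(default_factory=list)
```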

The RDC task detects implicit and explicit relations between entities identified by the EDT task. Explicit relations occur in text with explicit evidence suggesting the relationship. Implicit relations need not have explicit supporting evidence in the text, though they should be evident from a reading of the document.

Here is an example:

  • "The American Medical Association voted yesterday to install the heir apparent as its president-elect, rejecting a strong, upstart challenge by a District doctor who argued that the nation’s largest physiciansgroup needs stronger ethics and new leadership.
  • "In electing Thomas R. Reardon, an Oregon general practitioner who had been the chairman of its board, …

In this fragment, the marked phrases (underlined in the original paper) are all mentions referring to the American Medical Association, to Thomas R. Reardon, or to the board (an organization) of the American Medical Association. Moreover, there is an explicit management relation between chairman and board, which are references to Thomas R. Reardon and to the board of the American Medical Association respectively. Relation extraction is hard: successful extraction requires correctly detecting both argument mentions, correctly chaining those mentions to their respective entities, and correctly determining the type of relation that holds between them.

2. Maximum Entropy models for extracting relations

We built Maximum Entropy models for predicting the type of relation (if any) between every pair of mentions within each sentence.
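Concretely, each classification instance is one mention pair. A minimal sketch of the candidate generation (our illustration, not the paper's code), reusing the hypothetical Mention type sketched in the previous section:

```python
from itertools import combinations

def candidate_pairs(sentence_mentions):
    """Enumerate every unordered pair of mentions in one sentence.

    Each pair becomes one classification instance; the model then
    predicts a relation type for it, or a NONE label when no relation
    holds. `sentence_mentions` is a list of the hypothetical Mention
    objects sketched earlier.
    """
    return list(combinations(sentence_mentions, 2))
```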

For each pair of mentions, we compute several feature streams, shown below. All the syntactic features are derived from the syntactic parse tree and the dependency tree, which we compute using a statistical parser trained on the Penn Treebank within the Maximum Entropy framework (Ratnaparkhi, 1999).

The feature streams are as follows (a short feature-extraction sketch appears after the list):

  • Words "The words of both the mentions and all the words in between.
  • Entity Type "The entity type (one of PERSON, ORGANIZATION, LOCATION, FACILITY, Geo-Political Entity or GPE) of both the mentions.
  • Mention Level "The mention level (one of NAME, NOMINAL, PRONOUN) of both the mentions.
  • Overlap The number of words (if any) separating the two mentions, the number of other mentions in between, flags indicating whether the two mentions are in the same noun phrase, verb phrase or prepositional phrase.
  • Dependency "The words and part-of-speech and chunk labels of the words on which the mentions are dependent in the dependency tree derived from the syntactic parse tree.
  • Parse Tree The path of non-terminals (removing duplicates) connecting the two mentions in the parse tree, and the path annotated with head words.
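A minimal sketch of how the first four streams might be turned into classifier features, using scikit-learn's multinomial logistic regression as a stand-in for a Maximum Entropy classifier (the model families coincide, though the paper's training procedure surely differs); all helper and feature names are hypothetical:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pair_features(m1, m2, sentence_tokens):
    """Sketch of the Words, Entity Type, Mention Level, and Overlap streams.

    m1 and m2 are the hypothetical Mention objects sketched in
    Section 1; the Dependency and Parse Tree streams would require a
    parser, so they are omitted here.
    """
    left, right = (m1, m2) if m1.start <= m2.start else (m2, m1)
    between = sentence_tokens[left.end:right.start]  # words in between
    feats = {f"word_between={w.lower()}": 1 for w in between}
    feats[f"m1_word={m1.text.lower()}"] = 1
    feats[f"m2_word={m2.text.lower()}"] = 1
    feats[f"etypes={m1.entity_type.name}_{m2.entity_type.name}"] = 1
    feats[f"mlevels={m1.level.name}_{m2.level.name}"] = 1
    feats[f"num_words_between={len(between)}"] = 1
    return feats

# A multinomial logistic regression is the same model family as a
# Maximum Entropy classifier, though training details differ.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
# model.fit([pair_features(m1, m2, toks) for ...], labels)  # labels include NONE
```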

3. Experimental results

We divided the ACE training data provided by LDC into separate training and development sets. The training set contained around 300K words and 9,752 instances of relations; the development set contained around 46K words and 1,679 instances of relations.


Table 2: Precision (P), Recall (R), F-measure (F), and ACE Value on the development set, with true mentions and entities.

| Features | P | R | F | Value |
| Words | 81.9 | 17.4 | 28.6 | 8.0 |
| + Entity Type | 71.1 | 27.5 | 39.6 | 19.3 |
| + Mention Level | 71.6 | 28.6 | 40.9 | 20.2 |
| + Overlap | 61.4 | 38.8 | 47.6 | 34.7 |
| + Dependency | 63.4 | 44.3 | 52.1 | 40.2 |
| + Parse Tree | 63.5 | 45.2 | 52.8 | 40.9 |
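As a sanity check on the table, the F column is the harmonic mean of P and R. Verifying the Words row (our arithmetic, not the paper's):

```latex
F = \frac{2PR}{P+R} = \frac{2 \times 81.9 \times 17.4}{81.9 + 17.4} \approx 28.7
```

which agrees with the reported 28.6 up to rounding of the published P and R.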


Table 3: Precision (P), Recall (R), F-measure (F), and ACE Value on the development set, with system-output mentions and entities.

| Features | P | R | F | Value |
| Words | 58.4 | 11.1 | 18.6 | 5.9 |
| + Entity Type | 43.6 | 14.0 | 21.1 | 12.5 |
| + Mention Level | 43.6 | 14.5 | 21.7 | 13.4 |
| + Overlap | 35.6 | 17.6 | 23.5 | 21.0 |
| + Dependency | 35.0 | 19.1 | 24.7 | 24.6 |
| + Parse Tree | 35.5 | 19.8 | 25.4 | 25.2 |


Table 4: The F-measure and ACE Value for the test sets with true (T) and system output (S) mentions and entities.

| Eval Set | Value (T) | F (T) | Value (S) | F (S) |
| Feb’02 | 31.3 | 52.4 | 17.3 | 24.9 |
| Sept’03 | 39.4 | 55.2 | 18.3 | 23.6 |

As expected, the numbers are significantly lower for the system output runs due to errors made by the mention detection and mention chaining modules.

We ran the best model on the official ACE Feb’2002 and ACE Sept’2003 evaluation sets and obtained the competitive results shown in Table 4. The rules of the ACE evaluation prohibit us from disclosing our final ranking or the results of other participants.

4. Discussion

We have presented a statistical approach for extracting relations that combines diverse lexical, syntactic, and semantic features. We obtained competitive results on the ACE RDC task. Several previous relation extraction systems have focused almost exclusively on syntactic parse trees. We believe our approach of combining many kinds of evidence can potentially scale better to problems like ACE, which have many relation types and relatively small amounts of annotated data. Our system certainly benefits from features derived from parse trees, but it is not inextricably linked to them. Even using very simple lexical features, we obtained high-precision extractors that can potentially be used to annotate large amounts of unlabeled data for semi-supervised or unsupervised learning, without having to parse the entire data set. We obtained our best results when we combined a variety of features.
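The discussion above only gestures at this bootstrapping use of high-precision extractors; a hypothetical sketch of a precision-thresholded self-training loop, assuming a classifier with scikit-learn's fit/predict_proba interface (e.g. the pipeline sketched in Section 2):

```python
def self_train(model, labeled, unlabeled, threshold=0.95, rounds=3):
    """Hypothetical precision-thresholded self-training loop.

    `labeled` is a (features, labels) pair and `unlabeled` a list of
    feature dicts; `model` is any classifier with scikit-learn's
    fit/predict_proba interface. Only confident predictions are added
    back as training data, which is the point of starting from a
    high-precision extractor.
    """
    X, y = list(labeled[0]), list(labeled[1])
    pool = list(unlabeled)
    for _ in range(rounds):
        model.fit(X, y)
        still_unlabeled = []
        for feats in pool:
            probs = model.predict_proba([feats])[0]
            best = probs.argmax()
            if probs[best] >= threshold:
                X.append(feats)
                y.append(model.classes_[best])
            else:
                still_unlabeled.append(feats)
        pool = still_unlabeled
    return model
```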

References

  • ACE. (2004). The NIST ACE evaluation website. http://www.nist.gov/speech/tests/ace/.
  • Aron Culotta and J. S. Sorensen. (2004). Dependency tree kernels for relation extraction. In: Proceedings of ACL 2004.
  • Radu Florian, Hany Hassan, Hongyan Jing, Nanda Kambhatla, Xiaoqiang Luo, Nicolas Nicolov, and Salim Roukos. (2004). A statistical model for multilingual entity detection and tracking. In: Proceedings of the Human Language Technologies Conference (HLT-NAACL'04), Boston, Mass., May 27 – June 1.
  • Abraham Ittycheriah, Lucian Lita, Nanda Kambhatla, Nicolas Nicolov, Salim Roukos, and Margo Stys. (2003). Identifying and tracking entity mentions in a maximum entropy framework. In: Proceedings of the Human Language Technologies Conference (HLT-NAACL'03), pages 40–42, Edmonton, Canada, May 27 – June 1.
  • Xiaoqiang Luo, Abraham Ittycheriah, Hongyan Jing, Nanda Kambhatla, and Salim Roukos. (2004). A mention-synchronous coreference resolution algorithm based on the Bell tree. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, July 21 – July 26.
  • Scott Miller, Heidi Fox, Lance Ramshaw, and Ralph Weischedel. (2000). A novel use of statistical parsing to extract information from text. In 1st Meeting of the North American Chapter of the Association for Computational Linguistics, pages 226–233, Seattle, Washington, April 29–May 4.
  • Adwait Ratnaparkhi. (1999). Learning to parse natural language with maximum entropy. Machine Learning (Special Issue on Natural Language Learning), 34(1-3):151–176.
  • Stephanie Strassel, Alexis Mitchell, and Shudong Huang. (2003). Multilingual resources for entity detection. In: Proceedings of the ACL 2003 Workshop on Multilingual Resources for Entity Detection.
  • Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.


| Field | Value |
| Page | 2004 CombLexSyntSemFeatWithMaxEntForRelEx |
| Author | Nanda Kambhatla |
| Title | Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations |
| Venue | Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics |
| URL | http://acl.ldc.upenn.edu/P/p04/P04-3022.pdf |
| DOI | 10.3115/1219044.1219066 |
| Year | 2004 |