2005 DiscovRelsBetNEsFromALargeCorpus

From GM-RKB

Subject Headings: Relation Recognition from Text Algorithm, Unsupervised Learning Algorithm

Notes

Cited By

Quotes

Abstract

  • We propose a tree-similarity-based unsupervised learning method to extract relations between Named Entities from a large raw corpus. Our method regards relation extraction as a clustering problem on shallow parse trees. First, we modify previous tree kernels on relation extraction to estimate the similarity between parse trees more efficiently. Then, the similarity between parse trees is used in a hierarchical clustering algorithm to group entity pairs into different clusters. Finally, each cluster is labeled by an indicative word and unreliable clusters are pruned out. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work by 5 in F-measure. It also shows that our method performs well on both high-frequent and less frequent entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric in relation clustering.

Introduction

  • Hasegawa et al. [8] proposed a cosine similarity-based unsupervised learning approach for extracting relations from a large raw corpus. The context words in between the same entity pairs in different sentences are used to form word vectors, which are then clustered according to the cosine similarity (a simplified sketch of this baseline is given after this list). This approach does not rely on any annotated corpus and works effectively on high-frequent entity pairs [8]. However, there are two problems in this approach:
    • The assumption that the same entity pairs in different sentences have the same relation.
    • The cosine similarity measure between the flat feature vectors, which considers only the words between the entities.
  • In this paper, we propose a tree similarity-based unsupervised learning approach for relation extraction. In order to resolve the above two problems in Hasegawa et al. [8], we assume that the same entity pairs in different sentences can have different relation types. Moreover, in place of the cosine similarity measure, a similarity function over parse trees is proposed to capture a much larger feature space than the simple word features.
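
The sketch below illustrates the cosine-similarity baseline of Hasegawa et al. [8] referenced above: the words between an entity pair are accumulated into a bag-of-words vector across sentences, and two entity pairs are compared by the cosine of their vectors. It is a minimal illustration under simplifying assumptions, not the implementation from [8]; the function names (context_vector, cosine) and the toy contexts are made up for the example.

    # Illustrative sketch (not the implementation from [8]): bag-of-words
    # context vectors for entity pairs, compared by cosine similarity.
    import math
    from collections import Counter

    def context_vector(contexts):
        """Accumulate the words appearing between an entity pair, across
        all sentences that mention the pair, into one bag-of-words vector."""
        vec = Counter()
        for words in contexts:          # contexts: list of token lists
            vec.update(words)
        return vec

    def cosine(u, v):
        """Cosine similarity between two sparse bag-of-words vectors."""
        dot = sum(u[w] * v[w] for w in u if w in v)
        norm = math.sqrt(sum(c * c for c in u.values())) * \
               math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm else 0.0

    # Toy example: two entity pairs whose intervening contexts overlap.
    pair_a = context_vector([["is", "president", "of"], ["leads"]])
    pair_b = context_vector([["president", "of"]])
    print(round(cosine(pair_a, pair_b), 3))   # high value -> grouped together in [8]

In [8], the contexts of a given entity pair are merged into a single vector before clustering, so each distinct entity pair ends up with a single relation label; this is exactly the first problem noted in the list above.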

Tree Similarity-based Unsupervised Learning

  • We use the shallow parse tree as the representation of relation instances, and regard relation extraction as a clustering problem on shallow parse trees. Our method consists of three steps (a minimal end-to-end sketch of these steps follows the list):
  1. Calculating the similarity between two parse trees using a tree similarity function;
  2. Clustering relation instances based on the similarities using a hierarchical clustering algorithm;
  3. Labeling each cluster using indicative words as its relation type, and pruning out unreliable clusters.
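
The sketch below walks through the three steps under simplifying assumptions: step 1 uses a plain normalized count of matching subtrees in the spirit of convolution tree kernels (Collins and Duffy, 2001) rather than the paper's modified kernel; step 2 uses single-link agglomerative (hierarchical) clustering with an arbitrary threshold; and step 3 labels each cluster with its most frequent context word and prunes clusters below a minimum size. The Tree class, all function names, and the threshold/min_size values are assumptions made for the example, not the paper's settings.

    # Minimal sketch of the three-step pipeline; simplified stand-ins for the
    # paper's modified tree kernel, clustering setup, and labeling heuristics.
    from itertools import combinations
    from collections import Counter

    class Tree:
        """A shallow parse tree node: a label plus child nodes."""
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

    def _nodes(t):
        yield t
        for c in t.children:
            yield from _nodes(c)

    def _common(a, b):
        """Count matching subtree structures rooted at a and b, comparing
        labels position by position (a simplified convolution-kernel recursion)."""
        if a.label != b.label or len(a.children) != len(b.children):
            return 0
        if not a.children:
            return 1
        prod = 1
        for ca, cb in zip(a.children, b.children):
            prod *= 1 + _common(ca, cb)
        return prod

    def tree_similarity(t1, t2):
        """Step 1: normalized similarity between two shallow parse trees."""
        def k(x, y):
            return sum(_common(a, b) for a in _nodes(x) for b in _nodes(y))
        denom = (k(t1, t1) * k(t2, t2)) ** 0.5
        return k(t1, t2) / denom if denom else 0.0

    def cluster(trees, threshold=0.5):
        """Step 2: single-link agglomerative clustering over relation
        instances; returns lists of instance indices."""
        clusters = [[i] for i in range(len(trees))]
        merged = True
        while merged:
            merged = False
            for (i, a), (j, b) in combinations(list(enumerate(clusters)), 2):
                if any(tree_similarity(trees[x], trees[y]) >= threshold
                       for x in a for y in b):
                    clusters[i] = a + b
                    del clusters[j]
                    merged = True
                    break
        return clusters

    def label_cluster(member_ids, context_words, min_size=2):
        """Step 3: label a cluster by its most frequent context word;
        return None (prune) for clusters that are too small to be reliable."""
        if len(member_ids) < min_size:
            return None
        counts = Counter(w for i in member_ids for w in context_words[i])
        return counts.most_common(1)[0][0]

A usage pass would parse each sentence containing an entity pair into a Tree, run cluster over all instances, and call label_cluster on each resulting cluster; the paper's actual kernel, clustering criterion, and labeling and pruning heuristics are more refined than this sketch.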

References

  • MUC (1987-1998). The NIST MUC website: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
  • Miller, S., Fox, H., Ramshaw, L. and Weischedel, R. (2000). A novel use of statistical parsing to extract information from text. Proceedings of NAACL-00.
  • Zelenko, D., Aone, C. and Richardella, A. (2003). Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083-1106.
  • Culotta, A. and Sorensen, J. (2004). Dependency Tree Kernels for Relation Extraction. Proceedings of ACL-04.
  • Nanda Kambhatla (2004). Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. Proceedings of ACL-04, poster paper.
  • Eugene Agichtein and Gravano, L. (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM International Conference on Digital Libraries.
  • Stevenson, M. (2004). An Unsupervised WordNet-based Algorithm for Relation Extraction. Proceedings of the 4th LREC workshop "Beyond Named Entity: Semantic Labeling for NLP tasks"
  • Hasegawa, T., Satoshi Sekine and Grishman, R. (2004). Discovering Relations among Named Entities from Large Corpora. Proceedings of ACL-04.
  • Vladimir N. Vapnik (1998). Statistical Learning Theory. John Wiley
  • Michael Collins and Duffy, N. (2001). Convolution Kernels for Natural Language. Proceedings of NIPS-01.
  • Michael Collins and Duffy, N. (2002). New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. Proceedings of ACL-02.
  • Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz.
  • Lodhi, H., Saunders, C., John Shawe-Taylor, Cristianini, N. and Watkins, C. (2002). Text Classification Using String Kernels. Journal of Machine Learning Research, 2:419-444.
  • Suzuki, J., Hirao, T., Sasaki Y. and Maeda, E. (2003). Hierarchical Directed Acyclic Graph Kernel: Methods for Structured Natural Language Data. Proceedings of ACL-03
  • Suzuki, J., Isozaki, H. and Maeda, E. (2004). Convolution Kernels with Feature Selection for Natural Language Processing Tasks. Proceedings of ACL-04.
  • Alessandro Moschitti (2004). A study on Convolution Kernels for Shallow Semantic Parsing. Proceedings of ACL-04
  • Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. The MIT Press: 500-527.
  • Michael Collins (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. Thesis. University of Pennsylvania
  • Christiane Fellbaum (1998). WordNet: An Electronic Lexical Database and some of its Applications. Cambridge, MA: MIT Press.
  • Satoshi Sekine (2001). OAK System (English Sentence Analysis). http://nlp.cs.nyu.edu/oak
  • Satoshi Sekine, Sudo, K. and Nobata, C. (2002). Extended named entity hierarchy. Proceedings of LREC-02
  • ACE (2004). The Automatic Content Extraction (ACE) Projects. http://www.ldc.upenn.edu/Projects/ACE/


  • Min Zhang, Jian Su, Danmei Wang, Chew Lim Tan, and GuoDong Zhou (2005). Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-based Clustering. Proceedings of IJCNLP-05. http://www.comp.nus.edu.sg/~tancl/Papers/IJCNLP05/144 IE IJCNLP manuscript.pdf