2005 DiscovRelsBetNEsFromALargeCorpus

From GM-RKB

Subject Headings: Relation Recognition from Text Algorithm, Unsupervised Learning Algorithm

Notes

Cited By

Quotes

Abstract

  • We propose a tree-similarity-based unsupervised learning method to extract relations between Named Entities from a large raw corpus. Our method regards relation extraction as a clustering problem on shallow parse trees. First, we modify previous tree kernels on relation extraction to estimate the similarity between parse trees more efficiently. Then, the similarity between parse trees is used in a hierarchical clustering algorithm to group entity pairs into different clusters. Finally, each cluster is labeled by an indicative word and unreliable clusters are pruned out. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work by 5 in F-measure. It also shows that our method performs well on both high-frequent and less frequent entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric in relation clustering.

Introduction

  • Hasegawa et al. [8] proposed a cosine similarity-based unsupervised learning approach for extracting relations from a large raw corpus. The context words in between the same entity pairs in different sentences are used to form word vectors, which are then clustered according to the cosine similarity (a simplified sketch of this baseline is given after this list). This approach does not rely on any annotated corpus and works effectively on high-frequent entity pairs [8]. However, there are two problems in this approach:
    • The assumption that the same entity pairs in different sentences have the same relation.
    • The cosine similarity measure between the flat feature vectors, which considers only the words between the entities.
  • In this paper, we propose a tree similarity-based unsupervised learning approach for relation extraction. In order to resolve the above two problems in Hasegawa et al. [8], we assume that the same entity pairs in different sentences can have different relation types. Moreover, in place of the cosine similarity measure, a similarity function over parse trees is proposed to capture a much larger feature space than the simple word features.
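
The sketch below illustrates the cosine-similarity baseline of Hasegawa et al. [8] referenced above: the words between an entity pair are accumulated into a bag-of-words vector across sentences, and two entity pairs are compared by the cosine of their vectors. It is a minimal illustration under simplifying assumptions, not the implementation from [8]; the function names (context_vector, cosine) and the toy contexts are made up for the example.

    # Illustrative sketch (not the implementation from [8]): bag-of-words
    # context vectors for entity pairs, compared by cosine similarity.
    import math
    from collections import Counter

    def context_vector(contexts):
        """Accumulate the words appearing between an entity pair, across
        all sentences that mention the pair, into one bag-of-words vector."""
        vec = Counter()
        for words in contexts:          # contexts: list of token lists
            vec.update(words)
        return vec

    def cosine(u, v):
        """Cosine similarity between two sparse bag-of-words vectors."""
        dot = sum(u[w] * v[w] for w in u if w in v)
        norm = math.sqrt(sum(c * c for c in u.values())) * \
               math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm else 0.0

    # Toy example: two entity pairs whose intervening contexts overlap.
    pair_a = context_vector([["is", "president", "of"], ["leads"]])
    pair_b = context_vector([["president", "of"]])
    print(round(cosine(pair_a, pair_b), 3))   # high value -> grouped together in [8]

In [8], the contexts of a given entity pair are merged into a single vector before clustering, so each distinct entity pair ends up with a single relation label; this is exactly the first problem noted in the list above.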

Tree Similarity-based Unsupervised Learning

  • We use the shallow parse tree as the representation of relation instances, and regard relation extraction as a clustering problem on shallow parse trees. Our method consists of three steps (a minimal end-to-end sketch of these steps follows the list):
  1. Calculating the similarity between two parse trees using a tree similarity function;
  2. Clustering relation instances based on the similarities using a hierarchical clustering algorithm;
  3. Labeling each cluster using indicative words as its relation type, and pruning out unreliable clusters.
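
The sketch below walks through the three steps under simplifying assumptions: step 1 uses a plain normalized count of matching subtrees in the spirit of convolution tree kernels (Collins and Duffy, 2001) rather than the paper's modified kernel; step 2 uses single-link agglomerative (hierarchical) clustering with an arbitrary threshold; and step 3 labels each cluster with its most frequent context word and prunes clusters below a minimum size. The Tree class, all function names, and the threshold/min_size values are assumptions made for the example, not the paper's settings.

    # Minimal sketch of the three-step pipeline; simplified stand-ins for the
    # paper's modified tree kernel, clustering setup, and labeling heuristics.
    from itertools import combinations
    from collections import Counter

    class Tree:
        """A shallow parse tree node: a label plus child nodes."""
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)

    def _nodes(t):
        yield t
        for c in t.children:
            yield from _nodes(c)

    def _common(a, b):
        """Count matching subtree structures rooted at a and b, comparing
        labels position by position (a simplified convolution-kernel recursion)."""
        if a.label != b.label or len(a.children) != len(b.children):
            return 0
        if not a.children:
            return 1
        prod = 1
        for ca, cb in zip(a.children, b.children):
            prod *= 1 + _common(ca, cb)
        return prod

    def tree_similarity(t1, t2):
        """Step 1: normalized similarity between two shallow parse trees."""
        def k(x, y):
            return sum(_common(a, b) for a in _nodes(x) for b in _nodes(y))
        denom = (k(t1, t1) * k(t2, t2)) ** 0.5
        return k(t1, t2) / denom if denom else 0.0

    def cluster(trees, threshold=0.5):
        """Step 2: single-link agglomerative clustering over relation
        instances; returns lists of instance indices."""
        clusters = [[i] for i in range(len(trees))]
        merged = True
        while merged:
            merged = False
            for (i, a), (j, b) in combinations(list(enumerate(clusters)), 2):
                if any(tree_similarity(trees[x], trees[y]) >= threshold
                       for x in a for y in b):
                    clusters[i] = a + b
                    del clusters[j]
                    merged = True
                    break
        return clusters

    def label_cluster(member_ids, context_words, min_size=2):
        """Step 3: label a cluster by its most frequent context word;
        return None (prune) for clusters that are too small to be reliable."""
        if len(member_ids) < min_size:
            return None
        counts = Counter(w for i in member_ids for w in context_words[i])
        return counts.most_common(1)[0][0]

A usage pass would parse each sentence containing an entity pair into a Tree, run cluster over all instances, and call label_cluster on each resulting cluster; the paper's actual kernel, clustering criterion, and labeling and pruning heuristics are more refined than this sketch.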

References

  • MUC (1987-1998). The NIST MUC website: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
  • Miller, S., Fox, H., Ramshaw, L. and Weischedel, R. (2000). A novel use of statistical parsing to extract information from text. Proceedings of NAACL-00.
  • Zelenko, D., Aone, C. and Richardella, A. (2003). Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083-1106.
  • Culotta, A. and Sorensen, J. (2004). Dependency Tree Kernels for Relation Extraction. Proceedings of ACL-04.
  • Nanda Kambhatla (2004). Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. Proceedings of ACL-04, poster paper.
  • Eugene Agichtein and Gravano, L. (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM International Conference on Digital Libraries.
  • Stevenson, M. (2004). An Unsupervised WordNet-based Algorithm for Relation Extraction. Proceedings of the 4th LREC workshop "Beyond Named Entity: Semantic Labeling for NLP tasks"
  • Hasegawa, T., Satoshi Sekine and Grishman, R. (2004). Discovering Relations among Named Entities from Large Corpora. Proceedings of ACL-04.
  • Vladimir N. Vapnik (1998). Statistical Learning Theory. John Wiley
  • Michael Collins and Duffy, N. (2001). Convolution Kernels for Natural Language. Proceedings of NIPS-01.
  • Michael Collins and Duffy, N. (2002). New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. Proceedings of ACL-02.
  • Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz.
  • Lodhi, H., Saunders, C., John Shawe-Taylor, Cristianini, N. and Watkins, C. (2002). Text Classification Using String Kernels. Journal of Machine Learning Research, 2:419-444.
  • Suzuki, J., Hirao, T., Sasaki Y. and Maeda, E. (2003). Hierarchical Directed Acyclic Graph Kernel: Methods for Structured Natural Language Data. Proceedings of ACL-03
  • Suzuki, J., Isozaki, H. and Maeda, E. (2004). Convolution Kernels with Feature Selection for Natural Language Processing Tasks. Proceedings of ACL-04.
  • Alessandro Moschitti (2004). A study on Convolution Kernels for Shallow Semantic Parsing. Proceedings of ACL-04
  • Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. The MIT Press: 500-527.
  • Michael Collins (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. Thesis. University of Pennsylvania
  • Christiane Fellbaum (1998). WordNet: An Electronic Lexical Database and some of its Applications. Cambridge, MA: MIT Press.
  • Satoshi Sekine (2001). OAK System (English Sentence Analysis). http://nlp.cs.nyu.edu/oak
  • Satoshi Sekine, Sudo, K. and Nobata, C. (2002). Extended named entity hierarchy. Proceedings of LREC-02
  • ACE (2004). The Automatic Content Extraction (ACE) Projects. http://www.ldc.upenn.edu/Projects/ACE/


  • Min Zhang, Jian Su, Danmei Wang, Chew Lim Tan, and GuoDong Zhou (2005). Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-based Clustering. Proceedings of IJCNLP-05. http://www.comp.nus.edu.sg/~tancl/Papers/IJCNLP05/144 IE IJCNLP manuscript.pdf