2005 AutoRelExtWModelOrderSelAndDiscrLabelId

From GM-RKB

Subject Headings: Unsupervised Relation Extraction.

Notes

Cited By

Quotes

Abstract

In this paper, we study the problem of unsupervised relation extraction based on model order identification and discriminative feature analysis. The model order identification is achieved by stability-based clustering and used to infer the number of the relation types between entity pairs automatically. The discriminative feature analysis is used to find discriminative feature words to name the relation types. Experiments on ACE corpus show that the method is promising.
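The abstract's model-order step can be illustrated with a small sketch. Assuming a simplified variant of the stability index of Lange et al. [10] — the synthetic data, the k-means implementation, the 80% subsampling scheme, and all parameter values here are illustrative, not the paper's — the number of clusters is chosen as the one whose partitions are most reproducible under resampling:

```python
import numpy as np

# Synthetic 2-D data with three well-separated clusters (assumption: a
# stand-in for the paper's feature vectors of entity-pair contexts).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

def kmeans(X, k, seed=0, restarts=20, iters=50):
    """Plain Lloyd's k-means with random restarts; returns the labeling
    with the lowest within-cluster sum of squares."""
    r = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(restarts):
        cent = X[r.choice(len(X), k, replace=False)].copy()
        for _ in range(iters):
            labels = np.argmin(((X[:, None] - cent) ** 2).sum(-1), axis=1)
            for j in range(k):
                pts = X[labels == j]
                if len(pts):
                    cent[j] = pts.mean(axis=0)
        inertia = ((X - cent[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels

def pair_agreement(l1, l2):
    """Fraction of point pairs on which two labelings agree about
    'same cluster' vs. 'different cluster'."""
    same1 = l1[:, None] == l1[None, :]
    same2 = l2[:, None] == l2[None, :]
    off_diag = ~np.eye(len(l1), dtype=bool)
    return (same1 == same2)[off_diag].mean()

def stability(X, k, trials=8):
    """Mean agreement between a clustering of the full data and
    clusterings of random 80% subsamples, compared on the subsample."""
    full = kmeans(X, k, seed=123)
    scores = []
    for t in range(trials):
        idx = np.random.default_rng(t).choice(len(X), int(0.8 * len(X)),
                                              replace=False)
        sub = kmeans(X[idx], k, seed=t)
        scores.append(pair_agreement(full[idx], sub))
    return float(np.mean(scores))

# Pick the model order (number of relation types) with maximal stability.
best_k = max(range(2, 6), key=lambda k: stability(X, k))
print(best_k)
```

On well-separated data the stability score typically peaks at the true number of clusters: choosing too few or too many clusters makes the resulting partition depend on the particular resample, which drives the agreement score down.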

Introduction

Relation extraction is the task of finding relationships between two entities in text. Recently it has received increasing attention in many areas, e.g., information extraction, ontology construction, and bioinformatics. In this paper, we propose an unsupervised method for relation extraction from a corpus.

Since the concept of relation extraction was introduced in MUC-6 [1], there has been considerable work on supervised learning of relation patterns, using corpora that have been annotated to indicate the information to be extracted [2, 9, 8]. A range of extraction models have been used, including both symbolic rules and statistical models such as HMMs and kernel methods. These methods have been particularly successful in some specific domains. However, manually annotating large amounts of training data is very time-consuming; furthermore, it is difficult to port an extraction system across different domains.

Evaluation method for relation labeling

For evaluation of the relation labeling, we need to assess the relatedness between the identified labels and the pre-defined relation names. To do this, we use an information-content-based measure [16, 15], provided in the WordNet::Similarity package [17], to evaluate the similarity between two concepts in WordNet. Intuitively, the relatedness between two concepts in WordNet is captured by the information content of their lowest common subsumer (lcs) and the information content of the two concepts themselves. This can be viewed as taking the information content of the intersection, which can be formalized as follows:

Relatedness_lin(c1, c2) = 2 × IC(lcs(c1, c2)) / (IC(c1) + IC(c2))

This measure depends upon the corpus to estimate information content. Information content of a concept is estimated by counting the frequency of that concept in a large corpus and thereby determining its probability via a maximum likelihood estimate. We carried out the experiments using the British National Corpus (BNC) as the source of information content.
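The Lin measure above can be computed directly once information-content values are available. A minimal sketch, assuming a hand-built toy taxonomy and invented frequency counts standing in for WordNet's hypernym hierarchy and the BNC counts:

```python
import math

# Toy taxonomy: child -> parent (assumption: hand-built for illustration,
# standing in for WordNet's hypernym hierarchy).
PARENT = {
    "dog": "canine", "cat": "feline",
    "canine": "carnivore", "feline": "carnivore",
    "carnivore": "animal", "animal": None,
}

# Corpus frequency counts (assumption: invented; in the paper these come
# from the British National Corpus). A concept's count includes the counts
# of its descendants, as in Resnik's estimation scheme [16].
FREQ = {"dog": 30, "cat": 20, "canine": 40, "feline": 25,
        "carnivore": 80, "animal": 200}
TOTAL = FREQ["animal"]

def ic(concept):
    """Information content: negative log of the MLE probability."""
    return -math.log(FREQ[concept] / TOTAL)

def ancestors(concept):
    """Chain from a concept up to the taxonomy root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: first shared node on the ancestor chains."""
    shared = set(ancestors(c2))
    for a in ancestors(c1):
        if a in shared:
            return a
    raise ValueError("no common subsumer")

def lin_relatedness(c1, c2):
    return 2 * ic(lcs(c1, c2)) / (ic(c1) + ic(c2))

print(round(lin_relatedness("dog", "cat"), 3))  # lcs is "carnivore"
```

The measure is 1.0 for identical concepts and approaches 0 as the lowest common subsumer becomes more generic (lower information content) relative to the two concepts themselves.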



References

  • 1. Defense Advanced Research Projects Agency: Proceedings of the Sixth Message Understanding Conference (MUC-6). Morgan Kaufmann Publishers, Inc. (1995)
  • 2. Mary Elaine Califf and Raymond J. Mooney: Relational Learning of Pattern-Match Rules for Information Extraction. AAAI (1999)
  • 3. Sergey Brin: Extracting Patterns and Relations from the World Wide Web. In: Proceedings of the WebDB Workshop at the 6th International Conference on Extending Database Technology, pages 172-183 (1998)
  • 4. Kiyoshi Sudo, Satoshi Sekine and Ralph Grishman: An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In: Proceedings of ACL, Sapporo, Japan (2003)
  • 5. R. Yangarber, R. Grishman, P. Tapanainen and S. Huttunen: Unsupervised Discovery of Scenario-Level Patterns for Information Extraction. In: Proceedings of the Applied Natural Language Processing Conference, Seattle, WA (2000)
  • 6. Eugene Agichtein and Luis Gravano: Snowball: Extracting Relations from Large Plain-Text Collections. In: Proceedings of the 5th ACM International Conference on Digital Libraries (2000)
  • 7. Takaaki Hasegawa, Satoshi Sekine and Ralph Grishman: Discovering Relations among Named Entities from Large Corpora. In: Proceedings of ACL, Barcelona, Spain (2004)
  • 8. Dmitry Zelenko, Chinatsu Aone and Anthony Richardella: Kernel Methods for Relation Extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia (2002)
  • 9. S. Soderland: Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 31(1-3):233-272 (1999)
  • 10. T. Lange, M. Braun, V. Roth and J. M. Buhmann: Stability-Based Model Selection. Advances in Neural Information Processing Systems 15 (2002)
  • 11. E. Levine and E. Domany: Resampling Method for Unsupervised Estimation of Cluster Validity. Neural Computation, Vol. 13, 2573-2593 (2001)
  • 12. Zhengyu Niu, Donghong Ji and Chew Lim Tan: Document Clustering Based on Cluster Validation. CIKM'04, Washington, DC, USA, November 8-13 (2004)
  • 13. Volker Roth and Tilman Lange: Feature Selection in Clustering Problems. NIPS 2003 Workshop (2003)
  • 14. Gabriel Pui Cheong Fung, Jeffrey Xu Yu and Hongjun Lu: Discriminative Category Matching: Efficient Text Classification for Huge Document Collections. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), Maebashi City, Japan, December 09-12 (2002)
  • 15. D. Lin: Using Syntactic Dependency as a Local Context to Resolve Word Sense Ambiguity. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pages 64-71, Madrid, July (1997)
  • 16. P. Resnik: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, August (1995)
  • 17. Ted Pedersen, Siddharth Patwardhan and Jason Michelizzi: WordNet::Similarity - Measuring the Relatedness of Concepts. AAAI (2004)


 Author: Chen Jinxiu, Ji Donghong, Tan Chew Lim, Niu Zhengyu
 Title: Automatic Relation Extraction with Model Order Selection and Discriminative Label Identification
 Proceedings: Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP 2005)
 URL: http://www.comp.nus.edu.sg/~tancl/Papers/IJCNLP05/Jinxiu-IJCNLP2005-1.pdf
 Year: 2005