2007 RelExtrFromWikipediaUSubtreeMining

Subject Headings: Relation Extraction Algorithm, (Wikipedia, 2009).

Notes

Cited By

Quotes

Abstract

The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge of Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia’s English articles, which in turn can serve for intelligent systems to satisfy users’ information needs. Our proposed method first anchors the appearance of entities in Wikipedia articles using some heuristic rules that are supported by their encyclopedic style. Therefore, it uses neither the Named Entity Recognizer (NER) nor the Coreference Resolution tool, which are sources of errors for relation extraction. It then classifies the relationships among entity pairs using SVM with features extracted from the web structure and subtrees mined from the syntactic structure of text. The innovations behind our work are the following: a) our method makes use of Wikipedia characteristics for entity allocation and entity classification, which are essential for relation extraction; b) our algorithm extracts a core tree, which accurately reflects a relationship between a given entity pair, and subsequently identifies key features with respect to the relationship from the core tree. We demonstrate the effectiveness of our approach through evaluation of manually annotated data from actual Wikipedia articles.
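The abstract does not define how the core tree is built or which subtree miner is used, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the core tree can be approximated by the dependency path that joins the two entities through their lowest common ancestor, and it uses the edges of that path as stand-ins for mined subtree features. The toy parse `HEADS` and the function names `path_to_root`, `core_path`, and `edge_features` are hypothetical and introduced here for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy dependency parse (an assumption, not data from the paper):
# each token maps to its head token; the root maps to None.
# Sentence: "Gates founded Microsoft in 1975"
HEADS: Dict[str, Optional[str]] = {
    "founded": None,
    "Gates": "founded",
    "Microsoft": "founded",
    "in": "founded",
    "1975": "in",
}


def path_to_root(node: str, heads: Dict[str, Optional[str]]) -> List[str]:
    """Return the tokens from `node` up to the root, inclusive."""
    path = [node]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def core_path(a: str, b: str, heads: Dict[str, Optional[str]]) -> List[str]:
    """Approximate a 'core tree' as the path a -> common ancestor -> b."""
    up_a = path_to_root(a, heads)
    up_b = path_to_root(b, heads)
    ancestors_a = set(up_a)
    # First node on b's way to the root that also lies above (or at) a.
    lca = next(n for n in up_b if n in ancestors_a)
    left = up_a[: up_a.index(lca) + 1]   # a ... lca
    right = up_b[: up_b.index(lca)]      # b ... child of lca
    return left + right[::-1]


def edge_features(path: List[str], a: str, b: str) -> List[str]:
    """Turn the path into edge features, masking the two entity tokens."""
    norm = ["E1" if t == a else "E2" if t == b else t for t in path]
    return [f"{x}-{y}" for x, y in zip(norm, norm[1:])]


path = core_path("Gates", "Microsoft", HEADS)
features = edge_features(path, "Gates", "Microsoft")
```

In a full pipeline, feature strings like these would be turned into a vector and passed to an SVM classifier, as the abstract describes. Masking the entity tokens as `E1` and `E2` is a design choice made here so that features generalise across entity pairs.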


References

  • Eugene Agichtein, and Gravano, L. (2000). Snowball: Extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM International Conference on Digital Libraries.
  • Brin, S. (1998). Extracting patterns and relations from the world wide web. In: Proceedings of the 1998 International Workshop on the Web and Databases, 172–183.
  • Bunescu, R., and Mooney, R. (2006). Extracting relations from text: From word sequences to dependency paths. In Kao, A., and Poteet, S., eds., Text Mining and Natural Language Processing.
  • Cui, H.; Sun, R.; Li, K.; Kan, M.-Y.; and Chua, T.-S. (2005). Question answering passage retrieval using dependency relations. In: Proceedings of SIGIR-2005.
  • Culotta, A., and Sorensen, J. (2004). Dependency tree kernels for relation extraction. In: Proceedings of ACL 2004.
  • Culotta, A.; Andrew McCallum; and Betz, J. (2006). Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In: Proceedings of HLT/NAACL-2006.
  • Evgeniy Gabrilovich, and Markovitch, S. (2006). Overcoming the brittleness bottleneck using wikipedia: Enhancing text categorization with encyclopedic knowledge. In: Proceedings of AAAI-06, 1301–1306.
  • Hsu, C.-W., and Lin, C.-J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13.
  • Thorsten Joachims (1999). Making large-scale svm learning practical. In Bernhard Schölkopf; Burges, C.; and Smola, A., eds., Advances in Kernel Methods - Support Vector Learning. MIT-Press.
  • Dekang Lin (1998). Dependency-based evaluation of minipar. In: Proceedings of the Workshop on the Evaluation of Parsing Systems, the First International Conference on Language Resources and Evaluation.
  • Morton, T. (2000). Coreference for nlp applications. In: Proceedings of ACL-2000.
  • Paşca, M.; Dekang Lin; Bigham, J.; Lifchits, A.; and Jain, A. (2006). Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge. In: Proceedings of AAAI-06.
  • Ravichandran, D., and Eduard Hovy (2002). Learning surface text patterns for a question answering system. In: Proceedings of ACL-2002, 41–47.
  • Soon, W.; Lim, D.; and Ng, H. (2001). A machine learning approach to coreference resolution of noun phrases. Computational Linguistics 27.
  • Michael Strube, and Ponzetto, S. (2006). Wikirelate! computing semantic relatedness using wikipedia. In: Proceedings of AAAI-06, 1419–1424.
  • Völkel, M.; Krötzsch, M.; Vrandecic, D.; Haller, H.; and Studer, R. (2006). Semantic wikipedia. In: Proceedings of WWW2006, 585–594.
  • Zaki, M. (2002). Efficiently mining frequent trees in a forest. In: Proceedings of KDD-2002.


2007 RelExtrFromWikipediaUSubtreeMining
Author: Dat P.T. Nguyen, Yutaka Matsuo, Mitsuru Ishizuka
Title: Relation Extraction from Wikipedia Using Subtree Mining
URL: http://www.aaai.org/Library/AAAI/2007/aaai07-224.php
Year: 2007