2005 ExplorVarKnowledgeInRelExtraction

From GM-RKB

Subject Headings: Relation Recognition from Text Algorithm, ACE Benchmark Task

Notes

Cited By

  • ~40

2006

  • (self) (ZhangZS, 2006) ⇒ M. Zhang, J. Zhang, and J. Su. (2006). “Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel.” In: Proceedings of HLT-2006.
    • QUOTE: Zhou et al. (2005) explore various features in relation extraction using SVM. They conduct exhaustive experiments to investigate the incorporation and the individual contribution of diverse features. They report that chunking information contributes most of the performance improvement from the syntactic aspect. … Zhou et al. (2005) introduce additional chunking features to enhance the parse tree features. However, the hierarchical structured information in the parse trees is not well preserved in their parse tree-related features.

Quotes

Abstract

Extracting semantic relationships between entities is challenging. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study shows that base phrase chunking information is very effective for relation extraction and contributes most of the performance improvement from the syntactic aspect, while additional information from full parsing gives only limited further enhancement. This suggests that most of the useful information in full parse trees for relation extraction is shallow and can be captured by chunking. We also demonstrate how semantic information, such as WordNet and Name List, can be used in feature-based relation extraction to further improve performance. Evaluation on the ACE corpus shows that effective incorporation of diverse features enables our system to outperform the previously best-reported systems on the 24 ACE relation subtypes and to significantly outperform tree kernel-based systems, by over 20 points in F-measure, on the 5 ACE relation types.
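The abstract frames relation extraction as classification over hand-built features of a mention pair. As a minimal sketch, assuming the two entity mentions arrive as token spans with the first mention preceding the second, the lexical part of such a representation might look like this. The feature names (WM1, HM12, WBFL and so on) are illustrative, and taking the last token of a mention as its head is a simplifying assumption, not the paper's head-finding rule.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one entity mention


def word_features(tokens: List[str], m1: Span, m2: Span) -> Dict[str, int]:
    """Sparse binary lexical features for a mention pair where m1 precedes m2."""
    words = [t.lower() for t in tokens]
    feats: Dict[str, int] = {}

    # Bag of words in each mention, plus its head (assumed: the last token).
    for tag, (s, e) in (("M1", m1), ("M2", m2)):
        for w in words[s:e]:
            feats[f"W{tag}={w}"] = 1
        feats[f"H{tag}={words[e - 1]}"] = 1
    feats[f"HM12={words[m1[1] - 1]}+{words[m2[1] - 1]}"] = 1

    # Words between the two mentions, split by position.
    between = words[m1[1]:m2[0]]
    if not between:
        feats["WBNULL"] = 1
    elif len(between) == 1:
        feats[f"WBFL={between[0]}"] = 1
    else:
        feats[f"WBF={between[0]}"] = 1
        feats[f"WBL={between[-1]}"] = 1
        for w in between[1:-1]:
            feats[f"WBO={w}"] = 1

    # First word before m1 and first word after m2, when they exist.
    if m1[0] > 0:
        feats[f"BM1F={words[m1[0] - 1]}"] = 1
    if m2[1] < len(words):
        feats[f"AM2F={words[m2[1]]}"] = 1
    return feats


tokens = "The president of Microsoft visited Paris".split()
print(sorted(word_features(tokens, (1, 2), (3, 4))))
```

In a complete system each feature dictionary would be mapped to a sparse vector and passed to an SVM classifier; that step is left out of this sketch.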

Related Work

This paper further explores the feature-based approach through a systematic study of the extensive incorporation of diverse lexical, syntactic and semantic information. Compared with Kambhatla (2004), we separately incorporate base phrase chunking information, which contributes most of the performance improvement from the syntactic aspect. We also show how semantic information such as WordNet and Name List can be incorporated to further improve performance. Evaluation on the ACE corpus shows that our system outperforms Kambhatla (2004) by about 3 F-measure points on extracting the 24 ACE relation subtypes. It also shows that our system outperforms tree kernel-based systems (Culotta and Sorensen, 2004) by over 20 F-measure points on extracting the 5 ACE relation types.
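Semantic resources can enter the same feature representation as extra binary features. The sketch below assumes two tiny hand-made lists, a set of country names and a map from kinship trigger words to coarse classes. Both are hypothetical stand-ins for resources a real system might derive from WordNet and curated name lists, and the feature names are illustrative only.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one mention

# Hypothetical toy resources; a real system would build these from WordNet
# and from curated name lists.
COUNTRY_NAMES = {"china", "france", "singapore", "united states"}
RELATIVE_TRIGGERS = {
    "father": "PARENT", "mother": "PARENT",
    "son": "CHILD", "daughter": "CHILD",
    "brother": "SIBLING", "sister": "SIBLING",
    "husband": "SPOUSE", "wife": "SPOUSE",
}


def semantic_features(tokens: List[str], m1: Span, m2: Span) -> Dict[str, int]:
    """List-lookup features for a mention pair where m1 precedes m2."""
    words = [t.lower() for t in tokens]
    feats: Dict[str, int] = {}
    # Mark a mention whose full lowercased text is a known country name.
    for tag, (s, e) in (("M1", m1), ("M2", m2)):
        if " ".join(words[s:e]) in COUNTRY_NAMES:
            feats[f"CountryIn{tag}"] = 1
    # Map kinship trigger words inside or between the mentions to a coarse class.
    for w in words[m1[0]:m2[1]]:
        cls = RELATIVE_TRIGGERS.get(w)
        if cls is not None:
            feats[f"RelativeTrigger={cls}"] = 1
    return feats


print(semantic_features("His sister moved to France".split(), (0, 2), (4, 5)))
```

Because these lookups only add entries to the same sparse feature dictionary, they can be merged with lexical and syntactic features without changing the classifier.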

Discussion and Conclusion

In this paper, we have presented a feature-based approach to relation extraction in which diverse lexical, syntactic and semantic knowledge is employed. Instead of exploring full parse tree information directly, as in previous related work, we first incorporate base phrase chunking information. Evaluation on the ACE corpus shows that base phrase chunking contributes most of the performance improvement from the syntactic aspect, while further incorporation of parse tree and dependency tree information improves performance only slightly. This may be due to three reasons. First, in most relations defined in ACE the two mentions are close to each other. Short-distance relations dominate and can be resolved by simple features such as word and chunking features, so dependency tree and parse tree features can only take effect on the remaining long-distance relations, which are far fewer and more difficult. Second, full parsing is well known to be prone to long-distance parsing errors, even though the Collins parser (Collins, 1999) used in our system achieves state-of-the-art performance. State-of-the-art full parsing therefore still needs further enhancement to provide sufficiently accurate information, especially for PP (prepositional phrase) attachment. Last, more effective ways need to be explored to incorporate the information embedded in full parse trees. In addition, we demonstrate how semantic information, such as WordNet and Name List, can be used in feature-based relation extraction to further improve performance.
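The short-distance argument can be made concrete with chunk-level features. The sketch below assumes IOB chunk tags produced by some external chunker (supplied by hand here), takes the last token of each base phrase as its head, and uses illustrative feature names; it is not claimed to be the paper's implementation. When the mentions are close, the flat sequence of phrase labels between them is short, which is where such shallow features carry most of the useful signal.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]        # [start, end) token offsets of one mention
Chunk = Tuple[str, int, int]  # (phrase label, start, end) of one base phrase


def iob_to_chunks(tags: List[str]) -> List[Chunk]:
    """Group IOB chunk tags such as B-NP, I-NP, O into labelled base phrases."""
    chunks: List[Chunk] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last chunk
        if label is not None and tag == f"I-{label}":
            continue  # the current base phrase keeps growing
        if label is not None:
            chunks.append((label, start, i))
            label = None
        if tag != "O":
            # B-X opens a phrase; a stray I-X with a new label is treated the same way.
            label, start = tag[2:], i
    return chunks


def chunk_features(tokens: List[str], tags: List[str], m1: Span, m2: Span) -> Dict[str, int]:
    """Chunk-level features for a mention pair where m1 precedes m2."""
    words = [t.lower() for t in tokens]
    # Base phrases lying entirely between the two mentions.
    between = [c for c in iob_to_chunks(tags) if c[1] >= m1[1] and c[2] <= m2[0]]
    heads = [words[end - 1] for _, _, end in between]  # head = last token (assumption)
    feats: Dict[str, int] = {}
    if not heads:
        feats["CPHBNULL"] = 1
    elif len(heads) == 1:
        feats[f"CPHBFL={heads[0]}"] = 1
    else:
        feats[f"CPHBF={heads[0]}"] = 1
        feats[f"CPHBL={heads[-1]}"] = 1
        for h in heads[1:-1]:
            feats[f"CPHBO={h}"] = 1
    # Flat path of phrase labels between the mentions, e.g. "PP" or "PP-NP-VP".
    path = "-".join(label for label, _, _ in between) or "NONE"
    feats[f"CPP={path}"] = 1
    feats[f"ChunksBetween={len(between)}"] = 1
    return feats


tokens = "The president of Microsoft visited Paris".split()
tags = ["B-NP", "I-NP", "B-PP", "B-NP", "B-VP", "B-NP"]
print(chunk_features(tokens, tags, (1, 2), (3, 4)))  # short distance: one PP between
print(chunk_features(tokens, tags, (1, 2), (5, 6)))  # longer distance: three phrases between
```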

The effective incorporation of diverse features enables our system to outperform the previously best-reported systems on the ACE corpus. Although tree kernel-based approaches facilitate exploration of the implicit feature space defined by the parse tree structure, current techniques still need to advance before they are effective for relatively complicated relation extraction tasks such as the one defined in ACE, where 5 types and 24 subtypes must be extracted. Evaluation on the ACE RDC (Relation Detection and Characterization) task shows that our approach of combining various kinds of evidence scales better to problems with many relation types and a relatively small amount of annotated data. The experimental results also show that our feature-based approach outperforms the tree kernel-based approaches by more than 20 F-measure points on the extraction of the 5 ACE relation types.
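The comparisons above are stated in F-measure points. For reference, precision P, recall R and F-measure F = 2PR/(P + R) can be computed from counts of true positives, false positives and false negatives, scaled here to 0-100 so that "20 F-measure" reads as 20 points. Pooling counts over relation types before scoring (micro-averaging) is one common convention; treat it as an assumption rather than the exact ACE scoring procedure. The counts in the example are made up.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure on a 0-100 scale; empty denominators give 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f


def micro_prf(per_type: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Sum counts over all relation types, then score once (micro-average)."""
    tp = sum(c[0] for c in per_type.values())
    fp = sum(c[1] for c in per_type.values())
    fn = sum(c[2] for c in per_type.values())
    return prf(tp, fp, fn)


# Hypothetical counts for two relation types.
print(micro_prf({"type_a": (60, 20, 40), "type_b": (10, 10, 10)}))
```

Micro-averaging lets frequent relation types dominate the score; scoring each type separately and averaging the results (macro-averaging) would weight rare types more heavily.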

In future work, we will focus on exploring more semantic knowledge in relation extraction, which current research has not covered. Moreover, our current work assumes that Entity Detection and Tracking (EDT) has been performed perfectly. It would therefore be interesting to see how imperfect EDT affects relation extraction performance.

References

  • Agichtein E. and Gravano L. (2000). Snowball: Extracting relations from large plain-text collections. In: Proceedings of the 5th ACM International Conference on Digital Libraries. 4-7 June 2000, San Antonio, TX, USA.
  • Brin S. (1998). Extracting patterns and relations from the World Wide Web. In: Proceedings of the WebDB Workshop at the 6th International Conference on Extending Database Technology (EDBT’1998). 23-27 March 1998, Valencia, Spain.
  • Collins M. (1999). Head-driven statistical models for natural language parsing. Ph.D. Dissertation, University of Pennsylvania.
  • Collins M. and Duffy N. (2002). Convolution kernels for natural language. In: Dietterich T.G., Becker S. and Ghahramani Z. (eds.), Advances in Neural Information Processing Systems 14. Cambridge, MA.
  • Culotta A. and Sorensen J. (2004). Dependency tree kernels for relation extraction. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. 21-26 July 2004, Barcelona, Spain.
  • Cumby C.M. and Roth D. (2003). On kernel methods for relational learning. In: Fawcett T. and Mishra N. (eds.), Proceedings of the 20th International Conference on Machine Learning (ICML’2003). 21-24 August 2003, Washington, D.C., USA. AAAI Press.
  • Haussler D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz.
  • Joachims T. (1998). Text categorization with Support Vector Machines: Learning with many relevant features. In: Proceedings of the European Conference on Machine Learning (ECML’1998). 21-23 April 1998, Chemnitz, Germany.
  • (Kambhatla, 2004) ⇒ Kambhatla N. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. Poster. In: Proceedings of ACL 2004.
  • Miller G.A. (1990). WordNet: An online lexical database. International Journal of Lexicography. 3(4):235-312.
  • Miller S., Fox H., Ramshaw L. and Weischedel R. (2000). A novel use of statistical parsing to extract information from text. In: Proceedings of the 6th Applied Natural Language Processing Conference. 29 April - 4 May 2000, Seattle, USA.
  • MUC-7. (1998). Proceedings of the 7th Message Understanding Conference (MUC-7). Morgan Kaufmann, San Mateo, CA.
  • Roth D. and Yih W.T. (2002). Probabilistic reasoning for entity and relation recognition. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING’2002). Taiwan.
  • Vapnik V. (1998). Statistical Learning Theory. Wiley, Chichester, GB.
  • Zelenko D., Aone C. and Richardella A. (2003). Kernel methods for relation extraction. Journal of Machine Learning Research. pp. 1083-1106.
  • Zhang Z. (2004). Weakly-supervised relation classification for information extraction. In: Proceedings of the 13th ACM Conference on Information and Knowledge Management (CIKM 2004).


Author: Min Zhang, Jian Su, Jie Zhang, GuoDong Zhou
Title: Exploring Various Knowledge in Relation Extraction
Journal title: Proceedings of ACL Conference
URL: http://portal.acm.org/citation.cfm?id=1219893
Year: 2005