2008 MiningRelationalDataFromText

From GM-RKB

Subject Headings: Relation Extraction Algorithm

Notes

Cited By

Quotes

Abstract

This paper approaches the relation classification problem in an information extraction framework with different machine learning strategies, from strictly supervised to weakly supervised. A number of learning algorithms are presented and empirically evaluated on a standard data set. We show that a supervised SVM classifier using various lexical and syntactic features can achieve competitive classification accuracy. Furthermore, a variety of weakly supervised learning algorithms can be applied to take advantage of large amounts of unlabeled data when labeling is expensive. Newly introduced random-subspace-based algorithms demonstrate their empirical advantage over competitors in the context of both active learning and bootstrapping.
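The paper's own random-subspace-based algorithms are not reproduced in this entry; as a rough illustration of the idea the abstract refers to, the sketch below trains a committee of SVMs, each on a random subset of the feature space, then uses committee disagreement to pick examples for active-learning annotation and committee agreement to pick examples for bootstrapping. The class name, parameters, and the use of scikit-learn's LinearSVC are assumptions made for illustration, not the authors' implementation.

import numpy as np
from sklearn.svm import LinearSVC


class RandomSubspaceCommittee:
    """Committee of SVMs, each trained on a random subset of the features (illustrative sketch)."""

    def __init__(self, n_members=10, subspace_frac=0.5, seed=0):
        self.n_members = n_members
        self.subspace_frac = subspace_frac
        self.rng = np.random.default_rng(seed)
        self.members = []  # list of (feature_indices, fitted_svm) pairs

    def fit(self, X, y):
        n_features = X.shape[1]
        k = max(1, int(self.subspace_frac * n_features))
        self.members = []
        for _ in range(self.n_members):
            idx = self.rng.choice(n_features, size=k, replace=False)
            self.members.append((idx, LinearSVC().fit(X[:, idx], y)))
        return self

    def _votes(self, X):
        # one row of predicted relation labels per committee member
        return np.stack([svm.predict(X[:, idx]) for idx, svm in self.members])

    def vote_entropy(self, X):
        """Active learning: higher committee disagreement = more informative to annotate."""
        scores = []
        for column in self._votes(X).T:
            _, counts = np.unique(column, return_counts=True)
            p = counts / counts.sum()
            scores.append(float(-(p * np.log(p)).sum()))
        return np.array(scores)

    def confident_predictions(self, X, min_agreement=0.9):
        """Bootstrapping: majority label plus a mask of near-unanimous examples."""
        labels, agreement = [], []
        for column in self._votes(X).T:
            values, counts = np.unique(column, return_counts=True)
            best = counts.argmax()
            labels.append(values[best])
            agreement.append(counts[best] / counts.sum())
        return np.array(labels), np.array(agreement) >= min_agreement

In an active-learning loop one would annotate the unlabeled examples with the highest vote_entropy and retrain; in a bootstrapping loop one would instead add the confident_predictions to the labeled set. The paper's actual algorithms, feature sets, and stopping criteria may differ.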

3. Problem definition

  • The research problem of this paper is a classification of relations between entities that co-occur in the same linguistic context. From a database perspective, the task is to determine the appropriate relational table into which one should put a given pair of related entities. To be more precise,

    • We only focus on binary relations, i.e., ones between pairs of entities.
    • We only deal with intra-sentence explicit relations in this study. In other words, the two entity arguments of a relation must occur within a common syntactic construction, in this case a sentence. The relations also have to be “explicit” in the sense that they should have explicit textual support and should not require further reasoning based on an understanding of the context's meaning.

    • The goal is to classify the type of relation between two entities (or, in other words, to put the entity pair into the correct relational table), given that they are known to be related.

    • It is also assumed that entity recognition has already taken place, hence all entity-related information is available. Typical entity types defined by ACE include person, organization, location, facility, and geo-political entity (GPE).
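As a concrete illustration of this problem setup, the sketch below encodes a single relation mention the way the definition above describes it: two typed entity mentions, already recognized, inside one sentence, with the relation type as the label to predict. The field names, offsets, and example strings are hypothetical and chosen for illustration; they are not the ACE schema itself.

from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityMention:
    text: str    # surface string, e.g. "Microsoft"
    etype: str   # ACE-style entity type: PERSON, ORGANIZATION, GPE, ...
    start: int   # token offsets within the sentence
    end: int


@dataclass
class RelationInstance:
    sentence: str                # both arguments occur in this one sentence
    arg1: EntityMention          # entity recognition is assumed to have run already
    arg2: EntityMention
    label: Optional[str] = None  # relation type to predict (the "relational table")


# The classifier decides which relational table the known-to-be-related pair
# (arg1, arg2) belongs in; the label is left unset here because predicting it
# is exactly the task being defined.
example = RelationInstance(
    sentence="Bill Gates founded Microsoft in Albuquerque.",
    arg1=EntityMention("Bill Gates", "PERSON", 0, 2),
    arg2=EntityMention("Microsoft", "ORGANIZATION", 3, 4),
)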



  • Author: Zhu Zhang
  • Title: Mining relational data from text: From strictly supervised to weakly supervised learning
  • Year: 2008
  • URL: http://dx.doi.org/10.1016/j.is.2007.10.002
  • DOI: 10.1016/j.is.2007.10.002