2008 AutomaticallyHarvestAndOntologSemRels

From GM-RKB

Subject Headings: Ontologizing, WordNet.

Notes

Cited By

2008

Quotes

Author Keywords

knowledge acquisition, relation extraction, ontology learning

Abstract

With the advent of the Web and the explosion of available textual data, it is key for modern natural language processing systems to access, represent, and reason over large amounts of knowledge in semantic repositories. Separately, the knowledge representation and natural language processing communities have been developing, respectively, representations and engines for reasoning over knowledge, and algorithms for automatically harvesting knowledge from textual data. There is a pressing need for collaboration between the two communities to provide large-scale, robust reasoning capabilities for knowledge-rich applications like question answering. In this chapter, we propose one small step by presenting algorithms for harvesting semantic relations from text and then automatically linking the knowledge into existing semantic repositories. Experimental results show better than state-of-the-art performance on both the relation harvesting and the ontologizing tasks.

1. Introduction

In this chapter, we present algorithms for both extracting semantic relations from textual resources and for linking, or ontologizing, them into a semantic repository.

Knowledge resources can be divided mainly into two types: textual resources and structured resources. Textual resources include linguistic text collections, ranging from large generic repositories such as the Web to domain-specific texts such as collections of documents or books on particular subjects. These repositories contain a large and ever-growing amount of information expressed implicitly in natural language. They vary greatly in size, from the terabytes of data on the Web to the kilobytes of textual material in electronic books.

Structured resources consist of repositories in which knowledge is explicit and organized in lists or graphs of entities. In contrast with textual resources, structured resources explicitly represent domain and generic knowledge, making that knowledge directly usable in applications. Structured resources vary widely in their degree of internal structure and can accordingly be divided into two classes: semantic repositories and lexical resources. Semantic repositories are highly structured resources that usually organize knowledge at a conceptual level (e.g., concepts, relations among concepts, situation types) or at a sense level (word senses and relations among senses). Ontologies such as Mikrokosmos [3,4], DOLCE [5] and SUMO [6], and situation repositories such as FrameNet [7], are good examples of the former, while WordNet [8] is an example of the latter. Lexical resources are less structured: thesauri, lists of facts, lexical relation instances, lists of paraphrases, and other flat lists of lexical objects. These resources usually organize knowledge at a purely lexical level and are in most cases built using automatic or semi-automatic techniques.

Two main issues must be addressed in order to use knowledge resources in applications: extracting the implicit knowledge in textual resources (knowledge harvesting), and making the knowledge of both textual and structured resources usable (knowledge exploitation).

Knowledge harvesting algorithms analyze textual repositories and extract knowledge in the form of lexical resources. NLP researchers have developed many algorithms for mining knowledge from text and the Web, including facts [9], semantic lexicons [10], concept lists [11], and word similarity lists [12]. Many recent efforts have also focused on extracting binary semantic relations between entities, such as entailment [13], is-a [14], part-of [15], and other relations. Relational knowledge is in fact crucial in many applications. Unfortunately, most relation extraction algorithms suffer from several limitations. First, they require a high degree of supervision. Second, they are usually limited in breadth (they cannot easily be applied to different corpus sizes and domains) and in generality (they can harvest only specific types of relations).

So far, little attention has been paid to knowledge exploitation. As Bos [16] outlined, whilst lexical resources are potentially useful, their successful use in applications has been very limited due to a variety of problems. …

2. Relevant Work

In this section, we review previous work in both relational knowledge harvesting and ontologizing.

2.1. Relational Knowledge Harvesting

To date, most research on relation harvesting has focused on is-a and part-of. Approaches fall into two categories: pattern- and clustering-based.

Pattern-based approaches are the most common. Hearst [17] pioneered the use of patterns to extract hyponym (is-a) relations. Starting from three manually built lexico-syntactic patterns, Hearst sketched a bootstrapping algorithm that learns more patterns from instances; it has served as the model for most subsequent pattern-based algorithms.
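As a minimal sketch of what a Hearst-style lexico-syntactic pattern does, the Python below matches a single "Y such as X1, X2 and X3" pattern with a regular expression. This is an illustrative simplification, not Hearst's implementation: it assumes raw text and single-word noun slots, whereas real systems match noun phrases over tagged or parsed text. The function name `extract_hyponyms` is hypothetical.

```python
from __future__ import annotations

import re

# One Hearst-style pattern, "Y such as X1, X2 and X3", with single-word slots.
# Group 1 is the hypernym Y; group 2 is the enumerated list of hyponyms.
SUCH_AS = re.compile(r"\b(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")


def extract_hyponyms(sentence: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration on commas and on the final conjunction.
        for item in re.split(r", | and | or ", match.group(2)):
            pairs.append((item, hypernym))
    return pairs


print(extract_hyponyms("He studied languages such as French, German and Italian."))
# [('French', 'languages'), ('German', 'languages'), ('Italian', 'languages')]
```

In a bootstrapping setting, pairs like these become the instances from which further patterns are learned.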

Berland and Charniak [20] proposed a system for part-of relation extraction based on Hearst's [17] approach. Seed instances are used to infer linguistic patterns, which are then used to extract new instances. While this study introduces statistical measures to evaluate instance quality, it remains vulnerable to data sparseness and is limited to single-word terms.

Improving upon Berland and Charniak [20], Girju et al. [15] employ machine learning algorithms and WordNet [8] to disambiguate generic part-of patterns like “X’s Y” and “X of Y”. This study is the first extensive attempt to make use of generic patterns. To discard incorrect instances, they learn WordNet-based selectional restrictions, like “X(scene#4)’s Y(movie#1)”. While this yields large gains in precision and recall, it requires heavy supervision in the form of manual semantic annotations.

Ravichandran and Hovy [14] focus on scaling relation extraction to the Web. They propose a simple and effective algorithm that infers surface patterns from a small set of seed instances by extracting the substrings that relate the seeds in corpus sentences. The approach gives good results on specific relations such as birthdates; however, it has low precision on generic ones like is-a and part-of. Pantel et al. [21] proposed a similar, highly scalable approach, based on an edit-distance technique, to learn lexico-syntactic patterns, showing both good performance and computational efficiency. Espresso uses a similar approach to infer patterns, but we make use of generic patterns and apply refining techniques to deal with a wide variety of relations.
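The substring idea can be sketched as follows. This is a simplified stand-in, not Ravichandran and Hovy's implementation: for each seed pair it takes the text between x and the next occurrence of y in a sentence, replaces the seeds with the placeholders X and Y, and counts how often each resulting surface pattern occurs. The function name `induce_patterns`, the seed pairs, and the sentences are made up for illustration.

```python
from collections import Counter


def induce_patterns(seeds, sentences):
    """Count surface patterns "X<middle>Y" that link seed pairs in sentences."""
    counts = Counter()
    for x, y in seeds:
        for sentence in sentences:
            i = sentence.find(x)
            if i < 0:
                continue
            # Look for y only after the end of x, so the pattern reads X ... Y.
            j = sentence.find(y, i + len(x))
            if j < 0:
                continue
            counts["X" + sentence[i + len(x):j] + "Y"] += 1
    return counts


birth_seeds = [("Mozart", "1756"), ("Gandhi", "1869")]
birth_sentences = [
    "Mozart was born in 1756 in Salzburg.",
    "Gandhi was born in 1869.",
    "Mozart (1756-1791) was a composer.",
]
print(induce_patterns(birth_seeds, birth_sentences).most_common(1))
# [('X was born in Y', 2)]
```

Frequency-ranked patterns like this work well for narrow relations such as birthdates; for generic relations the same procedure tends to surface short, ambiguous strings, which is consistent with the low precision noted above.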

Other pattern-based algorithms include: the semi-automatic method of Riloff and Shepherd [10] for discovering similar words from a few seed examples; KnowItAll [9], which performs large-scale extraction of facts from the Web; the work of Mann [22], who used part-of-speech patterns to extract a subset of is-a relations involving proper nouns; the work of Downey et al. [23], who formalized relation extraction in a coherent and effective combinatorial model that is shown to outperform previous probabilistic frameworks; the work of Snow et al. [24]; and co-occurrence approaches such as that of Roark and Charniak [25]. Ciaramita et al.’s chapter in this book presents an attractive approach to learning structured, arbitrary binary semantic relations that is fully unsupervised, domain-independent, and efficient, since it ultimately relies on named-entity tagging and dependency parsing, both of which can be solved in linear time.

Clustering approaches have so far been applied only to is-a extraction. These methods use clustering algorithms to group words according to their meanings in text, label the clusters using their members’ lexical or syntactic dependencies, and then extract an is-a relation between each cluster member and the cluster label. Caraballo [26] made the first such attempt, using conjunction and apposition features to build noun clusters. More recently, Pantel and Ravichandran [18] extended this approach by using all syntactic dependency features of each noun. The advantage of clustering approaches is that they can identify is-a relations that do not explicitly appear in text; however, they generally fail to produce coherent clusters from fewer than 100 million words, which makes them unreliable for small corpora.
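The labeling and extraction steps can be sketched as below, assuming the clusters have already been built. The scoring rule (summing, over a cluster's members, hypothetical label co-occurrence counts such as those from apposition contexts like "Paris, a city") is an illustrative assumption, not Caraballo's or Pantel and Ravichandran's exact method; `label_and_extract` is a hypothetical name. Every member receives the winning label, including members with no direct evidence, which is how clustering can recover is-a relations that never appear explicitly in text.

```python
from collections import Counter


def label_and_extract(clusters, label_evidence):
    """Label each cluster and emit (member, "is-a", label) triples.

    clusters: lists of nouns, assumed to come from a prior clustering step.
    label_evidence: hypothetical {(noun, candidate_label): count} table.
    """
    relations = []
    for members in clusters:
        scores = Counter()
        for (noun, label), count in label_evidence.items():
            if noun in members:
                scores[label] += count
        if not scores:
            continue  # no evidence for any label: emit nothing for this cluster
        best_label = scores.most_common(1)[0][0]
        relations.extend((member, "is-a", best_label) for member in members)
    return relations


clusters = [["Paris", "Rome", "Lima"]]
evidence = {("Paris", "city"): 3, ("Rome", "city"): 2, ("Rome", "capital"): 4}
print(label_and_extract(clusters, evidence))
# [('Paris', 'is-a', 'city'), ('Rome', 'is-a', 'city'), ('Lima', 'is-a', 'city')]
```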

2.2. Ontologizing Knowledge

Several researchers have worked on ontologizing semantic resources. Most recently, Pantel [19] defined the task of ontologizing a lexical semantic resource as linking its terms to the concepts in a WordNet-like hierarchy. He developed a method to propagate lexical co-occurrence vectors to WordNet synsets, forming ontological co-occurrence vectors. Under an extension of the distributional hypothesis [27], these vectors are used to compute the similarity between two synsets and between a lexical term and a synset. An unknown term is then attached to the WordNet synset whose co-occurrence vector is most similar to the term’s co-occurrence vector. Although the author suggests a method for attaching more complex lexical structures such as binary semantic relations, he focuses only on attaching terms.
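A minimal sketch of the attachment step follows. It assumes sparse co-occurrence vectors stored as dictionaries and compares them with cosine similarity. The synset vector here is simply the sum of its member terms' vectors, a hypothetical stand-in for Pantel's propagation procedure, which is not described in this summary; the function names, synset labels, and counts are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def synset_vector(member_vectors):
    """Simplified propagation: sum the vectors of a synset's member terms."""
    total = {}
    for vec in member_vectors:
        for feature, weight in vec.items():
            total[feature] = total.get(feature, 0.0) + weight
    return total


def attach(term_vector, synset_vectors):
    """Return the synset whose vector is most similar to the term's vector."""
    return max(synset_vectors, key=lambda s: cosine(term_vector, synset_vectors[s]))


synsets = {
    "fruit#1": synset_vector([{"eat": 3, "ripe": 2}, {"eat": 2, "juicy": 1}]),
    "vehicle#1": synset_vector([{"drive": 4, "park": 2}]),
}
print(attach({"eat": 1, "ripe": 1}, synsets))  # fruit#1
```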

Basili et al. [28] proposed an unsupervised method to infer semantic classes (WordNet synsets) for terms in domain-specific verb relations. These relations, such as (x, EXPAND, y), are first learned automatically from a corpus. The semantic classes of x and y are then inferred using conceptual density [29], a WordNet-based measure applied to all instantiations of x and y in the corpus. Semantic classes represent possible common generalizations of the verb arguments. At the end of the process, a set of syntactic-semantic patterns is available for each verb, such as:

  • (social_group#1, expand, act#2)
  • (instrumentality#2, expand, act#2)

The method is successful on specific relations with few instances (such as domain verb relations), while its value on generic and frequent relations, such as part-of, was untested. Girju et al. [15] presented a highly supervised machine learning algorithm to infer semantic constraints on part-of relations, such as (object#1, PART-OF, social_event#1). These constraints are then used as selectional restrictions when harvesting part-of instances from ambiguous lexical patterns like “X of Y”. The approach shows high precision and recall but, as the authors acknowledge, requires a large human effort during the training phase.
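To make the use of semantic constraints as selectional restrictions concrete, here is a minimal sketch. It assumes each candidate instance from “X of Y” already carries a sense for each argument, and checks whether some constraint class generalizes both senses in a tiny hand-written hierarchy. The hierarchy, sense labels, constraint list, and function names are hypothetical toy data, not WordNet content, and Girju et al.'s learning procedure is not reproduced.

```python
# Hypothetical toy hypernym links (child -> parent); not real WordNet data.
HYPERNYM = {
    "wheel#1": "object#1",
    "car#1": "object#1",
    "speech#1": "communication#1",
    "ceremony#1": "social_event#1",
}


def ancestors(sense):
    """Return the sense itself followed by all of its toy hypernyms."""
    chain = [sense]
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain


def satisfies(constraints, part_sense, whole_sense):
    """True if some (part_class, whole_class) constraint covers the pair."""
    return any(
        part_cls in ancestors(part_sense) and whole_cls in ancestors(whole_sense)
        for part_cls, whole_cls in constraints
    )


# Constraints in the style of (object#1, PART-OF, social_event#1).
constraints = [("object#1", "object#1"), ("object#1", "social_event#1")]
print(satisfies(constraints, "wheel#1", "car#1"))   # True
print(satisfies(constraints, "speech#1", "car#1"))  # False
```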

Others have also made significant additions to WordNet. For example, in eXtended WordNet [30], the glosses in WordNet are enriched by disambiguating their nouns, verbs, adverbs, and adjectives with synsets. Another work enriched WordNet synsets with topically related words extracted from the Web [31]. Finally, the general task of word sense disambiguation [32] is relevant, since it amounts to ontologizing each term in a passage into a WordNet-like sense inventory. If we had a large collection of sense-tagged text, our mining algorithms could directly discover WordNet attachment points at harvest time. However, since little high-precision sense-tagged text is available, methods are required to ontologize semantic resources without fully disambiguating text.

3. Knowledge Harvesting: The Espresso Algorithm

Espresso is based on the framework adopted by Hearst [17]. It is a minimally supervised bootstrapping algorithm that takes as input a few seed instances of a particular relation and iteratively learns surface patterns to extract more instances. The key to Espresso lies in its use of generic patterns, i.e., broad-coverage, noisy patterns that extract both many correct and many incorrect relation instances. For example, for part-of relations, the pattern “X of Y” extracts many correct relation instances like “wheel of the car” but also many incorrect ones like “house of representatives”.
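The seed, pattern, and instance loop can be sketched as follows. To keep it short, the corpus is assumed to be pre-extracted into (x, pattern, y) occurrences, and each pattern is scored by the share of its instances that are already trusted. That score is a hypothetical stand-in, not Espresso's reliability measure; `bootstrap` and the data are illustrative.

```python
from collections import defaultdict


def bootstrap(seeds, occurrences, iterations=2, top_k=1):
    """Minimal seed -> pattern -> instance bootstrapping loop.

    occurrences: (x, pattern, y) triples, assumed pre-extracted from a corpus.
    """
    instances_of = defaultdict(set)
    for x, pattern, y in occurrences:
        instances_of[pattern].add((x, y))
    trusted = set(seeds)

    def score(pattern):
        # Hypothetical score: fraction of the pattern's instances already trusted.
        return len(instances_of[pattern] & trusted) / len(instances_of[pattern])

    for _ in range(iterations):
        # Pattern induction: patterns that connect at least one trusted instance.
        candidates = [p for p, inst in instances_of.items() if inst & trusted]
        # Pattern ranking/selection: keep the top_k highest-scoring patterns.
        selected = sorted(candidates, key=score, reverse=True)[:top_k]
        # Instance generation: trust every instance the selected patterns extract.
        for p in selected:
            trusted |= instances_of[p]
    return trusted


occurrences = [
    ("wheel", "Y consists of X", "car"),
    ("engine", "Y consists of X", "car"),
    ("piston", "Y consists of X", "engine"),
    ("wheel", "X of Y", "car"),
    ("house", "X of Y", "representatives"),
]
seeds = [("wheel", "car"), ("engine", "car")]
print(sorted(bootstrap(seeds, occurrences)))
# [('engine', 'car'), ('piston', 'engine'), ('wheel', 'car')]
```

With `top_k=1` the generic pattern “X of Y” is never selected on this toy data, so (house, representatives) is not added, but neither is anything else that only “X of Y” finds. Espresso's approach is instead to keep such generic patterns and filter their output.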

The key assumption behind Espresso is that in very large corpora, like the Web, correct instances generated by a generic pattern will also be instantiated by some reliable patterns, i.e., patterns that have high precision but often very low recall (e.g., “X consists of Y” for part-of relations). In this section, we describe the overall architecture of Espresso, propose a principled measure of reliability, and give an algorithm for exploiting generic patterns.
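This assumption suggests a simple filter, sketched below under stated simplifications: an instance extracted by a generic pattern is kept only if reliable patterns also instantiate it at least `min_hits` times. The hit counts would come from corpus or Web queries; here they are a hypothetical dictionary, and the plain count threshold stands in for Espresso's reliability-based filtering. `filter_generic` is a hypothetical name.

```python
def filter_generic(generic_instances, reliable_hits, min_hits=1):
    """Keep generic-pattern instances that reliable patterns also support.

    generic_instances: (part, whole) pairs, e.g. extracted by "X of Y".
    reliable_hits: hypothetical {(part, whole): count} of matches of reliable
    patterns such as "X consists of Y" (read with the whole in the X slot).
    """
    return [pair for pair in generic_instances if reliable_hits.get(pair, 0) >= min_hits]


generic = [("wheel", "car"), ("house", "representatives")]
hits = {("wheel", "car"): 12}
print(filter_generic(generic, hits))  # [('wheel', 'car')]
```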

3.1 System Architecture

3.1.1 Pattern Induction

3.1.2 Pattern Ranking/Selection

3.1.3 Instance Generation

3.1.4 Pattern and Instance Reliability

3.2 Exploiting Generic Patterns

4. Ontologizing Semantic Relations

The output of most relation harvesting algorithms, such as Espresso described in Section 3, consists of flat lists of lexical semantic knowledge such as “Italy is-a country” and “orange similar-to blue”. However, using this knowledge beyond simple keyword matching, for example in inferences, requires it to be linked, or ontologized, into semantic repositories such as ontologies or term banks like WordNet.

Given an instance (x, r, y) of a binary relation r between terms x and y, the ontologizing task is to identify the senses of x and y where r holds. In this work, we focus on WordNet 2.0 senses, though any similar term bank would apply.

Let $S_x$ and $S_y$ be the sets of all WordNet senses of x and y. A sense pair, $s_{xy}$, is defined as any pair of senses of x and y: $s_{xy} = \{s_x, s_y\}$, where $s_x \in S_x$ and $s_y \in S_y$. The set of all sense pairs $S_{xy}$ consists of all pairings between senses in $S_x$ and $S_y$. In order to attach a relation instance (x, r, y) into WordNet, one must:

  • Disambiguate x and y, that is, find the subsets $S'_x \subseteq S_x$ and $S'_y \subseteq S_y$ for which the relation r holds; and
  • Instantiate the relation in WordNet, using the synsets corresponding to all correct pairings between the senses in $S'_x$ and $S'_y$. We denote this set of attachment points as $S'_{xy}$.

If $S_x$ or $S_y$ is empty, no attachments are produced.

For example, the instance (study, PART-OF, report) is ontologized into WordNet through the senses $S'_x = \{\text{survey\#1}, \text{study\#2}\}$ and $S'_y = \{\text{report\#1}\}$. The final attachment points $S'_{xy}$ are:

  • (survey#1, PART-OF, report#1)
  • (study#2, PART-OF, report#1)
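The definitions above translate into a short sketch: enumerate all sense pairs with a Cartesian product and keep those for which the relation holds. The disambiguation step is left as a caller-supplied predicate `holds`, a hypothetical placeholder, and `attachment_points` is likewise a hypothetical name. Apart from the senses in the example, the labels `study#other` and `report#other` are placeholders, not WordNet senses.

```python
from itertools import product


def attachment_points(senses_x, senses_y, relation, holds):
    """Return (s_x, relation, s_y) for every sense pair where `holds` is true.

    An empty senses_x or senses_y yields no pairs, hence no attachments.
    """
    return [(sx, relation, sy) for sx, sy in product(senses_x, senses_y) if holds(sx, sy)]


S_x = ["survey#1", "study#2", "study#other"]  # "study#other" is a placeholder
S_y = ["report#1", "report#other"]            # "report#other" is a placeholder
chosen_x, chosen_y = {"survey#1", "study#2"}, {"report#1"}


def holds(sx, sy):
    # Stand-in for disambiguation: accept the subsets chosen in the example.
    return sx in chosen_x and sy in chosen_y


for point in attachment_points(S_x, S_y, "PART-OF", holds):
    print(point)
# ('survey#1', 'PART-OF', 'report#1')
# ('study#2', 'PART-OF', 'report#1')
```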

Unlike common algorithms for word sense disambiguation, here it is important to take into account the semantic dependency between the two terms x and y. For example, an entity that is part-of a study has to be some kind of information. This knowledge about mutual selectional preference (the preferred semantic class that fills a given relation role, such as x or y) can be exploited to ontologize the instance.
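One way to exploit mutual selectional preference is sketched below: a sense pair is kept only if the sense of y belongs to the class expected for the whole and the sense of x belongs to the class expected for the part. The preference table, hypernym links, and function names are hand-written, hypothetical toy data (not WordNet). This is not the chapter's anchoring or clustering algorithm; it only shows how a preference on one argument constrains the senses of the other.

```python
from itertools import product

# Hypothetical toy hypernym links (child -> parent); not real WordNet data.
TOY_HYPERNYM = {
    "survey#1": "information#1",
    "study#2": "information#1",
    "study#other": "room#1",
    "report#1": "document#1",
    "report#other": "event#1",
}

# Hypothetical preference: for PART-OF, parts of a document are information.
PREFERENCES = {("PART-OF", "document#1"): "information#1"}


def is_a(sense, cls):
    """True if `cls` is the sense itself or one of its toy hypernyms."""
    while sense is not None:
        if sense == cls:
            return True
        sense = TOY_HYPERNYM.get(sense)
    return False


def compatible_pairs(relation, senses_x, senses_y):
    """Keep sense pairs (s_x, s_y) that satisfy some preference for `relation`."""
    return [
        (sx, sy)
        for sx, sy in product(senses_x, senses_y)
        if any(
            rel == relation and is_a(sy, whole_cls) and is_a(sx, part_cls)
            for (rel, whole_cls), part_cls in PREFERENCES.items()
        )
    ]


print(compatible_pairs("PART-OF", ["survey#1", "study#2", "study#other"],
                       ["report#1", "report#other"]))
# [('survey#1', 'report#1'), ('study#2', 'report#1')]
```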

5. Experimental Results

6. Conclusions

In this chapter, we presented algorithms for both extracting semantic relations from textual resources and for linking, or ontologizing, them into a semantic repository. We proposed a weakly supervised, general-purpose, and accurate algorithm, called Espresso, for harvesting binary semantic relations from raw text. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability that enables the filtering algorithm. We empirically compared Espresso’s precision and recall with those of other systems on both a small domain-specific textbook and a larger corpus of general news, and extracted several standard and specific semantic relations: is-a, part-of, succession, reaction, and production. Espresso achieves higher and more balanced performance than other state-of-the-art systems. By exploiting generic patterns, system recall increases substantially with little effect on precision.

We then proposed two algorithms for automatically ontologizing binary semantic relations into WordNet: an anchoring approach and a clustering approach. Experiments on the part-of and causation relations showed promising results. Both algorithms outperformed the baseline on F-score. Our best results were on the part-of relation, where the clustering approach achieved a 13.6% higher F-score than the baseline. The induction of conceptual instances opens many avenues for future work. We intend to pursue the ideas presented in Section 5.2.4 for using conceptual instances to: i) support knowledge acquisition tools by learning semantic constraints on extraction patterns; ii) support ontology learning from text; and iii) improve word sense disambiguation through selectional restrictions. We will also try different similarity score functions for both the clustering and the anchoring approaches, such as those surveyed in Corley and Mihalcea [40].

The algorithms described in this chapter may be applied to ontologize many lexical resources of semantic relations, regardless of the harvesting algorithm used to mine them. In doing so, we have the potential to quickly enrich our ontologies, like WordNet, thus reducing the knowledge acquisition bottleneck. It is our hope that we will be able to leverage these enriched resources, albeit with some noisy additions, to improve performance on knowledge-rich problems such as question answering and information extraction.

References

  • [1] Marius Paşca and Sanda M. Harabagiu. The informative role of WordNet in open-domain question answering. In: Proceedings of the NAACL-2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, pages 138–143, Pittsburgh, PA, 2001.
  • [2] M. Geffet and I. Dagan. The distributional inclusion hypotheses and lexical entailment. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, MI, 2005.
  • [3] K. Mahesh. Ontology development for machine translation: Ideology and methodology. Technical Report MCCS-96-292, New Mexico State University, 1996.
  • [4] K. Mahesh, T. O’Hara and S. Nirenburg. Lexical acquisition with WordNet and the Mikrokosmos ontology. In: Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, Canada, 1998.
  • [5] N. Guarino, A. Gangemi, C. Masolo, A. Oltramari, and L. Schneider. Sweetening ontologies with DOLCE. In: Proceedings of Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, 13th International Conference, EKAW 2002, pages 166–181, Siguenza, Spain, 2002.
  • [6] I. Niles and A. Pease. Towards a standard upper ontology. In: Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), pages 2–9, Ogunquit, Maine, 2001.
  • [7] C. Baker, C. Fillmore, and J. Lowe. The Berkeley FrameNet project. In: Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL-98), pages 86–90, Montreal, Canada, 1998.
  • [8] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
  • [9] O. Etzioni, M. J. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D.S. Weld, and A. Yates. Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91–134, 2005.
  • [10] E. Riloff and J. Shepherd. A corpus-based approach for building semantic lexicons. In: Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP-1997), pages 117–124, Somerset, NJ, 1997.
  • [11] D. Lin and P. Pantel. Concept discovery from text. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING-02), pages 577–583, Taipei, Taiwan, 2002.
  • [12] D. Hindle. Noun classification from predicate-argument structures. In: Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (ACL-1990), pages 268–275, Pittsburgh, PA, 1990.
  • [13] I. Szpektor, H. Tanev, I. Dagan, and B. Coppola. Scaling web-based acquisition of entailment relations. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 41–48, Barcelona, Spain, 2004.
  • [14] D. Ravichandran and E. H. Hovy. Learning surface text patterns for a question answering system. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), pages 41–47, Philadelphia, PA, 2002.
  • [15] R. Girju, A. Badulescu, and D. Moldovan. Automatic discovery of part-whole relations. Computational Linguistics, 32(1):83–135, 2006.
  • [16] J. Bos. Invited talk. In: 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge, Sydney, Australia, 2006. Association for Computational Linguistics.
  • [17] M. Hearst. Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), pages 539–545, Nantes, France, 1992.
  • [18] P. Pantel and D. Ravichandran. Automatically labeling semantic classes. In: Proceedings of Human Language Technology conference / North American chapter of the Association for Computational Linguistics annual meeting (HLT/NAACL-04), pages 321–328, Boston, MA, 2004.
  • [19] P. Pantel. Inducing ontological co-occurrence vectors. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), pages 125–132, Ann Arbor, MI, 2005.
  • [20] M. Berland and E. Charniak. Finding parts in very large corpora. In: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL-1999), pages 57–64, College Park, MD, 1999.
  • [21] P. Pantel, D. Ravichandran, and E.H. Hovy. Towards terascale knowledge acquisition. In: Proceedings of the 21st International Conference on Computational Linguistics (COLING-04), pages 771–777, Geneva, Switzerland, 2004.
  • [22] G. S. Mann. Fine-grained proper noun ontologies for question answering. In: Proceedings of SemaNet’ 02: Building and Using Semantic Networks, pages 1–7, Taipei, Taiwan, 2002.
  • [23] D. Downey, O. Etzioni, and S. Soderland. A probabilistic model of redundancy in information extraction. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1034–1041, Edinburgh, Scotland, 2005.
  • [24] R. Snow, D. Jurafsky, and A. Y. Ng. Learning syntactic patterns for automatic hypernym discovery. In: Proceedings of the 7th Neural Information Processing System Conference (NIPS-05), Vancouver, Canada, 2005.
  • [25] B. Roark and E. Charniak. Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING-98), pages 1110–1116, Montreal, Canada, 1998.
  • [26] S. Caraballo. Automatic acquisition of a hypernym-labeled noun hierarchy from text. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-1999), pages 57–64, College Park, MD, 1999.
  • [27] Z. Harris. Distributional structure, pages 26–47. New York: Oxford University Press, 1985.
  • [28] R. Basili, M.T. Pazienza, and M. Vindigni. Corpus-driven learning of event recognition rules. In: Proceedings of the Workshop on Machine Learning for Information Extraction, held in conjunction with the 14th European Conference on Artificial Intelligence (ECAI-00), Berlin, Germany, 2000.
  • [29] E. Agirre and G. Rigau. Word sense disambiguation using conceptual density. In: Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 16–22, Copenhagen, Denmark, 1996.
  • [30] S. Harabagiu, G. A. Miller, and D. Moldovan. WordNet 2: A morphologically and semantically enhanced resource. In: Proceedings of SIGLEX-99, pages 1–8, University of Maryland, 1999.
  • [31] E. Agirre, G. Rigau, D. Martinez, and E.H. Hovy. Enriching WordNet concepts with topic signatures. In: Proceedings of the NAACL-2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, Pittsburgh, PA, 2001.
  • [32] W. Gale, K. Church, and D. Yarowsky. A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26:415–439, 1992.
  • [33] J. Justeson and S. Katz. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1:9–27, 1995.
  • [34] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, 1991.
  • [35] T. L. Brown, H. E. LeMay, and B.E. Bursten. Chemistry: The Central Science. Prentice Hall, 2003.
  • [36] D. Day, J. Aberdeen, L. Hirschman, R. Kozierok, P. Robinson, and M. Vilain. Mixed-initiative development of language processing systems. In: Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP-97), pages 348–355, Washington D.C., 1997.
  • [37] S. Siegel and N. J. Castellan Jr. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1998.
  • [38] M. Winston, R. Chaffin, and D. Hermann. A taxonomy of part-whole relations. Cognitive Science, 11:417–444, 1987.
  • [39] R. Girju. Automatic detection of causal relations for question answering. In: Proceedings of ACL Workshop on Multilingual Summarization and Question Answering, pages 107–114, Sapporo, Japan, 2003.
  • [40] C. Corley and R. Mihalcea. Measuring the semantic similarity of texts. In: Proceedings of the ACL Workshop on Empirical Modelling of Semantic Equivalence and Entailment, pages 13–18, Ann Arbor, MI, 2005.


Author: Patrick Pantel, Marco Pennacchiotti
Title: Automatically Harvesting and Ontologizing Semantic Relations
Title URL: http://www.patrickpantel.com/cgi-bin/web/tools/getfile.pl?type=paper&id=2008/olp08.pdf
Year: 2008