2007 DerivingALargeTaxFromWikipedia


Subject Headings: Ontology Structure Learning, Wikipedia Category Network.


Cited By

  • ~148 …



We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and on lexico-syntactic matching. As a result we are able to derive a large-scale taxonomy containing a large number of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as by computing semantic similarity between words in benchmarking datasets.

1. Introduction

The availability of large-coverage, machine-readable knowledge is a crucial theme for Artificial Intelligence. While advances towards robust statistical inference methods (cf. e.g. Domingos et al. (2006) and Punyakanok et al. (2006)) will certainly improve the computational modeling of intelligence, we believe that crucial advances will also come from rediscovering the deployment of large knowledge bases.

Creating knowledge bases, however, is expensive, and they are time-consuming to maintain. In addition, most existing knowledge bases are domain dependent or have a limited and arbitrary coverage – Cyc (Lenat & Guha, 1990) and WordNet (Fellbaum, 1998) being notable exceptions. The field of ontology learning addresses these problems by taking textual input and transforming it into a taxonomy or a proper ontology. However, the learned ontologies are small and mostly domain dependent, and evaluations have revealed rather poor performance (see Buitelaar et al. (2005) for an extensive overview).

We try to overcome such problems by relying on a wide-coverage online encyclopedia developed by a large number of users, namely Wikipedia. We use semi-structured input by taking the category system in Wikipedia as a conceptual network. This provides us with pairs of related concepts whose semantic relation is unspecified. The task of creating a subsumption hierarchy then boils down to distinguishing between isa and notisa relations. We use methods based on connectivity in the network and on lexico-syntactic patterns to label the relations between categories. As a result we are able to derive a large-scale taxonomy.
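The core labeling decision described above can be illustrated with a minimal sketch. The heuristic below — matching the lexical head of a subcategory name against the head of its supercategory — is a simplified, hypothetical stand-in for the paper's actual battery of connectivity-based and lexico-syntactic methods; the helper names and the head-finding rule are assumptions for illustration only.

```python
# Minimal sketch (hypothetical helpers): label an edge between two
# linked Wikipedia categories as "isa" when their lexical heads match.
# This is only one simplified cue; the paper combines several methods.

def lexical_head(category: str) -> str:
    """Very rough head finder: for 'Capitals in Asia' return the noun
    before the first preposition ('capitals'); otherwise return the
    last token ('British Computer Scientists' -> 'scientists')."""
    tokens = category.lower().split()
    prepositions = {"in", "of", "by", "from", "for", "at", "on"}
    for i, tok in enumerate(tokens):
        if tok in prepositions and i > 0:
            return tokens[i - 1]
    return tokens[-1]

def label_relation(subcategory: str, supercategory: str) -> str:
    """Label a category pair as an isa or notisa relation."""
    if lexical_head(subcategory) == lexical_head(supercategory):
        return "isa"
    return "notisa"

print(label_relation("British Computer Scientists", "Computer Scientists"))  # isa
print(label_relation("Crime Comics", "Crime"))  # notisa
```

A head mismatch, as in the second example, does not rule out subsumption; it merely withholds the isa label, which is why the full method also consults network connectivity and pattern matches in large corpora.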


We described the automatic creation of a large-scale, domain-independent taxonomy. We took Wikipedia's categories as concepts in a semantic network and labeled the relations between these concepts as isa and notisa relations by using methods based on the connectivity of the network and on applying lexico-syntactic patterns to very large corpora. Both the connectivity-based methods and the lexico-syntactic patterns ensure a high recall, though at some cost in precision. We compared the created taxonomy with ResearchCyc and, via semantic similarity measures, with WordNet. Our Wikipedia-based taxonomy proved to be competitive with the two arguably largest and best developed existing ontologies. We believe that these results stem from taking already structured and well-maintained knowledge as input.
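The semantic-similarity comparison mentioned above can be sketched as follows. The snippet computes a path-based similarity in the style of Leacock & Chodorow (1998) over a toy isa hierarchy; the tiny taxonomy, helper names, and fixed depth are assumptions for illustration — the actual evaluation runs such measures over the full Wikipedia-derived taxonomy and WordNet against benchmark word-pair datasets.

```python
import math

# Hedged sketch: path-based similarity over a toy isa taxonomy.
# child -> parent edges of a small hypothetical hierarchy.
ISA = {
    "cat": "feline", "feline": "mammal",
    "dog": "canine", "canine": "mammal",
    "mammal": "animal",
}

def path_to_root(node: str) -> list:
    """Follow isa edges upward: 'cat' -> ['cat','feline','mammal','animal']."""
    path = [node]
    while node in ISA:
        node = ISA[node]
        path.append(node)
    return path

def shortest_path_len(a: str, b: str) -> int:
    """Edge count of the path through the lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in depth_b:           # first shared ancestor found
            return i + depth_b[n]
    raise ValueError("no common ancestor")

def lch_similarity(a: str, b: str, max_depth: int = 4) -> float:
    """-log(length / (2 * D)), counting nodes on the path, LCH-style."""
    length = shortest_path_len(a, b) + 1  # edges -> node count
    return -math.log(length / (2 * max_depth))

# Closer pairs score higher: cat/feline vs. cat/dog.
print(lch_similarity("cat", "feline") > lch_similarity("cat", "dog"))  # True
```

Correlating such scores with human judgments on datasets like Miller & Charles (1991) or Rubenstein & Goodenough (1965), both cited below, is the standard way to assess the quality of a taxonomy indirectly.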

Our work on deriving a taxonomy is the first step in creating a fully-fledged ontology based on Wikipedia. This will require labeling the generic notisa relations with specific ones such as has-part, has-attribute, etc.


  • Berland, M. & E. Charniak (1999). Finding parts in very large corpora. In: Proceedings of ACL-99, pp. 57–64.
  • Berners-Lee, T., J. Hendler & O. Lassila (2001). The semantic web. Scientific American, 284(5):34–43.
  • Brants, T. (2000). TnT – A statistical Part-of-Speech tagger. In: Proceedings of ANLP-00, pp. 224–231.
  • Brin, S. & L. Page (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117.
  • Buitelaar, P., P. Cimiano & B. Magnini (Eds.) (2005). Ontology Learning from Text: Methods, Evaluation and Applications. Amsterdam, The Netherlands: IOS Press.
  • Caraballo, S. A. (1999). Automatic construction of a hypernym-labeled noun hierarchy from text. In: Proceedings of ACL-99, pp. 120–126.
  • Church, K. W. & P. Hanks (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29.
  • Cimiano, P., A. Pivk, L. Schmidt-Thieme & S. Staab (2005). Learning taxonomic relations from heterogeneous sources of evidence. In P. Buitelaar, P. Cimiano & B. Magnini (Eds.), Ontology Learning from Text: Methods, Evaluation and Applications, pp. 59–73. Amsterdam, The Netherlands: IOS Press.
  • Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
  • Domingos, P., S. Kok, H. Poon, M. Richardson & P. Singla (2006). Unifying logical and statistical AI. In: Proceedings of AAAI-06, pp. 2–7.
  • Fellbaum, C. (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, Mass.: MIT Press.
  • Finkel, J. R., T. Grenager & C. Manning (2005). Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of ACL-05, pp. 363–370.
  • Girju, R., A. Badulescu & D. Moldovan (2006). Automatic discovery of part-whole relations. Computational Linguistics, 32(1):83–135.
  • Harman, D. & M. Liberman (1993). TIPSTER Complete. LDC93T3A, Philadelphia, Penn.: Linguistic Data Consortium.
  • Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In: Proceedings of COLING-92, pp. 539–545.
  • Klein, D. & C. D. Manning (2003). Fast exact inference with a factored model for natural language parsing. In S. Becker, S. Thrun & K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15 (NIPS 2002), pp. 3–10. Cambridge, Mass.: MIT Press.
  • Kudoh, T. & Y. Matsumoto (2000). Use of Support Vector Machines for chunk identification. In: Proceedings of CoNLL-00, pp. 142–144.
  • Leacock, C. & M. Chodorow (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, Chp. 11, pp. 265–283. Cambridge, Mass.: MIT Press.
  • Lee, L. (1999). Measures of distributional similarity. In: Proceedings of ACL-99, pp. 25–31.
  • Lenat, D. B. & R. V. Guha (1990). Building Large Knowledge-based Systems: Representation and Inference in the CYC Project. Reading, Mass.: Addison-Wesley.
  • Matuszek, C., M. Witbrock, R. C. Kahlert, J. Cabral, D. Schneider, P. Shah & D. Lenat (2005). Searching for common sense: Populating Cyc from the web. In: Proceedings of AAAI-05, pp. 1430–1435.
  • McCarthy, J. (1959). Programs with common sense. In: Proceedings of the Teddington Conference on the Mechanization of Thought Processes, pp. 75–91.
  • Miller, G. A. & W. G. Charles (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.
  • Miller, G. A. & F. Hristea (2006). WordNet nouns: Classes and instances. Computational Linguistics, 32(1):1–3.
  • Minnen, G., J. Carroll & D. Pearce (2001). Applied morphological processing of English. Natural Language Engineering, 7(3):207–223.
  • Porter, M. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
  • Punyakanok, V., D. Roth, W. Yih & D. Zimak (2006). Learning and inference over constrained output. In: Proceedings of IJCAI-05, pp. 1117–1123.
  • Rada, R., H. Mili, E. Bicknell & M. Blettner (1989). Development and application of a metric to semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1):17–30.
  • Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of IJCAI-95, Vol. 1, pp. 448–453.
  • Richardson, M. & P. Domingos (2003). Building large knowledge bases by mass collaboration. In: Proceedings of the 2nd International Conference on Knowledge Capture (K-CAP 2003), Sanibel Island, Fl., October 23–25, 2003, pp. 129–137.
  • Rubenstein, H. & J. Goodenough (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.
  • Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1):97–123.
  • Snow, R., D. Jurafsky & A. Y. Ng (2006). Semantic taxonomy induction from heterogeneous evidence. In: Proceedings of COLING-ACL-06, pp. 801–808.
  • Strube, M. & S. P. Ponzetto (2006). WikiRelate! Computing semantic relatedness using Wikipedia. In: Proceedings of AAAI-06, pp. 1419–1424.
  • Suchanek, F. M., G. Kasneci & G. Weikum (2007). YAGO: A core of semantic knowledge. In: Proceedings of WWW-07.
  • Weeds, J. & D. Weir (2005). Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4):439–475.
  • Wu, Z. & M. Palmer (1994). Verb semantics and lexical selection. In: Proceedings of ACL-94, pp. 133–138.


Author(s): Simone P. Ponzetto, Michael Strube.
Title: Deriving a Large Scale Taxonomy from Wikipedia.
Year: 2007.
URL: http://www.eml-research.de/nlp/papers/ponzetto07b.pdf