2006 WikiRelate


Subject Headings: Lexical Semantic Similarity Function, Wikipedia.


Cited By



Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.
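As a rough illustration of the setup the abstract describes, the sketch below scores word pairs with a path-length measure over a small category graph and then correlates the scores with human judgments, as is typically done on benchmark datasets. The abstract does not name the measures or the evaluation statistic, so the toy graph, the `1 / (1 + path length)` scoring, the use of Pearson correlation, and all function names here are assumptions made for illustration, not the authors' method.

```python
from collections import deque
from math import sqrt

# Hypothetical toy category graph (undirected edges); not real Wikipedia data.
TOY_EDGES = [
    ("Main", "Transport"),
    ("Main", "Nature"),
    ("Transport", "Cars"),
    ("Transport", "Bicycles"),
    ("Nature", "Trees"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed scoring: shorter paths give higher relatedness in (0, 1]."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


TOY_GRAPH = build_graph(TOY_EDGES)
```

A measure would then be evaluated by calling `relatedness` on each pair in a dataset and passing the resulting scores, together with the human ratings, to `pearson`.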




2006 WikiRelate
Author: Simone P. Ponzetto, Michael Strube
Title: WikiRelate! Computing Semantic Relatedness Using Wikipedia
Title URL: http://dit.unitn.it/~p2p/RelatedWork/Matching/aaai06.pdf