2007 AutonomouslySemantifyingWikipedia

Subject Headings: Information Extraction, Semantic Web, Wikipedia.

Notes

Cited By

2009

Quotes

Abstract

Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source, because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data - in one case equaling that of humans.

References

  • http://opennlp.sourceforge.net/.
  • Sisay Fissaha Adafre, Maarten de Rijke, Discovering missing links in Wikipedia, Proceedings of the 3rd international workshop on Link discovery, p.90-97, August 21-25, 2005, Chicago, Illinois doi:10.1145/1134271.1134284
  • S. Auer and J. Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In ESWC, 2007.
  • Michele Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and Oren Etzioni. Open information extraction from the Web. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
  • T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May 2001.
  • Leo Breiman, Bagging predictors, Machine Learning, v.24 n.2, p.123-140, Aug. 1996 doi:10.1023/A:1018054314350
  • Eric Brill, Susan Dumais, Michele Banko, An analysis of the AskMSR question-answering system, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, p.257-264, July 06, 2002 doi:10.3115/1118693.1118726
  • Charles L. A. Clarke, Gordon V. Cormack, Thomas R. Lynam, Exploiting redundancy in question answering, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.358-365, September 2001, New Orleans, Louisiana, United States doi:10.1145/383952.384024
  • R. de Salvo Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. An inference model for semantic entailment in natural language. In National Conference on Artificial Intelligence (AAAI), pages 1678--1679, 2005.
  • Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, Ramanathan V. Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, Jason Y. Zien, SemTag and seeker: bootstrapping the semantic web via automated semantic annotation, Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary doi:10.1145/775152.775178
  • AnHai Doan, Alon Y. Halevy, Semantic-integration research in the database community, AI Magazine, v.26 n.1, p.83-94, March 2005
  • D. Downey, Oren Etzioni, and S. Soderland. A probabilistic model of redundancy in information extraction. In: Proceedings of IJCAI 2005, 2005.
  • Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng, Web question answering: is more always better?, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 11-15, 2002, Tampere, Finland doi:10.1145/564376.564428
  • Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates, Unsupervised named-entity extraction from the web: an experimental study, Artificial Intelligence, v.165 n.1, p.91-134, June 2005 doi:10.1016/j.artint.2005.03.001
  • E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using wikipedia: Enhancing text categorization with encyclopedic knowledge. In: Proceedings of the 21st National Conference on Artificial Intelligence, pages 1301--1306, 2006.
  • E. Gabrilovich and S. Markovitch. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of The 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 2007.
  • A. Y. Halevy, Oren Etzioni, A. Doan, Z. G. Ives, J. Madhavan, L. McDowell, and I. Tatarinov. Crossing the structure chasm. In: Proceedings of CIDR, 2003.
  • Cody Kwok, Oren Etzioni, Daniel S. Weld, Scaling question answering to the web, ACM Transactions on Information Systems (TOIS), v.19 n.3, p.242-262, July 2001 doi:10.1145/502115.502117
  • John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth International Conference on Machine Learning, p.282-289, June 28-July 01, 2001
  • B. MacCartney and C. D. Manning. Natural logic for textual inference. In Workshop on Textual Entailment and Paraphrasing, ACL 2007, 2007.
  • A. K. McCallum. Mallet: A machine learning for language toolkit. In http://mallet.cs.umass.edu, 2002.
  • Ron Meir, Gunnar Rätsch, An introduction to boosting and leveraging, Advanced lectures on machine learning, Springer-Verlag New York, Inc., New York, NY, 2003
  • D. P. Nguyen, Y. Matsuo, and M. Ishizuka. Exploiting syntactic and semantic information for relation extraction from wikipedia. In IJCAI07-TextLinkWS, 2007.
  • K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In: Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, 1999.
  • D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, pages 169--198, 1999.
  • S. P. Ponzetto and M. Strube. Deriving a large scale taxonomy from wikipedia. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pages 1440--1445, 2007.
  • E. Riloff and J. Shepherd. A corpus-based approach for building semantic lexicons. In: Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pages 117--124, Providence, RI, 1997.
  • Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum, Yago: a core of semantic knowledge, Proceedings of the 16th International Conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada doi:10.1145/1242572.1242667
  • Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, Rudi Studer, Semantic Wikipedia, Proceedings of the 15th International Conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland doi:10.1145/1135777.1135863
  • W. Wu, A. Doan, C. Yu, and W. Meng. Bootstrapping domain ontology for Semantic Web services from source web sites. In: Proceedings of the VLDB-05 Workshop on Technologies for E-Services, 2005.


2007 AutonomouslySemantifyingWikipedia
Author: Fei Wu, Daniel S. Weld
Title: Autonomously Semantifying Wikipedia
Journal: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
Title URL: http://turing.cs.washington.edu/papers/cikm07.pdf
DOI: 10.1145/1321440.1321449
Year: 2007