2007 AutonomouslySemantifyingWikipedia

(Wu & Weld, 2007) ⇒ Fei Wu, and Daniel S. Weld. (2007). “Autonomously Semantifying Wikipedia.” In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM 2007). doi:10.1145/1321440.1321449

Subject Headings: Information Extraction, Semantic Web, Wikipedia.

Notes

Cited By

~128 http://scholar.google.com/scholar?q=%22Autonomously+Semantifying+Wikipedia%22+2007

2009

(Mintz et al., 2009) ⇒ Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. (2009). “Distant Supervision for Relation Extraction without Labeled Data.” In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL 2009).
- QUOTE: Perhaps most similar to our distant supervision algorithm is the effective method of Wu and Weld (2007) who extract relations from a Wikipedia page by using supervision from the page’s infobox. Unlike their corpus-specific method, which is specific to a (single) Wikipedia page, our algorithm allows us to extract evidence for a relation from many different documents, and from any genre.

Quotes

Abstract

Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source, because it is comprehensive, not too large, high-quality, and contains enough manually-derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures which can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, infoboxes, etc.), and we describe a prototype implementation of a self-supervised, machine learning system which realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data - in one case equaling that of humans.
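
The self-supervision idea mentioned in the abstract (using a page's existing infobox as a source of training labels) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple heuristic in which a sentence counts as a positive training example for an infobox attribute when it contains that attribute's value verbatim. The function names, the naive sentence splitter, and the sample article and infobox are all hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(article_text, infobox):
    """Pair each infobox attribute with the sentences that mention its value.

    Hypothetical heuristic: a case-insensitive substring match between the
    attribute's value and the sentence marks the sentence as a positive example.
    """
    examples = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append((attribute, sentence))
    return examples


# Made-up article text and infobox, for illustration only.
article = (
    "Springfield County was created in 1804. "
    "Its county seat is Riverton. "
    "The county has many forests."
)
infobox = {"founded": "1804", "seat": "Riverton"}

training_examples = label_sentences(article, infobox)
for attribute, sentence in training_examples:
    print(attribute, "->", sentence)
```

Labeled pairs of this kind could then be used to train a per-attribute extractor. Verbatim matching is noisy in practice (a value can appear in an unrelated sentence), so a real system would need stronger matching and filtering.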

References

http://opennlp.sourceforge.net/.
Sisay Fissaha Adafre, Maarten de Rijke, Discovering missing links in Wikipedia, Proceedings of the 3rd international workshop on Link discovery, p.90-97, August 21-25, 2005, Chicago, Illinois doi:10.1145/1134271.1134284
S. Auer and J. Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In ESWC, 2007.
Michele Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and Oren Etzioni. Open information extraction from the Web. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May 2001.
Leo Breiman, Bagging predictors, Machine Learning, v.24 n.2, p.123-140, Aug. 1996 doi:10.1023/A:1018054314350
Eric Brill, Susan Dumais, Michele Banko, An analysis of the AskMSR question-answering system, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, p.257-264, July 06, 2002 doi:10.3115/1118693.1118726
Charles L. A. Clarke, Gordon V. Cormack, Thomas R. Lynam, Exploiting redundancy in question answering, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.358-365, September 2001, New Orleans, Louisiana, United States doi:10.1145/383952.384024
R. de Salvo Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. An inference model for semantic entailment in natural language. In National Conference on Artificial Intelligence (AAAI), pages 1678--1679, 2005.
Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, Ramanathan V. Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, Jason Y. Zien, SemTag and seeker: bootstrapping the semantic web via automated semantic annotation, Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary doi:10.1145/775152.775178
AnHai Doan, Alon Y. Halevy, Semantic-integration research in the database community, AI Magazine, v.26 n.1, p.83-94, March 2005
D. Downey, Oren Etzioni, and S. Soderland. A probabilistic model of redundancy in information extraction. In: Proceedings of IJCAI 2005, 2005.
Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng, Web question answering: is more always better?, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 11-15, 2002, Tampere, Finland doi:10.1145/564376.564428
Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates, Unsupervised named-entity extraction from the web: an experimental study, Artificial Intelligence, v.165 n.1, p.91-134, June 2005 doi:10.1016/j.artint.2005.03.001
E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using wikipedia: Enhancing text categorization with encyclopedic knowledge. In: Proceedings of the 21st National Conference on Artificial Intelligence, pages 1301--1306, 2006.
E. Gabrilovich and S. Markovitch. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of The 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 2007.
A. Y. Halevy, Oren Etzioni, A. Doan, Z. G. Ives, J. Madhavan, L. McDowell, and I. Tatarinov. Crossing the structure chasm. In: Proceedings of CIDR, 2003.
Cody Kwok, Oren Etzioni, Daniel S. Weld, Scaling question answering to the web, ACM Transactions on Information Systems (TOIS), v.19 n.3, p.242-262, July 2001 doi:10.1145/502115.502117
John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth International Conference on Machine Learning, p.282-289, June 28-July 01, 2001
B. MacCartney and C. D. Manning. Natural logic for textual inference. In Workshop on Textual Entailment and Paraphrasing, ACL 2007, 2007.
A. K. McCallum. Mallet: A machine learning for language toolkit. In http://mallet.cs.umass.edu, 2002.
Ron Meir, Gunnar Rätsch, An introduction to boosting and leveraging, Advanced lectures on machine learning, Springer-Verlag New York, Inc., New York, NY, 2003
D. P. Nguyen, Y. Matsuo, and M. Ishizuka. Exploiting syntactic and semantic information for relation extraction from wikipedia. In IJCAI07-TextLinkWS, 2007.
K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In: Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, 1999.
D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, pages 169--198, 1999.
S. P. Ponzetto and M. Strube. Deriving a large scale taxonomy from wikipedia. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pages 1440--1445, 2007.
E. Riloff and J. Shepherd. A corpus-based approach for building semantic lexicons. In: Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pages 117--124, Providence, RI, 1997.
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum, Yago: a core of semantic knowledge, Proceedings of the 16th International Conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada doi:10.1145/1242572.1242667
Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, Rudi Studer, Semantic Wikipedia, Proceedings of the 15th International Conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland doi:10.1145/1135777.1135863
W. Wu, A. Doan, C. Yu, and W. Meng. Bootstrapping domain ontology for Semantic Web services from source web sites. In: Proceedings of the VLDB-05 Workshop on Technologies for E-Services, 2005.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 AutonomouslySemantifyingWikipedia	Fei Wu Daniel S. Weld			Autonomously Semantifying Wikipedia		Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management	http://turing.cs.washington.edu/papers/cikm07.pdf	10.1145/1321440.1321449		2007