2003 SemTagAndSeeker

(Dill et al., 2003a) ⇒ Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, Ramanathan V. Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, and Jason Y. Zien. (2003). “SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation.” In: Proceedings of the 12th International Conference on World Wide Web (WWW 2003). doi:10.1145/775152.775178

Subject Headings: Semantic Annotation, Large Corpus, Information Extraction, Seeker System, SemTag System.

Notes

It references (Erdmann et al., 2000).
It was republished as a nearly identical journal paper:
- (Dill et al., 2003) ⇒ Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, R. Guha, Anant Jhingran, Tapas Kanungo, Kevin S. McCurley, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, and Jason Y. Zien. (2003). “A Case for Automated Large Scale Semantic Annotation.” In: Journal of Web Semantics, 1(1). doi:10.1016/j.websem.2003.07.006
- Cited by ~109 http://scholar.google.com/scholar?q=%22A+case+for+automated+large-scale+semantic+annotation%22+2003

Cited By

~381 http://scholar.google.com/scholar?q=%22SemTag+and+Seeker%3A+Bootstrapping+the+semantic+web+via+automated+semantic+annotation%22+2003

2011

(Gómez-Berbís et al., 2011) ⇒ Juan Miguel Gómez-Berbís, Ricardo Colomo-Palacios, José Luis López-Cuadrado, Israel González-Carrasco, and Ángel García-Crespo. (2011). “SEAN: Multi-ontology semantic annotation for highly accurate closed domains.” In: International Journal of the Physical Sciences, 6(6).

2004

(Ferrucci & Lally, 2004) ⇒ David Ferrucci, and Adam Lally. (2004). “UIMA: An architectural approach to unstructured information processing in the corporate research environment.” In: Journal of Natural Language Engineering, 10(3-4). doi:10.1017/S1351324904003523
- QUOTE: A large project underway at IBM is based on providing a large-scale capability for mining the web for various types of semantic content. Some of the results based on this capability have been recently published (Dill, Eiron, Gibson, Gruhl, Guha, Jhingran, Kanungo, Rajagopalan, Tomkins, Tomlin and Zien 2003). Much of the implementation predates UIMA; however, the project is adopting UIMA’s analysis engine architecture for creating and deploying analysis capabilities. In addition UIMA interfaces are being adopted and this project is contributing robust implementations of framework components including the document and collection metadata store.

Quotes

Author Keywords

Large text datasets; Information retrieval; Data mining; Text analytics; Automated semantic tagging

Abstract

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date.

We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.

2003 SemTagAndSeeker
- Author: Ramanathan V. Guha, Stephen Dill, Andrew Tomkins, Nadav Eiron, David Gibson, Daniel Gruhl, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, John A. Tomlin, Jason Y. Zien
- title: SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation
- titleUrl: http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html
- doi: 10.1145/775152.775178