2003 SemTagAndSeeker

Subject Headings: Semantic Annotation, Large Corpus, Information Extraction, Seeker System, SemTag System.

Notes

Cited By

2011

2004

Quotes

Author Keywords

Large text datasets; Information retrieval; Data mining; Text analytics; Automated semantic tagging

Abstract

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date.

We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.



Reference: 2003 SemTagAndSeeker
Authors: Ramanathan V. Guha, Stephen Dill, Andrew Tomkins, Nadav Eiron, David Gibson, Daniel Gruhl, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, John A. Tomlin, Jason Y. Zien
Title: SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation
Title URL: http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html
DOI: 10.1145/775152.775178