1998 EnhancedHypertextCategorizationUsingHyperlinks

From GM-RKB

Subject Headings: Local Collective Classification Algorithm.

Notes

Cited By

2005

Quotes

Abstract

A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented with pre-classified samples from Yahoo! and the US Patent Database. In previous work, we developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained. This classifier misclassified 36% of the patents, indicating that classifying hypertext can be more difficult than classifying text. Naively using terms in neighboring documents increased error to 38%; our hypertext classifier reduced it to 21%. Results with the Yahoo! sample were more dramatic: the text classifier showed 68% error, whereas our hypertext classifier reduced this to only 21%.
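
The abstract names relaxation labeling over a small link neighborhood but does not spell out the update rule. The following is a minimal illustrative sketch of a generic relaxation labeling loop, not the authors' exact statistical model. The function name `relaxation_labeling`, the class-compatibility matrix `compat`, and the toy numbers are all assumptions introduced here for illustration. Each unlabeled document starts from its text-only class probabilities and is repeatedly re-scored using the current soft labels of its linked neighbors; documents with known topics stay fixed.

```python
import numpy as np

def relaxation_labeling(text_probs, neighbors, compat, known=None, iterations=10):
    """Generic relaxation labeling sketch (assumed formulation, not the paper's).

    text_probs: (n_docs, n_classes) class probabilities from a text-only classifier.
    neighbors:  list of neighbor-index lists, one per document.
    compat:     (n_classes, n_classes) matrix; compat[c, d] is an assumed
                probability that a neighbor of a class-c document has class d.
    known:      optional dict {doc_index: class_index} for pre-labeled documents.
    """
    known = known or {}
    n_docs, n_classes = text_probs.shape
    probs = text_probs.astype(float).copy()
    # Documents with known topics get a fixed one-hot distribution.
    for doc, label in known.items():
        probs[doc] = np.eye(n_classes)[label]

    for _ in range(iterations):
        updated = probs.copy()
        for doc in range(n_docs):
            if doc in known:
                continue
            # Start from the text evidence, then add link evidence in log space.
            log_score = np.log(text_probs[doc] + 1e-12)
            for nb in neighbors[doc]:
                # Expected compatibility with the neighbor's current soft label.
                log_score += np.log(compat @ probs[nb] + 1e-12)
            score = np.exp(log_score - log_score.max())
            updated[doc] = score / score.sum()
        probs = updated
    return probs

# Toy example (made-up numbers): document 1 is ambiguous from text alone.
text_probs = np.array([[0.50, 0.50],
                       [0.45, 0.55],
                       [0.90, 0.10]])
neighbors = [[1], [0, 2], [1]]
compat = np.array([[0.8, 0.2],
                   [0.2, 0.8]])
labels = relaxation_labeling(text_probs, neighbors, compat, known={0: 0})
print(labels.argmax(axis=1))
```

In this toy graph the text-only classifier slightly prefers class 1 for document 1, while both of its neighbors lean toward class 0, so the link evidence changes the decision. Passing fewer entries in `known` mimics the setting where only a fraction of neighboring documents have known topics.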



1998 EnhancedHypertextCategorizationUsingHyperlinks
Author: Soumen Chakrabarti, Byron Dom, Piotr Indyk
title: Enhanced Hypertext Categorization Using Hyperlinks
journal: Proceedings of the 1998 ACM SIGMOD Conference
titleUrl: http://www.cse.iitb.ac.in/~soumen/sigmod98.ps
doi: 10.1145/276304.276332
year: 1998