2003 MiningDataRecordsInWebPages

From GM-RKB

Subject Headings: Information Extraction from Tables Task.


Notes

Cited By

Quotes

Abstract

A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them to provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more effective technique to perform the task. The technique is based on two observations about data records on the Web and a string matching algorithm. The proposed technique is able to mine both contiguous and non-contiguous data records. Our experimental results show that the proposed technique outperforms existing techniques substantially.



Record: 2003 MiningDataRecordsInWebPages
Author: Bing Liu, Robert L. Grossman, Yanhong Zhai
Title: Mining Data Records in Web Pages
URL: http://grossmanreport.com/dl/proc-075.pdf
DOI: 10.1145/956750.956826