2003 MiningDataRecordsInWebPages

(Liu et al., 2003) ⇒ Bing Liu, Robert L. Grossman, and Yanhong Zhai. (2003). “Mining Data Records in Web Pages.” In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003). doi:10.1145/956750.956826

Subject Headings: Information Extraction from Tables Task.

Notes

It proposes the Mining Data Records in Web Pages Algorithm (MDR) for the Information Extraction from Tables Task.
It is based on the observation that Data Records describing a set of similar objects usually appear in a specific region of a page and are normally formatted with similar HTML tags.
It can detect a group of data records placed in a specific region.
It can mine both Contiguous Data Records and Non-Contiguous Data Records.
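
The observation about similar HTML tags within one region can be illustrated with a small sketch. This is not the MDR algorithm from the paper; it is a simplified illustration that assumes each child of a page region has already been flattened into a list of tag names, and it uses a plain Levenshtein distance as the string matching step. The function names and the 0.3 threshold are hypothetical choices made for this example only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of tag names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def find_similar_runs(children, threshold=0.3):
    """Return (start, end) pairs, end exclusive, for runs of at least two
    adjacent children whose tag strings are similar.

    `threshold` is an assumed cut-off, not a value taken from the paper.
    """
    runs = []
    start = 0
    for i in range(1, len(children) + 1):
        at_end = i == len(children)
        if at_end or normalized_distance(children[i - 1], children[i]) > threshold:
            if i - start >= 2:
                runs.append((start, i))
            start = i
    return runs


# Hypothetical tag strings for the children of one page region.
page_children = [
    ["h1"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a"],
    ["tr", "td", "img", "td", "a", "span"],
    ["div", "p"],
]
print(find_similar_runs(page_children))  # [(1, 4)]
```

Here the three table rows form one candidate region because their tag strings differ by at most one tag, while the heading and the trailing block do not match their neighbours. The sketch only compares single adjacent siblings, so it does not handle records that span several sibling nodes or non-contiguous records.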

Cited By

~313 http://scholar.google.com/scholar?q=%22Mining+Data+Records+in+Web+Pages%22+2003

Quotes

Abstract

A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. It is useful to mine such data records in order to extract information from them to provide value-added services. Existing automatic techniques are not satisfactory because of their poor accuracies. In this paper, we propose a more effective technique to perform the task. The technique is based on two observations about data records on the Web and a string matching algorithm. The proposed technique is able to mine both contiguous and non-contiguous data records. Our experimental results show that the proposed technique outperforms existing techniques substantially.

Author: Bing Liu, Robert L. Grossman, Yanhong Zhai
title: Mining Data Records in Web Pages
titleUrl: http://grossmanreport.com/dl/proc-075.pdf
doi: 10.1145/956750.956826
