2012 HarnessingtheWisdomoftheCrowdsf

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Clipping Web pages, namely extracting the informative clips (areas) from Web pages, has many applications, such as Web printing and e-reading on small handheld devices. Although many existing methods attempt to address this task, most of them can either work only on certain types of Web pages (e.g., news - and blog-like web pages), or perform semi-automatically where extra user efforts are required in adjusting the outputs. The problem of clipping any types of Web pages accurately in a totally automatic way remains pretty much open. To this end in this study we harness the wisdom of the crowds to provide accurate recommendation of informative clips on any given Web pages. Specifically, we leverage the knowledge on how previous users clip similar Web pages, and this knowledge repository can be represented as a transaction database where each transaction contains the clips selected by a user on a certain Web page. Then, we formulate a new pattern mining problem, mining top-1 qualified pattern, on transaction database for this recommendation. Here, the recommendation considers not only the pattern support but also the pattern occupancy (proposed in this work). High support requires that patterns appear frequently in the database, while high occupancy requires that patterns occupy a large portion of the transactions they appear in. Thus, it leads to both precise and complete recommendation. Additionally, we explore the properties on occupancy to further prune the search space for high-efficient pattern mining. Finally, we show the effectiveness of the proposed algorithm on a human-labeled ground truth dataset consisting of 2000 web pages from 100 major Web sites, and demonstrate its efficiency on large synthetic datasets.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2012 HarnessingtheWisdomoftheCrowdsfEnhong Chen
Lei Zhang
Ping Luo
Min Wang
Linpeng Tang
Limei Jiao
Guiquan Liu
Harnessing the Wisdom of the Crowds for Accurate Web Page Clipping10.1145/2339530.23396212012