Webpage Scraping Task

From GM-RKB

A Webpage Scraping Task is an information extraction task whose input is a webpage.



References

2014

  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/web_scraping Retrieved:2014-7-24.
    • Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox.

      Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, contact scraping, weather data monitoring, website change detection, research, web mashup and web data integration.

      A report based on information from the world's largest database for web scraping related activity shows that web scraping related traffic has increased rapidly in recent years. On average, 23% of all traffic was scraping-related in 2013.
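
The transformation described above, from HTML into structured data, can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and runs on an in-memory HTML string, so no network access is involved. The names `TableScraper`, `scrape_table`, and `SAMPLE_HTML`, along with the sample table itself, are hypothetical and chosen for illustration only; real pages are usually messier and may call for a more tolerant parser.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed


def scrape_table(html):
    """Treat the first row as headers; return the rest as a list of dicts."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page fragment (an assumption, not real data).
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Here `records` is a list of dictionaries keyed by the header row, a form that could be written to a spreadsheet or database. Fetching the page over HTTP is left out of this sketch; in practice that step might use `urllib.request.urlopen` or an embedded browser, as the passage above notes.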