2000 WebUsageMiningDiscoveryandAppli


Subject Headings: Web Visitor Clustering.

Notes

Cited By

Quotes

Keywords

Data Mining, World Wide Web, Web Usage Mining.

Abstract

Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.

1. INTRODUCTION

The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth of electronic commerce. Specifically, e-commerce activity that involves the end user is undergoing a significant revolution. The ability to track users' browsing behavior down to individual mouse clicks has brought the vendor and end customer closer than ever before. It is now possible for a vendor to personalize his product message for individual customers at a massive scale, a phenomenon that is being referred to as mass customization.

The scenario described above is one of many possible applications of Web Usage mining, which is the process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications. Data mining efforts associated with the Web, called Web mining, can be broadly divided into three classes, i.e. content mining, usage mining, and structure mining. Web Structure mining projects such as [34; 54] and Web Content mining projects such as [47; 21] are beyond the scope of this survey. An early taxonomy of Web mining is provided in [29], which also describes the architecture of the WebMiner system [42], one of the first systems for Web Usage mining. The proceedings of the recent WebKDD workshop [41], held in conjunction with the KDD-1999 conference, provide a sampling of some of the current research being performed in the area of Web Usage Analysis, including Web Usage mining. This paper provides an up-to-date survey of Web Usage mining, including both academic and industrial research efforts, as well as commercial offerings. Section 2 describes the various kinds of Web data that can be useful for Web Usage mining. Section 3 discusses the challenges involved in discovering usage patterns from Web data across the three phases: preprocessing, pattern discovery, and pattern analysis. Section 4 provides a detailed taxonomy and survey of the existing efforts in Web Usage mining, and Section 5 gives an overview of the WebSIFT system [31] as a prototypical example of a Web Usage mining system. Finally, Section 6 discusses privacy concerns and Section 7 concludes the paper.

2. WEB DATA

One of the key steps in Knowledge Discovery in Databases [33] is to create a suitable target data set for the data mining tasks. In Web Mining, data can be collected at the server side, at the client side, or at proxy servers, or obtained from an organization's database (which contains business data or consolidated Web data). Each type of data collection differs not only in terms of the location of the data source, but also the kinds of data available, the segment of population from which the data was collected, and its method of implementation.
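As an illustration of building a target data set from server-side data, the following is a minimal sketch and not the paper's own method. It assumes the server writes its access log in the Common Log Format. The names `parse_line`, `build_target_dataset`, and `SAMPLE_LOG`, the sample log lines, and the choice to keep only requests with status 200 are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed input: Common Log Format lines of the form
#   host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def build_target_dataset(lines):
    """Group successful requests by client host, sorted by request time."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        # Illustrative filter: drop unparseable lines and failed requests.
        if record is not None and record["status"] == 200:
            by_host[record["host"]].append(record)
    for records in by_host.values():
        records.sort(key=lambda r: r["time"])
    return dict(by_host)


# Hypothetical sample data, invented for this sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:56:02 -0700] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:56:10 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]

dataset = build_target_dataset(SAMPLE_LOG)
paths_by_host = {host: [r["path"] for r in recs] for host, recs in dataset.items()}
```

Grouping by host is only a rough stand-in for identifying users. Data collected at the client side or at a proxy would cover a different segment of the population and would need a different parser.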

There are many kinds of data that can be used in Web Mining. This paper classifies such data into the following types:

2.1 Data Sources

References

Pang-Ning Tan, Jaideep Srivastava, Robert Cooley, and Mukund Deshpande. (2000). "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data." doi:10.1145/846183.846188