2007 FrequentPatternMining

From GM-RKB

Subject Headings: Frequent Pattern Mining, Association Rules, Survey Paper, Data Mining Research, Applications.

Notes

Cited By

  • ~390 …

Quotes

Abstract

Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have deep impact on data mining methodologies and applications in the long run. However, there are still some challenging research issues that need to be solved before frequent pattern mining can claim a cornerstone approach in data mining applications.

1. Introduction

Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set, is a frequent itemset. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential pattern. A substructure can refer to different structural forms, such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently in a graph database, it is called a (frequent) structural pattern. Finding frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it helps in data indexing, classification, clustering, and other data mining tasks as well. Thus, frequent pattern mining has become an important data mining task and a focused theme in data mining research.
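The definition of a frequent itemset above can be illustrated with a small sketch. It is a brute-force enumeration written for this page, not one of the scalable algorithms the survey covers, and the function name `frequent_itemsets`, the parameter `min_count`, and the toy baskets are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions.

    Brute-force sketch: it enumerates candidates by size and is only
    practical for very small item universes.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                result[itemset] = count
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is itself frequent, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return result


# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "cereal"},
]
frequent = frequent_itemsets(transactions, min_count=3)
```

With a threshold of three transactions, the frequent itemsets in this toy data are {milk}, {bread}, and {milk, bread}; butter and cereal fall below the threshold.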

Frequent pattern mining was first proposed by Agrawal et al. (1993) for market basket analysis in the form of association rule mining. It analyses customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. For instance, if customers are buying milk, how likely are they going to also buy cereal (and what kind of cereal) on the same trip to the supermarket? Such information can lead to increased sales by helping retailers do selective marketing and arrange their shelf space.

3.3 From frequent patterns to interestingness and correlation analysis

Frequent itemset mining naturally leads to the discovery of associations and correlations among items in large transaction data sets. The discovery of interesting association or correlation relationships can help in many business decision-making processes, such as catalog design, cross-marketing, and customer shopping behavior analysis.

The concept of association rule was introduced together with that of frequent pattern (Agrawal et al. 1993). Let I = {i₁, i₂, ..., iₘ} be a set of items. An association rule takes the form of α ⇒ β, where α ⊂ I, β ⊂ I, and α ∩ β = ∅, and support and confidence are two measures of rule interestingness. An association rule is considered interesting if it satisfies both a min_sup threshold and a min_conf threshold.
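The passage names support and confidence without defining them. The sketch below assumes the usual definitions: the support of a rule is the fraction of transactions that contain both sides, and its confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The function names, thresholds, and baskets are hypothetical.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    n_antecedent = count_containing(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0  # the rule never applies; treated as zero confidence in this sketch
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / n_antecedent


def is_interesting(transactions, antecedent, consequent, min_sup, min_conf):
    """A rule is kept only if it meets both thresholds."""
    return (support(transactions, antecedent, consequent) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)


# Hypothetical baskets (illustrative data only).
baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"bread"},
    {"milk", "cereal", "butter"},
]
rule_support = support(baskets, {"milk"}, {"cereal"})        # 3 of 5 baskets
rule_confidence = confidence(baskets, {"milk"}, {"cereal"})  # 3 of the 4 milk baskets
```

In this data the rule {milk} ⇒ {cereal} has support 3/5 and confidence 3/4, so it passes, for example, min_sup = 0.5 and min_conf = 0.7 but fails min_conf = 0.8.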

Based on the definition of association rule, most studies take frequent pattern mining as the first and the essential step in association rule mining. However, not all the association rules so generated are interesting, especially when mining at a low support threshold or mining for long patterns. To mine interesting rules, a correlation measure has been used to augment the support-confidence framework of association rules. This leads to the correlation rules of the form α ⇒ β [support, confidence, correlation]. There are various correlation measures including lift, χ², cosine and all_confidence.
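The four correlation measures named above can be computed from simple counts. The sketch below assumes their commonly used definitions, which the passage does not spell out: lift = P(α ∪ β) / (P(α) P(β)), cosine = P(α ∪ β) / √(P(α) P(β)), all_confidence = sup(α ∪ β) / max(sup(α), sup(β)), and χ² as the Pearson statistic over the 2×2 contingency table. Here P(α ∪ β) means the fraction of transactions containing both sides. The function name and the counts are hypothetical.

```python
from math import sqrt


def correlation_measures(n, n_a, n_b, n_ab):
    """Correlation measures for a rule A => B from transaction counts.

    n    : total number of transactions
    n_a  : transactions containing A
    n_b  : transactions containing B
    n_ab : transactions containing both A and B
    Assumes 0 < n_a < n and 0 < n_b < n so that no expected count is zero.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    lift = p_ab / (p_a * p_b)
    cosine = p_ab / sqrt(p_a * p_b)
    all_confidence = n_ab / max(n_a, n_b)

    # Pearson chi-square over the 2x2 table (A present/absent by B present/absent).
    observed = [[n_ab, n_a - n_ab],
                [n_b - n_ab, n - n_a - n_b + n_ab]]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    return {"lift": lift, "cosine": cosine,
            "all_confidence": all_confidence, "chi2": chi2}


# Hypothetical counts: A and B co-occur more often than independence predicts.
correlated = correlation_measures(n=100, n_a=50, n_b=50, n_ab=40)
# Hypothetical counts: A and B are exactly independent (0.5 * 0.4 = 0.2).
independent = correlation_measures(n=100, n_a=50, n_b=40, n_ab=20)
```

A lift above 1 (here 1.6, with χ² = 36) suggests positive correlation, while the independent case gives a lift of 1 and χ² of 0. A rule can have high support and confidence and still sit in the second category, which is why these measures are used to augment the support-confidence framework.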

4.1 Frequent pattern-based classification

Frequent itemsets have been demonstrated to be useful for classification, where association rules are generated and analyzed for use in classification (Liu et al. 1998; Dong and Li 1999; Li et al. 2000; Li et al. 2001; Yin and Han 2003; Cong et al. 2005; Wang and Karypis 2005). The general idea is that strong associations between frequent patterns and class labels can be discovered. Then the association rules are used for prediction. In many studies, associative classification has been found to be more accurate than some traditional classification methods, such as C4.5.
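The general idea described above (find strong pattern ⇒ class-label rules, then predict with them) can be sketched as follows. This is a deliberately simplified illustration and not the method of any of the cited studies; the function names, the `max_len` cap on pattern size, the tie-breaking order, and the toy records are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, min_sup, min_conf, max_len=2):
    """Mine rules of the form pattern => label from (items, label) records.

    Returns (confidence, support, pattern, label) tuples, strongest first.
    """
    n = len(records)
    pattern_counts = Counter()  # how often each pattern occurs at all
    rule_counts = Counter()     # how often each pattern occurs with each label
    for items, label in records:
        for size in range(1, max_len + 1):
            for pattern in combinations(sorted(items), size):
                pattern_counts[pattern] += 1
                rule_counts[(pattern, label)] += 1

    rules = []
    for (pattern, label), count in rule_counts.items():
        sup = count / n
        conf = count / pattern_counts[pattern]
        if sup >= min_sup and conf >= min_conf:
            rules.append((conf, sup, frozenset(pattern), label))
    # Highest confidence first, then highest support; remaining ties are
    # broken alphabetically so the order is deterministic.
    rules.sort(key=lambda r: (-r[0], -r[1], sorted(r[2]), r[3]))
    return rules


def predict(rules, items, default_label):
    """Return the label of the strongest rule whose pattern the record contains."""
    for _conf, _sup, pattern, label in rules:
        if pattern <= items:
            return label
    return default_label


# Hypothetical training records (illustrative data only).
records = [
    ({"sunny", "hot"}, "no"),
    ({"sunny", "mild"}, "no"),
    ({"rainy", "mild"}, "yes"),
    ({"rainy", "hot"}, "yes"),
    ({"overcast", "hot"}, "yes"),
]
rules = mine_class_rules(records, min_sup=0.4, min_conf=0.8)
majority = Counter(label for _, label in records).most_common(1)[0][0]
prediction = predict(rules, {"sunny", "hot"}, default_label=majority)
```

With these thresholds only two rules survive, {rainy} ⇒ yes and {sunny} ⇒ no, and a record matching neither falls back to the majority label. Published associative classifiers typically add rule pruning and more careful rule selection on top of this basic scheme.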

References


  • Jiawei Han, Hong Cheng, Dong Xin, Xifeng Yan. (2007). "Frequent Pattern Mining: current status and future directions." In: Data Mining and Knowledge Discovery. doi:10.1007/s10618-006-0059-1. http://www.cs.ucsb.edu/~xyan/papers/dmkd07 frequentpattern.pdf