2011 ARelationalViewofPatternDiscove

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Pattern Discovery, Relational Pattern, Pattern-Oriented Relational Algebra, Pattern-Oriented Query.

Notes

Cited By

Quotes

Abstract

The elegant integration of pattern mining techniques into database remains an open issue. In particular, no language is able to manipulate data and patterns without introducing opaque operators or loop-like statement. In this paper, we cope with this problem using relational algebra to formulate pattern mining queries. We introduce several operators based on the notion of cover allowing to express a wide range of queries like the mining of frequent patterns. Beyond modeling aspects, we show how to reason on queries for characterizing and rewriting them for optimization purpose. Thus, we algebraically reformulate the principle of the level-wise algorithm.

1 Introduction

Pattern discovery is a significant field of Knowledge Discovery in Databases (KDD). A broad spectrum of powerful techniques for producing local patterns has been developed over the two last decades [3-5]. But, it is widely agreed that the need of theoretical fusion between database and data mining still remains a crucial issue [14, 18, 23, 24]. We would force the pattern mining methods to fit in the relational model [1] which is the main database theory. Unlike most of the proposals [6, 10, 14, 16, 20, 23, 28, 33, 34], we desire to only address the pattern mining that we distinguish from the construction of global models 17] like decision trees.

Let us consider the popular task of frequent pattern mining [3] as a motivating example. Most works treat this task as a “black box” which input parameters are defined by the user [6, 7, 14, 16, 20, 28, 32, 34]. Instead of only specifying the minimal frequency threshold and the dataset, we think that the user query should fully formalize the notion of frequent patterns (e.g., it should describe how the frequency of a pattern is computed starting from the dataset). Ideally, we would like to express the frequent pattern mining query in the relational algebra in order to manipulate both the data and the patterns. As declarative aspects should be promoted on physical ones, a pattern discovery process has to be fully specified without considering algorithmic points. For this purpose, loop-like operators [10, 23, 33] are not relevant for us. Furthermore, the improvement of query performances mainly rests on physical optimizations in the field of pattern mining. Typically, the frequent pattern mining is efficiently performed by an adequate implementation [3-5, 25]. Such algorithmic optimizations (even specified at a higher level [10, 23, 33]) reduce the opportunity of integrating other optimizations. We prefer to favor logical reasoning for optimizing query performances. For instance, the rewriting of the naive frequent pattern mining query should enable us to algebraically formulate the levelwise pruning [25].

The main goal of this paper is to propose an algebraic framework for pattern discovery for expressing a wide range of queries without introducing opaque operators or loop-like statements. Our framework brings two meaningful contributions: expressive modeling and logical reasoning. First, it allows a large set of queries manipulating relations which contain both data and patterns. We add to the relational algebra several specific operators, like the cover operator [math]\displaystyle{ \triangleleft }[/math], to coherently and easily join such relations. We also define a new operator [math]\displaystyle{ \Delta }[/math] for generating a language starting from a relation. Typically, the query [math]\displaystyle{ \sigma_{freq}\geq f(\gamma_{patt},\,\text{COUNT}(trans)\to freq\left(\Delta (L) \triangleleft D\right) }[/math] returns the patterns of language [math]\displaystyle{ L }[/math] frequent in dataset [math]\displaystyle{ D }[/math]. Second, the pattern-oriented relational algebra enables to characterize and rewrite queries in order to optimize their performance. In particular, we formalize the notions of syntactic constraint [9] and global constraint [12] by characterizing the degree of dependence between a query and a relation. Besides, we not only benefit from usual query rewriting methods stemming from the relational model, but we also algebraically reformulate the levelwise pruning.

 This paper is organized in the following way. Section 2 introduces basic notions about the relational algebra and the pattern discovery. Section 3 defines the cover-like and domain operators which are at the core of our algebra. We then study the properties of downward closure and independence in Section 4. We rewrite queries satisfying such properties for optimization purpose in Section 5. Finally, Section 6 provides a related work.

2 Basic Notions

(...)

2.1 Relational Algebra

2.2 Pattern Discovery

3 Pattern-Oriented Relational Algebra

(...)

3.1 Pattern-Oriented Attributes

3.2 Cover, Semi—cover and Anti—cover Operators

3.3 Domain Operator

3.4 Scope of the Pattern-Oriented Relational Algebra

4 Characterizing Pattern-Oriented Queries

(...)

4.1 Downward Closed Query

4.2 Local and Global Dependent Queries

5 Rewriting Pattern-Oriented Queries

(...)

5.1 Algebraic Laws Involving Cover-Like Operators

5.2 Algebraic Reformulation of the Level-wise Algorithm

6 Related Work

(...)

7 Conclusion

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2011 ARelationalViewofPatternDiscoveArnaud Giacometti
Patrick Marcel
Arnaud Soulet
A Relational View of Pattern Discovery2011