2005 SummarizingItemsetPatterns

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Frequent Itemset Summarization Task, Frequent Itemset Mining Task, Pattern Summarization Task, Hierarchical Agglomerative Clustering Task, K-Means Clustering Task.

Notes

Cited By

Quotes

Abstract

1. Introduction

2. Pattern Profile

3. Pattern Summarization

3.1 Hierarchical Agglomerative Clustering

3.2 K-means Clustering

3.3 Optimization Heuristics

3.3.1 Closed Itemsets vs. Frequent Itemsets
3.3.2 Approximate Profiles

3.4 Quality Evaluation

4. Empirical Study

4.1 Real Datasets

4.2 Synthetic Datasets

5. Related Work

6. Conclusions

References

  • [1] F. Afrati, A. Gionis, and H. Mannila, Approximating a collection of frequent sets. In Proc. of 2004 ACM Int. Conf. on Knowledge Discovery in Databases (KDD’04), pages 12 : 19, 2004.
  • [2] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases, In Proc. of 1993 Int. Conf. on Management of Data (SIGMOD’93), pages 207:216, 1993.
  • [3] R. Agrawal and R. Srikant, Mining sequential patterns, In Proc. 0f1995 Int. Conf. on Data Engineering ([CDE’95}, pages 3:14, 1995.
  • [4] L. Baker and A. McCallum, Distributional clustering of words for text classification, In Proc. of 1998 ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR’98), pages 96:103, 1998.
  • [5] R. Bayardo, Efficiently mining long patterns from databases, In Proc. of 1998 Int. Conf. on Management of Data (SIGMOD’98), pages 85:93, 1998.
  • [6] T. Calders and B. Goethals, Mining all non-derivable frequent itemsets, In Proc. of 2002 European Conf. on Principles of Data Mining and Knowledge Discovery (PKDD’02), pages 74:85, 2002.
  • [7] L. Dehaspe, H. Toivonen, and R. King, Finding frequent substructures in chemical compounds, In Proc. of 1998 Int. Conf. on Knowledge Discovery and Data Mining (KDD’98), pages 30:36, 1998.
  • [8] M. Deshpande, M. Kuramochi, and G. Karypis, Frequent sub-structure-based approaches for classifying chemical compounds, In Proc. of 2003 Int. Conf. on Data Mining ([CDM’OB}, pages 35:42, 2003.
  • [9] I. Dhillon, S. Mallela, and R. Kumar, A divisive information-theoretic feature clustering algorithm for text classification, J. of Machine Learning Research, 3:1265:1287, 2003.
  • [10] D. Gunopulos, H. Mannila, R. Khardon, and H. Toivonen, Data mining, hypergraph transversals, and machine learning, In Proc. 1997 ACM SIGACT—SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’97), pages 209 : 216, 1997.
  • [11] J. Han, J. Wang, Y. Lu, and P. Tzvetkov, Mining top—k frequent closed patterns without minimum support, In Proc. of 2002 Int. Conf. on Data Mining ([CDM’02), pages 211:218, 2002.
  • [12] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. American Statistical Associations, 58:13:30, 1963.
  • [13] L. Holder, D. Cook, and S. Djoko, Substructure discovery in the subdue system, In Proc. AAAI94 Workshop on Knowledge Discovery in Databases (KDD94), page 169180, 1994.
  • [14] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha, Mining spatial motifs from protein structure graphs, In Proc. of 8th Amt. Int. Conf. on Research in Computational Molecular Biology (RECOMB’W), pages 308:315, 2004.
  • [15] M. Hutchins, H. Foster, T. Goradia, and T. Ostrand, Experiments 0f the effectiveness of datafiow— and controlfiow—based test adequacy criteria, In Proc. of 16th Int. Conf. on Software engineering ([CSE’94), pages 191:200, 1994.
  • [16] R. Kohavi, C. Brodley, B. Frasca, L. Mason, and Z. Zheng, KDD—Cup 2000 organizers7 report: Feeling the onion, SIGKDD Explorations, 2:86:98, 2000.
  • [17] T, Mielikainen and H. Mannila, The pattern ordering problem, In Prof. 7th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), pages 327:338, 2003.
  • [18] E. Omiecinski, Alternative interest measures for mining associations, [EEE Trans. Knowledge and Data Engineering, 15:57:69, 2003.
  • [19] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal Discovering frequent closed itemsets for association rules, In Proc. of 7th Int. Conf. on Database Theory ([CDT’99}, pages 398:416, 1999.
  • [20] D. Pavlov, H. Mannila, and P. Smyth, Beyond independence: Probabilistic models for query approximation on binary transaction data, [EEE Trans. Knowledge and Data Engineering, 15:1409:1421, 2003.
  • [21] J. Pei, G. Dong, W. Zou, and J. Han, On computing condensed frequent pattern bases, In Proc. 2002 Int. Conf. on Data Mining ([CDM’02), pages 378:385, 2002.
  • [22] J. Pei, A. Tung, and J. Han, Fault-tolerant frequent pattern mining: Problems and challenges, In Proc. of 2001 ACM Int. Workshop Data Mining and Knowledge Discovery (DMKD’01), pages 7:12, 2001.
  • [23] M. Steinbach, P. Tan, and V. Kumar, Support envelopes: a technique for exploring the structure of association patterns, In Proc. of 2002 ACM Int. Conf. on Knowledge Discovery in Databases (KDD’04), pages 296 : 305, 2004.
  • [24] P. Tan, V. Kumar, and J. Srivastava, Selecting the right interestingness measure for association patterns, In Proc. of 2002 ACM Int. Conf. on Knowledge Discovery in Databases (KDD’02), pages 32:41, 2002.
  • [25] K. Wang, C. Xu, and B. Lin, Clustering transactions using large items, In Proc. of 8th Int. Conf. on Information and Knowledge Management (CIKM’99), pages 483 7 490, 1999.
  • [26] X. Yan, P. Yu, and J. Han, Graph indexing: A frequent structure-based approach, In Proc. of 2004 ACM Int. Conf. on Management of Data (SIGMOD’04), pages 335:346, 2004.
  • [27] C. Yang, U. Fayyad, and P. S. Bradley, Efficient discovery of error-tolerant frequent itemsets in high dimensions, In Proc. of 2001 ACM Int. Conf. on Knowledge Discovery in Databases (KDD’01), pages 194:203, 2001.,


 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2005 SummarizingItemsetPatternsHong Cheng
Dong Xin
Xifeng Yan
Jiawei Han
Summarizing Itemset Patterns: a profile-based approachhttp://portal.acm.org/citation.cfm?id=108190710.1145/1081870.1081907