1998 LazyModelBasedOnlineClassification

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Online Classification, DBPredictor

Notes

Cited By

2006

Quotes

Abstract

The growing access to large amounts of structured observations allows for more opportunistic uses of this data. An example of this, is the prediction of an event's class membership based on a database of observations. When these predictions are supported by a high-level representation, we refer to these as knowledge based on-line classification tasks. Two common types of algorithms from machine learning research that may be applied to on-line classification tasks make use of either lazy instance-based (k-NN, IB1) or eager model-based (C4.5, CN2) approaches. Neither approach, however, appears to provide a complete solution for these tasks.

This thesis proposes a lazy model-based algorithm, named DBPredictor, that is suited to knowledge based on-line classification tasks. The algorithm uses a greedy top-down search to locate a probabilistic IF-THEN rule that will classify the given event. Empirical investigation validates this match. DBPredictor is shown to be as accurate as IB1 and C4.5 against general datasets. Its accuracy however, is more robust to irrelevant attributes than IB1, and more robust to underspecialized events than C4.5. Finally, DBPredictor is shown to solve a significant number of classification requests before C4.5 can satisfy its first request.

These performance characteristics, along with the algorithm's ability to avoid discretization of numerical attributes and its ability to be tightly-coupled with a relational database, suggests that DBPredictor is an appropriate algorithm for knowledge based on-line classification tasks.

Table of Contents

1 Introduction p.1
1.1 Motivations p.3
1.2 Approach p.4
1.3 Contributions p.5
1.4 ThesisOutline p.6
2 GeneralFramework p.Pr9
2.1 Example p.7
2.2 Input Requirements p.8
2.3 Output Requirements p.12
2.4 Control Requirements p.14
2.5 Performance Measures p.16
2.6 Related Issues p.18
2.7 Chapter Summary p.20
3 Related Work p.21
3.1 Instance-BasedLearning p.22
3.2 Top-Down Induction of Decision Trees p.27
3.3 Lazy Algorithms with Dynamic Relevance Testing p.32
3.4 ChapterSummary p.40
4 DBPredictor Algorithm p.41
4.1 Overview p.42
4.2 Input Parameters p.43
4.3 Output Format p.45
4.4 DBPredictor () Algorith mp.47
4.5 PSIP() Procedure p.47
4.6 seedrule () p.50
4.7 top-downsearch0 p.5 i
4.8 generateantecedents () p.52
4.9 get-consequent () p.58
4.10 F() Heuristic hinction p.58
4-11 bestrule O Sub-Procedure p.59
4.12 Comple+ Analysis p.59
4.13 Discussion p.63
4.14 Chapter Summary p.64
5 Time Efficient Search Algorithm p.65
5.1 Updated seemle() Procedure p.66
5.2 Updated get-consequent () Procedure p.66
5.3 Updated bestde() Procedure p.69
5.4 Complexity Analysis p.69
5.4.1 Running Time CompIexity p.70
5.4.2 Space CompIexity p.72
5.5 Chapter Summary p.72
6 Heuristic Functions p.74
6.1 Information Available to F() p.74
6.2 Sibling-Sibling versus Parent-Child p.76
6.3 Sibling-Sibling F() p.77
6.4 Parent-ChildF() p.81
6.5 Pruning p.84
6.6 ChapterSummary p.85
7 Empincal Study of Accuracy p.86
7.1 Methodology p.87
7.2 Variations on F() p.90
7.3 Relative Accuracy of DBPredictor p.98
7.4 Discussion p.105
7.5 Chapter Summary p.106
8 Empirical Study of Running Time p.107
8.1 Methodology p.107
8.2 Standalone fe rformance p.1 08
8.3 Relative Performance p.109
8.4 Chapter Summary and Discussion p.111
9 Conclusion p.112
9.1 Thesis Summary p.112
9.2 Contributions p.1 14
9.3 Future Research p.115
9.4 Concluding Remarks p.116

References

  • [1] AAAI. Thirteenth National Conference on Artificial Intelligence. AAAI Press, 1996.
  • [2] Rakesh Agrawal and J. C. Shafer. Parallel mining of association rules: Design, implementation, and experience. IEEE Trans. Knowledge and Data Engineering, 8:962{969, (1996).
  • [3] D. W. Aha. A Study of Instance-based Algorithms for Supervised Learning Tasks. PhD thesis, University of California, Irvine, (1990).
  • [4] D. W. Aha, editor. Lazy Learning. Kluwer Academic, May (1997).
  • [5] D. W. Aha. Lazy learning editorial remarks. [4], pages 1{3.
  • [6] D. W. Aha, D. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6(1):37{66, (1991).
  • [7] C. Apte and S. J. Hong. Predicting equity returns from securities data with minimal rule generation.
  • [8] M. A. Arbib, A. J. Kfoury, and R. N. Moll. A basis for theoretical computer science. Texts and monographs in computer science. Springer-Verlag, (1981).
  • [9] Leo Breiman, Jerome H. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, (1984).
  • [10] J. G. Carbonell, R. S. Michalski, and Tom M. Mitchell. An overview of machine learning. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 1, pages 3{24. Morgan Kaufmann, (1983). 117
  • [11] J. Catlett. Megainduction: Machine Learning on Very Large Databases. PhD thesis, University of Sydney, June (1991).
  • [12] J. Cheng, Usama M. Fayyad, K. B. Irani, and Z. Qian. (1988). “Improved Decision Trees. A generalized version of ID3.” In: Proceedings of Fifth International Conference Machine Learning (ICML 1988).
  • [13] P. Clark and R. Boswell. Rule induction with CN2: Some recent improvements. In Machine Learning - Proceedings of the Fifth European Conference (EWSL-91), pages 151{163. Springer-Verlag, (1991).
  • [14] P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3:261{283, (1989).
  • [15] E. F Codd, S. B. Codd, and C. T. Salley. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. In E. F. Codd & Associates available at http://www.arborsoft.com/OLAP.html, (1993).
  • [16] S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10:57{78, (1993).
  • [17] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transaction on Information Theory, 13:21{27, 1967.
  • [18] Thomas G. Dietterich. Over tting and undercomputing in machine learning. ACM Computing Surveys, 27(3):326{327, September (1995).
  • [19] Thomas G. Dietterich and R. S. Michalski. A comparative review of selected methods for learning from examples. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 1, pages 41{82. Morgan Kaufmann, (1983).
  • [20] A. J. Dobson. An Introduction to Generalized Linear Models. Chapman & Hall, (1990).
  • [21] Pedro Domingos. Unifying instance-based and rule-based induction. Machine Learning, 24(2):141{168, August (1996).
  • [22] Pedro Domingos. Context-sensitive feature selection for lazy learners. In Aha [4], pages 227{253.
  • [23] J. Dougherty, Ron Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In A. Prieditis and S. Russel, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 194{202. Morgan Kaufmann.
  • [24] Usama M. Fayyad, S. G. Djorgovski, and N.Weir. Automating the analysis and cataloging of sky surveys. In U.M. Fayyad, G. Piatetsky-Shapiro, Padhraic Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 471{493. AAAI/MIT Press, (1996).
  • [25] Usama M. Fayyad and K. B. Irani. On the handling of continuous-value attributes in decision tree generation. Machine Learning, pages 87{102.
  • [26] Usama M. Fayyad and K. B. Irani. The attribute selection problem in decision tree generation. In: Proceedings of the Tenth National Conference of Artificial Intelligence, pages 104{110. AAAI Press, (1992).
  • [27] Usama M. Fayyad, G. Piatetsky-Shapiro, Padhraic Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, (1996).
  • [28] E. Fix and Jr. J. L. Hodges. Discriminatory analysis, nonparametric discrimination, consistency properties. Technical Report 4, United States Air Force, School of Aviation Medicine, Randolph Field, TX, 1951.
  • [29] Jerome H. Friedman, Ron Kohavi, and Y. Yun. Lazy decision trees. [1], pages 717{724.
  • [30] T. Fulton, S. Kasif, S. Salzberg, and D. Waltz. Local induction of decision trees: Towards interactive data mining. In Simoudis et al. [63], pages 14{19.
  • [31] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29{54, (1997).
  • [32] R. Greiner, A. J. Grove, and Dan Roth. Learning active classifiers. In Lorenza Saitta, editor, Proceedings of 13th International Conference Machine Learning (ICML'96), pages 207{215. Morgan Kaufmann, (1996).
  • [33] Jiawei Han, Y. Cai, and N. Cercone. Knowledge discovery in databases: An attribute-oriented approach. In: Proceedings of 18th International Conference Very Large Data Bases, pages 547{559, Vancouver, Canada, August (1992).
  • [34] (Hogg & Ledolter, 1987) ⇒ Robert V. Hogg and Johannes Ledolter. (1987). “Engineering Statistics. Macmillan Publishing Company.
  • [35] X. Hu. Conceptual clustering and concept hierarchies in knowledge discovery. Master's thesis, Simon Fraser University, School of Computing Science, December (1992).
  • [36] P. J. Huber. From large to huge: A statistician's reactions to KDD & DM. In The Third International Conference on Knowledge Discovery & Data Mining, pages 304{ 308, (1997).
  • [37] E. B. Hunt, J. Marin, and P. T. Stone. Experiments in Induction. Academic Press, 1966.
  • [38] L. Hya l and R. L. Rivest. Construction optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15{17, 1976.
  • [39] T. Imielinski and Heikki Mannila. A database perspective on knowledge discovery. Communications of ACM, 39:58{64, (1996).
  • [40] G. H. John and B. Lent. SIPping from the data firehose. In: Proceedings, Third International Conference on Knowledge Discovery and Data Mining, pages 199{202. AAAI Press, (1997).
  • [41] D. Kibler and Pat Langley. Machine Learning as an Experimental Science, chapter 1, pages 38{43. In [62], (1990).
  • [42] W. Kl¨osgen and J. Zytkow. Knowledge discovery in database terminology. In U.M. Fayyad, G. Piatetsky-Shapiro, Padhraic Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 573{592. AAAI/MIT Press, (1996).
  • [43] Ron Kohavi. Wrappers for performance enhancement and oblivious decision graphs. PhD thesis, Stanford University, September (1995).
  • [44] J. Kolodner. Case-based reasoning. Morgan Kaufmann, San Mateo, CA, (1993).
  • [45] C.J. Matheus, G. Piatetsky-Shapiro, and D. McNeil. Selecting and reporting what is interesting: The KEFIR application to healthcare data. In U.M. Fayyad, G. Piatetsky- Shapiro, Padhraic Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 495{516. AAAI/MIT Press, (1996).
  • [46] Gabor Melli. Ad hoc attribute-value prediction. [1], page 1396.
  • [47] J. Melton and A. R. Simon. Understanding the New SQL: A Complete Guide. Morgan Kaufmann, (1993).
  • [48] R. S. Michalski, J. G. Carbonell, and Tom M. Mitchell. Machine Learning, An Artificial Intelligence Approach, Vol. 1. Morgan Kaufmann, (1983).
  • [49] J. Mingers. An empirical comparison of pruning methods for decision-tree induction. Machine Learning, 4:227{243, (1989).
  • [50] J. Mingers. An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3:319{342, (1989).
  • [51] P. M. Murphy and D. W. Aha. UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science, (1995).
  • [52] S K. Murthy. On Growing Better Decision Trees from Data. PhD thesis, John Hopkins University, (1995).
  • [53] A. Newell and H. Simon. Human Problem Solving. Prentice-Hall, 1972.
  • [54] G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, (1991).
  • [55] J. Ross Quinlan. Learning efficient classification procedures and their application to chess end-games. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 1, pages 463{482. Morgan Kaufmann, (1983).
  • [56] J. Ross Quinlan. The e ect of noise on concept learning. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 2, pages 149{166. Morgan Kaufmann, (1986).
  • [57] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1:81{106, (1986).
  • [58] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, (1993).
  • [59] J. Ross Quinlan. Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, pages 236{243, (1993).
  • [60] J. Ross Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77{90, March (1996).
  • [61] J. Rissanen. A universal prior for integers and estimation by minimum description length. Annals of Statistics, 5(3):416{431, (1983).
  • [62] J.W. Shavlik and Thomas G. Dietterich. Readings in Machine Learning. Morgan Kaufmann, (1990).
  • [63] Evangelos Simoudis, Jiawei Han, and Usama Fayyad, editors. The Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, August (1996).
  • [64] Padhraic Smyth and R.M. Goodman. Rule induction using information theory. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 159{176. AAAI/MIT Press, (1991).
  • [65] Padhraic Smyth and R.M. Goodman. An information theoretic approach to rule induction. IEEE Trans. Knowledge and Data Engineering, 4:301{316, (1992).
  • [66] C. Stan ll and D. Waltz. Toward memory-based reasoning. Communications of the ACM, 29:1213{1228, (1986).
  • [67] R. Uthurusamy, Usama M. Fayyad, and S. Spangler. Learning useful rules from inconclusive data. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 141{157. AAAI/MIT Press.
  • [68] S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufman, (1991).
  • [69] P. H. Winston. Artificial Intelligence. Addison Wesley, 1992.

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
1998 LazyModelBasedOnlineClassificationGabor Melliftp://fas.sfu.ca/pub/cs/theses/1998/GaborMelliMSc.pdf A Lazy Model-based Approach to On-Line ClassificationProceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining1998
1999