2009 ADataMiningOntforAlgSelandMeta-Min

From GM-RKB

Subject Headings: Data Mining Ontology.

Notes

Cited By

Quotes

Abstract

Given a learning task, the standard approach is to experiment with a broad range of algorithms and parameter settings, and select the model which performs best according to some performance criterion. One of the aims of meta-learning is to at least restrict the space of candidate models by exploiting insights gained from previous experiments. This has been done over the years by correlating dataset characteristics with the observed performance of algorithms viewed as black boxes. We have started to pry open these black boxes to sort out salient algorithm features such as the structure and parameters of the models built, the data partitions effected in data space, the cost function used and the optimization strategy adopted to minimize this cost function. The immediate goal is to build a data mining ontology formalizing the key components that together compose an algorithm's inductive bias. Based on this ontology, a meta-learner could infer algorithm selection guidelines by correlating an algorithm's intrinsic bias with empirical evidence of its performance.

1 Introduction

The medium-term goal of the work reported in this paper is to build an e-Laboratory for Interdisciplinary COllaborative research (e-LICO) in data mining and data-intensive sciences. The proposed e-lab comprises three layers: the e-science and data mining layers form a generic research environment that can be adapted to different scientific domains by customizing the application layer. The e-science infrastructure integrates semantic web technologies for resource sharing and integration to support collaborative scientific research. The main innovation of the data mining (DM) layer is a self-improving, planner-based DM assistant. Improvement with experience is ensured by a meta-miner that thrives on data and meta-data collected from groups of committed scientists. The term meta-mining specifically designates meta-learning applied not only to the learning phase, but to the complete knowledge discovery process, in particular to all tasks that require search in the space of applicable methods. The DM assistant draws its intelligence not only from its planning and meta-mining capabilities but also from the wealth of domain-specific and domain-independent knowledge at its disposal.

A key source of domain-independent knowledge is the data mining ontology (DMO), which can be viewed as the repository of the intelligent assistant's data mining expertise. The DMO plays a major role throughout the lifecycle of the DM e-lab. As a compendium of knowledge about DM tasks, algorithms, data and models, it will be used: 1) to plan the DM process using hierarchical task networks and generate alternative workflows; 2) to guide algorithm and model selection for critical tasks such as learning and dimensionality reduction; 3) to meta-mine experimentation records in order to improve algorithm and model selection; 4) to provide a controlled vocabulary for semantic annotation of DM tools and services offered in e-LICO. Use of the DMO to plan the knowledge discovery process is discussed in [17]. This paper describes how the DMO has been designed to support algorithm selection and meta-learning. Section 2 proposes a novel approach to the algorithm selection problem; Section 3 discusses related work concerning algorithm selection, meta-learning, and DM ontologies. Sections 4 and 5 give an overview of the DMO and its conceptualization of tasks and methods to support algorithm selection and meta-learning. Section 6 concludes with a discussion of major open issues and future work.

2 Algorithm Selection and Meta-Learning

It is now a matter of consensus that no learning algorithm can outperform all others across broad classes of problems and domains [26]. Thus an essential step in any machine learning experiment is selecting the algorithm that will perform best for a given task and data set. As pointed out in a recent survey [24], research on algorithm selection finds its origins outside machine learning, in a broader framework that cuts across diverse areas of mathematics and computer science. In 1976 a seminal paper by John Rice [23] proposed a formal model comprising four components: a problem space X, or collection of problem instances describable in terms of features defined in a feature space F; an algorithm space A, or set of algorithms considered to address problems in X; and a performance space P representing metrics of algorithm efficacy in solving a problem. Algorithm selection can then be formulated as follows: given a problem x ∈ X characterized by f(x) ∈ F, find an algorithm a ∈ A via the selection mapping S(f(x)) such that the performance mapping p(a(x)) ∈ P is maximized. A schematic diagram of the abstract model is given in Fig. 1.
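To make the abstract formulation concrete, the following minimal Python sketch renders Rice's components as plain callables; the type aliases, function names and the brute-force "oracle" are illustrative assumptions, not part of Rice's paper or of the DMO.

```python
from typing import Any, Callable, Dict

# Illustrative rendering of Rice's components (all names are assumptions).
FeatureMap = Callable[[Any], Dict[str, float]]        # f : X -> F
Algorithm = Callable[[Any], Any]                       # a : X -> solution
Performance = Callable[[Any, Any], float]              # p : (a(x), x) -> R

def oracle_selection(x: Any, algorithms: Dict[str, Algorithm],
                     p: Performance) -> str:
    """The quantity S(f(x)) is meant to approximate: argmax_a p(a(x))."""
    return max(algorithms, key=lambda name: p(algorithms[name](x), x))

def rice_selection(x: Any, f: FeatureMap,
                   S: Callable[[Dict[str, float]], str]) -> str:
    """Rice's selection mapping: the choice depends only on problem features,
    never on the internals of the candidate algorithms."""
    return S(f(x))   # a = S(f(x))
```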

In Rice's model, selection mapping from problem space X onto algorithm space A is based solely on features f ∈ F over the problem instances. In machine learning terms, the choice of the appropriate induction algorithm is conditioned solely on the characteristics of the learning problem and data. Strangely, meta-learning research has independently abided by the same restriction from its inception to the present. Learned meta-rules are generally of the form: if the given dataset has characteristics C1, C2, ..., Cm then use algorithm A1. Sometimes the conclusion can take other forms such as "don't use algorithm A2" or "prefer A1 to A2"; in all cases, however, these rules represent mappings from data set features to algorithms viewed essentially as black boxes.

Fig. 1. Rice’s model of the algorithm selection problem. Adapted from [23,24]

So far no attempt has been made to correlate dataset and algorithm characteristics, in other words to understand which aspects of a given algorithm explain its expected performance given the features of the data to be modelled. As a consequence, current meta-learners cannot generalize over algorithms as they do over data sets. To illustrate this problem, suppose that three algorithms are observed to achieve equivalent performance on a collection of datasets representing a task family. Meta-learning would yield three disjunctive rules with identical conditions and distinct recommendations. There would be no way of characterizing in more abstract terms the class of algorithms that would perform well on the given task domain. In short, no amount of meta-learning would reap fresh insights into the commonalities underlying the disconcerting variety of algorithms.

To overcome this difficulty, we propose to extend the Rice framework and pry open the black box of algorithms. To be able to differentiate similar algorithms as well as detect deeper commonalities among apparently unrelated ones, we propose to characterize them in terms of components such as the model structure built, the objective functions and search strategies used, or the type of data partitions produced. This compositional approach is expected to have two far-reaching consequences. Through a systematic analysis of all the ingredients that constitute an algorithm's inductive bias, meta-learning systems (and data miners in the first instance) will be able to infer not only which algorithms work for specific data/task classes but, more importantly, why. In the long term, they should be able to operationalize the insights thus gained in order to combine algorithms purposefully and perhaps design new algorithms. This novel approach to algorithm selection is not limited to the induction phase; it should be applicable to other data and model processing tasks that require search in the space of candidate algorithms. The proposed approach will also be adapted to model selection, i.e., finding the specific parameter setting that will allow a given algorithm to achieve acceptable performance on a given task. This will require an extensive study of the parameters involved in a given class of algorithms, their role in the learning process or their impact on the expected results (e.g., on the complexity of the learned model for induction algorithms), and their formalization in the data mining ontology.

Fig. 2. Proposed model for algorithm selection

The proposed revision of Rice's model for algorithm selection is visualized in Fig. 2. It includes an additional feature space G representing the space of features extracted to characterize algorithms; selection mapping is now a function of both problem and algorithm features. The revised problem formulation now is: given a problem x ∈ X characterized by f(x) ∈ F and algorithms a ∈ A characterized by g(a) ∈ G, find an algorithm a ∈ A via the selection mapping S(f(x), g(a)) such that the performance mapping p(a(x)) ∈ P is maximized.
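The revision can be sketched in the same style; here S is treated as a learned scoring function over both feature spaces, and the example algorithm descriptors anticipate the components discussed in Section 5. All names and values below are illustrative assumptions.

```python
from typing import Any, Callable, Dict

FeatureMap = Callable[[Any], Dict[str, float]]       # f : X -> F (dataset features)
AlgoFeatureMap = Callable[[str], Dict[str, Any]]      # g : A -> G (algorithm features)
Scorer = Callable[[Dict[str, float], Dict[str, Any]], float]

def extended_selection(x: Any, algorithms: Dict[str, Any],
                       f: FeatureMap, g: AlgoFeatureMap, S: Scorer) -> str:
    """Revised mapping: S scores each algorithm from both f(x) and g(a)."""
    data_feats = f(x)
    return max(algorithms, key=lambda a: S(data_feats, g(a)))

# Sketch of algorithm features g(a) in the spirit of Section 5 (illustrative values).
g_example = {
    "C4.5": {"model_structure": "DecisionTree",
             "cost_function": "greedy per-split splitting criterion",
             "optimization_strategy": "recursive partitioning"},
    "SVM":  {"model_structure": "LinearCombinationOfKernels",
             "cost_function": "HingeLoss + L2Norm",
             "optimization_strategy": "SMO"},
}
```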

3 Related Work

Of the few data mining ontologies reported in the literature, the majority focus on planning the DM process and building workflows [4,28,27], sometimes in the specific context of Grid computing [8,7]. We shall not delve into their content, which is not directly relevant to the focus of this paper. A recent paper proposes a data mining ontology aimed at "the unification of the field of data mining" [20] but defines no specific use case that it is intended to support. To our knowledge, the DMO is the first data mining ontology that has been designed to support, among other tasks, algorithm/model selection and meta-learning.

Algorithm and model selection in data mining has been the object of intensive experimentation and large-scale comparative studies (e.g., [19,14,18,9,12]), a comprehensive review of which is outside the scope of this paper. More interestingly, choosing the right algorithm and parameter setting has been cast as a learning problem in itself: meta-learning for algorithm and model selection has been an active area of investigation for the past two decades [22,1,6,16,13,2]. As pointed out in Section 2, most research on this topic has been implicitly done within the bounds of Rice's framework, where black-box algorithms are selected based solely on problem/data descriptions. An important body of meta-learning research has been devoted to dataset characterization. The Statlog project [19] yielded several dozen dataset features grouped into three categories: simple counts (e.g., number of instances, features or classes), statistical measures (e.g., feature covariance) and information-theoretic measures (e.g., feature entropy). Thereafter, other researchers have tried to expand this set by exploring new features that might yield clues on which algorithms work best for which dataset characteristics [11,10].

The use of landmarking [21,3] in the METAL project gave new impetus to the study of algorithm performance on datasets. This approach uses two sets of algorithms: so-called landmarkers and the actual candidate algorithms. Landmarkers are simple and fast learners (preferably with different inductive biases) whose performance on a set of different learning tasks serves to chart the space of learning problems. To generate meta-rules, both landmarkers and candidates are trained and evaluated on a given set of datasets. Each learning task/dataset then becomes a meta-learning instance which is characterized, in addition to standard predictive features, by the different landmarkers' performance scores. The label of each meta-instance is the candidate algorithm with the best performance measure. The meta-learner is then trained to predict the winning algorithm by identifying tasks in which the landmarkers' performance correlates with that of a particular candidate. An example of a learned meta-rule is: if error-LINEAR-DISCR ≤ 0.0652 ∨ (num-inst ≥ 10 ∧ num-classes ≥ 5 ∧ maxclass ≥ 0.547) then choose LTREE, else choose RIPPER [21]. However, despite the use of learners to landmark areas of expertise of other learners, no attempt is made to explain the observed performance of algorithms on the basis of landmarkers' or their own characteristics. In landmarking, as before, learners remain black boxes.
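A rough sketch of how such a landmarking meta-dataset might be assembled is shown below; the learners and the error-estimation routine are placeholders, and the feature and label naming is an assumption rather than the METAL project's actual format.

```python
from typing import Any, Callable, Dict, List, Tuple

# error(learner, dataset) -> estimated error rate, e.g. by cross-validation (placeholder).
ErrorEstimator = Callable[[Any, Any], float]

def build_meta_instances(datasets: List[Any],
                         landmarkers: Dict[str, Any],
                         candidates: Dict[str, Any],
                         error: ErrorEstimator) -> List[Tuple[Dict[str, float], str]]:
    """Each dataset yields one meta-instance:
    features = the landmarkers' error scores (plus any standard dataset features),
    label    = the best-performing candidate algorithm on that dataset."""
    meta = []
    for d in datasets:
        feats = {f"error-{name}": error(lm, d) for name, lm in landmarkers.items()}
        best = min(candidates, key=lambda name: error(candidates[name], d))
        meta.append((feats, best))
    return meta

# A meta-learner trained on these instances can emit rules of the kind quoted above,
# but the candidate learners themselves are still treated as opaque labels.
```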

4 The Data Mining Ontology

As indicated in Section 1, the DMO is meant to support a number of use cases. This section presents a specific view of the DMO based on the algorithm selection use case. The most important competency questions that the ontology should be able to answer include the following: Given a data mining task/data set, what is the set of potentially applicable methods/algorithms? Given a set of candidate methods/algorithms for a given task/data set, which data set characteristics should be taken into account in order to select the most appropriate one? Given a set of candidate methods/algorithms for a given task/data set, which method/algorithm characteristics should be taken into account in order to select the most appropriate one?

The DMO is currently being developed in OWL2 using the Protégé 4 editor. To support algorithm selection, it provides a conceptualization of data mining tasks, methods/algorithms and datasets. The task hierarchy is divided into two major subtrees: the first represents the user task which is more relevant to the planning use case described in [17], while the concept of GenericDMTask subsumes four major task classes: data processing, modelling, model transformation, and model evaluation. Since the focus of this paper is on algorithm selection for classification, Fig. 3 shows an extract of the ModellingTask hierarchy where PredictiveModellingTask subsumes three subclasses distinguished by the data type of their output: categories for classification, scalars for regression, and complex objects (e.g., tuples, trees) for structured prediction.
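The subsumption structure just described can be pictured with ordinary class inheritance; the sketch below is purely illustrative (the DMO itself is an OWL2 ontology edited in Protégé, not Python code), and the exact class names are assumptions based on the text and Fig. 3.

```python
class GenericDMTask: ...

class DataProcessingTask(GenericDMTask): ...
class ModellingTask(GenericDMTask): ...
class ModelTransformationTask(GenericDMTask): ...
class ModelEvaluationTask(GenericDMTask): ...

class PredictiveModellingTask(ModellingTask):
    output_type: type = object        # what distinguishes its subclasses

class ClassificationTask(PredictiveModellingTask):
    output_type = str                 # categories
class RegressionTask(PredictiveModellingTask):
    output_type = float               # scalars
class StructuredPredictionTask(PredictiveModellingTask):
    output_type = tuple               # complex objects (tuples, trees, ...)

# issubclass plays the role of the "is-a" reasoning an OWL reasoner
# performs over the DMO task subtree.
assert issubclass(ClassificationTask, ModellingTask)
```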

Fig. 3. The Modelling Task subtree

For each leaf class of the task hierarchy, there is a corresponding Method subtree whose branches represent broad classes of methods that address the task. For instance, classification methods can be divided into three broad categories [5] that form the main branches of the ClassificationMethod subtree (Fig. 4). Generative methods compute the class-conditional densities p(x|Ck) and the priors p(Ck) for each class Ck, then use Bayes' theorem to find the posterior class probabilities p(Ck|x). They can also model the joint distribution p(x, Ck) directly and then normalize to obtain the posteriors. In both cases, they use statistical decision theory to determine the class for each new input. Examples of generative methods are normal discriminant analysis and Naive Bayes. Discriminative methods such as logistic regression compute the posteriors p(Ck|x) directly to determine class membership. Discriminant functions build a direct mapping f(x) from input x onto a class label; neural networks and support vector machines (SVMs) are examples of discriminant function methods.
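The three routes to a classification decision can be contrasted in a few lines of Python; the functions below are a minimal sketch assuming a binary problem and user-supplied densities, priors and weights, and are not taken from the DMO.

```python
import math
from typing import Callable, Dict, Sequence

Density = Callable[[Sequence[float]], float]

def generative_posteriors(x: Sequence[float],
                          class_conditional: Dict[str, Density],
                          prior: Dict[str, float]) -> Dict[str, float]:
    """Generative route: p(Ck|x) obtained from p(x|Ck) and p(Ck) via Bayes' theorem."""
    joint = {c: class_conditional[c](x) * prior[c] for c in prior}   # p(x, Ck)
    evidence = sum(joint.values())                                   # p(x)
    return {c: v / evidence for c, v in joint.items()}

def logistic_posterior(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Discriminative route (logistic regression): model p(C1|x) directly."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def linear_discriminant_function(x: Sequence[float], w: Sequence[float], b: float) -> str:
    """Discriminant-function route: map x straight to a label, with no probabilities."""
    return "C1" if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else "C2"
```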

5 Algorithm Characterization in the DMO

The DMO's conceptualization of learning algorithms hinges on the 4-tuple of concepts (Task, DataSet, Method, Model). For instance, a ClassificationTask is achieved by applying a ClassificationMethod to a LabelledDataSet, producing a ClassificationModel. As we go down the classification method subtree in Fig. 4, the broad approaches described in the previous section split into more specialized methods which in turn give rise to formally specified algorithms such as those on the right side of the figure. In ontological terms, these specific method subclasses are simultaneously declared as instances of the Algorithm meta-class; their subclasses represent operators, defined as concrete software implementations of algorithms. In the same vein, these subclasses are themselves instances of the Operator meta-class. For example, DiscriminantFunctionMethod subsumes RecursivePartitioning which in turn subsumes algorithms LTREE, CART and C4.5. The black triangle to the right of C4.5 depicts its (hidden) subclasses, operators Weka-J48 and RapidMiner-DecisionTree.
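The 4-tuple and the Method/Algorithm/Operator levels can be mocked up as plain data structures; the sketch below uses Python dataclasses only to illustrate the relationships, and it flattens the meta-class machinery that OWL expresses more faithfully.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Algorithm:
    """A formally specified algorithm, e.g. C4.5; its operators are concrete implementations."""
    name: str
    method_family: str                                    # e.g. "RecursivePartitioning"
    operators: List[str] = field(default_factory=list)    # e.g. ["Weka-J48", "RapidMiner-DecisionTree"]

@dataclass
class DMApplication:
    """The DMO's central 4-tuple: a Task achieved by applying a Method to a DataSet, producing a Model."""
    task: str          # e.g. "ClassificationTask"
    dataset: str       # e.g. an identifier of a LabelledDataSet
    method: Algorithm
    model: str         # e.g. "ClassificationModel"

c45 = Algorithm("C4.5", "RecursivePartitioning", ["Weka-J48", "RapidMiner-DecisionTree"])
run = DMApplication("ClassificationTask", "some-labelled-dataset", c45, "ClassificationModel")
```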

Fig. 4. The ClassificationMethod subtree

We now zoom in on the key components of classification algorithms; these are represented by datatype and object properties of Algorithm instances. For the purposes of this paper, we focus on characterizing how algorithms work and ignore shallow algorithm characteristics such as ease of implementation, computational cost or readability. To do this, we must first characterize the models they were designed to produce.

A ClassificationModel is defined by its ModelStructure and by the ModelParameters that instantiate this basic structure. It is the ModelStructure that distinguishes the major classification models: a GenerativeModel's basic structure is a JointProbabilityDistribution, that of a DiscriminativeModel is a PosteriorProbabilityDistribution. DiscriminantFunctionMethods produce diverse model structures such as decision trees and neural networks, depending on the nature of the mapping function. Within each model family, a variety of models are produced by coupling the model structure with different types/values of model parameters. To see this, consider the difference between linear and quadratic discriminant analysis under the Gaussian assumption. The NormalQuadraticDiscriminantModel has the same model structure and first model parameter as the NormalLinearDiscriminantModel shown in Fig. 5. However, its second model parameter is not a single SharedCovarianceMatrix, but as many class-specific covariance matrices as there are classes in the given dataset. The outcome is a major difference in the geometry of the resulting models: one draws a linear (value of the doesDataSplit property, Fig. 5) and the other a quadratic boundary between the classes.
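The linear/quadratic distinction boils down to whether one pooled covariance matrix or one matrix per class is estimated; a minimal NumPy sketch of that single difference follows (the function is illustrative, not the DMO's or any toolkit's implementation).

```python
import numpy as np

def fit_gaussian_discriminant(X: np.ndarray, y: np.ndarray, shared_covariance: bool):
    """Fit class means, covariance(s) and priors under the Gaussian assumption.

    shared_covariance=True  -> a single pooled covariance matrix (linear boundary)
    shared_covariance=False -> one covariance matrix per class   (quadratic boundary)
    """
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: float(np.mean(y == c)) for c in classes}
    if shared_covariance:
        # Pool the within-class scatter into one SharedCovarianceMatrix.
        pooled = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1)
                     for c in classes) / (len(X) - len(classes))
        covariances = {c: pooled for c in classes}
    else:
        covariances = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    return means, covariances, priors
```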

Fig. 5. Characterization of two generative algorithms and models

In probabilistic (generative and discriminative) models, another property that further specifies the model structure to yield diverse models is the DensityEstimationMethod used. For instance, although the two generative models in Fig. 5 use a JointProbabilityDistribution structure, the NormalLinearDiscriminantModel uses a Gaussian distribution whereas NaiveBayesKernel [15] estimates a non-parametric distribution by fitting a Gaussian kernel around each training instance. This entails clear differences in the type of model parameters: the sufficient statistics of the estimated Gaussian distribution are the NormalLinearDiscriminantModel's mean and covariance matrices, whereas those of the NaiveBayesKernelModel are all the training instances.
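The difference in retained statistics is easy to see in one dimension: a parametric Gaussian keeps only a mean and a variance, while a kernel estimate keeps every training value. The sketch below assumes univariate data and a Gaussian kernel with a user-chosen bandwidth; it is illustrative only.

```python
import math
from typing import Callable, Sequence

def gaussian_density_estimator(train: Sequence[float]) -> Callable[[float], float]:
    """Parametric estimate: the sample is summarized by its mean and variance."""
    n = len(train)
    mu = sum(train) / n
    var = sum((v - mu) ** 2 for v in train) / n
    return lambda x: math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def kernel_density_estimator(train: Sequence[float],
                             bandwidth: float = 0.5) -> Callable[[float], float]:
    """Non-parametric estimate: a Gaussian kernel is placed around every training
    instance, so the instances themselves are what the model retains."""
    def gauss(u: float) -> float:
        return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return lambda x: sum(gauss((x - xi) / bandwidth) for xi in train) / (len(train) * bandwidth)
```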

Given a model structure and its parameters, the learning process is nothing more or less than the automated adjustment of these parameters to produce a fully specified, operational model. This is the task of the learning algorithm. The goal is to determine the set of parameter values that will maximize classification performance as gauged by some criterion. Independently of the manner in which the learned model will be evaluated after the learning process, the learner should define a cost (or objective) function, which quantifies how close the current parameter values are to the optimum. Learning stops when the cost function is minimized. In its simplest version, the cost function can be simply some measure of error or, more generally, of loss (e.g., misclassification rate or sum of squared errors). However, minimizing training set error can lead to overfitting and generalization failure. The more general concept of CostFunction used in the DMO can be formalized as F = e + λc, where e is a measure of loss, c is a measure of model complexity, and λ is a regularization parameter which controls the trade-off between loss and complexity. The components of the cost function used in SVM learning are shown in Fig. 6.
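The generic CostFunction and its SVM instantiation can be written down directly; the primal-form expression below is a standard formulation used here for illustration, with the hyperparameter C standing in for the trade-off role described in the text.

```python
from typing import Sequence

def regularized_cost(loss: float, complexity: float, lam: float) -> float:
    """The DMO's generic CostFunction: F = e + lambda * c."""
    return loss + lam * complexity

def svm_primal_cost(w: Sequence[float], b: float,
                    X: Sequence[Sequence[float]], y: Sequence[int],   # labels in {-1, +1}
                    C: float) -> float:
    """SVM instantiation: hinge loss plus the L2 norm of the weights,
    with C controlling the loss/complexity trade-off."""
    hinge = sum(max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
                for xi, yi in zip(X, y))
    l2_norm_sq = sum(wj * wj for wj in w)
    return C * hinge + 0.5 * l2_norm_sq
```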

Fig. 6. Characterization of a discriminant function algorithm and model

The search for the right setting can be cast as an optimization problem that consists in minimizing the cost function. Hence an OptimizationStrategy is another essential component of a learning algorithm. In certain cases, optimization is straightforward. This is the case for NormalLinearDiscriminantAnalysis (Fig. 5), where the cost function is the log likelihood, and the maximum likelihood estimates of the model parameters have a closed-form solution: it suffices to take the derivatives of the log likelihood with respect to the different parameters, set them to 0, and solve for the parameters. Logistic regression, on the other hand, estimates the maximum likelihood parameters using methods such as Newton-Raphson. SVMs use Sequential Minimal Optimization (SMO), a quadratic programming method rendered necessary by the quadratic complexity component of the cost function (the L2 norm in Fig. 6).
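The two flavours of optimization mentioned here, closed-form maximum likelihood versus iterative Newton-Raphson, can be contrasted in a short NumPy sketch; the code is a generic illustration (no regularization, no convergence checks), not the specific procedures used by any of the cited systems.

```python
import numpy as np

def gaussian_mle(X: np.ndarray):
    """Closed-form optimization: setting the log-likelihood derivatives to zero
    yields the sample mean and (biased) sample covariance directly."""
    return X.mean(axis=0), np.cov(X, rowvar=False, bias=True)

def logistic_newton_raphson(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Iterative optimization: Newton-Raphson steps for logistic regression,
    needed because its likelihood equations have no closed-form solution."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # add an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))              # current posterior estimates
        grad = Xb.T @ (y - p)                          # gradient of the log-likelihood
        hess = -(Xb * (p * (1 - p))[:, None]).T @ Xb   # Hessian of the log-likelihood
        w = w - np.linalg.solve(hess, grad)            # Newton step
    return w
```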

A learning algorithm's model structure and its strategy for finding the optimal model parameters are essential ingredients of its inductive bias, without which no generalization is possible. Despite such design options that restrict the space of target functions that a learning algorithm can explore, the combinatorics of search remains daunting. Thus many algorithms allow the user to restrict further the space of considered models or steer the search in regions deemed promising. This is the role of hyperparameters: they allow the user to reinforce an algorithm's built-in inductive bias by specifying choices that might be informed by prior knowledge. In SVMs, for instance, a single generic algorithm can give rise to a number of different models based on the hyperparameter values selected by users. One such hyperparameter is the kernel function, which is defined by the kernel type (e.g., polynomial, Gaussian) and its associated parameters: the order or degree of a polynomial kernel, or the bandwidth of a Gaussian kernel. The kernel function selected by the user (depicted as <Kernel> in Fig. 6) specifies the LinearCombinationOfKernels that comprises the model structure. Adjustment of the model parameters (the kernel coefficients) is controlled by yet another hyperparameter called C. As shown in the figure, the value of C becomes the regularization parameter that controls the trade-off between error (measured by Hinge Loss) and model complexity (quantified by the L2 norm of the kernel coefficients). This is expressed in OWL through the SWRL rule: SVM(?x) ∧ hasCostFunction(?x, ?y) ∧ hasHyperparameter(?x, ?z) ∧ hasValue(?z, ?c) → hasRegularizationParameter(?y, ?c).
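A procedural reading of that rule is sketched below: the hyperparameter values chosen by the user (kernel plus C) configure the model, and the value of C is copied into the cost function's regularization slot. All class and attribute names are simplified stand-ins for the DMO's actual properties.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CostFunction:
    loss: str = "HingeLoss"
    complexity: str = "L2Norm"
    regularization_parameter: Optional[float] = None   # filled in by the rule below

@dataclass
class SVMAlgorithm:
    kernel_type: str = "Gaussian"                                  # hyperparameter: kernel
    kernel_params: Dict[str, float] = field(default_factory=dict)  # e.g. degree or bandwidth
    C: float = 1.0                                                 # hyperparameter: trade-off
    cost_function: CostFunction = field(default_factory=CostFunction)

def apply_regularization_rule(svm: SVMAlgorithm) -> None:
    """Analogue of the SWRL rule above: the value of the hyperparameter C
    becomes the regularization parameter of the algorithm's cost function."""
    svm.cost_function.regularization_parameter = svm.C

svm = SVMAlgorithm(kernel_type="polynomial", kernel_params={"degree": 3.0}, C=10.0)
apply_regularization_rule(svm)
assert svm.cost_function.regularization_parameter == 10.0
```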

6 Conclusion

In this paper we presented our vision of a data mining ontology designed to support meta-learning for algorithm (and subsequently model) selection. Previous research has focused obsessively on aligning experiments and performance metrics while little effort has gone into explaining observations in terms of the internal logic and mechanisms of learning algorithms. In this sense, meta-learning research has remained within the strict bounds of the Rice framework, which relates dataset descriptions to the performance of algorithms viewed mainly as black boxes. We propose to extend the Rice model by adding algorithm features to dataset features as parameters of the algorithm selection function. To do this, we need to investigate the building blocks that comprise algorithms in order to reveal commonalities underlying their apparent diversity; more ambitiously, the goal is to identify the components of inductive bias that characterize each algorithm and algorithm family. Key components are: the structure and parameters of the models produced, the cost function used to quantify the appropriateness of a model, and the optimization strategy adopted to find the model parameter values that minimize this cost function.

Ongoing work involves two broad groups of issues. First, we should sort out a number of ontology engineering problems. The main hurdle we face concerns the limitations of description logic; we need the power of first-order logic to formulate the underlying mathematics of learning in an ontological framework. However, we must weigh the trade-off between expressive power and interoperability with OWL-based e-science platforms. Collaboration with specialists in formal ontologies is crucial at this point. Second, the priority data mining issue is identifying other components of bias for learning algorithms, in addition to those described in this paper. This task concerns classification in the first instance, but could be fruitfully extended to other predictive and descriptive data mining tasks.

This two-pronged research agenda is clearly beyond the reach of a single research group or even of a small-scale European project. The short-term goal is to gather interested data miners and ontology engineers to consolidate the core concepts and orientation of the DMO. The next step will be to show how the DMO can be used to improve algorithm selection through meta-learning. Here again, it is indispensable to establish broad collaborations and leverage the results of teams working actively in the area. For instance, the wealth of meta-data gathered in extensive empirical comparisons [9] and community-based experimentation platforms [25] will certainly help to overcome the well-known bottleneck of meta-data sparsity that has always hindered meta-learning research.

Acknowledgements

This work was supported by the European Union within FP7 ICT project e-LICO (Grant No 231519).

References

1. D. W. Aha. Generalizing from case studies: a case study. In D. Sleeman and P. Edwards, editors, Proc. of the 9th International Workshop on Machine Learning, pages 1-10. Morgan Kaufmann, 1992.

2. S. Ali and K. Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1-3):173-186, 2006.

3. H. Bensusan and C. Giraud-Carrier. Discovering task neighbourhoods through landmark learning performances. In Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 325-330, 2000.

4. A. Bernstein, F. Provost, and S. Hill. Toward intelligent assistance for a data mining process: An ontology-based approach for cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 17(4):503-518, 2005.

5. C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

6. P. B. Brazdil and R. J. Henery. Analysis of results. In Michie et al. [19], chapter 10, pages 175-212.

7. P. Brezany, I. Janciak, and A. Min Tjoa. Ontology-based construction of grid data mining workflows. In H. O. Nigro, S. E. Gonzalez Cisaro, and D. H. Xodo, editors, Data Mining with Ontologies: Implementations, Findings and Frameworks. IGI Global, 2008.

8. M. Cannataro and C. Comito. A data mining ontology for grid programming. In Proc. 1st Int. Workshop on Semantics in Peer-to-Peer and Grid Computing, in conjunction with WWW2003, pages 113-134, 2003.

9. R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.

10. Christian Köpf, Charles Taylor, and Jörg Keller. Meta-analysis: From data characterisation for meta-learning to meta-regression. In Proceedings of the PKDD-00 Workshop on Data Mining, Decision Support, Meta-Learning and ILP, 2000.

11. S. J. Cunningham. Dataset cataloging metadata for machine learning applications and research. In Proceedings of the Sixth International Workshop on AI and Statistics, Fort Lauderdale, FL, 1997.

12. J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1-30, 2006.

13. W. Duch and K. Grudzinski. Meta-learning: Searching in the model space. In Proc. of the Int. Conf. on Neural Information Processing (ICONIP), Shanghai, pages 235-240, 2001.

14. A. Feelders and W. Verkooijen. On the statistical comparison of inductive learning methods, chapter 26, pages 271-279. Springer, 1996.

15. G. H. John and P. Langley. Estimating continuous distributions in Bayesian classifiers. In P. Besnard and S. Hanks, editors, Proc. of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 338-345. Morgan Kaufmann, 1995.

16. A. Kalousis and M. Hilario. Model selection via meta-learning. International Journal on Artificial Intelligence Tools, 10(4), 2001.

17. J. U. Kietz, F. Serban, A. Bernstein, and S. Fischer. Towards cooperative planning of data mining workflows. Submitted to the ECML/PKDD-2009 Workshop on Third Generation Data Mining: Service Oriented Knowledge Discovery, 2009.

18. T. Lim, W. Loh, and Y. Shih. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40:35-75, 2000.

19. D. Michie, D. J. Spiegelhalter, and C. C. Taylor, editors. Machine learning, neural and statistical classification. Ellis Horwood, 1994.

20. P. Panov, S. Dzeroski, and L. Soldatova. OntoDM: An ontology of data mining. In Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, pages 752-760, 2008.

21. B. Pfahringer, H. Bensusan, and C. Giraud-Carrier. Meta-learning by landmarking various learning algorithms. In Proc. Seventeenth International Conference on Machine Learning, ICML'2000, pages 743-750, San Francisco, California, June 2000. Morgan Kaufmann.

22. L. Rendell and E. Cho. Empirical learning as a function of concept character. Machine Learning, 5:267-298, 1990.

23. J. Rice. The algorithm selection problem. Advances in Computers, 15:65-118, 1976.

24. K. A. Smith-Miles. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys, 41(1), 2008.

25. Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, and Geoff Holmes. Experiment databases: Creating a new platform for meta-learning research. In Planning to Learn Workshop, ICML/COLT/UAI-2008, pages 10-15, Helsinki, July 2008.

26. D. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1381-1390, 1996.

27. Monika Žáková, Petr Křemen, F. Železný, and N. Lavrač. Using ontological reasoning and planning for data mining workflow composition. In SoKD: ECML/PKDD 2008 Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery, 2008.

28. Monika Žáková, Petr Křemen, Filip Železný, and Nada Lavrač. Planning to learn with a knowledge discovery ontology. In Planning to Learn Workshop (PlanLearn 2008) at ICML 2008, 2008.


Author: Melanie Hilario, Alexandros Kalousis, Phong Nguyen, Adam Woznica
Title: A Data Mining Ontology for Algorithm Selection and Meta-Mining
Venue: Proceedings of the 2nd International Workshop on Third generation Data Mining
URL: http://cui.unige.ch/~woznica/doc/papers/SoKD2009.pdf
Year: 2009