2008 OntoDMAnOntofDataMining


Subject Headings: Data Mining Ontology, OntoDM Ontology.


Motivated by the need for unification of the field of data mining and the growing demand for formalized representation of outcomes of research, we address the task of constructing an ontology of data mining. The proposed [[data mining ontology|ontology]], named OntoDM, is based on a recent proposal of a general framework for data mining, and includes definitions of basic data mining entities, such as datatype and dataset, data mining task, data mining algorithm and components thereof (e.g., distance function), etc. It also allows for the definition of more complex entities, e.g., constraints in constraint-based data mining, sets of such constraints (inductive queries) and data mining scenarios (sequences of inductive queries). Unlike most existing approaches to constructing ontologies of data mining, OntoDM is a deep/heavy-weight ontology and follows best practices in ontology engineering, such as not allowing multiple inheritance of classes, using a predefined set of relations and using a top level ontology.
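As an informal illustration of the design practices named in the abstract (single inheritance of classes and a predefined set of relations), the following minimal Python sketch builds a toy class hierarchy. It is not OntoDM itself: the class `TinyOntology`, the root name "data mining entity", the intermediate class "algorithm component", and the relation names in `ALLOWED_RELATIONS` are all assumptions chosen for the example, and the root merely stands in for a top-level ontology.

```python
# Toy sketch only: all names here are hypothetical, not taken from OntoDM.
ALLOWED_RELATIONS = frozenset({"has_part", "part_of"})  # assumed relation set


class TinyOntology:
    """A class hierarchy in which every class has exactly one parent."""

    def __init__(self, root):
        self.parent = {root: None}  # class name -> its single parent
        self.links = set()          # (subject, relation, object) triples

    def add_class(self, name, parent):
        # Single inheritance: a class may be declared only once.
        if name in self.parent:
            raise ValueError(f"{name!r} already has a parent")
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[name] = parent

    def relate(self, subject, relation, obj):
        # Only relations from the predefined set are accepted.
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"relation {relation!r} is not predefined")
        for name in (subject, obj):
            if name not in self.parent:
                raise KeyError(f"unknown class {name!r}")
        self.links.add((subject, relation, obj))

    def ancestors(self, name):
        # Walk the single-parent chain from the class up to the root.
        chain = []
        current = self.parent[name]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain


def build_example():
    onto = TinyOntology("data mining entity")
    for name in ("datatype", "dataset", "data mining task",
                 "data mining algorithm", "algorithm component",
                 "constraint", "inductive query", "scenario"):
        onto.add_class(name, "data mining entity")
    onto.add_class("distance function", "algorithm component")
    # Composite entities mentioned in the abstract, modelled with has_part.
    onto.relate("data mining algorithm", "has_part", "algorithm component")
    onto.relate("inductive query", "has_part", "constraint")
    onto.relate("scenario", "has_part", "inductive query")
    return onto


onto = build_example()
```

In this sketch, storing one parent per class in a dictionary is what rules out multiple inheritance, and `relate` rejects any relation outside the fixed set. A real ontology would normally be expressed in an ontology language rather than in application code.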



2008 OntoDMAnOntofDataMining
Author: Sašo Džeroski, Larisa Soldatova, Panče Panov
Title: OntoDM: An Ontology of Data Mining
Journal: IEEE International Conference on Data Mining Workshops
Title URL: http://kt.ijs.si/panovp/Default files/OntoDM PanovEtAl ICDMw08.pdf
Year: 2008