2016 GenericOntologyofDatatypes

Subject Headings: Datatype, Data Structure, ChEBI.

Notes

Cited By

Quotes

Abstract

We present OntoDT, a generic ontology for the representation of scientific knowledge about datatypes. OntoDT defines basic entities, such as datatype, properties of datatypes, specifications, characterizing operations, and a datatype taxonomy. We demonstrate the utility of OntoDT on several use cases. OntoDT was used within an Ontology of core data mining entities for constructing taxonomies of datasets, data mining tasks, generalizations and data mining algorithms. Furthermore, we show how OntoDT can be used to annotate and query dataset repositories. We also show how OntoDT can improve the representation of datatypes in the BioXSD exchange format for basic bio-informatics types of data. The generic nature of OntoDT enables it to support a wide range of other applications, especially in combination with other domain specific ontologies: the construction of data mining workflows, annotation of software and algorithms, semantic annotation of scientific articles, etc. OntoDT is open source and is available at http://www.ontodt.com.

1. Introduction

Data processing is at the heart of science. Scientific research workflows rely heavily on datatype representations. Especially in data mining research, it is impossible to efficiently (semi-)automatically connect parts of workflows, such as data preprocessing and data mining, perform analysis of the research results, and communicate the research outputs without a machine-processable representation of datatypes and their properties. There is a need for a standardized, semantically defined, and machine-amenable representation of scientific datatypes to support cross-domain applications. Unfortunately, the existing representations of datatypes do not fully address such a need.

In the literature, there exist different definitions of datatypes. In computer science, a datatype is usually defined as a “classification that identifies various types of data, such as boolean, integer, discrete and others, that determines the possible values for that type, operations on the values of the data, and the way the values of that type can be stored” [56]. Dale and Walker [8] discuss the difference between a data structure and a datatype in the sense that “data structure refers to the study of data and how to represent data objects within a program; that is, the implementation of structured relationships” while a datatype defines “the properties of classes of objects in addition to how these objects might be represented in a program”. Martin [36] also discusses the difference between data structures and datatypes and states that “depending on the point of view, a data object is characterized by its type (for the user) or by its structure (for the implementer)”.
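
To make the distinction concrete, here is a minimal Python sketch. It is not taken from the paper or from OntoDT. It models a datatype as a value space plus operations (the user's view) and then shows two interchangeable data structures that store the same values (the implementer's view). The names Datatype, BoolAsByte, and BoolAsString are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Datatype:
    """User's view: a named value space plus the operations defined on it."""
    name: str
    contains: Callable[[Any], bool]  # membership test for the value space
    operations: Dict[str, Callable[..., Any]] = field(default_factory=dict)


# Hypothetical boolean datatype with a few operations on its values.
boolean = Datatype(
    name="boolean",
    contains=lambda v: isinstance(v, bool),
    operations={
        "equal": lambda a, b: a == b,
        "not": lambda a: not a,
        "and": lambda a, b: a and b,
    },
)


class BoolAsByte:
    """Implementer's view, option 1: store the value as a single byte."""

    def __init__(self, value: bool) -> None:
        self._raw = b"\x01" if value else b"\x00"

    def value(self) -> bool:
        return self._raw == b"\x01"


class BoolAsString:
    """Implementer's view, option 2: store the value as a string literal."""

    def __init__(self, value: bool) -> None:
        self._raw = "true" if value else "false"

    def value(self) -> bool:
        return self._raw == "true"


# Both structures represent values of the same datatype.
same_value = BoolAsByte(True).value() == BoolAsString(True).value()
negated = boolean.operations["not"](BoolAsByte(True).value())
```

The datatype description stays the same regardless of which storage structure is chosen, which is the separation that the quoted definitions draw.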

In this paper, we present OntoDT, a generic ontology of datatypes. OntoDT defines the semantics, i.e., the meaning, of the key entities and represents the knowledge about datatypes in a machine-friendly way. The OntoDT ontology is based on the latest revised version of the ISO/IEC 11404 standard for datatypes [23].

This paper is organized as follows. In Section 2, we present the background related to the development of the OntoDT ontology. In Section 3, we review and discuss the related work. Next, in Section 4, we present the ontology design principles and implementation, and in Section 5 we present the key OntoDT classes. In Section 6, we present the OntoDT datatype taxonomy. Finally, we present the ontology evaluation (Section 7), and three use cases of the ontology (Section 8). We conclude the paper with a discussion (Section 9) and a summary of contributions and points for further work (Section 10).

2. Background

The development of OntoDT started within the framework of an ontology for data mining (OntoDM) [44]. The main idea of using a formalized description of datatypes for the domain of data mining was to characterize the types of data contained in a dataset, the applicability of a data mining task to data of a given datatype, and the applicability of a data mining algorithm to a dataset. For reasons of generality and reuse, OntoDT has evolved into an independent ontology.
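
The applicability idea can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might look, not the OntoDM or OntoDT implementation. The class names, the is_applicable function, and the datatype labels (such as "tuple of primitives") are hypothetical and are not OntoDT class names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DatasetDescription:
    """A dataset characterized by the datatypes of its two parts."""
    name: str
    descriptive_type: str  # datatype of the descriptive (input) part
    target_type: str       # datatype of the target (output) part


@dataclass(frozen=True)
class AlgorithmDescription:
    """An algorithm characterized by the datatypes it can handle."""
    name: str
    accepted_descriptive: FrozenSet[str]
    accepted_target: FrozenSet[str]


def is_applicable(algo: AlgorithmDescription, data: DatasetDescription) -> bool:
    """Return True if both parts of the dataset have datatypes the algorithm accepts."""
    return (
        data.descriptive_type in algo.accepted_descriptive
        and data.target_type in algo.accepted_target
    )


# Hypothetical descriptions, for illustration only.
tree_learner = AlgorithmDescription(
    name="example_tree_learner",
    accepted_descriptive=frozenset({"tuple of primitives"}),
    accepted_target=frozenset({"discrete", "real"}),
)
tabular = DatasetDescription("toy_classification", "tuple of primitives", "discrete")
sequences = DatasetDescription("toy_sequence_data", "sequence of discrete", "discrete")

tabular_ok = is_applicable(tree_learner, tabular)
sequences_ok = is_applicable(tree_learner, sequences)
```

An ontology-based approach would replace the plain string comparison with reasoning over a datatype taxonomy, so that an algorithm accepting a general datatype also accepts its subtypes.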

References

  • 1. F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider (Eds.), The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, New York, NY, USA (2003)
  • 2. K. Bache, M. Lichman, 2013, UCI Machine Learning Repository, URL: http://archive.ics.uci.edu/ml (accessed 8.12.14).
  • 3. Basic Formal Ontology (BFO) Web Page, 2014, URL: http://www.ifomis.org/bfo (accessed 31.03.14).
  • 4. BioPortal Web Page, 2014, URL: https://bioportal.bioontology.org (accessed 31.03.14).
  • 5. Chemical Entities of Biological Interest (ChEBI) Web Page, 2014, URL: http://www.ebi.ac.uk/chebi/ (accessed 08.12.14).
  • 6. Michael Compton, Payam Barnaghi, Luis Bermudez, Raúl García-Castro, Oscar Corcho, Simon Cox, John Graybeal, Manfred Hauswirth, Cory Henson, Arthur Herzog, Vincent Huang, Krzysztof Janowicz, W. David Kelsey, Danh Le Phuoc, Laurent Lefort, Myriam Leggieri, Holger Neuhaus, Andriy Nikolov, Kevin Page, Alexandre Passant, Amit Sheth, Kerry Taylor, Ontology Paper: The SSN Ontology of the W3C Semantic Sensor Network Incubator Group, Web Semantics: Science, Services and Agents on the World Wide Web, 17, p.25-32, December, 2012 doi:10.1016/j.websem.2012.05.003
  • 7. Mélanie Courtot, Frank Gibson, Allyson L. Lister, James Malone, Daniel Schober, Ryan R. Brinkman, Alan Ruttenberg, MIREOT: The Minimum Information to Reference An External Ontology Term, Applied Ontology, v.6 n.1, p.23-33, January 2011
  • 8. N. Dale, H.M. Walker, A Classification of Data Types, Comput. Sci. Edu., 3 (1992) 223-232.
  • 9. Data Mining Optimization Ontology (DMOP) Web Page, 2014, URL: http://www.e-lico.eu/DMOP.html (accessed 31.03.14).
  • 10. Data Type Registry (DTR) Web Page, 2015, URL: http://typeregistry.org/ (accessed 11.05.15).
  • 11. Data Type Registry Work Group Web Page, 2015, URL: https://rd-alliance.org/groups/data-type-registries-wg.html (accessed 11.05.15).
  • 12. Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) Web Page, 2014, URL: http://www.loa.istc.cnr.it/old/DOLCE.html (accessed 08.12.14).
  • 13. EDAM Ontology Web Page, 2014, URL: http://edamontology.org (accessed 31.03.14).
  • 14. T. Erl,, Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005.
  • 15. J. García, F.J. García-Peñalvo, R. Therón, A Survey on Ontology Metrics, Springer, Berlin Heidelberg, 2010.
  • 16. M. Grüninger, M. Fox, Methodology for the Design and Evaluation of Ontologies, 1995.
  • 17. Health Level Seven Reference Implementation Model (HL7) Web Page, 2014, URL: http://www.hl7.org (accessed 31.03.14).
  • 18. HermiT OWL Reasoner Web Page, 2014, URL: http://www.hermit-reasoner.com (accessed 31.03.14).
  • 19. M. Hilario, P. Nguyen, H. Do, A. Woznica, A. Kalousis, Ontology-based Meta-mining of Knowledge Discovery Workflows, in: S, 2011, pp. 273-315.
  • 20. Information Artifact Ontology (IAO) Web Page, 2014, URL: http://code.google.com/p/information-artifact-ontology (accessed 31.03.14).
  • 21. Image and Data Quality Assessment Ontology (IDQA) Web Page, 2014, URL: https://bioportal.bioontology.org/ontologies/IDQA (accessed 31.03.14).
  • 22. ISO/IEC 11404:1996, 1996, Information Technology - Programming Languages, their Environments and System Software Interfaces - Language-independent Datatypes, URL: http://www.iso.org/iso/catalogue_detail.htm?csnumber=19346.
  • 23. ISO/IEC 11404:2007, 2007, Information Technology - General-Purpose Datatypes (GPD), URL: http://www.iso.org/iso/catalogue_detail.htm?csnumber=39479.
  • 24. Jon Ison, Matúš Kalaš, Inge Jonassen, Dan Bolser, Mahmut Uludag, Hamish McWilliam, James Malone, Rodrigo Lopez, Steve Pettifer, Peter Rice, EDAM, Bioinformatics, v.29 n.10, p.1325-1332, May 2013 doi:10.1093/bioinformatics/btt113
  • 25. S. Jupp, M. Horridge, L. Iannone, J. Klein, S. Owen, J. Schanstra, K. Wolstencroft, R. Stevens, Populous: A Tool for Building OWL Ontologies from Templates, BMC Bioinformatics, 13 (2012) S5.
  • 26. Matúš Kalaš, Pål Puntervoll, Alexandre Joseph, Edita Bartaševičiūtė, Armin Töpfer, Prabakar Venkataraman, Steve Pettifer, Jan Christian Bryne, Jon Ison, Christophe Blanchet, Kristoffer Rapacki, Inge Jonassen, BioXSD, Bioinformatics, v.26 n.18, p.i540-i546, September 2010 doi:10.1093/bioinformatics/btq391
  • 27. Aram Karalič, Ivan Bratko, First Order Regression, Machine Learning, v.26 n.2-3, p.147-176, Feb./March 1, 1997 doi:10.1023/A:1007365207130
  • 28. C.M. Keet, A. Ławrynowicz, C. Amato, A. Kalousis, P. Nguyen, R. Palma, R. Stevens, M. Hilario, The Data Mining Optimization Ontology, Web Semantics: Science, Services and Agents on the World Wide Web. doi:10.1016/j.websem.2015.01.001
  • 29. J.-U. Kietz, F. Serban, S. Fischer, A. Bernstein, Semantics Inside! But Let's Not Tell the Data Miners: Intelligent Support for Data Mining, in:, Springer International Publishing, 2014, pp. 706-720.
  • 30. Dragi Kocev, Celine Vens, Jan Struyf, Sašo Džeroski, Tree Ensembles for Predicting Structured Outputs, Pattern Recognition, v.46 n.3, p.817-833, March, 2013 doi:10.1016/j.patcog.2012.09.023
  • 31. Petr Kremen, Bogdan Kostov, Expressive OWL Queries: Design, Evaluation, Visualization, International Journal on Semantic Web & Information Systems, v.8 n.4, p.57-79, October 2012 doi:10.4018/jswis.2012100104
  • 32. Library for Quantity Kinds and Units: Schema, based on QUDV Model OMG SysML(TM), 2014, Version 1.2, URL: http://www.w3.org/2005/Incubator/ssn/ssnx/qu/qu (accessed 08.12.14).
  • 33. Linked Models Web Page, 2014, URL: http://linkedmodels.org (accessed 31.03.14).
  • 34. J. Madin, S. Bowers, M. Schildhauer, S. Krivov, D. Pennington, F. Villa, An Ontology for Describing and Synthesizing Ecological Observation Data, Ecol. Inform., 2 (2007) 279-296.
  • 35. Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, Sašo Džeroski, An Extensive Experimental Comparison of Methods for Multi-label Learning, Pattern Recognition, v.45 n.9, p.3084-3104, September, 2012 doi:10.1016/j.patcog.2012.03.004
  • 36. J.J. Martin,, Prentice Hall International, UK, 1986.
  • 37. Brian Meek, A Taxonomy of Datatypes, ACM SIGPLAN Notices, v.29 n.9, p.159-167, Sept. 1994 doi:10.1145/185009.185042
  • 38. Microarray and Gene Expression Data Ontology (MGED) Web Page, 2014, URL: http://mged.sourceforge.net/ontologies/MGEDontology.php (accessed 31.03.14).
  • 39. National Cancer Institute Thesaurus (NCIT) Web Page, 2014, URL: http://ncit.nci.nih.gov (accessed 31.03.14).
  • 40. Ian Niles, Adam Pease, Towards a Standard Upper Ontology, Proceedings of the International Conference on Formal Ontology in Information Systems, p.2-9, October 17-19, 2001, Ogunquit, Maine, USA doi:10.1145/505168.505170
  • 41. Ontology of Biomedical Investigations (OBI) Web Page, 2014, URL: http://obi-ontology.org (accessed 31.03.14).
  • 42. Open Biomedical Ontologies (OBO), 2014, Foundry Web Page, URL: http://www.obofoundry.org (accessed 31.03.14).
  • 43. OWL2Query Web Page, 2014, URL: http://krizik.felk.cvut.cz/km/owl2query/index.html (accessed 31.03.14).
  • 44. P. Panov, L. Soldatova, S. Džeroski, Representing Entities in the OntoDM Data Mining Ontology, in:, Springer, New York, 2010, pp. 27-58.
  • 45. Panče Panov, Larisa Soldatova, Sašo Džeroski, Ontology of Core Data Mining Entities, Data Mining and Knowledge Discovery, v.28 n.5-6, p.1222-1265, September 2014 doi:10.1007/s10618-014-0363-0
  • 46. S. Pettifer, J. Ison, M. Kalaš, D. Thorne, P. McDermott, I. Jonassen, A. Liaquat, J.M. Fernández, J.M. Rodriguez, I. Partners, D.G. Pisano, C. Blanchet, M. Uludag, P. Rice, E. Bartaseviciute, K. Rapacki, M. Hekkelman, O. Sand, H. Stockinger, A.B. Clegg, E. Bongcam-Rudloff, J. Salzemann, V. Breton, T.K. Attwood, G. Cameron, G. Vriend, The EMBRACE Web Service Collection, Nucleic Acids Res., 38 (2010) W683-W688.
  • 47. Phylogenetic Ontology (PHYLONT) Web Page, 2014, URL: https://bioportal.bioontology.org/ontologies/PHYLONT (accessed 31.03.14).
  • 48. Protégé Software Web Page, 2014, URL: http://protege.stanford.edu (accessed 31.03.14).
  • 49. Quantities, Units, Dimensions and Data Types Ontologies (QUDT) Web Page, 2014, URL: http://qudt.org/ (accessed 08.12.14).
  • 50. Research Data Alliance (RDA) Web Page, 2015, URL: http://rd-alliance.org/ (accessed 11.05.15).
  • 51. RDF Data Cube Vocabulary Web Page, 2014, URL: http://www.w3.org/TR/vocab-data-cube/ (accessed 08.12.14).
  • 52. Relational Ontology (RO) Web Page, 2014, URL: http://purl.org/obo/owl/OBO_REL (accessed 31.03.14).
  • 53. Review of Sensor and Observations Ontology Web Page, 2014, URL: http://www.w3.org/2005/Incubator/ssn/wiki/Review_of_Sensor_and_Observations_Ontologies (accessed 08.12.14).
  • 54. Semantic Annotations for Web Services Description Language (SAWSDL) Web Page, 2014, URL: http://www.w3.org/TR/sawsdl (accessed 31.03.14).
  • 55. Semantic Sensor Network Ontology (SSN) Web Page, 2014, URL: http://purl.oclc.org/NET/ssnx/ssn (accessed 08.12.14).
  • 56. C.A. Shaffer,, Prentice Hall, Upper Saddle River, NJ, 1997.
  • 57. SKOS Simple Knowledge Organization System Web Page, 2014, URL: http://www.w3.org/2004/02/skos/ (accessed 08.12.2014).
  • 58. Carlos N. Silla, Jr., Alex A. Freitas, A Survey of Hierarchical Classification Across Different Application Domains, Data Mining and Knowledge Discovery, v.22 n.1-2, p.31-72, January 2011 doi:10.1007/s10618-010-0175-9
  • 59. E. Sirin, B. Parsia, SPARQL-DL: SPARQL Query for OWL-DL, 2007.
  • 60. B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L.J. Goldberg, K. Eilbeck, A. Ireland, C.J. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S.-A. Sansone, R.H. Scheuermann, N. Shah, P.L. Whetzel, S. Lewis, The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration, Nat. Biotechnol., 25 (2007) 1251-1255.
  • 61. B. Smith, W. Ceusters, B. Klagges, J. Kohler, A. Kumar, J. Lomax, C. Mungall, F. Neuhaus, A.L. Rector, C. Rosse, Relations in Biomedical Ontologies, Genome Biol., 6 (2005) R46.
  • 62. Suggested Upper Merged Ontology (SUMO) Web Page, 2014, URL: http://www.adampease.org/OP/ (accessed 08.12.14).
  • 63. SWO Ontology Web Page, 2014, URL: http://theswo.sourceforge.net (accessed 31.03.14).
  • 64. Syndromic Surveillance Ontology (SSO) Web Page, 2014, URL: http://surveillance.mcgill.ca/projects/sso (accessed 31.03.14).
  • 65. Dennis G. Thomas, Rohit V. Pappu, Nathan A. Baker, NanoParticle Ontology for Cancer Nanotechnology Research, Journal of Biomedical Informatics, v.44 n.1, p.59-74, February, 2011 doi:10.1016/j.jbi.2010.03.001
  • 66. Joaquin Vanschoren, Jan N. Van Rijn, Bernd Bischl, Luis Torgo, OpenML: Networked Science in Machine Learning, ACM SIGKDD Explorations Newsletter, v.15 n.2, December 2013 doi:10.1145/2641190.2641198
  • 67. W3C XML Schema Definition Language Web Page, 2014, URL: http://www.w3.org/TR/xmlschema11-1/ (accessed 08.12.14).
  • 68. XML Schema Reference - XSD Elements Web Page, 2014, URL: http://www.w3schools.com/schema/schema_elements_ref.asp (accessed 18.12.14).


Author: Sašo Džeroski, Larisa Soldatova, Panče Panov
Title: Generic Ontology of Datatypes
DOI: 10.1016/j.ins.2015.08.006
Year: 2016