2014 TargetDrivenMergingofTaxonomies

(Raunich & Rahm, 2014) ⇒ Salvatore Raunich, and Erhard Rahm. (2014). “Target-driven Merging of Taxonomies with Atom.” In: Information Systems, 42.

Subject Headings: Taxonomy Merging Algorithm, Atom Taxonomy Merging Algorithm.

Notes

Cited By

http://scholar.google.com/scholar?q=%222014%22+Target-driven+Merging+of+Taxonomies+with+Atom

Quotes

Author Keywords

Data models; Data integration; Taxonomy merging; Database applications

Abstract

The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies. We propose a new taxonomy merging algorithm called Atom that, given as input two taxonomies and a match mapping between them, can generate an integrated taxonomy in a largely automatic manner. The approach is target-driven, i.e. we merge a source taxonomy into the target taxonomy and preserve the target ontology as much as possible. In contrast to previous approaches, Atom does not aim at fully preserving all input concepts and relationships but strives to reduce the semantic heterogeneity of the merge results for improved understandability. Atom can also exploit advanced match mappings containing is-a relationships in addition to equivalence relationships between concepts of the input taxonomies. We evaluate Atom for synthetic and real-world scenarios and compare it with a full merge solution.

1. Introduction

Ontologies and taxonomies are increasingly used to semantically categorize or annotate information, especially on the web. For example, product catalogs of online shops or web directories categorize products or websites to help users find relevant entries. In life sciences, ontologies are used to describe components and functions of organisms or objects such as genes or proteins. Since many ontologies refer to the same domain and to the same objects, there is a growing need to integrate or merge such related ontologies. The goal is to create a merged ontology providing a unified view on the two or more input ontologies.

Despite a significant amount of previous work on the related problem of schema integration [2], ontology integration is still a challenge and not sufficiently solved. Previous ontology merging approaches [17, 13, 32] are largely user-controlled and provide little support to automatically determine merge solutions. However, such manual approaches are insufficient for merging large ontologies with thousands of concepts so that there is a strong need to automatically determine ontology merge results which the user can confirm or adjust as needed. One promising approach to this end is to decompose the complex integration problem into match and merge subtasks and leverage the significant advances already made for automatic ontology matching to solve the first subproblem. The merge subtask can then utilize a match mapping identifying corresponding concepts in the input ontologies that should be merged. This idea has already been applied for integrating database schemas, where several proposed approaches merge schemas based on a pre-determined match mapping [6, 28, 20, 29, 23].

Previous merge approaches commonly treat all input ontologies symmetrically and require that all information from the input ontologies should be preserved in the merged ontology, in particular all concepts and their relationships [22]. We argue that such symmetric, fully information-preserving merge solutions are not always desirable but may introduce a significant amount of semantic redundancy due to heterogeneous organizations of the same concepts.

For illustration, consider the simple scenario in Fig. 1 that we will use as a running example. The task is to merge the catalog of a new online car shop (source) into the catalog of a price comparison portal (target). We assume that a match mapping, expressed as a set of correspondences between source and target concepts, is already given, either automatically generated by a matching tool or manually designed by an expert user. In this example, the input matching contains four equivalence correspondences labeled eq1, eq2, eq3, eq4 (we initially ignore the other correspondences). A typical merge approach would combine equivalent concepts and maintain all the remaining input concepts and relationships in the merge result. We call this a symmetric, full merge approach since it preserves all input concepts and relationships.

Fig. 1. Running example.

Fig. 2. Full merge solution.

The running example shows that the two taxonomies organize the vehicles in different ways. The target initially categorizes first by manufacturer (Audi, BMW, etc.) and then by body style (sedan, wagon, etc.) while the source taxonomy uses the opposite order. Fig. 2 shows the solution that a full merge approach would produce. It preserves both views in the merged taxonomy but thereby introduces a semantic overlap (redundancy) and reduces the understandability of the resulting taxonomy. In particular, multiple inheritance has been introduced for several concepts so that there are multiple paths to several leaves. For example, the leaf concept Sedan Audi can be reached through both the concepts Sedan and Audi showing a semantic overlap between these two concepts.
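The effect described above can be reproduced on a toy input. The following is a minimal sketch of a symmetric full merge, not the paper's implementation: the concept names are hypothetical stand-ins for Fig. 1 (which is not reproduced here), and the helper names `full_merge` and `paths_to_root` are assumptions made for illustration.

```python
# Hypothetical stand-ins for Fig. 1 (the figure is not reproduced here).
# Each edge is (child, parent); "t:" marks target concepts, "s:" source concepts.
TARGET_EDGES = {("t:Audi", "t:Cars"), ("t:Audi Sedan", "t:Audi")}
SOURCE_EDGES = {("s:Sedan", "s:Automobiles"), ("s:Sedan Audi", "s:Sedan")}
# Equivalence correspondences: source concept -> target concept.
EQUIVALENCES = {"s:Automobiles": "t:Cars", "s:Sedan Audi": "t:Audi Sedan"}


def full_merge(target_edges, source_edges, equivalences):
    """Fuse equivalent concepts and keep every other concept and is-a edge."""
    def unify(concept):
        # A matched source concept is represented by its target equivalent.
        return equivalences.get(concept, concept)

    merged = set(target_edges)
    merged |= {(unify(child), unify(parent)) for child, parent in source_edges}
    return merged


def paths_to_root(concept, edges, root):
    """List every is-a path from a concept up to the root (acyclic input assumed)."""
    if concept == root:
        return [[root]]
    parents = sorted(p for c, p in edges if c == concept)
    return [[concept] + rest
            for parent in parents
            for rest in paths_to_root(parent, edges, root)]


merged = full_merge(TARGET_EDGES, SOURCE_EDGES, EQUIVALENCES)
paths = paths_to_root("t:Audi Sedan", merged, "t:Cars")
for path in paths:
    print(" -> ".join(path))
```

On this toy input the merged leaf has two parents, one inherited from each input taxonomy, which is the kind of multiple inheritance the full merge introduces.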

In our new, asymmetric merge approach we will deal with such situations by giving preference to the target taxonomy. We merge the source taxonomy into the target taxonomy and only preserve the concepts and structure of the target taxonomy but drop concepts and relationships from the source taxonomy that would introduce redundancy in the merge result. We believe that such an asymmetric merge is highly relevant in practice. It supports the incremental integration of new source ontologies into an existing target ontology, such as a data warehouse or a mediator ontology. Preserving the target ontology can greatly improve its stability and minimize the need to change applications of the integrated ontology. The asymmetric merge is also useful for applications such as web data integration or the integration of life science ontologies. As in the running example, it supports catalog integration of web shops, e.g. for adding the catalog of a new online shop into the catalog of a price comparison portal. In life sciences, there exist already large manually curated hub ontologies such as Uberon or UMLS combining diverse anatomy or other biomedical ontologies [15, 5]. Adding further ontologies to such integrated ontologies can benefit from an automatic, asymmetric merge that reduces human effort and leaves the existing target ontologies largely stable.

In particular, we make the following contributions:

* We propose a largely automatic approach for taxonomy merging called ATOM which utilizes a given match mapping between the input taxonomies.

* ATOM is an asymmetric, target-driven algorithm, i.e., it merges a source taxonomy into the target taxonomy.

* We propose to restrict the semantic overlap in the merge result for improved understandability. This is achieved by giving preference to the target taxonomy when the same concepts are differently organized in the input taxonomies and limiting the degree of multiple inheritance.

* We propose the use of extended match mappings containing equivalence, is-a and inverse is-a relationships between concepts of the input taxonomies. The additional types of correspondences are used for a better placement of source concepts and to further reduce the semantic overlap in the merge result.

* We have implemented ATOM and a full merge solution in a working prototype [26] and present an evaluation of both approaches for medium and large real-life ontologies.
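As an illustration of the extended match mappings named in the contributions, one possible in-memory representation is sketched below. The type names `Relation` and `Correspondence`, the helper `by_relation`, the concept names, and the reading of the is-a direction (source concept is a kind of target concept) are all assumptions for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    EQUIVALENCE = "eq"
    IS_A = "is-a"                  # assumed reading: source is a kind of target
    INVERSE_IS_A = "inverse is-a"  # assumed reading: target is a kind of source


@dataclass(frozen=True)
class Correspondence:
    source: str         # concept in the source taxonomy
    target: str         # concept in the target taxonomy
    relation: Relation  # semantic type of the correspondence


def by_relation(mapping):
    """Group correspondences by type so a merge step can treat each kind separately."""
    groups = {relation: [] for relation in Relation}
    for correspondence in mapping:
        groups[correspondence.relation].append(correspondence)
    return groups


# Hypothetical correspondences; the names are illustrative only.
mapping = [
    Correspondence("Automobiles", "Cars", Relation.EQUIVALENCE),
    Correspondence("Sedan Audi", "Audi", Relation.IS_A),
    Correspondence("Sedan", "Audi Sedan", Relation.INVERSE_IS_A),
]
groups = by_relation(mapping)
print({relation.value: len(items) for relation, items in groups.items()})
```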

In the next section, we introduce our ontology model and define the main requirements for taxonomy merging. In Section 3, we describe the ATOM merge algorithm in detail and discuss its complexity. Section 4 sketches the generation of mappings between the input taxonomies and the merge result that can be used for instance migration. In Section 5 we evaluate the algorithms on real-life ontologies. Related work is described in Section 6 before we conclude.

2. Models and problem definition

2.1. Preliminaries

We first define data representation models used in the paper. An ontology is a quadruple $O = (C, C_i, I, R)$ where $C$ is a collection of classes or concepts, $C_i \subseteq C$ is the subset of concepts containing instances, $I$ is the set of instances associated to classes in $C_i$, possibly empty, and $R$ is the set of relationships between concepts. Each concept $c$ has a name (or label) and a collection of attributes or properties $A_c$, possibly empty. Several kinds of relationships can be defined, like “is-a” or “subclass”, “part-of”, “type-of”, etc. A relationship $r(a, b) \in R$ is a directed, binary and semantic connection between two concepts $a$ and $b$. It can be explicitly present in the ontology or implied by an ontology rule. For example, given two is-a relationships $r(a, b)$ and $r(b, c)$, the relationship $r(a, c)$ is implied since is-a relationships are transitive.
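The quadruple and the transitivity rule can be made concrete with a small sketch. The class name `Ontology`, its field names, the encoding of relationships as `(kind, a, b)` triples, and the function `implied_is_a` are assumptions chosen for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    concepts: set           # C
    instance_concepts: set  # C_i, must be a subset of C
    instances: dict         # I, as a map: concept in C_i -> set of instances
    relationships: set      # R, as triples (kind, a, b) meaning r(a, b)

    def __post_init__(self):
        # Enforce C_i being a subset of C from the definition above.
        if not self.instance_concepts <= self.concepts:
            raise ValueError("C_i must be a subset of C")


def implied_is_a(relationships):
    """Return all is-a pairs, explicit or implied by transitivity."""
    closure = {(a, b) for kind, a, b in relationships if kind == "is-a"}
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


onto = Ontology(
    concepts={"a", "b", "c", "wheel"},
    instance_concepts={"a"},
    instances={"a": {"item-1"}},
    relationships={("is-a", "a", "b"), ("is-a", "b", "c"), ("part-of", "wheel", "a")},
)
closure = implied_is_a(onto.relationships)
print(("a", "c") in closure)
```

The fixed-point loop is quadratic per pass and is meant only to show the rule; only is-a relationships are closed here, since the text states transitivity for is-a alone.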

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2014 TargetDrivenMergingofTaxonomies	Erhard Rahm Salvatore Raunich			Target-driven Merging of Taxonomies with Atom						2014