2011 LexicalizingAnOntology

Subject Headings: Lexicalized Ontology

Quotes

Abstract

Rich lexica such as WordNet are valuable resources for information extraction from unstructured text. When extraction techniques have formal ontologies as their targets, a mapping from the lexicon to the ontology has been shown to be beneficial in sense disambiguation and usability of the extracted knowledge. Such mappings are generally established manually, which can be a costly procedure if either the lexicon or the ontology is large. This paper describes an approach to accelerate this mapping process via automation using WordNet as the lexicon and a variety of standard ontologies. Times required to create useful mappings are measured across various parameterizations.

1 Introduction

Accurate mappings from a rich lexicon such as WordNet[1] to formal ontology objects are of significant benefit to information extraction from unstructured text. Such extraction techniques may be used to grow an ontology or to align extracted knowledge to a formal context for later reasoning and analysis. The mapping of the lexicon to the ontology prior to processing text aids with sense disambiguation and accurate selection of extraction targets within the ontology. This mapping process, while only necessary once per version of the lexicon and target ontology, is a time-consuming manual job.

A particularly useful characteristic of the WordNet lexicon is its graph-like structure, with meaningful links between lexical entries. These links are not as formally defined as those in a Web Ontology Language (OWL)[2] ontology or a full first-order logic language, but they provide a useful means of finding associations between synsets.

Many ontology objects, particularly classes, are compositional in nature, combining multiple senses and terms in a single class name. Linguistic resources such as WordNet seldom do this. An example is the class ‘DryRedWine’ found in the popular Wine Ontology, which should be mapped to the WordNet lexical entries for both ‘red wine’ and ‘dry (as in liquor)’.
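The decomposition of a compositional class name can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the CamelCase convention, and the toy lexicon below are assumptions made for illustration. The class name is split into tokens, and every contiguous token span is looked up, so that 'DryRedWine' yields both 'red wine' and 'dry' as candidates.

```python
import re

# Hypothetical stand-in for lexicon entries; not real WordNet data.
TOY_LEXICON = {"red wine", "dry", "red", "wine", "pizza"}

def split_class_name(name):
    """Split a CamelCase class name into lowercase tokens (acronyms stay whole)."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [token.lower() for token in re.findall(pattern, name)]

def candidate_phrases(tokens):
    """Return every contiguous token span as a phrase, longest spans first."""
    count = len(tokens)
    phrases = []
    for length in range(count, 0, -1):
        for start in range(count - length + 1):
            phrases.append(" ".join(tokens[start:start + length]))
    return phrases

def lookup(name, lexicon):
    """Keep only the phrases that occur in the lexicon."""
    return [p for p in candidate_phrases(split_class_name(name)) if p in lexicon]

print(lookup("DryRedWine", TOY_LEXICON))
# ['red wine', 'dry', 'red', 'wine']
```

The sketch deliberately keeps the shorter matches 'red' and 'wine' alongside 'red wine'; whether such sub-phrases should be suppressed once a longer phrase matches is a design choice the text above does not settle.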

This paper describes an automated approach for generating mapping candidates between WordNet synsets and target ontology objects. We use this approach to create mappings for three ontologies, and compare the approach to a fully manual mapping process. Finally, we compare different parameterizations of the mapping process to determine optimal use of the automation.

2 Motivation

Our efforts in aligning information extraction results to formal ontologies have produced two mechanisms. The Ontology Generation and Evolution Processor (OGEP)[3] grows an ontology automatically. The Semantic Grounding Mechanism (SGM)[4] detects instances of subgraphs of an ontology within extraction results, even if the concept represented by the subgraph is not explicitly mentioned in the source text. Each use of these technologies with a different target ontology requires a new set of mappings from the WordNet lexicon to classes of the ontology. We estimate that it takes an average of nearly 2 minutes to produce a mapping for each ontology object when an expert lexicographer uses an ontology editing tool such as Protégé[5] and the online WordNet browser[1].

3 Related Work

The ontologies used for this paper include two popular ‘example’ ontologies: the Wine Ontology[6] and the Pizza Ontology[7]. The third is really a collection of related ontologies under the Basic Formal Ontology (BFO)[8] foundational upper level, including a number of the Open Biological and Biomedical Ontologies (OBO)[9]. This BFO collection is an ‘industrial strength’ ontology, built very carefully and used in numerous academic, commercial and government applications.

As referenced earlier, we use the WordNet lexical database. WordNet is a semantic lexicon for the English language. WordNet groups English terms, called ‘lemmas,’ into sets of synonyms called ‘synsets’, provides short, general definitions called ‘glosses’, and records some light semantic relations between these synsets. The purpose is twofold: to produce a combination dictionary and thesaurus that is understandable to a language user, and to support automatic text analysis and information extraction applications.
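The structure described above can be pictured with a small data-structure sketch. It is only an illustration: the `Synset` class, the identifiers, and the glosses are hypothetical placeholders, not WordNet's actual records or API. It shows how one lemma can belong to several synsets (the source of sense ambiguity) and how relations between synsets form a graph that can be followed.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str                  # hypothetical identifier
    lemmas: list                    # terms that share this sense
    gloss: str                      # short definition (placeholder text here)
    relations: dict = field(default_factory=dict)  # relation name -> synset ids

# Toy entries written for this example; not copied from WordNet.
SYNSETS = [
    Synset("wine.n", ["wine", "vino"], "placeholder: fermented grape juice"),
    Synset("red_wine.n", ["red wine"], "placeholder: wine with a red color",
           {"hypernym": ["wine.n"]}),
    Synset("dry.a.liquor", ["dry"], "placeholder: of liquor, not sweet"),
    Synset("dry.a.moisture", ["dry"], "placeholder: free from moisture"),
]

def build_lemma_index(synsets):
    """Map each lemma to the ids of every synset that contains it."""
    index = defaultdict(list)
    for synset in synsets:
        for lemma in synset.lemmas:
            index[lemma].append(synset.synset_id)
    return dict(index)

def neighbors(synset, by_id):
    """Follow every outgoing relation of a synset to its linked synsets."""
    return [by_id[target]
            for targets in synset.relations.values()
            for target in targets
            if target in by_id]

BY_ID = {s.synset_id: s for s in SYNSETS}
LEMMA_INDEX = build_lemma_index(SYNSETS)

print(LEMMA_INDEX["dry"])  # two senses share the lemma 'dry'
print([n.synset_id for n in neighbors(BY_ID["red_wine.n"], BY_ID)])
```

The lemma index makes the ambiguity explicit: a lookup of 'dry' returns more than one synset, so a mapping step has to choose among senses rather than among strings.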

4 Approach

Our automated lexicalization approach compares the terminological scopes of candidate mappings, exploiting the graph structure of both the lexicon and the target ontology. We prefer high-recall, low-precision mapping suggestions, because handling false negatives during review is much more costly than handling false positives, as our results in Section 5 show.
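One way to picture a scope comparison is sketched below. The text above does not define how a terminological scope is computed or scored, so everything here is an assumption: the scope is taken to be the set of label terms reachable within a fixed number of hops in each graph, the score is Jaccard overlap, and the low threshold stands in for the high-recall preference. All graphs, labels, and identifiers are toy data.

```python
from collections import deque

def scope(graph, labels, start, max_hops=1):
    """Collect the label terms of every node within max_hops of start (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    terms = set()
    while queue:
        node, depth = queue.popleft()
        terms.update(labels.get(node, ()))
        if depth < max_hops:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return terms

def jaccard(a, b):
    """Overlap of two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest(onto, lexicon, onto_node, candidates, threshold=0.1):
    """Score candidates by scope overlap; a low threshold favors recall."""
    target = scope(onto["graph"], onto["labels"], onto_node)
    scored = [(c, jaccard(target, scope(lexicon["graph"], lexicon["labels"], c)))
              for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])

# Toy ontology fragment: class -> parent classes, plus tokenized class names.
ONTO = {
    "graph": {"DryRedWine": ["RedWine"], "RedWine": ["Wine"]},
    "labels": {"DryRedWine": {"dry", "red", "wine"},
               "RedWine": {"red", "wine"}, "Wine": {"wine"}},
}
# Toy lexicon fragment with hypothetical synset ids and links.
LEXICON = {
    "graph": {"red_wine.n": ["wine.n"], "dry.a.liquor": ["sweet.a.liquor"],
              "dry.a.moisture": ["wet.a"]},
    "labels": {"red_wine.n": {"red", "wine"}, "wine.n": {"wine"},
               "dry.a.liquor": {"dry"}, "sweet.a.liquor": {"sweet"},
               "dry.a.moisture": {"dry"}, "wet.a": {"wet"}, "pizza.n": {"pizza"}},
}

SUGGESTIONS = suggest(ONTO, LEXICON, "DryRedWine",
                      ["red_wine.n", "dry.a.liquor", "dry.a.moisture", "pizza.n"])
for synset_id, score in SUGGESTIONS:
    print(synset_id, round(score, 2))
```

In this toy run the unrelated candidate is dropped, while both senses of 'dry' survive with the same score. The wrong sense is a false positive that a reviewer can reject quickly, which is the trade-off the high-recall preference accepts.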

References

Joshua Powers, Anthony Stirtzinger (2011). "Lexicalizing An Ontology."