2004 EnhanvingOntolKnowThrOntPopAndEnrich

From GM-RKB

Subject Headings: Ontology Learning from Text.

Notes

Cited By

Quotes

Abstract

Ontologies are widely used for capturing and organizing knowledge of a particular domain of interest. This knowledge usually evolves, and therefore an ontology maintenance process is required to keep the ontological knowledge up to date. We propose an incremental ontology maintenance methodology which exploits ontology population and enrichment methods to enhance the knowledge captured by the instances of the ontology and their various lexicalizations. Furthermore, we employ ontology learning techniques to reduce human intervention in the proposed methodology as much as possible. We conducted experiments using the CROSSMARC ontology as a case study, evaluating the methodology and its partial methods. The methodology performed well, enhancing the ontological knowledge to 96.5% from only 50%.

3.1 Incremental Ontology Population and Enrichment

The proposed incremental ontology population and enrichment methodology iterates through four stages:

  1. Ontology-based Semantic Annotation. The instances of the domain ontology are used to semantically annotate a domain-specific corpus in an automatic way. In this stage disambiguation techniques are used exploiting knowledge captured in the domain ontology.
  2. Knowledge Discovery. An information extraction module is employed in this stage to locate new ontological instances. The module is trained, using machine learning methods, on the annotated corpus of the previous stage.
  3. Knowledge Refinement. A compression-based clustering algorithm is employed in this stage for identifying lexicographic variants of each instance supporting the ontology enrichment.
  4. Validation and Insertion. A domain expert validates the candidate instances that have been added in the ontology.

Fig. 1. Overall Method for Ontology Maintenance

Figure 1 depicts the above methodology which is presented in more detail in the following subsections.
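
The four-stage iteration listed above can be sketched as a simple control loop. This is only an illustrative skeleton, not the authors' implementation: the function name `maintain_ontology`, the four callables, the `max_iterations` cap, and the stopping rule (stop when an iteration accepts no new instance) are all assumptions introduced here, and the refinement stage is reduced to a pass over a set of candidates.

```python
from typing import Callable, Iterable, Set


def maintain_ontology(
    instances: Set[str],
    corpus: Iterable[str],
    annotate: Callable,   # stage 1: (known instances, documents) -> annotated corpus
    discover: Callable,   # stage 2: annotated corpus -> set of candidate instances
    refine: Callable,     # stage 3: (candidates, known instances) -> refined candidates
    validate: Callable,   # stage 4: refined candidates -> set accepted by the expert
    max_iterations: int = 10,  # hypothetical safety cap, not from the paper
) -> Set[str]:
    """Repeat the four stages until an iteration accepts no new instance."""
    known = set(instances)
    docs = list(corpus)
    for _ in range(max_iterations):
        annotated = annotate(known, docs)
        candidates = discover(annotated) - known
        refined = refine(candidates, known)
        accepted = set(validate(refined)) - known
        if not accepted:
            break  # assumed stopping rule: nothing new was validated
        known |= accepted
    return known
```

Passing the stages in as callables keeps the loop independent of any particular annotator, extraction module, clustering algorithm, or validation interface.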

Ontology-based Semantic Annotation

The aim of this stage is to annotate a corpus with existing concept instances. This instance-based method differs from the semi-automated semantic annotation proposed in the literature, as it aims to automatically annotate a document with metadata derived explicitly from the ontology at hand. Other methods (discussed in Section 5) can be characterized as concept-based, as they aim to annotate all the potential instances in a corpus that belong to a particular concept. These methods usually exploit context-typed information using information extraction methods. Instance-based semantic annotation is faster, as it does not need to identify new instances, but like the concept-based approach it still requires disambiguation techniques. On its own, this method is sufficient when our knowledge about a domain is closed or when we are interested only in the known concept instances.

The semantic annotation of the corpus is currently performed by a string matching technique that is biased toward selecting the longest-spanning lexical expression for each instance. One problem with this method is the identification of properties whose values are a numerical datatype followed by the corresponding measurement unit, e.g. dates, age, capacity. For example, the numeric string “32” could be an instance of RAM memory or of hard disk capacity. These ambiguities are resolved by exploiting the measurement units: if “32” is followed by the string “kb” then it is an instance of RAM memory, and if it is followed by the string “GB” then it is an instance of hard disk capacity. Beyond exploiting measurement units (this knowledge is encoded in our ontology), properties are also identified by special rules that enhance the string matching techniques, again using knowledge encoded in the ontology, such as the valid range of values that a property can take. For example, RAM capacity values range over a set that is different from the one over which hard disk capacity values range. We encode such knowledge in the definition of the concept and use it to resolve the ambiguities.
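
A minimal sketch of the two ideas in this paragraph, longest-match string lookup and unit-plus-range disambiguation of numeric values, might look as follows. The lexicon entries, concept names, and valid value sets are hypothetical toy data, not taken from the CROSSMARC ontology, and tokenization is plain whitespace splitting, so punctuation handling is ignored.

```python
import re

# Hypothetical toy lexicon: known instance strings (lowercased) -> concept.
INSTANCES = {
    "intel": "Manufacturer",
    "intel pentium": "Processor",
    "intel pentium 4": "Processor",
}

# Hypothetical numeric properties: name -> (accepted units, valid values).
NUMERIC_PROPERTIES = {
    "RAM": ({"kb"}, {32, 64, 128, 256, 512}),
    "HardDisk": ({"gb"}, {20, 32, 40, 60, 80}),
}


def annotate_instances(text):
    """Greedy left-to-right lookup that prefers the longest known expression."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\S+", text)]
    max_len = max(len(key.split()) for key in INSTANCES)
    annotations, i = [], 0
    while i < len(tokens):
        # Try the longest window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tok[0] for tok in tokens[i:i + n])
            if phrase in INSTANCES:
                span = text[tokens[i][1]:tokens[i + n - 1][2]]
                annotations.append((span, INSTANCES[phrase]))
                i += n
                break
        else:
            i += 1  # no known expression starts at this token
    return annotations


def annotate_numbers(text):
    """Assign a number to a property only if its unit and value are both valid."""
    annotations = []
    for m in re.finditer(r"(\d+)\s*([A-Za-z]+)", text):
        value, unit = int(m.group(1)), m.group(2).lower()
        for prop, (units, valid_values) in NUMERIC_PROPERTIES.items():
            if unit in units and value in valid_values:
                annotations.append((m.group(0), prop))
    return annotations
```

On the text "Intel Pentium 4 with 32 kb RAM and 32 GB disk", the first function annotates "Intel Pentium 4" as a whole rather than the shorter "Intel", and the second assigns the two occurrences of "32" to different properties based on the unit that follows each one.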



2004 EnhanvingOntolKnowThrOntPopAndEnrich
Author: Alexandros G. Valarakos, Georgios Paliouras, Vangelis Karkaletsis, George Vouros
Title: Enhancing Ontological Knowledge through Ontology Population and Enrichment
Published in: Proceedings of the 14th EKAW conference
URL: http://users.iit.demokritos.gr/~alexv/publications/valarakosEKAW04.pdf
Year: 2004