2006 OntologiesAndTheSemanticWebTutorial


Subject Headings: Ontology, Ontology Design, Basic Concept Extraction from Text.

Notes

Cited By

Quotes

Abstract

  • Knowledge rich domains benefit if knowledge structures are made explicit and formal in order that they may be used by people as well as by machines. In recent years there has been intensive research towards representing and using two kinds of knowledge structures in particular. First, ontologies have been investigated as a means to formalize a conceptualization of a domain of interest, i.e. an ontology captures the terminology of a domain as it remains constant over all the different situations one may encounter for a domain. Second, the semantic web has been conceived as an idea to provide a world wide standard to represent data as well as ontologies - and to link such data and ontologies. While the standardization allows for easy exchange and reuse of encoded knowledge structures, the linkage of data and ontologies allows for occurrence of even large network effects by the community that exploits them.
  • In the tutorial we will approach the foundations for both ontologies and the semantic web and we will see some way of exploiting them in knowledge rich domains.

TOC

  • What is an ontology? (20 min)
  • What is the Semantic Web? (20 min)
  • Representation language for ontologies and the semantic web (50 min)
  • Semantic Integration (30 min)
  • Ontology learning from text (30 min)
  • Some uses of ontologies in text representation tasks (30 min)

Some Current Work on Ontology Learning from Text

  • Term Extraction
    • Statistical Analysis
    • Patterns
    • (Shallow) Linguistic Parsing
    • Term Disambiguation & Compositional Interpretation
    • Combinations
  • Taxonomy Extraction
    • Statistical Analysis & Clustering (e.g. FCA)
    • Patterns
    • (Shallow) Linguistic Parsing
    • WordNet
    • Combinations
  • Relation Extraction
    • Anonymous Relations (e.g. with Association Rules)
    • Named Relations (Linguistic Parsing)
    • (Linguistic) Compound Analysis
    • Web Mining, Social Network Analysis
    • Combinations
  • Relation Label Extraction
    • Extension of Association Rules Algorithm
  • Definition Extraction
    • (Linguistic) Compound Analysis (incl. WordNet)

Some Current Work on Ontology Learning from Text

  • AIFB – TextToOnto (Maedche and Staab, 2000; Cimiano et al., 2005)
    • Term Extraction and Taxonomy Extraction
      • Statistical Analysis
      • Conceptual Clustering (FCA), Patterns, WordNet (+ Combination)
    • Relation Extraction
      • Anonymous Relations (Association Rules)
      • Named Relations (Subcategorization Frames)
  • CNTS Univ. Antwerpen, VUB (Reinberger et al., 2004)
    • Concept Formation + Relation Extraction
      • Shallow Linguistic Parsing
      • Clustering
  • DFKI – OntoLT (Buitelaar et al., 2004), RelExt (Schutz and Buitelaar, 2005)
    • Term Extraction
      • Shallow Linguistic Parsing & Statistical Analysis
    • Taxonomy and Relation Extraction
      • Shallow Linguistic Parsing & manually defined mapping rules
      • Named Relations (Subcategorization Frames)
  • Economic Univ., Prague (Kavalec and Svatek, 2005)
    • Relation Label Extraction
      • Extension of Association Rules Algorithm
  • Free Univ. Amsterdam (Sabou, 2005)
    • Term and Taxonomy Extraction (for Web Service Ontologies)
      • Shallow Linguistic Analysis & Patterns
  • Jozef Stefan Inst., Ljubljana – OntoGen (Fortuna et al., 2005)
    • Term and Taxonomy Extraction
      • Statistical Analysis & Clustering
    • Relations
      • Web Mining, Social Network Analysis
  • Univ. Paris – ASIUM (Faure and Nedellec, 1998)
    • Taxonomy Extraction (& Subcategorization Frames)
      • Shallow Linguistic Parsing
      • Clustering
  • Univ. Rome – OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005)
    • Term Extraction and Interpretation
      • Shallow Linguistic Parsing & Term Disambiguation & Compositional Interpretation
    • Relations
      • Classification of the relation between terms in a compound into predefined set of (thematic) relations
    • Definitions
      • Rules for Gloss Generation
  • Univ. of Zürich (Rinaldi et al., 2005)
    • Term and Taxonomy Extraction
      • Shallow Linguistic Analysis & Patterns

Multilayered

  • Rules & Axioms: ∀x, y (sufferFrom (x, y) → ill(x))
  • Relations: cure(dom:DOCTOR,range:DISEASE)
  • Taxonomy: is_a(DOCTOR,PERSON)
  • Concepts: DISEASE:=<Int,Ext,Lex>
  • (Multilingual) Synonyms: {disease, illness, Krankheit}
  • Terms: disease, illness, hospital
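
The taxonomy, relation, and axiom layers listed above can be made concrete with a small sketch. It is an illustration only: the instances (`anna`, `uncle`, `influenza`), the `PATIENT` concept, the domain and range chosen for `sufferFrom`, and the helper names `subsumed_by`, `check_relation`, and `apply_axiom` are assumptions introduced here and do not come from the tutorial.

```python
# Taxonomy layer: direct is_a edges (child -> parent).
# PATIENT is a hypothetical concept added for the example.
IS_A = {"DOCTOR": "PERSON", "PATIENT": "PERSON"}

# Relation layer: relation name -> (domain concept, range concept).
# The signature of sufferFrom is an assumption.
RELATIONS = {
    "cure": ("DOCTOR", "DISEASE"),
    "sufferFrom": ("PERSON", "DISEASE"),
}

# Hypothetical instance data: instance -> concept.
INSTANCE_OF = {"anna": "DOCTOR", "uncle": "PATIENT", "influenza": "DISEASE"}


def subsumed_by(concept, ancestor):
    """Return True if concept equals ancestor or reaches it via is_a edges."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def check_relation(name, subject, obj):
    """Check a fact name(subject, obj) against the relation's domain and range."""
    domain, range_ = RELATIONS[name]
    return (subsumed_by(INSTANCE_OF[subject], domain)
            and subsumed_by(INSTANCE_OF[obj], range_))


def apply_axiom(facts):
    """Apply the axiom sufferFrom(x, y) -> ill(x) to a set of fact triples."""
    return {("ill", x) for (name, x, _y) in facts if name == "sufferFrom"}


facts = {("sufferFrom", "uncle", "influenza"), ("cure", "anna", "influenza")}
derived = apply_axiom(facts)
```

Here `check_relation("cure", "uncle", "influenza")` fails because `PATIENT` is not subsumed by `DOCTOR`, while the axiom derives `ill(uncle)` from the `sufferFrom` fact.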

Term Extraction

Determine most relevant phrases as terms

  • Linguistic Methods
    • Rules over linguistically analyzed text
      • Linguistic analysis: Part-of-Speech Tagging, Morphological Analysis, …
      • Extract patterns: Adjective-Noun, Noun-Noun, Adj-Noun-Noun, …
      • Ignore Names (DEC, HP, …), Certain Adjectives (quality, top, …), etc.
  • Statistical Methods
    • Co-occurrence (collocation) analysis for term extraction within the corpus
    • Comparison of frequencies between domain and general corpora
      • Computer Terminal will be specific to the Computer domain
      • Dining Table will be less specific to the Computer domain
  • Hybrid Methods
    • Linguistic rules to extract term candidates
    • Statistical (pre- or post-) filtering
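
A hybrid method of this kind might be sketched as follows. The sketch assumes the text is already part-of-speech tagged (the tag names `ADJ` and `NOUN`, the toy corpora, and the function names are assumptions made for illustration), and it scores candidates with a simple ratio of smoothed relative frequencies, which is only one of many possible measures.

```python
from collections import Counter


def candidates(tagged):
    """Yield two-word candidates matching Adjective-Noun or Noun-Noun."""
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in ("ADJ", "NOUN") and t2 == "NOUN":
            yield f"{w1} {w2}"


def specificity(term, domain_counts, general_counts):
    """Ratio of add-one-smoothed relative frequencies: domain vs. general."""
    d_total = sum(domain_counts.values())
    g_total = sum(general_counts.values())
    d_rel = (domain_counts[term] + 1) / (d_total + 1)
    g_rel = (general_counts[term] + 1) / (g_total + 1)
    return d_rel / g_rel


# Toy, hand-tagged corpora (hypothetical data).
domain_tagged = [
    ("the", "DET"), ("computer", "NOUN"), ("terminal", "NOUN"),
    ("shows", "VERB"), ("a", "DET"), ("computer", "NOUN"),
    ("terminal", "NOUN"), ("near", "ADP"), ("the", "DET"),
    ("dining", "NOUN"), ("table", "NOUN"),
]
general_tagged = [
    ("the", "DET"), ("dining", "NOUN"), ("table", "NOUN"),
    ("and", "CONJ"), ("a", "DET"), ("dining", "NOUN"),
    ("table", "NOUN"), ("with", "ADP"), ("a", "DET"),
    ("computer", "NOUN"), ("terminal", "NOUN"),
]

# Linguistic step: collect candidates; statistical step: score them.
domain_counts = Counter(candidates(domain_tagged))
general_counts = Counter(candidates(general_tagged))
scores = {
    term: specificity(term, domain_counts, general_counts)
    for term in domain_counts
}
```

On this toy data "computer terminal" scores above 1 and "dining table" below 1, so a threshold of 1 would keep only the former as a domain term.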

The Semiotic Triangle

  • Ogden & Richards, 1923
    • Object / Concept / Sign: Object <= experience => Concept; Concept <= perception => Sign; Object <= convention => Sign.

Concepts: Intension, Extension, Lexicon

A term may indicate a concept, if we can define its

  • Intension
    • (in)formal definition of the set of objects that this concept describes
      • a disease is an impairment of health or a condition of abnormal functioning
  • Extension
    • a set of objects (instances) that the definition of this concept describes
      • influenza, cancer, heart disease, …
      • Discussion: what is an instance? ‘heart disease’ or ‘my uncle’s heart disease’
  • Lexical Realizations
    • the term itself and its multilingual synonyms
      • disease, illness, Krankheit, maladie, …
      • Discussion: synonyms vs. instances: ‘disease’, ‘heart disease’, ‘cancer’, …
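
The triple of intension, extension, and lexical realizations can be written down as a small data structure. This is a minimal sketch: the class name `Concept`, its field names, the method `is_lexicalized_by`, and the language codes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept as a triple of intension, extension, and lexicon."""
    name: str
    intension: str                                # (in)formal definition
    extension: set = field(default_factory=set)   # instances of the concept
    lexicon: dict = field(default_factory=dict)   # language code -> terms

    def is_lexicalized_by(self, term):
        """Return True if the term is a lexical realization in any language."""
        return any(term in terms for terms in self.lexicon.values())


disease = Concept(
    name="DISEASE",
    intension="an impairment of health or a condition of abnormal functioning",
    extension={"influenza", "cancer", "heart disease"},
    lexicon={
        "en": {"disease", "illness"},
        "de": {"Krankheit"},
        "fr": {"maladie"},
    },
)
```

Keeping `extension` and `lexicon` in separate fields reflects the discussion point above: "illness" is a synonym of the concept's term, whereas "influenza" is listed as an instance and is not a lexical realization of DISEASE.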

Concepts: Intension

Extraction of a Definition for a Concept from Text

  • Informal Definition
    • e.g., a gloss for the concept as used in WordNet
    • OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005) uses natural language generation to compositionally build up a WordNet gloss for automatically extracted concepts
      • ‘Integration Strategy’ : “strategy for the integration of …”
  • Formal Definition
    • e.g., a logical form that defines all formal constraints on class membership
    • Inductive Logic Programming, Formal Concept Analysis, …
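
As an illustration of how Formal Concept Analysis yields formal definitions, the sketch below enumerates the formal concepts of a tiny object-attribute context by brute force. The objects, attributes, and function names are assumptions made for the example, and the enumeration over all attribute subsets is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
}


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(context, objs):
    """Attributes shared by every object in objs."""
    all_attrs = set().union(*context.values())
    return frozenset(
        a for a in all_attrs if all(a in context[o] for o in objs)
    )


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            e = extent(context, set(subset))
            concepts.add((e, intent(context, e)))
    return concepts


concepts = formal_concepts(CONTEXT)
```

Each resulting pair couples an extension (the extent) with a formal membership condition (the intent), and ordering the pairs by extent inclusion gives a concept hierarchy.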

Concepts: Extension

Extraction of Instances for a Concept from Text

  • Commonly referred to as Ontology Population
  • Relates to Knowledge Markup (Semantic Metadata)
  • Uses Named-Entity Recognition and Information Extraction
  • Instances can be:
    • Names for objects, e.g.
      • Person, Organization, Country, City, …
    • Event instances (with participant and property instances), e.g.
      • Football Match (with Teams, Players, Officials, ...)
      • Disease (with Patient-Name, Symptoms, Date, …)

Concepts: Lexicon

Extraction of Synonyms and Translations for a Concept from Text

  • (Multilingual) Term Extraction (see previous slides)
  • Representation of Lexical Information in Ontologies

The Mathematical Definition of an Ontology [Stumme et al.; abbrev. from Cimiano-06]

  • Structure:
    • C == (C, <C, R, <R, w)
      • C: set of concept identifiers
      • R: set of relation identifiers
      • <C: partial order on C (concept hierarchy)
      • <R: partial order on R (relation hierarchy)
  • Mathematical definition of extension of concepts [c] and relations [r]
  • L-Axiom System: Arbitrary Axioms (may include patterns)
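
The requirement that the concept hierarchy be a partial order can be checked mechanically. The sketch below is an illustration under stated assumptions: it treats the order as reflexive, derives it from direct is_a edges by closure, and uses a hypothetical `SURGEON` concept and hypothetical function names.

```python
from itertools import product


def reflexive_transitive_closure(elements, edges):
    """Smallest reflexive, transitive relation containing the given edges."""
    closure = {(e, e) for e in elements} | set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_partial_order(elements, relation):
    """Check reflexivity, antisymmetry, and transitivity of a relation."""
    reflexive = all((e, e) in relation for e in elements)
    antisymmetric = all(
        a == b or (b, a) not in relation for (a, b) in relation
    )
    transitive = all(
        (a, d) in relation
        for (a, b), (c, d) in product(relation, repeat=2)
        if b == c
    )
    return reflexive and antisymmetric and transitive


concept_ids = {"PERSON", "DOCTOR", "SURGEON"}
is_a_edges = {("DOCTOR", "PERSON"), ("SURGEON", "DOCTOR")}
hierarchy = reflexive_transitive_closure(concept_ids, is_a_edges)

# Adding an edge that closes a cycle breaks antisymmetry.
cyclic = reflexive_transitive_closure(
    concept_ids, is_a_edges | {("PERSON", "SURGEON")}
)
```

A learned taxonomy that contains a cycle, as in `cyclic`, fails the check and would have to be repaired before it can serve as a concept hierarchy.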

Context Features

  • Four-grams [Schuetze 93]
  • Word-windows [Grefenstette 92]
  • Predicate-Argument relations (every man loves a woman); Modifier Relations (fast car, the hood of the car)
    • [Grefenstette 92, Cimiano 04b, Gasperin et al. 03]
  • Appositions (Ferrari, the fastest car in the world)
    • [Hahn & Schnattinger 98, Caraballo 99]
  • Coordination (ladies and gentlemen)
    • [Caraballo 99, Dorow and Widdows 03]
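
The simplest of these features, the word window, can be sketched as below. The example is a minimal illustration and not a reimplementation of any cited system: the window size, the toy sentence, and the use of cosine similarity to compare context vectors are assumptions.

```python
import math
from collections import Counter, defaultdict


def window_contexts(tokens, size=2):
    """Map each word to a Counter of words seen within `size` positions."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - size), min(len(tokens), i + size + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


tokens = (
    "the fast car stopped the fast bike stopped the old house burned"
).split()
ctx = window_contexts(tokens, size=2)
sim_car_bike = cosine(ctx["car"], ctx["bike"])
sim_car_house = cosine(ctx["car"], ctx["house"])
```

Words that occur in similar windows ("car" and "bike" here) receive similar vectors, which is what clustering-based taxonomy extraction relies on. The other features in the list would replace the window with syntactic dependencies and therefore need a parser.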

Evaluation - Data Sets

  • Tourism (118 million tokens):
    • http://www.all-in-all.de/english
    • http://www.lonelyplanet.com
    • British National Corpus (BNC)
    • handcrafted tourism ontology (289 concepts)
  • Finance (185 million tokens):
    • Reuters news from 1987
    • GETESS finance ontology (1178 concepts)

Using Ontologies

Ontologies as:

References


  • Steffen Staab. (2006). “Ontologies and the Semantic Web.” http://www.uni-koblenz.de/~staab/Teaching/Tutorials/SMBM-2006/