2007 TermextractoraWebApplicationtoL

Subject Headings: TermExtractor System, Terminology Extraction System, Terminology Extraction Algorithm.

Notes

Cited By

Quotes

Abstract

We implemented a high-performing technique to automatically extract the shared terminology of a web community from the available documents in a given domain. This technique has been successfully tested and submitted for large-scale evaluation in the domain of enterprise interoperability by the members of the INTEROP network of excellence. In order to make the technique available to any web community in any domain, we developed a web application that allows users to i) acquire (incrementally or in a single step) a terminology in any domain, by submitting documents of variable length and format, and ii) validate the obtained results on-line. The system also supports collaborative evaluation by a group of experts. The web application has been widely tested in several domains by many international institutions that volunteered for this task.

1. Introduction

In (Navigli and Velardi, 2004) we presented a technique, named OntoLearn, to automatically learn a domain ontology from the documents shared by the members of a web community. This technique is based on three learning steps, each followed by manual validation: terminology extraction, glossary extraction, and finally, ontology enrichment. The OntoLearn methodology has been enhanced (Navigli and Velardi, 2005) and tested in real settings (Velardi et al. 2007). Recently, we started to develop web applications to make each step of the OntoLearn methodology freely available. This paper describes TermExtractor, a tool to extract the terminology “shared” among the members of a web community, through the analysis of the documents they exchange. Defining a domain lexicon is in fact the first step of an ontology building process.

The contributions of the paper, with respect to published work, are the following:

  • We summarize the terminology extraction algorithm, which we already described in (Navigli and Velardi, 2002), with the main intent of highlighting progress with respect to previously reported work;
  • We summarize the features and options of the TermExtractor web application;
  • We provide an evaluation of the web tool performed by a world-wide group of TermExtractor users, who volunteered to perform the task.

2. The term extraction algorithms

As in many terminology extraction systems (Wermter and Hahn, 2005) (Bourigault and Jacquemin, 1999) (Park et al., 2002), the identification of relevant terms in TermExtractor is based on two steps. First, a linguistic processor is used to parse text and extract typical terminological structures, such as compounds (enterprise model), adjective-noun (local network) and noun-preposition-noun (board of directors). Then, the (usually large) list of terminological candidates is purged according to various filters.
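The two steps can be illustrated with a minimal Python sketch. It is not the TermExtractor implementation: the tag names, the `PATTERNS` list, the function names and the hand-tagged `SAMPLE` input are all hypothetical, and a plain frequency threshold stands in for the filters, which this passage does not specify.

```python
from collections import Counter

# Hypothetical part-of-speech patterns for the three structures named in the text.
PATTERNS = [
    ("NOUN", "NOUN"),          # compound: enterprise model
    ("ADJ", "NOUN"),           # adjective-noun: local network
    ("NOUN", "PREP", "NOUN"),  # noun-preposition-noun: board of directors
]

def extract_candidates(tagged_tokens):
    """Step 1: collect every token span whose tag sequence matches a pattern."""
    candidates = []
    for start in range(len(tagged_tokens)):
        for pattern in PATTERNS:
            window = tagged_tokens[start:start + len(pattern)]
            # A window cut short by the end of the text cannot equal the pattern.
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word.lower() for word, _ in window))
    return candidates

def filter_candidates(candidates, min_count=2):
    """Step 2: purge candidates. The frequency threshold is an assumption
    standing in for the unspecified filters."""
    counts = Counter(candidates)
    return sorted(term for term, n in counts.items() if n >= min_count)

# Hand-tagged toy input; a real system would obtain the tags from a parser.
SAMPLE = [
    ("The", "DET"), ("enterprise", "NOUN"), ("model", "NOUN"), ("uses", "VERB"),
    ("a", "DET"), ("local", "ADJ"), ("network", "NOUN"), (".", "PUNCT"),
    ("The", "DET"), ("board", "NOUN"), ("of", "PREP"), ("directors", "NOUN"),
    ("approved", "VERB"), ("the", "DET"), ("enterprise", "NOUN"),
    ("model", "NOUN"), (".", "PUNCT"),
]

candidates = extract_candidates(SAMPLE)
terms = filter_candidates(candidates)
```

On this toy input the first step yields four candidates ("enterprise model" twice, "local network" and "board of directors" once each), and the threshold keeps only "enterprise model". Keeping the two steps as separate functions mirrors the description above: the pattern matcher over-generates, and the filter stage can be swapped out without touching it.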

References


Francesco Sclano, and Paola Velardi. (2007). “Termextractor: A Web Application to Learn the Shared Terminology of Emergent Web Communities.”