2016 SemEval2016Task14SemanticTaxono
- (Jurgens & Pilehvar, 2016) ⇒ David Jurgens, and Mohammad Taher Pilehvar. (2016). “SemEval-2016 Task 14: Semantic Taxonomy Enrichment.” In: Proceedings of SemEval 2016.
Subject Headings: SemEval-2016 Task 14.
Notes
Cited By
Quotes
Abstract
Manually constructed taxonomies provide a crucial resource for many NLP technologies, yet these resources are often limited in their lexical coverage due to their construction procedure. While multiple approaches have been proposed to enrich such taxonomies with new concepts, these techniques are typically evaluated by measuring the accuracy at identifying relationships between words, e.g., that a dog is a canine, rather than relationships between specific concepts. Task 14 provides an evaluation framework for automatic taxonomy enrichment techniques by measuring the placement of a new concept into an existing taxonomy: given a new word and its definition, systems were asked to attach the concept to, or merge it with, an existing WordNet concept. Five teams submitted 13 systems to the task, all of which were able to improve over the random baseline system. However, only one participating system outperformed the second, more competitive baseline that attaches a new term to the first word in its gloss with the appropriate part of speech, which indicates that techniques must be adapted to exploit the structure of glosses.
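The second baseline mentioned in the abstract attaches a new term to the first word in its gloss that has the appropriate part of speech. The following is a minimal Python sketch of that idea under stated assumptions, not the task organizers' implementation: the toy lexicon `TOY_LEXICON` (standing in for a real part-of-speech tagger and lemmatizer), the toy sense index `TOY_SENSE_INDEX`, the synset identifiers, and the choice of the first listed sense of the matched word are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon standing in for a real POS tagger and lemmatizer:
# lower-cased surface form -> (lemma, part of speech).
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "sciences": ("science", "n"),
    "deal": ("deal", "v"),
    "earth": ("earth", "n"),
}

# Hypothetical toy sense index: (lemma, pos) -> synset ids, first-listed sense first.
TOY_SENSE_INDEX: Dict[Tuple[str, str], List[str]] = {
    ("science", "n"): ["science.n.01"],
    ("deal", "v"): ["deal.v.01"],
    ("earth", "n"): ["earth.n.01"],
}


def first_word_baseline(pos: str, gloss: str) -> Optional[Tuple[str, str]]:
    """Attach the new sense to the first gloss word whose POS matches `pos`."""
    for token in re.findall(r"[a-z]+", gloss.lower()):
        entry = TOY_LEXICON.get(token)
        if entry is None:
            continue  # word unknown to the toy lexicon (e.g., function words)
        lemma, token_pos = entry
        senses = TOY_SENSE_INDEX.get((lemma, token_pos), [])
        if token_pos == pos and senses:
            # Assumption: use the first listed sense of the matched word.
            return ("ATTACH", senses[0])
    return None  # no gloss word with the required part of speech


gloss = "Any of several sciences that deal with the Earth"
print(first_word_baseline("n", gloss))  # ('ATTACH', 'science.n.01')
print(first_word_baseline("v", gloss))  # ('ATTACH', 'deal.v.01')
```

Because the sketch only ever returns ATTACH and never MERGE, it mirrors the baseline's description; its output depends entirely on the coverage of the lexicon and sense index, which a real system would take from WordNet and a tagger.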
1 Introduction
Semantic networks and ontologies are key resources in Natural Language Processing. Of these resources, WordNet (Fellbaum, 1998), the de facto standard lexical database of English, has remained in widespread use over the past two decades, with a broad range of applications such as Word Sense Disambiguation (Navigli, 2009), query expansion and information retrieval (Varelas et al., 2005; Fang, 2008), sentiment analysis (Esuli and Sebastiani, 2006), and semantic similarity measurement (Budanitsky and Hirst, 2006a; Pilehvar et al., 2013). The performance of these WordNet-based techniques is directly affected by the lexical coverage of WordNet’s vocabulary, especially when they are applied to specialized domains or to social media text. However, manually maintaining WordNet is an expensive endeavour that requires significant effort and time. As a result, WordNet is not updated frequently and omits many lemmas and senses, such as those from domain-specific lexicons (e.g., DNA replication, regular expression, and long shot), creative slang usages (e.g., homewrecker), and those for technologies or entities that have come into existence only recently (e.g., selfie, mp3).
Hence, a variety of techniques have attempted to address the coverage limitation of WordNet, often by drawing new word senses from other domain-specific or collaboratively-constructed dictionaries and adding them to the WordNet hierarchy (Poprat et al., 2008; Snow et al., 2006; Toral et al., 2008; Yamada et al., 2011; Jurgens and Pilehvar, 2015). However, these approaches have usually been evaluated on relatively small datasets, often by testing for word-level relationships rather than precisely measuring integration accuracy at the concept level. Similarly, other techniques have been proposed for automatically discovering novel senses of words (Lau et al., 2012); however, these senses were not integrated into the taxonomy.
Given the availability of large-scale dictionaries such as Wiktionary, SemEval-2016 Task 14 is designed to inspire new automated approaches that use the definitions in these resources to expand WordNet with new concepts. Accordingly, the task provides a high-quality dataset of one thousand definitions from a wide range of domains to be added to the WordNet hierarchy, either as new concepts or as new lemmas of existing concepts. The task provides a robust evaluation framework for measuring the accuracy of ontology expansion techniques. More broadly, the techniques developed as part of Task 14 can play an important role in the construction of new automatically-built ontologies.
2 Task Description
The goal of Task 14 is to evaluate systems that enrich semantic taxonomies with new word senses drawn from other lexicographic resources. The task provides systems with a set of word senses that are not defined in WordNet.[1] Each word sense comprises three parts: a lemma, a part-of-speech tag, and a definition. For example, the noun geoscience is a word sense in our dataset that is associated with the definition “Any of several sciences that deal with the Earth”; this word sense is drawn from Wiktionary.[2] For each of these word senses, a system must identify the point in WordNet’s subsumption (i.e., is-a) hierarchy that is most plausible for placing the new word sense. In other words, a system must find the WordNet synset that is most semantically similar to the given new word sense.
Operations
Once the target synset is identified, a system has to decide how to integrate the new word sense. For a given new word sense s and a target synset S, we define two possible operations:
- MERGE: when s refers to the same concept as the one conceptualized by the synset S. As a result of this operation, s is added to the set of synonymous word senses in S.
- ATTACH: when s refers to a more specific concept than S; in other words, S is a generalization (i.e., a hypernym) of the new word sense s. This operation creates a new synset containing only the word sense s and attaches the new synset as a hyponym of S in WordNet’s subsumption hierarchy.
[1] We use WordNet 3.0.
[2] http://www.wiktionary.org
Table 1 shows example new word senses together with their target synsets and operations. Note that after either operation, the polysemy of the lemma of s increases by one. Also, the total number of synsets in the enriched WordNet increases by one after an ATTACH operation, whereas it remains unchanged after a MERGE, since in the latter case the new word sense is added to an existing synset. Our datasets contain noun and verb instances.
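The two operations and their bookkeeping consequences (polysemy always grows by one; the synset count grows only after ATTACH) can be illustrated on a toy in-memory taxonomy. This is a minimal Python sketch, not WordNet's or any library's API: the classes `WordSense`, `Synset`, and `Taxonomy`, the synset names, and the example placements (e.g., attaching geoscience under a toy `science.n.01`, merging a hypothetical "mobile" sense into a toy `cellphone.n.01`) are hypothetical and chosen only to exercise MERGE and ATTACH, not taken from Table 1 or the gold data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WordSense:
    """A new word sense: lemma, part of speech, and definition."""
    lemma: str
    pos: str
    definition: str


@dataclass
class Synset:
    """A toy synset: an id, its synonymous lemmas, and an optional hypernym."""
    name: str
    lemmas: List[str]
    hypernym: Optional["Synset"] = None


class Taxonomy:
    """Hypothetical in-memory taxonomy used only for illustration."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}

    def add(self, synset: Synset) -> Synset:
        self.synsets[synset.name] = synset
        return synset

    def polysemy(self, lemma: str) -> int:
        # Number of synsets containing the lemma (POS ignored for simplicity).
        return sum(lemma in s.lemmas for s in self.synsets.values())

    def merge(self, sense: WordSense, target: Synset) -> None:
        # MERGE: s denotes the same concept as S, so s joins S's synonyms.
        target.lemmas.append(sense.lemma)

    def attach(self, sense: WordSense, target: Synset, new_name: str) -> Synset:
        # ATTACH: s is more specific than S, so create a new synset holding
        # only s and make S its hypernym.
        return self.add(Synset(new_name, [sense.lemma], hypernym=target))


tax = Taxonomy()
science = tax.add(Synset("science.n.01", ["science"]))
phone = tax.add(Synset("cellphone.n.01", ["cellphone", "cellular_phone"]))
tax.add(Synset("mobile.n.01", ["mobile"]))  # toy sense: a hanging sculpture

# MERGE leaves the synset count unchanged; polysemy of "mobile" goes 1 -> 2.
tax.merge(WordSense("mobile", "n", "A portable telephone."), phone)
after_merge = len(tax.synsets)

# ATTACH adds one synset; polysemy of "geoscience" goes 0 -> 1.
geo = tax.attach(
    WordSense("geoscience", "n", "Any of several sciences that deal with the Earth"),
    science,
    "geoscience.n.01",
)

print(after_merge, len(tax.synsets))                       # 3 4
print(tax.polysemy("mobile"), tax.polysemy("geoscience"))  # 2 1
```

Representing hypernymy as a single parent pointer keeps the sketch short; real WordNet synsets can have multiple hypernyms, so a fuller model would store a list.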
2.1 Subtasks
…
References
2013
- (Pilehvar et al., 2013) ⇒ Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. (2013). “Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity.” In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) Volume 1: Long Papers.
Author | title | year
---|---|---
David Jurgens, Mohammad Taher Pilehvar | SemEval-2016 Task 14: Semantic Taxonomy Enrichment | 2016