2016 SemEval2016Task14SemanticTaxono

Subject Headings: SemEval-2016 Task 14.

Notes

Manually constructed taxonomies provide a crucial resource for many NLP technologies, yet these resources are often limited in their lexical coverage due to their construction procedure. While multiple approaches have been proposed to enrich such taxonomies with new concepts, these techniques are typically evaluated by measuring the accuracy of identifying relationships between words, e.g., that a dog is a canine, rather than relationships between specific concepts. Task 14 provides an evaluation framework for automatic taxonomy enrichment techniques by measuring the placement of a new concept into an existing taxonomy: given a new word and its definition, systems were asked to attach or merge the concept into an existing WordNet concept. Five teams submitted 13 systems to the task, all of which were able to improve over the random baseline system. However, only one participating system outperformed the second, more competitive baseline that attaches a new term to the first word in its gloss with the appropriate part of speech, which indicates that techniques must be adapted to exploit the structure of glosses.
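The first-word baseline mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the task organizers' implementation: the lexicon `TOY_LEXICON`, its synset identifiers, the crude plural stripping in `candidates`, and the choice of the first listed sense are all hypothetical stand-ins for WordNet lookups and a real lemmatizer.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy lexicon: (lemma, part of speech) -> synset ids, first sense first.
TOY_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("science", "noun"): ["science.n.01"],
    ("earth", "noun"): ["earth.n.01"],
    ("deal", "verb"): ["deal.v.01"],
}


def candidates(token: str) -> Iterator[str]:
    """Yield the token, then a naive singular/base form (assumption: no real lemmatizer)."""
    yield token
    if token.endswith("s") and len(token) > 3:
        yield token[:-1]


def first_word_baseline(
    gloss: str, pos: str, lexicon: Dict[Tuple[str, str], List[str]]
) -> Optional[Tuple[str, str]]:
    """Attach to the first gloss word that the lexicon lists with the given part of speech."""
    for token in re.findall(r"[a-z]+", gloss.lower()):
        for lemma in candidates(token):
            synsets = lexicon.get((lemma, pos))
            if synsets:
                # Assumption: take the first listed sense of the matching word.
                return synsets[0], "ATTACH"
    return None  # no gloss word with the requested part of speech was found


result = first_word_baseline(
    "Any of several sciences that deal with the Earth", "noun", TOY_LEXICON
)
print(result)
```

With this toy lexicon, the noun geoscience would be attached under the synset found for "sciences", the first gloss word listed as a noun. A real system would replace the dictionary with WordNet lookups.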

Given the availability of large-scale dictionaries such as Wiktionary, SemEval-2016 Task 14 is designed to inspire new automated approaches for using the definitions in these resources to expand WordNet with new concepts. Accordingly, the task provides a high-quality dataset of one thousand definitions from a wide range of domains to be added to the WordNet hierarchy, either by adding them as new concepts or integrating them as new lemmas of an existing concept. The task provides a robust evaluation framework for measuring the accuracy of ontology expansion techniques. More broadly, the techniques developed as a part of Task 14 can play an important role in the construction of new automatically-built ontologies.

The goal of Task 14 is to evaluate systems that enrich semantic taxonomies with new word senses drawn from other lexicographic resources. The task provides systems with a set of word senses that are not defined in WordNet. Each word sense comprises three parts: a lemma, a part-of-speech tag, and a definition. For example, the noun geoscience is a word sense in our dataset which is associated with the definition “Any of several sciences that deal with the Earth”. The word sense is drawn from Wiktionary. For each of these word senses, a system’s task is to identify the point in WordNet’s subsumption (i.e., is-a) hierarchy that is the most plausible place for the new word sense. In other words, a system’s task is to find the WordNet synset that is most semantically similar to the given new word sense.

Operations

Once the target synset is identified, a system has to decide how to integrate the new word sense. For a given new word sense s and a target synset S we define two possible operations:

MERGE: when s refers to the same concept that is conceptualized by the synset S. As a result of this operation s is added to the set of synonymous word senses in S.
ATTACH: when s refers to a more specific concept than S. In other words, S is a generalization of the new word sense s (i.e., its hypernym). This operation creates a new synset containing the sole word sense s and attaches the new synset as a hyponym of S in WordNet’s subsumption hierarchy.

Table 1 shows example new word senses together with the target synset and the operation. Note that after either of these operations, the polysemy of the lemma of s is increased by one. Also, the total number of synsets in the enriched WordNet increases by one after an ATTACH operation, whereas it remains unchanged after MERGE, since in the latter case a new word sense is added to an existing synset. Our datasets contain noun and verb instances.
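The effect of the two operations on synset counts and polysemy can be illustrated with a minimal sketch. The classes `WordSense`, `Synset` and `Taxonomy`, the synset identifiers, and the sense "earth_science" are hypothetical and do not come from the task data. The toy taxonomy also assumes a single hypernym per synset, which is a simplification of WordNet, and the target chosen for geoscience is illustrative rather than the gold annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WordSense:
    lemma: str
    pos: str          # "noun" or "verb"
    definition: str


@dataclass
class Synset:
    synset_id: str
    senses: List[WordSense] = field(default_factory=list)
    hypernym: Optional[str] = None  # id of the parent synset (toy: at most one)


class Taxonomy:
    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}

    def add_synset(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def merge(self, s: WordSense, target_id: str) -> None:
        """MERGE: add s to the synonymous word senses of the target synset."""
        self.synsets[target_id].senses.append(s)

    def attach(self, s: WordSense, target_id: str) -> str:
        """ATTACH: create a new synset holding only s, as a hyponym of the target."""
        if target_id not in self.synsets:
            raise KeyError(target_id)
        new_id = f"{s.lemma}.{s.pos}.{len(self.synsets)}"
        self.add_synset(Synset(new_id, [s], hypernym=target_id))
        return new_id

    def polysemy(self, lemma: str) -> int:
        """Number of word senses in the taxonomy that share this lemma."""
        return sum(
            sense.lemma == lemma
            for synset in self.synsets.values()
            for sense in synset.senses
        )


tax = Taxonomy()
tax.add_synset(Synset("entity.toy"))
tax.add_synset(
    Synset("science.toy", [WordSense("science", "noun", "toy gloss")], "entity.toy")
)

geo = WordSense("geoscience", "noun", "Any of several sciences that deal with the Earth")
count_before = len(tax.synsets)
new_id = tax.attach(geo, "science.toy")   # illustrative target, not the gold answer
count_after_attach = len(tax.synsets)

# Hypothetical second sense, merged into the synset that was just created.
syn = WordSense("earth_science", "noun", "hypothetical gloss for this demo")
tax.merge(syn, new_id)
count_after_merge = len(tax.synsets)

print(count_before, count_after_attach, count_after_merge)
print(tax.polysemy("geoscience"), tax.polysemy("earth_science"))
```

In this sketch ATTACH adds one synset and MERGE adds none, while each operation raises the polysemy of the inserted lemma by one, matching the description above.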

…

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2016 SemEval2016Task14SemanticTaxono	David Jurgens Mohammad Taher Pilehvar			SemEval-2016 Task 14: Semantic Taxonomy Enrichment						2016