2015 NASARIANovelApproachtoaSemantic


Subject Headings: NASARI System; Word Embedding System; Semantic Similarity System; Word Sense Disambiguation System.

Notes

Cited By

Quotes

Abstract

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/.
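A minimal sketch of how word similarity could be computed with sense-level vectors such as the released NASARI ones: each word is mapped to the vectors of its candidate senses, and the similarity of two words is taken as the score of their closest pair of senses. The `senses_of` mapping, the vector format, and the function names below are illustrative assumptions, not the released data format or the authors' code.

<pre>
import numpy as np

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def word_similarity(word1, word2, senses_of, vectors):
    """Similarity of two words as the best score over their sense pairs.

    senses_of: dict mapping a word to the ids of its candidate senses
    vectors:   dict mapping a sense id to its NumPy vector
    """
    scores = [
        cosine(vectors[s1], vectors[s2])
        for s1 in senses_of.get(word1, [])
        for s2 in senses_of.get(word2, [])
        if s1 in vectors and s2 in vectors
    ]
    # If either word has no covered sense, fall back to zero similarity.
    return max(scores, default=0.0)
</pre>

Taking the maximum over sense pairs means the comparison implicitly selects the two senses under which the words are most related, which is why sense-level representations can be evaluated directly on word-level similarity datasets.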

1. Introduction

...

In this paper we put forward a novel concept representation technique, called NASARI, which exploits the knowledge available in both types of resource in order to obtain effective representations of arbitrary concepts. The contributions of this paper are threefold. First, we propose a novel technique for rich semantic representation of arbitrary WordNet synsets or Wikipedia pages. Second, we provide improvements over the conventional tf-idf weighting scheme by applying lexical specificity (Lafon, 1980), a statistical measure mainly used for term extraction, to the task of computing vector weights in a vector representation. Third, we propose a semantically-aware dimensionality reduction technique that transforms a lexical item's representation from a semantic space of words to one of WordNet synsets, simultaneously providing an implicit disambiguation and a distribution smoothing. We demonstrate that our representation achieves state-of-the-art performance on two different tasks: (1) word similarity on multiple standard datasets: MC-30, RG-65, and the WordSim-353 similarity subset, and (2) Wikipedia sense clustering, in which our unsupervised system surpasses the performance of a state-of-the-art supervised technique that exploits knowledge available in Wikipedia in several languages.
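A minimal sketch of the lexical specificity weighting (Lafon, 1980) that the paper uses in place of tf-idf: a word's weight for a sub-corpus is how unlikely its observed frequency there would be under a hypergeometric model of chance occurrence. The function name and the toy statistics below are illustrative assumptions, not the authors' implementation.

<pre>
import math
from scipy.stats import hypergeom

def lexical_specificity(f, F, t, T):
    """Lexical specificity of a word for a sub-corpus.

    f: occurrences of the word in the sub-corpus
    F: occurrences of the word in the whole reference corpus
    t: size (in tokens) of the sub-corpus
    T: size (in tokens) of the reference corpus

    Returns -log10 P(X >= f), where X follows a hypergeometric distribution:
    the less likely it is to see f or more occurrences by chance, the higher
    the weight assigned to that word's dimension.
    """
    # hypergeom.sf(f - 1, T, F, t) = P(X >= f) when drawing t tokens out of T,
    # of which F are occurrences of the word.
    p_at_least_f = hypergeom.sf(f - 1, T, F, t)
    return -math.log10(max(p_at_least_f, 1e-300))  # guard against log(0)

# Toy usage (made-up counts): a word seen 40 times in a 10,000-token sub-corpus
# but only 200 times in a 10,000,000-token corpus gets a high weight.
print(lexical_specificity(f=40, F=200, t=10_000, T=10_000_000))
</pre>

In this reading, the vector associated with a synset or Wikipedia page would be weighted by treating the contextual text gathered for that concept as the sub-corpus and the full resource as the reference corpus; the exact construction of that sub-corpus is described in the paper itself.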

2. Semantic Representation of Concepts

3. NASARI for Semantic Similarity

4. Experiments

5. Related Work

6. Conclusions

Acknowledgments

Footnotes


References

BibTeX

@inproceedings{2015_NASARIANovelApproachtoaSemantic,
  author    = {Jose Camacho-Collados and
               Mohammad Taher Pilehvar and
               Roberto Navigli},
  editor    = {Rada Mihalcea and
               Joyce Yue Chai and
               Anoop Sarkar},
  title     = {NASARI: a Novel Approach to a Semantically-Aware Representation
               of Items},
  booktitle = {Proceedings of the 2015 Conference of the North American Chapter
               of the Association for Computational Linguistics: Human Language
               Technologies (NAACL-HLT 2015)},
  pages     = {567--577},
  publisher = {The Association for Computational Linguistics},
  year      = {2015},
  url       = {https://doi.org/10.3115/v1/n15-1059},
  doi       = {10.3115/v1/n15-1059},
}


Author: Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli
Title: NASARI: A Novel Approach to a Semantically-Aware Representation of Items
Year: 2015