SensEval Benchmark Task

A SensEval Benchmark Task is a Benchmark Task that evaluates the performance of Word-Sense Disambiguation Systems.



References

2019a

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/SemEval Retrieved:2019-11-10.
    • SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

      This series of evaluations is providing a mechanism to characterize in more precise terms exactly what is necessary to compute in meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify word senses computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis).

      The purpose of the SemEval and Senseval exercises is to evaluate semantic analysis systems. "Semantic Analysis" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation. [1] The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation.

      Triggered by the conception of the *SEM conference, the SemEval community decided to hold the evaluation workshops yearly in association with the *SEM conference. It was also decided that not every evaluation task would be run every year; e.g., none of the WSD tasks were included in the SemEval-2012 workshop.

  1. Blackburn, P., and Bos, J. (2005), Representation and Inference for Natural Language: A First Course in Computational Semantics, CSLI Publications.

2019

2001

2000

  • (Senseval Archive, 2000) ⇒ SENSEVAL committee (2000). http://www.itri.brighton.ac.uk/events/senseval/ARCHIVE/index.html
  • (Kilgarriff & Rosenzweig, 2000) ⇒ Adam Kilgarriff, and Joseph Rosenzweig (2000). "Framework and Results for English SENSEVAL". In: Computers and the Humanities, 34(1-2), 15-48. DOI:10.1023/A:1002693207386.
    • QUOTE: There are now many computer programs for automatically determining which sense a word is being used in. One would like to be able to say which were better, which worse, and also which words, or varieties of language, presented particular problems to which programs. SENSEVAL is designed to meet this need. The first SENSEVAL took place in the summer of 1998, for English, French and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England, on September 2-4. For specifications of the French and Italian tasks, see the ROMANSEVAL site (http://www.lpl.univ-aix.fr/projects/romanseval).
    • QUOTE: Senseval was the first open, community-based evaluation exercise for Word Sense Disambiguation programs. It adopted the quantitative approach to evaluation developed in MUC and other ARPA evaluation exercises. It took place in 1998. In this paper we describe the structure, organisation and results of the SENSEVAL exercise for English. We present and defend various design choices for the exercise, describe the data and gold-standard preparation, consider issues of scoring strategies and baselines, and present the results for the 18 participating systems. The exercise identifies the state of the art for fine-grained word sense disambiguation, where training data is available, as 74–78% correct, with a number of algorithms approaching this level of performance. For systems that did not assume the availability of training data, performance was markedly lower and also more variable. Human inter-tagger agreement was high, with the gold-standard taggings being around 95% replicable.
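  The scores quoted above (fine-grained accuracy of 74–78% for systems with training data) rest on matching system sense decisions against gold-standard tags, with, roughly speaking, precision computed over the items a system attempts and recall over all gold-standard items. As a rough illustration only, and not the official SENSEVAL scorer (which also grants partial credit when multiple sense tags are assigned), a minimal Python sketch of such a scorer might look like this; the function name score_wsd, the instance identifiers, and the sense-id strings are all hypothetical.

    # Illustrative only: a simplified exact-match scorer in the spirit of the
    # precision/recall scores discussed above (not the official SENSEVAL scorer).
    def score_wsd(gold, answers):
        """gold and answers map instance-id -> set of sense-ids (toy format)."""
        attempted = [i for i in answers if answers[i]]
        correct = sum(1 for i in attempted if answers[i] & gold.get(i, set()))
        precision = correct / len(attempted) if attempted else 0.0
        recall = correct / len(gold) if gold else 0.0
        return precision, recall

    # Toy example: one instance answered correctly, one left unanswered.
    gold = {"art.40001": {"art%sense1"}, "art.40002": {"art%sense2"}}
    answers = {"art.40001": {"art%sense1"}}
    print(score_wsd(gold, answers))  # -> (1.0, 0.5)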

1998

  1. receiving corpus data from the organisers
  2. applying the participant’s WSD program to it
  3. returning the program's word sense decisions to the organisers for evaluation.
  This will take place over the summer, 1998, and there will be a workshop in Sussex, England, in September, by which time the performance of a number of WSD programs will have been evaluated, and where we shall discuss …
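  The three-step participation protocol excerpted above (receive corpus data, apply one's WSD program to it, return the sense decisions for evaluation) amounts to a simple batch pipeline. The sketch below is a minimal illustration of that shape only; the tab-separated file layout and the disambiguate() placeholder are assumptions made for the example and are not the actual SENSEVAL distribution or answer-file formats.

    # Illustrative sketch of the three-step protocol quoted above: read the corpus
    # data supplied by the organisers, apply a WSD program to each instance, and
    # write out the sense decisions for evaluation. The tab-separated layout and
    # the disambiguate() function are placeholders, not the real SENSEVAL formats.
    def disambiguate(target_word: str, context: str) -> str:
        # Placeholder WSD program: always emits a dummy sense identifier.
        return f"{target_word}%unknown-sense"

    def run_participation(corpus_path: str, answers_path: str) -> None:
        with open(corpus_path, encoding="utf-8") as corpus, \
             open(answers_path, "w", encoding="utf-8") as answers:
            for line in corpus:
                # Assumed toy layout: "<instance-id>\t<target-word>\t<context>"
                instance_id, target_word, context = line.rstrip("\n").split("\t", 2)
                answers.write(f"{instance_id}\t{disambiguate(target_word, context)}\n")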