Human Inter-Annotator Agreement (IAA) Measure


A Human Inter-Annotator Agreement (IAA) Measure is an agreement measure for a multi-classifier classification task performed by two or more human annotators or raters.



References

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Cohen's_kappa#Definition Retrieved:2021-8-1.
    • Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The definition of [math]\displaystyle{ \kappa }[/math] is:

      [math]\displaystyle{ \kappa \equiv \dfrac{p_o - p_e}{1 - p_e} = 1- \dfrac{1 - p_o}{1 - p_e}, }[/math]

      where [math]\displaystyle{ p_o }[/math] is the relative observed agreement among raters, and [math]\displaystyle{ p_e }[/math] is the hypothetical probability of chance agreement, using the observed data to calculate the probabilities of each observer randomly seeing each category. If the raters are in complete agreement then [math]\displaystyle{ \kappa=1 }[/math]. If there is no agreement among the raters other than what would be expected by chance (as given by [math]\displaystyle{ p_e }[/math]), [math]\displaystyle{ \kappa=0 }[/math]. It is possible for the statistic to be negative, which implies that there is no effective agreement between the two raters or the agreement is worse than random.

      For [math]\displaystyle{ k }[/math] categories, [math]\displaystyle{ N }[/math] observations to categorize and [math]\displaystyle{ n_{ki} }[/math] the number of times rater [math]\displaystyle{ i }[/math] predicted category [math]\displaystyle{ k }[/math]:

      [math]\displaystyle{ p_e = \dfrac{1}{N^2} \sum_k n_{k1}n_{k2} }[/math]

      This is derived from the following construction:

      [math]\displaystyle{ p_e = \sum_k \widehat{p_{k12}} = \sum_k \widehat{p_{k1}}\widehat{p_{k2}} = \sum_k \dfrac{n_{k1}}{N}\dfrac{n_{k2}}{N} = \dfrac{1}{N^2} \sum_k n_{k1}n_{k2} }[/math]

      Where [math]\displaystyle{ \widehat{p_{k12}} }[/math] is the estimated probability that both rater 1 and rater 2 will classify the same item as k, while [math]\displaystyle{ \widehat{p_{k1}} }[/math] is the estimated probability that rater 1 will classify an item as k (and similarly for rater 2).

      The relation [math]\displaystyle{ \widehat{p_{k12}} = \widehat{p_{k1}}\widehat{p_{k2}} }[/math] is based on the assumption that the ratings of the two raters are independent. The term [math]\displaystyle{ \widehat{p_{k1}} }[/math] is estimated by using the number of items classified as k by rater 1 ([math]\displaystyle{ n_{k1} }[/math] ) divided by the total items to classify ([math]\displaystyle{ N }[/math] ): [math]\displaystyle{ \widehat{p_{k1}}= \dfrac{n_{k1}}{N} }[/math] (and similarly for rater 2).
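
The following is a minimal Python sketch of the definition above (not part of the quoted Wikipedia text): it computes [math]\displaystyle{ p_o }[/math] and [math]\displaystyle{ p_e }[/math] directly from two annotation lists and cross-checks the result against scikit-learn's cohen_kappa_score. The rater lists and variable names are hypothetical examples.

  from collections import Counter
  from sklearn.metrics import cohen_kappa_score  # for cross-checking only

  def cohens_kappa(rater1, rater2):
      """Cohen's kappa for two raters who each classify the same N items."""
      assert len(rater1) == len(rater2)
      n = len(rater1)
      # p_o: relative observed agreement among the raters.
      p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
      # p_e = (1 / N^2) * sum_k n_k1 * n_k2: chance agreement estimated from
      # each rater's observed per-category counts.
      counts1, counts2 = Counter(rater1), Counter(rater2)
      p_e = sum(counts1[k] * counts2[k] for k in counts1) / n ** 2
      return (p_o - p_e) / (1 - p_e)

  # Hypothetical annotations of 10 items into two categories.
  rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
  rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "no"]

  print(cohens_kappa(rater1, rater2))        # 0.4 (p_o = 0.7, p_e = 0.5)
  print(cohen_kappa_score(rater1, rater2))   # should match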

2020

2017

$K =\dfrac{Pr(a) - Pr(e)}{1- Pr(e)}$

2014

  • https://corpuslinguisticmethods.wordpress.com/2014/01/15/what-is-inter-annotator-agreement/
    • QUOTE: … There are basically two ways of calculating inter-annotator agreement. The first approach is nothing more than a percentage of overlapping choices between the annotators. This approach is somewhat biased, because it might be sheer luck that there is a high overlap. Indeed, this might be the case if there are only a very limited number of category levels (only yes versus no, or so), so the chance of having the same annotation is a priori already 1 out of 2. Also, it might be possible that the majority of observations belong to one of the levels of the category, so that the a priori overlap is already potentially high.

      Therefore, an inter-annotator measure has been devised that takes such a priori overlaps into account. That measure is known as Cohen's Kappa. To calculate inter-annotator agreement with Cohen's Kappa, we need an additional package for R, called “irr”. Install it as follows:
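
The R installation and usage snippet from the quoted post is not reproduced above. As an illustrative stand-in (not the post's code), the following minimal Python sketch makes the same point using scikit-learn's cohen_kappa_score instead of the R "irr" package: raw percentage agreement can look high while the chance-corrected kappa stays near zero when one category dominates. The annotator lists are hypothetical.

  # Hypothetical, skewed binary annotation task: 95 of 100 items are "no".
  from sklearn.metrics import cohen_kappa_score

  rater1 = ["no"] * 90 + ["yes"] * 5 + ["no"] * 5
  rater2 = ["no"] * 90 + ["no"] * 5 + ["yes"] * 5

  # Raw percentage agreement: 90/100 = 0.90, which looks high.
  percent_agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
  print(percent_agreement)                  # 0.9

  # Chance-corrected agreement: roughly -0.05, i.e. no better than chance.
  print(cohen_kappa_score(rater1, rater2))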

2012a

2012b

2008

2006

1960

The test of agreement comes then with regard to the $1 - p_e$ of the units for which the hypothesis of no association would predict disagreement between the judges. This term will serve as the denominator.

To the extent to which nonchance factors are operating in the direction of agreement, $p_o$ will exceed $p_e$; their difference, $p_o - p_e$, represents the proportion of the cases in which beyond-chance agreement occurred and is the numerator of the coefficient.

The coefficient $\kappa$ is simply the proportion of chance-expected disagreements which do not occur, or alternatively, it is the proportion of agreement after chance agreement is removed from consideration:

$\kappa=\dfrac{p_o-p_e}{1-p_e} \qquad (1)$
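
As a worked numerical illustration of equation (1) (the numbers are hypothetical, not from Cohen's paper): if two judges agree on 70% of the units ($p_o = 0.70$) while chance alone would predict 50% agreement ($p_e = 0.50$), then

$\kappa = \dfrac{0.70 - 0.50}{1 - 0.50} = 0.40,$

i.e., 40% of the chance-expected disagreements did not occur.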