2015 DebiasingCrowdsourcedBatches

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Crowdsourcing is the de facto standard for gathering annotated data. While, in theory, data annotation tasks are assumed to be attempted by workers independently, in practice, data annotation tasks are often grouped into batches to be presented and annotated by workers together, in order to save on the time or cost overhead of providing instructions or necessary background. Thus, even though independence is usually assumed between annotations on data items within the same batch, in most cases, a worker's judgment on a data item can still be affected by other data items within the batch, leading to additional errors in collected labels. In this paper, we study the data annotation bias when data items are presented as batches to be judged by workers simultaneously. We propose a novel worker model to characterize the annotating behavior on data batches, and present how to train the worker model on annotation data sets. We also present a debiasing technique to remove the effect of such annotation bias from adversely affecting the accuracy of labels obtained. Our experimental results on synthetic and real-world data sets demonstrate that our proposed method can achieve up to +57% improvement in F1-score compared to the standard majority voting baseline.
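The abstract compares the proposed debiasing method against a standard majority voting baseline. As a point of reference, the sketch below (plain Python, identifiers hypothetical) shows what that baseline does: each item receives the most frequent label among its annotations, with no modeling of worker behavior or batch effects, which is exactly the independence assumption the paper argues batching violates.

    from collections import Counter, defaultdict

    def majority_vote(annotations):
        """Aggregate crowdsourced labels by simple majority voting.

        annotations: iterable of (item_id, worker_id, label) triples.
        Returns a dict mapping each item_id to its most frequent label.
        """
        votes = defaultdict(Counter)
        for item_id, _worker_id, label in annotations:
            votes[item_id][label] += 1
        return {item_id: counts.most_common(1)[0][0]
                for item_id, counts in votes.items()}

    # Toy usage: three workers label two items; item "b" is contested.
    annotations = [
        ("a", "w1", 1), ("a", "w2", 1), ("a", "w3", 0),
        ("b", "w1", 0), ("b", "w2", 1), ("b", "w3", 0),
    ]
    print(majority_vote(annotations))  # {'a': 1, 'b': 0}

The paper's contribution is to replace this per-item, independence-based aggregation with a worker model that captures how judgments within the same batch influence one another, and then to debias the collected labels accordingly.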

References


Author(s): Aditya Parameswaran, Honglei Zhuang, Dan Roth, Jiawei Han
Title: Debiasing Crowdsourced Batches
DOI: 10.1145/2783258.2783316
Year: 2015