2015 DebiasingCrowdsourcedBatches

Subject Headings:

Notes

Crowdsourcing is the de-facto standard for gathering annotated data. While, in theory, data annotation tasks are assumed to be attempted by workers independently, in practice, data annotation tasks are often grouped into batches to be presented and annotated by workers together, in order to save on the time or cost overhead of providing instructions or necessary background. Thus, even though independence is usually assumed between annotations on data items within the same batch, in most cases, a worker's judgment on a data item can still be affected by other data items within the batch, leading to additional errors in collected labels. In this paper, we study the data annotation bias when data items are presented as batches to be judged by workers simultaneously. We propose a novel worker model to characterize the annotating behavior on data batches, and present how to train the worker model on annotation data sets. We also present a debiasing technique to remove the effect of such annotation bias from adversely affecting the accuracy of labels obtained. Our experimental results on synthetic and real-world data sets demonstrate that our proposed method can achieve up to +57% improvement in F1-score compared to the standard majority voting baseline.
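
For orientation, the majority voting baseline and the F1-score named in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's worker model or debiasing technique: the function names, the binary label encoding (1 = positive), and the toy annotations are all hypothetical.

```python
from collections import Counter


def majority_vote(labels_by_item):
    """Pick the most frequent label for each item (hypothetical helper)."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in labels_by_item.items()
    }


def f1_score(predicted, truth, positive=1):
    """Compute F1 for the positive class from two item -> label dicts."""
    tp = sum(1 for i in truth if predicted[i] == positive and truth[i] == positive)
    fp = sum(1 for i in truth if predicted[i] == positive and truth[i] != positive)
    fn = sum(1 for i in truth if predicted[i] != positive and truth[i] == positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three workers label four items.
annotations = {"a": [1, 1, 0], "b": [0, 0, 1], "c": [1, 0, 0], "d": [1, 1, 1]}
truth = {"a": 1, "b": 0, "c": 1, "d": 1}

predicted = majority_vote(annotations)
score = f1_score(predicted, truth)
print(predicted, score)
```

Majority voting treats every annotation as independent and equally reliable, which is the assumption the abstract says batch presentation violates.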


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2015 DebiasingCrowdsourcedBatches	Aditya Parameswaran Honglei Zhuang Dan Roth Jiawei Han			Debiasing Crowdsourced Batches				10.1145/2783258.2783316		2015