Proceedings of the 24th International Conference on Intelligent User Interfaces | 2019
Explainable modeling of annotations in crowdsourcing
Abstract
Aggregation models for improving the quality of annotations collected via crowdsourcing have been widely studied, but far less has been done to explain why annotators make the mistakes that they do. To this end, we propose a joint aggregation and worker clustering model that detects patterns underlying crowd worker labels in order to characterize varieties of labeling errors. We evaluate our approach on a Named Entity Recognition dataset labeled by Mechanical Turk workers in both a retrospective experiment and a small human study. The former shows that our joint model improves the quality of clusters versus aggregation followed by clustering. Results of the latter suggest that clusters aid human sense-making in interpreting worker labels and predicting worker mistakes. By enabling better explanation of annotator mistakes, our model creates a new opportunity to help Requesters improve task instructions and to help crowd annotators learn from their mistakes. Source code, data, and supplementary material are shared online.