Learning from Crowds with Annotation Reliability
Zhi Cao, Enhong Chen, Ye Huang, and 2 more authors
In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023
Crowdsourcing provides a practical approach for obtaining annotated data to train supervised learning models. However, because crowd annotators differ in their domains of expertise and cannot always guarantee high-quality annotations, learning from crowds generally suffers from unreliable, noisy labels, which makes it hard to achieve satisfactory performance. In this work, we investigate the reliability of annotations to improve learning from crowds. Specifically, we first project annotators and data instances into factor vectors and model the complex interaction between annotator expertise and instance difficulty to predict annotation reliability. The learned reliability can be used to evaluate the quality of crowdsourced data directly. Then, we construct a new annotation, namely the soft annotation, which serves as the gold label during training. To recognize the different strengths of annotators, we model each annotator's confusion in an end-to-end manner. Extensive experimental results on three real-world datasets demonstrate the effectiveness of our method.
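The sketch below is a minimal, hypothetical illustration (not the authors' released code) of the reliability idea described in the abstract: annotators and instances are embedded as factor vectors, their interaction is scored to predict a per-annotation reliability, and the reliability-weighted combination of crowd labels yields a soft annotation used as the training target. The class names, embedding dimension, and interaction function are assumptions made for illustration.

```python
# Hypothetical sketch of reliability-weighted soft annotation (not the paper's official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReliabilityModel(nn.Module):
    """Predicts how reliable an annotator's label on an instance is, from the
    interaction of annotator-expertise and instance-difficulty factor vectors."""
    def __init__(self, num_annotators, num_instances, dim=32):
        super().__init__()
        self.annotator_emb = nn.Embedding(num_annotators, dim)  # annotator expertise factors
        self.instance_emb = nn.Embedding(num_instances, dim)    # instance difficulty factors
        self.scorer = nn.Sequential(                            # non-linear interaction -> score
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, annotator_ids, instance_ids):
        a = self.annotator_emb(annotator_ids)
        x = self.instance_emb(instance_ids)
        score = self.scorer(torch.cat([a, x], dim=-1))
        return torch.sigmoid(score).squeeze(-1)  # reliability in (0, 1)

def soft_annotation(reliability, crowd_labels, num_classes):
    """Reliability-weighted average of one-hot crowd labels for ONE instance.
    reliability: (num_annotations,), crowd_labels: (num_annotations,) class indices."""
    one_hot = F.one_hot(crowd_labels, num_classes).float()
    weights = reliability / reliability.sum().clamp(min=1e-8)
    return (weights.unsqueeze(-1) * one_hot).sum(dim=0)  # (num_classes,) soft label

# Usage: three annotators label the same instance; their predicted reliabilities
# weight their (possibly conflicting) labels into a single soft training target.
model = ReliabilityModel(num_annotators=50, num_instances=1000)
annotators = torch.tensor([3, 17, 42])
instances = torch.tensor([7, 7, 7])
labels = torch.tensor([0, 0, 2])
rel = model(annotators, instances)                   # per-annotation reliability
soft = soft_annotation(rel, labels, num_classes=3)   # soft annotation for instance 7
```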