In many crowdsourcing tasks, such as the acquisition of high-quality polygons from aerial imagery, ensuring quality remains a central challenge, especially when contributors prioritize speed over accuracy, as is often the case in paid crowdsourcing. Traditional quality-control measures include hidden ground-truth tests and consistency checks. However, these strategies come with trade-offs: they increase costs, slow down the acquisition process, or can be bypassed by experienced workers.
To balance these trade-offs between output quality, overall costs and acquisition time, an alternative approach integrates quality assessment directly into the annotation workflow: a pretrained convolutional neural network (CNN) evaluates each crowd acquisition as it is created and provides immediate feedback to the crowdworker. The underlying CNN architecture is deliberately lightweight to ensure both scalability and low-latency responses, as acquisitions from many crowdworkers must be processed simultaneously and in real time in large-scale acquisition campaigns.
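The text does not specify how polygon geometries are fed to the CNN. A common preprocessing step for such models is to rasterize the vertex list into a fixed-size binary mask; the sketch below illustrates this under assumed choices (a unit-square coordinate frame, a ray-casting inside test, and an illustrative grid size), not the system's actual code.

```python
# Sketch: rasterizing a polygon acquisition into a fixed-size binary mask,
# a common way to prepare vertex input for a lightweight CNN.
# Coordinate frame, grid size and the ray-casting test are assumptions.

def point_in_polygon(x, y, vertices):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # A horizontal ray from (x, y) crosses the edge iff the edge
        # spans y vertically and the crossing lies to the right of x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(vertices, grid=32):
    """Return a grid x grid binary mask; pixel centers sampled in [0, 1]^2."""
    mask = []
    for row in range(grid):
        y = (row + 0.5) / grid
        mask.append([1 if point_in_polygon((col + 0.5) / grid, y, vertices) else 0
                     for col in range(grid)])
    return mask
```

Such a mask (optionally stacked with the image patch) keeps the input shape fixed regardless of how many vertices the worker has placed so far.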
Although the feedback is computed as a continuous quality score between zero and one, it is presented through a simple traffic-light interface within the acquisition tool, as detailed numerical values are unnecessary for crowdworkers and may distract from the task. With each click performed by a crowdworker, i.e., whenever a new vertex is added to the acquisition polygon, the current geometry is transmitted to the backend server, where the CNN infers the corresponding quality score. This score is then mapped to three discrete states and sent back to the client, where the traffic light shows a red, yellow or green light, indicating low, intermediate or satisfactory quality. A short demonstration of this real-time feedback system is shown in the video below.
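The mapping from the continuous score to the three traffic-light states can be sketched as a simple threshold function. The two threshold values below are illustrative assumptions; the text only states that a score in [0, 1] is discretized into three states.

```python
# Sketch of the score-to-traffic-light mapping described above.
# The thresholds 0.4 and 0.7 are illustrative assumptions.

RED_BELOW = 0.4    # below this: low quality
GREEN_FROM = 0.7   # from this upward: satisfactory quality

def traffic_light(score: float) -> str:
    """Map a continuous CNN quality score in [0, 1] to a discrete state."""
    if score < RED_BELOW:
        return "red"     # low quality
    if score < GREEN_FROM:
        return "yellow"  # intermediate quality
    return "green"       # satisfactory quality
```

Keeping the thresholds on the server side would allow them to be tuned during a campaign without updating the client.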
The underlying CNN was pre-trained on a large set of previously collected crowd acquisitions from different datasets using weak supervision, with the quality metric derived from underlying reference data, allowing the network to learn to distinguish between high- and low-quality geometries and to generalize to some degree. Although training incurs additional costs, overall data quality typically increases notably, reducing the need for redundancy and post-processing, as the CNN improves the quality of individual acquisitions. Moreover, the approach offers a scalable and intuitive form of quality control that can be applied to large datasets. While currently applied to tree crown segmentation, it can in general be adapted to further use cases that require polygon-based acquisition, and extending it to such domains is part of future research. Current work focuses on reducing the network's dependence on reference acquisitions during training, to lower costs further and improve the trade-off between quality, costs and time.
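The text says the weak training labels are derived from reference data but does not name the metric. A plausible choice for polygon acquisitions is the intersection-over-union (IoU) with the reference polygon; the sketch below approximates it by grid sampling. Both the metric and the sampling resolution are assumptions for illustration.

```python
# Sketch: deriving a weak quality label by grid-sampled IoU between the
# crowd polygon and a reference polygon in [0, 1]^2. IoU as the metric
# and the resolution are assumptions, not the system's documented choice.

def _inside(x, y, poly):
    """Ray-casting point-in-polygon test."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def iou_label(acquired, reference, grid=64):
    """Approximate IoU of two polygons by sampling pixel centers."""
    inter = union = 0
    for r in range(grid):
        for c in range(grid):
            x, y = (c + 0.5) / grid, (r + 0.5) / grid
            a, b = _inside(x, y, acquired), _inside(x, y, reference)
            inter += a and b
            union += a or b
    return inter / union if union else 0.0
```

Training against such a continuous label, rather than manually graded examples, is what makes the supervision "weak": the labels come automatically from the reference data and are only as reliable as that data.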

David Collmar
M.Sc.
Research Associate