Incremental Object Detection in the Context of Interactive Machine Learning for Industrial Applications

Sylvia Ackermann

Duration: 6 months
Completion: December 2024
Supervisor: M.Sc. Martina Köhler, Fraunhofer-Institut für Produktionstechnik und Automatisierung (IPA)
Examiner: apl. Prof. Dr.-Ing. Norbert Haala, IfP

Motivation

In recent years, machine learning (ML) methods have increasingly found their way into industry, particularly for image processing. These techniques are employed in diverse applications, such as quality control and human safety in manufacturing. However, implementing machine learning models requires labeled training data, which calls for collaboration between a company's domain expert and a machine learning specialist: the domain expert records and labels the data, and the machine learning expert uses it to train the ML model. This process is particularly time-consuming and burdensome for small and medium-sized companies, which often contend with frequently changing product portfolios and small batch sizes. Interactive machine learning (IML) can simplify and speed up this process. With IML-based image processing solutions, companies can keep control of their data and continuously adapt and improve their ML models themselves.

Figure 1: Principle of interactive machine learning © Fraunhofer IPA.

For this purpose, the Fraunhofer Institute for Manufacturing Engineering and Automation (IPA) has developed a web app for interactive machine learning in object detection. The app enables domain experts to label new data and start new training runs independently, allowing the base ML model to be improved iteratively. This thesis aims to optimise the scenario in which new classes are introduced into the model, known as class-incremental object detection. The backend of the IML web application should be optimised so that training the model to detect objects of new classes is efficient while still producing accurate results.
Class-incremental object detection consists of two steps, depicted in figure 2. The first step is base training, using training images and annotations of the base classes. It is followed by incremental training, using annotations of the incremental classes. Afterwards, the model should be able to detect both base and incremental classes without ‘catastrophic forgetting’ of the base classes.
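To make the two-step protocol concrete, the following minimal sketch splits an ordered class list into a base phase and incremental phases, and keeps only the annotations of the classes introduced in a given phase. The names `Box`, `split_phases` and `annotations_for_phase`, as well as the placeholder class names `c0` to `c9`, are hypothetical and are not taken from the IPA web app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical annotation record: image id, class label, box as (x1, y1, x2, y2)."""
    image_id: str
    label: str
    xyxy: tuple[float, float, float, float]


def split_phases(classes: list[str], n_base: int, step_size: int) -> list[list[str]]:
    """Split an ordered class list into one base phase followed by incremental phases."""
    phases = [classes[:n_base]]
    for start in range(n_base, len(classes), step_size):
        phases.append(classes[start:start + step_size])
    return phases


def annotations_for_phase(boxes: list[Box], phase_classes: list[str]) -> list[Box]:
    """Keep only boxes of the classes introduced in this phase.

    Objects of earlier classes visible in the same images remain unlabeled,
    as in the usual class-incremental protocol.
    """
    wanted = set(phase_classes)
    return [b for b in boxes if b.label in wanted]
```

For example, `split_phases([f"c{i}" for i in range(10)], 4, 2)` yields one base phase with four classes followed by three incremental phases of two classes each.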

Figure 2: Principle of class incremental object detection [1].

Choice of Algorithm for Incremental Object Detection

This thesis investigates, tests, and evaluates algorithms for class-incremental object detection (IOD), focusing on their applicability to interactive machine learning in an industrial environment. A review of state-of-the-art incremental learning strategies led to the choice of elastic response distillation (ERD) as a suitable algorithm for the IML use case. ERD uses a response-based knowledge distillation strategy, distilling the outputs of the classification and regression heads, to prevent the incrementally trained network from deviating too far from the network obtained after base training.
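The idea of response-based distillation with an adaptive ("elastic") selection of responses can be illustrated with the following NumPy sketch. It is a simplified illustration under stated assumptions, not the ERD implementation: we assume a softmax over old-class logits, a selection threshold of mean plus `alpha` standard deviations of the teacher's per-location top scores (`alpha` is a hypothetical parameter), a KL-divergence term for classification and a plain L2 term for box regression. The original method builds on a different detector head and uses its own selection rules and loss weights.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def select_responses(teacher_cls: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Mask of locations whose top teacher score exceeds mean + alpha * std.

    The threshold adapts to the score statistics of each image instead of
    being a fixed constant (assumed simplification of elastic selection).
    """
    top = softmax(teacher_cls).max(axis=1)
    return top > top.mean() + alpha * top.std()


def distillation_loss(teacher_cls, student_cls, teacher_box, student_box, alpha=2.0):
    """Distill old-class classification and box responses at selected locations.

    teacher_cls / student_cls: (N, C_old) logits over the old classes.
    teacher_box / student_box: (N, 4) box regression outputs.
    """
    mask = select_responses(teacher_cls, alpha)
    if not mask.any():
        return 0.0
    p_t = softmax(teacher_cls[mask])
    log_p_s = np.log(softmax(student_cls[mask]) + 1e-12)
    # KL(teacher || student), averaged over selected locations.
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - log_p_s), axis=1).mean()
    # L2 distance between teacher and student box outputs.
    l2 = np.mean((teacher_box[mask] - student_box[mask]) ** 2)
    return float(kl + l2)
```

During incremental training, such a term would be added to the ordinary detection loss on the new-class annotations, with the frozen base model acting as teacher.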

Figure 3: Structure of the algorithm of Elastic Response Distillation for incremental object detection [2].

Recording an Industrial Dataset

The applicability of ERD for IML in an industrial environment is tested and evaluated with two datasets: the widely used COCO dataset and a custom industrial dataset. The latter was created specifically for this research, with careful planning of object types, quantities, and recorded scenarios. The industrial dataset contains images of tools typically found on a workbench and used in the manual assembly of workpieces. It consists of 272 images of the 10 classes shown in figure 4. In addition to images of single objects, scenes containing multiple objects of different classes were recorded. This allows evaluating the effect of co-occurrence on incremental object detection results. Co-occurrence refers to the presence of unlabeled objects of old classes in the incremental training data.
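Co-occurrence can be quantified directly from the complete annotation set. The sketch below computes the fraction of incremental training images that also contain base-class objects, which would remain unlabeled during incremental training. The function name and the `(image_id, label)` input format are hypothetical; the thesis does not state how co-occurrence was measured.

```python
from collections import defaultdict


def co_occurrence_rate(boxes, incremental_classes):
    """Fraction of incremental training images that also contain base-class objects.

    boxes: iterable of (image_id, label) pairs from the *complete* annotation set.
    An image belongs to the incremental training set if it shows at least one
    incremental-class object; any other labels in it are old classes that
    would stay unlabeled during incremental training.
    """
    labels_per_image = defaultdict(set)
    for image_id, label in boxes:
        labels_per_image[image_id].add(label)
    new = set(incremental_classes)
    inc_images = [labels for labels in labels_per_image.values() if labels & new]
    if not inc_images:
        return 0.0
    with_old = sum(1 for labels in inc_images if labels - new)
    return with_old / len(inc_images)
```

A rate of 0 corresponds to purely single-class incremental images, while higher values indicate more co-occurrence of old and new classes.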

Figure 4: Classes of the industrial dataset recorded at the Fraunhofer IPA.

Evaluation of incremental object detection scenarios

Key evaluations of incremental object detection with ERD on the industrial dataset cover training configurations with different class distributions between the base and incremental training phases, different numbers of incremental steps, and different incremental step sizes. The research addresses several questions: How does the number of incremental classes affect detection accuracy? How do multiple incremental steps affect precision? Can co-occurrence of base and incremental classes in the incremental training data improve the results? Is elastic response distillation suitable for small industrial datasets in an interactive machine learning context?
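Such an evaluation grid can be generated systematically. The following sketch enumerates schedules of the form base classes plus a number of equally sized incremental steps that together cover all classes, and formats them in the common "base+step+step" notation. The function names, the step sizes and the step limit are hypothetical; the exact configurations evaluated in the thesis are not listed here.

```python
def configurations(n_classes: int, step_sizes: list[int], max_steps: int) -> list[tuple[int, int, int]]:
    """Enumerate (n_base, step_size, n_steps) schedules covering all classes exactly.

    At least one class must remain for base training.
    """
    configs = []
    for step in step_sizes:
        for n_steps in range(1, max_steps + 1):
            n_base = n_classes - step * n_steps
            if n_base >= 1:
                configs.append((n_base, step, n_steps))
    return configs


def schedule_label(config: tuple[int, int, int]) -> str:
    """Format a schedule, e.g. (4, 2, 3) -> '4+2+2+2'."""
    n_base, step, n_steps = config
    return "+".join([str(n_base)] + [str(step)] * n_steps)
```

For ten classes, `configurations(10, [1, 2, 5], 3)` contains, among others, the single-step splits 9+1, 8+2 and 5+5 and the multi-step schedule 4+2+2+2.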

Results

Below are visualizations of example results for different configurations of incremental object detection with the ERD algorithm on the industrial dataset. They show the influence of the number of incremental classes, multiple incremental steps, and co-occurrence on detection accuracy.

Figure 5: Precision of base training as a function of the number of base classes.

Figure 6: Precision of incremental training as a function of the number of incremental classes.

Figure 7: Duration of base and incremental training as a function of the number of incremental classes.

Figure 8: Precision of incremental training per incremental step, each step introducing two new classes.

Conclusion

The results indicate that while the custom industrial dataset is viable for IOD, its small size and high inter-class similarity pose challenges. Precision is low even for joint training with all classes at once, and it decreases further with incremental training. Nevertheless, the relative comparison between the evaluated configurations yields several findings. In general, the mean average precision (mAP) after incremental training drops compared to the base training results. This drop grows with more incremental steps or a higher number of incremental classes, and it can be mitigated by co-occurrence of base-class objects in the incremental training data. Since the mAP decreases with each incremental training step, it is beneficial to perform a joint training after several incremental steps to restore precision.
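The mean average precision referred to above is the mean of the per-class average precision (AP). As an illustration, the sketch below computes AP for one class with all-point interpolation (Pascal VOC style), assuming detections have already been matched to ground-truth boxes at a fixed IoU threshold. This is an assumed variant: COCO-style mAP additionally averages over IoU thresholds from 0.5 to 0.95 with 101-point interpolation, and the summary does not state which variant the thesis reports.

```python
import numpy as np


def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection.
    is_tp: whether each detection matched a not-yet-matched ground-truth box
           (IoU matching is assumed to be done beforehand).
    n_gt:  number of ground-truth boxes of this class.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / max(n_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # Add sentinels, then make precision non-increasing from right to left.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum the rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_ap(per_class_ap: dict) -> float:
    """Mean of per-class AP values (base and incremental classes alike)."""
    return float(np.mean(list(per_class_ap.values())))
```

Computing `mean_ap` separately over base and incremental classes makes it possible to see whether a drop is caused by forgetting old classes or by weak detection of new ones.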
Analysis of training durations shows that incremental training takes slightly longer than base training with an equivalent number of classes. Consequently, the combined duration of base and incremental training exceeds that of joint training. For scenarios with significantly more incremental than base classes, joint training is therefore preferable to incremental training: it is faster and also yields higher precision.
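These findings could be turned into a simple decision rule in the IML backend. The sketch below is a hypothetical policy, not part of the thesis or the IPA web app: it chooses joint training when the new classes clearly outnumber the classes already learned, or after a fixed number of consecutive incremental steps, and incremental training otherwise. The thresholds `ratio` and `max_incremental_steps` are assumed values that would need tuning.

```python
def choose_training_mode(n_known: int, n_new: int, steps_since_joint: int,
                         ratio: float = 1.0, max_incremental_steps: int = 3) -> str:
    """Hypothetical backend policy returning 'incremental' or 'joint'.

    n_known: classes the current model already detects.
    n_new: classes introduced by the new request.
    steps_since_joint: incremental steps completed since the last joint training.
    """
    # Many new classes: joint training is faster and more precise.
    if n_new > ratio * n_known:
        return "joint"
    # Several incremental steps in a row: retrain jointly to restore precision.
    if steps_since_joint >= max_incremental_steps:
        return "joint"
    return "incremental"
```

Keeping both thresholds configurable lets the backend trade training time against precision for each deployment.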

For future work, it would be valuable to explore methods that reduce the reliance on co-occurrence for improving incremental detection precision. One promising direction is to extend the current algorithm by combining replay with knowledge distillation, as demonstrated by the EGOR [3] and IOR [4] algorithms. Reducing the dependence on co-occurrence would also make data collection more flexible, since users would no longer need to capture images in which base and incremental classes appear together.
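A basic form of replay mixes samples of old classes into each incremental training batch, so that the distillation loss is complemented by direct supervision on base classes. The sketch below uses plain exemplar replay from a stored buffer, which is a simpler technique than EGOR and IOR: those methods generate or invert old-class objects instead of storing real images, which this sketch does not attempt. The function name and the `replay_fraction` parameter are hypothetical.

```python
import random


def mixed_batch(new_samples, replay_buffer, batch_size, replay_fraction=0.25, rng=None):
    """Compose one batch from new-class samples plus replayed old-class samples.

    new_samples: incremental training samples (new-class annotations).
    replay_buffer: stored base-class samples with their annotations.
    replay_fraction: share of the batch reserved for replayed samples.
    """
    rng = rng or random.Random()
    n_replay = min(int(batch_size * replay_fraction), len(replay_buffer))
    n_new = min(batch_size - n_replay, len(new_samples))
    # Sample without replacement from each source and concatenate.
    return rng.sample(new_samples, n_new) + rng.sample(replay_buffer, n_replay)
```

Passing a seeded `random.Random` instance makes the batch composition reproducible across training runs.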

Bibliography

[1] A. G. Menezes, G. d. Moura, C. Alves, and A. C. P. L. F. d. Carvalho, “Continual object detection: A review of definitions, strategies, and challenges,” 2022. [Online]. Available: http://arxiv.org/pdf/2205.15445v1

[2] T. Feng, M. Wang, and H. Yuan, “Overcoming catastrophic forgetting in incremental object detection via elastic response distillation,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9417–9426, 2022. [Online]. Available: https://api.semanticscholar.org/CorpusID:247958078

[3] Z. An, B. Diao, L. Huang, R. Liu, Z. An, and Y. Xu, “EGOR: Efficient generated objects replay for incremental object detection,” 2024. [Online]. Available: https://arxiv.org/abs/2406.04829

[4] Z. An, B. Diao, L. Huang, R. Liu, Z. An, and Y. Xu, “IOR: Inversed objects replay for incremental object detection,” 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:270357933