Instance-based mapping of dynamic environments with a focus on construction site monitoring

Master Thesis at ifp - Vincent Tim Kassulat

Tim Kassulat

Duration: 6 months
Completion: September 2025
Supervisor: M.Sc. Vincent Reß
Examiner: apl. Prof. Dr.-Ing. Norbert Haala

Introduction

Construction sites are highly dynamic environments subject to both long-term and short-term changes. For monitoring, this poses a difficult problem: moving objects such as vehicles, people, construction materials, and machines are usually not part of the relevant structure, yet they cause far-reaching restrictions such as occlusions, reduced visibility, and limited accessibility during data acquisition. In the digital model, these factors show up as additional points on irrelevant, moving structures and as missing points on the relevant building structures.

By comparing the current point cloud to one from a previous epoch, it can be determined which parts of the scene are unchanged and which have moved. To do this, both point clouds must be segmented into instances, and each instance must be given a distinctive description of its characteristics so that instances can be matched across epochs.

The goal is to find unambiguous cross-epoch correspondences between the two point clouds and to determine for each instance whether it has changed or remained in place (as shown in Figure 1).

Figure 1: Dynamics classification of an active construction site after comparison with the previous epoch; red = static, green = low dynamics, blue = high dynamics. (Image created using ChatGPT)
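
As an illustration of this matching step, the following minimal sketch finds one-to-one correspondences between per-instance feature vectors (e.g. PointNet++ global features, assumed precomputed) via an optimal assignment on a cosine-distance matrix. The Hungarian assignment and the rejection threshold max_dist are illustrative assumptions, not necessarily the exact criterion used in the thesis.

    # Sketch: cross-epoch instance matching on per-instance descriptors.
    # Metric and threshold are illustrative assumptions.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_instances(feats_a, feats_b, max_dist=0.3):
        # Normalize descriptors and build a cosine-distance matrix.
        a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
        b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
        dist = 1.0 - a @ b.T
        # Optimal one-to-one assignment (Hungarian algorithm).
        rows, cols = linear_sum_assignment(dist)
        # Reject pairs whose descriptors differ too much; such
        # instances are reported as "no correspondence found".
        return [(i, j) for i, j in zip(rows, cols) if dist[i, j] <= max_dist]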

Methods

Figure 2 shows the schematic pipeline used to process both point clouds.

Figure 2: Processing Pipeline

Various neural networks are used in the individual sub-steps:

  • GeDi [1]: provides learned local 3D descriptors for RANSAC-based feature registration, which also works well in dynamic environments; it builds in part on a PointNet architecture (see the registration sketch after this list).
  • Mask3D [2]: a neural network for 3D instance segmentation of point clouds that automatically detects and separates the objects in a scene. It is characterized by high accuracy across many different datasets.
  • PointNet++ [3]: uses multi-layer perceptrons and pooling layers to aggregate a global feature vector from a point cloud, which serves as the descriptor for matching instances across epochs.
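
To make the registration step concrete, the following is a minimal sketch of RANSAC-based registration on precomputed feature descriptors, using Open3D's feature-matching RANSAC. It assumes the local descriptors (e.g. from GeDi) have already been computed for both clouds; the voxel size, distance thresholds, and RANSAC parameters are illustrative assumptions, not the settings used in the thesis.

    # Sketch: RANSAC registration on precomputed local descriptors
    # (e.g. GeDi). All parameters are illustrative assumptions.
    import numpy as np
    import open3d as o3d

    def ransac_register(pcd_src, pcd_dst, desc_src, desc_dst, voxel=0.05):
        # Open3D stores features column-wise in a Feature object.
        f_src = o3d.pipelines.registration.Feature()
        f_src.data = np.asarray(desc_src, dtype=np.float64).T
        f_dst = o3d.pipelines.registration.Feature()
        f_dst.data = np.asarray(desc_dst, dtype=np.float64).T
        result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
            pcd_src, pcd_dst, f_src, f_dst,
            mutual_filter=True,
            max_correspondence_distance=1.5 * voxel,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
            ransac_n=3,
            checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(1.5 * voxel)],
            criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999),
        )
        return result.transformation  # 4x4 transform mapping src onto dst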

Test scenario

The method is tested in a small indoor case study in which the changes are known and can therefore serve as ground truth. Ground truth in the form of a reference point cloud is not available in this case, so a visual assessment is used instead.

Figure 3: Test scene for the first (left) and second (right) epoch; all objects that were moved between the recordings are outlined in red in both epochs.

This environment was captured with a multisensor system consisting of a camera and a laser scanner with an integrated IMU, which builds its map using FAST-LIVO2 [4].

Results

The results of the second epoch are considered here as an example. Figures 4a-d show the Mask3D segmentation and the comparison based on it, as well as a comparison based on manually optimized segments.

Figure 4a (left): Original Mask3D segmentation of the second recording epoch; Figure 4b (right): Manually corrected segmentation from 4a
Figure 4c (left): Comparison with the instances of the first epoch, green = detected as static, red = detected as dynamic, gray = no correspondence found; Figure 4d (right): Comparison with the corrected instances, green = detected as static, red = detected as dynamic, gray = no correspondence found

While Figure 4a shows a reasonably meaningful segmentation, the segmentation of the first epoch was highly inaccurate, partly due to the poorer quality of that recording. Problems arise during comparison when one of the two point clouds is poorly segmented: the objects, and hence their feature vectors, differ too greatly, so that often no correspondence or an incorrect correspondence is found. Further problems arise for instances with similar feature vectors, as these can no longer be assigned unambiguously. With the corrected segmentation (4b), almost every static instance could be assigned correctly, allowing the dynamic ones to be separated. As a result, even under simulated occlusion in one of the two epochs, the information from the other epoch could be used to supplement the instance.
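
The static/dynamic decision for matched instances can be reduced to a simple geometric test, sketched below: after applying the registration transform, an instance whose matched counterpart stays within a small displacement is labeled static, otherwise dynamic; instances without a correspondence remain unassigned. The centroid test and the threshold tau are illustrative assumptions.

    # Sketch: static/dynamic decision per matched instance pair.
    # Centroid displacement test and threshold are assumptions.
    import numpy as np

    def classify_dynamics(pairs, clouds_a, clouds_b, T_ab, tau=0.10):
        labels = {}
        for i, j in pairs:
            # Transform the epoch-A instance centroid into the epoch-B frame.
            c_a = np.append(clouds_a[i].mean(axis=0), 1.0)
            c_a = (T_ab @ c_a)[:3]
            c_b = clouds_b[j].mean(axis=0)
            labels[(i, j)] = "static" if np.linalg.norm(c_a - c_b) <= tau else "dynamic"
        return labels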

Figure 5: Simulated occluded areas (top) can be supplemented using correspondences (bottom)
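
The completion shown in Figure 5 can likewise be sketched in a few lines: for a statically matched but partly occluded instance, the points observed in the other epoch are transformed into the current frame and merged. This is a minimal illustration; the thesis does not prescribe a particular merging scheme.

    # Sketch: supplement an occluded instance with the points of its
    # cross-epoch correspondence (minimal illustration).
    import numpy as np

    def supplement_instance(points_current, points_other, T_other_to_current):
        # Transform the corresponding points into the current frame ...
        pts_h = np.hstack([points_other, np.ones((len(points_other), 1))])
        transferred = (T_other_to_current @ pts_h.T).T[:, :3]
        # ... and merge them with the partly occluded instance.
        return np.vstack([points_current, transferred])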

Conclusion

The work has shown that the presented concept is fundamentally suitable for detecting movements in 3D scenes across multiple recordings, recognizing objects, and reducing occlusions using existing information. While registration and geometric comparison worked reliably, segmentation remained the dominant source of error in the overall process. Since all further steps build on this result, segmentation significantly limits practical applicability. Under ideal conditions, i.e. with optimal segmentation, the method proved capable of distinguishing objects and capturing dynamic changes. The work thus provides a conceptual basis on which future developments can build.

References (selection)

[1] F. Poiesi and D. Boscaini, "Learning General and Distinctive 3D Local Deep Descriptors for Point Cloud Registration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3979-3985, March 2023. Available online: https://ieeexplore.ieee.org/abstract/document/9775606

[2] J. Schult, F. Engelmann, A. Hermans, O. Litany, S. Tang, and B. Leibe, "Mask3D: Mask Transformer for 3D Semantic Instance Segmentation," 2022. Available online: https://arxiv.org/abs/2210.03105

[3] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space," 2017. Available online: https://arxiv.org/pdf/1706.02413

[4] C. Zheng et al., "FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry," IEEE Transactions on Robotics, vol. 41, pp. 326-346, 2025. Available online: https://ieeexplore.ieee.org/abstract/document/10757429

Contact

Norbert Haala

apl. Prof. Dr.-Ing.

Deputy Head of Institute
