Implicit Joint Semantic Segmentation of Images and Point Cloud

Master Thesis at ifp - Fangwen Shu

Duration of the Thesis: 6 months
Completion: April 2019
Supervisor: M.Sc. Dominik Laupheimer
Examiner: Prof. Dr.-Ing. Norbert Haala


 

Introduction

In order to avoid feeding noisy or non-uniformly sampled point clouds into a 3D CNN, this work proposes a novel fusion of the labeled LiDAR point cloud and oriented aerial imagery in 2D image space. This allows us to leverage image-based semantic segmentation and to build a multi-view, multi-modal and multi-scale segmentation classifier. A fast back-projection of the 2D semantic results onto the 3D point cloud then yields a joint semantic segmentation of imagery and point cloud. The proposed method is validated on our own dataset: oriented high-resolution oblique and nadir aerial imagery of the village Hessigheim, Germany, captured by an unmanned aerial vehicle (UAV), together with a LiDAR point cloud acquired by airborne laser scanning (ALS). The high-resolution aerial images offer views of a diverse urban scene; combined with geometric features derived from the point cloud, they form a promising basis for a large dataset for training a well-engineered deep CNN.

Dataset

In this work, a large-scale ALS point cloud covering the village Hessigheim, Germany, is used, together with nadir and oblique imagery captured by a UAV (Cramer et al., 2018). It is worth noting that the LiDAR point cloud was already labeled manually (master thesis of Michael Kölle), while the aerial imagery was initially unlabeled.

Figure 1: The colored ground-truth label image generated by projecting the labeled point cloud, together with the list of classes. In this work, a 12th class is added to mark void pixels onto which no 3D points are projected (as no ground-truth point cloud is available there).
Figure 2: The LiDAR point cloud colored by label value, visualized in 3D with the software CloudCompare, and an overview of the point cloud block, which is divided into a training set (yellow), a validation set (blue) and a test set (green). A different color scale is used here to distinguish the 3D point cloud from the 2D imagery.

Methodology to Establish Connection Between 3D Point Cloud and 2D Imagery

Figure 3: Flow chart of the implemented pipeline.
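The link between the 3D point cloud and the 2D imagery is established by projecting every labeled LiDAR point into each oriented image, which yields both the ground-truth label images (Figure 1) and the projected LiDAR feature images used later as network input. The following is a minimal sketch of such a projection with simple per-pixel z-buffering; the function name, the interface (K, R, t) and the void value 255 are illustrative assumptions, not the exact implementation of the thesis.

```python
import numpy as np

def project_labels_to_image(points, labels, K, R, t, width, height, void=255):
    """Project labeled 3D points into one oriented image (pinhole model).

    points : (N, 3) world coordinates of the LiDAR points
    labels : (N,)   semantic label per point
    K      : (3, 3) camera intrinsics
    R, t   : exterior orientation mapping world to camera coordinates
    Returns a (height, width) uint8 label image; 'void' marks empty pixels.
    """
    # Transform points into the camera frame and keep those in front of the camera
    cam = (R @ points.T).T + t
    keep = cam[:, 2] > 0
    cam, lab = cam[keep], labels[keep]

    # Perspective projection to pixel coordinates
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, lab = u[inside], v[inside], cam[inside, 2], lab[inside]

    # Simple z-buffering: write points from far to near so the closest point wins
    order = np.argsort(z)[::-1]
    label_image = np.full((height, width), void, dtype=np.uint8)
    label_image[v[order], u[order]] = lab[order]
    return label_image
```

Note that this single-pixel z-buffering of a sparse point cloud leaves gaps and does not fully resolve occlusions; a practical pipeline would likely add a splat radius per point or a mesh-based visibility check.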

Training

Different fusion strategies for the multi-modal data were tested, based on different CNN architectures:

  • Naïve stacking (a minimal sketch of this channel stacking follows the list below):
Figure 4: Additional feature channels are simply stacked onto the RGB channels of the input image. Figure of SegNet extracted from (Badrinarayanan et al., 2017).
  • Multi-scale fusion:
Figure 5: Alternative architectures to capture multi-scale context. Figure extracted from (Chen et al., 2017).
  • Early- or late-stage fusion of feature maps:
Figure 6: FuseNet architecture for early- and late-stage fusion of remote sensing data (Audebert et al., 2018). The main stream takes the RGB image as input, while the projected LiDAR feature image serves as auxiliary input.
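As a concrete illustration of the simplest of these strategies, the sketch below stacks projected LiDAR feature channels (e.g. height, intensity) with the RGB channels and widens the first convolution of a toy encoder-decoder accordingly. It is a minimal PyTorch sketch under the assumption of a SegNet/U-Net-like layout; the names (NaiveStackingNet, NUM_CLASSES, NUM_LIDAR_CHANNELS) and the channel counts are illustrative, not the exact networks used in the thesis.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 12          # 11 semantic classes + 1 void class (assumption)
NUM_LIDAR_CHANNELS = 3    # e.g. height, intensity, number of returns (assumption)

class NaiveStackingNet(nn.Module):
    """Toy encoder-decoder whose first convolution accepts RGB + LiDAR channels."""

    def __init__(self):
        super().__init__()
        in_ch = 3 + NUM_LIDAR_CHANNELS
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, NUM_CLASSES, 1),    # per-pixel class scores
        )

    def forward(self, rgb, lidar_feats):
        # Naive stacking: concatenate the modalities along the channel axis
        x = torch.cat([rgb, lidar_feats], dim=1)
        return self.decoder(self.encoder(x))

# Example forward pass with random tensors standing in for real patches
net = NaiveStackingNet()
rgb = torch.rand(1, 3, 256, 256)
lidar_feats = torch.rand(1, NUM_LIDAR_CHANNELS, 256, 256)
scores = net(rgb, lidar_feats)     # shape: (1, NUM_CLASSES, 256, 256)
```

The early- and late-stage FuseNet variant differs in that the auxiliary LiDAR stream gets its own encoder whose feature maps are summed into (or concatenated with) the RGB encoder at one or several depths, rather than being mixed at the input.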

Results

Figure 7: Top: the ground truth of an example image (left) and of the point cloud test set (right). Bottom: the semantic prediction for this image and the back-projected semantic result for the whole point cloud test set.
Figure 8: Semantic predictions of different experiments.
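The back-projection shown in Figure 7 transfers the 2D predictions back to 3D by reusing the point-to-pixel correspondences from the projection step: every 3D point collects the predicted class of the pixel it falls onto in each image in which it is visible, and its final label is taken by majority vote over all views. The snippet below is a hedged sketch of this voting step; the data layout (a list of per-image (point_ids, pixel_labels) pairs) is an assumption for illustration.

```python
import numpy as np

def backproject_labels(num_points, correspondences, num_classes=12):
    """Fuse per-image 2D predictions into per-point 3D labels by majority vote.

    correspondences : list of (point_ids, pixel_labels) pairs, one per image,
                      giving for each visible point the predicted class of the
                      pixel it was projected onto.
    """
    votes = np.zeros((num_points, num_classes), dtype=np.int32)
    for point_ids, pixel_labels in correspondences:
        np.add.at(votes, (point_ids, pixel_labels), 1)   # one vote per view
    point_labels = votes.argmax(axis=1)                  # majority vote per point
    point_labels[votes.sum(axis=1) == 0] = -1            # points never seen in any image
    return point_labels
```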

References

Audebert, N., Le Saux, B., and Lefèvre, S. (2018). Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS Journal of Photogrammetry and Remote Sensing, 140:20–32.

Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495.

Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.

Cramer, M., Haala, N., Laupheimer, D., Mandlburger, G., and Havel, P. (2018). Ultra-high precision UAV-based LiDAR and dense image matching. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

Contact

Norbert Haala

apl. Prof. Dr.-Ing.

Deputy Head of Institute
