Machine-learned 3D Building Vectorization from Satellite Imagery

Master's thesis at the ifp - Yi Wang

Duration: 6 months
Completion: March 2021
Supervisors: Dr. Ksenia Bittner (DLR), Prof. Dr.-Ing. Norbert Haala
Examiner: Prof. Dr.-Ing. Norbert Haala



Accurate, large-scale and frequently updated 3D building models are in high demand for applications such as urban planning and disaster monitoring. In this thesis, we propose a machine-learning-based approach for automatic 3D building reconstruction and vectorization. The input is a single-channel photogrammetric digital surface model (DSM) and a panchromatic (PAN) image. We first refine the building shapes in the DSM with a conditional generative adversarial network (cGAN). The refined DSM and the PAN image are then fed into a semantic segmentation network that detects the edges and corners of building roofs. Next, a set of vectorization algorithms builds the roof polygons. Finally, height information from the refined DSM is processed and added to the polygons to obtain a fully vectorized level of detail 2 (LoD-2) building model. We verify the effectiveness of the method on large-scale satellite images, where it achieves state-of-the-art performance.

Figure 1. Overview of the proposed 3D building vectorization method.


1. DSM building shape refinement

We propose a conditional generative adversarial network (cGAN) (Figure 2) to refine the photogrammetric DSM, extending the work of Bittner et al. [1]. The network jointly trains a generator and a discriminator to perform domain transfer from the source domain (the photogrammetric DSM) to the target domain (the refined DSM). The discriminator follows the PatchGAN architecture proposed by Isola et al. [2]. The generator has a UNet-like structure with long skip connections from the encoder to the decoder and short skip connections between the residual blocks inside the encoder. To enhance the features of building objects, we add a self-attention module, the convolutional block attention module (CBAM) [3], before the decoder. CBAM combines 1D channel attention and 2D spatial attention, whose attention maps are applied sequentially to the input feature maps by element-wise multiplication. An LSGAN loss, an L1 loss and a normal vector loss are combined in a self-weighting multi-task learning scheme to improve the quality of the generator.
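The following minimal NumPy sketch shows a CBAM forward pass, following the formulation of Woo et al. [3]. Channel attention comes from a shared MLP applied to average-pooled and max-pooled descriptors. Spatial attention then comes from a 7 × 7 convolution over channel-pooled maps. The function names, the (C, H, W) layout and the random weights are illustrative assumptions. In the thesis the module sits inside a trained generator, which this sketch does not reproduce.

```python
# Sketch under stated assumptions: names and random weights are hypothetical,
# not taken from the thesis implementation.
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w0, w1):
    """feat: (C, H, W); w0: (C // r, C); w1: (C, C // r). Returns (C, 1, 1) weights."""
    def shared_mlp(v):
        # Two-layer bottleneck MLP with ReLU, shared by both pooled descriptors
        return w1 @ np.maximum(w0 @ v, 0.0)

    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k) with k odd. Returns (1, H, W) weights."""
    # Pool across channels, then convolve the 2-channel map with 'same' padding
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, kernel.shape[1:], axis=(1, 2))                 # (2, H, W, k, k)
    conv = np.einsum("chwij,cij->hw", windows, kernel)
    return sigmoid(conv)[None]


def cbam(feat, w0, w1, kernel):
    # Channel attention first, then spatial attention, each as element-wise scaling
    refined = feat * channel_attention(feat, w0, w1)
    return refined * spatial_attention(refined, kernel)


rng = np.random.default_rng(0)
C, H, W, r = 16, 32, 32, 4  # r is the MLP reduction ratio
feat = rng.standard_normal((C, H, W))
w0 = 0.1 * rng.standard_normal((C // r, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out = cbam(feat, w0, w1, kernel)
```

Both attention maps lie in (0, 1), so the module only re-weights features and never amplifies them. Which features survive depends entirely on the learned weights.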

Figure 2. The proposed DSM refinement network.
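The loss combination of the refinement network can also be sketched. The text does not give the exact formulas, so this sketch rests on several assumptions:

- The LSGAN term is the standard least-squares generator loss.
- The normal vector loss is one minus the cosine similarity of per-pixel surface normals derived from DSM gradients.
- The self-weighting follows the homoscedastic-uncertainty scheme of Kendall et al. (2018), in which each task has a learnable log-variance s_i.

All function names are hypothetical.

```python
# Sketch under stated assumptions: loss definitions and the weighting scheme are
# plausible choices, not confirmed details of the thesis.
import numpy as np


def lsgan_generator_loss(d_fake):
    # Least-squares GAN: push discriminator scores on refined DSMs towards 1 ("real")
    return 0.5 * np.mean((d_fake - 1.0) ** 2)


def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))


def surface_normals(dsm):
    # Normals of z = f(x, y) from finite differences: n is proportional to (-dz/dx, -dz/dy, 1)
    dz_dy, dz_dx = np.gradient(dsm)  # axis 0 = rows (y), axis 1 = columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dsm)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def normal_vector_loss(pred, target):
    # 1 - cosine similarity of per-pixel normals, averaged over the tile
    cos = np.sum(surface_normals(pred) * surface_normals(target), axis=-1)
    return np.mean(1.0 - cos)


def self_weighted_total(losses, log_vars):
    # Uncertainty-style weighting: task i has a learnable s_i = log(sigma_i^2).
    # A large s_i down-weights the task; the +s_i term stops s_i from growing unbounded.
    losses, log_vars = np.asarray(losses), np.asarray(log_vars)
    return np.sum(np.exp(-log_vars) * losses + log_vars)


rng = np.random.default_rng(0)
target = 10.0 * rng.random((64, 64))                   # synthetic "refined" DSM tile
pred = target + rng.normal(0.0, 0.5, target.shape)     # noisy generator output
d_fake = rng.random((8, 8))                            # hypothetical PatchGAN score map
terms = [lsgan_generator_loss(d_fake), l1_loss(pred, target),
         normal_vector_loss(pred, target)]
total = self_weighted_total(terms, log_vars=[0.0, 0.0, 0.0])
```

In training, the log-variances would be optimized jointly with the network weights. With all of them at zero, the total reduces to the plain sum of the three terms.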

2. Edge and corner detection

Given the refined DSM and the PAN image, a semantic segmentation network detects building edges and corners. Its architecture is identical to the generator of the DSM refinement network, except for a three-channel output layer that produces the per-pixel probability maps.
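A three-channel output is usually converted into probabilities with a per-pixel softmax. The sketch below does that and then applies a simple 3 × 3 non-maximum suppression to pick corner candidates. It assumes the channel order background, edge, corner, which the text does not state. The 0.5 threshold and the function name `decode` are also hypothetical.

```python
# Sketch under stated assumptions: channel order, threshold and names are hypothetical.
import numpy as np

CLASSES = ("background", "edge", "corner")  # assumed channel order


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def decode(logits, corner_thresh=0.5):
    """logits: (3, H, W). Returns the label map (H, W) and corner pixels (N, 2) as (row, col)."""
    prob = softmax(logits, axis=0)
    labels = prob.argmax(axis=0)
    corner = prob[2]
    # 3x3 non-maximum suppression: keep pixels equal to the maximum of their neighbourhood
    padded = np.pad(corner, 1, mode="constant", constant_values=-np.inf)
    local_max = np.lib.stride_tricks.sliding_window_view(padded, (3, 3)).max(axis=(-2, -1))
    peaks = np.argwhere((corner == local_max) & (corner > corner_thresh))
    return labels, peaks


logits = np.zeros((3, 5, 5))
logits[0] = 2.0          # background everywhere...
logits[1, 2, :] = 4.0    # ...except a horizontal edge in row 2
logits[2, 2, 2] = 6.0    # and a corner at its centre
labels, peaks = decode(logits)
print(labels[2])  # [1 1 2 1 1]
print(peaks)      # [[2 2]]
```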

3. 3D building model reconstruction

In the final stage, a 3D building vectorization method is applied to the refined DSM and the detected building edges and corners. Assuming that building edges are straight lines, the core idea is to build a graph data structure step by step. This graph stores the points, lines and faces of every single building together with their relationships. Because it is a hybrid method, the approach is not restricted by the complexity of different building types, and it therefore performs well, especially for large-area 3D building modeling. The general workflow is shown in Figure 3.
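The following sketch illustrates a graph that stores points, lines and faces. It derives the closed roof faces of a planar straight-line graph by walking half-edges and always taking the sharpest left turn. The class `BuildingGraph` and its methods are hypothetical. The thesis builds its graph step by step from detected corners and edges, whereas this sketch adds them by hand.

```python
# Hypothetical sketch: a minimal point/line/face graph for a single building.
import math
from collections import defaultdict


class BuildingGraph:
    def __init__(self):
        self.points = []             # point index -> (x, y)
        self.adj = defaultdict(set)  # point index -> indices of connected points

    def add_point(self, x, y):
        self.points.append((x, y))
        return len(self.points) - 1

    def add_line(self, i, j):
        self.adj[i].add(j)
        self.adj[j].add(i)

    def _angle(self, i, j):
        (x0, y0), (x1, y1) = self.points[i], self.points[j]
        return math.atan2(y1 - y0, x1 - x0)

    def _signed_area(self, face):
        pts = [self.points[i] for i in face]
        return 0.5 * sum(x0 * y1 - x1 * y0
                         for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))

    def faces(self):
        """Trace every bounded face of the planar straight-line graph."""
        # Neighbours of each point, sorted counter-clockwise by direction
        order = {v: sorted(nb, key=lambda w: self._angle(v, w)) for v, nb in self.adj.items()}
        visited, faces = set(), []
        for u in order:
            for v in order[u]:
                if (u, v) in visited:
                    continue
                face, a, b = [], u, v
                while (a, b) not in visited:
                    visited.add((a, b))
                    face.append(a)
                    # Next half-edge: at b, take the neighbour just clockwise of a
                    nbs = order[b]
                    a, b = b, nbs[(nbs.index(a) - 1) % len(nbs)]
                faces.append(face)
        # Bounded faces are traced counter-clockwise (positive signed area)
        return [f for f in faces if self._signed_area(f) > 0]


g = BuildingGraph()
a, b, c, d = (g.add_point(x, y) for x, y in [(0, 0), (4, 0), (4, 3), (0, 3)])
for i, j in [(a, b), (b, c), (c, d), (d, a), (a, c)]:  # rectangle split by one diagonal
    g.add_line(i, j)
print(g.faces())  # [[0, 1, 2], [0, 2, 3]]
```

The outer boundary is traced clockwise, so the area test discards it and only the two roof faces remain.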

Figure 3. The proposed vectorization pipeline.
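The text does not describe how heights from the refined DSM are attached to the polygons. One plausible approach, assumed here, is to fit a plane z = a·x + b·y + c by least squares to the DSM pixels inside each roof face and then evaluate that plane at the face's vertices. The helper names are hypothetical. Pixel centres are taken as integer (x = column, y = row) coordinates.

```python
# Sketch under stated assumptions: one least-squares plane per roof face; names are hypothetical.
import numpy as np


def point_in_polygon(xs, ys, poly):
    """Vectorised even-odd ray casting. poly: list of (x, y) vertices."""
    inside = np.zeros(xs.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        crosses = (y0 > ys) != (y1 > ys)  # edge spans the horizontal line through the pixel
        with np.errstate(divide="ignore", invalid="ignore"):  # horizontal edges never cross
            x_cross = x0 + (ys - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (xs < x_cross)
    return inside


def fit_roof_plane(dsm, poly):
    """Least-squares plane z = a*x + b*y + c through the DSM pixels inside poly."""
    rows, cols = np.mgrid[0:dsm.shape[0], 0:dsm.shape[1]]
    mask = point_in_polygon(cols.astype(float), rows.astype(float), poly)
    A = np.column_stack([cols[mask], rows[mask], np.ones(mask.sum())])
    coeffs, *_ = np.linalg.lstsq(A, dsm[mask], rcond=None)
    return coeffs  # (a, b, c)


def lift_polygon(dsm, poly):
    """Attach a height to every 2D vertex by evaluating the fitted roof plane."""
    a, b, c = fit_roof_plane(dsm, poly)
    return [(x, y, a * x + b * y + c) for x, y in poly]


rows, cols = np.mgrid[0:20, 0:20]
dsm = 0.5 * cols + 10.0  # synthetic shed roof rising along x
poly = [(2, 2), (15, 2), (15, 15), (2, 15)]
roof = lift_polygon(dsm, poly)  # z is about 11.0 on the left edge and 17.5 on the right
```

Fitting a plane instead of sampling single pixels makes the vertex heights less sensitive to DSM noise at roof edges. However, it assumes that every face is planar.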

Results and conclusion

Figure 4 shows the experimental results for a test area in Berlin, and Figure 5 shows the reconstructed 3D building model. With the help of the self-attention module, we obtain promising results both for the regression of building heights and for the semantic segmentation of edges and corners. The method has limitations: it relies on the straight-edge assumption, and the reconstructed building models are not always complete. Nevertheless, the results demonstrate its overall robustness and accuracy.

Figure 4. Experimental results for a 500 m × 500 m test area. Some buildings in (c) do not appear in the other images because of the time difference. Some edges in (e) are missing in (f) because they do not meet the requirements of the vectorization process. This affects especially objects on the boundary of the area, which are incomplete.
Figure 5. Reconstructed 3D building model of the test area.


[1] Bittner, Ksenia, et al. "Long-Short Skip Connections in Deep Neural Networks for DSM Refinement." The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 43.B2 (2020): 383-390.

[2] Isola, Phillip, et al. "Image-to-Image Translation with Conditional Adversarial Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.

[3] Woo, Sanghyun, et al. "CBAM: Convolutional Block Attention Module." Proceedings of the European Conference on Computer Vision (ECCV). 2018.