Deep Learning-Based Point Cloud Filtering for Urban Digital Surface Models

Master Thesis at ifp - Tim Kayser

Duration: 6 months
Completion: January 2024
Supervisors: Dr.-Ing. Patrick Tutzauer, Dr.-Ing. Mathias Rothermel (both ESRI/nFrames GmbH, Stuttgart)
Examiner: Prof. Dr.-Ing. Norbert Haala

Introduction and Motivation

Digital Surface Models (DSM) describe a digital representation of the earth's surface, including all natural and built features, and are one of the most important geospatial data products and an essential component of geographic information systems (GIS). Using photogrammetry, large regions can be captured and processed into DSMs with high accuracy and low turnaround times. However, when the point clouds reconstructed during dense matching are filtered into a DSM, such photogrammetric approaches are susceptible to characteristic outliers and noise in urban scenes, where buildings create shadows and occlusions, especially in narrow alleyways.

Meanwhile, deep learning on point clouds has seen great advancements in recent years, especially since the introduction of PointNet [1] and its derived architectures, which promise to capture complex data patterns and extract meaningful information. The objective of this thesis is therefore to apply such a deep learning network as a filter that derives DSMs from dense matching point clouds, with the following key contributions:

  • Develop a deep learning framework for generating DSMs from dense matching point clouds, based on an existing deep learning network.
  • Show that this framework can be applied to city-wide point clouds while still respecting small-scale features by exploiting per-point information returned from dense matching.
  • Evaluate the results of this framework against SURE [2], a commercial software for photogrammetric applications.
  • Assess the generalizability of our framework across different data sets.

Building the Framework

To achieve the stated objectives, we implement an adapted PointCleanNet [3] architecture, a PointNet-based network intended for cleaning point clouds in two stages: an outlier detector and a denoiser. PointCleanNet applies the approach proposed by PointNet to local patches, first determining per-point features and then aggregating them with a symmetric function into a per-patch feature vector, which in turn yields the outlier class or a displacement vector in the two stages, respectively. Our adapted network utilizes the first stage, the outlier detector, to categorize points into DSM and non-DSM classes. While PointCleanNet is limited to a three-dimensional input vector, we extend the network layers to enable the inclusion of additional per-point properties.
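As a rough illustration of this extension, the following PyTorch sketch shows a PointNet-style per-patch classifier whose input dimension grows with additional per-point properties. Class and parameter names (e.g. PatchOutlierClassifier, extra_dims) are placeholders, and the spatial transformers and exact layer sizes of PointCleanNet are omitted.

```python
import torch
import torch.nn as nn


class PatchOutlierClassifier(nn.Module):
    """PointNet-style per-patch outlier classifier (illustrative sketch).

    `extra_dims` counts hypothetical additional per-point properties
    (e.g. intensity, precision) appended to the XYZ coordinates; the
    original PointCleanNet only accepts three-dimensional inputs.
    """

    def __init__(self, extra_dims: int = 0):
        super().__init__()
        in_dim = 3 + extra_dims
        # Shared per-point MLP, implemented as 1x1 convolutions.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Patch-level head returning one outlier logit for the query point.
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        # patch: (batch, 3 + extra_dims, n_points), centred on the query point.
        per_point = self.point_mlp(patch)
        # Symmetric aggregation (max pooling) -> per-patch feature vector.
        patch_feature = per_point.max(dim=2).values
        return self.head(patch_feature)  # logit: DSM vs. non-DSM


# Example: patches of 500 points carrying intensity and precision as extras.
model = PatchOutlierClassifier(extra_dims=2)
logits = model(torch.randn(8, 3 + 2, 500))  # -> shape (8, 1)
```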

Figure 1: Adapted PointCleanNet architecture

To train the network, two aerial data sets with high spatial resolution are processed with SURE, ensuring high geometric accuracy of the reference DSM and therefore allowing the dense matching point clouds to be labeled as ground truth via a simple per-point distance threshold to the DSM. The same steps are additionally applied to a third aerial data set, which is used for the quantitative and qualitative evaluation of the trained models in the experiments.
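A minimal sketch of this labeling step could look as follows; the function, the raster conventions, and the 0.5 m tolerance are illustrative assumptions rather than the exact values used in the thesis.

```python
import numpy as np


def label_points(points_xyz, dsm_grid, origin, cell_size, threshold=0.5):
    """Label points as DSM (1) / non-DSM (0) by vertical distance to a
    reference DSM raster (illustrative; names and tolerance are assumptions).

    points_xyz : (N, 3) array of dense matching points
    dsm_grid   : (rows, cols) reference DSM heights, row 0 at the lower edge
    origin     : (x0, y0) of the lower-left raster corner
    cell_size  : raster resolution in metres
    """
    cols = ((points_xyz[:, 0] - origin[0]) / cell_size).astype(int)
    rows = ((points_xyz[:, 1] - origin[1]) / cell_size).astype(int)
    cols = np.clip(cols, 0, dsm_grid.shape[1] - 1)
    rows = np.clip(rows, 0, dsm_grid.shape[0] - 1)
    # A point belongs to the DSM class if it lies within the tolerance band.
    dz = np.abs(points_xyz[:, 2] - dsm_grid[rows, cols])
    return (dz <= threshold).astype(np.uint8)
```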

Experiments and Results

To determine an ideal combination of the available per-point properties for training and inference of our network, baseline results are first generated for each of the features presented in Figure 2.

Figure 2: Point properties derived during dense matching and included in the framework testing. Point precision is determined by propagating a half-pixel error in the dense matching, while the unit normals are derived from gradients in the depth maps.
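Both derived properties follow standard photogrammetric relations. The sketch below illustrates them under simplifying assumptions: a regularly gridded depth/height map with spacing gsd, and the textbook stereo error propagation sigma_z = z^2 / (b * f) * sigma_d with a half-pixel matching error; all variable names are illustrative.

```python
import numpy as np


def normals_from_heightfield(z, gsd):
    """Unit normals from a regularly gridded depth/height map (sketch).

    Assumes grid spacing `gsd` in metres; for a surface z = f(x, y) the
    un-normalised normal is proportional to (-dz/dx, -dz/dy, 1).
    """
    dz_dy, dz_dx = np.gradient(z, gsd)  # finite differences along rows/cols
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def depth_precision(z, baseline, focal_px, sigma_disp=0.5):
    """Textbook stereo error propagation: sigma_z = z^2 / (b * f) * sigma_d.

    A half-pixel matching error (sigma_disp = 0.5) mirrors the figure
    caption; baseline and focal length are assumed known per stereo pair.
    """
    return z ** 2 / (baseline * focal_px) * sigma_disp
```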

A model trained on the point positions alone, as intended by the original PointCleanNet architecture, already achieves reasonable results, but it strongly filters out points on sparsely reconstructed surfaces, leading to geometrically incomplete buildings in the DSM. Using color or intensity features for the network training produces relatively similar results, with improved building edges and better filtering in sparse areas, but also weaker facade rejection. Resorting instead to point normal information allows the network to reliably eliminate facade points, but the unstable nature of the normal computation leads to missing geometry on sloped surfaces. Training on the point precision values yields the highest geometric integrity in low-density regions, but also creates holes in dense, horizontal surfaces.

To mitigate the weak points identified for each baseline, combinations of the introduced point properties are assessed, and the ideal feature set is determined to be point positions, intensities, and precisions, providing the most reliable and complete reconstruction of the scene. Comparing the results of this ideal model with SURE, we find equivalent levels of overall quality, with reduced noise in shaded areas and near vegetation. The reconstruction of sparsely populated surfaces is still less complete than in the SURE DSM, but strongly improved compared to the baseline results. As quantitative metrics, the binary confusion matrix of the classification is computed, and precision, recall, and F1-score are calculated for both classes.
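The per-class scores can be obtained from the predicted and ground-truth labels as sketched below; the class encoding (0 = non-DSM, 1 = DSM) is an assumption for illustration, not necessarily the thesis implementation.

```python
import numpy as np


def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 for both classes from the binary confusion
    matrix (sketch; class encoding 0 = non-DSM, 1 = DSM is an assumption)."""
    results = {}
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives for this class
        fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
        fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```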

Figure 3: Comparison of the DSM height maps generated by SURE, our ideal model using positions, intensities and point precisions (XYZIP), and a position-only model (XYZ).
Table 1: Quantitative metrics for the model trained on our ideal feature set.

Furthermore, a data set with looser geometric consistency requirements in the dense matching can be processed with SURE to obtain a higher point density at the cost of an increased outlier count. Here, our deep learning approach better captures straight building edges and reduces noise in the DSM compared with SURE. Lastly, the proposed framework is also tested on data sets from different imagery sources, namely satellite and drone imagery.

Figure 4: Comparison of the DSM height maps and True Orthophotos (TO) generated by SURE and our ideal framework model (XYZIP) on the data set with looser geometric consistency requirements.

Conclusions

In this work, a deep learning-based framework for the filtering of dense matching point clouds into DSMs has been successfully established, and its quality and reliability have been evaluated across multiple use cases.

However, with a single inference on a point cloud of roughly 175 million points taking around 63 hours, processing times are still far from those of conventional filtering approaches. Additionally, the lack of attention to structures outside the local patch radius leads to multi-layered horizontal surfaces, e.g. balconies or windowsills in facades, remaining in the DSM.

The application of our framework to a data set with looser geometric consistency requirements provided an example of a real-world use case in which our approach can outperform conventional DSM filtering algorithms, e.g. for data sets where ideal capture configurations may not be guaranteed.

Bibliography

[1] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. CoRR, abs/1612.00593, 2016. URL http://arxiv.org/abs/1612.00593.

[2] Mathias Rothermel, Konrad Wenzel, Dieter Fritsch, and Norbert Haala. SURE: Photogrammetric surface reconstruction from imagery. December 2012.

[3] Marie-Julie Rakotosaona, Vittorio La Barbera, Paul Guerrero, Niloy J. Mitra, and Maks Ovsjanikov. PointCleanNet: Learning to denoise and remove outliers from dense point clouds. CoRR, abs/1901.01060, 2019. URL http://arxiv.org/abs/1901.01060.

Contact

Norbert Haala

apl. Prof. Dr.-Ing.

Deputy Head of Institute
