Road Inventory Mapping with Street-Level Imagery from iPhone - A Combination of Structure from Motion and Deep Learning

Master Thesis at ifp - Yujing Chen

Duration of the Thesis: 6 months
Completion: July 2019
Supervisors: Dr.-Ing. Michael Cramer, Achim Hoth (corporate partner)
Examiner: Prof. Dr.-Ing. Norbert Haala



We present a system that combines recent object detection methods with structure from motion to reconstruct and analyse, in 3D, street-view image sequences taken with mobile phones. An object detection model was trained and deployed with TensorFlow to detect traffic signs in the images. A 3D reconstruction of the scene is produced by matching key points between images. Within the reconstructed scene, detections in the images can be transferred into 3D space and vice versa. A top view of the street surface is generated by transforming and stitching the street-view images. The system thereby links images and 3D space: it converts pixel coordinates to world coordinates, locates objects in 3D space, and measures the real-world dimensions of objects seen in the images. It is used to support automatic road condition surveillance for our corporate partner.

Keywords: Computer Vision, Machine Learning, Structure from Motion, Convolutional Neural Networks


Figure 1: Examples of traffic sign detection

The green bounding boxes mark the detections, which can later be located in the 3D reconstruction.

Figure 2: Examples of instance segmentation and binary masks

In instance segmentation, each object receives a bounding box with a class and a confidence score, as in object detection. In addition, a mask is computed for each bounding box that shows which pixels belong to the object. The binary masks are used later for road surface stitching.
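How per-object masks might be combined into one binary mask can be sketched as follows. This is a minimal illustration, not the pipeline used in the thesis: the function name `merge_instance_masks`, the dictionary keys `class`, `score`, and `mask`, the class names, and the threshold of 0.5 are all assumptions made for the example.

```python
def merge_instance_masks(instances, height, width, min_score=0.5):
    """Union the masks of all confident instances into one 0/1 mask.

    Each instance is a dict with hypothetical keys:
      "class": label string, "score": confidence in [0, 1],
      "mask": height x width nested list of 0/1 values.
    """
    merged = [[0] * width for _ in range(height)]
    for inst in instances:
        if inst["score"] < min_score:
            continue  # skip low-confidence detections
        for y in range(height):
            for x in range(width):
                if inst["mask"][y][x]:
                    merged[y][x] = 1
    return merged


# Two toy instances on a 2 x 2 image; the second one is below the threshold.
instances = [
    {"class": "car", "score": 0.9, "mask": [[1, 0], [0, 0]]},
    {"class": "truck", "score": 0.3, "mask": [[0, 0], [0, 1]]},
]
binary_mask = merge_instance_masks(instances, height=2, width=2)
```

Whether the resulting mask marks pixels to keep or pixels to exclude during stitching depends on the classes involved, so the sketch leaves that choice to the caller.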

Figure 3: An example of 3D reconstruction
Figure 4: An example of 3D reconstruction

The points form a sparse point cloud. They are extracted as key points in the images, matched between images, and used to reconstruct the 3D scene. The triangles represent the camera positions from which the images were taken.
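The idea of recovering a 3D point from a key point matched in two images can be illustrated with midpoint triangulation. This is a simplified stand-in, not the reconstruction method used in the thesis: it assumes that the two camera centres and the viewing-ray directions of the matched key point are already known in a common world frame, and the function name `triangulate_midpoint` is hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(center1, dir1, center2, dir2):
    """Return the point halfway between the closest points of two rays.

    Ray i is center_i + s_i * dir_i. Returns None for (nearly) parallel
    rays, where the closest points are not uniquely defined.
    """
    w = [p - q for p, q in zip(center1, center2)]
    a, b, c = _dot(dir1, dir1), _dot(dir1, dir2), _dot(dir2, dir2)
    d, e = _dot(dir1, w), _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [p + s * v for p, v in zip(center1, dir1)]
    p2 = [p + t * v for p, v in zip(center2, dir2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Two cameras 2 units apart, both observing the point (1, 0, 4).
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 4.0),
                             (2.0, 0.0, 0.0), (-1.0, 0.0, 4.0))
```

With noisy matches the two rays no longer intersect, which is why the midpoint of their closest approach is used here rather than an exact intersection.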

Figure 5: An example of road surface stitching

The brightness varies across the stitched road surface because the two inspections were recorded under different lighting conditions.
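A top view of this kind relies on relating image pixels to positions on the road. One possible way to do this, shown here only as a sketch, is to intersect the viewing ray of a pixel with a flat road plane. The flat-plane assumption, the pinhole model with a single focal length, and the name `pixel_to_road_plane` are assumptions made for this example and are not taken from the thesis.

```python
def pixel_to_road_plane(pixel, focal_px, principal_point,
                        cam_to_world, cam_center, plane_height=0.0):
    """Intersect the viewing ray of a pixel with the plane z = plane_height.

    cam_to_world is a 3x3 rotation (nested lists) that maps camera-frame
    directions to world-frame directions; cam_center is the camera position
    in world coordinates. Returns None if the ray misses the plane.
    """
    ray_cam = [
        (pixel[0] - principal_point[0]) / focal_px,
        (pixel[1] - principal_point[1]) / focal_px,
        1.0,
    ]
    ray_world = [sum(cam_to_world[r][c] * ray_cam[c] for c in range(3))
                 for r in range(3)]
    if abs(ray_world[2]) < 1e-12:
        return None  # ray runs parallel to the road plane
    scale = (plane_height - cam_center[2]) / ray_world[2]
    if scale <= 0:
        return None  # intersection would lie behind the camera
    return tuple(cam_center[i] + scale * ray_world[i] for i in range(3))


# Camera 2 units above the road, looking straight down.
LOOK_DOWN = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
ground = pixel_to_road_plane((1060.0, 540.0), 1000.0, (960.0, 540.0),
                             LOOK_DOWN, (0.0, 0.0, 2.0))
```

Sampling such ground positions on a regular grid and looking up the corresponding image colours would give one image's contribution to a top view.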


Figure 6: Converting lines drawn in 3D to 2D

Figure (a) shows lines drawn on the street surface in the 3D environment; the red points and lines lie on the street surface. Figures (b) and (c) show the same set of points and lines projected onto different images.
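Projecting 3D points into an image can be sketched with a pinhole camera model. This is a minimal illustration under stated assumptions, not the thesis implementation: it ignores lens distortion, uses a single focal length in pixels, and takes the camera pose as a rotation and translation that map world coordinates into the camera frame. The function names are hypothetical.

```python
def project_point(point_world, rotation, translation, focal_px, principal_point):
    """Project a 3D world point to pixel coordinates (pinhole model).

    The camera-frame point is X_cam = rotation * X_world + translation,
    with rotation given as 3x3 nested lists. Returns None for points
    behind the camera.
    """
    x_cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if x_cam[2] <= 0:
        return None
    u = focal_px * x_cam[0] / x_cam[2] + principal_point[0]
    v = focal_px * x_cam[1] / x_cam[2] + principal_point[1]
    return (u, v)


def project_polyline(points_world, rotation, translation, focal_px, principal_point):
    """Project every vertex of a 3D polyline; hidden vertices become None."""
    return [
        project_point(p, rotation, translation, focal_px, principal_point)
        for p in points_world
    ]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
line_3d = [(0.0, 0.0, 5.0), (1.0, 0.5, 5.0)]
line_2d = project_polyline(line_3d, IDENTITY, [0.0, 0.0, 0.0],
                           1000.0, (960.0, 540.0))
```

Calling the same function with the pose of a different image gives the same 3D line in that image's pixel coordinates.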


Figure 7: An example of projected damage on road surface

The coloured polygons are the damages projected onto the road surface. The white triangles are the camera positions from which the images were taken.
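Once a damage outline is expressed in road-plane coordinates, its size can be measured directly. The sketch below uses the shoelace formula for the area of a simple polygon. It assumes that the vertices are given as (x, y) pairs in a metric, planar coordinate system, which requires the reconstruction to be scaled to real-world units; the function name `polygon_area` is hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# A rectangular patch of 2.0 by 0.5 units.
patch = [(0.0, 0.0), (2.0, 0.0), (2.0, 0.5), (0.0, 0.5)]
area = polygon_area(patch)
```

The absolute value makes the result independent of whether the vertices are listed clockwise or counter-clockwise.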


Figure 8: Oblique overall view of mapped utilities and cameras in Mapbox


We developed and discussed algorithms for a low-cost mobile mapping system that uses mobile phones alone. Deep learning with convolutional neural networks was used to detect objects in the images, followed by structure from motion to rebuild the 3D scene, locate and map the objects, and generate the road surface stitching. The data from our corporate partner was closely examined, and improvements and further potential were proposed and tested. The project also indicates the great potential of portable, low-cost mobile mapping systems based on mobile phones for road surface and inventory mapping.

