
Localisation par l'image en milieu urbain : application à la réalité augmentée

Antoine Fond 1, 2
2 MAGRIT - Visual Augmentation of Complex Environments
Inria Nancy - Grand Est, LORIA - ALGO - Department of Algorithms, Computation, Image and Geometry
Abstract : This thesis addresses the problem of localization in urban areas. Accurate positioning in the city is important for many applications, such as augmented reality and mobile robotics. However, systems based on inertial sensors (IMUs) are subject to significant drift, and GPS data can suffer from a valley effect that limits its accuracy. A natural solution is to rely on camera pose estimation from computer vision. Buildings are the main visual landmarks for humans and are also objects of interest for augmented reality applications. We therefore aim to compute, from a single image, the camera pose relative to a database of known reference buildings. The problem is twofold: find the references visible in the current image (place recognition) and compute the camera pose relative to them. Conventional approaches to these two sub-problems are challenged in urban environments by strong perspective effects, frequent repetitions and visual similarity between facades. Approaches specific to these environments have been developed that exploit their high structural regularity, but they still suffer from a number of limitations in facade detection and recognition as well as in pose computation through model registration. The original method developed in this thesis belongs to this family of specific approaches and aims to overcome these limitations in terms of efficiency and robustness to clutter and to changes of viewpoint and illumination. To do so, the main idea is to take advantage of recent advances in deep learning with convolutional neural networks to extract high-level information on which geometric models can be based. Our approach thus combines bottom-up and top-down processing and is divided into three key stages.
We first propose a method to estimate the rotation of the camera pose. The three main vanishing points of an image of an urban environment, known as Manhattan vanishing points, are detected by a convolutional neural network (CNN) that estimates both these vanishing points and the image segmentation relative to them. A second, refinement step uses this information and the image segmentation in a Bayesian model to estimate these points efficiently and more accurately. Once the camera's rotation is estimated, the images can be rectified and thus freed from perspective effects before the translation is computed. In a second contribution, we aim to detect the facades in these rectified images, recognize them among a database of known buildings and estimate a rough translation. For the sake of efficiency, a series of cues based on facade-specific characteristics (repetitions, symmetry, semantics) is proposed to enable the fast selection of facade proposals. These proposals are then classified as facade or non-facade according to a new contextual CNN descriptor. The detected facades are matched to the references by a nearest neighbor search using a metric learned on these descriptors. Finally, although a rough translation can already be estimated from the detected facades, we refine it by relying on the semantic segmentation of the image inferred by a CNN, chosen for its robustness to changes of illumination and small deformations. Since the facade is identified in the previous step, we adopt a model-based approach by registration. Because the problems of registration and segmentation are linked, a Bayesian model is proposed that solves both jointly. This joint processing improves the results of registration and segmentation while remaining efficient in terms of computation time.
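As an illustration of the first stage, the link between Manhattan vanishing points and camera rotation can be sketched with standard pinhole geometry. This is a minimal sketch and not the method of the thesis: it assumes a known focal length `f` and principal point `(cx, cy)`, finite vanishing points, and skips the CNN detection and Bayesian refinement entirely. All function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def backproject(vp, f, cx, cy):
    """Unit direction, in the camera frame, of the 3D lines that meet at
    the vanishing point vp = (u, v) given in pixels.
    Assumes a pinhole camera with square pixels and no skew."""
    return normalize([(vp[0] - cx) / f, (vp[1] - cy) / f, 1.0])

def rotation_from_vanishing_points(vp_a, vp_b, f, cx, cy):
    """Rotation matrix whose columns are the Manhattan axes seen from the
    camera. Two vanishing points suffice; the third axis is their cross
    product. Each axis is only defined up to sign, which this sketch does
    not resolve."""
    r1 = backproject(vp_a, f, cx, cy)
    d2 = backproject(vp_b, f, cx, cy)
    # Gram-Schmidt step: noisy detections are rarely exactly orthogonal.
    p = dot(r1, d2)
    r2 = normalize([d2[i] - p * r1[i] for i in range(3)])
    r3 = cross(r1, r2)
    # Assemble the three axes as the columns of a row-major 3x3 matrix.
    return [[r1[i], r2[i], r3[i]] for i in range(3)]
```

With such a rotation in hand, an image can be rectified by the homography K R^T K^-1, where K is the intrinsic matrix, which is the sense in which rectified images are freed from perspective effects.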
These three parts have been validated on consistent community data sets. The results show that our approach is fast and more robust to changes in shooting conditions than previous methods.
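The facade matching step mentioned in the abstract (nearest neighbor search under a learned metric) can be illustrated as follows. This is a hedged sketch only: the abstract does not say which form the learned metric takes, so a Mahalanobis-style linear map `L` is assumed here, and the descriptors, names and values are made up for the example.

```python
def project(L, x):
    """Apply the linear map L (a list of rows) to descriptor x."""
    return [sum(l * xi for l, xi in zip(row, x)) for row in L]

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_reference(query, references, L):
    """Return (name, squared distance) of the reference facade closest to
    the query under the assumed metric d(x, y) = ||L x - L y||^2.
    `references` maps a facade name to its descriptor."""
    q = project(L, query)
    best_name, best_dist = None, float("inf")
    for name, descriptor in references.items():
        d = squared_distance(q, project(L, descriptor))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist

# Toy descriptors (hypothetical): the second dimension is treated as unreliable.
references = {"facade_A": [1.0, 0.0], "facade_B": [0.0, 2.0]}
query = [0.9, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]       # plain Euclidean distance
learned = [[1.0, 0.0], [0.0, 0.1]]        # down-weights the second dimension
```

In this toy setting the Euclidean metric picks `facade_B` while the down-weighted metric picks `facade_A`, which shows how a learned metric can change the retrieved reference.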

Cited literature [123 references]
Contributor : Abes Star
Submitted on : Friday, May 11, 2018 - 12:10:08 PM
Last modification on : Tuesday, December 18, 2018 - 4:18:26 PM
Document(s) archived on : Tuesday, September 25, 2018 - 9:12:19 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01789709, version 1


Antoine Fond. Localisation par l'image en milieu urbain : application à la réalité augmentée. Vision par ordinateur et reconnaissance de formes [cs.CV]. Université de Lorraine, 2018. Français. ⟨NNT : 2018LORR0028⟩. ⟨tel-01789709⟩


