N.-D. Duong, C. Soladie, A. Kacete, P. Richard, and J. Royan, Efficient multi-output scene coordinate prediction for fast and accurate camera relocalization from a single RGB image, Computer Vision and Image Understanding, 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol.2020, 2019.

N.-D. Duong, A. Kacete, C. Soladie, P. Richard, and J. Royan, xyzNet: Towards Machine Learning Camera Relocalization by Using a Scene Coordinate Prediction Network, IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp.258-263, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02048735

N.-D. Duong, A. Kacete, C. Soladie, P. Richard, and J. Royan, Forêt de Régression Précise basée sur des Caractéristiques Éparses pour la Relocalisation de Caméra en Temps-Réel, IEEE International Conference on 3D Vision (3DV), pp.643-652, 2018.

N.-D. Duong, A. Kacete, C. Soladie, P. Richard, and J. Royan, congrès Reconnaissance des Formes, 2018.

Patents

N.-D. Duong, A. Kacete, and C. Soladie, Method for Estimating the Installation of a Camera in the Reference Frame of a Three-Dimensional Scene, Device, Augmented Reality System and Associated Computer Program.

N.-D. Duong, A. Kacete, and C. Soladie, Procédé de prédiction d'une représentation en trois dimensions (3D), Dispositif, Système et Programme d'ordinateur correspondant, vol.1873626

S. Agarwal, Y. Furukawa, N. Snavely, B. Curless, S. M. Seitz et al., Reconstructing rome, Computer, vol.43, issue.6, pp.40-47, 2010.

S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, Building rome in a day, IEEE 12th international conference on computer vision, pp.72-79, 2009.

P. F. Alcantarilla, A. Bartoli, and A. J. Davison, Kaze features, European Conference on Computer Vision, pp.214-227, 2012.

P. F. Alcantarilla, J. J. Yebes, J. Almazán, and L. M. Bergasa, On combining visual slam and dense scene flow to increase the robustness of localization and mapping in dynamic environments, Robotics and Automation (ICRA), 2012 IEEE International Conference on, pp.1290-1297, 2012.

R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, Netvlad: Cnn architecture for weakly supervised place recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5297-5307, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01557234

R. Arandjelovic and A. Zisserman, All about vlad, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp.1578-1585, 2013.

C. Arth, D. Wagner, M. Klopschitz, A. Irschara, and D. Schmalstieg, Wide area localization on mobile phones, 8th ieee international symposium on mixed and augmented reality, pp.73-82, 2009.

S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, An optimal algorithm for approximate nearest neighbor searching fixed dimensions, Journal of the ACM (JACM), vol.45, issue.6, pp.891-923, 1998.

R. T. Azuma, A survey of augmented reality, Presence: Teleoperators & Virtual Environments, vol.6, issue.4, pp.355-385, 1997.

A. Babenko and V. Lempitsky, Aggregating local deep features for image retrieval, The IEEE International Conference on Computer Vision (ICCV), 2015.

V. Balntas, S. Li, and V. Prisacariu, Relocnet: Continuous metric learning relocalisation using neural nets, Proceedings of the European Conference on Computer Vision (ECCV), pp.751-767, 2018.

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features (surf), Computer vision and image understanding, vol.110, issue.3, pp.346-359, 2008.

H. Bay, T. Tuytelaars, and L. Van Gool, Surf: Speeded up robust features, European conference on computer vision, pp.404-417, 2006.

J. S. Beis and D. G. Lowe, Shape indexing using approximate nearest-neighbour search in high-dimensional spaces, CVPR, p.1000, 1997.

B. Bescos, J. M. Facil, J. Civera, and J. Neira, Dynaslam: Tracking, mapping, and inpainting in dynamic scenes, IEEE Robotics and Automation Letters, vol.3, issue.4, pp.4076-4083, 2018.

G. Bleser and D. Stricker, Advanced tracking through efficient image processing and visual-inertial sensor fusion, Computers & Graphics, vol.33, issue.1, pp.59-72, 2009.

L. Bottou, Large-scale machine learning with stochastic gradient descent, 2010.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

E. Brachmann, A. Krull, F. Michel, S. Gumhold, J. Shotton et al., Learning 6d object pose estimation using 3d object coordinates, European Conference on Computer Vision, pp.536-551, 2014.

E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel et al., Dsac - differentiable ransac for camera localization, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

E. Brachmann, F. Michel, A. Krull, M. Y. Yang, S. Gumhold et al., Uncertainty-driven 6d pose estimation of objects and scenes from a single rgb image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3364-3372, 2016.

E. Brachmann and C. Rother, Learning less is more - 6d camera localization via 3d surface regression, Proc. CVPR, vol.8, 2018.

G. Bradski, The opencv library, Dr. Dobb's Journal of Software Tools, vol.25, pp.120-125, 2000.

S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz, Geometry-aware learning of maps for camera localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2616-2625, 2018.

L. Breiman, Random forests, Machine learning, vol.45, pp.5-32, 2001.

M. Bui, S. Albarqouni, S. Ilic, and N. Navab, Scene coordinate and correspondence learning for image-based localization, BMVC, p.3, 2018.

M. Cai, C. Shen, and I. D. Reid, A hybrid probabilistic model for camera relocalization, BMVC, 2018.

M. Calonder, V. Lepetit, C. Strecha, and P. Fua, Brief: Binary robust independent elementary features, European conference on computer vision, pp.778-792, 2010.

F. Camposeco, A. Cohen, M. Pollefeys, and T. Sattler, Hybrid camera pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.136-144, 2018.

T. Cavallari, S. Golodetz, N. A. Lord, J. Valentin, L. Di Stefano et al., On-the-fly adaptation of regression forests for online camera relocalisation, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

R. Clark, S. Wang, A. Markham, N. Trigoni, and H. Wen, Vidloc: A deep spatio-temporal model for 6-dof video-clip relocalization, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

A. Criminisi and J. Shotton, Decision forests for computer vision and medical image analysis, 2013.

M. Cummins and P. Newman, Appearance-only slam at large scale with fab-map 2.0, The International Journal of Robotics Research, vol.30, issue.9, pp.1100-1123, 2011.

A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, Monoslam: Real-time single camera slam, IEEE transactions on pattern analysis and machine intelligence, vol.29, pp.1052-1067, 2007.

D. F. Dementhon and L. S. Davis, Model-based object pose in 25 lines of code, International journal of computer vision, vol.15, issue.1-2, pp.123-141, 1995.

M. Donoser and D. Schmalstieg, Discriminative feature-to-point matching in image-based localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.516-523, 2014.

J. Engel, V. Koltun, and D. Cremers, Direct sparse odometry, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

J. Engel, T. Schöps, and D. Cremers, Lsd-slam: Large-scale direct monocular slam, European Conference on Computer Vision, pp.834-849, 2014.

G. Fanelli, J. Gall, and L. Van Gool, Real time head pose estimation with random regression forests, Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp.617-624, 2011.

O. Faugeras, Three-dimensional computer vision: a geometric viewpoint, 1993.

Y. Feng, L. Fan, and Y. Wu, Fast localization in large-scale environments using supervised indexing of binary features, IEEE Transactions on Image Processing, vol.25, issue.1, pp.343-358, 2016.

M. A. Fischler and R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, vol.24, issue.6, pp.381-395, 1981.

J. H. Friedman, J. L. Bentley, and R. A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Softw, vol.3, issue.3, pp.209-226, 1977.

Y. Gal, Uncertainty in deep learning, 2016.

J. Gall and V. Lempitsky, Class-specific hough forests for object detection, in Decision forests for computer vision and medical image analysis, pp.143-157, 2013.

J. Gall, A. Yao, N. Razavi, L. Van Gool, and V. Lempitsky, Hough forests for object detection, tracking, and action recognition, IEEE transactions on pattern analysis and machine intelligence, vol.33, issue.11, pp.2188-2202, 2011.

D. Gálvez-lópez and J. D. Tardós, Bags of binary words for fast place recognition in image sequences, IEEE Transactions on Robotics, vol.28, issue.5, pp.1188-1197, 2012.

X. Gao, X. Hou, J. Tang, and H. Cheng, Complete solution classification for the perspective-three-point problem, IEEE transactions on pattern analysis and machine intelligence, vol.25, issue.8, pp.930-943, 2003.

R. Girshick, Fast r-cnn, Proceedings of the IEEE International Conference on Computer Vision, pp.1440-1448, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.580-587, 2014.

B. Glocker, J. Shotton, A. Criminisi, and S. Izadi, Real-time rgb-d camera relocalization via randomized ferns for keyframe encoding, IEEE transactions on visualization and computer graphics, vol.21, issue.5, pp.571-583, 2015.

Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, European conference on computer vision, pp.392-407, 2014.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016.

A. Gordo, J. Almazan, J. Revaud, and D. Larlus, End-to-end learning of deep visual representations for image retrieval, International Journal of Computer Vision, vol.124, issue.2, pp.237-254, 2017.

A. Gordo, J. A. Rodríguez-Serrano, F. Perronnin, and E. Valveny, Leveraging category-level labels for instance-level image retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3045-3052, 2012.

S. Gupta, P. Arbeláez, R. Girshick, and J. Malik, Aligning 3d models to rgb-d images of cluttered scenes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4731-4740, 2015.

S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, Learning rich features from rgb-d images for object detection and segmentation, European Conference on Computer Vision, pp.345-360, 2014.

A. Guzman-rivera, P. Kohli, B. Glocker, J. Shotton, T. Sharp et al., Multi-output learning for camera relocalization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1114-1121, 2014.

R. M. Haralick, D. Lee, K. Ottenburg, and M. Nolle, Analysis and solutions of the three point perspective pose estimation problem, Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.592-598, 1991.

R. Hartley and A. Zisserman, Multiple view geometry in computer vision, Robotica, vol.23, issue.2, pp.271-271, 2005.

R. I. Hartley and P. Sturm, Triangulation, Computer vision and image understanding, vol.68, pp.146-157, 1997.
URL : https://hal.archives-ouvertes.fr/inria-00525693

J. Hays and A. A. Efros, Im2gps: estimating geographic information from a single image, 2008 ieee conference on computer vision and pattern recognition, pp.1-8, 2008.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.

J. Hensman, N. Fusi, and N. D. Lawrence, Gaussian processes for big data, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI'13, pp.282-290, 2013.

B. K. Horn, Closed-form solution of absolute orientation using unit quaternions, JOSA A, vol.4, issue.4, pp.629-642, 1987.

B. K. Horn, H. M. Hilden, and S. Negahdaripour, Closed-form solution of absolute orientation using orthonormal matrices, JOSA A, vol.5, issue.7, pp.1127-1135, 1988.

C. Huang, X. Ding, and C. Fang, Head pose estimation based on random forests for multiclass classification, Pattern Recognition (ICPR), 2010 20th International Conference on, pp.934-937, 2010.

J. Huang, X. Shao, and H. Wechsler, Face pose discrimination using support vector machines (svm), Proceedings. Fourteenth International Conference on Pattern Recognition, vol.1, pp.154-156, 1998.

D. Q. Huynh, Metrics for 3d rotations: Comparison and analysis, Journal of Mathematical Imaging and Vision, vol.35, issue.2, pp.155-164, 2009.

A. Irschara, C. Zach, J. Frahm, and H. Bischof, From structure-from-motion point clouds to fast location recognition, Computer Vision and Pattern Recognition, pp.2599-2606, 2009.

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, CVPR 2010-23rd IEEE Conference on Computer Vision & Pattern Recognition, pp.3304-3311, 2010.

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating local image descriptors into compact codes, IEEE transactions on pattern analysis and machine intelligence, vol.34, issue.9, pp.1704-1716, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00633013

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional architecture for fast feature embedding, 2014.

N. Jiang, Z. Cui, and P. Tan, A global linear method for camera pose registration, Proceedings of the IEEE International Conference on Computer Vision, pp.481-488, 2013.

H. J. Kim, E. Dunn, and J. Frahm, Predicting good features for image geolocalization using per-bundle vlad, Proceedings of the IEEE International Conference on Computer Vision, pp.1170-1178, 2015.

W. Kabsch, A solution for the best rotation to relate two sets of vectors, Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, vol.32, issue.5, pp.922-923, 1976.

A. Kacete, J. Royan, R. Seguier, M. Collobert, and C. Soladie, Real-time eye pupil localization using hough regression forest, Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, pp.1-8, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01393562

A. Kacete, T. Wentz, and J. Royan, [poster] decision forest for efficient and robust camera relocalization, 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), pp.20-24, 2017.

W. Kehl, F. Milletari, F. Tombari, S. Ilic, and N. Navab, Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation, European Conference on Computer Vision, pp.205-220, 2016.

A. Kendall and R. Cipolla, Modelling uncertainty in deep learning for camera relocalization, Proceedings of the International Conference on Robotics and Automation (ICRA), 2016.

A. Kendall and R. Cipolla, Geometric loss functions for camera pose regression with deep learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

A. Kendall, M. Grimes, and R. Cipolla, Posenet: A convolutional network for real-time 6-dof camera relocalization, Proceedings of the IEEE International Conference on Computer Vision, pp.2938-2946, 2015.

H. J. Kim, E. Dunn, and J. Frahm, Learned contextual feature reweighting for image geo-localization, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3251-3260, 2017.

G. Klein and T. Drummond, Sensor fusion and occlusion refinement for tablet-based ar, Proceedings of the 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality, pp.38-47, 2004.

G. Klein and D. Murray, Parallel tracking and mapping for small ar workspaces, 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pp.225-234, 2007.

G. Klein and D. Murray, Improving the agility of keyframe-based slam, European Conference on Computer Vision, pp.802-815, 2008.

R. Kouskouridas, A. Tejani, A. Doumanoglou, D. Tang, and T. Kim, Latent-class hough forests for 6 dof object pose estimation, 2016.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

A. Krull, E. Brachmann, F. Michel, M. Y. Yang, S. Gumhold et al., Learning analysis-by-synthesis for 6d pose estimation in rgb-d images, Proceedings of the IEEE International Conference on Computer Vision, pp.954-962, 2015.

A. Krull, F. Michel, E. Brachmann, S. Gumhold, S. Ihrke et al., 6-dof model based tracking via object coordinate regression, Asian Conference on Computer Vision, pp.384-399, 2014.

J. Kwon and K. M. Lee, Monocular slam with locally planar landmarks via geometric rao-blackwellized particle filtering on lie groups, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp.1522-1529, 2010.

J. N. Kwong and S. Gong, Learning support vector machines for a multi-view face model, BMVC, pp.1-10, 1999.

Z. Laskar, I. Melekhov, S. Kalia, and J. Kannala, Camera relocalization by computing pairwise relative poses using convolutional neural network, Proceedings of the IEEE International Conference on Computer Vision, pp.929-938, 2017.

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, p.436, 2015.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

V. Lepetit and P. Fua, Keypoint recognition using randomized trees, IEEE transactions on pattern analysis and machine intelligence, vol.28, pp.1465-1479, 2006.

V. Lepetit and P. Fua, Monocular model-based 3d tracking of rigid objects: A survey, Foundations and Trends® in Computer Graphics and Vision, vol.1, issue.1, pp.1-89, 2005.

V. Lepetit, F. Moreno-Noguer, and P. Fua, Epnp: An accurate o(n) solution to the pnp problem, International journal of computer vision, vol.81, issue.2, p.155, 2009.

R. Li, S. Wang, Z. Long, and D. Gu, Undeepvo: Monocular visual odometry through unsupervised deep learning, 2018 IEEE International Conference on Robotics and Automation (ICRA), pp.7286-7291, 2018.

S. Z. Li, Q. Fu, L. Gu, B. Scholkopf, Y. Cheng et al., Kernel machine based learning for multi-view face detection and pose estimation, Proceedings. Eighth IEEE International Conference on Computer Vision, vol.2, pp.674-679, 2001.

X. Li, J. Ylioinas, and J. Kannala, Full-frame scene coordinate regression for image-based localization, RSS, 2018.

X. Li, J. Ylioinas, J. Verbeek, and J. Kannala, Scene coordinate regression with angle-based reprojection loss for camera relocalization, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01867143

Y. Li, S. Gong, J. Sherrah, and H. Liddell, Support vector machine based multi-view face detection and recognition, Image and Vision Computing, vol.22, issue.5, pp.413-427, 2004.

Y. Li, N. Snavely, D. Huttenlocher, and P. Fua, Worldwide pose estimation using 3d point clouds, European conference on computer vision, pp.15-29, 2012.

Y. Li, N. Snavely, and D. P. Huttenlocher, Location recognition using prioritized feature matching, European Conference on Computer Vision, pp.791-804, 2010.

S. Lieberknecht, A. Huber, S. Ilic, and S. Benhimane, Rgb-d camera-based parallel tracking and meshing, 10th IEEE International Symposium on Mixed and Augmented Reality, pp.147-155, 2011.

L. Liu, H. Li, and Y. Dai, Efficient global 2d-3d matching for camera localization in a large-scale 3d map, Proceedings of the IEEE International Conference on Computer Vision, pp.2372-2381, 2017.

T. Liu, A. W. Moore, K. Yang, and A. G. Gray, An investigation of practical approximate nearest neighbor algorithms, Advances in neural information processing systems, pp.825-832, 2005.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.3431-3440, 2015.

D. G. Lowe, Object recognition from local scale-invariant features, The proceedings of the seventh IEEE international conference on computer vision, vol.2, pp.1150-1157, 1999.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International journal of computer vision, vol.60, pp.91-110, 2004.

V. Malyavej, P. Torteeka, S. Wongkharn, and T. Wiangtong, Pose estimation of unmanned ground vehicle based on dead-reckoning/gps sensor fusion by unscented kalman filter, Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, vol.1, pp.395-398, 2009.

F. Massa, R. Marlet, and M. Aubry, Crafting a multi-task cnn for viewpoint estimation, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01743267

D. Massiceti, A. Krull, E. Brachmann, C. Rother, and P. H. S. Torr, Random forests versus neural networks - what's best for camera localization?, 2017 IEEE International Conference on Robotics and Automation (ICRA), pp.5118-5125, 2017.

I. Melekhov, J. Ylioinas, J. Kannala, and E. Rahtu, Image-based localization using hourglass networks, Proceedings of the IEEE International Conference on Computer Vision, pp.879-886, 2017.

L. Meng, J. Chen, F. Tung, J. J. Little, and C. W. Silva, Exploiting random rgb and sparse features for camera pose estimation, BMVC, 2016.

L. Meng, J. Chen, F. Tung, J. Little, J. Valentin et al., Backtracking regression forests for accurate camera relocalization, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.

L. Meng, F. Tung, J. J. Little, J. Valentin, and C. W. Silva, Exploiting points and lines in regression forests for rgb-d camera relocalization, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.6827-6834, 2018.

F. Michel, A. Krull, E. Brachmann, M. Y. Yang, S. Gumhold et al., Pose estimation of kinematic chain instances via object coordinate regression, Proc. British Machine Vision Conf, pp.181-182, 2015.

P. Milgram, H. Takemura, A. Utsumi, and F. Kishino, Augmented reality: A class of displays on the reality-virtuality continuum, Telemanipulator and telepresence technologies, vol.2351, pp.282-293, 1995.

P. Moulon, P. Monasse, and R. Marlet, Global fusion of relative motions for robust, accurate and scalable structure from motion, Proceedings of the IEEE International Conference on Computer Vision, pp.3248-3255, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873504

M. Muja and D. G. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, VISAPP, vol.2, issue.1, p.2, 2009.

R. Mur-artal, J. M. Montiel, and J. D. Tardos, Orb-slam: a versatile and accurate monocular slam system, IEEE Transactions on Robotics, vol.31, issue.5, pp.1147-1163, 2015.

R. Mur-artal and J. D. Tardós, Fast relocalisation and loop closing in keyframe-based slam, 2014 IEEE International Conference on Robotics and Automation (ICRA), pp.846-853, 2014.

R. Mur-artal and J. D. Tardós, Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras, IEEE Transactions on Robotics, vol.33, issue.5, pp.1255-1262, 2017.

E. Murphy-chutorian, A. Doshi, and M. M. Trivedi, Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation, Intelligent Transportation Systems Conference, pp.709-714, 2007.

T. Naseer and W. Burgard, Deep regression for monocular camera-based 6-dof global localization in outdoor environments, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.1525-1530, 2017.

R. A. Newcombe, D. Fox, and S. M. Seitz, Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.343-352, 2015.

R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim et al., Kinectfusion: Real-time dense surface mapping and tracking, Mixed and augmented reality (ISMAR), 2011 10th IEEE international symposium on, pp.127-136, 2011.

R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, Dtam: Dense tracking and mapping in real-time, 2011 international conference on computer vision, pp.2320-2327, 2011.

D. Nister and H. Stewenius, Scalable recognition with a vocabulary tree, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol.2, pp.2161-2168, 2006.

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International journal of computer vision, vol.42, issue.3, pp.145-175, 2001.

B. Peasley and S. Birchfield, Rgbd point cloud alignment using lucas-kanade data association and automatic error metric selection, IEEE Transactions on Robotics, vol.31, issue.6, pp.1548-1554, 2015.

F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier, Large-scale image retrieval with compressed fisher vectors, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3384-3391, 2010.

M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis et al., Visual modeling with a hand-held camera, International Journal of Computer Vision, vol.59, issue.3, pp.207-232, 2004.

M. Pupilli and A. Calway, Real-time camera tracking using a particle filter, BMVC, 2005.

L. Quan and Z. Lan, Linear n-point camera pose determination, IEEE Transactions on pattern analysis and machine intelligence, vol.21, issue.8, pp.774-780, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00590105

N. Radwan, A. Valada, and W. Burgard, Vlocnet++: Deep multitask learning for semantic visual localization and odometry, IEEE Robotics and Automation Letters, vol.3, issue.4, pp.4407-4414, 2018.

C. E. Rasmussen, Gaussian processes in machine learning, Summer School on Machine Learning, pp.63-71, 2003.

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in neural information processing systems, pp.91-99, 2015.

L. Riazuelo, L. Montano, and J. Montiel, Semantic visual slam in populated environments, 2017 European Conference on Mobile Robots (ECMR), pp.1-7, 2017.

E. Rosten and T. Drummond, Machine learning for high-speed corner detection, European conference on computer vision, pp.430-443, 2006.

E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski, Orb: An efficient alternative to sift or surf, ICCV, p.2, 2011.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Cognitive modeling, vol.5, issue.3, p.1, 1988.

S. Rusinkiewicz and M. Levoy, Efficient variants of the icp algorithm, Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp.145-152, 2001.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., Imagenet large scale visual recognition challenge, International journal of computer vision, vol.115, issue.3, pp.211-252, 2015.

T. Sattler, M. Havlena, F. Radenovic, K. Schindler, and M. Pollefeys, Hyperpoints and fine vocabularies for large-scale location recognition, Proceedings of the IEEE International Conference on Computer Vision, pp.2102-2110, 2015.

T. Sattler, B. Leibe, and L. Kobbelt, Fast image-based localization using direct 2d-to-3d matching, 2011 International Conference on Computer Vision, pp.667-674, 2011.

T. Sattler, B. Leibe, and L. Kobbelt, Improving image-based localization by active correspondence search, European conference on computer vision, pp.752-765, 2012.

T. Sattler, B. Leibe, and L. Kobbelt, Efficient & effective prioritized matching for large-scale image-based localization, IEEE transactions on pattern analysis and machine intelligence, vol.39, pp.1744-1756, 2017.

D. Schmalstieg and T. Hollerer, Augmented reality: principles and practice, 2016.

M. Schwarz, H. Schulz, and S. Behnke, Rgb-d object recognition and pose estimation based on pre-trained convolutional neural network features, Robotics and Automation (ICRA), 2015 IEEE International Conference on, pp.1329-1335, 2015.

E. Seemann, K. Nickel, and R. Stiefelhagen, Head pose estimation using stereo vision for human-robot interaction, Proceedings. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp.626-631, 2004.

A. Sharif Razavian, J. Sullivan, A. Maki, and S. Carlsson, A baseline for visual instance retrieval with deep convolutional networks, International Conference on Learning Representations, 2015.

J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi et al., Scene coordinate regression forests for camera relocalization in rgb-d images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2930-2937, 2013.

J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio et al., Real-time human pose recognition in parts from single depth images, Communications of the ACM, vol.56, issue.1, pp.116-124, 2013.

C. Silpa-anan and R. Hartley, Optimised kd-trees for fast image descriptor matching, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, 2015.

J. Sivic and A. Zisserman, Video google: A text retrieval approach to object matching in videos, Proceedings of the IEEE International Conference on Computer Vision, p.1470, 2003.

N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: exploring photo collections in 3d, ACM transactions on graphics (TOG), vol.25, pp.835-846, 2006.

N. Snavely, S. M. Seitz, and R. Szeliski, Modeling the world from internet photo collections, International Journal of Computer Vision, vol.80, issue.2, pp.189-210, 2008.

P. F. Sturm and S. J. Maybank, On plane-based camera calibration: A general algorithm, singularities, applications, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), vol.1, pp.432-437, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00525681

H. Su, C. R. Qi, Y. Li, and L. J. Guibas, Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views, Proceedings of the IEEE International Conference on Computer Vision, pp.2686-2694, 2015.

Y. Sun, M. Liu, and M. Q. Meng, Improving rgb-d slam in dynamic environments: A motion removal approach, Robotics and Autonomous Systems, vol.89, pp.110-122, 2017.

L. Svärm, O. Enqvist, F. Kahl, and M. Oskarsson, City-scale localization for cameras with known vertical direction, IEEE transactions on pattern analysis and machine intelligence, vol.39, pp.1455-1461, 2017.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9, 2015.

H. Taira, M. Okutomi, T. Sattler, M. Cimpoi, M. Pollefeys et al., Inloc: Indoor visual localization with dense matching and view synthesis, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7199-7209, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01859637

W. Tan, H. Liu, Z. Dong, G. Zhang, and H. Bao, Robust monocular slam in dynamic environments, Mixed and Augmented Reality (ISMAR), 2013 IEEE International Symposium on, pp.209-218, 2013.

K. Tateno, F. Tombari, I. Laina, and N. Navab, Cnn-slam: Real-time dense monocular slam with learned depth prediction, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla, 24/7 place recognition by view synthesis, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1808-1817, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01616660

A. Toshev and C. Szegedy, Deeppose: Human pose estimation via deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1653-1660, 2014.

B. Triggs, P. F. Mclauchlan, R. I. Hartley, and A. W. Fitzgibbon, Bundle adjustment-a modern synthesis, International workshop on vision algorithms, pp.298-372, 1999.
URL : https://hal.archives-ouvertes.fr/inria-00548290

R. Y. Tsai and R. K. Lenz, Real time versatile robotics hand/eye calibration using 3d machine vision, IEEE International Conference on Robotics and Automation, pp.554-561, 1988.

A. Valada, N. Radwan, and W. Burgard, Deep auxiliary learning for visual localization and odometry, 2018 IEEE International Conference on Robotics and Automation (ICRA), pp.6939-6946, 2018.

J. Valentin, M. Nießner, J. Shotton, A. Fitzgibbon, S. Izadi et al., Exploiting uncertainty in regression forests for accurate camera relocalization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4400-4408, 2015.

F. Walch, C. Hazirbas, L. Leal-taixe, T. Sattler, S. Hilsenbeck et al., Image-based localization using lstms for structured feature correlation, The IEEE International Conference on Computer Vision (ICCV), 2017.

S. Wangsiripitak and D. W. Murray, Avoiding moving outliers in visual slam by tracking moving objects, ICRA, vol.2, p.7, 2009.

O. Wasenmüller, M. Meyer, and D. Stricker, Corbs: Comprehensive rgb-d benchmark for slam using kinect v2, Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, pp.1-7, 2016.

T. Weyand, I. Kostrikov, and J. Philbin, Planet-photo geolocation with convolutional neural networks, European Conference on Computer Vision, pp.37-55, 2016.

T. Whelan, M. Kaess, M. F. Fallon, H. Johannsson, J. J. Leonard et al., Kintinuous: Spatially extended kinectfusion, AAAI, 2012.

T. Whelan, S. Leutenegger, R. F. Salas-moreno, B. Glocker, and A. J. Davison, Elasticfusion: Dense slam without a pose graph, Robotics: science and systems, vol.11, 2015.

T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, Elasticfusion: Real-time dense slam and light source estimation, The International Journal of Robotics Research, 2016.

K. Wilson and N. Snavely, Robust global translations with 1dsfm, European Conference on Computer Vision, pp.61-75, 2014.

P. Wohlhart and V. Lepetit, Learning descriptors for object recognition and 3d pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3109-3118, 2015.

C. Wu, Towards linear-time incremental structure from motion, 3DTV-Conference, 2013 International Conference on, pp.127-134, 2013.

J. Wu, L. Ma, and X. Hu, Delving deeper into convolutional neural networks for camera relocalization, 2017 IEEE International Conference on Robotics and Automation (ICRA), pp.5644-5651, 2017.

S. You and U. Neumann, Fusion of vision and gyro tracking for robust augmented reality registration, Proceedings IEEE Virtual Reality, pp.71-78, 2001.

Y. Yun, M. H. Changrampadi, and I. Y. Gu, Head pose classification by multi-class adaboost with fusion of rgb and depth images, Signal Processing and Integrated Networks (SPIN), 2014 International Conference on, pp.174-177, 2014.

B. Zeisl, T. Sattler, and M. Pollefeys, Camera pose voting for large-scale image-based localization, Proceedings of the IEEE International Conference on Computer Vision, pp.2704-2712, 2015.

P. Zhang, J. Gu, E. E. Milios, and P. Huynh, Navigation with imu/gps/digital compass with unscented kalman filter, Mechatronics and Automation, vol.3, pp.1497-1502, 2005.

Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on pattern analysis and machine intelligence, vol.22, issue.11, pp.1330-1334, 2000.

G. Zhou, B. Bescos, M. Dymczyk, M. Pfeiffer, J. Neira et al., Dynamic objects segmentation for visual localization in urban environments, 2018.