S. Agarwal, Y. Furukawa, N. Snavely, B. Curless, S. Seitz et al., Reconstructing Rome, Computer, vol.43, issue.6, pp.40-47, 2010.
DOI : 10.1109/MC.2010.175

A. Ahmed, K. Yu, W. Xu, Y. Gong, and E. Xing, Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks, ECCV, 2008.
DOI : 10.1007/978-3-540-88690-7_6

N. Ahuja and S. Todorovic, Learning the Taxonomy and Models of Categories Present in Arbitrary Images, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409039

K. Alahari, G. Seguin, J. Sivic, and I. Laptev, Pose Estimation and Segmentation of People in 3D Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.263
URL : https://hal.archives-ouvertes.fr/hal-00874884

D. G. Aliaga, P. A. Rosen, and D. R. Bekins, Style grammars for interactive visualization of architecture, IEEE Transactions on Visualization and Computer Graphics, vol.13, issue.4, 2007.

N. E. Apostoloff and A. Zisserman, Who are you? - real-time person identification, Proceedings of the British Machine Vision Conference 2007, 2007.
DOI : 10.5244/C.21.48

O. Arandjelovic and R. Cipolla, Automatic Cast Listing in Feature-Length Films with Anisotropic Manifold Space, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.64

O. Arandjelovic and A. Zisserman, Automatic Face Recognition for Film Character Retrieval in Feature-Length Films, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.81

R. Arandjelović and A. Zisserman, Smooth object retrieval using a bag of boundaries, ICCV, 2011.

R. Arandjelović and A. Zisserman, Three things everyone should know to improve object retrieval, CVPR, 2012.

M. Aubry, B. Russell, A. Efros, and J. Sivic, Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2014.487
URL : https://hal.archives-ouvertes.fr/hal-01057240

M. Aubry, B. Russell, and J. Sivic, Painting-to-3D model alignment via discriminative visual elements, INRIA pre-print, accepted for publication in ACM Transactions on Graphics, 2013.

Y. Aytar and A. Zisserman, Tabula rasa: Model transfer for object category detection, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126504

G. Baatz, O. Saurer, K. Köser, and M. Pollefeys, Large Scale Visual Geo-Localization of Images in Mountainous Terrain, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_37

L. Baboud, M. Cadik, E. Eisemann, and H. Seidel, Automatic photo-to-terrain alignment for the annotation of mountain pictures, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995727

F. Bach and Z. Harchaoui, DIFFRAC : a discriminative and flexible framework for clustering, NIPS, 2007.

S. Bae, A. Agarwala, and F. Durand, Computational rephotography, ACM Transactions on Graphics, vol.29, issue.3, 2010.
DOI : 10.1145/1805964.1805968

L. Ballan, G. J. Brostow, J. Puwein, and M. Pollefeys, Unstructured video-based rendering: Interactive exploration of casually captured videos, 2010.

A. Bar-Hillel and D. Weinshall, Subordinate class recognition using relational object models, NIPS, 2006.

E. Bart, I. Porteous, P. Perona, and M. Welling, Unsupervised learning of visual taxonomies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587620

T. Berg, A. Berg, J. Edwards, R. White, Y. W. Teh et al., Names and faces in the news, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., pp.848-854, 2004.
DOI : 10.1109/CVPR.2004.1315253

C. M. Bishop, Pattern Recognition and Machine Learning, 2006.

M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.28

D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum, Hierarchical topic models and the nested chinese restaurant process, NIPS, 2004.

D. M. Blei, T. Griffiths, M. I. Jordan, and J. Tenenbaum, Hierarchical topic models and the nested chinese restaurant process, NIPS, 2003.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

F. Bosché, Automated recognition of 3D CAD model objects in laser scans and calculation of as-built dimensions for dimensional compliance control in construction, Advanced Engineering Informatics, vol.24, issue.1, pp.107-118, 2010.
DOI : 10.1016/j.aei.2009.08.006

L. Bourdev and J. Malik, Poselets: Body part detectors trained using 3D human pose annotations, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459303

Y. Boureau, F. Bach, Y. LeCun, and J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539963

C. Buckley, G. Salton, J. Allan, and A. Singhal, Automatic query expansion using SMART, Proc. TREC-3, 1995.

P. Buehler, M. Everingham, and A. Zisserman, Learning sign language by watching TV (using weakly aligned subtitles), 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206523

D. Chen and G. Baatz, City-scale landmark identification on mobile devices, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995610

W. Choi, Y. Chao, C. Pantofaru, and S. Savarese, Understanding Indoor Scenes Using 3D Geometric Phrases, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.12

O. Chum and J. Matas, Geometric Hashing with Local Affine Frames, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.125

O. Chum, A. Mikulik, M. Perdoch, and J. Matas, Total recall II: Query expansion revisited, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995601

O. Chum, M. Perdoch, and J. Matas, Geometric min-Hashing: Finding a (thick) needle in a haystack, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206531

O. Chum, J. Philbin, M. Isard, and A. Zisserman, Scalable near identical image and shot detection, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282359

O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408891

O. Chum and A. Zisserman, An Exemplar Model for Learning Object Classes, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383050

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, JMLR, vol.12, pp.2493-2537, 2011.

T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar, Movie/Script: Alignment and Parsing of Video and Text Transcription, ECCV, 2008.
DOI : 10.1007/978-3-540-88693-8_12

T. Cour, B. Sapp, C. Jordan, and B. Taskar, Learning from ambiguously labeled images, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206667

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCV Workshop, 2004.

M. Cummins and P. Newman, Highly scalable appearance-only SLAM - FAB-MAP 2.0, Robotics: Science and Systems V, 2009.
DOI : 10.15607/RSS.2009.V.039

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

P. E. Debevec, C. J. Taylor, and J. Malik, Modeling and rendering architecture from photographs, Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, SIGGRAPH '96, 1996.
DOI : 10.1145/237170.237191

L. Del Pero, J. Bowdish, B. Kermgard, E. Hartley, and K. Barnard, Understanding Bayesian Rooms Using Composite 3D Object Models, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.27

V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta et al., Scene Semantics from Long-Term Observation of People, ECCV, 2012.
DOI : 10.1007/978-3-642-33783-3_21
URL : https://hal.archives-ouvertes.fr/hal-01060880

V. Delaitre, J. Sivic, and I. Laptev, Learning person-object interactions for action recognition in still images, NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00648156

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR, 2009.

C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. A. Efros, What makes Paris look like Paris?, ACM Transactions on Graphics (TOG), vol.31, issue.4, p.101, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01053876

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior recognition via sparse spatiotemporal features, VS-PETS, 2005.

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

A. A. Efros, A. C. Berg, G. Mori, and J. Malik, Recognizing action at a distance, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238420

B. Epshtein and S. Ullman, Feature hierarchies for object classification, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.98

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video, Proceedings of the British Machine Vision Conference 2006, 2006.
DOI : 10.5244/C.20.92

M. Everingham, J. Sivic, and A. Zisserman, Taking the bite out of automatic naming of characters in TV video, Image and Vision Computing, vol.27, issue.5, 2009.

M. Everingham and A. Zisserman, Identifying individuals in video by combining 'generative' and discriminative head models, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.116

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, Liblinear: A library for large linear classification, JMLR, vol.9, issue.1, pp.1871-1874, 2008.

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning Hierarchical Features for Scene Labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, 2013.
DOI : 10.1109/TPAMI.2012.231
URL : https://hal.archives-ouvertes.fr/hal-00742077

A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian et al., Every Picture Tells a Story: Generating Sentences from Images, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_2

A. Farhadi, M. K. Tabrizi, I. Endres, and D. Forsyth, A latent model of discriminative aspect, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459350

P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

P. Felzenszwalb, D. McAllester, and D. Ramanan, A discriminatively trained, multiscale, deformable part model, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587597

S. Fidler, S. Dickinson, and R. Urtasun, 3D object detection and viewpoint estimation with a deformable 3d cuboid model, NIPS, 2012.

S. Fidler and A. Leonardis, Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383269

A. W. Fitzgibbon and A. Zisserman, On Affine Invariant Clustering and Automatic Cast Listing in Movies, ECCV, pp.304-320, 2002.
DOI : 10.1007/3-540-47977-5_20

D. Fouhey, V. Delaitre, A. Gupta, A. Efros, I. Laptev et al., People watching: Human actions as a cue for single-view geometry, ECCV, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01060874

A. Frome, Y. Singer, F. Sha, and J. Malik, Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408839

M. Gharbi, T. Malisiewicz, S. Paris, and F. Durand, A Gaussian approximation of feature space for fast image similarity, 2012.

K. Grauman and T. Darrell, Unsupervised Learning of Categories from Sets of Partially Matching Image Features, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.322

P. Gronat, G. Obozinski, J. Sivic, and T. Pajdla, Learning and calibrating per-location classifiers for visual place recognition, CVPR, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00934332

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459266
URL : https://hal.archives-ouvertes.fr/inria-00439276

A. Gupta and L. S. Davis, Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, ECCV, 2008.
DOI : 10.1007/978-3-540-88682-2_3

A. Gupta, A. Efros, and M. Hebert, Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_35

A. Gupta, P. Srinivasan, J. Shi, and L. Davis, Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206492

D. C. Hauagge and N. Snavely, Image matching using local symmetry features, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247677

M. Hejrati and D. Ramanan, Analyzing 3d objects in cluttered images, NIPS, 2012.

G. E. Hinton, Learning multiple layers of representation, Trends in Cognitive Sciences, vol.11, issue.10, pp.428-434, 2007.
DOI : 10.1016/j.tics.2007.09.004

D. P. Huttenlocher and S. Ullman, Object recognition using alignment, International Conference on Computer Vision, 1987.

A. Irschara, C. Zach, J. Frahm, and H. Bischof, From structure-from-motion point clouds to fast location recognition, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206587

A. Jain, A. Gupta, M. Rodriguez, and L. Davis, Representing Videos Using Mid-level Discriminative Patches, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.332

H. Jegou, M. Douze, and C. Schmid, On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609
URL : https://hal.archives-ouvertes.fr/inria-00394211

H. Jegou, M. Douze, and C. Schmid, Product Quantization for Nearest Neighbor Search, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.1, pp.117-128, 2011.
DOI : 10.1109/TPAMI.2010.57
URL : https://hal.archives-ouvertes.fr/inria-00514462

H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235
URL : https://hal.archives-ouvertes.fr/inria-00633013

J. Jiang and C. Zhai, Instance weighting for domain adaptation in NLP, ACL, 2007.

Y. Jin and S. Geman, Context and hierarchy in a probabilistic image model, CVPR, 2006.

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image co-segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539868

A. Joulin, F. Bach, and J. Ponce, Multi-class cosegmentation, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247719
URL : https://hal.archives-ouvertes.fr/hal-00717448

M. Juneja, A. Vedaldi, C. V. Jawahar, and A. Zisserman, Blocks That Shout: Distinctive Parts for Scene Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.124

B. Kaneva, J. Sivic, A. Torralba, S. Avidan, and W. T. Freeman, Infinite Images: Creating and Exploring a Large Photorealistic Virtual Space, Proceedings of the IEEE, pp.1391-1407, 2010.
DOI : 10.1109/JPROC.2009.2031133

B. Kaneva, J. Sivic, A. Torralba, S. Avidan, and W. T. Freeman, Matching and predicting street level images, ECCV 2010 Workshop on Vision for Cognitive Tasks, 2010.

L. Karlinsky, M. Dinerstein, and S. Ullman, Unsupervised feature optimization (UFO): Simultaneous selection of multiple features with their detection parameters, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206499

A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba, Undoing the Damage of Dataset Bias, ECCV, 2012.
DOI : 10.1007/978-3-642-33718-5_12

J. Knopp, J. Sivic, and T. Pajdla, Avoiding Confusing Features in Place Recognition, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_54

J. Kopf, B. Neubert, B. Chen, M. Cohen, D. Cohen-Or et al., Deep photo: Model-based photograph enhancement and viewing, ACM Transactions on Graphics, vol.27, issue.5, 2008.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

M. P. Kumar, P. H. Torr, and A. Zisserman, An Invariant Large Margin Nearest Neighbour Classifier, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409041

N. Kumar, P. Belhumeur, and S. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces, ECCV, 2008.
DOI : 10.1007/978-3-540-88693-8_25

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

I. Laptev and P. Pérez, Retrieving actions in movies, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409105

S. Lazebnik, C. Schmid, and J. Ponce, Semi-Local Affine Parts for Object Recognition, Proceedings of the British Machine Vision Conference 2004, 2004.
DOI : 10.5244/C.18.98
URL : https://hal.archives-ouvertes.fr/inria-00548542

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen et al., Building high-level features using large scale unsupervised learning, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
DOI : 10.1109/ICASSP.2013.6639343

Q. V. Le, W. Zou, S. Y. Yeung, and A. Y. Ng, Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995496

Y. LeCun, F. J. Huang, and L. Bottou, Learning methods for generic object recognition with invariance to pose and lighting, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., 2004.
DOI : 10.1109/CVPR.2004.1315150

Y. J. Lee and K. Grauman, Shape discovery from unlabeled image collections, CVPR, 2009.

Y. J. Lee and K. Grauman, Learning the easy things first: Self-paced visual category discovery, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995523

G. Levin and P. Debevec, Rouen revisited - interactive installation, 1999.

J. Lezama, K. Alahari, J. Sivic, and I. Laptev, Track to the future: Spatio-temporal video segmentation with long-range motion cues, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.6044588
URL : https://hal.archives-ouvertes.fr/hal-00817961

P. Li, H. Ai, Y. Li, and C. Huang, Video parsing based on head tracking and face recognition, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282288

Y. Li, N. Snavely, D. Huttenlocher, and P. Fua, Worldwide pose estimation using 3D point clouds, ECCV, 2012.

Y. Li, N. Snavely, and D. P. Huttenlocher, Location Recognition Using Prioritized Feature Matching, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_57

J. Lim, H. Pirsiavash, and A. Torralba, Parsing IKEA Objects: Fine Pose Estimation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.372

C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, SIFT Flow: Dense Correspondence across Different Scenes, ECCV, 2008.
DOI : 10.1007/978-3-540-88690-7_3

D. Lowe, Local feature view clustering for 3D object recognition, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990541

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

D. G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artificial Intelligence, vol.31, issue.3, pp.355-395, 1987.
DOI : 10.1016/0004-3702(87)90070-1

J. Luo, B. Caputo, and V. Ferrari, Who's doing what: Joint modeling of names and verbs for simultaneous face and pose annotation, NIPS, 2009.

T. Malisiewicz, A. Gupta, and A. Efros, Ensemble of exemplar-SVMs for object detection and beyond, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126229

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645

M. Marszalek, C. Schmid, H. Harzallah, and J. van de Weijer, Learning object representations for visual object class recognition, Visual Recognition Challenge workshop, ICCV, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00548669

T. Mensink and J. Verbeek, Improving People Search Using Query Expansions, ECCV, 2008.
DOI : 10.1007/978-3-540-88688-4_7
URL : https://hal.archives-ouvertes.fr/inria-00321045

M. Muja and D. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, VISAPP, 2009.

J. L. Mundy, Object Recognition in the Geometric Era: A Retrospective, Toward Category-Level Object Recognition, pp.3-29, 2006.
DOI : 10.1007/11957959_1

P. Musialski, P. Wonka, D. G. Aliaga, M. Wimmer, L. Van Gool et al., A Survey of Urban Reconstruction, Eurographics 2012-State of the Art Reports, 2012.
DOI : 10.1111/cgf.12077

M. Nguyen, L. Torresani, F. De la Torre, and C. Rother, Weakly supervised discriminative localization and classification: a joint learning process, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459426

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, Proc. BMVC, 2006.

D. Nister and H. Stewenius, Scalable Recognition with a Vocabulary Tree, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.264

B. Ommer and J. Buhmann, Learning Compositional Categorization Models, ECCV, 2006.
DOI : 10.1007/11744078_25

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2014.222
URL : https://hal.archives-ouvertes.fr/hal-00911179

V. Ordonez, G. Kulkarni, and T. L. Berg, Im2text: Describing images using 1 million captioned photographs, NIPS, 2011.

R. Osadchy, M. Miller, and Y. LeCun, Synergistic Face Detection and Pose Estimation with Energy-Based Models, NIPS, 2005.
DOI : 10.1007/11957959_10

D. Ozkan and P. Duygulu, Finding People Frequently Appearing in News, CIVR, 2006.
DOI : 10.1007/11788034_18

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383172

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Lost in quantization: Improving particular object retrieval in large scale image databases, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587635

J. Philbin, M. Isard, J. Sivic, and A. Zisserman, Descriptor Learning for Efficient Retrieval, ECCV, 2010.
DOI : 10.1007/978-3-642-15558-1_49

J. Philbin, J. Sivic, and A. Zisserman, Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets, International Journal of Computer Vision, vol.62, issue.2, 2010.
DOI : 10.1007/s11263-010-0363-5
URL : https://hal.archives-ouvertes.fr/hal-01064717

H. Pirsiavash and D. Ramanan, Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248010

T. Quack, B. Leibe, and L. Van Gool, World-scale mining of objects and events from community photo collections, Proceedings of the 2008 international conference on Content-based image and video retrieval, CIVR '08, 2008.
DOI : 10.1145/1386352.1386363

D. Ramanan, S. Baker, and S. Kakade, Leveraging archival video for building face datasets, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409012
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.4964

J. B. Rapp, A geometrical analysis of multiple viewpoint perspective in the work of Giovanni Battista Piranesi: an application of geometric restitution of perspective, The Journal of Architecture, vol.13, issue.6, 2008.
DOI : 10.1080/13602360802573868

X. Ren and D. Ramanan, Histograms of Sparse Codes for Object Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.417

L. Roberts, Machine perception of 3-d solids, 1965.

M. Rodriguez, I. Laptev, J. Sivic, and J. Audibert, Density-aware person detection and tracking in crowds, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126526
URL : https://hal.archives-ouvertes.fr/hal-00654266

M. Rodriguez, J. Sivic, I. Laptev, and J. Audibert, Data-driven crowd analysis in videos, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126374
URL : https://hal.archives-ouvertes.fr/hal-00654256

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., 2003.
DOI : 10.1109/CVPR.2003.1211480
URL : https://hal.archives-ouvertes.fr/inria-00548224

B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman, Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.326

B. C. Russell, J. Sivic, J. Ponce, and H. Dessales, Automatic alignment of paintings and photographs depicting a 3D scene, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp.3-11, 2011.
DOI : 10.1109/ICCVW.2011.6130291
URL : https://hal.archives-ouvertes.fr/hal-01053879

B. C. Russell and A. Torralba, Building a database of 3D scenes from user annotations, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206643

B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, LabelMe: A Database and Web-Based Tool for Image Annotation, International Journal of Computer Vision, vol.77, issue.1-3, pp.157-173, 2008.
DOI : 10.1007/s11263-007-0090-8

K. Saenko, B. Kulis, M. Fritz, and T. Darrell, Adapting Visual Category Models to New Domains, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_16
URL : http://hdl.handle.net/11858/00-001M-0000-0017-E577-9

S. Satkin, J. Lin, and M. Hebert, Data-Driven Scene Understanding from 3D Models, Proceedings of the British Machine Vision Conference 2012, 2012.
DOI : 10.5244/C.26.128

T. Sattler, B. Leibe, and L. Kobbelt, Fast image-based localization using direct 2D-to-3D matching, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126302

F. Schaffalitzky and A. Zisserman, Automated location matching in movies, Computer Vision and Image Understanding, vol.92, issue.2-3, pp.236-264, 2003.
DOI : 10.1016/j.cviu.2003.06.008

G. Schindler, M. Brown, and R. Szeliski, City-Scale Location Recognition, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383150

G. Schindler, F. Dellaert, and S. B. Kang, Inferring Temporal Order of Images From 3D Structure, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383088

C. Schmid and R. Mohr, Local grayvalue invariants for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.19, issue.5, pp.530-534, 1997.
DOI : 10.1109/34.589215
URL : https://hal.archives-ouvertes.fr/inria-00548358

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004.
DOI : 10.1109/ICPR.2004.1334462

S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273598

E. Shechtman and M. Irani, Space-Time Behavior Based Correlation, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.328

E. Shechtman and M. Irani, Matching Local Self-Similarities across Images and Videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383198
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1297

J. Shi and J. Malik, Normalized cuts and image segmentation, CVPR, 1997.

A. Shrivastava, T. Malisiewicz, A. Gupta, and A. A. Efros, Data-driven visual similarity for cross-domain image matching, ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2011.

C. Silpa-Anan and R. Hartley, Localization using an image-map, ACRA, 2004.

S. Singh, A. Gupta, and A. A. Efros, Unsupervised Discovery of Mid-Level Discriminative Patches, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_6

J. Sivic, M. Everingham, and A. Zisserman, Person Spotting: Video Shot Retrieval for Face Sets, International Conference on Image and Video Retrieval, 2005.
DOI : 10.1007/11526346_26

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?" - Learning person specific classifiers from video, CVPR, 2009.

J. Sivic, B. Kaneva, A. Torralba, S. Avidan, and W. T. Freeman, Creating and exploring a large photorealistic virtual space, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008.
DOI : 10.1109/CVPRW.2008.4562950

J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, Discovering objects and their location in images, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.77

J. Sivic, B. C. Russell, A. Zisserman, W. T. Freeman, and A. A. Efros, Unsupervised discovery of visual object class hierarchies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587622

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

J. Sivic and A. Zisserman, Efficient Visual Search of Videos Cast as Text Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.4, pp.591-606, 2009.
DOI : 10.1109/TPAMI.2008.111

Z. Song, Q. Chen, Z. Huang, Y. Hua, and S. Yan, Contextualizing object detection and classification, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995330

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.660.6015

E. Sudderth and M. Jordan, Shared segmentation of natural scenes using dependent Pitman-Yor processes, NIPS, 2008.

R. Szeliski, Image Alignment and Stitching, pp.1-104, 2006.
DOI : 10.1007/0-387-28831-7_17

M. Tapaswi, M. Bauml, and R. Stiefelhagen, "Knock! Knock! Who is it?" Probabilistic person identification in TV-series, CVPR, 2012.

G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, Convolutional Learning of Spatio-temporal Features, ECCV, 2010.
DOI : 10.1007/978-3-642-15567-3_11

S. Todorovic and N. Ahuja, Extracting Subimages of an Unknown Category from a Set of Images, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.116

T. Tommasi, F. Orabona, and B. Caputo, Safety in numbers: Learning categories from few examples with multi model knowledge transfer, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540064

A. Torii, J. Sivic, and T. Pajdla, Visual localization by linear combination of image descriptors, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011.
DOI : 10.1109/ICCVW.2011.6130230

URL : https://hal.archives-ouvertes.fr/hal-01053880

A. Torii, J. Sivic, T. Pajdla, and M. Okutomi, Visual place recognition with repetitive structures, CVPR, 2013.
DOI : 10.1109/tpami.2015.2409868

URL : https://hal.archives-ouvertes.fr/hal-00934288

A. Torralba and A. A. Efros, Unbiased look at dataset bias, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995347

A. Torralba, K. P. Murphy, and W. T. Freeman, Sharing features: efficient boosting procedures for multiclass object detection, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.
DOI : 10.1109/CVPR.2004.1315241

P. Turcot and D. Lowe, Better matching with fewer features: The selection of useful features in large database recognition problems, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009.
DOI : 10.1109/ICCVW.2009.5457541

R. Vaillant, C. Monrocq, and Y. LeCun, Original approach for the localisation of objects in images, IEE Proceedings - Vision, Image, and Signal Processing, vol.141, issue.4, pp.245-250, 1994.
DOI : 10.1049/ip-vis:19941301

N. Vasconcelos, Image indexing with mixture hierarchies, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990449

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple classifiers, CVPR, 2001.

G. Wang, Y. Zhang, and L. Fei-Fei, Using Dependent Regions for Object Categorization in a Generative Framework, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.324

Y. Wang and G. Mori, A discriminative latent model of image region and object tag correspondence, NIPS, 2010.

J. Winn, A. Criminisi, and T. Minka, Object categorization by learned universal visual dictionary, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.171

C. Wu, B. Clipp, X. Li, J. Frahm, and M. Pollefeys, 3D model matching with viewpoint invariant patches (VIPs), CVPR, 2008.

J. Xiao, B. Russell, and A. Torralba, Localizing 3D cuboids in single-view images, NIPS, 2012.

L. Xu, J. Neufeld, B. Larson, and D. Schuurmans, Maximum margin clustering, NIPS, 2004.

J. Yang, A. Hauptmann, and M. Chen, Finding Person X: Correlating Names with Visual Appearances, CIVR, 2004.
DOI : 10.1007/978-3-540-27814-6_34

J. Yang, Y. Rong, and A. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, 2005.
DOI : 10.1145/1101149.1101155

L. Zelnik-Manor and M. Irani, Event-based analysis of video, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990935

L. Zhu, Y. Chen, and A. Yuille, Unsupervised learning of a probabilistic grammar for object detection and parsing, NIPS, 2006.

M. Zia, M. Stark, B. Schiele, and K. Schindler, Detailed 3D Representations for Object Recognition and Modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.11, 2013.
DOI : 10.1109/TPAMI.2013.87

A. Zweig and D. Weinshall, Exploiting Object Hierarchy: Combining Models from Different Category Levels, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4409064