G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual Categorization with Bags of Keypoints, Workshop on Statistical Learning in Computer Vision, Proceedings of the European Conference on Computer Vision (ECCVW), 2004.

J. Daugman, Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.36, issue.7, pp.1169-1179, 1988.
DOI : 10.1109/29.1644
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.371.5847

T. Deselaers and V. Ferrari, Visual and semantic similarity in ImageNet, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
DOI : 10.1109/CVPR.2011.5995474
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.553

T. Deselaers, B. Alexe, and V. Ferrari, Weakly Supervised Localization and Learning with Generic Knowledge, International Journal of Computer Vision (IJCV), vol.100, issue.3, pp.275-293, 2012.

J. Donahue et al., Semi-supervised Domain Adaptation with Instance Constraints, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.92
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.1392

Y. LeCun et al., Gradient-based Learning Applied to Document Recognition, Proceedings of the IEEE, pp.2278-2324, 1998.

H. Lee et al., Efficient Sparse Coding Algorithms, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2006.

H. Li et al., Co-Salient Object Detection From Multiple Images, IEEE Transactions on Multimedia, vol.15, issue.8, pp.1896-1909, 2013.
DOI : 10.1109/TMM.2013.2271476

Y. Li et al., Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1007/978-3-319-46475-6_2
URL : http://arxiv.org/abs/1603.04619

M. Lin et al., Network In Network, International Conference on Learning Representations (ICLR), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01460127

T.-Y. Lin et al., Microsoft COCO: Common Objects in Context, Proceedings of the European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/abs/1405.0312

D. Lin, An Information-Theoretic Definition of Similarity, Proceedings of the International Conference on Machine Learning (ICML), 1998.

T. Lindeberg, Feature Detection with Automatic Scale Selection, International Journal of Computer Vision, vol.30, issue.2, pp.79-116, 1998.
DOI : 10.1023/A:1008045108935

W. Liu et al., SSD: Single Shot MultiBox Detector, Proceedings of the European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1007/978-3-642-33712-3_25
URL : http://arxiv.org/pdf/1512.02325

D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999.
DOI : 10.1109/ICCV.1999.790410
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.4065

D. Lowe, Local feature view clustering for 3D object recognition, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001.
DOI : 10.1109/CVPR.2001.990541
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.4907

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, 1967.

J. Mairal et al., Supervised Dictionary Learning, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2009.
URL : https://hal.archives-ouvertes.fr/inria-00322431

A. Makhzani and B. J. Frey, k-Sparse Autoencoders, International Conference on Learning Representations (ICLR), 2014.

O. Maron and A. L. Ratan, Multiple-Instance Learning for Natural Scene Classification, Proceedings of the International Conference on Machine Learning (ICML), 1998.

J. Matas et al., Robust wide-baseline stereo from maximally stable extremal regions, Image and Vision Computing, vol.22, issue.10, pp.761-767, 2004.
DOI : 10.1016/j.imavis.2004.02.006
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.8241

K. Mikolajczyk and C. Schmid, An Affine Invariant Interest Point Detector, Proceedings of the European Conference on Computer Vision (ECCV), 2002.
DOI : 10.1007/3-540-47969-4_9
URL : https://hal.archives-ouvertes.fr/inria-00548252

K. Mikolajczyk and C. Schmid, Scale & Affine Invariant Interest Point Detectors, International Journal of Computer Vision, vol.60, issue.1, pp.63-86, 2004.
DOI : 10.1023/B:VISI.0000027790.02288.f2
URL : https://hal.archives-ouvertes.fr/inria-00548554

K. Mikolajczyk, C. Schmid, and A. Zisserman, Human Detection Based on a Probabilistic Assembly of Robust Part Detectors, Proceedings of the European Conference on Computer Vision (ECCV), 2004.
DOI : 10.1007/978-3-540-24670-1_6
URL : https://hal.archives-ouvertes.fr/inria-00548537

K. Mikolajczyk et al., A Comparison of Affine Region Detectors, International Journal of Computer Vision, vol.65, issue.1-2, pp.43-72, 2005.
DOI : 10.1007/s11263-005-3848-x
URL : https://hal.archives-ouvertes.fr/inria-00548528

T. Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2013.

T. Mikolov et al., Linguistic Regularities in Continuous Space Word Representations, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2013.

I. Misra et al., Watch and learn: Semi-supervised learning of object detectors from videos, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298982
URL : http://arxiv.org/abs/1505.05769

H. Murase and S. K. Nayar, Visual learning and recognition of 3-D objects from appearance, International Journal of Computer Vision, vol.14, issue.1, pp.5-24, 1995.
DOI : 10.1007/BF01421486

V. Nair and G. E. Hinton, Rectified Linear Units Improve Restricted Boltzmann Machines, Proceedings of the International Conference on Machine Learning (ICML), pp.807-814, 2010.

W. Ouyang et al., DeepID-Net: Deformable deep convolutional neural networks for object detection, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298854
URL : http://arxiv.org/abs/1412.5661

M. Pandey and S. Lazebnik, Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models, Proceedings of the International Conference on Computer Vision (ICCV), 2011.

D. K. Park et al., Efficient use of local edge histogram descriptor, Proceedings of the 2000 ACM Workshops on Multimedia (MULTIMEDIA '00), 2000.
DOI : 10.1145/357744.357758

G. Pass et al., Comparing images using color coherence vectors, Proceedings of the Fourth ACM International Conference on Multimedia (MULTIMEDIA '96), 1996.
DOI : 10.1145/244130.244148
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.9596

M. Pedersoli et al., A Coarse-to-fine Approach for Fast Deformable Object Detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
DOI : 10.1016/j.patcog.2014.11.006
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.422.3844

J. Pennington et al., GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1162
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.645.8863

F. Perronnin et al., Improving the Fisher Kernel for Large-Scale Image Classification, Proceedings of the European Conference on Computer Vision (ECCV), 2010.
DOI : 10.1007/978-3-642-15561-1_11
URL : https://hal.archives-ouvertes.fr/inria-00548630

C. Rosenberg et al., Semi-Supervised Self-Training of Object Detection Models, IEEE Winter Conference on Applications of Computer Vision (WACV), 2005.

S. Rothe and H. Schütze, AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
DOI : 10.3115/v1/P15-1173
URL : http://arxiv.org/abs/1507.01127

O. Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.

O. Russakovsky, Scaling Up Object Detection, 2015.

J. Sánchez et al., Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

R. Schapire, The Boosting Approach to Machine Learning: An Overview, MSRI Workshop on Nonlinear Estimation and Classification, 2001.
DOI : 10.1007/978-0-387-21579-2_9

C. Schmid, R. Mohr, and C. Bauckhage, Evaluation of Interest Point Detectors, International Journal of Computer Vision, vol.37, issue.2, pp.151-172, 2000.
DOI : 10.1023/A:1008199403446
URL : https://hal.archives-ouvertes.fr/inria-00548302

P. Sermanet et al., OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, International Conference on Learning Representations (ICLR), 2014.

P. Siva et al., In Defence of Negative Mining for Annotating Weakly Labelled Data, Proceedings of the European Conference on Computer Vision (ECCV), 2012.

Z. Song et al., Contextualizing object detection and classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1585-1592, 2011.
DOI : 10.1109/CVPR.2011.5995330

H. O. Song et al., On Learning to Localize Objects with Minimal Supervision, Proceedings of the International Conference on Machine Learning (ICML), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00996849

H. O. Song et al., Weakly-supervised Discovery of Visual Pattern Configurations, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2014.

N. Srebro et al., Maximum-Margin Matrix Factorization, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2005.

M. J. Swain and D. H. Ballard, Color indexing, International Journal of Computer Vision, vol.7, issue.1, pp.11-32, 1991.
DOI : 10.1007/BF00130487

C. Szegedy et al., Deep Neural Networks for Object Detection, Proceedings of Advances in Neural Information Processing Systems (NIPS), 2013.

C. Szegedy et al., Going Deeper with Convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

K. Tang et al., Co-localization in Real-World Images, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.190
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.645.4429

Y. Tang et al., Fusing generic objectness and deformable part-based models for weakly supervised object detection, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025827
URL : https://hal.archives-ouvertes.fr/hal-01301105

Y. Tang et al., Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.233
URL : https://hal.archives-ouvertes.fr/hal-01488579

Y. Tang et al., Weakly Supervised Learning of Deformable Part-Based Models for Object Detection via Region Proposals, IEEE Transactions on Multimedia, vol.19, issue.2, 2016.
DOI : 10.1109/TMM.2016.2614862
URL : https://hal.archives-ouvertes.fr/hal-01488575

E. Trulls et al., Segmentation-Aware Deformable Part Models, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.29
URL : https://hal.archives-ouvertes.fr/hal-01109286

T. Tuytelaars, Dense interest points, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539911