P. Agrawal, R. Girshick, and J. Malik, Analyzing the Performance of Multilayer Neural Networks for Object Recognition, European Conference on Computer Vision, ECCV, 2014.
DOI : 10.1007/978-3-319-10584-0_22

K. Ahmed, M. H. Baig, and L. Torresani, Network of Experts for Large-Scale Image Categorization, European Conference on Computer Vision, ECCV, 2016.

Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Good practice in large-scale learning for image classification. Pattern Analysis and Machine Intelligence, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00690014

R. Aljundi, P. Chakravarty, and T. Tuytelaars, Expert Gate: Lifelong Learning with a Network of Experts, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.753

G. Andrew, R. Arora, J. Bilmes, and K. Livescu, Deep canonical correlation analysis, International Conference on Machine Learning, ICML, 2013.

S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra et al., VQA: Visual Question Answering, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.279

J. Atif, C. Hudelot, G. Fouquier, I. Bloch, and E. D. Angelini, From generic knowledge to specific reasoning for medical image interpretation using graph based representations, International Joint Conference on Artificial Intelligence, IJCAI, 2007.

J. Atkinson, The developing visual brain, 2002.
DOI : 10.1093/acprof:oso/9780198525998.001.0001

S. Avila, N. Thome, M. Cord, E. Valle, and A. D. Araújo, Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, 2013.
DOI : 10.1016/j.cviu.2012.09.007

URL : https://hal.archives-ouvertes.fr/hal-01172709

Y. Aytar, C. Vondrick, and A. Torralba, See, hear, and read: Deep aligned representations, 2017.

H. Azizpour, A. Razavian, J. Sullivan, A. Maki, and S. Carlsson, Factors of transferability for a generic convnet representation. Pattern Analysis and Machine Intelligence, 2015.

S. Bai, S. Agethen, T. Chao, and W. Hsu, Semi-supervised learning for convolutional neural networks via online graph construction. arXiv preprint, 2015.

H. Bannour and C. Hudelot, Towards ontologies for image interpretation and annotation, 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), 2011.
DOI : 10.1109/CBMI.2011.5972547

URL : https://hal.archives-ouvertes.fr/hal-00825255

H. Bannour and C. Hudelot, Building and using fuzzy multimedia ontologies for semantic image annotation, Transactions on Multimedia Tools and Applications, 2014.
DOI : 10.1109/JPROC.2010.2050411

URL : https://hal.archives-ouvertes.fr/hal-00825169

M. Bar and S. Ullman, Spatial Context in Recognition, Perception, vol.25, issue.3, 1996.
DOI : 10.1068/p250343

E. Baralis, S. Chiusano, and P. Garza, A Lazy Approach to Associative Classification, IEEE Transactions on Knowledge and Data Engineering, vol.20, issue.2, 2008.
DOI : 10.1109/TKDE.2007.190677

C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, Patchmatch: A randomized correspondence algorithm for structural image editing, ACM Transactions on Graphics, 2009.

D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba, Network Dissection: Quantifying Interpretability of Deep Visual Representations, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.354

S. Bell, P. Upchurch, N. Snavely, and K. Bala, Material recognition in the wild with the materials in context database (supplemental material), Computer Vision and Pattern Recognition, CVPR, 2015.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, PAMI, 2013.

A. Bergamo and L. Torresani, Meta-class features for large-scale object categorization on a budget, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248040

A. Bergamo, L. Torresani, and A. W. Fitzgibbon, Picodes: Learning a compact code for novel-category recognition, Advances in Neural Information Processing Systems, NIPS, 2011.

H. Bilen and A. Vedaldi, Universal representations: The missing link between faces, text, planktons, and cat breeds, 2017.

S. Bird, E. Klein, and E. Loper, Natural language processing with Python, 2009.

D. M. Blei and M. I. Jordan, Modeling annotated data, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, 2003.
DOI : 10.1145/860435.860460

M. Blot, T. Robert, N. Thome, and M. Cord, Shade: Information-based regularization for deep learning, International Conference on Image Processing, ICIP, 2018.
DOI : 10.1109/icip.2018.8451092

URL : https://hal.archives-ouvertes.fr/hal-01994740

F. Bousefsaf, M. Tamaazousti, S. H. Said, and R. Michel, Image completion using multispectral imaging. IET Image Processing, 2018.

M. Brady, Artificial intelligence and robotics, Artificial intelligence, 1985.

S. Brodeur, E. Perez, A. Anand, F. Golemo, L. Celotti et al., Home: A household multimodal environment, International Conference on Learning Representations, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01653037

M. Bucher, S. Herbin, and F. Jurie, Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification, European Conference on Computer Vision, ECCV, 2016.

F. Carrara, A. Esuli, T. Fagni, F. Falchi, and A. M. Fernández, Picture it in your mind: generating high level visual representations from textual descriptions, Information Retrieval Journal, vol.78, issue.10, 2017.
DOI : 10.1109/5.58337

URL : http://arxiv.org/pdf/1606.07287

F. Chabot, M. Chaouch, J. Rabarisoa, C. Teulière, and T. Chateau, Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.198

URL : https://hal.archives-ouvertes.fr/hal-01653519

I. Chami, Y. Tamaazousti, and H. Le-borgne, AMECON, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, 2017.

URL : https://hal.archives-ouvertes.fr/cea-01813718

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.6

X. Chen and A. Gupta, Webly Supervised Learning of Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.168

G. Chican and M. Tamaazousti, Constrained PatchMatch for Image Completion, Advances in Visual Computing, 2014.
DOI : 10.1007/978-3-319-14249-4_53

URL : https://hal.archives-ouvertes.fr/cea-01836543

F. Chollet, Xception: Deep Learning with Depthwise Separable Convolutions, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.195

URL : http://arxiv.org/pdf/1610.02357

T. Chua, J. Tang, R. Hong, H. Li, Z. Luo et al., NUS-WIDE, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646452

M. Cimpoi, S. Maji, I. Kokkinos, and A. Vedaldi, Deep Filter Banks for Texture Recognition, Description, and Segmentation, International Journal of Computer Vision, vol.83, issue.1, 2016.
DOI : 10.1007/978-3-642-15555-0_11

URL : https://hal.archives-ouvertes.fr/hal-01263622

G. Collell, T. Zhang, and M. Moens, Imagined visual representations as multimodal embeddings, Association for the Advancement of Artificial Intelligence, AAAI, 2017.

A. Conneau and D. Kiela, Senteval: An evaluation toolkit for universal sentence representations. arXiv preprint, 2018.

A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, Supervised learning of universal sentence representations from natural language inference data. arXiv preprint, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01897968

A. Conneau, G. Kruszewski, G. Lample, L. Barrault, and M. Baroni, What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv preprint, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01898412

A. Conneau, H. Schwenk, L. Barrault, and Y. Lecun, Very Deep Convolutional Networks for Text Classification, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 2017.
DOI : 10.18653/v1/E17-1104

URL : https://hal.archives-ouvertes.fr/hal-01454940

J. C. Pereira, E. Coviello, G. Doyle, N. Rasiwasia et al., On the role of correlation and abstraction in cross-modal multimedia retrieval. Pattern Analysis and Machine Intelligence, 2014.

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, European Conference on Computer Vision, ECCV-Workshop, 2004.

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 1989.

H. Daher, R. Besançon, O. Ferret, H. Le-borgne, A. Daquo et al., Désambiguïsation d'entités nommées par apprentissage de modèles d'entités à large échelle, COnférence en Recherche d'Information et Applications, 2017.

H. Daher, R. Besançon, O. Ferret, H. Le-borgne, A. Daquo et al., Supervised learning of entity disambiguation models by negative sample selection, International Conference on Computational Linguistics and Intelligent Text Processing, CICLing, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01857878

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177

URL : https://hal.archives-ouvertes.fr/inria-00548512

A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav et al., Visual dialog, Computer Vision and Pattern Recognition, CVPR, 2017.

J. Deng, N. Ding, Y. Jia, A. Frome, K. Murphy et al., Large-Scale Object Classification Using Label Relation Graphs, European Conference on Computer Vision, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_4

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206848

J. Deng, J. Krause, A. C. Berg, and L. Fei-fei, Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition, Computer Vision and Pattern Recognition, CVPR, 2012.

P. Dollar and C. L. Zitnick, Fast edge detection using structured forests. Pattern Analysis and Machine Intelligence, PAMI, 2015.

J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan et al., Long-term recurrent convolutional networks for visual recognition and description, Computer Vision and Pattern Recognition, CVPR, 2015.

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., Decaf: A deep convolutional activation feature for generic visual recognition, International Conference on Machine Learning, ICML, 2014.

J. Dong, X. Li, and C. G. Snoek, Predicting Visual Features from Text for Image and Video Caption Retrieval, IEEE Transactions on Multimedia, 2018.
DOI : 10.1109/TMM.2018.2832602

D. Kodelja, R. Besançon, and O. Ferret, Intégration de contexte global par amorçage pour la détection d'événements, Conférence sur le Traitement Automatique des Langues Naturelles, TALN, 2018.

T. Durand, Weakly supervised learning for visual recognition, 2017.
URL : https://hal.archives-ouvertes.fr/tel-01635374

T. Durand, T. Mordan, N. Thome, and M. Cord, WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.631

URL : https://hal.archives-ouvertes.fr/hal-01515640

T. Durand, D. Picard, N. Thome, and M. Cord, Semantic pooling for image categorization using multiple kernel learning, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025033

URL : https://hal.archives-ouvertes.fr/hal-01077046

T. Durand, N. Thome, and M. Cord, WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.513

URL : https://hal.archives-ouvertes.fr/hal-01343785

T. Durand, N. Thome, and M. Cord, Exploiting negative evidence for deep latent structured models. Pattern Analysis and Machine Intelligence, PAMI, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01969819

T. Durand, N. Thome, M. Cord, and S. Avila, Image classification using object detectors, 2013 IEEE International Conference on Image Processing, 2013.
DOI : 10.1109/ICIP.2013.6738894

URL : https://hal.archives-ouvertes.fr/hal-01078079

A. Dutt, D. Pellerin, and G. Quenot, Improving Image Classification using Coarse and Fine Labels, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01590672

A. Eisenschtat and L. Wolf, Linking Image and Text with 2-Way Nets, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.201

URL : http://arxiv.org/pdf/1608.07973

M. Engilberge, L. Chevallier, P. Pérez, and M. Cord, Finding beans in burgers: Deep semantic-visual embedding with localization, Computer Vision and Pattern Recognition, CVPR, 2018.

D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, 2009.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2012.
DOI : 10.1371/journal.pcbi.0040027

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

F. Faghri, D. J. Fleet, J. R. Kiros, and S. Fidler, Vse++: Improving visual-semantic embeddings with hard negatives. arXiv preprint, 2017.

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories. Pattern Analysis and Machine Intelligence, PAMI, 2006.

Y. Feng and M. Lapata, Topic models for image annotation and text illustration, ACL Human Language Technologies, HLT, 2010.

R. M. French, Catastrophic forgetting in connectionist networks, Trends in Cognitive Sciences, vol.3, issue.4, pp.128-135, 1999.
DOI : 10.1016/S1364-6613(99)01294-2

A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean et al., Devise: A deep visual-semantic embedding model, Advances in Neural Information Processing Systems, NIPS, 2013.

V. Gay-bellile, S. Bourgeois, M. Tamaazousti, S. Naudet-collette, and S. Knodel, A mobile markerless augmented reality system for the automotive field, International Symposium on Mixed and Augmented Reality Workshop, 2012.

V. Gay-bellile, M. Tamaazousti, R. Dupont, and S. Naudet-collette, A vision-based hybrid system for real-time accurate localization in an indoor environment, International Conference on Computer Vision Theory and Applications, 2010.

P. Gehler and S. Nowozin, On feature combination for multiclass object classification, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459169

A. L. Ginsca, A. Popescu, H. Le-borgne, N. Ballas, P. Vo et al., Large-Scale Image Mining with Flickr Groups, International Conference on Multimedia Modelling, MM, 2015.
DOI : 10.1007/978-3-319-14445-0_28

URL : https://hal.archives-ouvertes.fr/hal-01172319

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

H. Goh, N. Thome, M. Cord, and J. Lim, Top-down regularization of deep belief networks, NIPS, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00947569

C. Gomez, H. Le-borgne, P. Allemand, C. Delacourt, and P. Ledru, N-FindR method versus independent component analysis for lithological identification in hyperspectral imagery, International Journal of Remote Sensing, vol.14, issue.23, 2007.
DOI : 10.1109/36.54356

URL : https://hal.archives-ouvertes.fr/insu-00273351

Y. Gong, Q. Ke, M. Isard, and S. Lazebnik, A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics, International Journal of Computer Vision, vol.22, issue.12, 2014.
DOI : 10.1109/TPAMI.2008.127

Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik, Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections, European Conference on Computer Vision, ECCV, 2014.
DOI : 10.1007/978-3-319-10593-2_35

A. Gonzalez-garcia, D. Modolo, and V. Ferrari, Do semantic parts emerge in convolutional neural networks? arXiv preprint, 2016.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, NIPS, 2014.

I. J. Goodfellow, D. Warde-farley, M. Mirza, A. Courville, and Y. Bengio, Maxout networks, International Conference on Machine Learning, ICML, 2013.

A. Gordoa, J. A. Rodríguez-serrano, F. Perronnin, and E. Valveny, Leveraging category-level labels for instance-level image retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248035

K. Grauman and T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.239

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, 2007.

S. H. Said, M. Tamaazousti, and A. Bartoli, Image-Based Models for Specularity Propagation in Diminished Reality, IEEE Transactions on Visualization and Computer Graphics, vol.24, issue.7, 2017.
DOI : 10.1109/TVCG.2017.2705687

URL : https://hal.archives-ouvertes.fr/hal-01657292

D. R. Hardoon, S. R. Szedmak, and J. R. Shawe-taylor, Canonical Correlation Analysis: An Overview with Application to Learning Methods, Neural Computation, vol.16, issue.12, 2004.
DOI : 10.1162/0899766042321814

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, European Conference on Computer Vision, ECCV, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90

L. Herranz, S. Jiang, and X. Li, Scene Recognition with CNNs: Objects, Scales and Dataset Bias, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.68

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, G. Klambauer et al., Gans trained by a two time-scale update rule converge to a nash equilibrium. arXiv preprint, 2017.

G. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, 2006.
DOI : 10.1162/neco.2006.18.7.1527

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, 2015.

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, 1997.
DOI : 10.1162/neco.1997.9.8.1735

M. Hodosh, P. Young, and J. Hockenmaier, Framing image description as a ranking task: Data, models and evaluation metrics, Journal of Artificial Intelligence Research, 2013.

Z. Hu, X. Ma, Z. Liu, E. H. Hovy, and E. P. Xing, Harnessing Deep Neural Networks with Logic Rules, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
DOI : 10.18653/v1/P16-1228

G. Huang, Z. Liu, K. Q. Weinberger, and L. Van-der-maaten, Densely Connected Convolutional Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.243

Y. Huang, Q. Wu, and L. Wang, Learning semantic concepts and order for image and sentence matching, Computer Vision and Pattern Recognition, CVPR, 2017.

C. Hudelot, J. Atif, and I. Bloch, Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00824590

C. Hudelot, J. Atif, and I. Bloch, Alc(f): A new description logic for spatial reasoning in images, European Conference on Computer Vision, ECCV Workshop, 2014.

M. Huh, P. Agrawal, and A. A. Efros, What makes imagenet good for transfer learning?, Advances in Neural Information Processing Systems, 2016.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint, 2015.

A. Jaimes and S. Chang, Conceptual framework for indexing visual information at multiple levels, Internet Imaging, International Society for Optics and Photonics, pp.2-16, 1999.

M. Jain, J. C. Van-gemert, T. Mensink, and C. G. Snoek, Objects2action: Classifying and Localizing Actions without Any Video Example, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.521

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540039

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

L. Li, H. Su, L. Fei-fei, and E. P. Xing, Object bank: A high-level image representation for scene classification & semantic feature sparsification, Advances in Neural Information Processing Systems, NIPS, 2010.

J. Johnson, R. Krishna, M. Stark, L. Li, D. A. Shamma et al., Image retrieval using scene graphs, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298990

R. Johnson and T. Zhang, Semi-supervised convolutional neural networks for text categorization via region embedding, Advances in Neural Information Processing Systems, NIPS, 2015.

P. Jolicoeur, M. A. Gluck, and S. M. Kosslyn, Pictures and names: Making the connection, Cognitive Psychology, vol.16, issue.2, 1984.
DOI : 10.1016/0010-0285(84)90009-4

C. Jörgensen, Attributes of images in describing tasks, Information Processing & Management, vol.34, issue.2-3, pp.161-174, 1998.
DOI : 10.1016/S0306-4573(97)00077-0

A. Joulin, L. Van-der-maaten, A. Jabri, and N. Vasilache, Learning Visual Features from Large Weakly Supervised Data, European Conference on Computer Vision, ECCV, 2016.

L. Kaiser, A. N. Gomez, N. Shazeer, A. Vaswani, N. Parmar et al., One model to learn them all. arXiv preprint, 2017.

A. Karpathy and L. Fei-fei, Deep visual-semantic alignments for generating image descriptions, Computer Vision and Pattern Recognition, CVPR, 2015.

A. Karpathy, A. Joulin, and F. F. Li, Deep fragment embeddings for bidirectional image sentence mapping, Advances in Neural Information Processing Systems, NIPS, 2014.

D. Kiela, A. Conneau, A. Jabri, and M. Nickel, Learning Visually Grounded Sentence Representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018.
DOI : 10.18653/v1/N18-1038

Y. Kim, Convolutional neural networks for sentence classification. arXiv preprint, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization. arXiv preprint, 2014.

R. Kiros, R. Salakhutdinov, and R. S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, 2014.

B. Klein, G. Lev, G. Sadeh, and L. Wolf, Fisher vectors derived from hybrid gaussian-laplacian mixture models for image annotation, 2014.

D. Kodelja, R. Besançon, and O. Ferret, Représentations et modèles en extraction d'événements supervisée, Rencontres des Jeunes Chercheurs en Intelligence Artificielle.

I. Kokkinos, UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.579

J. Krause, M. Stark, J. Deng, and L. Fei-fei, 3D Object Representations for Fine-Grained Categorization, 2013 IEEE International Conference on Computer Vision Workshops, 2013.
DOI : 10.1109/ICCVW.2013.77

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, NIPS, 2012.

L. I. Kuncheva and J. J. Rodriguez, Classifier Ensembles with a Random Linear Oracle, IEEE Transactions on Knowledge and Data Engineering, vol.19, issue.4, 2007.
DOI : 10.1109/TKDE.2007.1016

S. S. Layne, Some issues in the indexing of images, Journal of the American Society for Information Science, vol.45, issue.8, 1994.
DOI : 10.1002/(SICI)1097-4571(199409)45:8<583::AID-ASI13>3.0.CO;2-N

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

H. Le-borgne, E. Gadeski, I. Chami, T. Q. Tran et al., Image annotation and two paths to text illustration, CLEF 2016 Evaluation Labs and Workshop, Online Working Notes, 2016.
URL : https://hal.archives-ouvertes.fr/cea-01843172

H. Le-borgne and A. Guérin-dugué, Sparse-dispersed coding and images discrimination with independent component analysis, International Conference on ICA and BSS, 2001.

H. Le-borgne, A. Guérin-dugué, and A. Antoniadis, Representation of images for classification with independent features, Pattern Recognition Letters, 2004.

H. Le-borgne, A. Guérin-dugué, and N. E. O'Connor, Learning midlevel image features for natural scene and texture classification. IEEE Transactions on Circuits and Systems for Video Technology, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00276879

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.19

Y. Li, W. Ouyang, B. Zhou, K. Wang, and X. Wang, Scene Graph Generation from Objects, Phrases and Region Captions, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.142

Y. Li, J. Yosinski, J. Clune, H. Lipson, and J. Hopcroft, Convergent learning: Do different neural networks learn the same representations, International Conference on Learning Representations, ICLR, 2016.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, European Conference on Computer Vision, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_48

X. Ling, S. Singh, and D. Weld, Design challenges for entity linking, Transactions of the Association for Computational Linguistics, 2015.

G. Litjens, T. Kooi, B. E. Bejnordi, A. A. Setio, F. Ciompi et al., A survey on deep learning in medical image analysis, Medical Image Analysis, 2017.
DOI : 10.1016/j.media.2017.07.005

URL : http://arxiv.org/pdf/1702.05747

L. Liu, L. Wang, and X. Liu, In defense of soft-assignment coding, International Conference on Computer Vision, ICCV, 2011.

W. Liu, A. Rabinovich, and A. C. Berg, Parsenet: Looking wider to see better, International Conference on Learning Representations, ICLR Workshop, 2016.

Y. Liu, Y. Guo, E. M. Bakker, and M. S. Lew, Learning a Recurrent Residual Fusion Network for Multimodal Matching, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.442

K. Longi, T. Pulkkinen, and A. Klami, Semi-supervised convolutional neural networks for identifying wi-fi interference sources, Asian Conference on Machine Learning, ACML, 2017.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

URL : http://www.cs.ubc.ca/~lowe/papers/ijcv03.ps

C. Lu, R. Krishna, M. S. Bernstein, and F. Li, Visual Relationship Detection with Language Priors, European Conference on Computer Vision, ECCV, 2016.

URL : http://arxiv.org/pdf/1608.00187

D. P. Carvalho and R. Cadene, Cross-Modal Retrieval in the Cooking Context, The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, 2018.

Z. Ma, Y. Lu, and D. Foster, Finding linear structure in large datasets with scalable canonical correlation analysis, International Conference on Machine Learning, ICML, 2015.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, International Conference on Machine Learning, ICML, 2013.

A. Mahendran and A. Vedaldi, Understanding deep image representations by inverting them, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299155

URL : http://arxiv.org/pdf/1412.0035

T. Malisiewicz and A. A. Efros, Beyond categories: The visual memex model for reasoning about object relationships, Advances in Neural Information Processing Systems, NIPS, 2009.

S. Mallat, A Wavelet Tour of Signal Processing, 1999.

A. Mallya and S. Lazebnik, Packnet: Adding multiple tasks to a single network by iterative pruning. arXiv preprint, 2017.

F. Manessi, A. Rozza, S. Bianco, P. Napoletano, and R. Schettini, Automated pruning for deep neural network compression, 2017.

J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang et al., Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint, 2014.

K. Marino, R. Salakhutdinov, and A. Gupta, The More You Know: Using Knowledge Graphs for Image Classification, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.10

URL : http://arxiv.org/pdf/1612.04844

A. Mathews, L. Xie, and X. He, Choosing Basic-Level Concept Names Using Visual and Language Context, 2015 IEEE Winter Conference on Applications of Computer Vision, 2015.
DOI : 10.1109/WACV.2015.85

URL : http://users.cecs.anu.edu.au/%7Exlx/papers/wacv2015.pdf

W. S. Mcculloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, pp.115-133, 1943.

S. Meftah, N. Semmar, and F. Sadat, A neural network model for part-of-speech tagging of social media texts, Conference on Language Resources and Evaluation, LREC, 2018.

P. Mettes, D. Koelma, and C. G. Snoek, The ImageNet Shuffle, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, ICMR '16, 2016.
DOI : 10.1145/2733373.2806221

URL : http://arxiv.org/pdf/1602.07119

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, NIPS, 2013.

G. A. Miller, WordNet: a lexical database for English, Communications of the ACM, vol.38, issue.11, pp.39-41, 1995.
DOI : 10.1145/219717.219748

M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, 1969.

P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, Pruning convolutional neural networks for resource efficient inference, International Conference on Learning Representations, ICLR, 2016.

F. Monay and D. Gatica-perez, Modeling semantic aspects for cross-media image indexing. Pattern Analysis and Machine Intelligence, PAMI, 2007.
DOI : 10.1109/tpami.2007.1097

URL : http://publications.idiap.ch/downloads/papers/2007/monay-pami-2007.pdf

A. Morgand and M. Tamaazousti, Generic and real-time detection of specular reflections in images, International Conference on Computer Vision Theory and Applications, 2014.

A. Morgand, M. Tamaazousti, and A. Bartoli, An Empirical Model for Specularity Prediction with Application to Dynamic Retexturing, 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2016.
DOI : 10.1109/ISMAR.2016.13

A. Morgand, M. Tamaazousti, and A. Bartoli, A Multiple-View Geometric Model of Specularities on Non-Planar Shapes with Application to Dynamic Retexturing, IEEE Transactions on Visualization and Computer Graphics, vol.23, issue.11, 2017.
DOI : 10.1109/TVCG.2017.2734538

URL : https://hal.archives-ouvertes.fr/hal-01657253

H. Murase and S. K. Nayar, Visual learning and recognition of 3-d objects from appearance, International Journal of Computer Vision, vol.37, issue.10, 1995.
DOI : 10.1007/BF01421486

V. N. Murthy, V. Singh, T. Chen, R. Manmatha, and D. Comaniciu, Deep decision network for multiclass image classification, Computer Vision and Pattern Recognition, CVPR, 2016.
DOI : 10.1109/cvpr.2016.246

H. Nam, J. Ha, and J. Kim, Dual Attention Networks for Multimodal Reasoning and Matching, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.232

URL : http://arxiv.org/pdf/1611.00471

A. P. Natsev, M. R. Naphade, and J. R. Smith, Semantic representation, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, 2004.
DOI : 10.1145/1014052.1014133

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee et al., Multimodal deep learning, International Conference on Machine Learning, ICML, 2011.

A. Nie, E. D. Bennett, and N. D. Goodman, Dissent: Sentence representation learning from explicit discourse relations, 2017.

M. Nilsback and A. Zisserman, Automated Flower Classification over a Large Number of Classes, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008.
DOI : 10.1109/ICVGIP.2008.47

K. Nishino, Y. Sato, and K. Ikeuchi, Eigen-texture method: Appearance compression based on 3-d model, Computer Vision and Pattern Recognition, CVPR, 1999.

M. Noroozi, H. Pirsiavash, and P. Favaro, Representation Learning by Learning to Count, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.628

D. Novotny, D. Larlus, and A. Vedaldi, Learning 3D Object Categories by Looking Around Them, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.558

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, 2001.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free? - Weakly-supervised learning with convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298668

URL : https://hal.archives-ouvertes.fr/hal-01015140

V. Ordonez, J. Deng, Y. Choi, A. C. Berg, and T. Berg, From large scale image categorization to entry-level categories, International Conference on Computer Vision, ICCV, 2013.
DOI : 10.1109/iccv.2013.344

URL : http://www.cs.unc.edu/~vicente/files/entrylevel.pdf

V. Ordonez, W. Liu, J. Deng, Y. Choi, A. C. Berg et al., Predicting Entry-Level Categories, International Journal of Computer Vision, vol.30, issue.1, 2015.
DOI : 10.1145/1101826.1101838

W. Ouyang, X. Wang, C. Zhang, and X. Yang, Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.100

M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, Advances in Neural Information Processing Systems, NIPS, 2009.

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, 2010.
DOI : 10.1109/TKDE.2009.191

URL : http://www.cs.ust.hk/~sinnopan/publications/TLsurvey_0822.pdf

D. Pathak, R. Girshick, P. Dollár, T. Darrell, and B. Hariharan, Learning Features by Watching Objects Move, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.638

URL : http://arxiv.org/pdf/1612.06370

D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, Context Encoders: Feature Learning by Inpainting, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.278

URL : http://arxiv.org/pdf/1604.07379

H. Peng, F. Long, and C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. Pattern Analysis and Machine Intelligence, 2005.

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

URL : http://www.xrce.xerox.com/Publications/Attachments/2006-034/2006-034.pdf

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, European Conference on Computer Vision, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

D. Picard and P. Gosselin, Improving image similarity with vectors of locally aggregated tensors, 2011 18th IEEE International Conference on Image Processing, 2011.
DOI : 10.1109/ICIP.2011.6116641

URL : https://hal.archives-ouvertes.fr/hal-00591993

D. Picard and P. Gosselin, Efficient image signatures and similarities using tensor products of local descriptors, Computer Vision and Image Understanding, vol.117, issue.6, 2013.
DOI : 10.1016/j.cviu.2013.02.004

URL : https://hal.archives-ouvertes.fr/hal-00799074

F. Plesse, A. Ginsca, B. Delezoide, and F. Prêteux, Visual relationship detection based on guided proposals and semantic knowledge distillation, 2018.

D. L. Poole and A. K. Mackworth, Artificial Intelligence: foundations of computational agents, 2010.
DOI : 10.1017/CBO9780511794797

A. Popescu, G. Etienne, and H. L. Borgne, Scalable domain adaptation of convolutional neural networks, 2015.

D. Putthividhy, H. T. Attias, and S. S. Nagarajan, Topic regression multi-modal Latent Dirichlet Allocation for image annotation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540000

URL : http://research.goldenmetallic.com/cvpr10.pdf

F. Quanfu and C. Richard, Sparse deep feature representation for object detection from wearable cameras, British Machine Vision Conference, 2017.

A. Quattoni and A. Torralba, Recognizing indoor scenes, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206537

URL : http://people.csail.mit.edu/torralba/publications/indoor.pdf

A. Rannen, R. Aljundi, M. B. Blaschko, and T. Tuytelaars, Encoder Based Lifelong Learning, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.148

URL : http://arxiv.org/pdf/1704.01920

N. Rasiwasia, J. C. Pereira, E. Coviello, G. Doyle, G. R. Lanckriet et al., A new approach to cross-modal multimedia retrieval, Proceedings of the international conference on Multimedia, MM '10, 2010.
DOI : 10.1145/1873951.1873987

URL : http://www.svcl.ucsd.edu/publications/conference/2010/acm2010/crossmodal.pdf

N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, Bridging the gap: Query by semantic example, IEEE Transactions on Multimedia, 2007.

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
DOI : 10.1109/CVPRW.2014.131

URL : http://arxiv.org/pdf/1403.6382.pdf

S. Rebuffi, H. Bilen, and A. Vedaldi, Learning multiple visual domains with residual adapters, Advances in Neural Information Processing Systems, NIPS, 2017.

S. Rebuffi, H. Bilen, and A. Vedaldi, Efficient parametrization of multi-domain deep neural networks, Computer Vision and Pattern Recognition, CVPR, 2018.

J. Redmon and A. Farhadi, YOLO9000: Better, Faster, Stronger, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.690

URL : http://arxiv.org/pdf/1612.08242

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta et al., Fitnets: Hints for thin deep nets. arXiv preprint, 2014.

E. Rosch, Principles of categorization. Cognition and Categorization, 1978.

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.
DOI : 10.1037/h0042519

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, 2015.
DOI : 10.1007/s11263-015-0816-y

URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

S. J. Russell and P. Norvig, Artificial intelligence: a modern approach. Malaysia, 2016.

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick et al., Progressive neural networks. arXiv preprint, 2016.

S. H. Said, M. Tamaazousti, and A. Bartoli, Image-Based Models for Specularity Propagation in Diminished Reality, IEEE Transactions on Visualization and Computer Graphics, vol.24, issue.7, 2017.
DOI : 10.1109/TVCG.2017.2705687

URL : https://hal.archives-ouvertes.fr/hal-01657292

G. Salton and M. J. Mcgill, Introduction to modern information retrieval, 1986.

A. Salvador, N. Hynes, Y. Aytar, J. Marin, F. Ofli et al., Learning Cross-Modal Embeddings for Cooking Recipes and Food Images, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.327

N. Semmar, A hybrid approach for automatic extraction of bilingual multiword expressions from parallel corpora, Conference on Language Resources and Evaluation, LREC, 2018.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., Overfeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations, ICLR, 2014.

A. Shabou and H. L. Borgne, Locality-constrained and spatially regularized coding for scene categorization, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248107

S. Shatford, Analyzing the Subject of a Picture: A Theoretical Approach, Cataloging & Classification Quarterly, vol.6, issue.3, pp.39-62, 1986.
DOI : 10.1300/J104v06n03_04

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312, 2013.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, ICLR, 2015.

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence, 2000.

R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, Grounded compositional semantics for finding and describing images with sentences, Transactions of the Association of Computational Linguistics, 2014.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 2014.

N. Srivastava and R. R. Salakhutdinov, Multimodal learning with deep boltzmann machines, Advances in Neural Information Processing Systems, NIPS, 2012.

P. Stone, R. Brooks, E. Brynjolfsson, R. Calo, O. Etzioni, G. Hager et al., One hundred year study on artificial intelligence: Report of the 2015-2016 study panel, 2016.

S. Subramanian, A. Trischler, Y. Bengio, and C. J. Pal, Learning general purpose distributed sentence representations via large scale multi-task learning, International Conference on Learning Representations, ICLR, 2018.

D. Surís, A. Duarte, A. Salvador, J. Torres, X. Giró-i-Nieto et al., Cross-modal embeddings for video and audio retrieval. arXiv preprint, 2018.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298594

M. Tamaazousti, L'ajustement de faisceaux contraint comme cadre d'unification des méthodes de localisation: application à la réalité augmentée sur des objets 3D, 2013.

M. Tamaazousti, S. Naudet-collette, V. Gay-bellile, S. Bourgeois, B. Besbes et al., The constrained SLAM framework for non-instrumented augmented reality, Multimedia Tools and Applications, vol.16, issue.3, 2016.
DOI : 10.1007/978-3-540-74272-2_3

URL : https://hal.archives-ouvertes.fr/cea-01830529

Y. Tamaazousti, H. L. Borgne, and C. Hudelot, Agrégation de descripteurs sémantiques locaux contraints par parcimonie basée sur le contenu, Reconnaissance des Formes et Intelligence Artificielle

Y. Tamaazousti, H. L. Borgne, and C. Hudelot, Descripteurs à divers niveaux de concepts pour la classification d'images multi-objets, Reconnaissance des Formes et Intelligence Artificielle

Y. Tamaazousti, H. L. Borgne, and C. Hudelot, Diverse Concept-Level Features for Multi-Object Classification, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, ICMR '16, 2016.
DOI : 10.1111/j.1467-9868.2005.00532.x

URL : https://hal.archives-ouvertes.fr/cea-01813723

Y. Tamaazousti, H. L. Borgne, and C. Hudelot, MuCaLe-Net: Multi Categorical-Level Networks to Generate More Discriminating Features, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.561

URL : https://hal.archives-ouvertes.fr/cea-01841669

Y. Tamaazousti, H. Le-borgne, C. Hudelot, M. E. Seddik, and M. Tamaazousti, Learning more universal representations for transfer-learning, 2017.

Y. Tamaazousti, H. L. Borgne, and A. Popescu, Constrained Local Enhancement of Semantic Features by Content-Based Sparsity, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, ICMR '16, 2016.
DOI : 10.1145/2733373.2806244

URL : https://hal.archives-ouvertes.fr/cea-01813602

Y. Tamaazousti, H. Le-borgne, A. Popescu, E. Gadeski, A. Ginsca et al., Vision-language integration using constrained local semantic features, Computer Vision and Image Understanding, vol.163, 2017.
DOI : 10.1016/j.cviu.2017.05.017

URL : https://hal.archives-ouvertes.fr/cea-01803830

Y. Tamaazousti, H. Le-borgne, A. Popescu, E. Gadeski, A. L. Ginsca et al., Déscripteur sémantique local contraint basé sur un rnc diversifié, 2017.

J. W. Tanaka and M. Taylor, Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 1991.

S. J. Thorpe and M. Fabre-thorpe, Seeking categories in the brain, Science, vol.291, issue.5502, pp.260-263, 2001.

K. Todorov, C. Hudelot, A. Popescu, and P. Geibel, Fuzzy Ontology Alignment Using Background Knowledge, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.77, issue.1, 2014.
DOI : 10.1016/S0166-218X(01)00290-6

URL : https://hal.archives-ouvertes.fr/lirmm-01350553

K. Todorov, N. James, and C. Hudelot, Multimedia ontology matching by using visual and textual modalities, Transactions on Multimedia Tools and Applications, 2013.
DOI : 10.1109/TMM.2007.900156

URL : https://hal.archives-ouvertes.fr/hal-00824573

L. Torresani, M. Szummer, and A. Fitzgibbon, Efficient Object Category Recognition Using Classemes, European Conference on Computer Vision, ECCV, 2010.
DOI : 10.1007/978-3-642-15549-9_56

T. Q. Tran, H. L. Borgne, and M. Crucianu, Combining generic and specific information for crossmodal retrieval, International Conference on Multimedia Retrieval, ICMR, 2015.
DOI : 10.1145/2671188.2749348

URL : https://hal.archives-ouvertes.fr/cea-01813724

T. Q. Tran, H. L. Borgne, and M. Crucianu, Aggregating Image and Text Quantized Correlated Components, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.225

URL : https://hal.archives-ouvertes.fr/cea-01843176

T. Q. Tran, H. L. Borgne, and M. Crucianu, Cross-modal Classification by Completing Unimodal Representations, Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, iV&L-MM '16, 2016.
DOI : 10.1109/CVPR.2009.5206816

URL : https://hal.archives-ouvertes.fr/cea-01840417

M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol.10, issue.9, 1991.
DOI : 10.1007/BF00239352

J. R. Uijlings, K. E. Van-de-sande, T. Gevers, and A. W. Smeulders, Selective Search for Object Recognition, International Journal of Computer Vision, vol.57, issue.1, 2013.
DOI : 10.1023/B:VISI.0000013087.49260.fb

URL : https://pure.uva.nl/ws/files/19494140/UijlingsIJCV2013.pdf

D. C. Van-essen and J. L. Gallant, Neural mechanisms of form and motion processing in the primate visual system, Neuron, vol.13, issue.1, pp.1-10, 1994.
DOI : 10.1016/0896-6273(94)90455-3

J. C. Van-gemert, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Visual word ambiguity. Pattern Analysis and Machine Intelligence, PAMI, 2010.

A. Veit, M. J. Wilber, and S. Belongie, Residual networks behave like ensembles of relatively shallow networks, Advances in Neural Information Processing Systems, NIPS, 2016.

S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney et al., Translating Videos to Natural Language Using Deep Recurrent Neural Networks, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.
DOI : 10.3115/v1/N15-1173

URL : https://doi.org/10.3115/v1/n15-1173

P. Vo, A. L. Ginsca, H. L. Borgne, and A. Popescu, Effective training of convolutional networks using noisy Web images, 2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI), 2015.
DOI : 10.1109/CBMI.2015.7153607

URL : https://hal.archives-ouvertes.fr/hal-01185757

P. D. Vo, A. Ginsca, H. L. Borgne, and A. Popescu, On deep representation learning from noisy web images, 2015.

P. D. Vo, A. Ginsca, H. L. Borgne, and A. Popescu, Harnessing noisy Web images for deep representation, Computer Vision and Image Understanding, vol.164, 2017.
DOI : 10.1016/j.cviu.2017.01.009

URL : https://hal.archives-ouvertes.fr/cea-01756775

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The caltech-ucsd birds, 2011.

G. Wang, D. Hoiem, and D. Forsyth, Learning image similarity from Flickr groups using Stochastic Intersection Kernel Machines, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459167

H. Wang, H. Wang, and K. Xu, Categorizing concepts with basic level for vision-to-language, Computer Vision and Pattern Recognition, CVPR, 2018.

J. Wang, Q. Qin, Z. Li, X. Ye, J. Wang et al., Deep hierarchical representation and segmentation of high resolution remote sensing images, 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
DOI : 10.1109/IGARSS.2015.7326782

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540018

L. Wang, Y. Li, J. Huang, and S. Lazebnik, Learning two-branch neural networks for image-text matching tasks. Pattern Analysis and Machine Intelligence, 2018.

L. Wang, Y. Li, and S. Lazebnik, Learning Deep Structure-Preserving Image-Text Embeddings, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.541

Y. Wang, D. Ramanan, and M. Hebert, Growing a Brain: Fine-Tuning by Increasing Model Capacity, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.323

Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong et al., Cnn: Single-label to multi-label, 2014.

J. Weston, S. Bengio, and N. Usunier, Wsabie: Scaling up to large vocabulary image annotation, International Joint Conference on Artificial Intelligence, IJCAI, 2011.

Y. Wu, J. Li, Y. Kong, and Y. Fu, Deep Convolutional Neural Network with Independent Softmax for Large Scale Face Recognition, Proceedings of the 2016 ACM on Multimedia Conference, MM '16, 2016.
DOI : 10.1109/CVPR.2015.7298594

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated Residual Transformations for Deep Neural Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.634

B. Xu, N. Wang, T. Chen, and M. Li, Empirical evaluation of rectified activations in convolutional network, 2015.

D. Xu, Y. Zhu, C. B. Choy, and L. Fei-fei, Scene Graph Generation by Iterative Message Passing, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.330

F. Yan and K. Mikolajczyk, Deep correlation for matching images and text, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298966

Z. Yan, H. Zhang, R. Piramuthu, V. Jagadeesh, D. Decoste et al., HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.314

H. Yang, J. T. Zhou, Y. Zhang, B. Gao, J. Wu et al., Exploit Bounding Box Annotations for Multi-Label Object Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.37

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, Computer Vision and Pattern Recognition, CVPR, 2009.

B. Yao, X. Jiang, A. Khosla, A. L. Lin, L. Guibas et al., Human action recognition by learning bases of action attributes and parts, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126386

X. Yin, J. Han, J. Yang, and P. S. Yu, Efficient classification across multiple database relations: a CrossMine approach, IEEE Transactions on Knowledge and Data Engineering, vol.18, issue.6, 2006.
DOI : 10.1109/TKDE.2006.94

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems, NIPS, 2014.

J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, International Conference on Machine Learning, ICML, 2015.

P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014.

R. Yu, A. Li, V. I. Morariu, and L. S. Davis, Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.121

S. Zagoruyko and N. Komodakis, Wide residual networks. arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503

A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik et al., Taskonomy: Disentangling task transfer learning, Computer Vision and Pattern Recognition, CVPR, 2018.

M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, European Conference on Computer Vision, ECCV, 2014.
DOI : 10.1007/978-3-319-10590-1_53

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, International Conference on Learning Representations, 2017.

J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price et al., Unconstrained Salient Object Detection via Proposal Subset Optimization, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.618

X. Zhang, Z. Li, C. C. Loy, and D. Lin, PolyNet: A Pursuit of Structural Diversity in Very Deep Networks, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.415

B. Zhou, D. Bau, A. Oliva, and A. Torralba, Interpreting deep visual representations via network dissection, 2017.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Object detectors emerge in deep scene cnns, International Conference on Learning Representations, ICLR, 2015.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning deep features for scene recognition using places database, Advances in Neural Information Processing Systems, NIPS, 2014.

A. Znaidia, A. Shabou, H. Le-borgne, C. Hudelot, and N. Paragios, Bag-of-multimedia-words for image classification, ICPR, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00825187