T. Durand, N. Thome, M. Cord, and S. Avila, Image classification using object detectors, 2013 IEEE International Conference on Image Processing, 2013.
DOI : 10.1109/ICIP.2013.6738894

URL : https://hal.archives-ouvertes.fr/hal-01078079

A. Dutt, D. Pellerin, and G. Quenot, Improving Image Classification using Coarse and Fine Labels, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, 2017.

URL : https://hal.archives-ouvertes.fr/hal-01590672

A. Eisenschtat and L. Wolf, Linking Image and Text with 2-Way Nets, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.201

URL : http://arxiv.org/pdf/1608.07973

M. Engilberge, L. Chevallier, P. Pérez, and M. Cord, Finding beans in burgers: Deep semantic-visual embedding with localization, Computer Vision and Pattern Recognition, CVPR, 2018.

D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, 2009.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, 2012.

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, 2010.
DOI : 10.1007/s11263-009-0275-4

F. Faghri, D. J. Fleet, J. R. Kiros, and S. Fidler, VSE++: Improving visual-semantic embeddings with hard negatives, arXiv preprint, 2017.

L. Fei-Fei, R. Fergus, and P. Perona, One-shot learning of object categories, Pattern Analysis and Machine Intelligence, PAMI, 2006.

Y. Feng and M. Lapata, Topic models for image annotation and text illustration, ACL Human Language Technologies, HLT, 2010.

X. Ling, S. Singh, and D. Weld, Design challenges for entity linking, Transactions of the Association for Computational Linguistics, 2015.

G. Litjens, T. Kooi, B. E. Bejnordi, A. A. Setio, F. Ciompi et al., A survey on deep learning in medical image analysis, Medical Image Analysis, 2017.
DOI : 10.1016/j.media.2017.07.005

URL : http://arxiv.org/pdf/1702.05747

L. Liu, L. Wang, and X. Liu, In defense of soft-assignment coding, International Conference on Computer Vision, ICCV, 2011.

W. Liu, A. Rabinovich, and A. C. Berg, ParseNet: Looking wider to see better, International Conference on Learning Representations, ICLR Workshop, 2016.

Y. Liu, Y. Guo, E. M. Bakker, and M. S. Lew, Learning a Recurrent Residual Fusion Network for Multimodal Matching, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.442

K. Longi, T. Pulkkinen, and A. Klami, Semi-supervised convolutional neural networks for identifying wi-fi interference sources, Asian Conference on Machine Learning, ACML, 2017.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

URL : http://www.cs.ubc.ca/~lowe/papers/ijcv03.ps

C. Lu, R. Krishna, M. S. Bernstein, and F. Li, Visual Relationship Detection with Language Priors, European Conference on Computer Vision, ECCV, 2016.

URL : http://arxiv.org/pdf/1608.00187

D. P. Carvalho and R. Cadene, Cross-Modal Retrieval in the Cooking Context, The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, 2018.

Z. Ma, Y. Lu, and D. Foster, Finding linear structure in large datasets with scalable canonical correlation analysis, International Conference on Machine Learning, ICML, 2015.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, International Conference on Machine Learning, ICML, 2013.

A. Mahendran and A. Vedaldi, Understanding deep image representations by inverting them, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299155

URL : http://arxiv.org/pdf/1412.0035

T. Malisiewicz and A. A. Efros, Beyond categories: The visual memex model for reasoning about object relationships, Advances in Neural Information Processing Systems, NIPS, 2009.

S. Mallat, A Wavelet Tour of Signal Processing, 1999.

A. Mallya and S. Lazebnik, PackNet: Adding multiple tasks to a single network by iterative pruning, arXiv preprint, 2017.

F. Manessi, A. Rozza, S. Bianco, P. Napoletano, and R. Schettini, Automated pruning for deep neural network compression, 2017.

J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang et al., Deep captioning with multimodal recurrent neural networks (m-RNN), arXiv preprint, 2014.

K. Marino, R. Salakhutdinov, and A. Gupta, The More You Know: Using Knowledge Graphs for Image Classification, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.10

URL : http://arxiv.org/pdf/1612.04844

A. Mathews, L. Xie, and X. He, Choosing Basic-Level Concept Names Using Visual and Language Context, 2015 IEEE Winter Conference on Applications of Computer Vision, 2015.
DOI : 10.1109/WACV.2015.85

URL : http://users.cecs.anu.edu.au/%7Exlx/papers/wacv2015.pdf

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, pp.115-133, 1943.

S. Meftah, N. Semmar, and F. Sadat, A neural network model for part-of-speech tagging of social media texts, Conference on Language Resources and Evaluation, LREC, 2018.

P. Mettes, D. Koelma, and C. G. Snoek, The ImageNet Shuffle, Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, ICMR '16, 2016.

URL : http://arxiv.org/pdf/1602.07119

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, NIPS, 2013.

G. A. Miller, WordNet: a lexical database for English, Communications of the ACM, vol.38, issue.11, pp.39-41, 1995.
DOI : 10.1145/219717.219748

M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, 1969.

P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, Pruning convolutional neural networks for resource efficient inference, International Conference on Learning Representations, ICLR, 2016.

F. Monay and D. Gatica-Perez, Modeling semantic aspects for cross-media image indexing, Pattern Analysis and Machine Intelligence, PAMI, 2007.
DOI : 10.1109/tpami.2007.1097

URL : http://publications.idiap.ch/downloads/papers/2007/monay-pami-2007.pdf

A. Morgand and M. Tamaazousti, Generic and real-time detection of specular reflections in images, International Conference on Computer Vision Theory and Applications, 2014.

A. Morgand, M. Tamaazousti, and A. Bartoli, An Empirical Model for Specularity Prediction with Application to Dynamic Retexturing, 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2016.
DOI : 10.1109/ISMAR.2016.13

A. Morgand, M. Tamaazousti, and A. Bartoli, A Multiple-View Geometric Model of Specularities on Non-Planar Shapes with Application to Dynamic Retexturing, IEEE Transactions on Visualization and Computer Graphics, vol.23, issue.11, 2017.
DOI : 10.1109/TVCG.2017.2734538

URL : https://hal.archives-ouvertes.fr/hal-01657253

H. Murase and S. K. Nayar, Visual learning and recognition of 3-d objects from appearance, International Journal of Computer Vision, vol.37, issue.10, 1995.
DOI : 10.1007/BF01421486

V. N. Murthy, V. Singh, T. Chen, R. Manmatha, and D. Comaniciu, Deep decision network for multiclass image classification, Computer Vision and Pattern Recognition, CVPR, 2016.
DOI : 10.1109/cvpr.2016.246

H. Nam, J. Ha, and J. Kim, Dual Attention Networks for Multimodal Reasoning and Matching, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2017.232

URL : http://arxiv.org/pdf/1611.00471

A. P. Natsev, M. R. Naphade, and J. R. Smith, Semantic representation, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, 2004.
DOI : 10.1145/1014052.1014133

A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, 2001.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free? - Weakly-supervised learning with convolutional neural networks, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298668

URL : https://hal.archives-ouvertes.fr/hal-01015140

V. Ordonez, J. Deng, Y. Choi, A. C. Berg, and T. Berg, From large scale image categorization to entry-level categories, International Conference on Computer Vision, ICCV, 2013.
DOI : 10.1109/iccv.2013.344

URL : http://www.cs.unc.edu/~vicente/files/entrylevel.pdf

V. Ordonez, W. Liu, J. Deng, Y. Choi, A. C. Berg et al., Predicting Entry-Level Categories, International Journal of Computer Vision, vol.30, issue.1, 2015.

W. Ouyang, X. Wang, C. Zhang, and X. Yang, Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.100

M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, Advances in Neural Information Processing Systems, NIPS, 2009.

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, 2010.
DOI : 10.1109/TKDE.2009.191

URL : http://www.cs.ust.hk/~sinnopan/publications/TLsurvey_0822.pdf

D. Pathak, R. Girshick, P. Dollár, T. Darrell, and B. Hariharan, Learning Features by Watching Objects Move, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.638

URL : http://arxiv.org/pdf/1612.06370

D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, Context Encoders: Feature Learning by Inpainting, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.278

URL : http://arxiv.org/pdf/1604.07379

H. Peng, F. Long, and C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, Pattern Analysis and Machine Intelligence, 2005.

F. Perronnin and C. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

URL : http://www.xrce.xerox.com/Publications/Attachments/2006-034/2006-034.pdf

F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher Kernel for Large-Scale Image Classification, European Conference on Computer Vision, ECCV, 2010.
DOI : 10.1007/978-3-642-15561-1_11

URL : https://hal.archives-ouvertes.fr/inria-00548630

D. Picard and P. Gosselin, Improving image similarity with vectors of locally aggregated tensors, 2011 18th IEEE International Conference on Image Processing, 2011.
DOI : 10.1109/ICIP.2011.6116641

URL : https://hal.archives-ouvertes.fr/hal-00591993

D. Picard and P. Gosselin, Efficient image signatures and similarities using tensor products of local descriptors, Computer Vision and Image Understanding, vol.117, issue.6, 2013.
DOI : 10.1016/j.cviu.2013.02.004

URL : https://hal.archives-ouvertes.fr/hal-00799074

F. Plesse, A. Ginsca, B. Delezoide, and F. Prêteux, Visual relationship detection based on guided proposals and semantic knowledge distillation, 2018.

D. L. Poole and A. K. Mackworth, Artificial Intelligence: foundations of computational agents, 2010.
DOI : 10.1017/CBO9780511794797

A. Popescu, G. Etienne, and H. L. Borgne, Scalable domain adaptation of convolutional neural networks, 2015.

D. Putthividhy, H. T. Attias, and S. S. Nagarajan, Topic regression multi-modal Latent Dirichlet Allocation for image annotation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540000

URL : http://research.goldenmetallic.com/cvpr10.pdf

F. Quanfu and C. Richard, Sparse deep feature representation for object detection from wearable cameras, British Machine Vision Conference, 2017.

A. Quattoni and A. Torralba, Recognizing indoor scenes, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206537

URL : http://people.csail.mit.edu/torralba/publications/indoor.pdf

A. Rannen, R. Aljundi, M. B. Blaschko, and T. Tuytelaars, Encoder Based Lifelong Learning, 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
DOI : 10.1109/ICCV.2017.148

URL : http://arxiv.org/pdf/1704.01920

N. Rasiwasia, J. C. Pereira, E. Coviello, G. Doyle, G. R. Lanckriet et al., A new approach to cross-modal multimedia retrieval, Proceedings of the international conference on Multimedia, MM '10, 2010.
DOI : 10.1145/1873951.1873987

URL : http://www.svcl.ucsd.edu/publications/conference/2010/acm2010/crossmodal.pdf

N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, Bridging the gap: Query by semantic example, IEEE Transactions on Multimedia, 2007.

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
DOI : 10.1109/CVPRW.2014.131

URL : http://arxiv.org/pdf/1403.6382.pdf

S. Rebuffi, H. Bilen, and A. Vedaldi, Learning multiple visual domains with residual adapters, Advances in Neural Information Processing Systems, NIPS, 2017.

S. Rebuffi, H. Bilen, and A. Vedaldi, Efficient parametrization of multi-domain deep neural networks, Computer Vision and Pattern Recognition, CVPR, 2018.

J. Redmon and A. Farhadi, YOLO9000: Better, Faster, Stronger, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/CVPR.2017.690

URL : http://arxiv.org/pdf/1612.08242

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta et al., FitNets: Hints for thin deep nets, arXiv preprint, 2014.

E. Rosch, Principles of categorization, Cognition and Categorization, 1978.

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.
DOI : 10.1037/h0042519

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, vol.115, issue.3, 2015.
DOI : 10.1007/s11263-015-0816-y

URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

S. J. Russell and P. Norvig, Artificial intelligence: a modern approach, Malaysia, 2016.