M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

M. J. Afridi, A. Ross, and E. M. Shapiro, On automated source selection for transfer learning in convolutional neural networks, Pattern Recognition, vol.73, pp.65-75, 2018.

M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, International Conference on Machine Learning (ICML), pp.214-223, 2017.

Y. Aytar and A. Zisserman, Tabula rasa: Model transfer for object category detection, IEEE International Conference on Computer Vision (ICCV), pp.2252-2259, 2011.

F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, International Conference on Machine Learning (ICML), p.6, 2004.

V. Badrinarayanan, A. Kendall, and R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.39, issue.12, pp.2481-2495, 2017.

A. V. Miceli Barone, B. Haddow, U. Germann, and R. Sennrich, Regularization techniques for fine-tuning in neural machine translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.1489-1494, 2017.

J. Baxter, A Bayesian/information theoretic model of learning to learn via multiple task sampling, Machine Learning, vol.28, pp.7-39, 1997.

M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio et al., Mutual information neural estimation, International Conference on Machine Learning (ICML), pp.530-539, 2018.

F. Bonnans and A. Shapiro, Optimization problems with perturbations: A guided tour, SIAM review, vol.40, issue.2, pp.228-264, 1998.
URL : https://hal.archives-ouvertes.fr/inria-00073819

L. Bossard, M. Guillaumin, and L. Van-gool, Food-101-mining discriminative components with random forests, European Conference on Computer Vision (ECCV), pp.446-461, 2014.

O. Bousquet, S. Gelly, I. Tolstikhin, C. Simon-gabriel, and B. Schoelkopf, From optimal transport to generative modeling: the vegan cookbook, 2017.

V. Cappellini, H.-J. Sommers, W. Bruzda, and K. Życzkowski, Random bistochastic matrices, Journal of Physics A: Mathematical and Theoretical, vol.42, issue.36, p.365209, 2009.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

C. Chelba and A. Acero, Adaptation of maximum entropy capitalizer: Little data can help a lot, Computer Speech & Language, vol.20, issue.4, pp.382-399, 2006.

L. Chen, G. Papandreou, F. Schroff, and H. Adam, Rethinking atrous convolution for semantic image segmentation, 2017.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, issue.4, pp.834-848, 2018.

L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, Proceedings of the European Conference on Computer Vision (ECCV), pp.801-818, 2018.

L. Chen, S. Dai, C. Tao, H. Zhang, Z. Gan et al., Adversarial text generation via feature-mover's distance, Advances in Neural Information Processing Systems (NIPS), pp.4666-4677, 2018.

Z. Chen, V. Badrinarayanan, C. Lee, and A. Rabinovich, Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks, International Conference on Machine Learning (ICML), pp.793-802, 2018.

J. Cheng, Y. Tsai, S. Wang, and M. Yang, SegFlow: Joint learning for video object segmentation and optical flow, IEEE International Conference on Computer Vision (ICCV), pp.686-695, 2017.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The Cityscapes dataset for semantic urban scene understanding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3213-3223, 2016.

N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, Optimal transport for domain adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.39, pp.1853-1865, 2017.

Y. Cui, Y. Song, C. Sun, A. Howard, and S. Belongie, Large scale fine-grained categorization and domain-specific transfer learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4109-4118, 2018.

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems (NIPS), pp.2292-2300, 2013.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.248-255, 2009.

Z. Ding, M. Shao, and Y. Fu, Incomplete multisource transfer learning, IEEE Transactions on Neural Networks and Learning Systems, vol.29, issue.2, pp.310-323, 2018.

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., Decaf: A deep convolutional activation feature for generic visual recognition, International Conference on Machine Learning (ICML), pp.647-655, 2014.

M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, IV, Communications on Pure and Applied Mathematics, vol.36, issue.2, pp.183-212, 1983.

A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas et al., FlowNet: Learning optical flow with convolutional networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2758-2766, 2015.

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, vol.10, pp.2899-2934, 2009.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.28, issue.4, pp.594-611, 2006.

C. Frogner, C. Zhang, H. Mobahi, M. Araya-Polo, and T. Poggio, Learning with a Wasserstein loss, Advances in Neural Information Processing Systems (NIPS), pp.2053-2061, 2015.

N. Frosst, N. Papernot, and G. Hinton, Analyzing and improving representations with the soft nearest neighbor loss, International Conference on Machine Learning (ICML), 2019.

W. Ge and Y. Yu, Borrowing treasures from the wealthy: Deep transfer learning through selective joint fine-tuning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.10-19, 2017.

A. Genevay, G. Peyre, and M. Cuturi, Learning generative models with sinkhorn divergences, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.1608-1617, 2018.

R. Girshick, Fast R-CNN, IEEE International Conference on Computer Vision (ICCV), pp.1440-1448, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.580-587, 2014.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.249-256, 2010.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems (NIPS), pp.2672-2680, 2014.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Adaptive Computation and Machine Learning, 2017.

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, 2007.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved training of wasserstein gans, Advances in Neural Information Processing Systems (NIPS), pp.5767-5777, 2017.

B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, Semantic contours from inverse detectors, IEEE International Conference on Computer Vision (ICCV), pp.991-998, 2011.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.37, issue.9, pp.1904-1916, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, European Conference on Computer Vision (ECCV), pp.630-645, 2016.

K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, IEEE International Conference on Computer Vision (ICCV), pp.2980-2988, 2017.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, NIPS Deep Learning and Representation Learning Workshop, 2015.

M. Holschneider, R. Kronland-martinet, J. Morlet, and P. Tchamitchian, A real-time algorithm for signal analysis with the help of the wavelet transform, Wavelets, pp.286-297, 1990.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang et al., Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017.

J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7132-7141, 2018.

G. Huang, C. Guo, M. J. Kusner, Y. Sun, F. Sha et al., Supervised word mover's distance, Advances in Neural Information Processing Systems (NIPS), pp.4862-4870, 2016.

G. Huang, Z. Liu, L. Van-der-maaten, and K. Weinberger, Densely connected convolutional networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4700-4708, 2017.

E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy et al., Flownet 2.0: Evolution of optical flow estimation with deep networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2462-2470, 2017.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning (ICML), pp.448-456, 2015.

S. Ji, W. Xu, M. Yang, and K. Yu, 3d convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.35, pp.221-231, 2013.

H. Jung, J. Ju, M. Jung, and J. Kim, Less-forgetting learning in deep neural networks, AAAI Conference on Artificial Intelligence, 2018.

L. Kantorovich, On the transfer of masses (in russian), Doklady Akademii Nauk, pp.227-229, 1942.

A. Kendall, Y. Gal, and R. Cipolla, Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7482-7491, 2018.

A. Khosla, N. Jayadevaprakash, B. Yao, and F. Li, Novel dataset for fine-grained image categorization: Stanford dogs, Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), 2011.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations (ICLR), 2015.

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins et al., Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, vol.114, issue.13, pp.3521-3526, 2017.

J. Kohler, H. Daneshmand, A. Lucchi, T. Hofmann, M. Zhou et al., Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization, International Conference on Artificial Intelligence and Statistics (AISTATS), p.113, 2019.

J. Krause, M. Stark, J. Deng, and L. Fei-fei, 3D object representations for fine-grained categorization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.554-561, 2013.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), pp.1097-1105, 2012.

M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger, From word embeddings to document distances, International Conference on Machine Learning (ICML), pp.957-966, 2015.

S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks, vol.8, issue.1, pp.98-113, 1997.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.

E. L. Lehmann and G. Casella, Theory of point estimation, 1998.

J. L. Ba, J. R. Kiros, and G. E. Hinton, Layer normalization, 2016.

X. Li, H. Xiong, H. Wang, Y. Rao, L. Liu et al., Delta: Deep learning transfer using feature map with attention for convolutional networks, International Conference on Learning Representations (ICLR), 2019.

Z. Li and D. Hoiem, Learning without forgetting, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, issue.12, pp.2935-2947, 2017.

H. Liao, Speaker adaptation of context dependent deep neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7947-7951, 2013.

M. Lin, Q. Chen, and S. Yan, Network in network, International Conference on Learning Representations (ICLR), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01950552

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common objects in context, European Conference on Computer Vision (ECCV), pp.740-755, 2014.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., Ssd: Single shot multibox detector, Proceedings of the European Conference on Computer Vision (ECCV), pp.21-37, 2016.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3431-3440, 2015.

M. Long, Y. Cao, J. Wang, and M. Jordan, Learning transferable features with deep adaptation networks, International Conference on Machine Learning (ICML), pp.97-105, 2015.

W. Luo, A. G. Schwing, and R. Urtasun, Efficient deep learning for stereo matching, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5695-5703, 2016.

S. Maji, E. Rahtu, J. Kannala, M. Blaschko, and A. Vedaldi, Fine-grained visual classification of aircraft, 2013.

N. Martinel, G. L. Foresti, and C. Micheloni, Wide-slice residual networks for food recognition, Winter Conference on Applications of Computer Vision (WACV), pp.567-576, 2018.

N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers et al., A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4040-4048, 2016.

J. McCormac, A. Handa, A. Davison, and S. Leutenegger, SemanticFusion: Dense 3D semantic mapping with convolutional neural networks, IEEE International Conference on Robotics and Automation (ICRA), pp.4628-4635, 2017.

I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, Cross-stitch networks for multi-task learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3994-4003, 2016.

G. Monge, Mémoire sur la théorie des déblais et des remblais, Histoire de l'Académie Royale des Sciences, pp.666-704, 1781.

R. Mottaghi, X. Chen, X. Liu, N. Cho, S. Lee et al., The role of context for object detection and semantic segmentation in the wild, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.891-898, 2014.

Y. E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k^2), Doklady Akademii Nauk SSSR, vol.269, pp.543-547, 1983.

M. Nilsback and A. Zisserman, Automated flower classification over a large number of classes, Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, 2008.

H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1520-1528, 2015.

T. Ochiai, S. Matsuda, X. Lu, C. Hori, and S. Katagiri, Speaker adaptive training using deep neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6349-6353, 2014.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1717-1724, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911179

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.

S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, Domain adaptation via transfer component analysis, IEEE Transactions on Neural Networks, vol.22, issue.2, pp.199-210, 2011.

G. Papandreou, I. Kokkinos, and P. Savalle, Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.390-399, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263611

O. Pele and M. Werman, Fast and robust earth mover's distances, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.460-467, 2009.

A. Pentina and C. H. Lampert, Lifelong learning with non-iid tasks, Advances in Neural Information Processing Systems (NIPS), pp.1540-1548, 2015.

F. Perazzi, J. Pont-tuset, B. Mcwilliams, L. Van-gool, M. Gross et al., A benchmark dataset and evaluation methodology for video object segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.724-732, 2016.

G. Peyré, M. Cuturi, and J. Solomon, Gromov-wasserstein averaging of kernel and distance matrices, International Conference on Machine Learning (ICML), pp.2664-2672, 2016.

G. Peyré and M. Cuturi, Computational optimal transport, 2018.

C. R. Qi, H. Su, K. Mo, and L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.652-660, 2017.

X. Qi, R. Liao, Z. Liu, R. Urtasun, and J. Jia, GeoNet: Geometric neural network for joint depth and surface normal estimation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.283-291, 2018.

A. Quattoni and A. Torralba, Recognizing indoor scenes, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.413-420, 2009.

M. Raghu, J. Gilmer, J. Yosinski, and J. Sohl-Dickstein, SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability, Advances in Neural Information Processing Systems (NIPS), pp.6076-6085, 2017.

J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6517-6525, 2017.

J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, 2018.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.779-788, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems (NIPS), pp.91-99, 2015.

A. Rolet, M. Cuturi, and G. Peyré, Fast dictionary learning with a smoothed wasserstein loss, International Conference on Artificial Intelligence and Statistics (AISTATS), pp.630-638, 2016.

O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, International Conference on Medical image computing and computer-assisted intervention, pp.234-241, 2015.

A. Rozantsev, M. Salzmann, and P. Fua, Beyond sharing weights for deep domain adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.41, pp.801-814, 2019.

S. Ruder, An overview of gradient descent optimization algorithms, 2016.

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick et al., Progressive neural networks, 2016.

T. Salimans, H. Zhang, A. Radford, and D. Metaxas, Improving gans using optimal transport, International Conference on Learning Representations (ICLR), 2018.

F. Santambrogio, Optimal transport for applied mathematicians, vol.55, pp.58-63, 2015.

S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, How does batch normalization help optimization?, Advances in Neural Information Processing Systems (NIPS), pp.2483-2493, 2018.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., Overfeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations (ICLR), 2014.

A. Sharif-razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) workshop, pp.806-813, 2014.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), 2015.

R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices, The Annals of Mathematical Statistics, vol.35, pp.876-879, 1964.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, International Conference on Machine Learning (ICML), pp.1139-1147, 2013.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, 2015.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p.119, 2016.

C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, AAAI Conference on Artificial Intelligence, 2017.

S. Thrun and T. M. Mitchell, Lifelong robot learning, Robotics and Autonomous Systems, vol.15, pp.25-46, 1995.

T. Tieleman and G. Hinton, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural networks for machine learning, vol.4, pp.26-31, 2012.

T. Tommasi, F. Orabona, and B. Caputo, Learning categories from few examples with multi model knowledge transfer, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.36, pp.928-941, 2014.

E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, Deep domain confusion: Maximizing for domain invariance, 2014.

E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, Simultaneous deep transfer across domains and tasks, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.4068-4076, 2015.

D. Ulyanov, A. Vedaldi, and V. Lempitsky, Instance normalization: The missing ingredient for fast stylization, 2016.

G. Van Horn, S. Branson, R. Farrell, S. Haber, J. Barry et al., Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.595-604, 2015.

G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun et al., The iNaturalist species classification and detection dataset, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.8769-8778, 2018.

C. Villani, Optimal transport: old and new, vol.338, 2008.

P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, Deepflow: Large displacement optical flow with deep matching, IEEE International Conference on Computer Vision (ICCV), pp.1385-1392, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00873592

P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff et al., Caltech-UCSD birds 200, 2010.

Y. Wu and K. He, Group normalization, Proceedings of the European Conference on Computer Vision (ECCV), pp.3-19, 2018.

L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. S. Schoenholz, and J. Pennington, Dynamical isometry and a mean field theory of CNNs: How to train 10,000-layer vanilla convolutional neural networks, International Conference on Machine Learning (ICML), pp.793-802, 2018.

Y. Xie, X. Wang, R. Wang, and H. Zha, A fast proximal point method for computing wasserstein distance, 2018.

J. Yang, R. Yan, and A. G. Hauptmann, Adapting SVM classifiers to data with shifted distributions, IEEE International Conference on Data Mining Workshops (ICDMW), pp.69-76, 2007.

Y. Yang and T. Hospedales, Deep multi-task representation learning: A tensor factorisation approach, International Conference on Learning Representations (ICLR), 2017.

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems (NIPS), pp.3320-3328, 2014.

S. Zagoruyko and N. Komodakis, Learning to compare image patches via convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4353-4361, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01246261

J. Zbontar and Y. Lecun, Stereo matching by training a convolutional neural network to compare image patches, Journal of Machine Learning Research, vol.17, issue.2, pp.1-32, 2016.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conference on Computer Vision (ECCV), pp.818-833, 2014.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, International Conference on Learning Representations (ICLR), 2017.

H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang et al., Context encoding for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7151-7160, 2018.

H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, Pyramid scene parsing network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2881-2890, 2017.

B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso et al., Scene parsing through ADE20K dataset, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.633-641, 2017.

B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, Places: A 10 million image database for scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.40, issue.6, pp.1452-1464, 2018.

B. Zoph and Q. V. Le, Neural architecture search with reinforcement learning, International Conference on Learning Representations (ICLR), 2017.

B. Zoph, V. Vasudevan, J. Shlens, and Q. Le, Learning transferable architectures for scalable image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.8697-8710, 2018.