A. Dutt, Towards Incremental Learning with Deep Convolutional Networks, Conférence en Recherche d'Informations et Applications - CORIA 2017, 14th French Information Retrieval Conference, pp.385-394, 2017.

A. Dutt, D. Pellerin, and G. Quénot, Improving image classification using coarse and fine labels, Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pp.438-442, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01590672

A. Dutt, D. Pellerin, and G. Quénot, Improving Hierarchical Image Classification with Merged CNN Architectures, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, vol.31, pp.1-31, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01590664

A. Dutt, D. Pellerin, and G. Quénot, Coupled ensembles of neural networks, 6th International Conference on Learning Representations, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02088253

A. Dutt, D. Pellerin, and G. Quénot, Coupled Ensembles of Neural Networks, 2018 International Conference on Content-Based Multimedia Indexing (CBMI), pp.1-6, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02088253

S. Chan Wai Tim, M. Rombaut, D. Pellerin, and A. Dutt, Descriptor extraction based on a multilayer dictionary architecture for classification of natural images, Computer Vision and Image Understanding, 2018.

A. Dutt, D. Pellerin, and G. Quénot, Coupled ensembles of neural networks, Neurocomputing, vol.10, p.92, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02088253

T. D. Pham, A. Dutt, D. Pellerin, and G. Quénot, Classifier training from a generative model, 2019 International Conference on Content-Based Multimedia Indexing (CBMI), 2019.

R. Aljundi, P. Chakravarty, and T. Tuytelaars, Expert gate: Lifelong learning with a network of experts, 2017.

B. Ans and S. Rousset, Avoiding catastrophic forgetting by coupling two reverberating neural networks, Comptes Rendus de l'Académie des Sciences-Series III-Sciences de la Vie, vol.320, pp.989-997, 1997.
URL : https://hal.archives-ouvertes.fr/hal-00171579

J. L. Ba, J. R. Kiros, and G. E. Hinton, Layer normalization, 2016.

A. Brock, J. Donahue, and K. Simonyan, Large Scale GAN Training for High Fidelity Natural Image Synthesis, International Conference on Learning Representations, 2019.

N. Brown and T. Sandholm, Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, Science, vol.359, pp.418-424, 2018.

R. Caruana, Multitask learning, Machine learning, vol.28, pp.41-75, 1997.

F. Chollet, Xception: Deep learning with depthwise separable convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1251-1258, 2017.

D. Ciresan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3642-3649, 2012.

D. Ciresan, Convolutional neural network committees for handwritten character classification, 2011 International Conference on Document Analysis and Recognition, pp.1135-1139, 2011.

A. Coates, A. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011.

T. D. Pham, Classifier training from a generative model, 2019 International Conference on Content-Based Multimedia Indexing (CBMI), 2019.

H. de Vries, Modulating early visual processing by language, Advances in Neural Information Processing Systems, pp.6594-6604, 2017.

J. Deng, A. C. Berg, and L. Fei-Fei, Hierarchical semantic indexing for large scale image retrieval, Computer Vision and Pattern Recognition (CVPR), pp.785-792, 2011.

J. Devlin, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, pp.4171-4186, 2019.

T. DeVries and G. W. Taylor, Improved Regularization of Convolutional Neural Networks with Cutout, 2017.

A. Dutt, D. Pellerin, and G. Quénot, Coupled ensembles of neural networks, 6th International Conference on Learning Representations, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02088253

A. Dutt, D. Pellerin, and G. Quénot, Coupled ensembles of neural networks, Neurocomputing, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02088253

J. Frankle and M. Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2019.

Y. Freund and R. E. Schapire, A Short Introduction to Boosting, Journal of Japanese Society for Artificial Intelligence, vol.14, pp.771-780, 1999.

X. Gastaldi, Shake-shake regularization, International Conference on Learning Representations (Workshop), 2017.

J. Gehring, Convolutional sequence to sequence learning, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1243-1252, 2017.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp.249-256, 2010.

I. J. Goodfellow, O. Vinyals, and A. M. Saxe, Qualitatively characterizing neural network optimization problems, International Conference on Learning Representations, 2015.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

I. Goodfellow, Generative adversarial nets, Advances in neural information processing systems, pp.2672-2680, 2014.

K. Greff, R. K. Srivastava, and J. Schmidhuber, Highway and Residual Networks learn Unrolled Iterative Estimation, International Conference on Learning Representations, 2017.

C. Guo, On Calibration of Modern Neural Networks, International Conference on Machine Learning, pp.1321-1330, 2017.

Y. Guo, Depthwise Convolution is All You Need for Learning Multiple Visual Domains, Thirty-Third AAAI Conference on Artificial Intelligence, 2019.

S. Han, H. Mao, and W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, International Conference on Learning Representations, 2015.

L. K. Hansen and P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.12, pp.993-1001, 1990.

K. He, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.

K. He, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision, pp.1026-1034, 2015.

K. He, Identity mappings in deep residual networks, European Conference on Computer Vision, pp.630-645, 2016.

X. He and H. Jaeger, Overcoming Catastrophic Interference using Conceptor Aided Backpropagation, International Conference on Learning Representations, 2018.

M. Heusel, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, Advances in Neural Information Processing Systems, pp.6626-6637, 2017.

G. E. Hinton and D. C. Plaut, Using fast weights to deblur old memories, Proceedings of the ninth annual conference of the Cognitive Science Society, 1987.

G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, NIPS 2014 Deep Learning Workshop, 2014.

G. Hinton, Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Processing Magazine, vol.29, 2012.

S. Hochreiter and J. Schmidhuber, Flat minima, Neural Computation, vol.9, pp.1-42, 1997.

A. G. Howard, MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017.

G. Huang, Densely connected convolutional networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.4700-4708, 2017.

G. Huang, Snapshot ensembles: Train 1, get m for free, International Conference on Learning Representations, 2017.

S. Ioffe, Batch renormalization: Towards reducing minibatch dependence in batch-normalized models, Advances in neural information processing systems, pp.1945-1953, 2017.

S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.448-456, 2015.

Y. Jia and T. Darrell, Latent task adaptation with large-scale hierarchies, Proceedings of the IEEE International Conference on Computer Vision, pp.2080-2087, 2013.

K. Kawaguchi, Deep learning without poor local minima, Advances in neural information processing systems, pp.586-594, 2016.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

J. Kirkpatrick, Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences, 2017.

A. Krizhevsky and G. Hinton, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

A. Krogh and J. A. Hertz, A simple weight decay can improve generalization, Advances in neural information processing systems, pp.950-957, 1992.

B. Lakshminarayanan, A. Pritzel, and C. Blundell, Simple and scalable predictive uncertainty estimation using deep ensembles, Advances in Neural Information Processing Systems, pp.6402-6413, 2017.

Y. LeCun, J. S. Denker, and S. A. Solla, Optimal brain damage, Advances in neural information processing systems, pp.598-605, 1990.

Y. LeCun, Backpropagation applied to handwritten zip code recognition, Neural computation, vol.1, pp.541-551, 1989.

Y. LeCun, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, pp.2278-2324, 1998.

H. Li, Pruning filters for efficient convnets, International Conference on Learning Representations, 2017.

Z. Li and D. Hoiem, Learning without forgetting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

M. Lin, Q. Chen, and S. Yan, Network in network, International Conference on Learning Representations, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01950552

T. Lin, Microsoft COCO: Common objects in context, European conference on computer vision, pp.740-755, 2014.

H. Liu, K. Simonyan, and Y. Yang, DARTS: Differentiable Architecture Search, International Conference on Learning Representations, 2019.

D. Lopez-Paz and M. Ranzato, Gradient Episodic Memory for Continuum Learning, 2017.

I. Loshchilov and F. Hutter, SGDR: Stochastic gradient descent with warm restarts, International Conference on Learning Representations, 2016.

V. Macko, Improving neural architecture search image classifiers via ensemble learning, 2019.

A. Mallya and S. Lazebnik, Piggyback: Adding Multiple Tasks to a Single, Fixed Network by Learning to Mask, 2018.

M. McCloskey and N. J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem, Psychology of learning and motivation, vol.24, 1989.

M. Mirza and S. Osindero, Conditional generative adversarial nets, 2014.

D. Mishkin and J. Matas, All you need is a good init, International Conference on Learning Representations, 2016.

T. Miyato, Spectral Normalization for Generative Adversarial Networks, International Conference on Learning Representations, 2018.

N. Morgan and H. Bourlard, Generalization and parameter estimation in feedforward nets: Some experiments, Advances in neural information processing systems, pp.630-637, 1990.

Y. Netzer, Reading digits in natural images with unsupervised feature learning, NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

T. Pang, Improving Adversarial Robustness via Promoting Ensemble Diversity, Proceedings of the 36th International Conference on Machine Learning, vol.97, pp.4970-4979, 2019.

N. Papernot, Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, International Conference on Learning Representations, 2016.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, International Conference on Learning Representations, 2016.

R. Ratcliff, Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions, Psychological Review, vol.97, no.2, 1990.

S.-A. Rebuffi, H. Bilen, and A. Vedaldi, Learning multiple visual domains with residual adapters, 2017.

S.-A. Rebuffi, iCaRL: Incremental classifier and representation learning, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp.2001-2010, 2017.

J. Redmon and A. Farhadi, YOLO9000: better, faster, stronger, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.7263-7271, 2017.

M. B. Ring, CHILD: A first step towards continual learning, Machine Learning, vol.28, pp.77-104, 1997.

A. Robins, Catastrophic forgetting, rehearsal and pseudorehearsal, Connection Science, vol.7, pp.123-146, 1995.

A. Rosenfeld and J. K. Tsotsos, Incremental learning through deep adaptation, IEEE transactions on pattern analysis and machine intelligence, 2018.

A. Royer and C. H. Lampert, Classifier adaptation at prediction time, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1401-1409, 2015.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, Parallel Distributed Processing, 1985.

O. Russakovsky, ImageNet large scale visual recognition challenge, International journal of computer vision, vol.115, pp.211-252, 2015.

A. A. Rusu, Progressive neural networks, 2016.

T. Salimans and D. P. Kingma, Weight normalization: A simple reparameterization to accelerate training of deep neural networks, Advances in Neural Information Processing Systems, pp.901-909, 2016.

T. Salimans, Improved techniques for training GANs, Advances in neural information processing systems, pp.2234-2242, 2016.

S. Santurkar, L. Schmidt, and A. Madry, A Classification Based Study of Covariate Shift in GAN Distributions, International Conference on Machine Learning, pp.4487-4496, 2018.

S. Santurkar, How does batch normalization help optimization?, Advances in Neural Information Processing Systems, pp.2483-2493, 2018.

H. Shin, Continual learning with deep generative replay, Advances in Neural Information Processing Systems, pp.2990-2999, 2017.

K. Shmelkov, C. Schmid, and K. Alahari, How good is my GAN?, Proceedings of the European Conference on Computer Vision (ECCV), pp.213-229, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01850447

D. Silver, Mastering the game of go without human knowledge, Nature, vol.550, p.354, 2017.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, 2015.

J. T. Springenberg, Striving for simplicity: The all convolutional net, International Conference on Learning Representations (Workshop), 2015.

N. Srivastava, Dropout: a simple way to prevent neural networks from overfitting, The journal of machine learning research, vol.15, pp.1929-1958, 2014.

R. K. Srivastava, K. Greff, and J. Schmidhuber, Training very deep networks, Advances in neural information processing systems, pp.2377-2385, 2015.

C. Szegedy, Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1-9, 2015.

M. Tan, MnasNet: Platform-aware neural architecture search for mobile, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2820-2828, 2019.

S. Thrun and T. M. Mitchell, Lifelong robot learning, The biology and technology of intelligent autonomous agents, 1995.

S. Chan Wai Tim, Descriptor extraction based on a multilayer dictionary architecture for classification of natural images, Computer Vision and Image Understanding, 2018.

A. Veit, M. J. Wilber, and S. Belongie, Residual networks behave like ensembles of relatively shallow networks, Advances in neural information processing systems, pp.550-558, 2016.

X. Wang, Non-local neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7794-7803, 2018.

Y. Wu and K. He, Group normalization, Proceedings of the European Conference on Computer Vision (ECCV), pp.3-19, 2018.

S. Xie, Aggregated residual transformations for deep neural networks, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1492-1500, 2017.

Z. Yan, HD-CNN: hierarchical deep convolutional neural networks for large scale visual recognition, Proceedings of the IEEE International Conference on Computer Vision, pp.2740-2748, 2015.

J. Yoon, Lifelong Learning with Dynamically Expandable Networks, International Conference on Learning Representations, 2018.

H. Zhang, Self-Attention Generative Adversarial Networks, International Conference on Machine Learning, pp.7354-7363, 2019.

T. Zhang, Interleaved group convolutions, Proceedings of the IEEE International Conference on Computer Vision, pp.4373-4382, 2017.

L. Zhao, On the Connection of Deep Fusion to Ensembling, 2016.

B. Zoph and Q. V. Le, Neural architecture search with reinforcement learning, International Conference on Learning Representations, 2017.

B. Zoph, Learning transferable architectures for scalable image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.8697-8710, 2018.