M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov et al., Deep learning with differential privacy, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pp.308-318, 2016.

Y. S. Abu-Mostafa, Learning from hints in neural networks, Journal of Complexity, vol.6, issue.2, pp.192-198, 1990.

Y. S. Abu-Mostafa, A method for learning from hints, Advances in Neural Information Processing Systems, vol.5, pp.73-80, 1992.

Y. S. Abu-Mostafa, Hints and the VC dimension, Neural Computation, vol.5, issue.2, pp.278-288, 1993.

Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data, 2012.

D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, vol.9, pp.147-169, 1985.

C. C. Aggarwal, Data Classification: Algorithms and Applications, 2014.

I. N. Aizenberg, N. N. Aizenberg, and J. P. Vandewalle, Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications, 2000.

G. A. Anastassiou, Intelligent Systems II: Complete Approximation by Neural Network Operators, 2016.

G. A. Anastassiou and O. Duman, Intelligent Mathematics II: Applied Mathematics and Approximation Theory, vol.441, 2016.

G. A. Anastassiou and S. G. Gal, Approximation Theory: Moduli of Continuity and Global Smoothness Preservation, 2002.

P. Andersen, Deep reinforcement learning using capsules in advanced game environments, 2018.

A. Argyriou, T. Evgeniou, and M. Pontil, Multi-task feature learning, Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, pp.41-48, 2006.

A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying, A spectral regularization framework for multi-task structure learning, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp.25-32, 2007.

D. Arpit, S. K. Jastrzebski, N. Ballas, D. Krueger, E. Bengio et al., A closer look at memorization in deep networks, ICML, vol.70, pp.233-242, 2017.

K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, Deep reinforcement learning: A brief survey, IEEE Signal Processing Magazine, vol.34, issue.6, pp.26-38, 2017.

J. L. Atkins, P. H. Whincup, R. W. Morris, L. T. Lennon, O. Papacosta et al., Sarcopenic obesity and risk of cardiovascular disease and mortality: a population-based cohort study of older men, Journal of the American Geriatrics Society, vol.62, issue.2, pp.253-60, 2014.

M. Auli, M. Galley, C. Quirk, and G. Zweig, Joint language and translation modeling with recurrent neural networks, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, vol.2013, pp.1044-1054, 2013.

L. J. Ba and R. Caruana, Do deep nets really need to be deep?, Proceedings of the 27th International Conference on Neural Information Processing Systems, vol.2, pp.2654-2662, 2014.

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

H. Baird, Document image defect models, Proceedings, IAPR Workshop on Syntactic and Structural Pattern Recognition, 1990.

P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri, Exploiting the past and the future in protein secondary structure prediction, Bioinformatics, vol.15, issue.11, pp.937-946, 1999.

D. H. Ballard, Modular learning in neural networks, Proc. AAAI, pp.279-284, 1987.

Y. Bar, I. Diamant, L. Wolf, and H. Greenspan, Deep learning with non-medical training used for chest pathology identification, Proc. SPIE, Medical Imaging: Computer-Aided Diagnosis, vol.9414, pp.94140-94147, 2015.

A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, vol.39, issue.3, pp.930-945, 1993.

J. Baxter, A model of inductive bias learning, J. Artif. Int. Res, vol.12, issue.1, pp.149-198, 2000.

D. Belanger and A. McCallum, Structured prediction energy networks, Proceedings of the 33rd International Conference on Machine Learning, pp.983-992, 2016.

S. Belharbi, C. Chatelain, R. Hérault, and S. Adam, Learning structured output dependencies using deep neural networks, Deep Learning Workshop in the 32nd International Conference on Machine Learning (ICML), 2015.

S. Belharbi, C. Chatelain, R. Hérault, and S. Adam, A unified neural based model for structured output problems, Conférence Francophone sur l'Apprentissage Automatique (CAP), 2015.

S. Belharbi, C. Chatelain, R. Hérault, and S. Adam, Neural networks regularization through class-wise invariant representation learning, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02129472

S. Belharbi, C. Chatelain, R. Hérault, S. Adam, S. Thureau et al., Spotting l3 slice in ct scans using deep convolutional network and transfer learning, Computers in Biology and Medicine, vol.87, pp.95-103, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01643960

S. Belharbi, R. Hérault, C. Chatelain, and S. Adam, Deep multi-task learning with evolving weights, European Symposium on Artificial Neural Networks (ESANN), 2016.

S. Belharbi, R. Hérault, C. Chatelain, and S. Adam, Pondération dynamique dans un cadre multi-tâche pour réseaux de neurones profonds [Dynamic weighting in a multi-task setting for deep neural networks], Apprentissage et, 2016.

S. Belharbi, R. Hérault, C. Chatelain, and S. Adam, Deep neural networks regularization for structured output prediction, Neurocomputing, vol.281, pp.169-177, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02094963

P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, Localizing parts of faces using a consensus of exemplars, CVPR, pp.545-552, 2011.

R. Bellman, Dynamic Programming, 1957.

S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira et al., A theory of learning from different domains, Machine Learning, vol.79, pp.151-175, 2010.

S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, Analysis of representations for domain adaptation, Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, pp.137-144, 2006.

S. Ben-David and R. S. Borbely, A notion of task relatedness yielding provable multiple-task learning guarantees, Machine Learning, vol.73, pp.273-287, 2008.

S. Ben-David, J. Gehrke, and R. Schuller, A theoretical framework for learning from a pool of disparate data sources, KDD, pp.443-449, 2002.

S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, Learning Theory and Kernel Machines, pp.567-580, 2003.

Y. Bengio, Learning Deep Architectures for AI, Found. Trends Mach. Learn., vol.2, issue.1, pp.1-127, 2009.

Y. Bengio, G. Montavon, G. B. Orr, and K. Müller, Practical recommendations for gradient-based training of deep architectures, Neural Networks: Tricks of the Trade, vol.7700, pp.437-478, 2012.

Y. Bengio, Deep learning of representations: Looking forward, 2013.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell, vol.35, issue.8, pp.1798-1828, 2013.

Y. Bengio, P. Lamblin, D. Popovici, L. , and H. , Greedy layer-wise training of deep networks, Advances in Neural information Processing Systems, vol.19, pp.153-160, 2006.

Y. Bengio and Y. LeCun, Scaling learning algorithms towards AI, 2007.

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, Trans. Neur. Netw, vol.5, issue.2, pp.157-166, 1994.

M. L. Bermingham, R. Pong-Wong, A. Spiliopoulou, C. Hayward, I. Rudan et al., Application of high-dimensional feature selection: evaluation for genomic prediction in man, Scientific Reports, vol.5, p.10312, 2015.

D. M. Bikel, R. Schwartz, and R. M. Weischedel, An algorithm that learns what's in a name, Machine learning, vol.34, issue.1-3, pp.211-231, 1999.

C. Bishop, Regularization and complexity control in feed-forward networks, Proceedings International Conference on Artificial Neural Networks ICANN'95, vol.1, pp.141-148, 1995.

J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman, Learning bounds for domain adaptation, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp.129-136, 2007.

E. V. Bonilla, K. M. Chai, and C. K. I. Williams, Multi-task Gaussian process prediction, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp.153-160, 2007.

K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, Domain separation networks, NIPS, pp.343-351, 2016.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

L. Breiman, Bagging predictors, Machine Learning, vol.24, pp.123-140, 1996.

L. Breiman, Random forests, Mach. Learn, vol.45, issue.1, pp.5-32, 2001.

J. S. Bridle and S. Cox, Recnorm: Simultaneous normalisation and classification applied to speech recognition, NIPS, pp.234-240, 1990.

J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, Signature verification using a "siamese" time delay neural network, Advances in Neural Information Processing Systems, vol.6, pp.737-744, 1994.

A. Bryson and Y. Ho, Applied optimal control: optimization, estimation, and control, 1969.

A. E. Bryson, A gradient method for optimizing multi-stage allocation processes, Proc. Harvard Univ. Symposium on digital computers and their applications, 1961.

A. E. Bryson and W. F. Denham, A steepest-ascent method for solving optimum programming problems, 1961.

E. J. Candes and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theor, vol.51, issue.12, pp.4203-4215, 2005.

N. Carlini, C. Liu, J. Kos, Ú. Erlingsson, and D. Song, The secret sharer: Measuring unintended neural network memorization & extracting secrets, 2018.

R. Caruana, Multitask learning: A knowledge-based source of inductive bias, ICML, pp.41-48, 1993.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

O. Chapelle, B. Schölkopf, and A. Zien, Semi-supervised learning. Adaptive computation and machine learning, 2006.

R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, vol.14, issue.10, pp.707-710, 2007.

R. Chartrand, Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp.262-265, 2009.

P. Checco and F. Corinto, CNN-based algorithm for drusen identification, International Symposium on Circuits and Systems, 2006.

J. Chen and N. S. Chaudhari, Capturing long-term dependencies for protein secondary structure prediction, Advances in Neural Networks-ISNN 2004, International Symposium on Neural Networks, pp.494-500, 2004.

M. Chen, Z. E. Xu, K. Q. Weinberger, and F. Sha, Marginalized denoising autoencoders for domain adaptation, ICML, icml.cc / Omnipress, 2012.

X. Chen, F. Xu, and Y. Ye, Lower Bound Theory of Nonzero Entries in Solutions of l2-lp Minimization, 2009.

L. Chi and Y. Mu, Deep steering: Learning end-to-end driving model from spatial and temporal visual cues, 2017.

D. Chicco, P. Sadowski, and P. Baldi, Deep autoencoder neural networks for gene ontology annotation predictions, Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB '14, pp.533-540, 2014.

K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation: Encoder-decoder approaches, Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp.103-111, 2014.

F. Chollet, Keras, 2015.

S. Chopra, R. Hadsell, and Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pp.539-546, 2005.

H. Chung, D. Cobzas, L. Birdsell, J. Lieffers, and V. Baracos, Automated segmentation of muscle and adipose tissue on CT images for human body composition analysis, Proceedings of SPIE, vol.7261, pp.72610-72610, 2009.

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.

J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, Gated feedback recurrent neural networks, Proceedings of the 32nd International Conference on Machine Learning, pp.2067-2075, 2015.

D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition, Neural Computation, vol.22, issue.12, pp.3207-3220, 2010.

D. Cireşan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image classification, Proceedings of the 25th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), pp.3642-3649, 2012.

D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, Deep neural networks segment neuronal membranes in electron microscopy images, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, pp.2852-2860, 2012.

D. C. Cireşan, U. Meier, and J. Schmidhuber, Transfer learning for Latin and Chinese characters with deep neural networks, International Joint Conference on Neural Networks, pp.1-6, 2012.

R. Collobert and J. Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, Machine Learning, Proceedings of he 25th International Conference, pp.160-167, 2008.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.

K. Crammer, M. J. Kearns, and J. Wortman, Learning from multiple sources, Journal of Machine Learning Research, vol.9, pp.1757-1774, 2008.

D. Cristinacce and T. Cootes, Feature Detection and Tracking with Constrained Local Models, BMVC, vol.10, pp.95-96, 2006.

B. C. Csáji, Approximation with artificial neural networks, 2001.

A. Cunliffe, B. White, J. Justusson, C. Straus, R. Malik et al., Comparison of Two Deformable Registration Algorithms in the Presence of Radiologic Change Between Serial Lung CT Scans, Journal of Digital Imaging, vol.28, issue.6, pp.755-760, 2015.

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, vol.2, issue.4, pp.303-314, 1989.

W. Dai, Q. Yang, G. Xue, and Y. Yu, Boosting for transfer learning, Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), pp.193-200, 2007.

H. de Vries, R. Memisevic, and A. Courville, Deep learning vector quantization, European Symposium on Artificial Neural Networks (ESANN), 2016.

J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, vol.1, pp.1223-1231, 2012.

R. Dechter, Learning while searching in constraint-satisfaction-problems, Proceedings of AAAI, Morgan Kaufmann, pp.178-185, 1986.

S. Demyanov, Regularization methods for neural networks and related models, 2015.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, CVPR, pp.248-255, 2009.

L. Deng and D. Yu, Deep learning: Methods and applications, Found. Trends Signal Process., vol.7, pp.197-387, 2014.

G. Desjardins, K. Simonyan, R. Pascanu, and K. Kavukcuoglu, Natural neural networks, Advances in Neural Information Processing Systems, vol.28, pp.2071-2079, 2015.

C. Doersch, Tutorial on variational autoencoders, 2016.

D. Donoho, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, 2004.

A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, and T. Brox, Learning to generate chairs, tables and cars with convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell, vol.39, issue.4, pp.692-705, 2017.

T. Dozat, Incorporating nesterov momentum into adam, 2016.

S. E. Dreyfus, The numerical solution of variational problems, Journal of Mathematical Analysis and Applications, vol.5, issue.1, pp.30-45, 1962.

R. Dubey, P. Agrawal, D. Pathak, T. L. Griffiths, and A. A. Efros, Investigating human priors for playing video games, 2018.

J. Duchi, E. Hazan, and Y. Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, COLT, pp.257-269, 2010.

L. Duong, T. Cohn, S. Bird, and P. Cook, Low resource dependency parsing: Crosslingual parameter sharing in a neural network parser, ACL (2), pp.845-850, 2015.

C. Dwork, A firm foundation for private data analysis, Commun. ACM, vol.54, issue.1, pp.86-95, 2011.

C. Dwork, F. McSherry, K. Nissim, and A. D. Smith, Calibrating noise to sensitivity in private data analysis, TCC, vol.3876, pp.265-284, 2006.

C. Dwork and A. Roth, The algorithmic foundations of differential privacy, Foundations and Trends in Theoretical Computer Science, vol.9, issue.3-4, pp.211-407, 2014.

M. El-Yacoubi, M. Gilloux, and J. Bertille, A statistical approach for phrase location and recognition within a text line: An application to street name recognition, IEEE PAMI, vol.24, issue.2, pp.172-188, 2002.

H. C. Ellis, Invariant Subspaces, 1965.

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent et al., Why does unsupervised pre-training help deep learning?, J. Mach. Learn. Res, vol.11, pp.625-660, 2010.

D. Erhan, C. Szegedy, A. Toshev, A. , and D. , Scalable object detection using deep neural networks, CVPR, pp.2155-2162, 2014.

T. Evgeniou and M. Pontil, Regularized multi-task learning, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.109-117, 2004.

S. E. Fahlman, G. E. Hinton, and T. J. Sejnowski, Massively parallel architectures for AI: netl, thistle, and boltzmann machines, Proceedings of the National Conference on Artificial Intelligence, pp.109-113, 1983.

J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, vol.96, pp.1348-1360, 2001.

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning Hierarchical Features for Scene Labeling, IEEE PAMI, vol.35, issue.8, pp.1915-1929, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00742077

R. Flamary, M. Cuturi, N. Courty, and A. Rakotomamonjy, Wasserstein discriminant analysis, Machine Learning, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02112754

L. Fridman, B. Jenik, and J. Terwilliger, Deeptraffic: Driving fast through dense traffic with deep reinforcement learning, 2018.

M. Fridman, Hidden markov model regression, Graduate School of Arts and Sciences, 1993.

T. Fukada, M. Schuster, and Y. Sagisaka, Phoneme boundary estimation using bidirectional recurrent neural networks and its applications, Systems and Computers in Japan, vol.30, pp.20-30, 1999.

K. Fukushima, Neural network model for a mechanism of pattern recognition unaffected by shift in position-Neocognitron, Trans. IECE, J62-A, issue.10, pp.658-665, 1979.

K. Fukushima, Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.36, issue.4, pp.193-202, 1980.

K. Fukushima, Increasing robustness against background noise: visual pattern recognition by a Neocognitron, Neural Networks, vol.24, issue.7, pp.767-778, 2011.

K. Fukushima, Training multi-layered neural network Neocognitron, Neural Networks, vol.40, pp.18-31, 2013.

K. Fukushima and S. Miyake, Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recognition, vol.15, issue.6, pp.455-469, 1982.

K. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks, vol.2, issue.3, pp.183-192, 1989.

A. Gamba, L. Gamberini, G. Palmieri, and R. Sanna, Further experiments with PAPA, Il Nuovo Cimento, vol.20, issue.2, pp.112-115, 1955.

Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle et al., Domain-adversarial training of neural networks, Journal of Machine Learning Research, vol.17, issue.59, pp.1-35, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01624607

J. Gao, W. Fan, J. Jiang, H. , and J. , Knowledge transfer via multiple model local structure mapping, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.283-291, 2008.

D. Ge, X. Jiang, and Y. Ye, A note on the complexity of lp minimization, Mathematical Programming, vol.129, issue.2, pp.285-299, 2011.

S. Geman, E. Bienenstock, and R. Doursat, Neural networks and the bias/variance dilemma, Neural Comput, vol.4, issue.1, pp.1-58, 1992.

P. Germain, A. Habrard, F. Laviolette, and E. Morvant, PAC-Bayes and domain adaptation, arXiv, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01563152

S. Ghosh, R. S. Alomari, V. Chaudhary, and G. Dhillon, Automatic lumbar vertebra segmentation from clinical CT for wedge compression fracture diagnosis, Proceedings of the SPIE, vol.3, pp.796303-796312, 2011.

F. Girosi and T. Poggio, Representation properties of networks: Kolmogorov's theorem is irrelevant, Neural Computation, vol.1, issue.4, pp.465-469, 1989.

B. Glocker, D. Zikic, E. Konukoglu, D. R. Haynor, and A. Criminisi, Vertebrae localization in pathological spine CT via dense classification from sparse annotations, MICCAI, pp.262-270, 2013.

B. Glocker, J. Feulner, A. Criminisi, D. R. Haynor, and E. Konukoglu, Automatic Localization and Identification of Vertebrae in Arbitrary Field-of-View CT Scans, MICCAI, pp.590-598, 2012.

B. Glocker, D. Zikic, and D. R. Haynor, Robust Registration of Longitudinal Spine CT, MICCAI, pp.251-258, 2014.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, International conference on artificial intelligence and statistics, pp.249-256, 2010.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS-11), vol.15, pp.315-323, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

X. Glorot, A. Bordes, and Y. Bengio, Domain adaptation for large-scale sentiment classification: A deep learning approach, Proceedings of the 28th International Conference on Machine Learning, pp.513-520, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752091

B. Goertzel, Are there deep reasons underlying the pathologies of today's deep learning algorithms?, Artificial General Intelligence, pp.70-79, 2015.

S. Golodetz, I. Voiculescu, and S. Cameron, Automatic spine identification in abdominal CT slices using image partition forests, International Symposium on Image and Signal Processing and Analysis, 2009.

F. J. Gomez and J. Schmidhuber, Co-evolving recurrent neurons learn deep memory pomdps, Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, GECCO '05, pp.491-498, 2005.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

I. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet, Multi-digit number recognition from street view imagery using deep convolutional neural networks, International Conference on Learning Representations, 2014.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, vol.27, pp.2672-2680, 2014.

I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples, 2014.

I. J. Goodfellow, D. Warde-farley, M. Mirza, A. C. Courville, and Y. Bengio, Maxout networks, Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pp.1319-1327, 2013.

D. F. Gordon and M. Desjardins, Evaluation and selection of biases in machine learning, Machine Learning, vol.20, pp.5-22, 1995.

S. Gouérant, M. Leheurteur, M. Chaker, R. Modzelewski, O. Rigal et al., A higher body mass index and fat mass are factors predictive of docetaxel dose intensity, Anticancer research, vol.33, issue.12, p.5655, 2013.

A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence, vol.385, 2012.
DOI : 10.1007/978-3-642-24797-2

URL : http://mediatum.ub.tum.de/doc/673554/document.pdf

A. Graves, Generating sequences with recurrent neural networks, 2013.

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, Proceedings of the 31st International Conference on Machine Learning, pp.1764-1772, 2014.

A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke et al., A novel connectionist system for unconstrained handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.5, pp.855-868, 2009.
DOI : 10.1109/tpami.2008.137

URL : http://www.idsia.ch/~juergen/tpami_2008.pdf

A. Graves, A. Mohamed, and G. E. Hinton, Speech recognition with deep recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6645-6649, 2013.
DOI : 10.1109/icassp.2013.6638947

A. Graves and J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, Advances in Neural Information Processing Systems, vol.21, pp.545-552, 2009.
DOI : 10.1007/978-1-4471-4072-6_12

A. Graves, G. Wayne, and I. Danihelka, Neural turing machines, 2014.

A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka et al., Hybrid computing using a neural network with dynamic external memory, Nature, vol.538, issue.7626, pp.471-476, 2016.
DOI : 10.1038/nature20101

A. Griewank, Who invented the reverse mode of differentiation?, Documenta Mathematica, Extra Volume ISMP, pp.389-400, 2012.

S. Grossberg, Contour enhancement, short term memory, and constancies in reverberating neural networks, Studies in Applied Mathematics, vol.52, issue.3, pp.213-257, 1973.
DOI : 10.1007/978-94-009-7758-7_8

S. Grossberg, Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks, pp.332-378, 1982.

P. D. Grünwald, The Minimum Description Length Principle (Adaptive Computation and Machine Learning), 2007.

P. Grünwald, A tutorial introduction to the minimum description length principle, Advances in Minimum Description Length: Theory and Applications, 2005.

S. Gu and L. Rigazio, Towards deep neural network architectures robust to adversarial examples, 2014.

J. Hadamard, Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées, vol.33, 1908.

R. Hadsell, S. Chopra, and Y. LeCun, Dimensionality reduction by learning an invariant mapping, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1735-1742, 2006.
DOI : 10.1109/cvpr.2006.100

L. G. Hafemann, R. Sabourin, and L. S. Oliveira, Writer-independent feature learning for offline signature verification using deep convolutional neural networks, 2016.
DOI : 10.1109/ijcnn.2016.7727521

URL : http://arxiv.org/pdf/1604.00974

B. Hammer, On the approximation capability of recurrent neural networks, International Symposium on Neural Computation, pp.12-16, 1998.

D. Hansel, G. Mato, and C. Meunier, Memorization without generalization in a multilayered neural network, Europhysics Letters, vol.20, issue.5, p.471, 1992.

K. Hashimoto, C. Xiong, Y. Tsuruoka, and R. Socher, A joint many-task model: Growing a neural network for multiple NLP tasks, EMNLP, pp.1923-1933, 2017.

M. H. Hassoun, Fundamentals of Artificial Neural Networks, 1995.

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.

M. Havaei, A. Davy, D. Warde-farley, A. Biard, A. Courville et al., Brain tumor segmentation with deep neural networks, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, ICCV 2015, pp.1026-1034, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, Computer Vision-ECCV 2016-14th European Conference, pp.630-645, 2016.

D. O. Hebb, The organization of behavior: A neuropsychological theory, 1949.

R. Hecht-Nielsen, Neurocomputing, 1989.

R. Hecht-Nielsen, Theory of the backpropagation neural network, International Joint Conference on Neural Networks (IJCNN), pp.593-605, 1989.

D. Heckerman, D. Geiger, and D. M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning, vol.20, issue.3, pp.197-243, 1995.

G. E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput, vol.14, issue.8, pp.1771-1800, 2002.

G. E. Hinton, Learning multiple layers of representation, Trends in Cognitive Sciences, vol.11, pp.428-434, 2007.

G. E. Hinton, A practical guide to training restricted boltzmann machines, Neural Networks: Tricks of the Trade, vol.7700, pp.599-619, 2012.

G. E. Hinton, J. L. McClelland, and D. E. Rumelhart, Parallel distributed processing: Explorations in the microstructure of cognition, chapter Distributed Representations, vol.1, pp.77-109, 1986.

G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Comput, vol.18, issue.7, pp.1527-1554, 2006.

G. E. Hinton, S. Sabour, and N. Frosst, Matrix capsules with EM routing, International Conference on Learning Representations, 2018.

G. E. Hinton and R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol.313, issue.5786, pp.504-507, 2006.

G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, Boltzmann machines: Constraint satisfaction networks that learn, 1984.

T. K. Ho, Random decision forests, Proceedings of the Third International Conference on Document Analysis and Recognition, vol.1, p.278, 1995.

S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, 1991.

S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, A Field Guide to Dynamical Recurrent Neural Networks, 2001.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput, vol.9, issue.8, pp.1735-1780, 1997.

J. Hoffman, D. Wang, F. Yu, and T. Darrell, Fcns in the wild: Pixel-level adversarial and constraint-based adaptation, 2016.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

K. Hornik, M. Stinchcombe, and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw, vol.3, issue.5, pp.551-560, 1990.

I. Hosu and T. Rebedea, Playing atari games with deep reinforcement learning and human checkpoint replay, 2016.

G. B. Huang and V. Jain, Deep and wide multiscale recursive networks for robust image labeling, 2013.

S. H. Huang, Y. H. Chu, S. H. Lai, and C. L. Novak, Learning-Based Vertebra Detection and Iterative Normalized-Cut Segmentation for Spinal MRI, IEEE Transactions on Medical Imaging, vol.28, issue.10, pp.1595-1605, 2009.

D. H. Hubel and T. Wiesel, Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex, Journal of Physiology, vol.160, pp.106-154, 1962.

H. Daumé III, Frustratingly easy domain adaptation, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp.256-263, 2007.

W. C. Ridgway III, An Adaptive Logic System with Generalizing Properties, 1962.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, pp.448-456, 2015.

B. Irie and S. Miyake, Capabilities of three-layered perceptrons, IEEE International Conference on Neural Networks, vol.1, p.218, 1988.

A. G. Ivakhnenko, Polynomial theory of complex systems, IEEE Transactions on Systems, Man and Cybernetics, issue.4, pp.364-378, 1971.

A. G. Ivakhnenko and V. G. Lapa, Cybernetic Predicting Devices, 1965.

A. G. Ivakhnenko, V. G. Lapa, and R. N. Mcdonough, Cybernetics and forecasting techniques, 1967.

M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, Deep structured output learning for unconstrained text recognition, 2014.

G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R, 2014.

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. Lecun, What is the best multi-stage architecture for object recognition?, ICCV 2009, pp.2146-2153, 2009.

J. Jiang and C. Zhai, Instance weighting for domain adaptation in NLP, ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.

X. Jiang, Representational transfer in deep belief networks, 28th Canadian Conference on Artificial Intelligence, pp.338-342, 2015.

D. T. Jones, Protein secondary structure prediction based on position-specific scoring matrices, Journal of Molecular Biology, vol.292, issue.2, pp.195-202, 1999.

A. Joulin and T. Mikolov, Inferring algorithmic patterns with stack-augmented recurrent nets, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.190-198, 2015.

R. Józefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu, Exploring the limits of language modeling, 2016.

S. Kadoury, H. Labelle, and N. Paragios, Automatic inference of articulated spine models in CT images using high-order markov random fields, Medical Image Analysis, vol.15, issue.4, pp.426-437, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00856308

T. Kaido, K. Ogawa, Y. Fujimoto, Y. Ogura, K. Hata et al., Impact of sarcopenia on survival in patients undergoing living donor liver transplantation, American Journal of Transplantation, vol.13, issue.6, pp.1549-1556, 2013.

A. Karpathy and F. Li, Deep visual-semantic alignments for generating image descriptions, IEEE Conference on Computer Vision and Pattern Recognition, pp.3128-3137, 2015.

M. J. Kearns and U. V. Vazirani, An Introduction to Computational Learning Theory, 1994.

H. J. Kelley, Gradient theory of optimal flight paths, ARS Journal, vol.30, issue.10, pp.947-954, 1960.

A. Kendall, Y. Gal, and R. Cipolla, Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, 2017.

Y. Kim, Convolutional neural networks for sentence classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp.1746-1751, 2014.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational bayes, 2013.

R. Kiros, R. Salakhutdinov, and R. S. Zemel, Unifying visual-semantic embeddings with multimodal neural language models, 2014.

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev, vol.51, issue.3, pp.455-500, 2009.

A. N. Kolmogorov, On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition, Doklady Akademii Nauk SSSR, vol.114, pp.369-373, 1957.

A. Krizhevsky, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, vol.25, pp.1097-1105, 2012.

D. Krueger, N. Ballas, S. Jastrzebski, D. Arpit, M. S. Kanwal et al., Deep nets don't learn via memorization, 2017.

E. Krupka and N. Tishby, Incorporating prior knowledge on features into learning, Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pp.227-234, 2007.

A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury et al., Ask me anything: Dynamic memory networks for natural language processing, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.1378-1387, 2016.

J. D. Lafferty, A. Mccallum, and F. C. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML, pp.282-289, 2001.

M. Lai, Deep learning for medical image segmentation, 2015.

H. Lanic, J. Kraut-tauzia, R. Modzelewski, F. Clatot, S. Mareschal et al., Sarcopenia is an independent prognostic factor in elderly patients with diffuse large b-cell lymphoma treated with immunochemotherapy, Leukemia & Lymphoma, vol.55, issue.4, pp.817-823, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01141161

N. D. Lawrence and J. C. Platt, Learning to learn with the informative vector machine, Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), 2004.

V. Le, J. Brandt, Z. Lin, L. D. Bourdev, and T. S. Huang, Interactive Facial Feature Localization, ECCV, 2012, Proceedings, Part III, pp.679-692, 2012.

Y. Lecun, Une procédure d'apprentissage pour réseau à seuil asymétrique, Proceedings of Cognitiva 85, pp.599-604, 1985.

Y. Lecun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Back-propagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.

Y. Lecun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, vol.2, pp.396-404, 1990.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.

Y. Lecun, L. Bottou, G. B. Orr, and K. Müller, Efficient backprop, Neural Networks: Tricks of the Trade, This Book is an Outgrowth of a 1996 NIPS Workshop, pp.9-50, 1998.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.609-616, 2009.

G. W. Leibniz, Memoir using the chain rule, vol.7, pp.2-3, 1676.

J. Lerouge, R. Herault, C. Chatelain, F. Jardin, and R. Modzelewski, IODA : An input / output deep architecture for image labeling, Pattern Recognition, vol.48, issue.9, pp.2847-2858, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02094941

M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks, vol.6, issue.6, pp.861-867, 1993.

G. F. de L'Hôpital, Analyse des infiniment petits, pour l'intelligence des lignes courbes, L'Imprimerie Royale, 1696.

H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning filters for efficient convnets, 2016.

X. Li, T. Uricchio, L. Ballan, M. Bertini, C. G. Snoek et al., Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval, ACM Comput. Surv, vol.49, issue.1, p.39, 2016.

W. Light, Ridge functions, sigmoidal functions and neural networks. Approximation theory VII, pp.163-206, 1992.

S. Linnainmaa, The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors, 1970.

S. Linnainmaa, Taylor expansion of the accumulated rounding error, BIT Numerical Mathematics, vol.16, issue.2, pp.146-160, 1976.

R. P. Lippmann, An introduction to computing with neural nets, SIGARCH Computer Architecture News, vol.16, issue.1, pp.7-25, 1988.

D. C. Liu and J. Nocedal, On the limited memory bfgs method for large scale optimization, Math. Program, vol.45, issue.3, pp.503-528, 1989.

S. Liu, N. Yang, M. Li, and M. Zhou, A recursive recurrent neural network for statistical machine translation, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1491-1500, 2014.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.

M. Long and J. Wang, Learning multiple tasks with deep relationship networks, 2015.

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi et al., Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification, CVPR, pp.1131-1140, 2017.

D. G. Luenberger, Optimization by vector space methods. Decision and control, 1969.

J. Ma and L. Lu, Hierarchical segmentation and identification of thoracic vertebra using learning-based edge detection and coarse-to-fine deformable model, Computer Vision and Image Understanding, vol.117, issue.9, pp.1072-1083, 2013.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.

M. M. Mahmud and S. R. Ray, Transfer learning using kolmogorov complexity: Basic theory and empirical evaluations, Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp.985-992, 2007.

D. Major, J. Hladůvka, F. Schulze, and K. Bühler, Automated landmarking and labeling of fully and partially scanned spinal columns in CT images, Medical Image Analysis, vol.17, issue.8, pp.1151-1163, 2013.

A. Makhzani, J. Shlens, N. Jaitly, and I. J. Goodfellow, Adversarial autoencoders. CoRR, 2015.

C. Malon, M. Miller, H. C. Burger, E. Cosatto, and H. P. Graf, Identifying histological elements with convolutional neural networks, Int. Conf. on Soft Computing As Transdisciplinary Science and Technology, pp.450-456, 2008.
DOI : 10.1145/1456223.1456316

G. Marcus, Deep learning: A critical appraisal, 2018.

G. F. Marcus, The algebraic mind: Integrating connectionism and cognitive science, 2003.

J. Martens, Deep learning via hessian-free optimization, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.735-742, 2010.
DOI : 10.1007/978-3-642-35289-8_27

URL : http://www.cs.toronto.edu/~jmartens/docs/HF_book_chapter.pdf

J. Martens and I. Sutskever, Learning recurrent neural networks with Hessian-free optimization, ICML 2011, pp.1033-1040, 2011.
DOI : 10.1007/978-3-642-35289-8_27

URL : http://www.cs.toronto.edu/~jmartens/docs/HF_book_chapter.pdf

L. Martin, L. Birdsell, N. Macdonald, T. Reiman, M. T. Clandinin et al., Cancer cachexia in the age of obesity: Skeletal muscle depletion is a powerful prognostic factor, independent of body mass index, Journal of Clinical Oncology, vol.31, issue.12, pp.1539-1547, 2013.

J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, Stacked Convolutional AutoEncoders for Hierarchical Feature Extraction, pp.52-59, 2011.
DOI : 10.1007/978-3-642-21735-7_7

URL : http://www.idsia.ch/~juergen/icann2011stack.pdf

W. S. Mcculloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics, vol.5, issue.4, pp.115-133, 1943.

T. Mcinerney and D. Terzopoulos, Deformable models in medical image analysis: a survey, Medical image analysis, vol.1, issue.2, pp.91-108, 1996.

B. M. Kelm, M. Wels, K. Zhou, S. Seifert, S. Suehling et al., Spine detection in CT and MR using iterated marginal space learning, Medical Image Analysis, vol.17, issue.8, pp.1283-1292, 2013.

L. Mihalkova, T. N. Huynh, and R. J. Mooney, Mapping and revising markov logic networks for transfer learning, Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pp.608-614, 2007.

L. Mihalkova and R. Mooney, Transfer learning with markov logic networks, ICML workshop on structural knowledge transfer for machine learning, 2006.

L. Mihalkova and R. J. Mooney, Transfer learning by mapping with minimal target data, Proceedings of the AAAI-08 workshop on transfer learning for complex tasks, 2008.

T. Mikolov, Statistical language models based on neural networks, 2012.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, NIPS, pp.3111-3119, 2013.

M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, 1969.

M. L. Minsky and S. A. Papert, Perceptrons: Expanded Edition, 1988.

I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, Cross-stitch networks for multi-task learning, CVPR, pp.3994-4003, 2016.
DOI : 10.1109/cvpr.2016.433

URL : http://arxiv.org/pdf/1604.03539

T. M. Mitchell, The need for biases in learning generalizations, 1980.

T. M. Mitchell, Machine Learning, 1997.

N. Mitsiopoulos, R. N. Baumgartner, S. B. Heymsfield, W. Lyons, D. Gallagher et al., Cadaver validation of skeletal muscle measurement by magnetic resonance imaging and computerized tomography, Journal of applied physiology, vol.85, issue.1, pp.115-122, 1998.

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou et al., Playing atari with deep reinforcement learning, NIPS Deep Learning Workshop, 2013.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.

V. Mnih, H. Larochelle, and G. E. Hinton, Conditional restricted boltzmann machines for structured output prediction, UAI 2011, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp.514-522, 2011.

M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, 2012.

P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, Pruning convolutional neural networks for resource efficient transfer learning, 2016.

M. F. Møller, Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time, 1993.

G. Montúfar, R. Pascanu, K. Cho, and Y. Bengio, On the number of linear regions of deep neural networks, Proceedings of the 27th International Conference on Neural Information Processing Systems, vol.2, pp.2924-2932, 2014.

N. Mourad and J. P. Reilly, Minimizing nonconvex functions for sparse vector reconstruction, IEEE Trans. Signal Processing, vol.58, issue.7, pp.3485-3496, 2010.

V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.807-814, 2010.

L. Nanni, A. Lumini, and S. Brahnam, Local binary patterns variants as texture descriptors for medical image analysis, Artificial Intelligence in Medicine, vol.49, issue.2, pp.117-125, 2010.

B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput, vol.24, issue.2, pp.227-234, 1995.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

A. M. Nguyen, J. Yosinski, and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, CVPR, pp.427-436, 2015.

D. T. Nguyen, F. Alam, F. Ofli, and M. Imran, Automatic image filtering on social networks using deep learning and perceptual hashing during crises, 2017.

S. Nicolas, T. Paquet, and L. Heutte, A Markovian Approach for Handwritten Document Segmentation, ICPR (3), pp.292-295, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00509210

R. Hecht-Nielsen, Kolmogorov's mapping neural network existence theorem, Proceedings of the IEEE First International Conference on Neural Networks, vol.III, pp.11-13, 1987.

F. Ning, D. Delhomme, Y. Lecun, F. Piano, L. Bottou et al., Toward automatic phenotyping of developing embryos from videos, IEEE Trans. Image Processing, vol.14, issue.9, pp.1360-1371, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00114920

P. Niyogi, F. Girosi, and T. Poggio, Incorporating prior information in machine learning by creating virtual examples, Proceedings of the IEEE, vol.86, issue.11, pp.2196-2209, 1998.

H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, 2015 IEEE International Conference on Computer Vision, ICCV 2015, pp.1520-1528, 2015.

K. Noto and M. Craven, Learning hidden Markov models for regression using path aggregation, CoRR, abs/1206.3275, 2012.

A. B. Novikoff, On convergence proofs on perceptrons, Proceedings of the Symposium on the Mathematical Theory of Automata, 1962.

F. J. Och, Minimum error rate training in statistical machine translation, Proceedings of the ACL, vol.1, 2003.

T. Ojala, M. Pietikainen, and T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.7, pp.971-987, 2002.

A. B. Oktay and Y. S. Akgul, Localization of the lumbar discs using machine learning and exact probabilistic inference, Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, pp.158-165, 2011.

M. Olazaran, A sociological study of the official history of the perceptrons controversy, Social Studies of Science, vol.26, issue.3, pp.611-659, 1996.

E. S. Olivas, J. D. Guerrero, M. M. Sober, J. R. Benedito, and A. J. Lopez, Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and Techniques-2 Volumes, 2009.

M. Oster, R. J. Douglas, and S. Liu, Computation with spikes in a winner-take-all network, Neural Computation, vol.21, issue.9, pp.2437-2465, 2009.

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Trans. on Knowl. and Data Eng, vol.22, issue.10, pp.1345-1359, 2010.

G. Pandey and A. Dukkipati, To go deep or wide in learning?, Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, vol.33, pp.724-732, 2014.

N. Papernot, M. Abadi, Ú. Erlingsson, I. J. Goodfellow, and K. Talwar, Semisupervised knowledge transfer for deep learning from private training data, 2016.

B. R. Paredes, A. Argyriou, N. Berthouze, and M. Pontil, Exploiting unrelated tasks in multi-task learning, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol.22, pp.951-959, 2012.

J. Park, S. Li, W. Wen, P. T. Tang, H. Li et al., Faster cnns with direct sparse convolutions and guided pruning, 2016.

D. B. Parker, Learning-logic, 1985.

R. Pascanu, T. Mikolov, and Y. Bengio, Understanding the exploding gradient problem, 2012.

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pp.1310-1318, 2013.

B. A. Pearlmutter, Fast exact multiplication by the Hessian, Neural Computation, vol.6, issue.1, pp.147-160, 1994.

P. Peng, M. Van-vledder, S. Tsai, M. De-jong, M. Makary et al., Sarcopenia negatively impacts short-term outcomes in patients undergoing hepatic resection for colorectal liver metastasis, HPB, vol.13, issue.7, pp.439-446, 2011.

G. Peyré and M. Cuturi, Computational optimal transport, 2017.

D. L. Pham, C. Xu, and J. L. Prince, Current methods in medical image segmentation, Annual review of biomedical engineering, vol.2, issue.1, pp.315-337, 2000.

B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, issue.5, pp.1-17, 1964.

B. Poole, J. Sohl-dickstein, and S. Ganguli, Analyzing noise in autoencoders and deep networks, 2014.

D. Povey, X. Zhang, and S. Khudanpur, Parallel training of deep neural networks with natural gradient and parameter averaging, 2014.

S. Rabanser, O. Shchur, and S. Günnemann, Introduction to tensor decompositions and their applications in machine learning, 2017.

L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

T. Raiko, H. Valpola, and Y. Lecun, Deep learning made easier by linear transformations in perceptrons, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, vol.22, pp.924-932, 2012.

R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, Self-taught learning: transfer learning from unlabeled data, Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), pp.759-766, 2007.

R. Raina, A. Madhavan, and A. Y. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.873-880, 2009.

P. Ramachandran, P. J. Liu, and Q. V. Le, Unsupervised pretraining for sequence to sequence learning, EMNLP, pp.383-391, 2017.

B. Ramsundar, S. M. Kearnes, P. Riley, D. Webster, D. E. Konerding et al., Massively multitask networks for drug discovery, 2015.

M. Ranzato, Y. Boureau, and Y. Lecun, Sparse feature learning for deep belief networks, Advances in Neural Information Processing Systems 20, Proceedings of the TwentyFirst Annual Conference on Neural Information Processing Systems, pp.1185-1192, 2007.

M. Ranzato, F. Huang, Y. Boureau, and Y. Lecun, Unsupervised learning of invariant feature hierarchies with applications to object recognition, Proc. Computer Vision and Pattern Recognition Conference (CVPR'07), 2007.

M. Ranzato, C. S. Poultney, S. Chopra, and Y. Lecun, Efficient learning of sparse representations with an energy-based model, Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, pp.1137-1144, 2006.

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: An astounding baseline for recognition, CVPR Workshops, pp.512-519, 2014.

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, NIPS, vol.28, pp.91-99, 2015.

M. Riesenhuber and T. Poggio, Hierarchical models of object recognition in cortex, Nature Neuroscience, vol.2, issue.11, 1999.

S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio et al., Higher order contractive auto-encoder, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2011.

S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, Contractive auto-encoders: Explicit invariance during feature extraction, Proceedings of the 28th International Conference on Machine Learning, pp.833-840, 2011.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015-18th International Conference, pp.234-241, 2015.
DOI : 10.1007/978-3-319-24574-4_28

URL : http://arxiv.org/pdf/1505.04597

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, pp.386-408, 1958.

F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, 1962.

H. R. Roth, J. Yao, L. Lu, J. Stieger, J. E. Burns et al., Detection of sclerotic spine metastases via random aggregation of deep convolutional neural network classifications, 2014.

S. Ruder, J. Bingel, I. Augenstein, and A. Søgaard, Sluice networks: Learning what to share between loosely related tasks, 2017.

W. Rudin, Principles of mathematical analysis, 1964.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, Parallel Distributed Processing, vol.1, pp.318-362, 1986.

S. Sabour, N. Frosst, and G. E. Hinton, Dynamic routing between capsules, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp.3859-3869, 2017.

C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, A semi-automatic methodology for facial landmark annotation, CVPR Workshops, pp.896-903, 2013.
DOI : 10.1109/cvprw.2013.132

R. Salakhutdinov and G. Hinton, Deep Boltzmann machines, Proceedings of the International Conference on Artificial Intelligence and Statistics, vol.5, pp.448-455, 2009.

T. Salimans, D. P. Kingma, and M. Welling, Markov chain monte carlo and variational inference: Bridging the gap, Proceedings of the 32nd International Conference on Machine Learning, pp.1218-1226, 2015.

A. Sathyanarayana, S. Joty, L. Fernandez-luque, F. Ofli, J. Srivastava et al., Sleep quality prediction from wearable data using deep learning, JMIR mHealth and uHealth, vol.4, issue.4, 2016.
DOI : 10.2196/mhealth.6562

URL : https://doi.org/10.2196/mhealth.6562

A. D. Savva, T. L. Economopoulos, and G. K. Matsopoulos, Geometry-based vs. intensity-based medical image registration: A comparative study on 3D CT data, Computers in Biology and Medicine, vol.69, pp.120-133, 2016.

H. Schmid, Part-of-speech tagging with neural networks, Proceedings of the Conference on Computational Linguistics, vol.12, pp.44-49, 1994.
DOI : 10.3115/991886.991915

URL : http://arxiv.org/pdf/cmp-lg/9410018

J. Schmidhuber, A local learning algorithm for dynamic feedforward and recurrent networks, Connection Science, vol.1, issue.4, pp.403-412, 1989.

J. Schmidhuber, Learning complex, extended sequences using the principle of history compression, Neural Computation, vol.4, issue.2, pp.234-242, 1992.

J. Schmidhuber, My first Deep Learning system of 1991 + Deep Learning timeline 1962-2013, 2013.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2014.
DOI : 10.1016/j.neunet.2014.09.003

URL : http://arxiv.org/pdf/1404.7828

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2001.

M. Schuster, On supervised learning from sequential data with applications for speech recognition, 1999.

M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, Trans. Sig. Proc, vol.45, issue.11, pp.2673-2681, 1997.

P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. Lecun, Pedestrian detection with unsupervised multi-stage feature learning, Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '13, pp.3626-3633, 2013.

S. Shalev-shwartz and S. Ben-david, Understanding Machine Learning: From Theory to Algorithms, 2014.

W. Shen, M. Punyanitya, Z. Wang, D. Gallagher, M. St-onge et al., Total body skeletal muscle and adipose tissue volumes: estimation from a single abdominal cross-sectional image, Journal of applied physiology, vol.97, issue.6, pp.2333-2338, 2004.

Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil, A latent semantic model with convolutional-pooling structure for information retrieval, CIKM, pp.101-110, 2014.

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference, vol.90, issue.2, pp.227-244, 2000.

H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu et al., Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning, IEEE Transactions on Medical Imaging, vol.35, issue.5, pp.1285-1298, 2016.

J. Sietsma and R. J. Dow, Creating artificial neural networks that generalize, Neural Networks, vol.4, issue.1, pp.67-79, 1991.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.

P. Simard, Y. Lecun, and J. Denker, Efficient Pattern Recognition Using a New Transformation Distance, Advances in Neural Information Processing Systems, vol.5, pp.50-58, 1993.

P. Simard, B. Victorri, Y. Lecun, and J. Denker, Tangent Prop-a formalism for specifying selected invariances in an adaptive network, Advances in Neural Information Processing Systems, vol.4, pp.895-903, 1992.

P. Y. Simard, D. Steinkraus, and J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis, Proceedings of the Seventh International Conference on Document Analysis and Recognition, vol.2, p.958, 2003.

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

J. Sjöberg and L. Ljung, Overtraining, regularization and searching for a minimum, with application to neural networks, International Journal of Control, vol.62, pp.1391-1407, 1995.

D. D. Sleator and D. Temperley, Parsing English with a link grammar, Proc. Third International Workshop on Parsing Technologies, pp.277-292, 1993.

P. Smolensky, Parallel distributed processing: Explorations in the microstructure of cognition, chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory, vol.1, pp.194-281, 1986.

R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning et al., Recursive deep models for semantic compositionality over a sentiment treebank, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp.1631-1642, 2013.

A. Søgaard and Y. Goldberg, Deep multi-task learning with low level tasks supervised at lower layers, ACL (2), Association for Computational Linguistics, 2016.

K. Sohn, H. Lee, and X. Yan, Learning structured output representation using deep conditional generative models, NIPS 2015, pp.3483-3491, 2015.

N. Srivastava, Improving Neural Networks with Dropout, 2013.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

R. K. Srivastava, K. Greff, and J. Schmidhuber, Highway Networks, 2016.

J. Starck, F. Murtagh, and J. Fadili, Dictionary Learning, pp.263-274, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01717943

K. Steffens, The history of approximation theory: from Euler to Bernstein, 2007.

D. Steinkraus, P. Y. Simard, and I. Buck, Using GPUs for machine learning algorithms, Proceedings of the Eighth International Conference on Document Analysis and Recognition, ICDAR '05, pp.1115-1119, 2005.

B. Stuner, C. Chatelain, and T. Paquet, Cohort of LSTM and lexicon verification for handwriting recognition with gigantic lexicon, 2016.

S. C. Suddarth and Y. L. Kergosien, Rule-injection hints as a means of improving network performance and learning time, Neural Networks, EURASIP workshop 1990, pp.120-129, 1990.

M. Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis, J. Mach. Learn. Res, vol.8, pp.1027-1061, 2007.

S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus, Weakly supervised memory networks, 2015.

X. Sun and E. W. Cheney, The fundamentality of sets of ridge functions, Aequationes Mathematicae, vol.44, pp.226-235, 1992.

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, ICML, vol.28, pp.1139-1147, 2013.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp.3104-3112, 2014.

U. Syed and G. Yona, Enzyme function prediction with interpretable models. Computational Systems Biology, pp.373-420, 2009.

V. Sze, Y. H. Chen, T. J. Yang, and J. S. Emer, Efficient processing of deep neural networks: A tutorial and survey, Proceedings of the IEEE, vol.105, issue.12, pp.2295-2329, 2017.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going Deeper with Convolutions, 2014.

C. Szegedy, A. Toshev, and D. Erhan, Deep neural networks for object detection, Advances in Neural Information Processing Systems, vol.26, pp.2553-2561, 2013.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan et al., Intriguing properties of neural networks, 2013.

M. Szummer and Y. Qi, Contextual Recognition of Hand-drawn Diagrams with Conditional Random Fields, IWFHR, pp.32-37, 2004.

Y. Tang and C. Eliasmith, Deep networks for robust visual recognition, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.1055-1062, 2010.

T. Terlaky, On lp programming, European Journal of Operational Research, vol.22, issue.1, pp.70-100, 1985.

S. Thrun, Is learning the n-th thing any easier than learning the first?, Advances in Neural Information Processing Systems, pp.640-646, 1996.

S. Thrun and T. M. Mitchell, Learning one more thing, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, vol.95, pp.1217-1225, 1995.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, vol.58, pp.267-288, 1994.

A. N. Tikhonov, On solving ill-posed problem and method of regularization, Doklady Akademii Nauk USSR, vol.153, pp.501-504, 1963.

A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed problems, 1977.

G. Tsechpenakis, J. Wang, B. Mayer, and D. Metaxas, Coupling CRFs and Deformable Models for 3D Medical Image Segmentation, ICCV, pp.1-8, 2007.

E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, Adversarial discriminative domain adaptation, CVPR, 2017.

D. Ulyanov, A. Vedaldi, and V. Lempitsky, Deep image prior, 2017.

G. Urban, M. Bendszus, F. A. Hamprecht, and J. Kleesiek, Multi-modal brain tumor segmentation using deep convolutional neural networks, MICCAI BraTS Challenge Proceedings, pp.31-35, 2014.

P. E. Utgoff, Machine Learning of Inductive Bias, 1986.

L. G. Valiant, A theory of the learnable, Commun. ACM, vol.27, issue.11, pp.1134-1142, 1984.

A. van den Oord, S. Dieleman, and B. Schrauwen, Deep content-based music recommendation, NIPS, pp.2643-2651, 2013.

V. N. Vapnik and A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, vol.16, pp.264-280, 1971.

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008.

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, JMLR, vol.11, pp.3371-3408, 2010.

O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever et al., Grammar as a foreign language, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.2773-2781, 2015.

O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and tell: A neural image caption generator, IEEE Conference on Computer Vision and Pattern Recognition, pp.3156-3164, 2015.
DOI : 10.1109/cvpr.2015.7298935
URL : http://arxiv.org/pdf/1411.4555

S. Wager, S. Wang, P. S. Liang, L. Bottou, M. Welling et al., Dropout training as adaptive regularization, Advances in Neural Information Processing Systems, vol.26, pp.351-359, 2013.

J. Wallis and T. Miller, Three-dimensional display in nuclear medicine and radiology, Journal of Nuclear Medicine, vol.32, issue.3, pp.534-546, 1991.

J. W. Wallis, Cardiovascular Nuclear Medicine and MRI: Quantitation and Clinical Applications, pp.89-100, 1992.

J. W. Wallis, T. R. Miller, C. A. Lerner, and E. C. Kleerup, Three-dimensional display in nuclear medicine, IEEE Trans. on Medical Imaging, vol.8, issue.4, pp.297-303, 1989.
DOI : 10.1109/42.41482

L. Wan, M. D. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using dropconnect, ICML, vol.28, pp.1058-1066, 2013.

C. Wang and S. Mahadevan, Manifold alignment using procrustes analysis, Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008), pp.1120-1127, 2008.
DOI : 10.1145/1390156.1390297

URL : https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1061&context=cs_faculty_pubs

X. Wang, L. Li, D. Lockington, D. Pullar et al., Self-organizing polynomial neural network for modelling complex hydrological processes, 2005.

X. Wang and Y. Wang, Improving content-based and hybrid music recommendation using deep learning, ACM Multimedia, pp.627-636, 2014.
DOI : 10.1145/2647868.2654940

D. Warde-farley, I. J. Goodfellow, A. C. Courville, and Y. Bengio, An empirical analysis of dropout in piecewise linear networks, International Conference on Learning Representations, 2014.

J. Weng, N. Ahuja, and T. S. Huang, Cresceptron: a self-organizing neural network which grows adaptively, International Joint Conference on Neural Networks (IJCNN), vol.1, pp.576-581, 1992.
DOI : 10.1109/ijcnn.1992.287150

URL : http://vision.ai.uiuc.edu/publications/cresceptron_1992.pdf

J. J. Weng, N. Ahuja, and T. S. Huang, Learning recognition and segmentation using the cresceptron, International Journal of Computer Vision, vol.25, issue.2, pp.109-143, 1997.

P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

P. J. Werbos, Applications of advances in nonlinear sensitivity analysis, Proceedings of the 10th IFIP Conference, 31.8-4.9, pp.762-770, 1981.

J. Weston, S. Chopra, and A. Bordes, Memory networks, 2014.

J. Weston, F. Ratle, and R. Collobert, Deep learning via semi-supervised embedding, Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008), pp.1168-1175, 2008.
DOI : 10.1145/1390156.1390303

URL : http://icml2008.cs.helsinki.fi/papers/340.pdf

J. Weston, F. Ratle, H. Mobahi, and R. Collobert, Deep learning via semi-supervised embedding, Neural Networks: Tricks of the Trade, 2012.
DOI : 10.1145/1390156.1390303

URL : http://icml2008.cs.helsinki.fi/papers/340.pdf

B. Widrow, An Adaptive "ADALINE" Neuron Using Chemical "Memistors", 1960.

D. H. Hubel and T. N. Wiesel, Receptive fields of single neurones in the cat's striate cortex, J. Physiol, vol.148, pp.574-591, 1959.

S. Wiesler, A. Richard, R. Schlüter, and H. Ney, Mean-normalized stochastic gradient for large-scale deep learning, International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.180-184, 2014.

S. Wiesler, R. Schlüter, and H. Ney, A convergence analysis of log-linear training and its application to speech recognition, IEEE Workshop on Automatic Speech Recognition Understanding, pp.1-6, 2011.

R. Winter and B. Widrow, MADALINE RULE II: a training algorithm for neural networks, IEEE 1988 International Conference on Neural Networks, vol.1, pp.401-408, 1988.

D. H. Wolpert, The lack of a priori distinctions between learning algorithms, Neural Comput, vol.8, issue.7, pp.1341-1390, 1996.

R. S. Woodworth and E. Thorndike, The influence of improvement in one mental function upon the efficiency of other functions (I), Psychological Review, vol.8, issue.3, p.247, 1901.

X. Wu and R. Srihari, Incorporating prior knowledge with weighted margin support vector machines, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pp.326-333, 2004.

K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville et al., Show, attend and tell: Neural image caption generation with visual attention, Proceedings of the 32nd International Conference on Machine Learning, pp.2048-2057, 2015.

G. Xue and Y. Ye, An efficient algorithm for minimizing a sum of p-norms, SIAM Journal on Optimization, vol.10, issue.2, pp.551-579, 2000.

Y. Xue, X. Liao, L. Carin, and B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, Journal of Machine Learning Research, vol.8, pp.35-63, 2007.

Y. Yang and T. M. Hospedales, Deep multi-task representation learning: A tensor factorisation approach, 2016.

Y. Yang and T. M. Hospedales, Trace norm regularised deep multi-task learning, 2016.

C. Yip, C. Dinkel, A. Mahajan, M. Siddique, G. Cook et al., Imaging body composition in cancer patients: visceral obesity, sarcopenia and sarcopenic obesity may impact on clinical outcome, Insights into Imaging, pp.489-497, 2015.

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks? In NIPS, pp.3320-3328, 2014.

A. Yu, R. Palefsky-smith, and R. Bedi, Deep reinforcement learning for simulated autonomous vehicle control, Course Project Reports, pp.1-7, 2016.

T. Yu, T. Jan, S. Simoff, and J. Debenham, Incorporating prior domain knowledge into inductive machine learning, unpublished doctoral dissertation, Computer Sciences, 2007.

T. Yu, S. Simoff, and T. Jan, VQSVM: A case study for incorporating prior domain knowledge into inductive machine learning, Neurocomputing, vol.73, pp.2614-2623, 2010.

M. D. Zeiler, ADADELTA: an adaptive learning rate method, 2012.

M. D. Zeiler and R. Fergus, Stochastic pooling for regularization of deep convolutional neural networks, International Conference on Learning Representations (ICLR2013), 2013.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, ECCV (1), vol.8689, pp.818-833, 2014.

H. Zen, K. Tokuda, and A. Black, Statistical parametric speech synthesis, Speech Communication, vol.51, issue.11, pp.1039-1064, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00746106

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

J. Zhang, S. Shan, M. Kan, and X. Chen, Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment, ECCV, Part II, pp.1-16, 2014.

K. Zhang, W. Zuo, S. Gu, and L. Zhang, Learning deep CNN denoiser prior for image restoration, CVPR, pp.2808-2817, 2017.

X. Zhang, S. Das, O. Neopane, and K. Kreutz-delgado, A design methodology for efficient implementation of deconvolutional neural networks on an FPGA, 2017.

Y. Zhang, P. David, and B. Gong, Curriculum domain adaptation for semantic segmentation of urban scenes, 2017.

Y. Zhang and Q. Yang, A survey on multi-task learning, 2017.

Z. Zhang, P. Luo, C. C. Loy, and X. Tang, Facial landmark detection by deep multi-task learning, Computer Vision, ECCV 2014, 13th European Conference, pp.94-108, 2014.

F. Zhuang, X. Cheng, P. Luo, S. J. Pan, and Q. He, Supervised representation learning: Transfer learning with deep autoencoders, IJCAI, pp.4119-4125, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01667782
