M. Achab, A. Guilloux, S. Gaïffas, and E. Bacry, SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.

B. Alipanahi, A. Delong, M. T. Weirauch, and B. J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology, vol.33, issue.8, p.831, 2015.

S. Allassonnière, Y. Amit, and A. Trouvé, Towards a coherent statistical framework for dense deformable template estimation, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.69, issue.1, pp.3-29, 2007.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.8194-8244, 2017.

Z. Allen-Zhu and Y. Li, What can ResNet learn efficiently, going beyond kernels?, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Z. Allen-Zhu, Y. Yuan, and K. Sridharan, Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.

Z. Allen-Zhu, Y. Li, and Y. Liang, Learning and generalization in overparameterized neural networks, going beyond two layers, Advances in Neural Information Processing Systems (NeurIPS), 2019.

Z. Allen-Zhu, Y. Li, and Z. Song, A convergence theory for deep learning via overparameterization, Proceedings of the International Conference on Machine Learning (ICML), 2019.

S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.

J. Andén and S. Mallat, Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.

F. Anselmi, L. Rosasco, C. Tan, and T. Poggio, Deep convolutional networks are hierarchical kernel machines, 2015.

F. Anselmi, L. Rosasco, and T. Poggio, On invariance and selectivity in representation learning, Information and Inference, vol.5, issue.2, pp.134-158, 2016.

M. Anthony and P. Bartlett, Neural network learning: Theoretical foundations, 2009.

M. Arbel, D. J. Sutherland, M. Bińkowski, and A. Gretton, On gradient regularizers for MMD GANs, Advances in Neural Information Processing Systems (NeurIPS), 2018.

M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, Proceedings of the International Conference on Machine Learning (ICML), 2017.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.

S. Arora, R. Ge, B. Neyshabur, and Y. Zhang, Stronger generalization bounds for deep nets via a compression approach, Proceedings of the International Conference on Machine Learning (ICML), 2018.

S. Arora, S. S. Du, W. Hu, Z. Li, R. Salakhutdinov et al., On exact computation with an infinitely wide neural net, Advances in Neural Information Processing Systems (NeurIPS), 2019.

S. Arora, S. S. Du, W. Hu, Z. Li, and R. Wang, Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.

K. Atkinson and W. Han, Spherical harmonics and approximations on the unit sphere: an introduction, vol.2044, 2012.

F. Bach, Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory (COLT), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.19, pp.1-53, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01098505

F. Bach, On the equivalence between kernel quadrature rules and random feature expansions, Journal of Machine Learning Research (JMLR), vol.18, issue.21, pp.1-38, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01118276

F. Bach and M. I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research (JMLR), vol.3, pp.1-48, 2002.

F. Bach and M. I. Jordan, Predictive low-rank decomposition for kernel methods, Proceedings of the International Conference on Machine Learning (ICML), 2005.

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.

P. L. Bartlett, O. Bousquet, and S. Mendelson, Local Rademacher complexities, The Annals of Statistics, vol.33, issue.4, pp.1497-1537, 2005.

P. L. Bartlett, M. I. Jordan, and J. D. Mcauliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

P. L. Bartlett, D. J. Foster, and M. Telgarsky, Spectrally-normalized margin bounds for neural networks, Advances in Neural Information Processing Systems (NIPS), 2017.

P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, Benign overfitting in linear regression, 2019.

R. Basri, D. Jacobs, Y. Kasten, and S. Kritchman, The convergence rate of neural networks for learned functions of different frequencies, Advances in Neural Information Processing Systems (NeurIPS), 2019.

M. Belkin, D. Hsu, and P. Mitra, Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, Advances in Neural Information Processing Systems (NeurIPS), 2018.

M. Belkin, S. Ma, and S. Mandal, To understand deep learning we need to understand kernel learning, Proceedings of the International Conference on Machine Learning (ICML), 2018.

M. Belkin, A. Rakhlin, and A. B. Tsybakov, Does data interpolation contradict statistical optimality?, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

Y. Bengio, N. L. Roux, P. Vincent, O. Delalleau, and P. Marcotte, Convex neural networks, Advances in Neural Information Processing Systems (NIPS), 2006.

A. Berlinet and C. Thomas-Agnan, Reproducing kernel Hilbert spaces in probability and statistics, 2004.

A. Bietti and J. Mairal, Invariance and stability of deep convolutional representations, Advances in Neural Information Processing Systems (NIPS), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01630265

A. Bietti and J. Mairal, Stochastic optimization with variance reduction for infinite datasets with finite sum structure, Advances in Neural Information Processing Systems (NIPS), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01375816

A. Bietti and J. Mairal, Group invariance, stability to deformations, and complexity of deep convolutional representations, Journal of Machine Learning Research, vol.20, issue.25, pp.1-49, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01536004

A. Bietti and J. Mairal, On the inductive bias of neural tangent kernels, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02144221

A. Bietti, A. Agarwal, and J. Langford, A contextual bandit bake-off, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01708310

A. Bietti, G. Mialon, D. Chen, and J. Mairal, A kernel perspective for regularizing deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.
URL : https://hal.archives-ouvertes.fr/hal-01884632

B. Biggio and F. Roli, Wild patterns: Ten years after the rise of adversarial machine learning, Pattern Recognition, vol.84, pp.317-331, 2018.

M. Bińkowski, D. J. Sutherland, M. Arbel, and A. Gretton, Demystifying MMD GANs, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

L. Bo, X. Ren, and D. Fox, Kernel descriptors for visual recognition, Advances in Neural Information Processing Systems (NIPS), 2010.

L. Bo, K. Lai, X. Ren, and D. Fox, Object recognition with hierarchical kernel descriptors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

L. Bottou and O. Bousquet, The Tradeoffs of Large Scale Learning, Advances in Neural Information Processing Systems (NIPS), 2008.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of classification: A survey of some recent advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00017923

J. Bouvrie, L. Rosasco, and T. Poggio, On invariance in hierarchical models, Advances in Neural Information Processing Systems (NIPS), 2009.

M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, Geometric deep learning: going beyond Euclidean data, IEEE Signal Processing Magazine, vol.34, issue.4, pp.18-42, 2017.

J. Bruna and S. Mallat, Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.35, pp.1872-1886, 2013.

J. Bruna, A. Szlam, and Y. LeCun, Learning stable group invariant representations with convolutional networks, 2013.

S. Bubeck, Convex optimization: Algorithms and complexity, Foundations and Trends in Machine Learning, vol.8, issue.3-4, pp.231-357, 2015.

Y. Cao and Q. Gu, Generalization bounds of stochastic gradient descent for wide and deep neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

T. Ching et al., Opportunities and obstacles for deep learning in biology and medicine, Journal of The Royal Society Interface, vol.15, issue.141, 2018.

L. Chizat and F. Bach, On the global convergence of gradient descent for overparameterized models using optimal transport, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01798792

L. Chizat, E. Oyallon, and F. Bach, On lazy training in differentiable programming, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL : https://hal.archives-ouvertes.fr/hal-01945578

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.

M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, Parseval networks: Improving robustness to adversarial examples, International Conference on Machine Learning (ICML), 2017.

A. Coates, H. Lee, and A. Y. Ng, An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

J. Cohen, E. Rosenfeld, and Z. Kolter, Certified adversarial robustness via randomized smoothing, Proceedings of the International Conference on Machine Learning (ICML), 2019.

T. Cohen and M. Welling, Group equivariant convolutional networks, International Conference on Machine Learning (ICML), 2016.

T. Cohen, M. Geiger, J. Koehler, and M. Welling, Spherical CNNs, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.1, pp.1-49, 2002.

A. Daniely, SGD learns the conjugate kernel class of the network, Advances in Neural Information Processing Systems (NIPS), 2017.

A. Daniely, R. Frostig, and Y. Singer, Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.

A. Daniely, R. Frostig, V. Gupta, and Y. Singer, Random features for compositional kernels, 2017.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. Defazio, J. Domke, and T. S. Caetano, Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.

L. Devroye, L. Györfi, and G. Lugosi, A probabilistic theory of pattern recognition, 1996.

J. Diestel and J. J. Uhl, Vector Measures, 1977.

A. Dieuleveut and F. Bach, Nonparametric stochastic approximation with large stepsizes, The Annals of Statistics, vol.44, issue.4, pp.1363-1399, 2016.

A. Dieuleveut, N. Flammarion, and F. Bach, Harder, better, faster, stronger convergence rates for least-squares regression, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.3520-3570, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01275431

H. Drucker and Y. LeCun, Double backpropagation increasing generalization performance, International Joint Conference on Neural Networks (IJCNN), 1991.

S. S. Du, J. D. Lee, H. Li, L. Wang, and X. Zhai, Gradient descent finds global minima of deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.

S. S. Du, X. Zhai, B. Poczos, and A. Singh, Gradient descent provably optimizes overparameterized neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

J. C. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.

J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Privacy aware learning, Advances in Neural Information Processing Systems (NIPS), 2012.

G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, Training generative neural networks via maximum mean discrepancy optimization, Conference on Uncertainty in Artificial Intelligence (UAI), 2015.

C. Efthimiou and C. Frye, Spherical harmonics in p dimensions, 2014.

A. El Alaoui and M. Mahoney, Fast randomized kernel ridge regression with statistical guarantees, Advances in Neural Information Processing Systems (NIPS), 2015.

L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry, Exploring the landscape of spatial robustness, Proceedings of the International Conference on Machine Learning (ICML), 2019.

S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research, vol.2, pp.243-264, 2001.

S. Fischer and I. Steinwart, Sobolev norm learning rates for regularized least-squares algorithms, 2017.

G. B. Folland, A course in abstract harmonic analysis, 2016.

A. Garriga-Alonso, L. Aitchison, and C. E. Rasmussen, Deep convolutional networks as shallow Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

B. Ghorbani, S. Mei, T. Misiakiewicz, and A. Montanari, Linearized two-layers neural networks in high dimension, 2019.

N. Golowich, A. Rakhlin, and O. Shamir, Size-independent sample complexity of neural networks, Conference on Learning Theory (COLT), 2018.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, A kernel two-sample test, Journal of Machine Learning Research, vol.13, pp.723-773, 2012.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems (NIPS), 2017.

S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro, Implicit regularization in matrix factorization, Advances in Neural Information Processing Systems (NIPS), 2017.

S. Gunasekar, J. D. Lee, D. Soudry, and N. Srebro, Implicit bias of gradient descent on linear convolutional networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.

L. Györfi, M. Kohler, A. Krzyzak, and H. Walk, A distribution-free theory of nonparametric regression, 2006.

B. Haasdonk and H. Burkhardt, Invariant kernel functions for pattern analysis and machine learning, Machine Learning, vol.68, issue.1, pp.35-61, 2007.

T. Håndstad, A. J. Hestnes, and P. Saetrom, Motif kernel generated by genetic programming improves remote homology and fold detection, BMC Bioinformatics, vol.8, issue.1, p.23, 2007.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2009.

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: the lasso and generalizations, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms I: Fundamentals, Springer Science & Business Media, 1993.

T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248672

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

D. Hsu, S. Kakade, and T. Zhang, Random design analysis of ridge regression, Foundations of Computational Mathematics, vol.14, issue.3, 2014.

A. Jacot, F. Gabriel, and C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01824549

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

S. M. Kakade, K. Sridharan, and A. Tewari, On the complexity of linear prediction: Risk bounds, margin bounds, and regularization, Advances in Neural Information Processing Systems (NIPS), 2009.

J. Khim and P. Loh, Adversarial risk bounds via function transformation, 2018.

G. Kimeldorf and G. Wahba, Some results on Tchebycheffian spline functions, Journal of Mathematical Analysis and Applications, vol.33, issue.1, pp.82-95, 1971.

V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, pp.2593-2656, 2006.

V. Koltchinskii and D. Panchenko, Empirical margin distributions and bounding the generalization error of combined classifiers, The Annals of Statistics, vol.30, pp.1-50, 2002.

R. Kondor and S. Trivedi, On the generalization of equivariance and convolution in neural networks to the action of compact groups, Proceedings of the International Conference on Machine Learning (ICML), 2018.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.

A. Kulunchakov and J. Mairal, Estimate sequences for stochastic composite optimization: Variance reduction, acceleration, and robustness to noise, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01993531

S. Lacoste-Julien, M. Schmidt, and F. Bach, A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768187

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, 2017.

L. Landweber, An iteration formula for Fredholm integral equations of the first kind, American Journal of Mathematics, vol.73, issue.3, pp.615-624, 1951.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana, Certified robustness to adversarial examples with differential privacy, IEEE Symposium on Security and Privacy (SP), 2019.

J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington et al., Deep neural networks as Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

J. Lee, L. Xiao, S. S. Schoenholz, Y. Bahri, J. Sohl-Dickstein et al., Wide neural networks of any depth evolve as linear models under gradient descent, Advances in Neural Information Processing Systems (NeurIPS), 2019.

C. Li, W. Chang, Y. Cheng, Y. Yang, and B. Póczos, MMD GAN: Towards deeper understanding of moment matching network, Advances in Neural Information Processing Systems (NIPS), 2017.

Y. Li and Y. Liang, Learning overparameterized neural networks via stochastic gradient descent on structured data, Advances in Neural Information Processing Systems (NeurIPS), 2018.

Y. Li, T. Ma, and H. Zhang, Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations, Conference on Learning Theory (COLT), 2018.

T. Liang and A. Rakhlin, Just interpolate: Kernel "ridgeless" regression can generalize, Annals of Statistics, 2019.

T. Liang, T. Poggio, A. Rakhlin, and J. Stokes, Fisher-Rao metric, geometry, and complexity of neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

H. Lin, J. Mairal, and Z. Harchaoui, A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

J. Lin, A. Rudi, L. Rosasco, and V. Cevher, Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces, Applied and Computational Harmonic Analysis, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01958890

G. Loosli, S. Canu, and L. Bottou, Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

C. Lyu, K. Huang, and H. Liang, A unified gradient regularization family for adversarial examples, IEEE International Conference on Data Mining (ICDM), 2015.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng et al., Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards deep learning models resistant to adversarial attacks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

J. Mairal, Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.

J. Mairal, End-to-End Kernel Learning with Supervised Convolutional Kernel Networks, Advances in Neural Information Processing Systems (NIPS), 2016.

J. Mairal, P. Koniusz, Z. Harchaoui, and C. Schmid, Convolutional kernel networks, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01005489

S. Mallat, Group invariant scattering, Communications on Pure and Applied Mathematics, vol.65, issue.10, pp.1331-1398, 2012.

E. Mammen and A. B. Tsybakov, Smooth discrimination analysis, The Annals of Statistics, vol.27, issue.6, pp.1808-1829, 1999.

P. Massart and É. Nédélec, Risk bounds for statistical learning, The Annals of Statistics, vol.34, issue.5, pp.2326-2366, 2006.

A. Matthews, M. Rowland, J. Hron, R. E. Turner, and Z. Ghahramani, Gaussian process behaviour in wide deep neural networks, 2018.

S. Mei, A. Montanari, and P. Nguyen, A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.7665-7671, 2018.

S. Mei, T. Misiakiewicz, and A. Montanari, Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, Conference on Learning Theory (COLT), 2019.

N. Meinshausen and P. Bühlmann, Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, Spectral normalization for generative adversarial networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

T. Miyato, S. Maeda, S. Ishii, and M. Koyama, Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.

G. Montavon, M. L. Braun, and K. Müller, Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.

Y. Mroueh, S. Voinea, and T. A. Poggio, Learning with group invariant features: A kernel perspective, Advances in Neural Information Processing Systems (NIPS), 2015.

K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, Kernel mean embedding of distributions: A review and beyond, Foundations and Trends in Machine Learning, vol.10, issue.1-2, pp.1-141, 2017.

A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, Journal of Molecular Biology, vol.247, issue.4, pp.536-540, 1995.

R. M. Neal, Bayesian learning for neural networks, 1996.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, Introductory Lectures on Convex Optimization, 2004.

G. Neu and L. Rosasco, Iterate averaging as regularization for stochastic gradient descent, Conference on Learning Theory (COLT), 2018.

B. Neyshabur, R. Tomioka, and N. Srebro, Norm-based capacity control in neural networks, Conference on Learning Theory (COLT), 2015.

B. Neyshabur, R. Tomioka, and N. Srebro, In search of the real inductive bias: On the role of implicit regularization in deep learning, Proceedings of the International Conference on Learning Representations (ICLR), 2015.

B. Neyshabur, S. Bhojanapalli, D. Mcallester, and N. Srebro, Exploring generalization in deep learning, Advances in Neural Information Processing Systems (NIPS), 2017.

B. Neyshabur, S. Bhojanapalli, D. Mcallester, and N. Srebro, A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

B. Neyshabur, Z. Li, S. Bhojanapalli, Y. LeCun, and N. Srebro, The role of overparametrization in generalization of neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

R. Novak, L. Xiao, Y. Bahri, J. Lee, G. Yang et al., Bayesian deep convolutional networks with many channels are Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

E. Oyallon and S. Mallat, Deep roto-translation scattering for object classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

E. Oyallon, E. Belilovsky, and S. Zagoruyko, Scaling the scattering transform: Deep hybrid networks, International Conference on Computer Vision (ICCV), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01495734

M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, and C. Schmid, Transformation pursuit for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00979464

A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, vol.8, pp.143-195, 1999.

A. Raghunathan, J. Steinhardt, and P. Liang, Certified defenses against adversarial examples, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Raj, A. Kumar, Y. Mroueh, T. Fletcher, and B. Schoelkopf, Local group invariant representations via orbit embeddings, International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

G. Raskutti, M. J. Wainwright, and B. Yu, Early stopping and non-parametric regression: an optimal data-dependent stopping rule, Journal of Machine Learning Research, vol.15, issue.1, pp.335-366, 2014.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.

J. Rony, L. G. Hafemann, L. S. Oliveira, I. B. Ayed, R. Sabourin et al., Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri, Are loss functions all the same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.

F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.

S. Rosset, G. Swirszcz, N. Srebro, and J. Zhu, ℓ1 regularization in infinite dimensional feature spaces, Conference on Learning Theory (COLT), 2007.

K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann, Stabilizing training of generative adversarial networks through regularization, Advances in Neural Information Processing Systems (NIPS), 2017.

K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann, Adversarially robust training through structured gradient regularization, 2018.

A. Rudi and L. Rosasco, Generalization properties of learning with random features, Advances in Neural Information Processing Systems (NIPS), 2017.

A. Rudi, R. Camoriano, and L. Rosasco, Less is more: Nyström computational regularization, Advances in Neural Information Processing Systems (NIPS), 2015.

S. Saitoh, Integral transforms, reproducing kernels and their applications, vol.369, 1997.

H. Salman, G. Yang, J. Li, P. Zhang, H. Zhang et al., Provably robust deep learning via adversarially trained smoothed classifiers, Advances in Neural Information Processing Systems (NeurIPS), 2019.

P. Savarese, I. Evron, D. Soudry, and N. Srebro, How do infinite width bounded norm networks look in function space?, Conference on Learning Theory (COLT), 2019.

R. E. Schapire and Y. Freund, Boosting: Foundations and algorithms, 2012.

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, issue.5, pp.1651-1686, 1998.

L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Mądry, Adversarially robust generalization requires more data, Advances in Neural Information Processing Systems (NeurIPS), 2018.

M. Schmidt, N. L. Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1, pp.83-112, 2017.
URL : https://hal.archives-ouvertes.fr/hal-00860051

I. J. Schoenberg, Positive definite functions on spheres, Duke Mathematical Journal, vol.9, issue.1, pp.96-108, 1942.

B. Schölkopf, Support Vector Learning, 1997.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.

H. Sedghi, V. Gupta, and P. M. Long, The singular values of convolutional layers, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

S. Shalev-Shwartz, SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, 2014.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research (JMLR), vol.14, pp.567-599, 2013.

S. Shalev-Shwartz, O. Shamir, and K. Sridharan, Learning kernel-based halfspaces with the 0-1 loss, SIAM Journal on Computing, vol.40, issue.6, pp.1623-1646, 2011.

J. Shawe-Taylor and N. Cristianini, Kernel methods for pattern analysis, 2004.

L. Sifre and S. Mallat, Rotation, scaling and deformation invariant scattering for texture discrimination, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

P. Y. Simard, Y. A. LeCun, J. S. Denker, and B. Victorri, Transformation invariance in pattern recognition - tangent distance and tangent propagation, Neural Networks: Tricks of the Trade, pp.239-274, 1998.

C. Simon-Gabriel, Y. Ollivier, L. Bottou, B. Schölkopf, and D. Lopez-Paz, First-order adversarial vulnerability of neural networks and input dimension, Proceedings of the International Conference on Machine Learning (ICML), 2019.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR), 2014.

A. Sinha, H. Namkoong, and J. Duchi, Certifying some distributional robustness with principled adversarial training, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

S. Smale and D. Zhou, Estimating the approximation error in learning theory, Analysis and Applications, vol.1, issue.01, pp.17-41, 2003.

S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio, Mathematics of the neural response, Foundations of Computational Mathematics, vol.10, issue.1, pp.67-91, 2010.

A. J. Smola and B. Schölkopf, Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.

A. J. Smola, Z. L. Ovari, and R. C. Williamson, Regularization with dot-product kernels, Advances in Neural Information Processing Systems (NIPS), 2001.

M. Soltanolkotabi, A. Javanmard, and J. D. Lee, Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, IEEE Transactions on Information Theory, vol.65, issue.2, pp.742-769, 2018.

D. Soudry, E. Hoffer, M. S. Nacson, S. Gunasekar, and N. Srebro, The implicit bias of gradient descent on separable data, Journal of Machine Learning Research (JMLR), vol.19, issue.1, pp.2822-2878, 2018.

B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. Lanckriet, On the empirical estimation of integral probability metrics, Electronic Journal of Statistics, vol.6, pp.1550-1599, 2012.

E. M. Stein, Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals, 1993.

I. Steinwart and A. Christmann, Support vector machines, 2008.

I. Steinwart, P. Thomann, and N. Schmid, Learning with hierarchical Gaussian kernels, 2016.

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan et al., Intriguing properties of neural networks, International Conference on Learning Representations (ICLR), 2014.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

M. Telgarsky, Margins, shrinkage and boosting, Proceedings of the International Conference on Machine Learning (ICML), 2013.

A. Torralba and A. Oliva, Statistics of natural image categories, Network: Computation in Neural Systems, vol.14, pp.391-412, 2003.

A. Trouvé and L. Younes, Local geometry of deformable templates, SIAM Journal on Mathematical Analysis, vol.37, issue.1, pp.17-59, 2005.

D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, Robustness may be at odds with accuracy, Proceedings of the International Conference on Learning Representations (ICLR), 2019.

A. B. Tsybakov, Introduction to Nonparametric Estimation, 2008.

L. G. Valiant, A theory of the learnable, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing (STOC), pp.436-445, 1984.

M. J. van de Vijver et al., A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, pp.1999-2009, 2002.

L. van der Maaten, M. Chen, S. Tyree, and K. Q. Weinberger, Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.

V. Vapnik, The nature of statistical learning theory, 2000.

V. Vapnik and A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, vol.16, p.264, 1971.

J.-P. Vert and J. Mairal, Machine learning with kernel methods, Course in the "Mathématiques, Vision, Apprentissage" Master, ENS Cachan, 2017.

U. von Luxburg and O. Bousquet, Distance-based classification with Lipschitz functions, Journal of Machine Learning Research (JMLR), vol.5, pp.669-695, 2004.

S. Wager, W. Fithian, S. Wang, and P. Liang, Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.

G. Wahba, Spline models for observational data, vol.59, 1990.

M. J. Wainwright, High-dimensional statistics: A non-asymptotic viewpoint, vol.48, 2019.

C. Wei, J. D. Lee, Q. Liu, and T. Ma, Regularization matters: Generalization and optimization of neural nets vs. their induced kernel, Advances in Neural Information Processing Systems (NeurIPS), 2019.

T. Wiatowski and H. Bölcskei, A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory, vol.64, issue.3, pp.1845-1866, 2018.

C. K. Williams, Computing with infinite networks, Advances in Neural Information Processing Systems (NIPS), 1997.

C. K. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

F. Williams, M. Trager, C. Silva, D. Panozzo, D. Zorin et al., Gradient dynamics of shallow low-dimensional ReLU networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.

A. C. Wilson, R. Roelofs, M. Stern, N. Srebro, and B. Recht, The marginal value of adaptive gradient methods in machine learning, Advances in Neural Information Processing Systems (NIPS), 2017.

E. Wong and J. Z. Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope, Proceedings of the International Conference on Machine Learning (ICML), 2018.

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.

L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.

B. Xie, Y. Liang, and L. Song, Diverse neural network learns true target functions, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.

H. Xu, C. Caramanis, and S. Mannor, Robust regression and lasso, Advances in Neural Information Processing Systems (NIPS), 2009.

H. Xu, C. Caramanis, and S. Mannor, Robustness and regularization of support vector machines, Journal of Machine Learning Research (JMLR), vol.10, pp.1485-1510, 2009.

G. Yang, Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation, 2019.

G. Yang and H. Salman, A fine-grained spectral perspective on neural networks, 2019.

Y. Yao, L. Rosasco, and A. Caponnetto, On early stopping in gradient descent learning, Constructive Approximation, vol.26, issue.2, pp.289-315, 2007.

D. Yin, K. Ramchandran, and P. Bartlett, Rademacher complexity for adversarially robust generalization, Proceedings of the International Conference on Machine Learning (ICML), 2019.

Y. Yoshida and T. Miyato, Spectral norm regularization for improving the generalizability of deep learning, 2017.

S. Zagoruyko and N. Komodakis, Wide residual networks, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

C. Zhang, S. Bengio, and Y. Singer, Are all layers created equal?, 2019.

K. Zhang, I. W. Tsang, and J. T. Kwok, Improved Nyström low-rank approximation and error analysis, Proceedings of the International Conference on Machine Learning (ICML), 2008.

T. Zhang and B. Yu, Boosting with early stopping: Convergence and consistency, The Annals of Statistics, vol.33, issue.4, pp.1538-1579, 2005.

Y. Zhang, J. D. Lee, and M. I. Jordan, ℓ1-regularized neural networks are improperly learnable in polynomial time, International Conference on Machine Learning (ICML), 2016.

Y. Zhang, P. Liang, and M. J. Wainwright, Convexified convolutional neural networks, International Conference on Machine Learning (ICML), 2017.

S. Zheng and J. T. Kwok, Lightweight stochastic optimization for minimizing finite sums with infinite data, Proceedings of the International Conference on Machine Learning (ICML), 2018.

S. Zheng, Y. Song, T. Leung, and I. Goodfellow, Improving the robustness of deep neural networks via stability training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

D. Zou, Y. Cao, D. Zhou, and Q. Gu, Stochastic gradient descent optimizes overparameterized deep ReLU networks, Machine Learning, 2019.