SGD with Variance Reduction beyond Empirical Risk Minimization, 2015.
Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology, vol.33, issue.8, p.831, 2015.
Towards a coherent statistical framework for dense deformable template estimation, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.69, issue.1, pp.3-29, 2007.
Katyusha: The first direct acceleration of stochastic gradient methods, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.8194-8244, 2017.
What can ResNet learn efficiently, going beyond kernels?, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters, Advances in Neural Information Processing Systems (NIPS), 2016.
Learning and generalization in overparameterized neural networks, going beyond two layers, Advances in Neural Information Processing Systems (NeurIPS), 2019.
A convergence theory for deep learning via overparameterization, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.
Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.
Deep convolutional networks are hierarchical kernel machines, 2015.
On invariance and selectivity in representation learning, Information and Inference, vol.5, issue.2, pp.134-158, 2016.
Neural network learning: Theoretical foundations, 2009.
On gradient regularizers for MMD GANs, Advances in Neural Information Processing Systems (NeurIPS), 2018.
Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
Stronger generalization bounds for deep nets via a compression approach, Proceedings of the International Conference on Machine Learning (ICML), 2018.
On exact computation with an infinitely wide neural net, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Spherical harmonics and approximations on the unit sphere: an introduction, vol.2044, 2012.
Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory (COLT), 2013.
URL: https://hal.archives-ouvertes.fr/hal-00723365
Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research (JMLR), vol.18, issue.19, pp.1-53, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01098505
On the equivalence between kernel quadrature rules and random feature expansions, Journal of Machine Learning Research (JMLR), vol.18, issue.21, pp.1-38, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01118276
Kernel independent component analysis, Journal of Machine Learning Research (JMLR), vol.3, pp.1-48, 2002.
Predictive low-rank decomposition for kernel methods, Proceedings of the International Conference on Machine Learning (ICML), 2005.
Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL: https://hal.archives-ouvertes.fr/hal-00608041
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL: https://hal.archives-ouvertes.fr/hal-00831977
Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.
Local Rademacher complexities, The Annals of Statistics, vol.33, issue.4, pp.1497-1537, 2005.
Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.
Spectrally-normalized margin bounds for neural networks, Advances in Neural Information Processing Systems (NIPS), 2017.
Benign overfitting in linear regression, 2019.
The convergence rate of neural networks for learned functions of different frequencies, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, Advances in Neural Information Processing Systems (NeurIPS), 2018.
To understand deep learning we need to understand kernel learning, Proceedings of the International Conference on Machine Learning (ICML), 2018.
Does data interpolation contradict statistical optimality?, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
Convex neural networks, Advances in Neural Information Processing Systems (NIPS), 2006.
Reproducing kernel Hilbert spaces in probability and statistics, 2004.
Invariance and stability of deep convolutional representations, Advances in Neural Information Processing Systems (NIPS), 2017.
URL: https://hal.archives-ouvertes.fr/hal-01630265
Stochastic optimization with variance reduction for infinite datasets with finite sum structure, Advances in Neural Information Processing Systems (NIPS), 2017.
URL: https://hal.archives-ouvertes.fr/hal-01375816
Group invariance, stability to deformations, and complexity of deep convolutional representations, Journal of Machine Learning Research, vol.20, issue.25, pp.1-49, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01536004
On the inductive bias of neural tangent kernels, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL: https://hal.archives-ouvertes.fr/hal-02144221
A contextual bandit bake-off, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01708310
A kernel perspective for regularizing deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.
URL: https://hal.archives-ouvertes.fr/hal-01884632
Wild patterns: Ten years after the rise of adversarial machine learning, Pattern Recognition, vol.84, pp.317-331, 2018.
Kernel descriptors for visual recognition, Advances in Neural Information Processing Systems (NIPS), 2010.
Object recognition with hierarchical kernel descriptors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
The Tradeoffs of Large Scale Learning, Advances in Neural Information Processing Systems (NIPS), 2008.
Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.
Theory of classification: A survey of some recent advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
URL: https://hal.archives-ouvertes.fr/hal-00017923
On invariance in hierarchical models, Advances in Neural Information Processing Systems (NIPS), 2009.
Geometric deep learning: going beyond Euclidean data, IEEE Signal Processing Magazine, vol.34, issue.4, pp.18-42, 2017.
Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.35, pp.1872-1886, 2013.
Learning stable group invariant representations with convolutional networks, 2013.
Convex optimization: Algorithms and complexity, Foundations and Trends in Machine Learning, vol.8, issue.3-4, 2015.
Generalization bounds of stochastic gradient descent for wide and deep neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.
Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.
Opportunities and obstacles for deep learning in biology and medicine, Journal of The Royal Society Interface, vol.15, issue.141, 2018.
On the global convergence of gradient descent for overparameterized models using optimal transport, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL: https://hal.archives-ouvertes.fr/hal-01798792
On lazy training in differentiable programming, Advances in Neural Information Processing Systems (NeurIPS), 2019.
URL: https://hal.archives-ouvertes.fr/hal-01945578
Kernel methods for deep learning, Advances in Neural Information Processing Systems (NIPS), 2009.
Parseval networks: Improving robustness to adversarial examples, International Conference on Machine Learning (ICML), 2017.
An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
Certified adversarial robustness via randomized smoothing, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Group equivariant convolutional networks, International Conference on Machine Learning (ICML), 2016.
Spherical CNNs, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.1, pp.1-49, 2002.
SGD learns the conjugate kernel class of the network, Advances in Neural Information Processing Systems (NIPS), 2017.
Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity, Advances in Neural Information Processing Systems (NIPS), 2016.
Random features for compositional kernels, 2017.
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL: https://hal.archives-ouvertes.fr/hal-01016843
Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.
A probabilistic theory of pattern recognition, 1996.
Vector Measures, 1977.
Nonparametric stochastic approximation with large stepsizes, The Annals of Statistics, vol.44, issue.4, pp.1363-1399, 2016.
Harder, better, faster, stronger convergence rates for least-squares regression, Journal of Machine Learning Research (JMLR), vol.18, issue.1, pp.3520-3570, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01275431
Double backpropagation increasing generalization performance, International Joint Conference on Neural Networks (IJCNN), 1991.
Gradient descent finds global minima of deep neural networks, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Gradient descent provably optimizes overparameterized neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research (JMLR), vol.10, pp.2899-2934, 2009.
Privacy aware learning, Advances in Neural Information Processing Systems (NIPS), 2012.
Training generative neural networks via maximum mean discrepancy optimization, Conference on Uncertainty in Artificial Intelligence (UAI), 2015.
Spherical harmonics in p dimensions, 2014.
Fast randomized kernel ridge regression with statistical guarantees, Advances in Neural Information Processing Systems (NIPS), 2015.
Exploring the landscape of spatial robustness, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research, vol.2, pp.243-264, 2001.
Sobolev norm learning rates for regularized least-squares algorithm, 2017.
A course in abstract harmonic analysis, 2016.
Deep convolutional networks as shallow Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Linearized two-layers neural networks in high dimension, 2019.
Size-independent sample complexity of neural networks, Conference on Learning Theory (COLT), 2018.
A kernel two-sample test, Journal of Machine Learning Research, vol.13, pp.723-773, 2012.
Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems (NIPS), 2017.
Implicit regularization in matrix factorization, Advances in Neural Information Processing Systems (NIPS), 2017.
Implicit bias of gradient descent on linear convolutional networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.
A distribution-free theory of nonparametric regression, 2006.
Invariant kernel functions for pattern analysis and machine learning, Machine Learning, vol.68, issue.1, pp.35-61, 2007.
Motif kernel generated by genetic programming improves remote homology and fold detection, BMC Bioinformatics, vol.8, issue.1, p.23, 2007.
The elements of statistical learning, 2009.
Statistical learning with sparsity: the lasso and generalizations, 2015.
Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Convex analysis and minimization algorithms I: Fundamentals, Springer Science & Business Media, 1993.
Variance Reduced Stochastic Gradient Descent with Neighbors, Advances in Neural Information Processing Systems (NIPS), 2015.
URL: https://hal.archives-ouvertes.fr/hal-01248672
Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.
Random design analysis of ridge regression, Foundations of Computational Mathematics, vol.14, issue.3, 2014.
Neural tangent kernel: Convergence and generalization in neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2018.
URL: https://hal.archives-ouvertes.fr/hal-01824549
Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.
On the complexity of linear prediction: Risk bounds, margin bounds, and regularization, Advances in Neural Information Processing Systems (NIPS), 2009.
Adversarial risk bounds via function transformation, 2018.
Some results on Tchebycheffian spline functions, Journal of Mathematical Analysis and Applications, vol.33, issue.1, pp.82-95, 1971.
Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, pp.2593-2656, 2006.
Empirical margin distributions and bounding the generalization error of combined classifiers, The Annals of Statistics, vol.30, pp.1-50, 2002.
On the generalization of equivariance and convolution in neural networks to the action of compact groups, Proceedings of the International Conference on Machine Learning (ICML), 2018.
ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.
Estimate sequences for stochastic composite optimization: Variance reduction, acceleration, and robustness to noise, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01993531
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00768187
An optimal randomized incremental gradient method, 2017.
An iteration formula for Fredholm integral equations of the first kind, American Journal of Mathematics, vol.73, issue.3, pp.615-624, 1951.
Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
Certified robustness to adversarial examples with differential privacy, IEEE Symposium on Security and Privacy (SP), 2019.
Deep neural networks as Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
Wide neural networks of any depth evolve as linear models under gradient descent, Advances in Neural Information Processing Systems (NeurIPS), 2019.
MMD GAN: Towards deeper understanding of moment matching network, Advances in Neural Information Processing Systems (NIPS), 2017.
Learning overparameterized neural networks via stochastic gradient descent on structured data, Advances in Neural Information Processing Systems (NeurIPS), 2018.
Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations, Conference on Learning Theory (COLT), 2018.
Just interpolate: Kernel "ridgeless" regression can generalize, Annals of Statistics, 2019.
Fisher-Rao metric, geometry, and complexity of neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.
A Universal Catalyst for First-Order Optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL: https://hal.archives-ouvertes.fr/hal-01160728
Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces, Applied and Computational Harmonic Analysis, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01958890
Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.
A unified gradient regularization family for adversarial examples, IEEE International Conference on Data Mining (ICDM), 2015.
Learning word vectors for sentiment analysis, The 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp.142-150, 2011.
Towards deep learning models resistant to adversarial attacks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.
End-to-End Kernel Learning with Supervised Convolutional Kernel Networks, Advances in Neural Information Processing Systems (NIPS), 2016.
Convolutional kernel networks, Advances in Neural Information Processing Systems (NIPS), 2014.
URL: https://hal.archives-ouvertes.fr/hal-01005489
Group invariant scattering, Communications on Pure and Applied Mathematics, vol.65, issue.10, pp.1331-1398, 2012.
Smooth discrimination analysis, The Annals of Statistics, vol.27, pp.1808-1829, 1999.
Risk bounds for statistical learning, The Annals of Statistics, vol.34, issue.5, pp.2326-2366, 2006.
Gaussian process behaviour in wide deep neural networks, 2018.
A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.7665-7671, 2018.
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, Conference on Learning Theory (COLT), 2019.
Stability selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.4, pp.417-473, 2010.
Spectral normalization for generative adversarial networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2018.
Kernel analysis of deep networks, Journal of Machine Learning Research (JMLR), vol.12, pp.2563-2581, 2011.
Learning with group invariant features: A kernel perspective, Advances in Neural Information Processing Systems (NIPS), 2015.
Kernel mean embedding of distributions: A review and beyond, Foundations and Trends in Machine Learning, vol.10, pp.1-141, 2017.
SCOP: a structural classification of proteins database for the investigation of sequences and structures, Journal of Molecular Biology, vol.247, issue.4, pp.536-540, 1995.
Bayesian learning for neural networks, 1996.
Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
URL: https://hal.archives-ouvertes.fr/hal-00976649
Introductory Lectures on Convex Optimization, 2004.
Iterate averaging as regularization for stochastic gradient descent, Conference on Learning Theory (COLT), 2018.
Norm-based capacity control in neural networks, Conference on Learning Theory (COLT), 2015.
In search of the real inductive bias: On the role of implicit regularization in deep learning, Proceedings of the International Conference on Learning Representations (ICLR), 2015.
Exploring generalization in deep learning, Advances in Neural Information Processing Systems (NIPS), 2017.
A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
The role of overparametrization in generalization of neural networks, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Bayesian deep convolutional networks with many channels are Gaussian processes, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Deep roto-translation scattering for object classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
Scaling the scattering transform: Deep hybrid networks, International Conference on Computer Vision (ICCV), 2017.
URL: https://hal.archives-ouvertes.fr/hal-01495734
Transformation pursuit for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
URL: https://hal.archives-ouvertes.fr/hal-00979464
Approximation theory of the MLP model in neural networks, Acta Numerica, vol.8, pp.143-195, 1999.
Certified defenses against adversarial examples, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
Random features for large-scale kernel machines, Advances in Neural Information Processing Systems (NIPS), 2007.
Local group invariant representations via orbit embeddings, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
Early stopping and non-parametric regression: an optimal data-dependent stopping rule, Journal of Machine Learning Research, vol.15, issue.1, pp.335-366, 2014.
A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951.
Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Are loss functions all the same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.
The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.
ℓ1 regularization in infinite dimensional feature spaces, Conference on Learning Theory (COLT), 2007.
Stabilizing training of generative adversarial networks through regularization, Advances in Neural Information Processing Systems (NIPS), 2017.
Adversarially robust training through structured gradient regularization, 2018.
Generalization properties of learning with random features, Advances in Neural Information Processing Systems (NIPS), pp.3215-3225, 2017.
Less is more: Nyström computational regularization, Advances in Neural Information Processing Systems (NIPS), 2015.
Integral transforms, reproducing kernels and their applications, vol.369, 1997.
Provably robust deep learning via adversarially trained smoothed classifiers, Advances in Neural Information Processing Systems (NeurIPS), 2019.
How do infinite width bounded norm networks look in function space?, Conference on Learning Theory (COLT), 2019.
Boosting: Foundations and algorithms, 2012.
Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, pp.1651-1686, 1998.
Adversarially robust generalization requires more data, Advances in Neural Information Processing Systems (NeurIPS), 2018.
Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1, pp.83-112, 2017.
URL: https://hal.archives-ouvertes.fr/hal-00860051
Positive definite functions on spheres, Duke Mathematical Journal, vol.9, issue.1, pp.96-108, 1942.
Support Vector Learning, 1997.
Learning with kernels: support vector machines, regularization, optimization, and beyond, 2001.
Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.
The singular values of convolutional layers, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
SDCA without Duality, Regularization, and Individual Convexity, International Conference on Machine Learning (ICML), 2016.
Understanding machine learning: From theory to algorithms, 2014.
Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research (JMLR), vol.14, pp.567-599, 2013.
Learning kernel-based halfspaces with the 0-1 loss, SIAM Journal on Computing, vol.40, issue.6, pp.1623-1646, 2011.
Kernel methods for pattern analysis, 2004.
Rotation, scaling and deformation invariant scattering for texture discrimination, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
Transformation invariance in pattern recognition: tangent distance and tangent propagation, Neural Networks: Tricks of the Trade, pp.239-274, 1998.
URL: https://hal.archives-ouvertes.fr/halshs-00009505
First-order adversarial vulnerability of neural networks and input dimension, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR), 2014.
Certifying some distributional robustness with principled adversarial training, Proceedings of the International Conference on Learning Representations (ICLR), 2018.
Estimating the approximation error in learning theory, Analysis and Applications, vol.1, issue.01, pp.17-41, 2003.
Mathematics of the neural response, Foundations of Computational Mathematics, vol.10, issue.1, pp.67-91, 2010.
Sparse greedy matrix approximation for machine learning, Proceedings of the International Conference on Machine Learning (ICML), 2000.
Regularization with dot-product kernels, Advances in Neural Information Processing Systems (NIPS), 2001.
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, IEEE Transactions on Information Theory, vol.65, issue.2, pp.742-769, 2018.
The implicit bias of gradient descent on separable data, Journal of Machine Learning Research (JMLR), vol.19, issue.1, pp.2822-2878, 2018.
On the empirical estimation of integral probability metrics, Electronic Journal of Statistics, vol.6, pp.1550-1599, 2012.
Harmonic Analysis: Real-variable Methods, Orthogonality, and Oscillatory Integrals, 1993.
Support vector machines, 2008.
Learning with hierarchical Gaussian kernels, 2016.
Intriguing properties of neural networks, International Conference on Learning Representations (ICLR), 2014.
Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Margins, shrinkage and boosting, Proceedings of the International Conference on Machine Learning (ICML), 2013.
Statistics of natural image categories, Network: Computation in Neural Systems, vol.14, pp.391-412, 2003.
Local geometry of deformable templates, SIAM Journal on Mathematical Analysis, vol.37, issue.1, pp.17-59, 2005.
Robustness may be at odds with accuracy, Proceedings of the International Conference on Learning Representations (ICLR), 2019.
Introduction to Nonparametric Estimation, 2008.
A theory of the learnable, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp.436-445, 1984.
A Gene-Expression Signature as a Predictor of Survival in Breast Cancer, New England Journal of Medicine, vol.347, issue.25, pp.1999-2009, 2002.
Learning with marginalized corrupted features, International Conference on Machine Learning (ICML), 2013.
The nature of statistical learning theory, 2000.
On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and its Applications, vol.16, p.264, 1971.
Machine learning with kernel methods, course in the "Mathématiques, Vision, Apprentissage" Master, ENS Cachan, 2017.
Distance-based classification with Lipschitz functions, Journal of Machine Learning Research (JMLR), vol.5, pp.669-695, 2004.
Altitude Training: Strong Bounds for Single-layer Dropout, Advances in Neural Information Processing Systems (NIPS), 2014.
Spline models for observational data, vol.59, 1990.
High-dimensional statistics: A non-asymptotic viewpoint, vol.48, 2019.
Regularization matters: Generalization and optimization of neural nets vs. their induced kernel, Advances in Neural Information Processing Systems (NeurIPS), 2019.
A mathematical theory of deep convolutional neural networks for feature extraction, IEEE Transactions on Information Theory, vol.64, issue.3, pp.1845-1866, 2018.
Computing with infinite networks, Advances in Neural Information Processing Systems (NIPS), 1997.
Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.
Gradient dynamics of shallow low-dimensional ReLU networks, Advances in Neural Information Processing Systems (NeurIPS), 2019.
The marginal value of adaptive gradient methods in machine learning, Advances in Neural Information Processing Systems (NIPS), 2017.
Provable defenses against adversarial examples via the convex outer adversarial polytope, Proceedings of the International Conference on Machine Learning (ICML), 2018.
Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research (JMLR), vol.11, pp.2543-2596, 2010.
A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
Diverse neural network learns true target functions, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
Robust regression and lasso, Advances in Neural Information Processing Systems (NIPS), 2009.
Robustness and regularization of support vector machines, Journal of Machine Learning Research (JMLR), vol.10, pp.1485-1510, 2009.
Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation, 2019.
A fine-grained spectral perspective on neural networks, 2019.
On early stopping in gradient descent learning, Constructive Approximation, vol.26, issue.2, pp.289-315, 2007.
Rademacher complexity for adversarially robust generalization, Proceedings of the International Conference on Machine Learning (ICML), 2019.
Spectral norm regularization for improving the generalizability of deep learning, 2017.
Wide residual networks, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01832503
Understanding deep learning requires rethinking generalization, Proceedings of the International Conference on Learning Representations (ICLR), 2017.
Are all layers created equal?, 2019.
Improved Nyström low-rank approximation and error analysis, Proceedings of the International Conference on Machine Learning (ICML), 2008.
Boosting with early stopping: Convergence and consistency, The Annals of Statistics, vol.33, issue.4, pp.1538-1579, 2005.
ℓ1-regularized neural networks are improperly learnable in polynomial time, International Conference on Machine Learning (ICML), 2016.
Convexified convolutional neural networks, International Conference on Machine Learning (ICML), 2017.
Lightweight stochastic optimization for minimizing finite sums with infinite data, Proceedings of the International Conference on Machine Learning (ICML), 2018.
Improving the robustness of deep neural networks via stability training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Stochastic gradient descent optimizes overparameterized deep ReLU networks, Machine Learning, 2019.