R. Adamczak, On the Marchenko-Pastur and circular laws for some classes of random matrices with dependent entries, Electronic Journal of Probability, vol.16, pp.1065-1095, 2011.

G. E. Andrews, R. Askey, and R. Roy, Special functions, vol.71, 2000.

N. I. Akhiezer and I. M. Glazman, Theory of linear operators in Hilbert space, Courier Corporation, 2013.

L. Arnold, V. M. Gundlach, and L. Demetrius, Evolutionary formalism for products of positive random matrices, The Annals of Applied Probability, vol.4, issue.3, pp.859-901, 1994.

R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state, Physical Review E, vol.64, issue.6, p.061907, 2001.

K. Avrachenkov, A. Mishenin, P. Gonçalves, and M. Sokol, Generalized optimization framework for graph-based semi-supervised learning, Proceedings of the 2012 SIAM International Conference on Data Mining, pp.966-974, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00633818

M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with formulas, graphs, and mathematical tables, Courier Corporation, vol.55, 1965.

M. S. Advani and A. M. Saxe, High-dimensional dynamics of generalization error in neural networks, 2017.

D. J. Albers, J. C. Sprott, and W. D. Dechert, Dynamical behavior of artificial neural networks with random weights, vol.6, pp.17-22, 1996.

Z. Allen-zhu, Y. Li, and Z. Song, A convergence theory for deep learning via over-parameterization, 2018.

G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural computation, vol.12, issue.10, pp.2385-2404, 2000.

J. Baik, G. Ben Arous, and S. Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, The Annals of Probability, vol.33, pp.1643-1697, 2005.

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, vol.408, 2011.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.1798-1828, 2013.

S. Ben-david, N. Eiron, and P. Long, On the difficulty of approximately maximizing agreements, Journal of Computer and System Sciences, vol.66, issue.3, pp.496-514, 2003.

F. Benaych-Georges and R. Couillet, Spectral analysis of the Gram matrix of mixture models, ESAIM: Probability and Statistics, vol.20, pp.217-237, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01215342

F. Benaych-Georges and R. R. Nadakuditi, The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, Advances in Mathematics, vol.227, issue.1, pp.494-521, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00423593

F. Benaych-Georges and R. R. Nadakuditi, The singular values and vectors of low rank perturbations of large rectangular random matrices, Journal of Multivariate Analysis, vol.111, pp.120-135, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00575203

P. Baldi and K. Hornik, Neural networks and principal component analysis: Learning from examples without local minima, Neural networks, vol.2, issue.1, pp.53-58, 1989.

P. Billingsley, Probability and measure, 2012.

C. M. Bishop, Pattern Recognition and Machine Learning, 2007.

L. Benigni and S. Péché, Eigenvalue distribution of nonlinear models of random matrices, 2019.

A. Blum and R. L. Rivest, Training a 3-node neural network is NP-complete, Advances in neural information processing systems, pp.494-501, 1989.

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

J. Baik and J. W. Silverstein, Eigenvalues of large sample covariance matrices of spiked population models, Journal of Multivariate Analysis, vol.97, issue.6, pp.1382-1408, 2006.

Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large-dimensional sample covariance matrices, Advances In Statistics, pp.281-333, 2008.

Z. Bai and J. W. Silverstein, Spectral analysis of large dimensional random matrices, vol.20, 2010.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

R. Couillet and F. Benaych-georges, Kernel spectral clustering of large dimensional data, Electronic Journal of Statistics, vol.10, issue.1, pp.1393-1454, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01215343

Y. Chen and E. J. Candès, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Advances in Neural Information Processing Systems, pp.739-747, 2015.

A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.

C.-I. Chang, Hyperspectral imaging: techniques for spectral detection and classification, vol.1, 2003.

R. Couillet and A. Kammoun, Random matrix improved subspace clustering, Signals, Systems and Computers, 2016 50th Asilomar Conference on, pp.90-94, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01633444

F. Clarke, Optimization and Nonsmooth Analysis, Society for Industrial and Applied Mathematics, 1990.

Y. Chi, Y. M. Lu, and Y. Chen, Nonconvex optimization meets low-rank matrix factorization: An overview, 2018.

R. Couillet, Z. Liao, and X. Mai, Classification asymptotics in the random matrix regime, 26th European Signal Processing Conference (EUSIPCO), pp.1875-1879, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01957686

E. J. Candès and Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2342-2359, 2011.

D. Cox and N. Pinto, Beyond simple features: A large-scale feature search approach to unconstrained face recognition, Face and Gesture, pp.8-15, 2011.

X. Cheng and A. Singer, The spectrum of random inner-product kernel matrices, Random Matrices: Theory and Applications, vol.2, p.1350010, 2013.

E. J. Candès and P. Sur, The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression, 2018.

R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, Proceedings of the 25th international conference on Machine learning, pp.160-167, 2008.

R. Couillet, G. Wainrib, H. Sevi, and H. Ali, The asymptotic performance of linear echo state neural networks, The Journal of Machine Learning Research, vol.17, issue.1, pp.6171-6205, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322809

S. S. Du, C. Jin, J. D. Lee, M. I. Jordan, B. Póczos, and A. Singh, Gradient descent can take exponential time to escape saddle points, Advances in neural information processing systems, pp.1067-1077, 2017.

D. D'Acunto and K. Kurdyka, Explicit bounds for the Łojasiewicz exponent in the gradient inequality for polynomials, Annales Polonici Mathematici, vol.87, pp.51-61, 2005.

S. S. Du, J. D. Lee, Y. Tian, A. Singh, and B. Póczos, Gradient descent learns one-hidden-layer CNN: Don't be afraid of spurious local minima, International Conference on Machine Learning, pp.1338-1347, 2018.

D. Donoho and A. Montanari, High dimensional robust M-estimation: asymptotic variance via approximate message passing, Probability Theory and Related Fields, vol.166, pp.935-969, 2016.

Y. Do and V. Vu, The spectrum of random kernel matrices: universality results for rough and varying kernels, Random Matrices: Theory and Applications, vol.2, p.1350005, 2013.

F. De Vico Fallani, J. Richiardi, M. Chavez, and S. Achard, Graph analysis of functional brain networks: practical issues in translational neuroscience, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.369, issue.1653, p.20130521, 2014.

F. Draxler, K. Veschgini, M. Salmhofer, and F. Hamprecht, Essentially no barriers in neural network energy landscape, International Conference on Machine Learning, pp.1308-1317, 2018.

N. El Karoui, Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond, The Annals of Applied Probability, vol.19, issue.6, pp.2362-2405, 2009.

N. El Karoui, On information plus noise kernel random matrices, The Annals of Statistics, vol.38, issue.5, pp.3191-3216, 2010.

N. El Karoui, The spectrum of kernel random matrices, The Annals of Statistics, vol.38, pp.1-50, 2010.

N. El Karoui, D. Bean, P. J. Bickel, C. Lim, and B. Yu, On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, vol.110, pp.14557-14562, 2013.

T. Evgeniou, M. Pontil, C. Papageorgiou, and T. Poggio, Image representations for object detection using kernel classifiers, Asian Conference on Computer Vision, pp.687-692, 2000.

D. Freeman and J. Bruna, Topology and geometry of half-rectified network optimization, 2016.

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, Springer series in statistics, vol.1, 2001.

Z. Fan and A. Montanari, The spectral norm of random inner-product kernel matrices, Probability Theory and Related Fields, vol.173, pp.27-85, 2019.

Y. Freund, R. Schapire, and N. Abe, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, vol.14, issue.5, pp.771-780, 1999.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

E. Gelenbe, Learning in the recurrent random neural network, Neural computation, vol.5, issue.1, pp.154-164, 1993.

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine learning, vol.63, issue.1, pp.3-42, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00341932

R. Ge, F. Huang, C. Jin, and Y. Yuan, Escaping from saddle points-online stochastic gradient for tensor decomposition, Conference on Learning Theory, pp.797-842, 2015.

T. Garipov, P. Izmailov, D. Podoprikhin, D. P. Vetrov, and A. G. Wilson, Loss surfaces, mode connectivity, and fast ensembling of DNNs, Advances in Neural Information Processing Systems, pp.8789-8798, 2018.

R. Ge, C. Jin, and Y. Zheng, No spurious local minima in nonconvex low rank problems: A unified geometric analysis, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1233-1242, 2017.

I. S. Gradshteyn and I. M. Ryzhik, Table of integrals, series, and products, 2014.

R. Giryes, G. Sapiro, and A. M. Bronstein, Deep neural networks with random Gaussian weights: A universal classification strategy?, IEEE Transactions on Signal Processing, vol.64, pp.3444-3457, 2016.

T. Van Gestel, J. A. K. Suykens, G. Lanckriet, A. Lambrechts et al., Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis, Neural computation, vol.14, issue.5, pp.1115-1147, 2002.

R. A. Horn and C. R. Johnson, Matrix analysis, Cambridge University Press, 2012.

W. Hachem, P. Loubaton, and J. Najim, Deterministic equivalents for certain functionals of large random matrices, The Annals of Applied Probability, vol.17, issue.3, pp.875-930, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00621793

M. W. Hirsch, C. C. Pugh, and M. Shub, Invariant manifolds, vol.583, 2006.

K. He, Y. Wang, and J. Hopcroft, A powerful generative model using random weights for the deep image representation, Advances in Neural Information Processing Systems, pp.631-639, 2016.

G. Huang, H. Zhou, X. Ding, and R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, vol.42, issue.2, pp.513-529, 2012.

K. He, X. Zhang, S. Ren, and J. Sun, Identity mappings in deep residual networks, European conference on computer vision, pp.630-645, 2016.

H. Jaeger, The "echo state" approach to analysing and training recurrent neural networks-with an erratum note, German National Research Center for Information Technology GMD Technical Report, vol.148, issue.34, p.13, 2001.

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th international conference on computer vision, pp.2146-2153, 2009.

K. Kawaguchi, Deep learning without poor local minima, Advances In Neural Information Processing Systems, pp.586-594, 2016.

A. Kammoun and R. Couillet, Subspace kernel spectral clustering of large dimensional data, 2017.

P. Kar and H. Karnick, Random feature maps for dot product kernels, Artificial Intelligence and Statistics, pp.583-591, 2012.

F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly et al., Spectral redemption in clustering sparse networks, Proceedings of the National Academy of Sciences, vol.110, issue.52, pp.20935-20940, 2013.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

Z. Liao and R. Couillet, On the spectrum of random features maps of high dimensional data, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.3063-3071, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01954933

C. Louart and R. Couillet, A random matrix and concentration inequalities framework for neural networks analysis, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4214-4218, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01957729

Z. Liao and R. Couillet, Inner-product kernels are asymptotically equivalent to binary discrete kernels, 2019.

Z. Liao and R. Couillet, A large dimensional analysis of least squares support vector machines, IEEE Transactions on Signal Processing, vol.67, issue.4, pp.1065-1074, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02048984

C. Louart and R. Couillet, Concentration of measure and large random matrices with an application to sample covariance matrices, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02020287

L. Laloux, P. Cizeau, M. Potters, and J. Bouchaud, Random matrix theory and financial correlations, International Journal of Theoretical and Applied Finance, vol.3, issue.03, pp.391-397, 2000.

M. Ledoux, The concentration of measure phenomenon. Number 89, 2005.

C. Louart, Z. Liao, and R. Couillet, A random matrix approach to neural networks, The Annals of Applied Probability, vol.28, issue.2, pp.1190-1248, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01957656

Z. Lu, A. May, K. Liu, A. Bagheri-garakani, D. Guo et al., How to scale up kernel methods to be as good as deep neural nets, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01666934

S. Łojasiewicz, Sur les trajectoires du gradient d'une fonction analytique, Seminari di Geometria, pp.115-117, 1982.

A. Lytova and L. Pastur, Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. The Annals of Probability, vol.37, pp.1778-1840, 2009.

J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, Gradient descent only converges to minimizers, Conference on learning theory, pp.1246-1257, 2016.

A. Mohamed, G. E. Dahl, and G. Hinton, Acoustic modeling using deep belief networks, IEEE transactions on audio, speech, and language processing, vol.20, pp.14-22, 2011.

K. G. Murty and S. N. Kabadi, Some NP-complete problems in quadratic and nonlinear programming, Mathematical Programming, vol.39, pp.117-129, 1987.

V. A. Marčenko and L. A. Pastur, Distribution of eigenvalues for some sets of random matrices, Mathematics of the USSR-Sbornik, vol.1, issue.4, p.457, 1967.

S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, Fisher discriminant analysis with kernels, Proceedings of the 1999 IEEE signal processing society workshop, pp.41-48, 1999.

H. Masnadi-Shirazi and N. Vasconcelos, On the design of loss functions for classification: theory, robustness to outliers, and SavageBoost, Advances in neural information processing systems, pp.1049-1056, 2009.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems, pp.849-856, 2002.

Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.108, issue.1, pp.177-205, 2006.

S. O'Rourke, A note on the Marchenko-Pastur law for a class of random matrices with dependent entries, Electronic Communications in Probability, vol.17, 2012.

L. A. Pastur, A simple approach to the global regime of Gaussian ensembles of random matrices, Ukrainian Mathematical Journal, vol.57, issue.6, pp.936-966, 2005.

D. Paul, Asymptotics of sample eigenstructure for a large dimensional spiked covariance model, Statistica Sinica, pp.1617-1642, 2007.

N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox, A high-throughput screening approach to discovering good forms of biologically inspired visual representation, PLoS computational biology, vol.5, issue.11, p.e1000579, 2009.

D. Paul and J. W. Silverstein, No eigenvalues outside the support of the limiting empirical spectral distribution of a separable covariance matrix, Journal of Multivariate Analysis, vol.100, issue.1, pp.37-57, 2009.

L. A. Pastur and M. Shcherbina, Eigenvalue distribution of large random matrices. Number 171, 2011.

J. Pennington, S. Schoenholz, and S. Ganguli, Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, Advances in neural information processing systems, pp.4785-4795, 2017.

J. Pennington and P. Worah, Nonlinear random matrix theory for deep learning, Advances in Neural Information Processing Systems, pp.2634-2643, 2017.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.

F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological review, vol.65, issue.6, p.386, 1958.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in neural information processing systems, pp.1177-1184, 2008.

Y. A. Rozanov, Stationary random processes, 1967.

W. Rudin, Fourier analysis on groups, vol.12, 1962.

W. Rudin, Principles of mathematical analysis, vol.3, 1964.

L. Rosasco, E. D. Vito, A. Caponnetto, M. Piana, and A. Verri, Are loss functions all the same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.

J. W. Silverstein and Z. D. Bai, On the empirical distribution of eigenvalues of a class of large dimensional random matrices, Journal of Multivariate Analysis, vol.54, issue.2, pp.175-192, 1995.

J. W. Silverstein and S.-I. Choi, Analysis of the limiting spectral distribution of large dimensional random matrices, Journal of Multivariate Analysis, vol.54, issue.2, pp.295-309, 1995.

P. Sur and E. J. Candès, A modern maximum-likelihood theory for high-dimensional logistic regression, 2018.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

A. M. Saxe, P. W. Koh, Z. Chen, M. Bhand, B. Suresh et al., On random weights and unsupervised feature learning, ICML, vol.2, p.6, 2011.

W. F. Schmidt, M. A. Kraaijveld, and R. P. W. Duin, Feed forward neural networks with random weights, International Conference on Pattern Recognition, 1992.

A. Saade, F. Krzakala, and L. Zdeborová, Spectral clustering of graphs with the Bethe Hessian, Advances in Neural Information Processing Systems, pp.406-414, 2014.
URL : https://hal.archives-ouvertes.fr/cea-01140852

A. M. Saxe, J. L. McClelland, and S. Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

A. J. Smola and B. Schölkopf, A tutorial on support vector regression, Statistics and Computing, vol.14, issue.3, pp.199-222, 2004.

C. M. Stein, Estimation of the mean of a multivariate normal distribution, The Annals of Statistics, pp.1135-1151, 1981.

J. A. K. Suykens and J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters, vol.9, pp.293-300, 1999.

S. Scardapane and D. Wang, Randomness in neural networks: an overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.7, issue.2, 2017.

D. Ulyanov, A. Vedaldi, and V. Lempitsky, Deep image prior, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.9446-9454, 2018.

V. Vapnik, Principles of risk minimization for learning theory, Advances in neural information processing systems, pp.831-838, 1992.

A. W. van der Vaart, Asymptotic statistics, vol.3, Cambridge University Press, 2000.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.

G. Vasile, J.-P. Ovarlez, F. Pascal, and C. Tison, Coherency matrix estimation of heterogeneous clutter in high-resolution polarimetric SAR images, IEEE Transactions on Geoscience and Remote Sensing, vol.48, issue.4, pp.1809-1826, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00466647

S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, Kernel recursive least-squares tracker for time-varying regression, IEEE Transactions on Neural Networks and Learning Systems, vol.23, pp.1313-1326, 2012.

A. Vedaldi and A. Zisserman, Efficient additive kernels via explicit feature maps. IEEE transactions on pattern analysis and machine intelligence, vol.34, pp.480-492, 2012.

S. Wold, K. Esbensen, and P. Geladi, Principal component analysis. Chemometrics and intelligent laboratory systems, vol.2, pp.37-52, 1987.

C. K. I. Williams, Computing with infinite networks, Advances in neural information processing systems, pp.295-301, 1997.

X. Wu, X. Zhu, G. Wu, and W. Ding, Data mining with big data, IEEE transactions on knowledge and data engineering, vol.26, issue.1, pp.97-107, 2013.

H. Xiao, K. Rasul, and R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.

Y. Yao, L. Rosasco, and A. Caponnetto, On early stopping in gradient descent learning, Constructive Approximation, vol.26, pp.289-315, 2007.

W. I. Zangwill, Convergence conditions for nonlinear programming algorithms, Management Science, vol.16, issue.1, pp.1-13, 1969.

C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, 2016.

X. Zhu, Semi-supervised learning literature survey, 2005.