R. Couillet, Z. Liao, and X. Mai, Random Matrix Advances in Machine Learning and Neural Nets, European Signal Processing Conference (EUSIPCO'18), 2018.

X. Mai and Z. Liao, High Dimensional Classification via Empirical Risk Minimization: Statistical Analysis and Optimality, 2019.

X. Mai and R. Couillet, Statistical Behavior and Performance of Support Vector Machines for Large Dimensional Data, 2019.

X. Mai and R. Couillet, Consistent Semi-Supervised Graph Regularization for High Dimensional Data, 2019.

X. Mai and R. Couillet, A Random Matrix Analysis and Improvement of Semi-Supervised Learning for Large Dimensional Data, Journal of Machine Learning Research, vol.19, issue.79, pp.1-27, 2018.

Articles in International Conferences

X. Mai, Z. Liao, and R. Couillet, A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02139980

X. Mai and R. Couillet, Revisiting and Improving Semi-Supervised Learning: A Large Dimensional Approach, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02139979

X. Mai and R. Couillet, Semi-Supervised Spectral Clustering, Asilomar Conference on Signals, Systems, and Computers, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01982268

R. Couillet, Z. Liao, and X. Mai, Classification Asymptotics in the Random Matrix Regime, European Signal Processing Conference (EUSIPCO'18), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01957686

Résumé (translated from French). X. Mai, Z. Liao, and R. Couillet. Logistic regression is one of the algorithms defined by the empirical risk minimization principle, with a negative log-likelihood loss. Since logistic regression yields a maximum-likelihood estimate of the parameters, it is the default choice and is generally considered optimal when the distributional assumption on the data is satisfied; we therefore propose to verify the optimality of logistic regression through a joint analysis of empirical risk minimization algorithms with smooth loss functions. Remarkably, our results prove that, contrary to common belief, logistic regression based on likelihood maximization does not produce the best classification performance. We also develop strategies for improving these algorithms from our theoretical results, before discussing their limits. The chapter is based on the following contribution, 2019.

X. Mai and Z. Liao, High Dimensional Classification via Empirical Risk Minimization: Statistical Analysis and Optimality, 2019.
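The résumé above frames logistic regression as empirical risk minimization with a negative log-likelihood loss. A minimal sketch of that formulation, on synthetic Gaussian data with plain gradient descent (all names, sizes, and the learning rate are illustrative assumptions, not the thesis code):

```python
import numpy as np

# Logistic regression as empirical risk minimization (ERM) with the
# negative log-likelihood (logistic) loss, trained by gradient descent.
rng = np.random.default_rng(0)

p, n = 5, 400                        # dimension, sample size
mu = np.ones(p) / np.sqrt(p)         # class-mean direction, ||mu|| = 1
y = rng.choice([-1.0, 1.0], size=n)  # labels
X = y[:, None] * mu + rng.standard_normal((n, p))  # two-class Gaussian mixture

def empirical_risk(w):
    # (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))  -- negative log-likelihood
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def gradient(w):
    margins = y * (X @ w)
    s = -1.0 / (1.0 + np.exp(margins))  # derivative of log1p(exp(-m)) in m
    return (X.T @ (s * y)) / n

w = np.zeros(p)
for _ in range(500):        # plain gradient descent on the empirical risk
    w -= 1.0 * gradient(w)

train_acc = np.mean(np.sign(X @ w) == y)
```

Replacing `empirical_risk` with another smooth loss (squared, exponential, ...) gives the family of ERM classifiers whose comparative performance the chapter analyzes.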

V. Marchenko and L. Pastur, The eigenvalue distribution in some ensembles of random matrices, Math. USSR Sbornik, vol.1, pp.457-483, 1967.

J. W. Silverstein and Z. Bai, On the empirical distribution of eigenvalues of a class of large dimensional random matrices, Journal of Multivariate analysis, vol.54, issue.2, pp.175-192, 1995.

Z. Bai and J. W. Silverstein, Spectral analysis of large dimensional random matrices, vol.20, 2010.

A. Zollanvari and E. R. Dougherty, Generalized consistent error estimator of linear discriminant analysis, IEEE transactions on signal processing, vol.63, issue.11, pp.2804-2814, 2015.

K. Elkhalil, A. Kammoun, R. Couillet, T. Y. Al-Naffouri, and M. Alouini, Asymptotic performance of regularized quadratic discriminant analysis based classifiers, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing, pp.1-6, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01957741

N. E. Karoui, D. Bean, P. J. Bickel, C. Lim, and B. Yu, On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, p.201307842, 2013.

D. Donoho and A. Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing, Probability Theory and Related Fields, vol.166, pp.935-969, 2016.

P. Sur and E. J. Candès, A modern maximum-likelihood theory for high-dimensional logistic regression, 2018.

E. J. Candès and P. Sur, The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression, 2018.

X. Mai and R. Couillet, A random matrix analysis and improvement of semi-supervised learning for large dimensional data, The Journal of Machine Learning Research, vol.19, issue.1, pp.3074-3100, 2018.

X. Mai and R. Couillet, Consistent semi-supervised graph regularization for high dimensional data, 2019.

C. Louart and R. Couillet, Concentration of measure and large random matrices with an application to sample covariance matrices, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02020287

M. E. Seddik, M. Tamaazousti, and R. Couillet, Kernel random matrices of large concentrated data: the example of GAN-generated images, ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7480-7484, 2019.

B. M. Shahshahani and D. A. Landgrebe, The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon, IEEE Transactions on Geoscience and Remote Sensing, vol.32, issue.5, pp.1087-1095, 1994.

F. G. Cozman, I. Cohen, and M. Cirelo, Unlabeled data can degrade classification performance of generative classifiers, FLAIRS Conference, pp.327-331, 2002.

S. Ben-David, T. Lu, and D. Pál, Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning, COLT, pp.33-44, 2008.

O. Chapelle, B. Schölkopf, and A. Zien (Eds.), Semi-Supervised Learning, 2006; book review in IEEE Transactions on Neural Networks, vol.20, issue.3, pp.542-542.

C. Cortes and V. Vapnik, Support-vector networks, Machine learning, vol.20, issue.3, pp.273-297, 1995.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.

U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, The Annals of Statistics, pp.555-586, 2008.

R. Couillet and F. Benaych-Georges, Kernel spectral clustering of large dimensional data, Electronic Journal of Statistics, vol.10, issue.1, pp.1393-1454, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01215343

H. Huang, Asymptotic behavior of support vector machine for spiked population model, Journal of Machine Learning Research, vol.18, issue.45, pp.1-21, 2017.

N. E. Karoui, Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results, 2013.

V. A. Marčenko and L. A. Pastur, Distribution of eigenvalues for some sets of random matrices, Mathematics of the USSR-Sbornik, vol.1, issue.4, p.457, 1967.

P. Billingsley, Probability and Measure, 1995.

R. Couillet, M. Debbah, and J. Silverstein, A deterministic equivalent for the analysis of correlated MIMO multiple access channels, IEEE Transactions on Information Theory, vol.57, issue.6, pp.3493-3514, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00553676

R. Couillet and M. Debbah, Random matrix methods for wireless communications, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00658725

M. Szummer and T. Jaakkola, Partially labeled classification with Markov random walks, Advances in Neural Information Processing Systems, vol.14, pp.945-952, 2002.

X. Zhu and Z. Ghahramani, Learning from labeled and unlabeled data with label propagation, Tech. Rep., 2002.

K. Avrachenkov, P. Goncalves, A. Mishenin, and M. Sokol, Generalized optimization framework for graph-based semi-supervised learning, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00633818

X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, International Conference on Machine Learning, vol.3, pp.912-919, 2003.

M. Belkin, I. Matveeva, and P. Niyogi, Regularization and semi-supervised learning on large graphs, International Conference on Computational Learning Theory (COLT), 2004.

T. Joachims, Transductive learning via spectral graph partitioning, International Conference on Machine Learning, vol.3, pp.290-297, 2003.

D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency, Advances in Neural Information Processing Systems, vol.16, pp.321-328, 2004.

M. Belkin and P. Niyogi, Semi-supervised learning on Riemannian manifolds, Machine Learning, vol.56, issue.1-3, pp.209-239, 2004.

A. B. Goldberg, X. Zhu, A. Singh, Z. Xu, and R. Nowak, Multi-manifold semi-supervised learning, 2009.

A. Moscovich, A. Jaffe, and B. Nadler, Minimax-optimal semi-supervised regression on unknown manifolds, 2016.

L. Wasserman and J. D. Lafferty, Statistical analysis of semi-supervised regression, International Conference on Neural Information Processing Systems, pp.801-808, 2008.

P. J. Bickel and B. Li, Local polynomial regression on unknown manifolds, Complex Datasets and Inverse Problems, pp.177-186, 2007.

A. Globerson, R. Livni, and S. Shalev-Shwartz, Effective semi-supervised learning on manifolds, International Conference on Learning Theory (COLT), pp.978-1003, 2017.

S. K. Narang, A. Gadde, and A. Ortega, Signal processing techniques for interpolation in graph structured data, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5445-5449, 2013.

S. K. Narang, A. Gadde, E. Sanou, and A. Ortega, Localized iterative methods for interpolation in graph structured data, Global Conference on Signal and Information Processing, pp.491-494, 2013.

A. Gadde, A. Anis, and A. Ortega, Active semi-supervised learning using sampling theory for graph signals, International Conference on Knowledge Discovery and Data Mining, pp.492-501, 2014.

A. Anis, A. E. Gamal, S. Avestimehr, and A. Ortega, Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.5461-5465, 2015.

R. Couillet and F. Benaych-Georges, Kernel spectral clustering of large dimensional data, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01215343

K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, When is "nearest neighbor" meaningful?, International Conference on Database Theory, pp.217-235, 1999.

C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising behavior of distance metrics in high dimensional space, International Conference on Database Theory, pp.420-434, 2001.

A. Hinneburg, C. C. Aggarwal, and D. A. Keim, What is the nearest neighbor in high dimensional spaces?, 26th International Conference on Very Large Data Bases, pp.506-515, 2000.

D. Francois, V. Wertz, and M. Verleysen, The concentration of fractional distances, IEEE Transactions on Knowledge and Data Engineering, vol.19, issue.7, pp.873-886, 2007.

F. Angiulli, On the behavior of intrinsically high-dimensional spaces: Distances, direct and reverse nearest neighbors, and hubness, Journal of Machine Learning Research, vol.18, issue.170, pp.1-60, 2018.

B. Nadler, N. Srebro, and X. Zhou, Semi-supervised learning with the graph Laplacian: the limit of infinite unlabelled data, International Conference on Neural Information Processing Systems, pp.1330-1338, 2009.

Y. LeCun, C. Cortes, and C. J. Burges, The MNIST database of handwritten digits, 1998.

O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, 2006.

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, 2002.

Z. Liao and R. Couillet, A large dimensional analysis of least squares support vector machines, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02048984

R. Couillet and A. Kammoun, Random matrix improved subspace clustering, Asilomar Conference on Signals, Systems and Computers, pp.90-94, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01633444

D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Processing Magazine, vol.30, issue.3, pp.83-98, 2013.

R. Couillet and F. Benaych-Georges, Kernel spectral clustering of large dimensional data, Electronic Journal of Statistics, vol.10, issue.1, pp.1393-1454, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01215343

J. Baik and J. W. Silverstein, Eigenvalues of large sample covariance matrices of spiked population models, Journal of Multivariate Analysis, vol.97, issue.6, pp.1382-1408, 2006.

F. Benaych-Georges and R. R. Nadakuditi, The singular values and vectors of low rank perturbations of large rectangular random matrices, Journal of Multivariate Analysis, vol.111, pp.120-135, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00575203

A. Krizhevsky, V. Nair, and G. Hinton, The CIFAR-10 dataset, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.

M. Belkin, P. Niyogi, and V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, Journal of machine learning research, vol.7, pp.2399-2434, 2006.

V. Vapnik, Principles of risk minimization for learning theory, Advances in neural information processing systems, pp.831-838, 1992.

S. Ben-David, N. Eiron, and P. M. Long, On the difficulty of approximately maximizing agreements, Journal of Computer and System Sciences, vol.66, issue.3, pp.496-514, 2003.

L. Rosasco, E. D. Vito, A. Caponnetto, M. Piana, and A. Verri, Are loss functions all the same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.

H. Masnadi-Shirazi and N. Vasconcelos, On the design of loss functions for classification: theory, robustness to outliers, and SavageBoost, Advances in Neural Information Processing Systems, pp.1049-1056, 2009.

E. L. Lehmann and J. P. Romano, Testing statistical hypotheses, 2006.

P. Mccullagh and J. A. Nelder, Generalized linear models, vol.37, 1989.

S. Portnoy, Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency, The Annals of Statistics, vol.12, issue.4, pp.1298-1309, 1984.

R. Couillet, Z. Liao, and X. Mai, Classification Asymptotics in the Random Matrix Regime, 26th European Signal Processing Conference, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01957686

Y. Freund, R. Schapire, and N. Abe, A short introduction to boosting, Journal of the Japanese Society for Artificial Intelligence, vol.14, p.1612, 1999.

R. Rojas, AdaBoost and the Super Bowl of classifiers: a tutorial introduction to adaptive boosting, 2009.

P. Sur, Y. Chen, and E. J. Candès, The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square, 2017.

G. Fumera, R. Fabio, and S. Alessandra, A theoretical analysis of bagging as a linear combination of classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.7, pp.1293-1299, 2008.

H. T. Ali, A. Kammoun, and R. Couillet, Random matrix asymptotics of inner product kernel spectral clustering, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2441-2445, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01812005

V. N. Vapnik, Statistical Learning Theory, vol.1, 1998.

B. Settles, Active learning literature survey, 2009.

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on knowledge and data engineering, vol.22, issue.10, pp.1345-1359, 2009.

M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich, To transfer or not to transfer, NIPS 2005 workshop on transfer learning, vol.898, p.3, 2005.

P. Billingsley, Probability and measure, 2008.

M. A. Woodbury, Inverting modified matrices, Memorandum report, vol.42, issue.106, p.336, 1950.

W. Rudin, Real and Complex Analysis, 1987.

J. Sherman and W. J. Morrison, Adjustment of an inverse matrix corresponding to a change in one element of a given matrix, The Annals of Mathematical Statistics, vol.21, issue.1, pp.124-127, 1950.

Z. Bai and J. W. Silverstein, No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices, The Annals of Probability, vol.26, issue.1, pp.316-345, 1998.

F. Benaych-Georges and R. Couillet, Spectral analysis of the Gram matrix of mixture models, ESAIM: Probability and Statistics, vol.20, pp.217-237, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01215342