
Advanced Random Matrix Methods for Machine Learning

Abstract : Machine Learning (ML) has been quite successful at solving many real-world problems, ranging from supervised to unsupervised tasks, thanks to the development of powerful algorithms (Support Vector Machines (SVMs), Deep Neural Networks, Spectral Clustering, etc.). These algorithms are based on optimization schemes motivated by low-dimensional intuitions that collapse in high dimension, a phenomenon known as the "curse of dimensionality". Nonetheless, by assuming that the dimension of the data and the number of samples are both large and comparable, Random Matrix Theory (RMT) provides a systematic approach to assess the (statistical) behavior of these large learning systems, and thus to properly understand and improve them when they are applied to large-dimensional data. Previous random matrix analyses (cf. Mai & Couillet, 2018; Liao & Couillet, 2019; Deng et al., 2019) have shown that the asymptotic performance of most machine learning and signal processing methods depends only on first- and second-order statistics (the means and covariance matrices of the data). This makes covariance matrices extremely rich objects that need to be "well treated and understood". The thesis first demonstrates how naive covariance matrix processing can severely degrade machine learning algorithms by introducing biases that are difficult to remove, whereas consistent random-matrix estimation of the functionals of interest avoids these biases. We then show, through examples, that the means and covariance matrices of the data are sufficient (through simple functionals) to characterize the statistical behavior of even quite involved algorithms of modern interest, such as multi-task and transfer learning methods. The large-dimensional analysis furthermore allows multi-task and transfer learning schemes to be improved.
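The bias caused by naive covariance processing can be illustrated with a minimal sketch. The example below is not taken from the thesis: it assumes Gaussian data with identity covariance, picks the functional (1/p) tr(C^{-1}) as an illustrative target, and uses hypothetical names (`inverse_trace_estimates`, `p`, `n`). Plugging the sample covariance into the functional gives a value close to 1/(1 - p/n) rather than the true value 1, while the classical correction factor (1 - p/n) removes the bias when p and n are both large and comparable.

```python
import numpy as np


def inverse_trace_estimates(p=200, n=400, seed=0):
    """Estimate (1/p) tr(C^{-1}) for C = I_p from n Gaussian samples.

    Returns the naive plug-in estimate and a corrected estimate.
    The true value of the functional is 1 (illustrative setup).
    """
    rng = np.random.default_rng(seed)
    # p x n data matrix whose columns are i.i.d. N(0, I_p) samples
    X = rng.standard_normal((p, n))
    # Sample covariance matrix (the mean is known to be zero here)
    S = X @ X.T / n
    # Naive plug-in: replace C by S inside the functional
    naive = np.trace(np.linalg.inv(S)) / p
    # Classical correction for p/n not small: rescale by (1 - p/n)
    corrected = (1 - p / n) * naive
    return naive, corrected


naive, corrected = inverse_trace_estimates()
print(f"naive plug-in estimate: {naive:.3f}")
print(f"corrected estimate:     {corrected:.3f}")
```

With p/n = 0.5, the naive estimate is expected to be roughly twice the true value, whereas the corrected one stays close to 1. The functionals studied in the thesis are more involved, but the mechanism (a plug-in estimate that is inconsistent when p and n are comparable) is the same in spirit.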
Contributor : Abes Star
Submitted on : Thursday, October 21, 2021 - 5:14:10 PM
Last modification on : Monday, October 25, 2021 - 2:29:57 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03391681, version 1


Malik Tiomoko. Advanced Random Matrix Methods for Machine Learning. Machine Learning [stat.ML]. Université Paris-Saclay, 2021. English. ⟨NNT : 2021UPASG067⟩. ⟨tel-03391681⟩


