Handling heterogeneous and MNAR missing data in statistical learning frameworks : imputation based on low-rank models, online linear regression with SGD, and model-based clustering

Abstract : The statistical analysis of ever-growing volumes of data brings real added value to numerous and varied applications. Nevertheless, one of the ironies of increased data collection is that missing data are unavoidable: the more data there are, the more missing data there are. The goal of this PhD thesis is to propose new statistical methods to handle missing values in several supervised and unsupervised machine learning scenarios, particularly when the data can be Missing Not At Random (MNAR), i.e. when the unavailability of a value depends on the missing values themselves and on the values of other variables. Particular attention has been paid to deriving methods with both strong theoretical grounding and practical relevance, meeting concrete needs in applications. First, low-rank models with either fixed or random effects are studied when MNAR values can occur on several variables. Second, we address online linear regression with missing covariates using a debiased averaged stochastic gradient algorithm. Third, we investigate model-based clustering with MNAR data. Finally, we present our collaborative platform for reproducible research on missing-value processing, which bundles classical and state-of-the-art methods.
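
To make the MNAR definition above concrete, the following is a minimal illustrative sketch, not taken from the thesis. It assumes a single Gaussian variable and a hypothetical logistic "self-masking" mechanism in which larger values are more likely to be missing; the function name `simulate_self_masked_mnar` and the `slope` parameter are invented for this illustration. It shows why ignoring such a mechanism biases even a simple mean estimate.

```python
import numpy as np


def simulate_self_masked_mnar(n=10_000, slope=2.0, seed=0):
    """Draw x ~ N(0, 1) and hide each value with probability sigmoid(slope * x).

    The missingness probability depends on the value that goes missing,
    which is one simple (assumed) instance of an MNAR mechanism.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Hypothetical mechanism: large values are more likely to be unobserved.
    p_missing = 1.0 / (1.0 + np.exp(-slope * x))
    missing = rng.random(n) < p_missing
    # Missing entries are encoded as NaN in the observed vector.
    x_obs = np.where(missing, np.nan, x)
    return x, x_obs


x_full, x_obs = simulate_self_masked_mnar()

full_mean = x_full.mean()              # mean of the complete data (unknown in practice)
observed_mean = np.nanmean(x_obs)      # naive mean over observed entries only
missing_rate = np.isnan(x_obs).mean()  # fraction of hidden values

print(f"missing rate:  {missing_rate:.2f}")
print(f"full mean:     {full_mean:.3f}")
print(f"observed mean: {observed_mean:.3f}")
```

Under this assumed mechanism, the mean of the observed entries falls clearly below the mean of the complete data, because the hidden values are disproportionately the large ones. Handling this kind of bias is the problem that MNAR-specific methods set out to address.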
Document type : Theses

https://tel.archives-ouvertes.fr/tel-03722429
Contributor : ABES STAR
Submitted on : Wednesday, July 13, 2022 - 1:37:11 PM
Last modification on : Friday, August 5, 2022 - 3:00:08 PM

File

SPORTISSE_Aude_2021.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03722429, version 1

Citation

Aude Sportisse. Handling heterogeneous and MNAR missing data in statistical learning frameworks : imputation based on low-rank models, online linear regression with SGD, and model-based clustering. Statistics [math.ST]. Sorbonne Université, 2021. English. ⟨NNT : 2021SORUS506⟩. ⟨tel-03722429⟩
