
Selection Bias Correction in Supervised Learning with Importance Weight

Van-Tinh Tran 1 
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : In the theory of supervised learning, the identical-distribution assumption, i.e. that the training and test samples are drawn from the same probability distribution, plays a crucial role. Unfortunately, this assumption is often violated in the presence of selection bias. Under such conditions, standard supervised learning frameworks may suffer from significant bias. In this thesis, we address the problem of selection bias in supervised learning using the importance weighting method. We first introduce the supervised learning frameworks and discuss the importance of the identical-distribution assumption. We then study the importance weighting framework for generative and discriminative learning under a general selection scheme, and investigate the potential of Bayesian networks to encode the researcher's a priori assumptions about the relationships between the variables, including the selection variable, and to infer the independence and conditional independence relationships that allow selection bias to be corrected. We pay special attention to covariate shift, i.e. a special class of selection bias in which the conditional distribution P(y|x) is the same for the training and test data. We propose two methods to improve importance weighting for covariate shift. We first show that the unweighted model is locally less biased than the weighted one on low-importance instances, and then propose a method combining the weighted and the unweighted models in order to improve predictive performance in the target domain. Finally, we investigate the relationship between covariate shift and the missing data problem for data sets with small sample sizes, and study a method that uses missing data imputation techniques to correct covariate shift in simple but realistic scenarios.
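The basic idea behind importance weighting for covariate shift can be illustrated with a small sketch. Each training example is weighted by w(x) = p_test(x) / p_train(x), so that the weighted training loss estimates the loss under the test distribution. The setup is entirely an assumption for illustration and is not taken from the thesis: Gaussian input distributions with known densities, a quadratic target with the same P(y|x) in both domains, and a misspecified linear model. The helper names (`normal_pdf`, `weighted_linear_fit`, `mse`) are hypothetical. The thesis's own refinements, such as combining weighted and unweighted models, are not shown here.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b*x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
TRAIN_MEAN, TRAIN_STD = 0.0, 1.0  # assumed training input distribution
TEST_MEAN, TEST_STD = 1.0, 0.5    # assumed test input distribution


def sample_y(x):
    # Covariate shift: P(y|x) is identical in both domains. The linear
    # model below is misspecified for this quadratic target.
    return x * x + rng.gauss(0.0, 0.1)


x_train = [rng.gauss(TRAIN_MEAN, TRAIN_STD) for _ in range(2000)]
y_train = [sample_y(x) for x in x_train]
x_test = [rng.gauss(TEST_MEAN, TEST_STD) for _ in range(2000)]
y_test = [sample_y(x) for x in x_test]

# Importance weights w(x) = p_test(x) / p_train(x). The densities are known
# exactly here; in practice they (or their ratio) must be estimated.
weights = [normal_pdf(x, TEST_MEAN, TEST_STD) / normal_pdf(x, TRAIN_MEAN, TRAIN_STD)
           for x in x_train]

a_u, b_u = weighted_linear_fit(x_train, y_train, [1.0] * len(x_train))
a_w, b_w = weighted_linear_fit(x_train, y_train, weights)

mse_unweighted = mse(a_u, b_u, x_test, y_test)
mse_weighted = mse(a_w, b_w, x_test, y_test)
print(f"unweighted: y = {a_u:.2f} + {b_u:.2f}x, test MSE = {mse_unweighted:.3f}")
print(f"weighted:   y = {a_w:.2f} + {b_w:.2f}x, test MSE = {mse_weighted:.3f}")
```

In this toy setting, the unweighted fit approximates the best line for the training region (slope near 0). The weighted fit instead approaches the best line for the test region (slope near 2), which gives a lower test error. The weights also reduce the effective sample size. This variance cost is one reason importance weighting is not uniformly better than the unweighted model.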
Document type : Theses

Cited literature : 89 references
Contributor : ABES STAR
Submitted on : Tuesday, December 12, 2017 - 12:26:24 AM
Last modification on : Tuesday, June 1, 2021 - 2:08:09 PM


Version validated by the jury (STAR)


HAL Id : tel-01661470, version 1


Van-Tinh Tran. Selection Bias Correction in Supervised Learning with Importance Weight. Artificial Intelligence [cs.AI]. Université de Lyon, 2017. English. ⟨NNT : 2017LYSE1118⟩. ⟨tel-01661470⟩


