Selection Bias Correction in Supervised Learning with Importance Weight

Van-Tinh Tran 1
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : In the theory of supervised learning, the assumption of identical distributions, i.e. that the training and test samples are drawn from the same probability distribution, plays a crucial role. Unfortunately, this essential assumption is often violated in the presence of selection bias. Under such conditions, standard supervised learning frameworks may suffer from significant bias. In this thesis, we address the problem of selection bias in supervised learning using the importance weighting method. We first introduce the supervised learning frameworks and discuss the importance of the identical distribution assumption. We then study the importance weighting framework for generative and discriminative learning under a general selection scheme, and investigate the potential of Bayesian networks to encode the researcher's a priori assumptions about the relationships between the variables, including the selection variable, and to infer the independence and conditional independence relationships that allow selection bias to be corrected. We pay special attention to covariate shift, i.e. a special class of selection bias where the conditional distribution P(y|x) is the same for the training and test data. We propose two methods to improve importance weighting for covariate shift. We first show that the unweighted model is locally less biased than the weighted one on low-importance instances, and then propose a method that combines the weighted and unweighted models to improve predictive performance in the target domain. Finally, we investigate the relationship between covariate shift and the missing data problem for data sets with small sample sizes, and study a method that uses missing data imputation techniques to correct covariate shift in simple but realistic scenarios.
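As an illustration of the importance weighting idea mentioned in the abstract, the following is a minimal sketch, not the thesis's own implementation. It assumes a one-dimensional covariate shift in which both the training and test input densities are known Gaussians, so the weight w(x) = p_test(x) / p_train(x) can be computed exactly; in practice the density ratio usually has to be estimated. The function names (`gaussian_pdf`, `importance_weights`, `fit_weighted_line`, `mse`), the target function sin(x), and all distribution parameters are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, train_mean, train_std, test_mean, test_std):
    """w(x) = p_test(x) / p_train(x), assuming both densities are known Gaussians."""
    return gaussian_pdf(x, test_mean, test_std) / gaussian_pdf(x, train_mean, train_std)


def fit_weighted_line(x, y, w):
    """Weighted least squares for y ~ a + b*x, minimising sum_i w_i * (y_i - a - b*x_i)^2."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns the problem into ordinary least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


def mse(coef, x, y):
    """Mean squared error of the fitted line on (x, y)."""
    pred = coef[0] + coef[1] * x
    return float(np.mean((y - pred) ** 2))


rng = np.random.default_rng(0)

# Covariate shift: P(y|x) is shared, only the input distribution differs.
x_train = rng.normal(0.0, 1.0, size=2000)
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, size=2000)
x_test = rng.normal(1.5, 0.5, size=2000)
y_test = np.sin(x_test) + rng.normal(0.0, 0.1, size=2000)

w = importance_weights(x_train, 0.0, 1.0, 1.5, 0.5)

coef_unweighted = fit_weighted_line(x_train, y_train, np.ones_like(x_train))
coef_weighted = fit_weighted_line(x_train, y_train, w)

mse_unweighted = mse(coef_unweighted, x_test, y_test)
mse_weighted = mse(coef_weighted, x_test, y_test)
print("test MSE, unweighted fit:", mse_unweighted)
print("test MSE, weighted fit:  ", mse_weighted)
```

Because the linear model is misspecified for sin(x), the unweighted fit is tuned to the region where training data is dense, while the weighted fit trades that for accuracy where test data is dense. The weights also increase the variance of the estimate, which is one motivation for combining weighted and unweighted models as the abstract describes.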
Document type : Theses
Cited literature : 89 references

https://tel.archives-ouvertes.fr/tel-01661470
Contributor : Abes Star
Submitted on : Tuesday, December 12, 2017 - 12:26:24 AM
Last modification on : Wednesday, November 20, 2019 - 3:07:06 AM

File

TH2017TranVanTinh.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01661470, version 1

Citation

Van-Tinh Tran. Selection Bias Correction in Supervised Learning with Importance Weight. Artificial Intelligence [cs.AI]. Université de Lyon, 2017. English. ⟨NNT : 2017LYSE1118⟩. ⟨tel-01661470⟩

Metrics

Record views : 267
File downloads : 1597