N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

S. Baluja, Probabilistic modeling for face orientation discrimination: Learning from labeled and unlabeled data, Advances in Neural Information Processing Systems 11, pp.854-860, 1999.

E. Bareinboim and J. Pearl, Controlling selection bias in causal inference, Journal of Machine Learning Research - Proceedings Track, vol.22, pp.100-108, 2012.

E. Bareinboim, J. Tian, and J. Pearl, Recovering from selection bias in causal and statistical inference, AAAI, pp.2410-2416, 2014.

S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, Analysis of representations for domain adaptation, NIPS, p.24, 2007.

S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira et al., A theory of learning from different domains, Machine Learning, vol.79, issue.1-2, pp.151-175, 2010.
DOI : 10.1007/s10994-009-5152-4

S. Bickel, M. Brückner, and T. Scheffer, Discriminative learning under covariate shift, The Journal of Machine Learning Research, vol.10, issue.24, pp.2137-2155, 2009.

J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman, Learning bounds for domain adaptation, Advances in neural information processing systems, pp.129-136, 2008.

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension, Proceedings of the eighteenth annual ACM symposium on Theory of computing, STOC '86, pp.273-282, 1986.
DOI : 10.1145/12130.12158

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, vol.36, issue.4, pp.929-965, 1989.
DOI : 10.1145/76359.76371

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth, Occam's razor, Readings in machine learning, pp.201-204, 1990.

B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, pp.144-152, 1992.

L. Bottou and V. Vapnik, Local Learning Algorithms, Neural Computation, vol.4, issue.6, pp.888-900, 1992.
DOI : 10.1162/neco.1992.4.6.888

G. E. P. Box and G. C. Tiao, Bayesian inference in statistical analysis, 2011.

L. Breiman and P. Spector, Submodel selection and evaluation in regression. The X-random case, International Statistical Review/Revue Internationale de Statistique, pp.291-319, 1992.

P. Caillet, S. Klemm, M. Ducher, A. Aussem, and A. Schott, Hip Fracture in the Elderly: A Re-Analysis of the EPIDOS Study with Causal Bayesian Networks, PLOS ONE, p.75, 2015.
DOI : 10.1371/journal.pone.0120125.s001

J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Dataset Shift in Machine Learning, p.23, 2009.

V. Castelli and T. M. Cover, The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter, IEEE Transactions on Information Theory, vol.42, issue.6, pp.2102-2117, 1996.
DOI : 10.1109/18.556600

O. Chapelle, M. Chi, and A. Zien, A continuation method for semi-supervised SVMs, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.185-192, 2006.
DOI : 10.1145/1143844.1143868

G. Cooper, Causal discovery from data in the presence of selection bias, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, pp.140-150, 1995.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

C. Cortes, Y. Mansour, and M. Mohri, Learning bounds for importance weighting, Advances in Neural Information Processing Systems, pp.442-450, 2010.

N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, Optimal transport for domain adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence, p.24, 2016.

D. Cox and N. Wermuth, Multivariate dependencies: Models, analysis and interpretation, p.23, 1996.

H. Daumé III, Frustratingly easy domain adaptation, arXiv preprint, p.24, 2009.

S. Rodrigues de Morais and A. Aussem, An efficient learning algorithm for local Bayesian network structure discovery, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECMLPKDD10, pp.164-169, 2010.

L. Devroye, L. Györfi, and G. Lugosi, A probabilistic theory of pattern recognition, 2013.
DOI : 10.1007/978-1-4612-0711-5

V. Didelez, S. Kreiner, and N. Keiding, Graphical Models for Inference Under Outcome-Dependent Sampling, Statistical Science, vol.25, issue.3, pp.368-387, 2010.
DOI : 10.1214/10-STS340

M. Dudík, S. J. Phillips, and R. E. Schapire, Correcting sample selection bias in maximum entropy density estimation, Advances in neural information processing systems, pp.323-330, 2005.

C. Elkan, The foundations of cost-sensitive learning, Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp.973-978, 2001.

T. Evgeniou, M. Pontil, and T. Poggio, Regularization networks and support vector machines, Advances in Computational Mathematics, p.36, 2000.

W. Fan and I. Davidson, On Sample Selection Bias and Its Efficient Correction via Model Averaging and Unlabeled Examples, SDM. SIAM, p.24, 2007.
DOI : 10.1137/1.9781611972771.29

B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, Subspace alignment for domain adaptation, arXiv preprint, p.24, 2014.

Y. Freund and R. E. Schapire, A desicion-theoretic generalization of on-line learning and an application to boosting, European conference on computational learning theory, pp.23-37, 1995.
DOI : 10.1007/3-540-59119-2_166

S. Geneletti, S. Richardson, and N. Best, Adjusting for selection bias in retrospective, case-control studies, Biostatistics, vol.10, issue.1, pp.17-31, 2009.
DOI : 10.1093/biostatistics/kxn010

F. Girosi, M. Jones, and T. Poggio, Regularization Theory and Neural Networks Architectures, Neural Computation, vol.7, issue.2, pp.219-269, 1995.
DOI : 10.1162/neco.1995.7.2.219

M. Glymour, Using causal diagrams to understand common problems in social epidemiology, Methods in social epidemiology, pp.393-428, 2006.

P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, vol.82, issue.4, pp.711-732, 1995.

S. Greenland, J. Pearl, and J. M. Robins, Causal Diagrams for Epidemiologic Research, Epidemiology, vol.10, issue.1, pp.37-48, 1999.
DOI : 10.1097/00001648-199901000-00008

A. Gretton, A. Smola, J. Huang, M. Schmittfull, K. Borgwardt et al., Covariate shift by kernel mean matching, Dataset Shift in Machine Learning, pp.5-88, 2009.
DOI : 10.7551/mitpress/9780262170055.003.0008
URL : http://www.kyb.tuebingen.mpg.de/publications/attachments/shift-book-for-LeEtAl-webversion_5376%5B0%5D.pdf

M. A. Hernán, S. Hernández-Díaz, M. M. Werler, and A. A. Mitchell, Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology, American Journal of Epidemiology, vol.155, issue.2, pp.176-184, 2002.

M. A. Hernán, S. Hernández-Díaz, and J. M. Robins, A structural approach to selection bias, Epidemiology, vol.15, issue.5, pp.615-625, 2004.

T. Hofmann, B. Schölkopf, and A. J. Smola, Kernel methods in machine learning, The Annals of Statistics, pp.1171-1220, 2008.

R. I. Horwitz and A. R. Feinstein, Alternative analytic methods for case-control studies of estrogens and endometrial cancer, New England Journal of Medicine, vol.299, issue.20, pp.1089-1094, 1978.

J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, Correcting sample selection bias by unlabeled data, NIPS, pp.601-608, 2006.

T. Joachims, Making large-scale SVM learning practical, LS VIII-Report, vol.37, p.69, 1998.

T. Joachims, Transductive inference for text classification using support vector machines, Proceedings of the Sixteenth International Conference on Machine Learning, ICML '99, pp.200-209, 1999.

T. Kanamori, S. Hido, and M. Sugiyama, A least-squares approach to direct importance estimation, J. Mach. Learn. Res, vol.10, pp.1391-1445, 2009.

T. Kanamori, T. Suzuki, and M. Sugiyama, Statistical analysis of kernel-based least-squares density-ratio estimation, Machine Learning, pp.335-367, 2012.
DOI : 10.1109/ISIT.2009.5205712

M. J. Kearns and U. V. Vazirani, An introduction to computational learning theory, 1994.

R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, IJCAI, pp.1137-1145, 1995.

K. Kojima, E. Perrier, S. Imoto, and S. Miyano, Optimal search on clustered structural constraint for learning Bayesian network structure, Journal of Machine Learning Research, vol.11, pp.285-310, 2010.

R. J. Little, Missing-data adjustments in large surveys, Journal of Business & Economic Statistics, vol.6, issue.3, pp.287-296, 1988.

D. J. C. MacKay, Bayesian methods for adaptive models, 1992.

S. Mendelson, A Few Notes on Statistical Learning Theory, Advanced lectures on machine learning, pp.1-40, 2003.
DOI : 10.1007/3-540-36434-X_1

J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, A unifying view on dataset shift in classification, Pattern Recognition, vol.45, issue.1, pp.521-530, 2012.

T. P. Morris, I. R. White, and P. Royston, Tuning multiple imputation by predictive mean matching and local residual draws, BMC Medical Research Methodology, 2014.

X. Nguyen, M. J. Wainwright, and M. I. Jordan, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5847-5861, 2010.

E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics, pp.1065-1076, 1962.

J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988.

J. Pearl, On a class of bias-amplifying variables that endanger effect estimates, UAI 2010 Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pp.417-424, 2010.

J. Pearl, A solution to a class of selection-bias problems, p.50, 2012.

J. M. Peña, Finding consensus Bayesian network structures, Journal of Artificial Intelligence Research, vol.42, pp.661-687, 2011.

M. Rosenblatt, Remarks on Some Nonparametric Estimates of a Density Function, The Annals of Mathematical Statistics, vol.27, issue.3, pp.832-837, 1956.
DOI : 10.1214/aoms/1177728190

B. Schölkopf and A. J. Smola, Learning with kernels: support vector machines, regularization, optimization, and beyond, p.36, 2001.

M. Scutari and A. Brogini, Bayesian Network Structure Learning with Permutation Tests, Communications in Statistics - Theory and Methods, vol.41, issue.16-17, pp.3233-3243, 2012.
DOI : 10.1007/s10994-006-6889-7

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference, vol.90, issue.2, pp.227-244, 2000.
DOI : 10.1016/S0378-3758(00)00115-4

A. J. Smola and B. Schölkopf, Learning with kernels, Citeseer, 1998.

I. Steinwart, Support Vector Machines are Universally Consistent, Journal of Complexity, vol.18, issue.3, pp.768-791, 2002.
DOI : 10.1006/jcom.2002.0642

M. Stone, Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society. Series B (Methodological), pp.111-147, 1974.

M. Sugiyama and M. Kawanabe, Machine learning in nonstationary environments: Introduction to covariate shift adaptation, p.25, 2012.

M. Sugiyama, M. Krauledat, and K.-R. Müller, Covariate shift adaptation by importance weighted cross validation, Journal of Machine Learning Research, vol.8, pp.985-1005, 2007.

M. Sugiyama, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe, Direct importance estimation with model selection and its application to covariate shift adaptation, NIPS, p.89, 2008.

B. Sun, J. Feng, and K. Saenko, Return of frustratingly easy domain adaptation, AAAI, 2016.

K. M. Ting, A study on the effect of class distribution using cost-sensitive learning, Proceedings of the 5th International Conference on Discovery Science, DS '02, pp.98-112, 2002.

V.-T. Tran and A. Aussem, A practical approach to reduce the learning bias under covariate shift, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2015 Proceedings, Part II, pp.71-86, 2015.

V.-T. Tran and A. Aussem, Correcting a class of complete selection bias with external data based on importance weight estimation, International Conference on Neural Information Processing, pp.111-118, Springer.

V.-T. Tran and A. Aussem, Reducing variance due to importance weighting in covariate shift bias correction, 25th European Symposium on Artificial Neural Networks, p.98, 2015.

V. Vapnik, The nature of statistical learning theory, Springer Science & Business Media, p.37, 2013.

V. N. Vapnik, Statistical learning theory, p.15, 1998.

E. Villanueva and C. Maciel, Optimized algorithm for learning Bayesian network super-structures, pp.217-222, 2012.

J. von Neumann, Various techniques used in connection with random digits, National Bureau of Standards, Applied Mathematics Series, vol.12, pp.36-38, 1951.

G. Wahba, Spline models for observational data, SIAM, vol.59, 1990.
DOI : 10.1137/1.9781611970128

D. H. Wolpert, The lack of a priori distinctions between learning algorithms, Neural Computation, vol.8, issue.7, pp.1341-1390, 1996.

Y. Yu and C. Szepesvári, Analysis of kernel mean matching under covariate shift, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp.607-614, 2012.

B. Zadrozny, Learning and evaluating classifiers under sample selection bias, Twenty-first international conference on Machine learning, ICML '04, pp.114-145, 2004.
DOI : 10.1145/1015330.1015425

B. Zadrozny, J. Langford, and N. Abe, Cost-sensitive learning by cost-proportionate example weighting, Third IEEE International Conference on Data Mining, pp.435-467, 2003.
DOI : 10.1109/ICDM.2003.1250950

X. Zhu, Semi-supervised learning literature survey, 2005.