M. A. Arcones and E. Giné, Limit theorems for U-processes, The Annals of Probability, pp.1494-1542, 1993.
DOI : 10.1214/aop/1176989128

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

F. R. Bach and M. I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research, vol.3, pp.1-48, 2002.

C. R. Baker, Joint measures and cross-covariance operators, Transactions of the American Mathematical Society, vol.186, pp.273-289, 1973.
DOI : 10.2307/1996566

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.8596

O. Banerjee, L. El Ghaoui, and A. d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, The Journal of Machine Learning Research, vol.9, 2008.

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.527

Y. Bengio, E. Thibodeau-Laufer, G. Alain, and J. Yosinski, Deep generative stochastic networks trainable by backprop, Proceedings of the 31st International Conference on Machine Learning, 2014.

A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, 2011.
DOI : 10.1007/978-1-4419-9096-9

R. Bouckaert, P. Lemey, M. Dunn, S. J. Greenhill, A. V. Alekseyenko et al., Mapping the Origins and Expansion of the Indo-European Language Family, Science, vol.337, issue.6097, pp.957-960, 2012.
DOI : 10.1126/science.1219669

W. Bounliphone and M. B. Blaschko, Linear time non-Gaussian precision matrix estimation

W. Bounliphone, A. Gretton, and M. Blaschko, Kernel non-parametric tests of relative dependency, NIPS Workshop on Modern Nonparametrics 3: Automating the Learning Pipeline, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01263194

W. Bounliphone, A. Gretton, A. Tenenhaus, and M. B. Blaschko, A kernel test of relative dependency, Women in Machine Learning Workshop, 2014.

W. Bounliphone, A. Gretton, A. Tenenhaus, and M. B. Blaschko, A low variance consistent test of relative dependency, Proceedings of the 32nd International Conference on Machine Learning, pp.20-29, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01005828

W. Bounliphone, A. Gretton, A. Tenenhaus, and M. B. Blaschko, Kernel non-parametric tests of relative dependency, International Conference of the ERCIM WG on Computational and Methodological Statistics -CMStatistics 2015, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263194

W. Bounliphone, E. Belilovsky, M. B. Blaschko, I. Antonoglou, and A. Gretton, A test of relative similarity for model selection in generative models, The 4th International Conference on Learning Representations, 2016.

W. Bounliphone, E. Belilovsky, M. B. Blaschko, I. Antonoglou, and A. Gretton, A kernel test of relative similarity, Women in Machine Learning Workshop, 2016.

W. Bounliphone, E. Belilovsky, A. Tenenhaus, I. Antonoglou, A. Gretton et al., Fast Non-Parametric Tests of Relative Dependency and Similarity

J. Bring, A geometric approach to compare variables in a regression model, The American Statistician, vol.50, issue.1, pp.57-62, 1996.

H. Callaert and P. Janssen, The Berry-Esseen theorem for U-statistics, The Annals of Statistics, pp.417-421, 1978.

J. Chang, Q. Shao, and W. Zhou, Cramér-type moderate deviations for Studentized two-sample U-statistics with applications, The Annals of Statistics, pp.1931-1956, 2016.

L. H. Chen and Q. Shao, Normal approximation for nonlinear statistics using a concentration inequality approach, Bernoulli, vol.13, issue.2, pp.581-599, 2007.
DOI : 10.3150/07-BEJ5164

Chromosome Disorder Outreach, Introduction to chromosomes, 2016.

K. P. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton, Fast two-sample testing with analytic representations of probability measures, Advances in Neural Information Processing Systems, pp.1981-1989, 2015.

C. Cortes, M. Mohri, and A. Rostamizadeh, Learning non-linear combinations of kernels, Advances in Neural Information Processing Systems, 2009.

R. B. Darlington, Multiple regression in psychological research and practice., Psychological Bulletin, vol.69, issue.3, p.161, 1968.
DOI : 10.1037/h0025471

J. Dauxois and G. M. Nkiet, Nonlinear canonical analysis and independence tests, The Annals of Statistics, vol.26, issue.4, pp.1254-1278, 1998.
DOI : 10.1214/aos/1024691242

URL : http://projecteuclid.org/download/pdf_1/euclid.aos/1024691242

A. P. Dawid, Conditional independence in statistical theory, 1979.
DOI : 10.1214/aos/1176345011

URL : http://projecteuclid.org/download/pdf_1/euclid.aos/1176345011

S. R. de Morais and A. Aussem, An Efficient and Scalable Algorithm for Local Bayesian Network Structure Discovery, Machine Learning and Knowledge Discovery in Databases, Part III, pp.164-179, 2010.
DOI : 10.1007/978-3-642-15939-8_11

A. P. Dempster, Covariance Selection, Biometrics, vol.28, issue.1, pp.157-175, 1972.
DOI : 10.2307/2528966

J. Dieudonné, Foundations of Modern Analysis, 1960.

G. Doran, K. Muandet, K. Zhang, and B. Schölkopf, A permutation-based kernel conditional independence test, Conference on Uncertainty in Artificial Intelligence, pp.132-141, 2014.

M. Drton and M. D. Perlman, Model selection for Gaussian concentration graphs, Biometrika, vol.91, issue.3, pp.591-602, 2004.
DOI : 10.1093/biomet/91.3.591

R. M. Dudley, Real Analysis and Probability, 2002.
DOI : 10.1017/CBO9780511755347

G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, Training generative neural networks via maximum mean discrepancy optimization, Conference on Uncertainty in Artificial Intelligence, 2015.

Y. Escoufier, Le Traitement des Variables Vectorielles, Biometrics, vol.29, issue.4, pp.751-760, 1973.
DOI : 10.2307/2529140

R. A. Fisher, The distribution of the partial correlation coefficient, Metron, vol.3, pp.329-332, 1924.

M. Fréchet, Sur les ensembles de fonctions et les opérations linéaires, CR Acad. Sci. Paris, vol.144, pp.1414-1416, 1907.

J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, 2008.
DOI : 10.1093/biostatistics/kxm045

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3019769

J. H. Friedman and L. C. Rafsky, Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests, The Annals of Statistics, pp.697-717, 1979.

K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, vol.5, pp.73-99, 2004.

K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf, Kernel measures of conditional dependence, Advances in Neural Information Processing Systems, pp.489-496, 2007.

K. Fukumizu, F. R. Bach, and M. I. Jordan, Kernel dimension reduction in regression, The Annals of Statistics, pp.1871-1905, 2009.
DOI : 10.1214/08-aos637

URL : http://arxiv.org/abs/0908.1854

M. Gasse, A. Aussem, and H. Elghazel, An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning, Machine Learning and Knowledge Discovery in Databases, Part I, pp.58-73, 2012.
DOI : 10.1007/978-3-642-33460-3_9

URL : https://hal.archives-ouvertes.fr/hal-01122771

R. J. Gilbertson and D. H. Gutmann, Tumorigenesis in the Brain: Location, Location, Location, Cancer Research, vol.67, issue.12, pp.5579-5582, 2007.
DOI : 10.1158/0008-5472.CAN-07-0760

E. Gómez, A multivariate generalization of the power exponential family of distributions, Communications in Statistics - Theory and Methods, vol.27, issue.3, pp.589-600, 1998.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

R. D. Gray and Q. D. Atkinson, Language-tree divergence times support the Anatolian theory of Indo-European origin, Nature, vol.426, issue.6965, pp.435-439, 2003.
DOI : 10.1038/nature02029

A. Gretton and L. Györfi, Consistent nonparametric tests of independence, Journal of Machine Learning Research, vol.11, pp.1391-1423, 2010.

A. Gretton, O. Bousquet, A. J. Smola, and B. Schölkopf, Measuring Statistical Dependence with Hilbert-Schmidt Norms, Algorithmic Learning Theory, pp.63-77, 2005.
DOI : 10.1007/11564089_7

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.477

A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf, Kernel methods for measuring independence, Journal of Machine Learning Research, vol.6, pp.2075-2129, 2005.

A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, pp.513-520, 2006.

A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Schölkopf et al., A kernel statistical test of independence, Neural Information Processing Systems, pp.585-592, 2008.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, A kernel two-sample test, The Journal of Machine Learning Research, vol.13, issue.1, pp.723-773, 2012.

A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil et al., Optimal kernel choice for large-scale two-sample tests, Advances in Neural Information Processing Systems 25, pp.1205-1213, 2012.

M. G. G'Sell, J. Taylor, and R. Tibshirani, Adaptive testing for the graphical lasso, 2013.

S. R. Gunn and J. S. Kandola, Structural modelling with sparse kernels, Machine Learning, pp.137-163, 2002.

P. Hall and N. Tajvidi, Permutation tests for equality of distributions in high-dimensional settings, Biometrika, vol.89, issue.2, pp.359-374, 2002.
DOI : 10.1093/biomet/89.2.359

J. M. Hammersley and P. Clifford, Markov fields on finite graphs and lattices, 1971.

R. Heller, Y. Heller, and M. Gorfine, A consistent multivariate test of association based on ranks of distances, Biometrika, vol.100, issue.2, pp.503-510, 2013.
DOI : 10.1093/biomet/ass070

G. Hinton, S. Osindero, and Y. Teh, A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006.
DOI : 10.1162/neco.2006.18.7.1527

W. Hoeffding, A class of statistics with asymptotically normal distribution, The Annals of Mathematical Statistics, pp.293-325, 1948.

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830

J. Jankova and S. van de Geer, Confidence intervals for high-dimensional inverse covariance estimation, Electronic Journal of Statistics, vol.9, issue.1, pp.1205-1229, 2015.
DOI : 10.1214/15-EJS1031

H. Joe, Multivariate Models and Multivariate Dependence Concepts, 1997.
DOI : 10.1201/b13150

M. G. Kendall, The Advanced Theory of Statistics, C. Griffin, 1946.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, International Conference on Learning Representations, 2014.

D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling, Semi-supervised learning with deep generative models, Advances in Neural Information Processing Systems, pp.3581-3589, 2014.

J. B. Kinney and G. S. Atwal, Equitability, mutual information, and the maximal information coefficient, Proceedings of the National Academy of Sciences, 2014.
DOI : 10.1073/pnas.1309933111

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948249

P. Koehn, Europarl: A parallel corpus for statistical machine translation, MT summit, pp.79-86, 2005.

A. Krishnamurthy, K. Kandasamy, B. Póczos, and L. A. Wasserman, On estimating $L_2^2$ divergence, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

H. Larochelle and I. Murray, The neural autoregressive distribution estimator, Journal of Machine Learning Research, vol.15, pp.29-37, 2011.

S. L. Lauritzen, Graphical Models, 1996.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.
DOI : 10.1109/5.726791

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.1115

A. J. Lee, U-statistics: Theory and practice, 1990.

E. L. Lehmann, Elements of Large-sample Theory, 1999.
DOI : 10.1007/b98855

H. Li and J. Gui, Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks, Biostatistics, vol.7, issue.2, pp.302-317, 2006.
DOI : 10.1093/biostatistics/kxj008

S. Li, Y. Xie, H. Dai, and L. Song, M-statistic for kernel change-point detection, Advances in Neural Information Processing Systems, pp.3348-3356, 2015.

Y. Li, K. Swersky, and R. Zemel, Generative moment matching networks, International Conference on Machine Learning, pp.1718-1727, 2015.

W. Liu, Gaussian graphical model estimation with false discovery rate control, The Annals of Statistics, pp.2948-2978, 2013.
DOI : 10.1214/13-aos1169

URL : http://arxiv.org/abs/1306.0976

J. R. Lloyd and Z. Ghahramani, Statistical model criticism using kernel two sample tests, Advances in Neural Information Processing Systems, 2015.

R. Lockhart, J. Taylor, R. J. Tibshirani, and R. Tibshirani, A significance test for the lasso, The Annals of Statistics, vol.42, issue.2, p.413, 2014.
DOI : 10.1214/14-AOS1175REJ

P. Loh and M. J. Wainwright, Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses, The Annals of Statistics, vol.41, issue.6, pp.3022-3049, 2013.
DOI : 10.1214/13-AOS1162SUPP

URL : http://arxiv.org/abs/1212.0478

C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel, The variational fair auto encoder, International Conference on Learning Representations, 2016.

C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, pp.148-188, 1989.
DOI : 10.1017/CBO9781107359949.008

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, pp.1436-1462, 2006.

P. Narasimhan, J. Wood, C. R. Macintyre, and D. Mathai, Risk factors for tuberculosis, Pulmonary Medicine, 2013.

National Human Genome Research Institute, Clinical sequencing centers aim for guidelines, 2016.

R. E. Neapolitan, Learning Bayesian Networks, 2004.
DOI : 10.1016/B978-012370477-1.50021-9

T. Palm, D. Figarella-Branger, F. Chapon, C. Lacroix, F. Gray et al., Expression profiling of ependymomas unravels localization and tumor grade-specific tumorigenesis, Cancer, vol.115, issue.17, pp.3955-3968, 2009.
DOI : 10.1002/cncr.24476

C. Peters, M. Braschler, and P. Clough, Multilingual Information Retrieval: From Research to Practice, 2012.
DOI : 10.1007/978-3-642-23008-0

J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf, Causal discovery with continuous additive noise models, Journal of Machine Learning Research, vol.15, issue.1, pp.2009-2053, 2014.

S. Puget, C. Philippe, D. Bax, B. Job, P. Varlet et al., Mesenchymal Transition and PDGFRA Amplification/Mutation Are Key Distinct Oncogenic Events in Pediatric Diffuse Intrinsic Pontine Gliomas, PLoS ONE, vol.7, issue.2, p.e30313, 2012.
DOI : 10.1371/journal.pone.0030313

P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu, High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence, Electronic Journal of Statistics, vol.5, pp.935-980, 2011.
DOI : 10.1214/11-EJS631

URL : http://arxiv.org/pdf/0811.3628v1.pdf

Z. Ren, T. Sun, C. Zhang, and H. H. Zhou, Asymptotic normality and optimalities in estimation of large Gaussian graphical models, The Annals of Statistics, pp.991-1026, 2015.

D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean et al., Detecting novel associations in large datasets, Science, vol.334, issue.6062, pp.1518-1524, 2011.
DOI : 10.1126/science.1205438

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3325791

F. Riesz, Sur une espèce de géométrie analytique des systèmes de fonctions sommables, CR Acad. Sci. Paris, vol.144, pp.1409-1411, 1907.
DOI : 10.1007/978-3-642-37535-4_31

P. R. Rosenbaum, An exact distribution-free test comparing two multivariate distributions based on adjacency, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.4, pp.515-530, 2005.
DOI : 10.1214/aos/1032526956

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.324.5111

A. Roverato and J. Whittaker, Standard errors for the parameters of graphical Gaussian models, Statistics and Computing, vol.6, issue.3, pp.297-302, 1996.
DOI : 10.1007/BF00140874

W. Rudin, Real and Complex Analysis, 1987.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Neurocomputing: Foundations of Research, pp.696-699, 1988.
DOI : 10.1038/323533a0

R. Salakhutdinov and G. E. Hinton, Deep Boltzmann machines, International Conference on Artificial Intelligence and Statistics, pp.448-455, 2009.

J. Schäfer and K. Strimmer, A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology, vol.4, issue.1, 2005.
DOI : 10.2202/1544-6115.1175

I. Schaller-schwaner, Abstract, Language Learning in Higher Education, vol.5, issue.1, pp.1-23, 2015.
DOI : 10.1515/cercles-2015-0001

B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2001.

B. Schölkopf, A. Smola, and K. Müller, Kernel principal component analysis, International Conference on Artificial Neural Networks, pp.583-588, 1997.
DOI : 10.1007/BFb0020217

K. Sechidis and G. Brown, Markov blanket discovery in positive-unlabelled and semi-supervised data, Machine Learning and Knowledge Discovery in Databases, Part I, pp.351-366, 2015.
DOI : 10.1007/978-3-319-23528-8_22

D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, pp.2263-2291, 2013.

R. J. Serfling, Approximation Theorems of Mathematical Statistics, 2009.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.
DOI : 10.1017/CBO9780511809682

A. Smola, A. Gretton, L. Song, and B. Schölkopf, A Hilbert Space Embedding for Distributions, International Conference on Algorithmic Learning Theory, pp.13-31, 2007.
DOI : 10.1007/978-3-540-75225-7_5

L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt, Feature selection via dependence maximization, Journal of Machine Learning Research, vol.13, pp.1393-1434, 2012.

B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, vol.12, pp.2389-2410, 2011.

I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal of Machine Learning Research, vol.2, pp.67-93, 2001.

S. Suchindran, E. S. Brouwer, and A. Van Rie, Is HIV Infection a Risk Factor for Multi-Drug Resistant Tuberculosis? A Systematic Review, PLoS ONE, vol.4, issue.5, p.e5561, 2009.
DOI : 10.1371/journal.pone.0005561

URL : http://doi.org/10.1371/journal.pone.0005561

D. J. Sutherland, Scalable, Flexible, and Active Learning on Distributions, 2016.

D. J. Sutherland, H. Tung, H. Strathmann, S. De, A. Ramdas et al., Generative models and model criticism via optimized maximum mean discrepancy, 2016.

G. J. Székely, M. L. Rizzo, and N. K. Bakirov, Measuring and testing dependence by correlation of distances, The Annals of Statistics, pp.2769-2794, 2007.

J. Trommershäuser, K. Körding, and M. S. Landy, Sensory Cue Integration, 2011.
DOI : 10.1093/acprof:oso/9780195387247.001.0001

B. von Bahr, On the Convergence of Moments in the Central Limit Theorem, The Annals of Mathematical Statistics, vol.36, issue.3, pp.808-818, 1965.
DOI : 10.1214/aoms/1177700055

L. Wasserman, M. Kolar, and A. Rinaldo, Berry-Esseen bounds for estimating undirected graphs, Electronic Journal of Statistics, vol.8, issue.1, 2014.
DOI : 10.1214/14-EJS928

J. Whittaker, Graphical Models in Applied Multivariate Statistics, 2009.

N. Xia, Y. Qin, and Z. Bai, Convergence rates of eigenvector empirical spectral distribution of large dimensional sample covariance matrix, The Annals of Statistics, pp.2572-2607, 2013.

M. Yuan and Y. Lin, Model selection and estimation in the Gaussian graphical model, Biometrika, vol.94, issue.1, pp.19-35, 2007.
DOI : 10.1093/biomet/asm018

W. Zaremba, A. Gretton, and M. Blaschko, B-test: A non-parametric, low variance kernel two-sample test, Advances in Neural Information Processing Systems, pp.755-763, 2013.

K. Zhang, J. Peters, D. Janzing, and B. Schölkopf, Kernel-based conditional independence test and application in causal discovery, Conference on Uncertainty in Artificial Intelligence, pp.804-813, 2011.

K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang, Domain adaptation under target and conditional shift, International Conference on Machine Learning, pp.819-827, 2013.