M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM Journal on Mathematical Analysis, vol.43, issue.2, pp.904-924, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00637399

D. Alvarez-Melis, T. S. Jaakkola, and S. Jegelka, 2017.

L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, 2006.

M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, International Conference on Machine Learning, pp.214-223, 2017.

F. Aurenhammer, F. Hoffmann, and B. Aronov, Minkowski-type theorems and least-squares clustering, Algorithmica, vol.20, issue.1, pp.61-76, 1998.

F. Bach, Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory, pp.185-209, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

P. L. Bartlett and S. Mendelson, Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.

F. Bassetti, A. Bodini, and E. Regazzini, On minimum Kantorovich distance estimators, Statistics & Probability Letters, vol.76, issue.12, pp.1298-1302, 2006.

M. G. Bellemare, I. Danihelka, W. Dabney, S. Mohamed, B. Lakshminarayanan et al., The Cramer distance as a solution to biased Wasserstein gradients, 2017.

J. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.1111-1138, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01096124

E. Bernton, P. E. Jacob, M. Gerber, and C. P. Robert, Inference in generative models using the Wasserstein distance, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01517550

J. Bigot, E. Cazelles, and N. Papadakis, Central limit theorems for Sinkhorn divergence between probability distributions on finite spaces and statistical applications, 2017.

M. Bińkowski, D. J. Sutherland, M. Arbel, and A. Gretton, Demystifying MMD GANs, 2018.

O. Bousquet, S. Gelly, I. Tolstikhin, C. Simon-Gabriel, and B. Schölkopf, From optimal transport to generative modeling: the VEGAN cookbook, 2017.

R. Burkard, M. Dell'Amico, and S. Martello, 2009.

P. J. Bushell, Hilbert's metric and positive contraction mappings in a Banach space, Archive for Rational Mechanics and Analysis, vol.52, issue.4, pp.330-338, 1973.

A. Calderón, Lebesgue spaces of differentiable functions, Proc. Sympos. Pure Math, vol.4, pp.33-49, 1961.

G. Canas and L. Rosasco, Learning probability measures with respect to optimal transport metrics, Advances in Neural Information Processing Systems, vol.25, pp.2492-2500, 2012.

G. Carlier, V. Duval, G. Peyré, and B. Schmitzer, Convergence of entropic schemes for optimal transport and gradient flows, SIAM Journal on Mathematical Analysis, vol.49, issue.2, pp.1385-1418, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01246086

L. Carratino, A. Rudi, and L. Rosasco, Learning with SGD and random features, Advances in Neural Information Processing Systems, pp.10213-10224, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01958906

Y. Chen, T. Georgiou, and M. Pavon, Entropic and displacement interpolation: a computational approach using the Hilbert metric, SIAM Journal on Applied Mathematics, vol.76, issue.6, pp.2375-2396, 2016.

L. Chizat, Unbalanced optimal transport: Models, numerical methods, applications, 2017.
URL : https://hal.archives-ouvertes.fr/tel-01881166

L. Chizat, G. Peyré, B. Schmitzer, and F. Vialard, An interpolating distance between optimal transport and Fisher-Rao metrics, Foundations of Computational Mathematics, vol.18, issue.1, pp.1-44, 2018.

R. Cominetti and J. S. Martin, Asymptotic analysis of the exponential penalty trajectory in linear programming, Mathematical Programming, vol.67, issue.1-3, pp.169-187, 1994.

N. Courty, R. Flamary, and D. Tuia, Domain adaptation with regularized optimal transport, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.274-289, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01018698

N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, Optimal transport for domain adaptation, IEEE Transactions on Pattern Analysis and Machine Intelligence, issue.99, pp.1-1, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01170705

I. Csiszár, I-divergence geometry of probability distributions and minimization problems, The Annals of Probability, pp.146-158, 1975.

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Adv. in Neural Information Processing Systems, pp.2292-2300, 2013.

M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, International Conference on Machine Learning, pp.685-693, 2014.

E. del Barrio and J. Loubes, Central limit theorem for empirical transportation cost in general dimension, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01517192

R. Dudley, The speed of mean Glivenko-Cantelli convergence, The Annals of Mathematical Statistics, vol.40, issue.1, pp.40-50, 1969.

G. Dziugaite, D. Roy, and Z. Ghahramani, Training generative neural networks via maximum mean discrepancy optimization, Uncertainty in Artificial Intelligence-Proceedings of the 31st Conference, UAI 2015, pp.258-267, 2015.

J. Feydy and A. Trouvé, Global divergences between measures: from Hausdorff distance to Optimal Transport, International Workshop on Shape in Medical Imaging, pp.102-115, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01827184

J. Feydy, T. Séjourné, F. Vialard, S. Amari, A. Trouvé et al., Interpolating between optimal transport and MMD using Sinkhorn divergences, International Conference on Artificial Intelligence and Statistics, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01898858

J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra and its Applications, vol.114, pp.717-735, 1989.

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. Poggio, Learning with a Wasserstein loss, Adv. in Neural Information Processing Systems, pp.2044-2052, 2015.

K. Fukumizu, A. Gretton, G. R. Lanckriet, B. Schölkopf, and B. K. Sriperumbudur, Kernel choice and classifiability for RKHS embeddings of probability distributions, Advances in Neural Information Processing Systems, pp.1750-1758, 2009.

B. Galerne, A. Leclaire, and J. Rabin, A texture synthesis model based on semi-discrete optimal transport in patch space, SIAM Journal on Imaging Sciences, vol.11, issue.4, pp.2456-2493, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01726443

A. Genevay, M. Cuturi, G. Peyré, and F. Bach, Stochastic optimization for large-scale optimal transport, Proc. NIPS'16, pp.3432-3440, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01321664

A. Genevay, G. Peyré, and M. Cuturi, Learning generative models with Sinkhorn divergences, International Conference on Artificial Intelligence and Statistics, pp.1608-1617, 2018.

A. Genevay, L. Chizat, F. Bach, M. Cuturi, and G. Peyré, Sample complexity of Sinkhorn divergences, International Conference on Artificial Intelligence and Statistics, 2019.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Adv. in Neural Information Processing Systems, pp.513-520, 2006.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved training of Wasserstein GANs, 2017.

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, Advances in Neural Information Processing Systems, pp.6626-6637, 2017.

G. Huang, C. Guo, M. J. Kusner, Y. Sun, F. Sha et al., Supervised word mover's distance, Advances in Neural Information Processing Systems, vol.29, pp.4862-4870, 2016.

L. Kantorovich, On the transfer of masses (in Russian), Doklady Akademii Nauk, vol.37, issue.2, pp.227-229, 1942.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, 2013.

J. Kivinen, A. J. Smola, and R. C. Williamson, Online learning with kernels, Advances in Neural Information Processing Systems, pp.785-792, 2002.

M. Kusner, Y. Sun, N. Kolkin, and K. Q. Weinberger, From word embeddings to document distances, Proc. of the 32nd Intern. Conf. on Machine Learning, pp.957-966, 2015.

A. B. Larsen, S. K. Sonderby, H. Larochelle, and O. Winther, Autoencoding beyond pixels using a learned similarity metric, Proceedings of The 33rd International Conference on Machine Learning, vol.48, pp.20-22, 2016.

C. Li, W. Chang, Y. Cheng, Y. Yang, B. Póczos et al., Towards deeper understanding of moment matching network, 2017.

Y. Li, K. Swersky, and R. Zemel, Generative moment matching networks, International Conference on Machine Learning, pp.1718-1727, 2015.

M. Liero, A. Mielke, and G. Savaré, Optimal entropy-transport problems and a new Hellinger-Kantorovich distance between positive measures, Inventiones mathematicae, vol.211, issue.3, pp.969-1117, 2018.

M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet, Are GANs created equal? A large-scale study, Advances in Neural Information Processing Systems, pp.697-706, 2018.

G. Luise, A. Rudi, M. Pontil, and C. Ciliberto, Differential properties of Sinkhorn approximation for learning with Wasserstein distance, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01958887

C. McDiarmid, On the method of bounded differences, London Mathematical Society Lecture Note Series, pp.148-188, 1989.

Q. Mérigot, A multiscale approach to optimal transport, Comput. Graph. Forum, vol.30, issue.5, pp.1583-1592, 2011.

G. Montavon, K. Müller, and M. Cuturi, Wasserstein training of restricted Boltzmann machines, Adv. in Neural Information Processing Systems, 2016.

A. Müller, Integral probability metrics and their generating classes of functions, vol.29, pp.429-443, 1997.

X. Nguyen, M. J. Wainwright, and M. I. Jordan, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5847-5861, 2010.

S. Nowozin, B. Cseke, and R. Tomioka, f-GAN: Training generative neural samplers using variational divergence minimization, Advances in Neural Information Processing Systems, pp.271-279, 2016.

J. Pennington, R. Socher, and C. Manning, GloVe: Global vectors for word representation, Proc. of the Empirical Methods in Natural Language Processing, vol.12, pp.1532-1543, 2014.

G. Peyré and M. Cuturi, Computational optimal transport, 2017.

B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Adv. in Neural Information Processing Systems, pp.1177-1184, 2007.

A. Rahimi and B. Recht, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems, pp.1313-1320, 2009.

A. Ramdas, N. G. Trillos, and M. Cuturi, On Wasserstein two-sample testing and related families of nonparametric tests, Entropy, vol.19, issue.2, 2017.

A. Rolet, M. Cuturi, and G. Peyré, Fast dictionary learning with a smoothed Wasserstein loss, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, vol.51, pp.9-11, 2016.

Y. Rubner, C. Tomasi, and L. J. Guibas, The earth mover's distance as a metric for image retrieval, IJCV, vol.40, issue.2, pp.99-121, 2000.

A. Rudi and L. Rosasco, Generalization properties of learning with random features, Advances in Neural Information Processing Systems, pp.3215-3225, 2017.

T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford et al., Improved techniques for training GANs, Advances in Neural Information Processing Systems, pp.2234-2242, 2016.

T. Salimans, H. Zhang, A. Radford, and D. Metaxas, Improving GANs using optimal transport, International Conference on Learning Representations, 2018.

F. Santambrogio, Optimal transport for applied mathematicians, Nonlinear Differential Equations and Their Applications, vol.87, 2015.

M. Schmidt, N. L. Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00860051

B. Schmitzer, Stabilized sparse scaling algorithms for entropy regularized transport problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01385251

V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet et al., Large-scale optimal transport and mapping estimation, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01956354

D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, pp.2263-2291, 2013.

R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann. Math. Statist, vol.35, pp.876-879, 1964.

R. Sinkhorn, Diagonal equivalence to matrices with prescribed row and column sums, The American Mathematical Monthly, vol.74, issue.4, pp.402-405, 1967.

J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher et al., Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains, ACM Transactions on Graphics (Proc. SIGGRAPH 2015), 2015.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. Lanckriet, Hilbert space embeddings and metrics on probability measures, Journal of Machine Learning Research, vol.11, pp.1517-1561, 2010.

B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. Lanckriet, On the empirical estimation of integral probability metrics, Electronic Journal of Statistics, vol.6, pp.1550-1599, 2012.

M. Staib, S. Claici, J. M. Solomon, and S. Jegelka, Parallel streaming Wasserstein barycenters, Advances in Neural Information Processing Systems, pp.2647-2658, 2017.

I. Steinwart and A. Christmann, Support vector machines, 2008.

J. Weed and F. Bach, Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01555307

G. Wu, E. Chang, Y. K. Chen, and C. Hughes, Incremental approximate matrix factorization for speeding up support vector machines, Proc. of the 12th ACM SIGKDD Intern. Conf. on Knowledge Discovery and Data Mining, pp.760-766, 2006.