6.2 Mixed effects models
6.2.1 Population approach and hierarchical models

Fast Stochastic Approximation of the EM

6.3 Distributions
6.3.1 The conditional distribution of the individual parameters

6.4 The nlme and the f-SAEM
6.4.1 Proposal based on Laplace approximation

6.5.2 Time-to-event Data Model

Conclusion

Introduction
7.2.2 Convergence of the iSAEM for curved exponential family

Applications

Conclusion

The complete model (y, z), where the realisations of y are observed and z is the latent data.
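
In generic notation (a minimal recap of the incomplete-data formulation of Dempster et al. (1977), not necessarily the symbols used in the thesis), the observed-data likelihood is obtained by integrating the complete-data likelihood over the latent variable, and each EM iteration maximizes the conditional expectation of the complete-data log-likelihood:

\[
g(y;\theta) = \int f(y,z;\theta)\,\mathrm{d}z,
\qquad
Q(\theta \mid \theta') = \mathbb{E}\big[\log f(y,z;\theta) \mid y;\theta'\big].
\]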

Noncontinuous data models
8.3 A Repeated Time-To-Event Data Model

A Categorical Data Model with Regression Variables

The R package saemix [Comets et al., 2017] is an implementation of the Stochastic Approximation Expectation Maximization (SAEM) algorithm developed by Kuhn and Lavielle (2004).
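
As a schematic of that algorithm (a sketch following Delyon et al. (1999) and Kuhn and Lavielle (2004), assuming a complete-data model from the curved exponential family with sufficient statistic \(S\), step sizes \(\gamma_k\), and an MCMC transition kernel \(\Pi_\theta\) targeting the conditional distribution of the latent variables), iteration \(k\) reads:

\[
\begin{aligned}
&\text{(simulation)} && z^{(k)} \sim \Pi_{\theta_{k-1}}\big(z^{(k-1)}, \cdot\big),\\
&\text{(stochastic approximation)} && s_k = s_{k-1} + \gamma_k\big(S(y, z^{(k)}) - s_{k-1}\big),\\
&\text{(maximization)} && \theta_k = \arg\max_{\theta}\ \big\{\langle s_k, \phi(\theta)\rangle - \psi(\theta)\big\},
\end{aligned}
\]

where \(\phi\) and \(\psi\) come from the exponential-family representation \(\log f(y,z;\theta) = \langle S(y,z), \phi(\theta)\rangle - \psi(\theta)\).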


Articles in peer-reviewed journals
f-SAEM: A fast Stochastic Approximation of the EM algorithm, Belhal Karimi, Marc Lavielle and Eric Moulines, Computational Statistics and Data Analysis (CSDA), 2019.

Conference proceedings
On the Global Convergence of (Fast) Incremental Expectation Maximization Methods, Belhal Karimi, Hoi-To Wai, Eric Moulines and Marc Lavielle, Advances in Neural Information Processing Systems, 2019.
Non-asymptotic Analysis of Biased Stochastic Approximation Scheme, Belhal Karimi, Błażej Miasojedow, Eric Moulines and Hoi-To Wai, Proceedings of the Conference on Learning Theory, 2019.

Preprints
A Doubly Stochastic Surrogate Optimization Scheme for Non-convex Finite-sum Problems, Belhal Karimi, Hoi-To Wai and Eric Moulines, 2019.

Software
R package, extension of saemix, 2019.

Awards
Visiting Student Researcher Grant from the Jacques Hadamard Foundation: HSE-Samsung AI Lab in Moscow (Russia), with Dr. Dmitry Vetrov (2 months).
Student award, 32nd Conference on Learning Theory, 2019.

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

Y. Abbasi-yadkori, N. Lazic, and C. Szepesvari, Regret bounds for model-free linear quadratic control, 2018.

P. Ablin, A. Gramfort, J. Cardoso, and F. Bach, EM algorithms for ICA, 2018.

A. Agarwal and L. Bottou, A lower bound for the optimization of finite sums, 2014.

A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Transactions on Information Theory, vol.59, issue.1, pp.573-587, 2013.

N. Agarwal, Z. Allen-zhu, B. Bullins, E. Hazan, and T. Ma, Finding approximate local minima faster than gradient descent, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp.1195-1199, 2017.

A. Agresti, Categorical data analysis. A Wiley-Interscience publication, 1990.

S. Allassonnière and J. Chevallier, A New Class of EM Algorithms. Escaping Local Minima and Handling Intractable Sampling, 2019.

S. Allassonnière and E. Kuhn, Convergent Stochastic Expectation Maximization algorithm with efficient sampling in high dimension. Application to deformable template model estimation, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720617

Z. Allen-zhu and E. Hazan, Variance reduction for faster non-convex optimization, International Conference on Machine Learning, pp.699-707, 2016.

P. K. Andersen, Survival Analysis, Wiley Reference Series in Biostatistics, 2006.

C. Andrieu, A. Doucet, and R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.72, issue.3, pp.269-342, 2010.

C. Andrieu and É. Moulines, On the ergodicity properties of some adaptive MCMC algorithms, The Annals of Applied Probability, vol.16, issue.3, pp.1462-1505, 2006.


C. Andrieu and G. O. Roberts, The pseudo-marginal approach for efficient Monte Carlo computations, The Annals of Statistics, vol.37, issue.2, pp.697-725, 2009.

C. Andrieu and J. Thoms, A tutorial on adaptive MCMC, Statistics and Computing, vol.18, issue.4, pp.343-373, 2008.

Y. F. Atchadé and J. S. Rosenthal, On adaptive Markov chain Monte Carlo algorithms, Bernoulli, vol.11, issue.5, pp.815-828, 2005.

S. Balakrishnan, M. J. Wainwright, and B. Yu, Statistical guarantees for the EM algorithm: From population to sample-based analysis, Ann. Statist., vol.45, issue.1, pp.77-120, 2017.

J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

S. Beal and L. Sheiner, The NONMEM system. The American Statistician, vol.34, pp.118-119, 1980.

A. Benveniste, P. Priouret, and M. Métivier, Adaptive Algorithms and Stochastic Approximations, 1990.

J. Bertrand, E. Comets, C. M. Laffont, M. Chenel, and F. Mentré, Pharmacogenetics and population pharmacokinetics: impact of the design on three tests using the SAEM algorithm, Journal of Pharmacokinetics and Pharmacodynamics, vol.36, issue.4, pp.317-339, 2009.
URL : https://hal.archives-ouvertes.fr/inserm-00406739

D. P. Bertsekas, Nonlinear programming, 1999.

D. P. Bertsekas, Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Optimization for Machine Learning, p.3, 2010.

M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo, 2017.

J. Bhandari, D. Russo, and R. Singal, A finite time analysis of temporal difference learning with linear function approximation, Conference On Learning Theory, pp.1691-1692, 2018.

C. M. Bishop, Pattern recognition and machine learning, 2006.

D. M. Blei, A. Kucukelbir, and J. D. Mcauliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.


C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, Weight uncertainty in neural network, International Conference on Machine Learning, pp.1613-1622, 2015.


V. S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.

V. S. Borkar, Stochastic approximation: a dynamical systems viewpoint, vol.48, 2009.

L. Bottou, Stochastic gradient learning in neural networks, Proceedings of Neuro-Nîmes, vol.91, p.12, 1991.

L. Bottou, Online learning and stochastic approximations, vol.17, p.142, 1998.

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems, vol.20, pp.161-168, 2008.

L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, vol.60, issue.2, pp.223-311, 2018.

L. Bottou and Y. LeCun, On-line learning for very large data sets, Applied Stochastic Models in Business and Industry, vol.21, pp.137-151, 2005.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

T. Bröcker and L. Lander, Differentiable germs and catastrophes, vol.17, 1975.

S. Brooks, A. Gelman, G. Jones, and X. Meng, Handbook of Markov Chain Monte Carlo, 2011.

N. Brosse, A. Durmus, É. Moulines, and S. Sabanis, The tamed unadjusted Langevin algorithm, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01648667

O. Cappé and E. Moulines, On-line Expectation Maximization algorithm for latent data models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.3, pp.593-613, 2009.

Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford, Convex until proven guilty: Dimension-free acceleration of gradient descent on non-convex functions, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.654-663, 2017.


B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich, et al., Stan: A probabilistic programming language, Journal of Statistical Software, vol.76, issue.1, 2017.

P. L. Chan, P. Jacqmin, M. Lavielle, L. McFadyen, and B. Weatherley, The use of the SAEM algorithm in MONOLIX software for estimation of population pharmacokinetic-pharmacodynamic-viral dynamics parameters of maraviroc in asymptomatic HIV subjects, Journal of Pharmacokinetics and Pharmacodynamics, vol.38, issue.1, pp.41-61, 2011.

J. Chen, J. Zhu, Y. W. Teh, and T. Zhang, Stochastic Expectation Maximization with variance reduction, Advances in Neural Information Processing Systems, pp.7978-7988, 2018.

E. Comets, A. Lavenu, and M. Lavielle, Parameter estimation in nonlinear mixed effect models using saemix, an R implementation of the SAEM algorithm, Journal of Statistical Software, vol.80, issue.3, pp.1-42, 2017.
URL : https://hal.archives-ouvertes.fr/inserm-01502767

A. R. Conn, N. Gould, A. Sartenaer, and P. L. Toint, Global convergence of a class of trust region algorithms for optimization using inexact projections on convex constraints, SIAM Journal on Optimization, vol.3, issue.1, pp.164-221, 1993.

I. Csiszár and G. Tusnády, Information geometry and alternating minimization procedures, Statist. Decisions, suppl, vol.1, pp.205-237, 1984.

G. Dalal, B. Szörényi, G. Thoppe, and S. Mannor, Finite sample analyses for TD(0) with function approximation, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

G. Dalal, B. Szörényi, G. Thoppe, and S. Mannor, Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning, Conference On Learning Theory, 2018.

M. Davidian, Nonlinear models for repeated measurement data, 2017.

N. de Freitas, P. Højen-Sørensen, M. I. Jordan, and S. Russell, Variational MCMC, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp.120-127, 2001.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

T. Degris, M. White, and R. S. Sutton, Off-policy actor-critic, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00764021

B. Delyon, M. Lavielle, and E. Moulines, Convergence of a stochastic approximation version of the EM algorithm, Ann. Statist, vol.27, issue.1, pp.94-128, 1999.


A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the royal statistical society. Series B (methodological), pp.1-38, 1977.

J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, et al., TensorFlow Distributions, 2017.

S. Donnet and A. Samson, Using PMCMC in EM algorithm for stochastic mixed models: theoretical and practical issues, Journal de la Société Française de Statistique, vol.155, issue.1, pp.49-72, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00950760

R. Douc, E. Moulines, and D. Stoffer, Nonlinear Time Series: Theory, Methods and Applications with R examples, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01263245

A. Doucet, S. Godsill, and C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing, vol.10, issue.3, pp.197-208, 2000.

P. Doukhan, P. Massart, and E. Rio, Invariance principles for absolutely regular empirical processes, Annales de l'IHP Probabilités et statistiques, vol.31, pp.393-427, 1995.

J. C. Duchi, A. Agarwal, M. Johansson, and M. I. Jordan, Ergodic mirror descent, SIAM Journal on Optimization, vol.22, issue.4, pp.1549-1578, 2012.

C. Fang, C. J. Li, Z. Lin, and T. Zhang, SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator, Advances in Neural Information Processing Systems, pp.687-697, 2018.

M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, Global convergence of policy gradient methods for the linear quadratic regulator, International Conference on Machine Learning, pp.1466-1475, 2018.

R. A. Fisher, Theory of statistical estimation, Mathematical Proceedings of the Cambridge Philosophical Society, vol.22, pp.700-725, 1925.

G. Fort, E. Moulines, and P. Priouret, Convergence of adaptive and interacting Markov chain Monte Carlo algorithms, The Annals of Statistics, vol.39, issue.6, pp.3262-3289, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00695649

C. Fraley and A. E. Raftery, Bayesian regularization for normal mixture estimation and model-based clustering, Journal of classification, vol.24, issue.2, pp.155-181, 2007.


S. Ghadimi and G. Lan, Stochastic first-and zeroth-order methods for nonconvex stochastic programming, SIAM Journal on Optimization, vol.23, issue.4, pp.2341-2368, 2013.

S. Ghadimi, G. Lan, and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Mathematical Programming, vol.155, issue.1-2, pp.267-305, 2016.

Z. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature, vol.521, issue.7553, pp.452-459, 2015.

R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin et al., SGD: general analysis and improved rates, 2019.

A. Griewank and A. Walther, Evaluating derivatives: principles and techniques of algorithmic differentiation, vol.105, 2008.

A. Gunawardana and W. Byrne, Convergence theorems for generalized alternating minimization procedures, Journal of Machine Learning Research, vol.6, pp.2049-2073, 2005.

H. Haario, E. Saksman, and J. Tamminen, An adaptive Metropolis algorithm, Bernoulli, vol.7, issue.2, pp.223-242, 2001.

P. Hall and C. C. Heyde, Martingale limit theory and its application, 2014.

G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural computation, vol.18, issue.7, pp.1527-1554, 2006.

M. D. Hoffman and A. Gelman, The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, Journal of Machine Learning Research, vol.15, issue.1, pp.1593-1623, 2014.

T. Hofmann, Probabilistic latent semantic indexing, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pp.50-57, 1999.

S. Horváth and P. Richtárik, Nonconvex variance reduced optimization with arbitrary sampling, 2018.

S. J. Reddi, S. Sra, B. Póczos, and A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, Advances in Neural Information Processing Systems, vol.29, pp.1145-1153, 2016.

T. Jaakkola, M. I. Jordan, and S. P. Singh, Convergence of stochastic iterative dynamic programming algorithms, Advances in Neural Information Processing Systems, pp.703-710, 1994.


F. Jaffrézic, C. Meza, M. Lavielle, and J. Foulley, Genetic analysis of growth curves using the SAEM algorithm, Genetics Selection Evolution, vol.38, issue.6, p.583, 2006.

W. Jiang, J. Josse, and M. Lavielle, Logistic regression with missing covariates: parameter estimation, model selection and prediction, 2018.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in neural information processing systems, pp.315-323, 2013.

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Mach. Learn, vol.37, issue.2, pp.183-233, 1999.

R. E. Kalman and J. E. Bertram, Control system analysis and design via the "second method" of Lyapunov: I. Continuous-time systems, Journal of Basic Engineering, vol.82, issue.2, pp.371-393, 1960.

B. Karimi and M. Lavielle, Efficient Metropolis-Hastings sampling for nonlinear mixed effects models, Proceedings, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01958247

B. Karimi, M. Lavielle, and E. Moulines, f-SAEM: A fast stochastic approximation of the EM algorithm for nonlinear mixed effects models, Computational Statistics & Data Analysis, vol.141, pp.123-138, 2020.
URL : https://hal.archives-ouvertes.fr/hal-01958248

B. Karimi, B. Miasojedow, E. Moulines, and H. Wai, Non-asymptotic analysis of biased stochastic approximation scheme, Proceedings of the Thirty-Second Conference on Learning Theory, vol.99, pp.1944-1974, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02127750

B. Karimi, H. Wai, and E. Moulines, A doubly stochastic surrogate optimization scheme for non-convex finite-sum problems, 2019.

B. Karimi, H. Wai, E. Moulines, and M. Lavielle, On the global convergence of (fast) incremental expectation maximization methods, Advances in Neural Information Processing Systems, 2019.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 3rd International Conference on Learning Representations, 2015.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, 2nd International Conference on Learning Representations, 2014.

V. R. Konda and J. N. Tsitsiklis, On actor-critic algorithms, SIAM journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003.


A. Kucukelbir, R. Ranganath, A. Gelman, and D. Blei, Automatic variational inference in Stan, Advances in Neural Information Processing Systems, vol.28, pp.568-576, 2015.

E. Kuhn and M. Lavielle, Coupling a stochastic approximation version of EM with an MCMC procedure, ESAIM: Probability and Statistics, vol.8, pp.115-131, 2004.

H. Kushner and G. G. Yin, Stochastic approximation and recursive algorithms and applications, vol.35, 2003.

H. J. Kushner and D. S. Clark, Stochastic approximation methods for constrained and unconstrained systems, vol.26, 2012.

C. Lakshminarayanan and C. Szepesvari, Linear stochastic approximation: How far does constant step-size and iterate averaging go?, International Conference on Artificial Intelligence and Statistics, pp.1347-1355, 2018.

K. Lange, MM Optimization Algorithms, 2016.

M. Lavielle, A stochastic algorithm for parametric and non-parametric estimation in the case of incomplete data, Signal Processing, vol.42, issue.1, pp.3-17, 1995.

M. Lavielle, Monolix (modèles non linéaires à effets mixtes), 2005.

M. Lavielle, Mixed effects models for the population approach: models, tasks, methods and tools, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01122873

M. Lavielle, E. Ilinca, and R. Kuate, mlxR: Simulation of Longitudinal Data, 2019.

M. Lavielle and F. Mentré, Estimation of population pharmacokinetic parameters of saquinavir in HIV patients with the Monolix software, Journal of Pharmacokinetics and Pharmacodynamics, vol.34, issue.2, pp.229-249, 2007.
URL : https://hal.archives-ouvertes.fr/inserm-00156907

M. Lavielle and B. Ribba, Enhanced method for diagnosing pharmacometric models: random sampling from conditional distributions, Pharmaceutical research, vol.33, issue.12, pp.2979-2988, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01365532

Y. LeCun, The MNIST database of handwritten digits, 1998.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

Y. Li and Y. Gal, Dropout inference in Bayesian neural networks with alpha-divergences, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2052-2061, 2017.

T. A. Louis, Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society, Series B: Methodological, vol.44, pp.226-233, 1982.

J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00948338


D. Makowski and M. Lavielle, Using SAEM to estimate parameters of models of response to applied fertilizer, Journal of Agricultural, Biological, and Environmental Statistics, vol.11, issue.1, pp.45-60, 2006.

C. Mbogning, K. Bleakley, and M. Lavielle, Joint modeling of longitudinal and repeated time-to-event data using nonlinear mixed-effects models and the SAEM algorithm, Journal of Statistical Computation and Simulation, vol.85, issue.8, pp.1512-1528, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01122140

G. J. McLachlan and T. Krishnan, The EM algorithm and extensions, Wiley Series in Probability and Statistics, 2008.

O. Medelyan, Human-competitive automatic topic indexing, 2009.

K. L. Mengersen and R. L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist., vol.24, issue.1, pp.101-121, 1996.

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics, vol.21, issue.6, pp.1087-1092, 1953.

S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, 2012.

H. Migon, D. Gamerman, and F. Louzada, Statistical Inference: An Integrated Approach, Second Edition. Chapman & Hall/CRC Texts in Statistical Science, 2014.

E. Moulines and F. R. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems, pp.451-459, 2011.

R. M. Neal, Bayesian learning for neural networks, vol.118, 2012.

R. M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo, 2011.

R. M. Neal and G. E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in Graphical Models, pp.355-368, Springer, 1998.

A. S. Nemirovsky and D. B. Yudin, Problem complexity and method efficiency in optimization, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 2004.

S. Ng and G. Mclachlan, On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures, Statistics and Computing, vol.13, issue.1, pp.45-55, 2003.

R. A. O'Reilly and P. M. Aggeler, Studies on coumarin anticoagulant drugs: initiation of warfarin therapy without a loading dose, Circulation, vol.38, issue.1, pp.169-177, 1968.

J. Paisley, D. Blei, and M. I. Jordan, Variational Bayesian inference with stochastic search, ICML, 2012.

M. Papini, D. Binaghi, G. Canonaco, M. Pirotta, and M. Restelli, Stochastic variance-reduced policy gradient, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.4026-4035, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01940394

S. E. Pav, Madness: a package for Multivariate Automatic Differentiation, 2016.

J. Peters and S. Schaal, Natural actor-critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.

N. G. Polson and V. Sokolov, Deep learning: a Bayesian perspective, Bayesian Analysis, vol.12, issue.4, pp.1275-1304, 2017.

B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.

X. Qian, A. Sailanbayev, K. Mishchenko, and P. Richtárik, MISO is making a comeback with better proofs and rates, 2019.


R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2008.

M. Razaviyayn, M. Hong, and Z. Luo, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM Journal on Optimization, vol.23, issue.2, pp.1126-1153, 2013.

S. Reddi, M. Zaheer, S. Sra, B. Poczos, F. Bach, et al., A generic approach for escaping saddle points, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, vol.84, pp.1233-1242, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01652150

S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, Stochastic variance reduction for nonconvex optimization, International conference on machine learning, pp.314-323, 2016.

S. J. Reddi, S. Sra, B. Póczos, and A. Smola, Fast incremental method for nonconvex optimization, 2016.

D. J. Rezende, S. Mohamed, and D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, International Conference on Machine Learning, pp.1278-1286, 2014.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.

C. P. Robert and G. Casella, Metropolis-Hastings Algorithms, pp.167-197, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01067920

G. O. Roberts, A. Gelman, and W. R. Gilks, Weak convergence and optimal scaling of random walk Metropolis algorithms, Ann. Appl. Probab., vol.7, issue.1, pp.110-120, 1997.

G. O. Roberts and J. S. Rosenthal, Optimal scaling of discrete approximations to Langevin diffusions, J. R. Statist. Soc. B, vol.60, pp.255-268, 1997.

G. O. Roberts and J. S. Rosenthal, Quantitative non-geometric convergence bounds for independence samplers, Methodology and Computing in Applied Probability, vol.13, issue.2, pp.391-403, 2011.

G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, vol.2, issue.4, pp.341-363, 1996.

N. L. Roux, M. Schmidt, and F. R. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Advances in Neural Information Processing Systems, vol.25, pp.2663-2671, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00674995

C. W. Royer and S. J. Wright, Complexity analysis of second-order linesearch algorithms for smooth nonconvex optimization, SIAM Journal on Optimization, vol.28, issue.2, pp.1448-1477, 2018.

H. Rue, S. Martino, and N. Chopin, Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.71, issue.2, pp.319-392, 2009.

A. Samson, M. Lavielle, and F. Mentré, Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model, Computational Statistics & Data Analysis, vol.51, issue.3, pp.1562-1574, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00263506

R. M. Savic, F. Mentré, and M. Lavielle, Implementation and evaluation of the SAEM algorithm for longitudinal ordered categorical data with an illustration in pharmacokinetics-pharmacodynamics, The AAPS Journal, vol.13, issue.1, pp.44-53, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00637400

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017.
URL : https://hal.archives-ouvertes.fr/hal-00860051

A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory, 2009.

E. Snoeck, P. Chanu, M. Lavielle, P. Jacqmin, E. Jonsson, et al., A comprehensive hepatitis C viral kinetic model explaining cure, Clinical Pharmacology & Therapeutics, vol.87, issue.6, pp.706-713, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00637434

Stan Development Team, RStan: the R interface to Stan, 2018.

O. Stramer and R. L. Tweedie, Langevin-type models I: Diffusions with given stationary distributions and their discretizations, Methodology and Computing in Applied Probability, vol.1, issue.3, pp.283-306, 1999.

T. Sun, Y. Sun, and W. Yin, On Markov chain gradient descent, Advances in Neural Information Processing Systems, vol.31, pp.9918-9927, 2018.

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, International conference on machine learning, pp.1139-1147, 2013.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2018.

R. S. Sutton, D. A. Mcallester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems, pp.1057-1063, 2000.

V. B. Tadić and A. Doucet, Asymptotic bias of stochastic gradient search, The Annals of Applied Probability, vol.27, issue.6, pp.3255-3304, 2017.

B. Thiesson, C. Meek, and D. Heckerman, Accelerating EM for large databases, Machine Learning, vol.45, issue.3, pp.279-299, 2001.

M. K. Titsias and O. Papaspiliopoulos, Auxiliary gradient-based sampling algorithms, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2018.

N. D. Vanli, M. Gurbuzbalaban, and A. Ozdaglar, Global convergence rate of proximal incremental aggregated gradient methods, SIAM Journal on Optimization, vol.28, issue.2, pp.1282-1300, 2018.

V. Vapnik, The nature of statistical learning theory, 2013.

G. Verbeke, Linear mixed models for longitudinal data, 1997.

M. Vihola, Robust adaptive Metropolis algorithm with coerced acceptance rate, Statistics and Computing, vol.22, issue.5, pp.997-1008, 2012.

M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, vol.1, pp.1-305, 2008.

Y. Wang, Derivation of various NONMEM estimation methods, Journal of Pharmacokinetics and Pharmacodynamics, vol.34, issue.5, pp.575-593, 2007.

Z. Wang, Q. Gu, Y. Ning, and H. Liu, High dimensional EM algorithm: Statistical optimization and asymptotic normality, Advances in Neural Information Processing Systems, vol.28, pp.2521-2529, 2015.



G. C. Wei and M. A. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, Journal of the American Statistical Association, vol.85, issue.411, pp.699-704, 1990.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, pp.229-256, 1992.



C. J. Wu, On the convergence properties of the EM algorithm, The Annals of Statistics, vol.11, issue.1, pp.95-103, 1983.


J. Xu, D. J. Hsu, and A. Maleki, Global analysis of Expectation Maximization for mixtures of two Gaussians, Advances in Neural Information Processing Systems, pp.2676-2684, 2016.

Y. Xu, R. Jin, and T. Yang, First-order stochastic algorithms for escaping from saddle points in almost linear time, Advances in Neural Information Processing Systems, pp.5530-5540, 2018.

Z. Zhang, Parametric regression model for survival data: Weibull regression model as an example, Ann Transl Med, vol.24, 2016.

List of Figures
1.2 Metropolis-Hastings (MH) algorithm: representation of a proposal q(z) and a target π(z) distribution in one dimension.
Viral load of four patients with hepatitis C (taken from Lavielle, 2014).
MH algorithm: representation of a proposal distribution and a target distribution in one dimension (French summary).
Viral load of four patients with hepatitis C (taken from Lavielle, 2014) (French summary).
Convergence of the first component of the vector of parameters for the SAEM, the MCEM and the MISSO methods, plotted against the number of passes over the data.
5.1 Performance of stochastic EM methods for fitting a GMM. (Left) Precision |µ^(k) − µ*|^2 as a function of the epochs elapsed. (Right) Number of iterations to reach a precision of 10^-3.
(Incremental Variational Inference) Negated ELBO versus epochs elapsed for fitting the Bayesian LeNet-5 on MNIST using different algorithms.
Warfarin concentration (mg/l) over time (h) for 32 subjects.

List of Tables
1.1 ERM methods: comparison of the complexity, measured in number of iterations, of different algorithms for non-convex optimization. MC stands for Monte Carlo integration of the drift term and Step. for stepsize.
ERM methods: the same complexity comparison (French summary).
LeNet-5 architecture.
MSJD per dimension.
Means.
MSJD per dimension.
MSJD per dimension.
MSJD per dimension.