A. Agarwal and L. Bottou, A lower bound for the optimization of finite sums, Proceedings of the International Conference on Machine Learning (ICML), 2015.

N. Agarwal, Z. Allen-Zhu, B. Bullins, E. Hazan, and T. Ma, Finding approximate local minima for nonconvex optimization in linear time, 2016.

Z. Allen-Zhu, Katyusha: the first direct acceleration of stochastic gradient methods, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), 2017.

Z. Allen-Zhu, Natasha: Faster stochastic non-convex optimization via strongly non-convex parameter, Proceedings of the International Conference on Machine Learning (ICML), 2017.

Z. Allen-Zhu and L. Orecchia, Linear coupling: An ultimate unification of gradient and mirror descent, 2014.

Y. Arjevani and O. Shamir, Dimension-free iteration complexity of finite sum optimization problems, Advances in Neural Information Processing Systems (NIPS), 2016.

A. Auslender, Numerical methods for nondifferentiable convex optimization, Nonlinear Analysis and Optimization, pp.102-126, 1987.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Foundations and Trends in Machine Learning, vol.4, issue.1, pp.1-106, 2012.
DOI : 10.1561/2200000015

URL : https://hal.archives-ouvertes.fr/hal-00613125

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

A. Beck and M. Teboulle, Smoothing and First Order Methods: A Unified Framework, SIAM Journal on Optimization, vol.22, issue.2, pp.557-580, 2012.
DOI : 10.1137/100818327

D. P. Bertsekas, Nonlinear programming, Athena Scientific, 1999.

D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Mathematical Programming, pp.163-195, 2011.

D. P. Bertsekas, Convex Optimization Algorithms, Athena Scientific, 2015.

A. Bietti and J. Mairal, Stochastic optimization with variance reduction for infinite datasets with finite-sum structure, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01375816

J. Bolte, T. P. Nguyen, J. Peypouquet, and B. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Mathematical Programming, 2016.

J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal, A family of variable metric proximal methods, Mathematical Programming, pp.15-47, 1995.
DOI : 10.1007/BF01585756

URL : https://hal.archives-ouvertes.fr/inria-00074821

J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal, Numerical Optimization: Theoretical and Practical Aspects, 2006.
DOI : 10.1007/978-3-662-05078-1

J. M. Borwein and A. S. Lewis, Convex analysis and nonlinear optimization: theory and examples, 2010.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT, 2010.

L. Bottou, Stochastic Gradient Descent Tricks, Neural networks: Tricks of the trade, pp.421-436, 2012.

URL : http://leon.bottou.org/publications/pdf/tricks-2012.pdf

S. P. Boyd and L. Vandenberghe, Convex optimization, 2009.

C. G. Broyden, Quasi-Newton methods and their application to function minimisation, Mathematics of Computation, vol.21, issue.99, pp.368-381, 1967.
DOI : 10.1090/S0025-5718-1967-0224273-2

URL : http://www.ams.org/mcom/1967-21-099/S0025-5718-1967-0224273-2/S0025-5718-1967-0224273-2.pdf

C. G. Broyden, The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations, IMA Journal of Applied Mathematics, vol.6, issue.1, pp.76-90, 1970.
DOI : 10.1093/imamat/6.1.76

S. Bubeck, Y. Lee, and M. Singh, A geometric alternative to Nesterov's accelerated gradient descent, 2015.

J. V. Burke and M. Qian, A Variable Metric Proximal Point Algorithm for Monotone Operators, SIAM Journal on Control and Optimization, vol.37, issue.2, pp.353-375, 1999.
DOI : 10.1137/S0363012992235547

URL : http://www.math.washington.edu/~burke/papers/qian1.ps

J. V. Burke and M. Qian, On the superlinear convergence of the variable metric proximal point algorithm using Broyden and BFGS matrix secant updating, Mathematical Programming, pp.157-181, 2000.
DOI : 10.1007/PL00011373

R. H. Byrd and J. Nocedal, A Tool for the Analysis of Quasi-Newton Methods with Application to Unconstrained Minimization, SIAM Journal on Numerical Analysis, vol.26, issue.3, pp.727-739, 1989.
DOI : 10.1137/0726042

R. H. Byrd, J. Nocedal, and Y. Yuan, Global Convergence of a Class of Quasi-Newton Methods on Convex Problems, SIAM Journal on Numerical Analysis, vol.24, issue.5, pp.1171-1190, 1987.
DOI : 10.1137/0724077

R. H. Byrd, J. Nocedal, and F. Oztoprak, An inexact successive quadratic approximation method for L-1 regularized optimization, Mathematical Programming, pp.375-396, 2015.

R. H. Byrd, S. L. Hansen, J. Nocedal, and Y. Singer, A Stochastic Quasi-Newton Method for Large-Scale Optimization, SIAM Journal on Optimization, vol.26, issue.2, pp.1008-1031, 2016.
DOI : 10.1137/140954362

URL : http://arxiv.org/pdf/1401.7020

Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford, Accelerated methods for non-convex optimization, 2016.

Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford, Lower Bounds for Finding Stationary Points I, arXiv preprint arXiv:1710.11606, 2017.

Y. Carmon, O. Hinder, J. C. Duchi, and A. Sidford, "Convex until proven guilty": Dimension-free acceleration of gradient descent on non-convex functions, 2017.

C. Cartis, N. I. Gould, and P. L. Toint, On the Complexity of Steepest Descent, Newton's and Regularized Newton's Methods for Nonconvex Unconstrained Optimization Problems, SIAM Journal on Optimization, vol.20, issue.6, pp.2833-2852, 2010.
DOI : 10.1137/090774100

C. Cartis, N. I. Gould, and P. L. Toint, On the complexity of finding first-order critical points in constrained nonlinear optimization, Mathematical Programming, 2014.

A. Chambolle and T. Pock, A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions, SMAI Journal of Computational Mathematics, vol.1, pp.29-54, 2015.
DOI : 10.5802/smai-jcm.3

URL : https://hal.archives-ouvertes.fr/hal-01099182

X. Chen and M. Fukushima, Proximal quasi-Newton methods for nondifferentiable convex optimization, Mathematical Programming, vol.85, issue.2, pp.313-334, 1999.
DOI : 10.1007/s101070050059

URL : http://halo.kuamp.kyoto-u.ac.jp/zagato/member/staff/fuku/./papers/proxNewton.ps.Z

F. H. Clarke, R. J. Stern, and P. R. Wolenski, Proximal smoothness and the lower-C^2 property, Journal of Convex Analysis, vol.2, issue.1-2, pp.117-144, 1995.

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.185-212, 2011.

R. Correa and C. Lemaréchal, Convergence of some algorithms for convex minimization, Mathematical Programming, pp.261-275, 1993.
DOI : 10.1007/BF01585170

I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics, vol.57, issue.11, pp.1413-1457, 2004.
DOI : 10.1002/cpa.20042

URL : http://onlinelibrary.wiley.com/doi/10.1002/cpa.20042/pdf

W. C. Davidon, Variable Metric Method for Minimization, SIAM Journal on Optimization, vol.1, issue.1, pp.1-17, 1991.
DOI : 10.1137/0801001

URL : https://www.osti.gov/servlets/purl/4222000

E. de Klerk, F. Glineur, and A. B. Taylor, On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions, Optimization Letters, pp.1185-1199, 2017.

A. J. Defazio, A simple practical accelerated method for finite sums, Advances in Neural Information Processing Systems (NIPS), 2016.

A. J. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. J. Defazio, J. Domke, and T. S. Caetano, Finito: A faster, permutable incremental gradient method for big data problems, Proceedings of the International Conference on Machine Learning (ICML), 2014.

J. E. Dennis and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Mathematics of Computation, vol.28, issue.126, pp.549-560, 1974.
DOI : 10.1090/S0025-5718-1974-0343581-1

J. E. Dennis and J. J. Moré, Quasi-Newton Methods, Motivation and Theory, SIAM Review, vol.19, issue.1, pp.46-89, 1977.
DOI : 10.1137/1019005

URL : https://hal.archives-ouvertes.fr/hal-01495720

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.146, issue.1-2, pp.37-75, 2014.

D. Drusvyatskiy and C. Paquette, Efficiency of minimizing compositions of convex functions and smooth maps, Mathematical Programming, 2016.

D. Drusvyatskiy, M. Fazel, and S. Roy, An Optimal First Order Method Based on Optimal Quadratic Averaging, SIAM Journal on Optimization, vol.28, issue.1, 2016.
DOI : 10.1137/16M1072528

J. C. Duchi, P. L. Bartlett, and M. J. Wainwright, Randomized Smoothing for Stochastic Optimization, SIAM Journal on Optimization, vol.22, issue.2, pp.674-701, 2012.
DOI : 10.1137/110831659

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2000.

H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, 1996.

H. Federer, Curvature measures, Transactions of the American Mathematical Society, vol.93, pp.418-491, 1959.

O. Fercoq and Z. Qu, Restarting accelerated gradient methods with a rough strong convexity estimate, 2016.

R. Fletcher, A new approach to variable metric algorithms, The Computer Journal, vol.13, issue.3, pp.317-322, 1970.

R. Fletcher and M. J. Powell, A rapidly convergent descent method for minimization, The Computer Journal, vol.6, issue.2, pp.163-168, 1963.

M. P. Friedlander and M. Schmidt, Hybrid Deterministic-Stochastic Methods for Data Fitting, SIAM Journal on Scientific Computing, vol.34, issue.3, pp.1380-1405, 2012.
DOI : 10.1137/110830629

URL : https://hal.archives-ouvertes.fr/inria-00626571

J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, 2001.

R. Frostig, R. Ge, S. M. Kakade, and A. Sidford, Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization, Proceedings of the International Conference on Machine Learning (ICML), 2015.

M. Fuentes, J. Malick, and C. Lemaréchal, Descentwise inexact proximal algorithms for smooth optimization, Computational Optimization and Applications, pp.755-769, 2012.

URL : https://hal.archives-ouvertes.fr/hal-00628777

M. Fukushima and L. Qi, A Globally and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization, SIAM Journal on Optimization, vol.6, issue.4, pp.1106-1120, 1996.
DOI : 10.1137/S1052623494278839

D. Gabay, Applications of the method of multipliers to variational inequalities, in Studies in Mathematics and its Applications, pp.299-331, 1983.

R. Ge, F. Huang, C. Jin, and Y. Yuan, Escaping from saddle points-online stochastic gradient for tensor decomposition, Conference on Learning Theory, 2015.

S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Mathematical Programming, pp.59-99, 2016.

URL : http://arxiv.org/pdf/1310.3787

S. Ghadimi, G. Lan, and H. Zhang, Generalized Uniformly Optimal Methods for Nonlinear Programming, 2015.

P. Giselsson and M. Fält, Nonsmooth minimization using smooth envelope functions, 2016.

D. Goldfarb, A family of variable-metric methods derived by variational means, Mathematics of Computation, vol.24, issue.109, pp.23-26, 1970.
DOI : 10.1090/S0025-5718-1970-0258249-6

R. M. Gower, D. Goldfarb, and P. Richtárik, Stochastic block BFGS: Squeezing more curvature out of data, Proceedings of the International Conference on Machine Learning (ICML), 2016.

O. Güler, On the Convergence of the Proximal Point Algorithm for Convex Minimization, SIAM Journal on Control and Optimization, vol.29, issue.2, pp.403-419, 1991.
DOI : 10.1137/0329022

O. Güler, New Proximal Point Algorithms for Convex Minimization, SIAM Journal on Optimization, vol.2, issue.4, pp.649-664, 1992.
DOI : 10.1137/0802032

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.
DOI : 10.1201/b18401

B. He and X. Yuan, An Accelerated Inexact Proximal Point Algorithm for Convex Minimization, Journal of Optimization Theory and Applications, vol.154, issue.2, pp.536-548, 2012.

J. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms I, 1996.
DOI : 10.1007/978-3-662-02796-7

J. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms II, 1996.
DOI : 10.1007/978-3-662-06409-2

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553431

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, Journal of Machine Learning Research, vol.12, pp.2777-2824, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00377732

C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, How to escape saddle points efficiently, 2017.

C. Jin, P. Netrapalli, and M. I. Jordan, Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, 2017.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), 2013.

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, Mathematical Programming, 2015.

J. Lee, Y. Sun, and M. Saunders, Proximal Newton-type methods for convex optimization, Advances in Neural Information Processing Systems (NIPS), 2012.

J. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, Gradient descent only converges to minimizers, Conference on Learning Theory, pp.1246-1257, 2016.

C. Lemaréchal and C. Sagastizábal, Practical Aspects of the Moreau-Yosida Regularization: Theoretical Preliminaries, SIAM Journal on Optimization, vol.7, issue.2, pp.367-385, 1997.
DOI : 10.1137/S1052623494267127

C. Lemaréchal and C. Sagastizábal, Variable metric bundle methods: From conceptual to implementable forms, Mathematical Programming, vol.76, issue.3, pp.393-410, 1997.

H. Li and Z. Lin, Accelerated proximal gradient methods for nonconvex programming, Advances in Neural Information Processing Systems (NIPS), 2015.

H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems (NIPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

H. Lin, J. Mairal, and Z. Harchaoui, A generic quasi-Newton algorithm for faster gradient-based optimization, 2017.

Q. Lin, Z. Lu, and L. Xiao, An accelerated proximal coordinate gradient method, Advances in Neural Information Processing Systems (NIPS), 2014.

D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming, vol.45, issue.1-3, pp.503-528, 1989.
DOI : 10.1007/BF01589116

J. Mairal, Optimization with first-order surrogate functions, Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

J. Mairal, Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning, SIAM Journal on Optimization, vol.25, issue.2, pp.829-855, 2015.
DOI : 10.1137/140957639

J. Mairal, F. Bach, and J. Ponce, Sparse modeling for image and vision processing, Foundations and Trends in Computer Graphics and Vision, pp.85-283, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01081139

S. Mallat, A wavelet tour of signal processing: the sparse way, Academic Press, 2008.

B. Martinet, Régularisation d'inéquations variationnelles par approximations successives. Revue française d'informatique et de recherche opérationnelle, série rouge, pp.154-158, 1970.

R. Mifflin, A quasi-second-order proximal bundle algorithm, Mathematical Programming, pp.51-72, 1996.

A. Mokhtari and A. Ribeiro, Global convergence of online limited memory BFGS, Journal of Machine Learning Research, vol.16, issue.1, pp.3151-3181, 2015.

J. J. Moré and J. A. Trangenstein, On the Global Convergence of Broyden's Method, Mathematics of Computation, vol.30, issue.135, pp.523-540, 1976.
DOI : 10.2307/2005323

J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, Comptes Rendus de l'Académie des Sciences de Paris, pp.2897-2899, 1962.

P. Moritz, R. Nishihara, and M. I. Jordan, A linearly-convergent stochastic L-BFGS algorithm, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.

T. Murata and T. Suzuki, Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems, 2016.

B. K. Natarajan, Sparse Approximate Solutions to Linear Systems, SIAM Journal on Computing, vol.24, issue.2, pp.227-234, 1995.
DOI : 10.1137/S0097539792240406

A. Nemirovskii, D. B. Yudin, and E. R. Dawson, Problem complexity and method efficiency in optimization, 1983.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, vol.103, issue.1, pp.127-152, 2005.
DOI : 10.1007/s10107-004-0552-5

Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, vol.120, issue.1, pp.221-259, 2009.

Y. Nesterov, How to make the gradients small. OPTIMA, MPS Newsletter, pp.10-11, 2012.

Y. Nesterov, Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, SIAM Journal on Optimization, vol.22, issue.2, pp.341-362, 2012.
DOI : 10.1137/100802001

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, pp.125-161, 2013.

Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, pp.177-205, 2006.
DOI : 10.1007/s10107-006-0706-8

J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, vol.35, issue.151, pp.773-782, 1980.
DOI : 10.1090/S0025-5718-1980-0572855-7

J. Nocedal and S. Wright, Numerical optimization, 2006.
DOI : 10.1007/b98874

B. O'Donoghue and E. Candès, Adaptive restart for accelerated gradient schemes, Foundations of Computational Mathematics, pp.715-732, 2015.

M. O'Neill and S. J. Wright, Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems, arXiv e-prints, 2017.

C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui, Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01536017

N. Parikh and S. P. Boyd, Proximal Algorithms, Foundations and Trends in Optimization, vol.1, issue.3, pp.123-231, 2014.
DOI : 10.1561/2400000003

R. A. Poliquin and R. T. Rockafellar, Prox-regular functions in variational analysis, Transactions of the American Mathematical Society, vol.348, pp.1805-1838, 1996.

M. J. Powell, A new algorithm for unconstrained optimization. Nonlinear programming, pp.31-65, 1970.

M. J. Powell, On the Convergence of the Variable Metric Algorithm, IMA Journal of Applied Mathematics, vol.7, issue.1, pp.21-36, 1971.
DOI : 10.1093/imamat/7.1.21

M. J. Powell, How bad are the BFGS and DFP methods when the objective function is quadratic?, Mathematical Programming, vol.34, issue.1, pp.34-47, 1986.
DOI : 10.1007/BF01582161

H. Raguet, J. Fadili, and G. Peyré, A Generalized Forward-Backward Splitting, SIAM Journal on Imaging Sciences, vol.6, issue.3, pp.1199-1226, 2013.
DOI : 10.1137/120872802

URL : https://hal.archives-ouvertes.fr/hal-00613637

M. Razaviyayn, M. Hong, and Z. Luo, A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization, SIAM Journal on Optimization, vol.23, issue.2, pp.1126-1153, 2013.
DOI : 10.1137/120891009

S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, Stochastic variance reduction for nonconvex optimization, Proceedings of the International Conference on Machine Learning (ICML), 2016.

S. J. Reddi, S. Sra, B. Poczos, and A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, Advances in Neural Information Processing Systems (NIPS), 2016.

P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, pp.1-38, 2014.

R. T. Rockafellar, Monotone Operators and the Proximal Point Algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976.
DOI : 10.1137/0314056

R. T. Rockafellar, Favorable classes of Lipschitz-continuous functions in subgradient optimization, in Progress in Nondifferentiable Optimization, IIASA Collaborative Proc. Ser. CP-82, pp.125-143, 1982.

R. T. Rockafellar and R. J. Wets, Variational analysis, vol.317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer, 1998.
DOI : 10.1007/978-3-642-02431-3

S. Salzo and S. Villa, Inexact and accelerated proximal point algorithms, Journal of Convex Analysis, vol.19, issue.4, pp.1167-1192, 2012.

K. Scheinberg and X. Tang, Practical inexact proximal quasi-Newton method with global complexity analysis, Mathematical Programming, pp.495-529, 2016.

M. Schmidt, D. Kim, and S. Sra, Projected Newton-type methods in machine learning, in Optimization for Machine Learning, MIT Press, pp.305-330, 2011.

M. Schmidt, N. L. Roux, and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/inria-00618152

M. Schmidt, N. L. Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.160, issue.1, pp.83-112, 2017.
URL : https://hal.archives-ouvertes.fr/hal-00860051

D. Scieur, V. Roulet, F. Bach, and A. d'Aspremont, Integration methods and accelerated optimization algorithms, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474045

S. Shalev-Shwartz, SDCA without duality, regularization, and individual convexity, Proceedings of the International Conference on Machine Learning (ICML), 2016.

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, 2014.
DOI : 10.1017/CBO9781107298019

S. Shalev-Shwartz and T. Zhang, Proximal stochastic dual coordinate ascent, 2012.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013.

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Mathematical Programming, pp.105-145, 2016.

D. F. Shanno, Conditioning of quasi-Newton methods for function minimization, Mathematics of Computation, vol.24, issue.111, pp.647-656, 1970.
DOI : 10.1090/S0025-5718-1970-0274029-X

J. Sherman and W. J. Morrison, Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix, The Annals of Mathematical Statistics, vol.21, issue.1, pp.124-127, 1950.
DOI : 10.1214/aoms/1177729893

N. Z. Shor, Minimization methods for non-differentiable functions, 2012.
DOI : 10.1007/978-3-642-82118-9

M. V. Solodov and B. F. Svaiter, A unified framework for some inexact proximal point algorithms, Numerical Functional Analysis and Optimization, 2001.

L. Stella, A. Themelis, and P. Patrinos, Forward-backward quasi-Newton methods for nonsmooth optimization problems, Computational Optimization and Applications, vol.67, issue.3, pp.443-487, 2017.

A. B. Taylor, J. M. Hendrickx, and F. Glineur, Exact worst-case performance of first-order methods for composite convex optimization, SIAM Journal on Optimization, vol.27, issue.3, pp.1283-1313, 2017.

A. Themelis, L. Stella, and P. Patrinos, Forward-Backward Envelope for the Sum of Two Nonconvex Functions: Further Properties and Nonmonotone Linesearch Algorithms, SIAM Journal on Optimization, vol.28, issue.3, 2016.
DOI : 10.1137/16M1080240

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.

P. Tseng, On accelerated proximal gradient methods for convex-concave optimization, submitted to SIAM Journal on Optimization, 2008.

Y. Z. Tsypkin, Foundations of the theory of learning systems, 1973.

Y. Z. Tsypkin and Z. J. Nikolic, Adaptation and learning in automatic systems, 1971.

V. Vapnik, The nature of statistical learning theory, Springer Science & Business Media, 2013.

J. H. Wilkinson, The algebraic eigenvalue problem, 1965.

B. E. Woodworth and N. Srebro, Tight complexity bounds for optimizing composite objectives, Advances in Neural Information Processing Systems (NIPS), 2016.

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, vol.11, pp.2543-2596, 2010.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.
DOI : 10.1137/140961791

K. Yosida, Functional analysis, 1980.

J. Yu, S. V. Vishwanathan, S. Günter, and N. N. Schraudolph, A quasi-Newton approach to non-smooth convex optimization, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390309

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

Y. Zhang and L. Xiao, Stochastic primal-dual coordinate method for regularized empirical risk minimization, Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.