Bibliography

J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, Proceedings of the International Conference on Learning Theory (COLT), 2008.

A. Agarwal and L. Bottou, A lower bound for the optimization of finite sums, Proceedings of the Conference on Machine Learning (ICML), 2015.

A. Agarwal, P. L. Bartlett, P. Ravikumar, and M. J. Wainwright, Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, IEEE Transactions on Information Theory, vol.58, issue.5, pp.3235-3249, 2012.
DOI : 10.1109/TIT.2011.2182178

J. Alayrac, P. Bojanowski, N. Agrawal, I. Laptev, J. Sivic et al., Unsupervised Learning from Narrated Instruction Videos, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.495
URL : https://hal.archives-ouvertes.fr/hal-01171193

Z. Allen-Zhu and L. Orecchia, Linear coupling: An ultimate unification of gradient and mirror descent, Proceedings of the Innovations in Theoretical Computer Science (ITCS), 2017.

F. Alvarez, H. Attouch, J. Bolte, and P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics, J. Math. Pures Appl, vol.81, issue.8, pp.747-779, 2002.

D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, Living on the edge: phase transitions in convex programs with random data, Information and Inference, vol.3, issue.3, 2014.
DOI : 10.1093/imaiai/iau005

N. N. Anuchina, K. I. Babenko, S. K. Godunov, N. A. Dmitriev, L. V. Dmitrieva et al., Teoreticheskie osnovy i konstruirovanie chislennykh algoritmov zadach matematicheskoĭ fiziki, Nauka, 1979.

Y. Arjevani and O. Shamir, On the iteration complexity of oblivious first-order optimization algorithms, Proceedings of the Conference on Machine Learning (ICML), 2016.

L. Arnold, Random Dynamical Systems, 1998.

D. Arthur and S. Vassilvitskii, K-means++: The advantages of careful seeding, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.

J. E. Atkins, E. G. Boman, and B. Hendrickson, A Spectral Algorithm for Seriation and the Consecutive Ones Problem, SIAM Journal on Computing, vol.28, issue.1, pp.297-310, 1998.
DOI : 10.1137/S0097539795285771

H. Attouch and J. Peypouquet, The Rate of Convergence of Nesterov's Accelerated Forward-Backward Method is Actually Faster Than $1/k^2$, SIAM Journal on Optimization, vol.26, issue.3, pp.1824-1834, 2016.
DOI : 10.1137/15M1046095

H. Attouch, Z. Chbani, J. Peypouquet, and P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming, vol.23, issue.3, pp.1-53, 2016.
DOI : 10.1137/110844805
URL : https://hal.archives-ouvertes.fr/hal-01821929

H. Attouch, J. Peypouquet, and P. Redont, Fast convex optimization via inertial dynamics with Hessian driven damping, Journal of Differential Equations, vol.261, issue.10, pp.5734-5783, 2016.
DOI : 10.1016/j.jde.2016.08.020
URL : https://hal.archives-ouvertes.fr/hal-02072674

J. Audibert and O. Catoni, Robust linear least squares regression, The Annals of Statistics, vol.39, issue.5, pp.2766-2794, 2011.
DOI : 10.1214/11-AOS918SUPP
URL : https://hal.archives-ouvertes.fr/hal-00522534

M. Ayer, H. D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman, An Empirical Distribution Function for Sampling with Incomplete Information, The Annals of Mathematical Statistics, vol.26, issue.4, pp.641-647, 1955.
DOI : 10.1214/aoms/1177728423

K. S. Azoury and M. K. Warmuth, Relative loss bounds for on-line density estimation with the exponential family of distributions, Mach. Learn, vol.43, issue.3, 2001.

F. Bach, Self-concordant analysis for logistic regression, Electronic Journal of Statistics, vol.4, issue.0, pp.384-414, 2010.
DOI : 10.1214/09-EJS521
URL : https://hal.archives-ouvertes.fr/hal-00426227

F. Bach, Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, J. Mach. Learn. Res, vol.15, issue.1, pp.595-627, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00804431

F. Bach, Duality Between Subgradient and Conditional Gradient Methods, SIAM Journal on Optimization, vol.25, issue.1, pp.115-129, 2015.
DOI : 10.1137/130941961
URL : https://hal.archives-ouvertes.fr/hal-00757696

F. Bach and Z. Harchaoui, DIFFRAC: a discriminative and flexible framework for clustering, Advances in Neural Information Processing Systems (NIPS), 2007.

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Foundations and Trends® in Machine Learning, vol.4, issue.1, pp.1-106, 2012.
DOI : 10.1561/2200000015
URL : https://hal.archives-ouvertes.fr/hal-00613125

R. E. Barlow, D. J. Bartholomew, J. M. Bremner, and H. D. Brunk, Statistical Inference under Order Restrictions. The Theory and Application of Isotonic Regression, 1972.

H. H. Bauschke and J. M. Borwein, Legendre functions and the method of random Bregman projections, J. Convex Anal, vol.4, issue.1, pp.27-67, 1997.

H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics, 2011.
DOI : 10.1007/978-3-319-48311-5
URL : https://hal.archives-ouvertes.fr/hal-00643354

H. H. Bauschke, J. Bolte, and M. Teboulle, A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications, Mathematics of Operations Research, vol.42, issue.2, 2016.
DOI : 10.1287/moor.2016.0817
URL : http://publications.ut-capitole.fr/25852/1/25852.pdf

A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003.
DOI : 10.1016/S0167-6377(02)00231-6

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542
URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf

P. C. Bellec, Sharp oracle inequalities for least squares estimators in shape restricted regression, arXiv preprint, 2015.
DOI : 10.1214/17-aos1566
URL : http://arxiv.org/pdf/1510.08029

P. C. Bellec, Private communication, 2016.

P. C. Bellec and A. B. Tsybakov, Sharp oracle bounds for monotone and convex regression through aggregation, J. Mach. Learn. Res, vol.16, pp.1879-1892, 2015.

R. Bellman, A note on cluster analysis and dynamic programming, Mathematical Biosciences, vol.18, issue.3-4, 1973.
DOI : 10.1016/0025-5564(73)90007-2

A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization, MPS Series on Optimization, Society for Industrial and Applied Mathematics, 2001.
DOI : 10.1137/1.9780898718829
URL : http://iew3.technion.ac.il/Labs/Opt/opt/LN/Final.pdf

Y. Bengio, A. Courville, and P. Vincent, Representation Learning: A Review and New Perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1798-1828, 2013.
DOI : 10.1109/TPAMI.2013.50
URL : http://www.cs.princeton.edu/courses/archive/spring13/cos598C/Representation Learning - A Review and New Perspectives.pdf

A. Benveniste, P. Priouret, and M. Métivier, Adaptive Algorithms and Stochastic Approximations, 1990.
DOI : 10.1007/978-3-642-75894-2

Q. Berthet and P. Rigollet, Complexity theoretic lower bounds for sparse principal component detection, Proceedings of the International Conference on Learning Theory (COLT), 2013.

D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

P. J. Bickel and J. Fan, Some problems on the estimation of unimodal densities, Statist. Sinica, vol.6, issue.1, 1996.

L. Birgé, Estimation of unimodal densities without smoothness assumptions, The Annals of Statistics, vol.25, issue.3, pp.970-981, 1997.
DOI : 10.1214/aos/1069362733

M. S. Birman and M. Z. Solomjak, Piecewise-polynomial approximations of functions of the classes $W_{p}^{\alpha}$, Mathematics of the USSR-Sbornik, vol.2, issue.3, pp.331-355, 1967.
DOI : 10.1070/SM1967v002n03ABEH002343

G. Blanchard, M. Kawanabe, M. Sugiyama, V. Spokoiny, and K. Müller, In search of non-Gaussian components of a high-dimensional distribution, J. Mach. Learn. Res, vol.7, pp.247-282, 2006.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

J. Bolte and M. Teboulle, Smooth optimization with approximate gradient, SIAM J. Optim, vol.43, issue.3, pp.1266-1292, 2003.

V. S. Borkar, Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3

V. S. Borkar, Stochastic Approximation: a Dynamical Systems Viewpoint, 2008.

J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization, CMS Books in Mathematics, vol.3, 2000.

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems (NIPS), 2008.

L. Bottou and Y. Le Cun, On-line learning for very large data sets, Applied Stochastic Models in Business and Industry, vol.14, issue.2, pp.137-151, 2005.
DOI : 10.1007/978-3-642-75894-2

S. Boucheron and P. Massart, A high-dimensional Wilks phenomenon, Probab. Theory Related Fields, pp.405-433, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622983

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00751496

N. Boumal, B. Mishra, P. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for optimization on manifolds, J. Mach. Learn. Res, 2014.

J. Bourgain, V. H. Vu, and P. M. Wood, On the singularity probability of discrete random matrices, Journal of Functional Analysis, vol.258, issue.2, pp.559-603, 2010.
DOI : 10.1016/j.jfa.2009.04.016

V. Boyarshinov and M. Magdon-Ismail, Linear time isotonic and unimodal regression in the $L_1$ and $L_\infty$ norms, J. Discrete Algorithms, vol.4, issue.4, 2006.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, 1994.

L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, vol.7, issue.3, pp.620-631, 1967.
DOI : 10.1016/0041-5553(67)90040-7

R. Bro and N. Sidiropoulos, Least squares algorithms under unimodality and non-negativity constraints, Journal of Chemometrics, vol.12, issue.4, pp.223-247, 1998.
DOI : 10.1002/(SICI)1099-128X(199807/08)12:4<223::AID-CEM511>3.0.CO;2-2

S. Bubeck, Convex Optimization: Algorithms and Complexity, Foundations and Trends® in Machine Learning, vol.8, issue.3-4, pp.231-357, 2015.
DOI : 10.1561/2200000050

S. Bubeck, Y. Lee, and M. Singh, A geometric alternative to Nesterov's accelerated gradient descent, 2015.

G. Casella and R. L. Berger, Statistical Inference, Statistics/Probability Series, 1990.

A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées, Comp. Rend. Sci. Paris, vol.25, pp.536-538, 1847.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.

N. Cesa-Bianchi, A. Conconi, and C. Gentile, On the Generalization Ability of On-Line Learning Algorithms, IEEE Transactions on Information Theory, vol.50, issue.9, pp.2050-2057, 2004.
DOI : 10.1109/TIT.2004.833339

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, The Convex Geometry of Linear Inverse Problems, Foundations of Computational Mathematics, vol.1, issue.10, pp.805-849, 2012.
DOI : 10.1007/978-1-4613-8431-1

S. Chatterjee, A new perspective on least squares under convex constraint, The Annals of Statistics, vol.42, issue.6, pp.2340-2381, 2014.
DOI : 10.1214/14-AOS1254

S. Chatterjee, Matrix estimation by Universal Singular Value Thresholding, The Annals of Statistics, vol.43, issue.1, pp.177-214, 2015.
DOI : 10.1214/14-AOS1272

S. Chatterjee and J. Lafferty, Adaptive risk bounds in unimodal regression, arXiv preprint, 2015.

S. Chatterjee and S. Mukherjee, On estimation in tournaments and graphs under monotonicity constraints, arXiv preprint, 2016.

S. Chatterjee, A. Guntuboyina, and B. Sen, On risk bounds in isotonic and other shape restricted regression problems, The Annals of Statistics, vol.43, issue.4, pp.1774-1800, 2015.
DOI : 10.1214/15-AOS1324SUPP

S. Chatterjee, A. Guntuboyina, and B. Sen, On matrix estimation under monotonicity constraints, Bernoulli, vol.24, issue.2, 2017.
DOI : 10.3150/16-BEJ865

G. Chen and M. Teboulle, Convergence Analysis of a Proximal-Like Minimization Algorithm Using Bregman Functions, SIAM Journal on Optimization, vol.3, issue.3, pp.538-543, 1993.
DOI : 10.1137/0803026

K. L. Chung, On a Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.25, issue.3, pp.463-483, 1954.
DOI : 10.1214/aoms/1177728716

F. Clarke, Functional Analysis, Calculus of Variations and Optimal Control, Graduate Texts in Mathematics, vol.264, 2013.
DOI : 10.1007/978-1-4471-4820-3
URL : https://hal.archives-ouvertes.fr/hal-00865914

I. Colin, A. Bellet, J. Salmon, and S. Clémençon, Gossip dual averaging for decentralized optimization of pairwise functions, Proceedings of the Conference on Machine Learning (ICML), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01329315

O. Collier and A. S. Dalalyan, Minimax rates in permutation estimation for feature matching, J. Mach. Learn. Res, vol.17, issue.6, pp.1-32, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00874514

P. L. Combettes and J. Pesquet, Proximal Splitting Methods in Signal Processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.185-212, 2011.
DOI : 10.1007/978-1-4419-9569-8_10
URL : https://hal.archives-ouvertes.fr/hal-00643807

A. Cotter, O. Shamir, N. Srebro, and K. Sridharan, Better mini-batch algorithms via accelerated gradient methods, Advances in Neural Information Processing Systems (NIPS), 2011.

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.
DOI : 10.1109/TIT.1967.1053964
URL : http://ssg.mit.edu/cal/abs/2000_spring/np_dens/classification/cover67.pdf

T. M. Cover and J. A. Thomas, Elements of Information Theory, 2006.
DOI : 10.1002/047174882x

J. B. Crockett and H. Chernoff, Gradient methods of maximization, Pacific Journal of Mathematics, vol.5, issue.1, pp.33-50, 1955.
DOI : 10.2140/pjm.1955.5.33
URL : http://msp.org/pjm/1955/5-1/pjm-v5-n1-p03-s.pdf

H. B. Curry, The method of steepest descent for non-linear minimization problems, Quarterly of Applied Mathematics, vol.2, issue.3, pp.258-261, 1944.
DOI : 10.1090/qam/10667
URL : https://www.ams.org/qam/1944-02-03/S0033-569X-1944-10667-3/S0033-569X-1944-10667-3.pdf

D. Dai, P. Rigollet, L. Xia, and T. Zhang, Aggregation of affine estimators, Electronic Journal of Statistics, vol.8, issue.1, pp.302-327, 2014.
DOI : 10.1214/14-EJS886
URL : http://doi.org/10.1214/14-ejs886

A. S. Dalalyan, Theoretical guarantees for approximate sampling from a smooth and log-concave density, to appear in JRSS B, arXiv preprint arXiv:1412.7392, 2014.
DOI : 10.1111/rssb.12183
URL : http://arxiv.org/pdf/1412.7392

C. Daskalakis, I. Diakonikolas, and R. A. Servedio, Learning k-modal distributions via testing, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2012.
DOI : 10.1137/1.9781611973099.108
URL : https://epubs.siam.org/doi/pdf/10.1137/1.9781611973099.108

C. Daskalakis, I. Diakonikolas, R. A. Servedio, G. Valiant, and P. Valiant, Testing k-modal distributions: Optimal algorithms via reductions, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.
DOI : 10.1137/1.9781611973105.131
URL : https://epubs.siam.org/doi/pdf/10.1137/1.9781611973105.131

A. d'Aspremont, Smooth Optimization with Approximate Gradient, SIAM Journal on Optimization, vol.19, issue.3, pp.1171-1183, 2008.
DOI : 10.1137/060676386

D. Davidson and J. Marschak, Experimental tests of a stochastic decision theory, Measurement: Definitions and Theories, 1959.

T. De Bie and N. Cristianini, Convex methods for transduction, Advances in Neural Information Processing Systems (NIPS), 2003.

F. De la Torre and T. Kanade, Discriminative cluster analysis, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143875

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

A. Défossez and F. Bach, Averaged least-mean-squares: bias-variance trade-offs and optimal sampling distributions, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2015.

O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, Optimal distributed online prediction using mini-batches, J. Mach. Learn. Res, vol.13, pp.165-202, 2012.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods with inexact oracle: the strongly convex case, CORE Discussion Papers, 2013.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.110, issue.3, pp.37-75, 2014.
DOI : 10.1007/978-3-642-82118-9

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Applications of Mathematics, vol.31, 1996.
DOI : 10.1007/978-1-4612-0711-5

E. Diederichs, A. Juditsky, A. Nemirovski, and V. Spokoiny, Sparse non Gaussian component analysis by semidefinite programming, Machine Learning, vol.290, issue.2, pp.211-238, 2013.
DOI : 10.1007/978-1-4757-2545-2
URL : https://hal.archives-ouvertes.fr/hal-00978264

A. Dieuleveut and F. Bach, Nonparametric stochastic approximation with large step-sizes, The Annals of Statistics, vol.44, issue.4, pp.1363-1399, 2015.
DOI : 10.1214/15-AOS1391
URL : https://hal.archives-ouvertes.fr/hal-01053831

A. Dieuleveut, N. Flammarion, and F. Bach, Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression, arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01275431

C. Ding and T. Li, Adaptive dimension reduction using discriminant analysis and K-means clustering, Proceedings of the Conference on Machine Learning (ICML), p.265, 2007.
DOI : 10.1145/1273496.1273562
URL : http://www.cs.fiu.edu/~taoli/pub/Ding-Li-ICML2007.pdf

D. L. Donoho, Gelfand n-widths and the method of least squares, Statistics Technical Report, vol.282, 1990.

D. L. Donoho, A. Maleki, and A. Montanari, Message-passing algorithms for compressed sensing, Proceedings of the National Academy of Sciences, pp.18914-18919, 2009.
DOI : 10.1080/14786437708235992
URL : http://www.pnas.org/content/106/45/18914.full.pdf

R. D. Driver, Note on a paper of Halanay on stability for finite difference equations, Archive for Rational Mechanics and Analysis, vol.12, issue.3, pp.241-243, 1965.
DOI : 10.1007/BF00281223

J. Duchi and F. Ruan, Local asymptotics for some stochastic optimization problems: optimality, constraint identification, and dual averaging, 2016.

J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res, vol.10, pp.2899-2934, 2009.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari, Composite objective mirror descent, Proceedings of the International Conference on Learning Theory (COLT), 2010.

J. Duchi, A. Agarwal, and M. Wainwright, Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling, IEEE Transactions on Automatic Control, vol.57, issue.3, pp.592-606, 2012.
DOI : 10.1109/TAC.2011.2161027
URL : http://arxiv.org/pdf/1005.2012

M. Duflo, Random Iterative Models, 1997.
DOI : 10.1007/978-3-662-12880-0

A. Durmus and E. Moulines, Non-asymptotic convergence analysis for the unadjusted Langevin algorithm, Ann. Appl. Prob, 2017.
DOI : 10.1214/16-aap1238
URL : https://hal.archives-ouvertes.fr/hal-01176132

A. Durmus, U. Simsekli, E. Moulines, R. Badeau, and G. Richard, Stochastic gradient Richardson-Romberg Markov chain Monte Carlo, Advances in Neural Information Processing Systems (NIPS), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01354064

P. P. Eggermont and V. N. LaRiccia, Maximum likelihood estimation of smooth monotone and unimodal densities, Ann. Statist, vol.28, issue.3, 2000.

Y. M. Ermoliev, Methods of solution of nonlinear extremal problems, Cybernetics, vol.2, issue.4, pp.1-14, 1966.
DOI : 10.1007/BF01071403

Y. M. Ermoliev, The method of generalized stochastic gradients and stochastic quasi-Fejér sequences, Kibernetika, issue.2, pp.73-83, 1969.

V. Fabian, On Asymptotic Normality in Stochastic Approximation, The Annals of Mathematical Statistics, vol.39, issue.4, pp.1327-1332, 1968.
DOI : 10.1214/aoms/1177698258
URL : http://doi.org/10.1214/aoms/1177698258

P. C. Fishburn, Binary choice probabilities: on the varieties of stochastic transitivity, Journal of Mathematical Psychology, vol.10, issue.4, 1973.
DOI : 10.1016/0022-2496(73)90021-7

F. Fogel, R. Jenatton, F. Bach, and A. d'Aspremont, Convex Relaxations for Permutation Problems, Advances in Neural Information Processing Systems (NIPS), 2013.
DOI : 10.1137/130947362
URL : https://hal.archives-ouvertes.fr/hal-01239317

D. Freedman, Statistical Models: Theory and Practice, 2009.
DOI : 10.1017/CBO9780511815867

J. H. Friedman and W. Stuetzle, Projection Pursuit Regression, Journal of the American Statistical Association, vol.4, issue.376, pp.817-823, 1981.
DOI : 10.1080/03610927508827223

A. Frieze and M. Jerrum, Improved approximation algorithms for MAX k-CUT and MAX BISECTION, Integer Programming and Combinatorial Optimization, 1995.
DOI : 10.1007/3-540-59408-6_37
URL : http://karush.rutgers.edu/~alizadeh/Sdppage/Frieze/k_cut.ps

M. Frisen, Unimodal regression, Journal of the Royal Statistical Society. Series D, vol.35, issue.4, pp.479-485, 1986.

R. Frostig, R. Ge, S. M. Kakade, and A. Sidford, Un-regularizing: Approximate proximal point and faster stochastic algorithms for empirical risk minimization, Proceedings of the Conference on Machine Learning (ICML), 2015.

D. R. Fulkerson and O. A. Gross, Incidence matrices with the consecutive 1's property, Bulletin of the American Mathematical Society, vol.70, issue.5, pp.681-684, 1964.
DOI : 10.1090/S0002-9904-1964-11160-5
URL : http://www.ams.org/bull/1964-70-05/S0002-9904-1964-11160-5/S0002-9904-1964-11160-5.pdf

C. Gao, Y. Lu, and H. H. Zhou, Rate-optimal graphon estimation, The Annals of Statistics, vol.43, issue.6, pp.2624-2652, 2015.
DOI : 10.1214/15-AOS1354SUPP
URL : http://arxiv.org/pdf/1410.5837

M. R. Garey, D. S. Johnson, and L. Stockmeyer, Some simplified NP-complete graph problems, Theoretical Computer Science, vol.1, issue.3, pp.237-267, 1976.
DOI : 10.1016/0304-3975(76)90059-1
URL : https://doi.org/10.1016/0304-3975(76)90059-1

R. Ge, J. D. Lee, and T. Ma, Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems (NIPS), 2016.

Z. Geng and N. Z. Shi, Algorithm AS 257: Isotonic Regression for Umbrella Orderings, Applied Statistics, vol.39, issue.3, pp.397-402, 1990.
DOI : 10.2307/2347399

C. Gentile and N. Littlestone, The robustness of the p-norm algorithms, Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT '99, 1999.
DOI : 10.1145/307400.307405

T. L. Gertzen and M. Grötschel, Flinders Petrie, the travelling salesman problem, and the beginning of mathematical modeling in archaeology, Documenta Mathematica (Extra volume: Optimization stories), pp.199-210, 2012.

M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, vol.42, issue.6, pp.1115-1145, 1995.
DOI : 10.1145/227683.227684
URL : http://www.almaden.ibm.com/cs/people/dpw/Cut/maxcut.ps

A. A. Goldstein, Cauchy's method of minimization, Numerische Mathematik, vol.4, issue.3, pp.146-150, 1962.
DOI : 10.1007/BF01386306

G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins Studies in the Mathematical Sciences, 2013.

G. J. Gordon, Regret bounds for prediction problems, Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT '99, 1999.
DOI : 10.1145/307400.307410
URL : http://www.cs.cmu.edu/Groups/reinforcement/mosaic/talks-1999/99-09-27.paper.ps.gz

J. C. Gower and G. J. Ross, Minimum Spanning Trees and Single Linkage Cluster Analysis, Applied Statistics, vol.18, issue.1, 1969.
DOI : 10.2307/2346439

M. Grant and S. Boyd, Graph Implementations for Nonsmooth Convex Programs, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp.95-110, 2008.
DOI : 10.1007/978-1-84800-155-8_7
URL : http://www.stanford.edu/~boyd/papers/pdf/graph_dcp.pdf

M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming, 2014.

E. Grave, A convex relaxation for weakly supervised relation extraction, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1166
URL : https://hal.archives-ouvertes.fr/hal-01080310

C. Gu, Smoothing Spline ANOVA Models, 2013.

L. Györfi and H. Walk, On the Averaged Stochastic Approximation for Linear Regression, SIAM Journal on Control and Optimization, vol.34, issue.1, pp.31-61, 1996.
DOI : 10.1137/S0363012992226661

L. Györfi, M. Kohler, A. Krzyzak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, 2006.
DOI : 10.1007/b97848

M. Gürbüzbalaban, A. Ozdaglar, and P. Parrilo, Why random reshuffling beats stochastic gradient descent, arXiv preprint, 2015.

W. Hahn, Über die Anwendung der Methode von Ljapunov auf Differenzengleichungen, Mathematische Annalen, vol.63, issue.5, pp.430-441, 1958.
DOI : 10.1007/BF01347793

A. Halanay, Quelques questions de la théorie de la stabilité pour les systèmes aux différences finies, Archive for Rational Mechanics and Analysis, vol.64, issue.2, pp.150-154, 1963.
DOI : 10.1115/1.3662605

J. Hannan, Approximation to Bayes risk in repeated play, Contributions to the Theory of Games, pp.97-139, 1957.
DOI : 10.1515/9781400882151-006

O. Hanner, On the uniform convexity of $L^p$ and $l^p$, Arkiv för Matematik, vol.3, issue.3, pp.239-244, 1956.
DOI : 10.1007/BF02589410

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2009.

E. Hazan, The convex optimization approach to regret minimization, Optimization for Machine Learning, pp.287-303, 2012.

E. Hazan and S. Kale, Extracting certainty from uncertainty: regret bounded by variation in costs, Machine Learning, vol.56, issue.2, 2010.
DOI : 10.1007/s10994-010-5175-x
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-010-5175-x.pdf

E. Hazan and T. Ma, A non-generative framework and convex relaxations for unsupervised learning, Advances in Neural Information Processing Systems (NIPS), 2016.

E. Hazan, A. Agarwal, and S. Kale, Logarithmic regret algorithms for online convex optimization, Mach. Learn, vol.69, issue.2-3, 2007.
DOI : 10.1007/11776420_37
URL : http://www.cs.princeton.edu/~satyen/papers/HKKA2006.pdf

M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, vol.49, issue.6, pp.409-436, 1952.
DOI : 10.6028/jres.049.044
URL : http://doi.org/10.6028/jres.049.044

J. Hiriart-Urruty and C. Lemaréchal, Fundamentals of Convex Analysis, Grundlehren Text Editions, 2001.
DOI : 10.1007/978-3-642-56468-0

A. E. Hoerl, Application of ridge analysis to regression problems, Chemical Engineering Progress, vol.58, issue.3, pp.54-59, 1962.

A. E. Hoerl and R. W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, vol.24, issue.1, pp.55-67, 1970.
DOI : 10.2307/1909769

D. Hsu, S. M. Kakade, and T. Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electronic Communications in Probability, vol.17, issue.0, 2012.
DOI : 10.1214/ECP.v17-2079

D. Hsu, S. M. Kakade, and T. Zhang, Random Design Analysis of Ridge Regression, Foundations of Computational Mathematics, vol.17, issue.36, pp.569-600, 2014.
DOI : 10.1162/0899766054323008

C. Hu, W. Pan, and J. T. Kwok, Accelerated gradient methods for stochastic optimization and online learning, Advances in Neural Information Processing Systems (NIPS), 2009.

G. Huang, J. Zhang, S. Song, and Z. Chen, Maximin separation probability clustering, Proceedings of the AAAI Conference on Artificial Intelligence, 2015.

A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, 2004.

P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford, Parallelizing stochastic approximation through mini-batching and tail-averaging, arXiv preprint, 2016.

P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford, Accelerating stochastic gradient descent, 2017.

C. Jin, S. M. Kakade, and P. Netrapalli, Provable efficient online matrix completion via non-convex stochastic gradient descent, Advances in Neural Information Processing Systems (NIPS), 2016.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems (NIPS), p.269, 2013.

A. Joulin and F. Bach, A convex relaxation for weakly supervised classifiers, Proceedings of the Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00717450

A. Joulin, F. Bach, and J. Ponce, Discriminative clustering for image co-segmentation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539868
URL : http://www.di.ens.fr/%7Efbach/cosegmentation_cvpr2010.pdf

A. Joulin, J. Ponce, and F. Bach, Efficient optimization for discriminative latent class models, Advances in Neural Information Processing Systems (NIPS), 2010.

M. Journée, F. Bach, P. Absil, and R. Sepulchre, Low-Rank Optimization on the Cone of Positive Semidefinite Matrices, SIAM Journal on Optimization, vol.20, issue.5, 2010.
DOI : 10.1137/080731359

A. Juditsky and A. S. Nemirovski, Functional aggregation for nonparametric regression, Ann. Statist, vol.28, issue.3, pp.681-712, 2000.

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, issue.3, pp.291-307, 2005.
DOI : 10.1016/j.jcss.2004.10.016
URL : http://www-math.mit.edu/~vempala/papers/online.ps

A. T. Kalai, A. Moitra, and G. Valiant, Efficiently learning mixtures of two Gaussians, Proceedings of the 42nd ACM symposium on Theory of computing, STOC '10, 2010.
DOI : 10.1145/1806689.1806765
URL : http://people.csail.mit.edu/moitra/docs/2g-full.pdf

R. E. Kalman, Lyapunov functions for the problem of Lur'e in automatic control, Proc. Nat. Acad. Sci. U.S.A, pp.201-205, 1963.
DOI : 10.1073/pnas.49.2.201

R. E. Kalman and J. E. Bertram, Control System Analysis and Design Via the "Second Method" of Lyapunov: II - Discrete-Time Systems, Journal of Basic Engineering, vol.82, issue.2, pp.394-400, 1960.
DOI : 10.1115/1.3662605

L. V. Kantorovitch, On an effective method of solving extremal problems for quadratic functionals, Dokl. Akad. Nauk SSSR, vol.48, pp.455-460, 1945.

S. B. Karmakar, An algorithm for finding a circuit of even length in a directed graph, International Journal of Systems Science, vol.10, issue.11, pp.1197-1201, 1984.
DOI : 10.1137/0210062

R. M. Karp, Reducibility among combinatorial problems, Complexity of Computer Computations, pp.85-103, 1972.
DOI : 10.1007/978-3-540-68279-0_8

D. G. Kendall, A statistical approach to Flinders Petrie's sequence-dating, Bull. Inst. Internat. Statist, vol.40, pp.657-681, 1963.

D. G. Kendall, Incidence matrices, interval graphs and seriation in archeology, Pacific Journal of Mathematics, vol.28, issue.3, pp.565-570, 1969.
DOI : 10.2140/pjm.1969.28.565
URL : http://msp.org/pjm/1969/28-3/pjm-v28-n3-p08-s.pdf

D. G. Kendall, A Mathematical Approach to Seriation, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.269, issue.1193, pp.125-134, 1970.
DOI : 10.1098/rsta.1970.0091

D. G. Kendall, Abundance matrices and seriation in archaeology, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol.1, issue.2, pp.104-112, 1971.
DOI : 10.1007/BF00538862

L. G. Khachiyan, Polynomial algorithms in linear programming, USSR Computational Mathematics and Mathematical Physics, vol.20, issue.1, pp.1093-1096, 1979.
DOI : 10.1016/0041-5553(80)90061-0

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Proceedings of the International Conference on Learning Representations (ICLR), 2015.

J. Kivinen and M. K. Warmuth, Exponentiated Gradient versus Gradient Descent for Linear Predictors, Information and Computation, vol.132, issue.1, pp.1-63, 1997.
DOI : 10.1006/inco.1996.2612
URL : https://doi.org/10.1006/inco.1996.2612

K. C. Kiwiel, Proximal Minimization Methods with Generalized Bregman Functions, SIAM Journal on Control and Optimization, vol.35, issue.4, pp.1142-1168, 1997.
DOI : 10.1137/S0363012995281742

A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The Sample Average Approximation Method for Stochastic Discrete Optimization, SIAM Journal on Optimization, vol.12, issue.2, 2002.
DOI : 10.1137/S1052623499363220

C. Köllmann, B. Bornkamp, and K. Ickstadt, Unimodal regression using Bernstein-Schoenberg splines and penalties, Biometrics, vol.67, issue.4, 2014.
DOI : 10.1111/j.1541-0420.2011.01620.x