Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Good Practice in Large-Scale Learning for Image Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, issue.3, 2014.
DOI : 10.1109/TPAMI.2013.146

URL : https://hal.archives-ouvertes.fr/hal-00835810

Y. Amit, M. Fink, N. Srebro, and S. Ullman, Uncovering shared structures in multiclass classification, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273499

F. Bach, Consistency of trace norm minimization, The Journal of Machine Learning Research, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00179522

F. Bach, S. Lacoste-Julien, and G. Obozinski, On the equivalence between herding and conditional gradient algorithms, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00681128

F. R. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with Sparsity-Inducing Penalties, Foundations and Trends in Machine Learning, pp.1-106, 2012.
DOI : 10.1561/2200000015

URL : https://hal.archives-ouvertes.fr/hal-00613125

P. Bartlett and M. Wegkamp, Classification with a reject option using a hinge loss, The Journal of Machine Learning Research, 2008.

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, 2009.
DOI : 10.1137/080716542

A. Beck and M. Teboulle, Smoothing and First Order Methods: A Unified Framework, SIAM Journal on Optimization, vol.22, issue.2, 2012.
DOI : 10.1137/100818327

S. Becker, J. Bobin, and E. J. Candès, NESTA: A Fast and Accurate First-Order Method for Sparse Recovery, SIAM Journal on Imaging Sciences, vol.4, issue.1, 2011.
DOI : 10.1137/090756855

URL : https://authors.library.caltech.edu/23706/1/Becker2011p13865Siam_J_Imaging_Sci.pdf

A. Ben-Tal and M. Teboulle, A smoothing technique for nondifferentiable optimization problems, Optimization, pp.1-11, 1989.
DOI : 10.1007/BFb0083582

D. Bertsekas, Stochastic optimization problems with nondifferentiable cost functionals, Journal of Optimization Theory and Applications, vol.12, issue.2, pp.218-231, 1973.
DOI : 10.1016/0022-247X(65)90049-1

D. Bertsekas, Nonlinear Programming, 2004.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris, Sparse and stable Markowitz portfolios, Proceedings of the National Academy of Sciences, pp.12267-12272, 2009.
DOI : 10.1214/009053604000000067

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718382/pdf

R. Bruck, On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space, J. Math. Anal. Appl., 1977.

S. Bubeck, Theory of convex optimization for machine learning, 2014.

R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A Limited Memory Algorithm for Bound Constrained Optimization, SIAM Journal on Scientific Computing, vol.16, issue.5, pp.1190-1208, 1995.
DOI : 10.1137/0916069

URL : http://www.ece.northwestern.edu/~nocedal/PSfiles/limited.ps.gz

E. J. Candès and Y. Plan, Matrix Completion With Noise, Proceedings of the IEEE, 2010.
DOI : 10.1109/JPROC.2009.2035722

E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 2009.

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, The Convex Geometry of Linear Inverse Problems, Foundations of Computational Mathematics, vol.1, issue.10, pp.805-849, 2012.
DOI : 10.1007/978-1-4613-8431-1

P. Combettes and J. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.185-212, 2011.

P. Combettes and V. Wajs, Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Modeling & Simulation, vol.4, issue.4, pp.1168-1200, 2005.
DOI : 10.1137/050626090

URL : https://hal.archives-ouvertes.fr/hal-00017649

B. Cox, A. Juditsky, and A. Nemirovski, Dual subgradient algorithms for large-scale nonsmooth learning problems, Mathematical Programming, vol.120, issue.1, 2014.
DOI : 10.1017/S096249291300007X

URL : https://hal.archives-ouvertes.fr/hal-00978358

J. Cullum, R. Willoughby, and M. Lake, A Lanczos Algorithm for Computing Singular Values and Vectors of Large Matrices, SIAM Journal on Scientific and Statistical Computing, vol.4, issue.2, 1983.
DOI : 10.1137/0904015

V. Demyanov and A. Rubinov, Approximate methods in optimization problems, 1970.

J. Deng, A. C. Berg, K. Li, and L. Fei-Fei, What does classifying more than 10,000 image categories tell us?, Computer Vision - ECCV 2010, 2010.
DOI : 10.1007/978-3-642-15555-0_6

URL : http://www.cs.princeton.edu/%7Ejiadeng/DengBergLiFeiFei_ECCV2010.pdf

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR09, 2009.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.110, issue.3, 2014.
DOI : 10.1007/978-3-642-82118-9

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, pp.272-279, 2008.

J. C. Duchi, P. Bartlett, and M. J. Wainwright, Randomized Smoothing for Stochastic Optimization, SIAM Journal on Optimization, vol.22, issue.2, 2012.
DOI : 10.1137/110831659

M. Dudik, Z. Harchaoui, and J. Malick, Lifted coordinate descent for learning with trace-norm regularization, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756802

M. D. Ekstrand, J. T. Riedl, and J. A. Konstan, Collaborative Filtering Recommender Systems, Foundations and Trends in Human-Computer Interaction, vol.4, issue.2, pp.81-173, 2011.
DOI : 10.1561/1100000009

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, 1956.
DOI : 10.2140/pjm.1955.5.183

D. Garber and E. Hazan, A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization, arXiv e-prints, arXiv:1301.4666, 2013.
DOI : 10.1137/140985366

D. Garber and E. Hazan, Faster rates for the Frank-Wolfe method over strongly-convex sets, Proceedings of the 32nd International Conference on Machine Learning, 2015.

Y. Grandvalet, A. Rakotomamonjy, J. Keshet, and S. Canu, Support vector machines with a reject option, Advances in neural information processing systems, pp.537-544, 2009.

C. Guzman and A. Nemirovski, On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization, arXiv e-prints, arXiv:1307.5001, 2013.

Z. Harchaoui, M. Douze, M. Paulin, M. Dudik, and J. Malick, Large-scale image classification with trace-norm regularization, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248078

URL : https://hal.archives-ouvertes.fr/hal-00728388

Z. Harchaoui, A. Juditsky, and A. Nemirovski, Conditional gradient algorithms for machine learning, NIPS Workshop on Optimization for ML, 2012.

Z. Harchaoui, A. Juditsky, and A. Nemirovski, Conditional gradient algorithms for norm-regularized smooth convex optimization, Mathematical Programming, Series A, pp.1-30, 2014.
DOI : 10.1007/s10107-014-0778-9

URL : https://hal.archives-ouvertes.fr/hal-00978368

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2008.

E. Hazan, Sparse Approximate Solutions to Semidefinite Programs, LATIN 2008: Theoretical Informatics, pp.306-316, 2008.
DOI : 10.1007/978-3-540-78773-0_27

E. Hazan and S. Kale, Projection-free online learning, ICML, 2012.

J. Hiriart-Urruty and C. Lemarechal, Fundamentals of Convex Analysis, 1993.
DOI : 10.1007/978-3-642-56468-0

J. Hiriart-Urruty and C. Lemarechal, Convex Analysis and Minimization Algorithms II (Grundlehren der mathematischen Wissenschaften), 1996.

P. Huber, Robust statistics, 1981.
DOI : 10.1002/0471725250

H. G. Hummel, B. Van den Berg, A. J. Berlanga, H. Drachsler, J. Janssen et al., Combining social-based and information-based approaches for personalised recommendation on sequencing learning activities, International Journal of Learning Technology, vol.3, issue.2, 2007.
DOI : 10.1504/IJLT.2007.014842

L. Jacob, G. Obozinski, and J. Vert, Group lasso with overlap and graph lasso, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.433-440, 2009.
DOI : 10.1145/1553374.1553431

URL : http://www.cs.mcgill.ca/~icml2009/papers/471.pdf

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML, pp.427-435, 2013.

M. Jaggi and M. Sulovský, A Simple Algorithm for Nuclear Norm Regularized Problems, ICML 2010: Proceedings of the 27th international conference on Machine learning, 2010.

A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, Optimization for Machine Learning, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00981863

G. Kim, E. P. Xing, L. Fei-Fei, and T. Kanade, Distributed cosegmentation via submodular optimization on anisotropic diffusion, 2011 International Conference on Computer Vision, pp.169-176, 2011.

J. Kuczynski and H. Wozniakowski, Estimating the Largest Eigenvalue by the Power and Lanczos Algorithms with a Random Start, SIAM Journal on Matrix Analysis and Applications, vol.13, issue.4, 1992.
DOI : 10.1137/0613066

S. Lacoste-Julien and M. Jaggi, On the global linear convergence of Frank-Wolfe optimization variants, Advances in Neural Information Processing Systems, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248675

S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-Coordinate Frank-Wolfe Optimization for Structural SVMs, ICML 2013, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00720158

S. Lacoste-Julien, F. Lindsten, and F. R. Bach, Sequential kernel herding: Frank-Wolfe optimization for particle filtering, AISTATS, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01099197

J. Lafond, O. Klopp, E. Moulines, and J. Salmon, Probabilistic low-rank matrix completion on finite alphabets, Advances in Neural Information Processing Systems 27, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01081805

G. Lan, The Complexity of Large-scale Convex Programming under a Linear Optimization Oracle, arXiv e-prints, 2013.

C. Lanczos, Linear differential operators, 1961.

A. Lewis, Nonsmooth analysis of eigenvalues, Mathematical Programming, pp.1-24, 1999.
DOI : 10.1007/s10107980004a

J. Mairal, Optimization with first-order surrogate functions, Proceedings of The 30th International Conference on Machine Learning, pp.783-791, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00822229

S. Mallat, A wavelet tour of signal processing: the sparse way, Academic Press, 2009.

H. Markowitz, Portfolio selection, The Journal of Finance, 1952.

J. Mattingley and S. Boyd, Automatic code generation for real-time convex optimization, Convex Optimization in Signal Processing and Communications, 2009.
DOI : 10.1017/cbo9780511804458.002

J. Mattingley and S. Boyd, Real-Time Convex Optimization in Signal Processing, IEEE Signal Processing Magazine, vol.27, issue.3, 2010.
DOI : 10.1109/MSP.2010.936020

URL : http://www.stanford.edu/~boyd/papers/pdf/sig_proc_mag.pdf

B. Miller, I. Albert, S. K. Lam, J. Konstan, and J. Riedl, MovieLens Unplugged: Experiences with a Recommender System on Four Mobile Devices, ACM SIGCHI Conference on Human Factors in Computing Systems, 2003.
DOI : 10.1007/978-1-4471-3754-2_16

J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société mathématique de France, vol.79, pp.273-299, 1965.
DOI : 10.24033/bsmf.1625

A. Nemirovski and D. Yudin, Information-based complexity of mathematical programming, 1983.

Izvestia AN SSSR, Ser. Tekhnicheskaya Kibernetika (the journal is translated into English as Engineering Cybernetics, Soviet J. Computer & Systems Sci.), p.1

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, pp.372-376, 1983.

Y. Nesterov, Introductory lectures on convex optimization: A basic course, 2004.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, 2005.
DOI : 10.1007/s10107-004-0552-5

URL : http://dial.uclouvain.be/downloader/downloader.php?pid=boreal:4894&datastream=PDF_01&disclaimer=ae9d96a001442217ab652bd3df962abdbef8b43ca2512b82abc499488659738c

Y. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Paper, vol.76, 2007.
DOI : 10.1007/s10107-012-0629-5

Y. Nesterov, Smoothing Technique and its Applications in Semidefinite Optimization, Mathematical Programming, vol.16, issue.1, pp.245-259, 2007.
DOI : 10.1007/978-1-4419-8853-9

URL : http://dial.uclouvain.be/downloader/downloader.php?pid=boreal:4801&datastream=PDF_01&disclaimer=62e9ccb2bf13ebab29489ad5220c97fbc2cd7924b4ce607c36eb3ab804140423

Y. Nesterov, Gradient methods for minimizing composite functions, Mathematical Programming, vol.51, issue.1, 2013.
DOI : 10.1109/TIT.2005.864420

F. Orabona, A. Argyriou, and N. Srebro, Prisma: Proximal iterative smoothing algorithm, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00855993

G. Passty, Ergodic convergence to a zero of the sum of monotone operators in Hilbert space, J. Math. Anal. Appl., 1979.

F. Pierucci, Z. Harchaoui, and J. Malick, A smoothing approach for composite conditional gradient with nonsmooth loss, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01096630

R. T. Rockafellar, Convex Analysis (Princeton Mathematical Series), 1970.

G. Rogez, M. Khademi, J. S. Supančič III, J. Montiel, and D. Ramanan, 3D Hand Pose Detection in Egocentric RGB-D Images, Computer Vision - ECCV 2014 Workshops, 2014.
DOI : 10.1007/978-3-319-16178-5_25

URL : http://arxiv.org/pdf/1412.0065

C. Rother, T. Minka, A. Blake, and V. Kolmogorov, Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.91

M. Schmidt, N. L. Roux, and F. R. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, Advances in neural information processing systems, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00618152

S. Shalev-Shwartz, A. Gonen, and O. Shamir, Large-Scale Convex Minimization with a Low-Rank Constraint, ICML, 2011.

N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, Advances in neural information processing systems. NIPS, 2004.

V. Temlyakov, Greedy approximation in convex optimization, Constructive Approximation, 2012.
DOI : 10.1007/s00365-014-9272-0

R. Tomioka and T. Suzuki, Convex tensor decomposition via structured schatten norm regularization, Advances in neural information processing systems, 2013.

B. Turlach, W. Venables, and S. Wright, Simultaneous Variable Selection, Technometrics, vol.47, issue.3, 2005.
DOI : 10.1198/004017005000000139

N. Usunier, D. Buffoni, and P. Gallinari, Ranking with ordered weighted pairwise classification, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553509

URL : https://hal.archives-ouvertes.fr/hal-01297974

V. Vapnik, The nature of statistical learning theory, 2005.

V. N. Vapnik and A. J. Chervonenkis, Theory of pattern recognition, 1974.

S. Vicente, C. Rother, and V. Kolmogorov, Object cosegmentation, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995530

M. Weimer, A. Karatzoglou, Q. V. Le, and A. J. Smola, CofiRank - maximum margin matrix factorization for collaborative ranking, NIPS, 2007.

R. Yager, On ordered weighted averaging aggregation operators in multicriteria decisionmaking, IEEE Transactions on Systems, Man and Cybernetics, 1988.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.58, issue.1, pp.49-67, 2006.
DOI : 10.1198/016214502753479356

X. Zhang, Y. Yu, and D. Schuurmans, Accelerated training for matrix-norm regularization: A boosting approach, NIPS, 2012.

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, pp.3468-3497, 2009.
DOI : 10.1214/07-aos584

URL : http://doi.org/10.1214/07-aos584