M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 2016.

D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985.

A. D. Aleksandrov, Almost everywhere existence of the second differential of a convex function and some properties of convex functions, Leningrad Univ. Ann, vol.37, pp.3-35, 1939.

A. Argyriou, T. Evgeniou, and M. Pontil, Convex multitask feature learning, Machine Learning, vol.73, pp.243-272, 2008.

F. R. Bach, Consistency of trace norm minimization, Journal of Machine Learning Research, vol.9, pp.1019-1048, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00179522

A. S. Bandeira, N. Boumal, and V. Voroninski, On the low-rank approach for semidefinite programs arising in synchronization and community detection, 2016.

A. Beck and S. Shtern, Linearly convergent away-step conditional gradient for non-strongly convex functions, Mathematical Programming, pp.1-27, 2015.

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM journal on imaging sciences, vol.2, issue.1, pp.183-202, 2009.

A. Bellet, Y. Liang, A. Bagheri Garakani, M. Balcan, and F. Sha, Distributed Frank-Wolfe algorithm: A unified framework for communication-efficient sparse learning, CoRR, abs/1404.2644, 2014.

S. Bhojanapalli, B. Neyshabur, and N. Srebro, Global optimality of local search for low rank matrix recovery, Advances in Neural Information Processing Systems, pp.3873-3881, 2016.

C. M. Bishop, Pattern recognition. Machine Learning, vol.128, pp.1-58, 2006.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

H. Busemann and W. Feller, Krümmungseigenschaften konvexer Flächen. Acta Mathematica, vol.66, issue.1, pp.1-47, 1936.

R. Cabral, F. De la Torre, J. P. Costeira, and A. Bernardino, Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition, 2013 IEEE International Conference on Computer Vision, pp.2488-2495, 2013.

R. S. Cabral, F. De la Torre, J. P. Costeira, and A. Bernardino, Matrix Completion for Multi-label Image Classification, Advances in Neural Information Processing Systems, vol.24, pp.190-198, 2011.

J. Cai, E. J. Candès, and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, vol.20, issue.4, pp.1956-1982, 2010.

E. Candès, Y. Eldar, T. Strohmer, and V. Voroninski, Phase Retrieval via Matrix Completion, SIAM Journal on Imaging Sciences, vol.6, issue.1, pp.199-225, 2013.

E. Candès and B. Recht, Exact matrix completion via convex optimization, Communications of the ACM, vol.55, issue.6, pp.111-119, 2012.

E. J. Candès and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, vol.98, issue.6, pp.925-936, 2010.

E. J. Candès and T. Tao, The power of convex relaxation: Near-optimal matrix completion, IEEE Transactions on Information Theory, vol.56, issue.5, pp.2053-2080, 2010.

R. Caruana, Multitask learning, Learning to learn, pp.95-133, 1998.

V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, The convex algebraic geometry of linear inverse problems, Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pp.699-703, 2010.

Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, Coherent Matrix Completion, PMLR, pp.674-682, 2014.

F. Chollet, Keras, 2015.

K. L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Transactions on Algorithms (TALG), vol.6, issue.4, p.63, 2010.

S. D. Ahipasaoglu, P. Sun, and M. J. Todd, Linear convergence of a modified Frank-Wolfe algorithm for computing minimum-volume enclosing ellipsoids, Optimization Methods and Software, vol.23, issue.1, pp.5-19, 2008.

J. N. Darroch and D. Ratcliff, Generalized iterative scaling for log-linear models, The Annals of Mathematical Statistics, pp.1470-1480, 1972.

A. d'Aspremont, Smooth optimization with approximate gradient, SIAM Journal on Optimization, vol.19, issue.3, pp.1171-1183, 2008.

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

V. F. Demianov and A. Rubinov, Approximate methods in optimization problems, vol.32, 1970.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, Computer Vision and Pattern Recognition, pp.248-255, 2009.

P. Diggle, Analysis of longitudinal data, 2002.

F. Dinuzzo and K. Fukumizu, Learning low-rank output kernels, PMLR, pp.181-196, 2011.

M. Dudik, Z. Harchaoui, and J. Malick, Lifted coordinate descent for learning with trace-norm regularization, Artificial Intelligence and Statistics, pp.327-336, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756802

J. C. Dunn, Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals, SIAM Journal on Control and Optimization, vol.17, issue.2, pp.187-211, 1979.

T. Evgeniou and M. Pontil, Regularized multi-task learning, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.109-117, 2004.

M. Fazel, H. Hindi, and S. Boyd, Rank minimization and applications in system theory, Proceedings of the 2004 American Control Conference, vol.4, pp.3273-3278, 2004.

M. Fazel, H. Hindi, and S. P. Boyd, A rank minimization heuristic with application to minimum order system approximation, American Control Conference, vol.6, pp.4734-4739, 2001.

H. Federer, Geometric measure theory, 2014.

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, pp.95-110, 1956.

R. M. Freund and P. Grigas, New analysis and results for the Frank-Wolfe method, Mathematical Programming, vol.155, issue.1-2, pp.199-230, 2016.

R. M. Freund, P. Grigas, and R. Mazumder, An Extended Frank-Wolfe Method with "In-Face" Directions, and Its Application to Low-Rank Matrix Completion, SIAM Journal on Optimization, vol.27, issue.1, pp.319-346, 2017.

M. Fukushima, A modified Frank-Wolfe algorithm for solving the traffic assignment problem, Transportation Research Part B: Methodological, vol.18, issue.2, pp.169-177, 1984.

D. Garber, Faster Projection-free Convex Optimization over the Spectrahedron, Advances in Neural Information Processing Systems, pp.874-882, 2016.

D. Garber, Projection-free Algorithms for Convex Optimization and Online Learning, 2016.

D. Garber and E. Hazan, A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization, 2013.

D. Garber and E. Hazan, Playing non-linear games with linear oracles, Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pp.420-428, 2013.

D. Garber and E. Hazan, Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets, PMLR, pp.541-549, 2015.

D. Garber, E. Hazan, and T. Ma, Online Learning of Eigenvectors, PMLR, pp.560-568, 2015.

R. F. Gariepy and W. P. Ziemer, Modern Real Analysis, 1995.

R. Ge, F. Huang, C. Jin, and Y. Yuan, Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015.

R. Ge, J. D. Lee, and T. Ma, Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, pp.2973-2981, 2016.

G. Gidel, T. Jebara, and S. Lacoste-Julien, 2016.

J. Giesen, M. Jaggi, and S. Laue, Optimizing over the growing spectrahedron, Algorithms-ESA, pp.503-514, 2012.

M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM (JACM), vol.42, issue.6, pp.1115-1145, 1995.

A. Goldberg, B. Recht, J. Xu, R. Nowak, and X. Zhu, Transduction with Matrix Completion: Three Birds with One Stone, Advances in Neural Information Processing Systems 23, pp.757-765, 2010.

D. Goldfarb, G. Iyengar, and C. Zhou, Linear Convergence of Stochastic Frank Wolfe Variants, 2017.

E. Grave, G. R. Obozinski, and F. R. Bach, Trace lasso: a trace norm regularization for correlated designs, Advances in Neural Information Processing Systems, pp.2187-2195, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00620197

D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Transactions on Information Theory, vol.57, issue.3, pp.1548-1566, 2011.

D. Gross, Y. Liu, S. T. Flammia, S. Becker, and J. Eisert, Quantum state tomography via compressed sensing, Physical review letters, vol.105, issue.15, p.150401, 2010.

J. Guélat and P. Marcotte, Some comments on Wolfe's 'away step'. Mathematical Programming, vol.35, pp.110-119, 1986.

Z. Harchaoui, M. Douze, M. Paulin, M. Dudik, and J. Malick, Large-scale image classification with trace-norm regularization, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3386-3393, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00728388

E. Hazan, Sparse approximate solutions to semidefinite programs, Latin American Symposium on Theoretical Informatics, pp.306-316, 2008.

E. Hazan and S. Kale, Projection-free online learning, 2012.

E. Hazan and H. Luo, Variance-reduced and projection-free stochastic optimization, International Conference on Machine Learning, pp.1263-1271, 2016.

E. Hazan, Introduction to online convex optimization, Foundations and Trends in Optimization, vol.2, pp.157-325, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.

J. Heinonen, Lectures on Lipschitz analysis, 2005.

N. J. Higham, Matrix nearness problems and applications, 1988.

Z. Huo, F. Nie, and H. Huang, Robust and Effective Metric Learning Using Capped Trace Norm: Metric Learning via Capped Trace Norm, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp.1605-1614, 2016.

M. Jaggi, Sparse convex optimization methods for machine learning, 2011.

M. Jaggi, Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, ICML 2013 -Proceedings of the 30th International Conference on Machine Learning, 2013.

M. Jaggi and M. Sulovský, A simple algorithm for nuclear norm regularized problems, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.471-478, 2010.

H. Ji, C. Liu, Z. Shen, and Y. Xu, Robust video denoising using low rank matrix completion, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1791-1798, 2010.

S. Ji and J. Ye, An accelerated gradient method for trace norm minimization, Proceedings of the 26th annual international conference on machine learning, pp.457-464, 2009.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in neural information processing systems, pp.315-323, 2013.

B. Kanagal and V. Sindhwani, Rank selection in low-rank matrix approximations: A study of cross-validation for NMFs, Proc Conf Adv Neural Inf Process, vol.1, pp.10-15, 2010.

K. Koh, S. Kim, and S. Boyd, An interior-point method for large-scale l1-regularized logistic regression, Journal of Machine learning research, vol.8, pp.1519-1555, 2007.

V. Koltchinskii, K. Lounici, and A. B. Tsybakov, Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, pp.2302-2329, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00676868

Y. Koren, R. Bell, and C. Volinsky, Matrix Factorization Techniques for Recommender Systems, Computer, vol.42, issue.8, pp.30-37, 2009.

J. Kuczyński and H. Woźniakowski, Estimating the largest eigenvalue by the power and Lanczos algorithms with a random start, SIAM Journal on Matrix Analysis and Applications, vol.13, issue.4, pp.1094-1122, 1992.

P. Kumar and E. Yıldırım, A linearly convergent linear-time first-order algorithm for support vector classification with a core set result, INFORMS Journal on Computing, vol.23, issue.3, pp.377-391, 2011.

A. Kyrola, G. E. Blelloch, and C. Guestrin, GraphChi: Large-scale graph computation on just a PC, 2012.

S. Lacoste-Julien and M. Jaggi, On the global linear convergence of Frank-Wolfe optimization variants, Advances in Neural Information Processing Systems, pp.496-504, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01248675

S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher, Block-coordinate Frank-Wolfe optimization for structural SVMs, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00720158

J. Lafond, H. Wai, and E. Moulines, Convergence analysis of a stochastic projection-free algorithm, stat, vol.1050, issue.5, 2015.

J. L. Lagrange, Mécanique analytique, 1811.

N. M. Laird and J. H. Ware, Random-effects models for longitudinal data, Biometrics, pp.963-974, 1982.

G. Lan and Y. Zhou, Conditional gradient sliding for convex optimization, SIAM Journal on Optimization, vol.26, issue.2, pp.1379-1409, 2016.

C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, 1950.
URL : https://hal.archives-ouvertes.fr/hal-01712947

N. D. Lawrence and J. C. Platt, Learning to learn with the informative vector machine, Proceedings of the twenty-first international conference on Machine learning, p.65, 2004.

H. Lebesgue, Sur l'intégration des fonctions discontinues, 1910.

L. J. Leblanc, R. V. Helgason, and D. E. Boyce, Improved efficiency of the Frank-Wolfe algorithm for convex network programs, Transportation Science, vol.19, issue.4, pp.445-462, 1985.

J. D. Lee, B. Recht, N. Srebro, J. Tropp, and R. R. Salakhutdinov, Practical large-scale optimization for max-norm regularization, Advances in Neural Information Processing Systems, pp.1297-1305, 2010.

E. S. Levitin and B. T. Polyak, Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics, vol.6, issue.5, pp.1-50, 1966.

M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed et al., Scaling Distributed Machine Learning with the Parameter Server, OSDI, vol.1, 2014.

K. Liang and S. L. Zeger, Longitudinal data analysis using generalized linear models, Biometrika, pp.13-22, 1986.

W. Liu, C. Mu, R. Ji, S. Ma, J. R. Smith et al., Low-rank Similarity Metric Learning in High Dimensions, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pp.2792-2799, 2015.

Z. Liu and I. Tsang, Approximate Conditional Gradient Descent on Multi-Class Classification, AAAI, pp.2301-2307, 2017.

F. Locatello, R. Khanna, M. Tschannen, and M. Jaggi, A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe, PMLR, pp.860-868, 2017.

Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola et al., Distributed GraphLab: a framework for machine learning and data mining in the cloud, Proceedings of the VLDB Endowment, vol.5, pp.716-727, 2012.

S. Ma, D. Goldfarb, and L. Chen, Fixed Point and Bregman Iterative Methods for Matrix Rank Minimization, 2009.

M. Mahdavi, L. Zhang, and R. Jin, Mixed optimization for smooth functions, Advances in Neural Information Processing Systems, pp.674-682, 2013.

J. Mairal, Optimization with first-order surrogate functions, Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp.783-791, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00822229

G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn et al., Pregel: a system for large-scale graph processing, Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp.135-146, 2010.

R. Malouf, A comparison of algorithms for maximum entropy parameter estimation, Proceedings of the 6th Conference on Natural Language Learning, vol.20, pp.1-7, 2002.

O. L. Mangasarian and B. Recht, Probability of unique integer solution to a system of linear equations, European Journal of Operational Research, vol.214, issue.1, pp.27-30, 2011.

G. S. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker, Efficient large-scale distributed training of conditional maximum entropy models, Advances in Neural Information Processing Systems, pp.1231-1239, 2009.

G. Meyer, S. Bonnabel, and R. Sepulchre, Linear regression under fixed-rank constraints: a Riemannian approach, Proceedings of the 28th international conference on machine learning, 2011.

B. F. Mitchell, V. F. Demyanov, and V. N. Malozemov, Finding the point of a polyhedron closest to the origin, SIAM Journal on Control, vol.12, issue.1, pp.19-26, 1974.

A. Moharrer and S. Ioannidis, Distributing Frank-Wolfe via Map-Reduce, ICDM, 2017.

C. Mu, Y. Zhang, J. Wright, and D. Goldfarb, Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal Methods, SIAM Journal on Scientific Computing, vol.38, issue.5, pp.3291-3317, 2016.

Y. Nesterov, Introductory lectures on convex optimization: A basic course, vol.87, 2013.

Y. Nesterov, Gradient methods for minimizing composite objective function, CORE, Louvain-la-Neuve, 2007.

H. Ouyang and A. Gray, Fast stochastic Frank-Wolfe algorithms for nonlinear SVMs, Proceedings of the 2010 SIAM International Conference on Data Mining, pp.245-256, 2010.

J. Pena and D. Rodriguez, Polytope conditioning and linear convergence of the Frank-Wolfe algorithm, 2015.

J. Pena, D. Rodríguez, and N. Soheili, On the von Neumann and Frank-Wolfe Algorithms with Away Steps, SIAM Journal on Optimization, vol.26, issue.1, pp.499-512, 2016.

D. Peteiro-Barral and B. Guijarro-Berdiñas, A survey of methods for distributed machine learning, Progress in Artificial Intelligence, vol.2, issue.1, pp.1-11, 2013.

W. Ping, Q. Liu, and A. T. Ihler, Learning Infinite RBMs with Frank-Wolfe, Advances in Neural Information Processing Systems, pp.3063-3071, 2016.

T. K. Pong, P. Tseng, S. Ji, and J. Ye, Trace norm regularization: Reformulations, algorithms, and multi-task learning, SIAM Journal on Optimization, vol.20, issue.6, pp.3465-3489, 2010.

B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, vol.12, pp.3413-3430, 2011.

S. M. Robinson, Generalized equations and their solutions, Part II: Applications to nonlinear programming, Optimality and Stability in Mathematical Programming, pp.200-221, 1982.

R. T. Rockafellar, Convex analysis, 2015.

W. Rudin, Principles of mathematical analysis, vol.3, 1964.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.

J. D. Singer and J. B. Willett, Applied longitudinal data analysis: Modeling change and event occurrence, 2003.

N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, Advances in neural information processing systems, pp.1329-1336, 2005.

J. F. Sturm, Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optimization Methods and Software, vol.11, pp.625-653, 1999.

J. Sun, Q. Qu, and J. Wright, When Are Nonconvex Problems Not Scary?, 2015.

A. C. Thompson, Cambridge University Press, vol.63, pp.9-16, 1996.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.

K. Toh and S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pacific Journal of Optimization, vol.6, p.15, 2010.

K. Toh, M. J. Todd, and R. H. Tütüncü, SDPT3 - a MATLAB software package for semidefinite programming, version 1.3, Optimization Methods and Software, vol.11, pp.545-581, 1999.

J. Tsitsiklis, D. Bertsekas, and M. Athans, Distributed asynchronous deterministic and stochastic gradient optimization algorithms, IEEE transactions on automatic control, vol.31, issue.9, pp.803-812, 1986.

B. Vandereycken, Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, vol.23, issue.2, pp.1214-1236, 2013.

H. Wai, J. Lafond, A. Scaglione, and E. Moulines, Decentralized Frank-Wolfe Algorithm for Convex and Non-convex Problems, IEEE Transactions on Automatic Control, 2017.

H. Wai, A. Scaglione, J. Lafond, and E. Moulines, Fast and privacy preserving distributed low-rank regression, Acoustics, Speech and Signal Processing (ICASSP), pp.4451-4455, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01668252

Y. Wang, V. Sadhanala, W. Dai, W. Neiswanger, S. Sra, and E. Xing, Parallel and distributed block-coordinate Frank-Wolfe algorithms, International Conference on Machine Learning, pp.1548-1557, 2016.

P. Wolfe, Convergence theory in nonlinear programming. Integer and nonlinear programming, pp.1-36, 1970.

E. P. Xing, Q. Ho, W. Dai, J. K. Kim, J. Wei et al., Petuum: A new platform for distributed machine learning on big data, IEEE Transactions on Big Data, vol.1, issue.2, pp.49-67, 2015.

E. P. Xing, Q. Ho, P. Xie, and W. Dai, Strategies and Principles of Distributed Machine Learning on Big Data, 2015.

H. Xu, Convergence Analysis of the Frank-Wolfe Algorithm and Its Generalization in Banach Spaces, 2017.

B. Yao, Z. Zhao, and K. Liu, Metric learning with trace-norm regularization for person re-identification, 2014 IEEE International Conference on Image Processing (ICIP), pp.2442-2446, 2014.

H. Yu, F. Huang, and C. Lin, Dual coordinate descent methods for logistic regression and maximum entropy models, Machine Learning, vol.85, pp.41-75, 2011.

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Fast and interactive analytics over Hadoop data with Spark, USENIX Login, vol.37, issue.4, pp.45-51, 2012.

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pp.2-2, 2012.

S. L. Zeger and K. Liang, Longitudinal data analysis for discrete and continuous outcomes, Biometrics, pp.121-130, 1986.

J. Zhou, J. Chen, and J. Ye, MALSAR: Multi-tAsk Learning via StructurAl Regularization, 2011.

J. Zhou, L. Yuan, J. Liu, and J. Ye, A multi-task learning formulation for predicting disease progression, Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.814-822, 2011.

J. Zhou, J. Chen, and J. Ye, Multi-task learning: Theory, algorithms, and applications, 2012.

J. Zhu, N. Chen, and E. P. Xing, Infinite latent SVM for classification and multi-task learning, Advances in neural information processing systems, pp.1620-1628, 2011.

M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, Parallelized stochastic gradient descent, Advances in neural information processing systems, pp.2595-2603, 2010.

R. Ñanculef, E. Frandi, C. Sartori, and H. Allende, A novel Frank-Wolfe algorithm. Analysis and applications to large-scale SVM training, Information Sciences, vol.285, pp.66-99, 2014.