P. Ablin, J. Cardoso, and A. Gramfort, Faster independent component analysis by preconditioning with Hessian approximations, IEEE Trans. Signal Process, vol.66, issue.15, p.33, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01552340

A. Argyriou, T. Evgeniou, and M. Pontil, Multi-task feature learning, NeurIPS, p.24, 2006.

Ü. Aydin, J. Vorwerk, M. Dümpelmann, P. Küpper, H. Kugel et al., Combined EEG/MEG can outperform single modality EEG or MEG source reconstruction in presurgical epilepsy diagnosis, PLoS ONE, vol.10, issue.3, p.33, 2015.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Convex optimization with sparsity-inducing norms, Foundations and Trends in Machine Learning, vol.4, p.44, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00937150

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, vol.27, p.53, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01517477

A. Beck, First-Order Methods in Optimization, vol.25, p.100, 2017.

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci, vol.2, issue.1, p.45, 2009.

A. Beck and M. Teboulle, Gradient-based algorithms with applications to signal-recovery problems, p.24, 2010.

A. Beck and M. Teboulle, Smoothing and first order methods: A unified framework, SIAM J. Optim, vol.22, issue.2, p.97, 2012.

S. R. Becker, E. J. Candès, and M. C. Grant, Templates for convex cone problems with applications to sparse signal recovery, Math. Program. Comput, vol.3, issue.3, p.86, 2011.

S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn et al., Cython: The best of both worlds, Computing in Science & Engineering, vol.13, p.78, 2011.

A. Belloni, V. Chernozhukov, and L. Wang, Square-root Lasso: pivotal recovery of sparse signals via conic programming, Biometrika, vol.98, issue.4, p.125, 2011.

Q. Bertrand, M. Massias, A. Gramfort, and J. Salmon, Handling correlated and repeated measurements with the smoothed multivariate square-root Lasso, NeurIPS, vol.90, p.103, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02010014

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Ann. Statist, vol.37, issue.4, p.86, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00401585

K. Bleakley and J. Vert, The group fused lasso for multiple change-point detection, p.26, 2011.

A. Boisbunon, R. Flamary, and A. Rakotomamonjy, Active set strategy for high-dimensional non-convex sparse optimization problems, ICASSP, vol.44, p.75, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01025585

R. Bollapragada, D. Scieur, and A. d'Aspremont, Nonlinear acceleration of momentum and primal-dual algorithms, p.51, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01893921

A. Bonnefoy, V. Emiya, L. Ralaivola, and R. Gribonval, A dynamic screening principle for the lasso, EUSIPCO, vol.55, p.66, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00880787

A. Bonnefoy, V. Emiya, L. Ralaivola, and R. Gribonval, Dynamic screening: accelerating first-order algorithms for the Lasso and Group-Lasso, IEEE Trans. Signal Process, vol.63, issue.19, p.55, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01084986

S. Boyd and L. Vandenberghe, Convex optimization, vol.91, p.105, 2004.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, vol.3, p.24, 2011.

P. Bühlmann and J. Mandozzi, High-dimensional variable screening and bias in subsequent inference, with an empirical comparison, Computational Statistics, vol.29, issue.3, p.130, 2014.

L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller et al., API design for machine learning software: experiences from the scikit-learn project, p.78, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00856511

E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory, vol.52, issue.2, p.26, 2006.

E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted l1 minimization, J. Fourier Anal. Applicat, vol.14, issue.5-6, p.25, 2008.

A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis, vol.40, issue.1, p.101, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00490826

R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett, vol.14, issue.10, p.24, 2007.

S. Chen and A. Banerjee, Alternating estimation for structured high-dimensional multiresponse models, NeurIPS, vol.87, p.107, 2017.

S. S. Chen and D. L. Donoho, Atomic decomposition by basis pursuit, SPIE, vol.24, p.44, 1995.

S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput, vol.20, issue.1, p.23, 1998.

J. F. Claerbout and F. Muir, Robust modeling with erratic data, Geophysics, vol.38, issue.5, p.23, 1973.

D. Cohen, Magnetoencephalography: evidence of magnetic fields produced by alpha-rhythm currents, Science, vol.161, issue.3843, p.33, 1968.

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, Springer Optim. Appl, vol.49, p.29, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643807

A. M. Dale, A. K. Liu, B. R. Fischl, R. L. Buckner, J. W. Belliveau et al., Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity, Neuron, vol.26, issue.1, p.36, 2000.

J. Daye, J. Chen, and H. Li, High-dimensional heteroscedastic regression with an application to eQTL data analysis, Biometrics, vol.68, issue.1, p.86, 2012.

D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, vol.52, issue.4, p.26, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00369486

D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika, vol.81, issue.3, p.23, 1994.

J. H. Duyn, The future of ultra-high field MRI and fMRI for study of the human brain, Neuroimage, vol.62, issue.2, p.31, 2012.

R. L. Dykstra, An algorithm for restricted least squares regression, J. Amer. Statist. Assoc, vol.78, issue.384, p.53, 1983.

M. A. Efroymson, Multiple regression analysis, Mathematical methods for digital computers, p.23, 1960.

L. El Ghaoui, V. Viallon, and T. Rabbani, Safe feature elimination in sparse supervised learning, J. Pacific Optim, vol.8, issue.4, p.94, 2012.

D. A. Engemann and A. Gramfort, Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals, NeuroImage, vol.108, p.87, 2015.

D. Engemann, D. Strohmeier, E. Larson, and A. Gramfort, Mind the noise covariance when localizing brain sources with M/EEG, Pattern Recognition in NeuroImaging (PRNI), p.34, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01183551

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, Liblinear: A library for large linear classification, J. Mach. Learn. Res, vol.9, p.66, 2008.

J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc, vol.96, issue.456, p.24, 2001.

J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.70, issue.5, p.94, 2008.

M. Fazel, Matrix rank minimization with applications, p.24, 2002.

O. Fercoq and P. Richtárik, Accelerated, parallel and proximal coordinate descent, SIAM J. Optim, vol.25, issue.3, p.67, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02287265

O. Fercoq, A. Gramfort, and J. Salmon, Mind the duality gap: safer rules for the lasso, ICML, vol.55, p.86, 2015.

L. E. Frank and J. H. Friedman, A statistical view of some chemometrics regression tools, Technometrics, vol.35, issue.2, p.24, 1993.

J. Friedman, T. J. Hastie, H. Höfling, and R. Tibshirani, Pathwise coordinate optimization, Ann. Appl. Stat, vol.1, issue.2, p.66, 2007.

J. Friedman, T. J. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, p.106, 2008.

J. Friedman, T. J. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw, vol.33, issue.1, p.78, 2010.

W. J. Fu, Penalized regressions: the bridge versus the lasso, J. Comput. Graph. Statist, vol.7, issue.3, p.44, 1998.

G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering sparse signals with a certain family of nonconvex penalties and DC programming, IEEE Trans. Signal Process, vol.57, issue.12, p.25, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00439453

C. F. Gauss, Theoria motus corporum coelestium in sectionibus conicis solem ambientium, vol.7, 1809.

P. Gloor, Neuronal generators and the problem of localization in electroencephalography: application of volume conductor theory to electroencephalography, Journal of clinical neurophysiology, vol.2, issue.4, p.31, 1985.

J. L. Goffin, On convergence rates of subgradient optimization methods, Mathematical Programming, vol.13, issue.1, p.29, 1977.

A. Gramfort, M. Kowalski, and M. Hämäläinen, Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods, Phys. Med. Biol, vol.57, issue.7, p.81, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00690774

A. Gramfort, D. Strohmeier, J. Haueisen, M. S. Hämäläinen, and M. Kowalski, Time-frequency mixed-norm estimates: Sparse M/EEG imaging with non-stationary source activations, NeuroImage, vol.70, p.130, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00773276

A. Gramfort, M. Luessi, E. Larson, D. A. Engemann, D. Strohmeier et al., MNE software for processing MEG and EEG data, NeuroImage, vol.86, p.130, 2014.
URL : https://hal.archives-ouvertes.fr/hal-02369299

J. Gross, S. Baillet, G. R. Barnes, R. N. Henson, A. Hillebrand et al., Good practice for conducting and reporting MEG research, NeuroImage, vol.65, p.33, 2013.

E. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for l1-minimization: Methodology and convergence, SIAM J. Optim, vol.19, issue.3, p.71, 2008.

T. J. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Series in Statistics, p.17, 2009.

T. J. Hastie, A. Montanari, S. Rosset, and R. J. Tibshirani, Surprises in high-dimensional ridgeless least squares interpolation, p.22, 2019.

S. Haufe, V. V. Nikulin, A. Ziehe, K. Müller, and G. Nolte, Combining sparsity and rotational invariance in EEG/MEG source reconstruction, NeuroImage, vol.42, issue.2, p.130, 2008.

J. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms, vol.II, p.27, 1993.

A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, vol.12, issue.1, p.22, 1970.

B. Hong, W. Liu, J. Ye, D. Cai, X. He et al., Scaling up sparse support vector machines by simultaneous feature and sample reduction, J. Mach. Learn. Res, vol.20, issue.121, p.44, 2019.

C. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar, QUIC: Quadratic approximation for sparse inverse covariance estimation, J. Mach. Learn. Res, vol.15, p.78, 2014.

J. Huang, P. Breheny, and S. Ma, A selective review of group selection in high-dimensional models, Statistical Science, vol.27, issue.4, 2012.

M. Huang, C. J. Aine, S. Supek, E. Best, D. Ranken et al., Multi-start downhill simplex method for spatio-temporal source localization in magnetoencephalography, Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, vol.108, issue.1, p.35, 1998.

L. Huber, D. A. Handwerker, D. C. Jangraw, G. Chen, A. Hall et al., High-resolution CBV-fMRI allows mapping of laminar activity and connectivity of cortical input and output in human M1, Neuron, vol.96, issue.6, p.31, 2017.

P. J. Huber, Robust Statistics, p.86, 1981.

P. J. Huber and R. Dutter, Numerical solution of robust regression problems, Compstat 1974 (Proc. Sympos. Computational Statist., Univ. Vienna), p.86, 1974.

M. Hämäläinen and R. J. Ilmoniemi, Interpreting magnetic fields of the brain: minimum norm estimates, Medical & biological engineering & computing, vol.32, issue.1, p.35, 1994.

A. A. Ivanov, The theory of approximate methods and their applications to the numerical solution of singular integral equations, vol.2, p.22, 1976.

T. B. Johnson, Scaling Machine Learning via Prioritized Optimization, p.60, 2018.

T. B. Johnson and C. Guestrin, Blitz: A principled meta-algorithm for scaling sparse optimization, ICML, vol.66, p.94, 2015.

T. B. Johnson and C. Guestrin, A fast, principled working set algorithm for exploiting piecewise linear structure in convex problems, vol.66, p.75, 2018.

P. Karimireddy, A. Koloskova, S. Stich, and M. Jaggi, Efficient greedy coordinate descent for composite problems, p.66, 2018.

S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, An interior-point method for large-scale l1-regularized least squares, IEEE J. Sel. Topics Signal Process, vol.1, issue.4, p.60, 2007.

J. Kim and H. Park, Fast active-set-type algorithms for l1-regularized linear regression, AISTATS, p.56, 2010.

K. Kobayashi, T. Akiyama, T. Nakahori, H. Yoshinaga, and J. Gotman, Systematic source estimation of spikes by a combination of independent component analysis and RAP-MUSIC: I: Principles and simulation study, Clinical Neurophysiology, vol.113, issue.5, p.35, 2002.

K. Koh, S. Kim, and S. Boyd, An interior-point method for large-scale l1-regularized logistic regression, J. Mach. Learn. Res, vol.8, issue.8, p.66, 2007.

M. Kolar and J. Sharpnack, Variance function estimation in high-dimensions, ICML, p.86, 2012.

Z. J. Koles and A. C. Soong, EEG source localization: implementing the spatio-temporal decomposition approach, Electroencephalography and clinical Neurophysiology, vol.107, issue.5, p.35, 1998.

M. Kowalski, P. Weiss, A. Gramfort, and S. Anthoine, Accelerating ISTA with an active set strategy, OPT 2011: 4th International Workshop on Optimization for Machine Learning, vol.66, p.74, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00696992

S. K. Lam, A. Pitrou, and S. Seibert, Numba: A LLVM-based Python JIT Compiler, Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, vol.78, p.107, 2015.

O. Ledoit and M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal, vol.88, issue.2, p.89, 2004.

J. Lee, Y. Sun, and M. Saunders, Proximal Newton-type methods for convex optimization, NeurIPS, p.75, 2012.

W. Lee and Y. Liu, Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood, vol.111, p.107, 2012.

A. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot, p.21, 1805.

C. Leng, Y. Lin, and G. Wahba, A note on the lasso and related procedures in model selection, Statistica Sinica, vol.16, issue.4, p.25, 2006.

K. Lounici, Estimation statistique en grande dimension, parcimonie et inégalités d'oracle, p.25, 2009.

F. Lucka, S. Pursiainen, M. Burger, and C. H. Wolters, Hierarchical Bayesian inference for the EEG inverse problem using realistic FE head models: depth localization and source separation for focal primary currents, Neuroimage, vol.61, issue.4, p.36, 2012.

J. Mairal, Sparse coding for machine learning, image processing and computer vision, vol.46, p.69, 2010.

S. Makeig, A. J. Bell, T. Jung, and T. J. Sejnowski, Independent component analysis of electroencephalographic data, NeurIPS, p.33, 1996.

S. Mallat and Z. Zhang, Matching pursuit with time-frequency dictionaries, IEEE Trans. Signal Process, vol.41, p.23, 1993.

H. Markowitz, Portfolio selection, The Journal of Finance, vol.7, issue.1, p.23, 1952.

M. Massias, A. Gramfort, and J. Salmon, From safe screening rules to working sets for faster lasso-type solvers, NIPS-OPT workshop, 2017.

M. Massias, O. Fercoq, A. Gramfort, and J. Salmon, Generalized concomitant multitask Lasso for sparse multimodal regression, AISTATS, vol.103, p.107, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01812011

M. Massias, A. Gramfort, and J. Salmon, Celer: a fast solver for the Lasso with dual extrapolation, ICML, pp.3321-3330, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01833398

M. Massias, S. Vaiter, A. Gramfort, and J. Salmon, Dual extrapolation for sparse Generalized Linear Models, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02263500

K. Matsuura and Y. Okabe, Selective minimum-norm solution of the biomagnetic inverse problem, IEEE Transactions on Biomedical Engineering, vol.42, issue.6, p.36, 1995.

P. McCullagh and J. A. Nelder, Generalized Linear Models, CRC Monographs on Statistics and Applied Probability Series, vol.18, p.66, 1989.

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, Ann. Statist, vol.34, issue.3, p.25, 2006.

A. Miller, Subset selection in regression, p.23, 2002.

A. J. Molstad, Insights and algorithms for the multivariate square-root lasso, vol.96, p.97, 2019.

J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, vol.93, p.27, 1965.

V. A. Morozov, Methods for solving incorrectly posed problems, vol.2, p.22, 1984.

J. C. Mosher and R. M. Leahy, Source localization using recursively applied and projected (RAP) MUSIC, IEEE Trans. Signal Process, vol.47, issue.2, p.35, 1999.

J. C. Mosher, S. Baillet, and R. M. Leahy, EEG source localization and imaging using multiple signal classification approaches, Journal of Clinical Neurophysiology, vol.16, issue.3, p.35, 1999.

S. Murakami and Y. Okada, Invariance in current dipole moment density across brain structures and species: Physiological constraint for neuroimaging, NeuroImage, vol.111, p.32, 2015.

D. Myers and W. Shih, A constraint selection technique for a class of linear programs, Operations Research Letters, vol.7, issue.4, p.44, 1988.

Y. Nardi and A. Rinaldo, Autoregressive process modeling via the lasso procedure, Journal of Multivariate Analysis, vol.102, issue.3, p.26, 2011.

B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput, vol.24, issue.2, p.23, 1995.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparse multi-task and multi-class models, NeurIPS, p.81, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02287197

E. Ndiaye, O. Fercoq, A. Gramfort, V. Leclère, and J. Salmon, Efficient smoothed concomitant lasso estimation for high dimensional regression, Journal of Physics: Conference Series, vol.904, issue.1, p.101, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01404966

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparsity enforcing penalties, J. Mach. Learn. Res, vol.18, issue.128, p.94, 2017.

S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu, A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers, p.24, 2010.

Y. Nesterov, A method for solving a convex programming problem with rate of convergence O(1/k^2), Soviet Math. Doklady, vol.269, issue.3, p.29, 1983.

Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program, vol.103, issue.1, p.97, 2005.

M. Nikolova, Relationship between the optimal solutions of least squares regularized with l0-norm and constrained by k-sparsity, Applied and Computational Harmonic Analysis, vol.41, issue.1, p.24, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00944006

P. L. Nunez and R. Srinivasan, Electric fields of the brain: the neurophysics of EEG, p.32, 2006.

J. Nutini, M. Schmidt, and W. Hare, "Active-set complexity" of proximal gradient: how long does it take to find the sparsity pattern?, Optimization Letters, p.69, 2017.

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.20, issue.2, p.126, 2010.

P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision, SIAM J. Imaging Sci, vol.8, issue.1, p.25, 2015.

W. Ockham, Quaestiones et decisiones in quatuor libros Sententiarum cum centilogio theologico, pp.1319-1341

K. Ogawa, Y. Suzuki, and I. Takeuchi, Safe screening of non-support vectors in pathwise SVM computation, ICML, vol.44, p.66, 2013.

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, p.26, 1997.

W. Ou, M. Hämäläinen, and P. Golland, A distributed spatio-temporal EEG/MEG inverse solver, NeuroImage, vol.44, issue.3, p.130, 2009.

A. B. Owen, A robust hybrid of lasso and ridge regression, Cont. Math, vol.443, p.88, 2007.

F. Palacios-Gomez, L. Lasdon, and M. Engquist, Nonlinear optimization by successive linear programming, Management Science, vol.28, issue.10, p.44, 1982.

N. Parikh and S. Boyd, Proximal algorithms, Foundations and Trends in Optimization, vol.1, p.29, 2013.

L. Parkkonen, MEG: An Introduction to Methods, p.33, 2010.

R. Pascual-Marqui, Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details, Methods Find. Exp. Clin. Pharmacol, vol.24, p.36, 2002.

Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, Proceedings of 27th Asilomar conference on signals, systems and computers, p.23, 1993.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res, vol.12, p.106, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

D. Perekrestenko, V. Cevher, and M. Jaggi, Faster coordinate descent via adaptive importance sampling, AISTATS, vol.56, p.66, 2017.

E. J. Pitman, Sufficient statistics and intrinsic accuracy, Mathematical Proceedings of the Cambridge Philosophical society, vol.32, p.19, 1936.

R. L. Plackett, Studies in the History of Probability and Statistics. XXIX: The discovery of the method of least squares, Biometrika, vol.59, issue.2, p.21, 1972.

C. Poon, J. Liang, and C. Schoenlieb, Local convergence properties of SAGA/Prox-SVRG and acceleration, ICML, p.69, 2018.

P. Rai, A. Kumar, and H. Daume, Simultaneously leveraging output and task structures for multiple-output regression, NeurIPS, p.87, 2012.

P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, vol.144, issue.1-2, p.66, 2014.

S. E. Robinson and J. Vrba, Recent advances in biomagnetism, p.35, 1999.

R. T. Rockafellar, Convex analysis, Princeton Landmarks in Mathematics, p.28, 1997.

S. Rosset, J. Zhu, and T. Hastie, Boosting as a regularized path to a maximum margin classifier, J. Mach. Learn. Res, vol.5, p.69, 2004.

V. Roth and B. Fischer, The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms, ICML, p.66, 2008.

A. J. Rothman, E. Levina, and J. Zhu, Sparse multivariate regression with covariance estimation, Journal of Computational and Graphical Statistics, vol.19, issue.4, p.107, 2010.

M. De Santis, S. Lucidi, and F. Rinaldi, A fast active set block coordinate descent algorithm for l1-regularized least squares, SIAM J. Optim, vol.26, issue.1, p.75, 2016.

F. Santosa and W. W. Symes, Linear inversion of band-limited reflection seismograms, SIAM Journal on Scientific and Statistical Computing, vol.7, issue.4, p.23, 1986.

K. Scheinberg and X. Tang, Complexity of inexact proximal Newton methods, p.75, 2013.

M. Scherg and D. von Cramon, Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model, Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, vol.62, p.35, 1985.

D. Scieur, Acceleration in Optimization, p.47, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01887163

D. Scieur, A. d'Aspremont, and F. Bach, Regularized nonlinear acceleration, NeurIPS, vol.51, p.121, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01384682

S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms, p.17, 2014.

A. Shibagaki, M. Karasuyama, K. Hatano, and I. Takeuchi, Simultaneous safe screening of features and samples in doubly sparse modeling, ICML, p.44, 2016.

N. Simon, J. Friedman, T. J. Hastie, and R. Tibshirani, A sparse-group lasso, J. Comput. Graph. Statist, vol.22, issue.2, p.66, 2013.

E. Soubies, L. Blanc-Féraud, and G. Aubert, A continuous exact l0 penalty (CEL0) for least squares regularized problem, SIAM J. Imaging Sci, vol.8, issue.3, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01102492

N. Städler, P. Bühlmann, and S. van de Geer, l1-penalization for mixture regression models, TEST, vol.19, issue.2, p.86, 2010.

S. Stich, A. Raj, and M. Jaggi, Safe adaptive importance sampling, NeurIPS, p.56, 2017.

D. Strohmeier, Spatio-Temporal Sparse Priors for MEG/EEG Source Reconstruction, p.36, 2016.

D. Strohmeier, Y. Bekhti, J. Haueisen, and A. Gramfort, The iterative reweighted mixed-norm estimate for spatio-temporal MEG/EEG source reconstruction, IEEE Trans. Med. Imag, p.36, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01079530

B. Stucky, Asymptotic confidence regions and sharp oracle results under structured sparsity, p.96, 2017.

T. Sun and C. Zhang, Scaled sparse linear regression, Biometrika, vol.99, issue.4, p.88, 2012.

Y. Sun, H. Jeong, J. Nutini, and M. Schmidt, Are we there yet? Manifold identification of gradient-related proximal methods, AISTATS, p.69, 2019.

H. L. Taylor, S. C. Banks, and J. F. McCoy, Deconvolution with the l1 norm, Geophysics, vol.44, issue.1, p.23, 1979.

G. Thompson, F. Tonge, and S. Zionts, Techniques for removing nonbinding constraints and extraneous variables from linear programming problems, Management Science, vol.12, issue.7, p.44, 1966.

R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.58, issue.1, p.86, 1996.

R. Tibshirani, J. Bien, J. Friedman, T. J. Hastie, N. Simon et al., Strong rules for discarding predictors in lasso-type problems, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.74, issue.2, p.94, 2012.

R. J. Tibshirani, The lasso problem and uniqueness, Electron. J. Stat, vol.7, p.70, 2013.

R. J. Tibshirani, Dykstra's algorithm, ADMM, and coordinate descent: Connections, insights, and extensions, NeurIPS, vol.50, p.53, 2017.

A. N. Tikhonov, On the stability of inverse problems, Dokl. Akad. Nauk SSSR, vol.39, p.22, 1943.

J. A. Tropp, Just relax: convex programming methods for identifying sparse signals in noise, IEEE Trans. Inf. Theory, vol.52, issue.3, p.23, 2006.

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl, vol.109, issue.3, p.91, 2001.

P. Tseng and S. Yun, Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization, J. Optim. Theory Appl, vol.140, issue.3, p.92, 2009.

K. Uutela, M. Hämäläinen, and R. Salmelin, Global optimization in the localization of neuromagnetic sources, IEEE Transactions on Biomedical Engineering, vol.45, issue.6, p.35, 1998.

K. Uutela, M. Hämäläinen, and E. Somersalo, Visualization of magnetoencephalographic data using minimum current estimates, NeuroImage, vol.10, issue.2, p.36, 1999.

D. Vainsencher, H. Liu, and T. Zhang, Local smoothness in variance reduced optimization, NeurIPS, p.44, 2015.

S. Vaiter, G. Peyré, and J. M. Fadili, Model consistency of partly smooth regularizers, IEEE Trans. Inf. Theory, vol.64, issue.3, p.74, 2018.
URL : https://hal.archives-ouvertes.fr/hal-00987293

S. van de Geer, Estimation and testing under sparsity: Lecture notes from the 45th Probability Summer School held in Saint-Flour, Lecture Notes in Mathematics, vol.2159, p.101, 2015.

S. van de Geer and B. Stucky, χ2-confidence sets in high-dimensional regression, Statistical analysis for high-dimensional data, p.97, 2016.

B. D. Van Veen, W. van Drongelen, M. Yuchtman, and A. Suzuki, Localization of brain electrical activity via linearly constrained minimum variance spatial filtering, IEEE Transactions on Biomedical Engineering, vol.44, issue.9, p.35, 1997.

J. Wagener and H. Dette, Bridge estimators and the adaptive Lasso under heteroscedasticity, Math. Methods Statist, vol.21, p.86, 2012.

J. Wang, J. Zhou, P. Wonka, and J. Ye, Lasso screening rules via dual polytope projection, NeurIPS, vol.44, p.66, 2013.

D. Wipf and S. Nagarajan, A unified Bayesian framework for MEG/EEG source imaging, NeuroImage, vol.44, issue.3, p.36, 2009.

D. P. Wipf, J. P. Owen, H. Attias, K. Sekihara, and S. S. Nagarajan, Estimating the location and orientation of complex, correlated neural activity using MEG, NeurIPS, vol.36, p.130, 2008.

D. Wrinch and H. Jeffreys, On certain fundamental principles of scientific inquiry, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.42, issue.249, p.22, 1921.

T. T. Wu and K. Lange, Coordinate descent algorithms for lasso penalized regression, Ann. Appl. Stat, p.24, 2008.

Z. J. Xiang and P. J. Ramadge, Fast lasso screening tests based on correlations, ICASSP, p.44, 2012.

Z. J. Xiang, Y. Wang, and P. J. Ramadge, Screening tests for lasso problems, IEEE Trans. Pattern Anal. Mach. Intell, 2016.

G. Yuan, C. Ho, and C. Lin, An improved GLMNET for l1-regularized logistic regression, J. Mach. Learn. Res, vol.13, p.78, 2012.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.68, issue.1, p.66, 2006.

C. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Statist, vol.38, issue.2, p.24, 2010.

T. Zhang, Adaptive forward-backward greedy algorithm for learning sparse representations, IEEE Trans. Inf. Theory, vol.57, issue.7, p.23, 2011.

P. Zhao and B. Yu, On model selection consistency of Lasso, J. Mach. Learn. Res, vol.7, p.24, 2006.

M. Zibulevsky and B. A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural computation, vol.13, issue.4, p.26, 2001.

H. Zou, The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc, vol.101, issue.476, p.25, 2006.

H. Zou and T. J. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.67, issue.2, p.24, 2005.