We prove an extended result that holds when $\|\cdot\|_U$ and $\|\cdot\|_V$ are more general mixed $(\ell_p \times \ell_q)$-norms.
Information theory and an extension of the maximum likelihood principle, pp.199-213, 1998. ,
, Mathematical Methods for Physicists, pp.37-42, 1985.
Convex multi-task feature learning. Machine Learning, vol.73, pp.243-272, 2008. ,
Constant step size stochastic gradient descent for probabilistic modeling, Proceedings in Uncertainty in Artificial Intelligence, pp.219-228, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01929810
Slice inverse regression with score functions, Electronic Journal of Statistics, vol.12, issue.1, pp.1507-1543, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01388498
Sharp analysis of low-rank kernel matrix approximations, Conference on Learning Theory, pp.185-209, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00723365
Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Adv. NIPS, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00608041
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00831977
A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Mathematics of Operations Research, vol.42, issue.2, pp.330-348, 2016. ,
Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003. ,
A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM journal on imaging sciences, vol.2, issue.1, pp.183-202, 2009. ,
On the difficulty of approximately maximizing agreements, Journal of Computer and System Sciences, vol.66, issue.3, pp.496-514, 2003. ,
Nonlinear programming. Athena scientific Belmont, 1999. ,
, Pattern Recognition and Machine Learning, 2006.
Learning classifiers with Fenchel-Young losses: Generalized entropies, margins, and algorithms, 2018. ,
Fast kernel classifiers with online and active learning, Journal of Machine Learning Research, vol.6, pp.1579-1619, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-00752361
Convex analysis and nonlinear optimization: theory and examples, 2010. ,
Optimization methods for large-scale machine learning, 2016. ,
Concentration inequalities: A nonasymptotic theory of independence, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00794821
A Generalized Linear Model with 'Gaussian' Regressor Variables, 1982. ,
Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, vol.8, pp.231-357, 2015. ,
On the Theory of Elliptically Contoured Distributions, Journal of Multivariate Analysis, vol.11, issue.3, pp.368-385, 1981. ,
Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007. ,
Statistical inference, vol.2, 2002. ,
One billion word benchmark for measuring progress in statistical language modeling, 2013. ,
Sublinear optimization for machine learning, Journal of the ACM (JACM), vol.59, issue.5, p.23, 2012. ,
SAVE: a method for dimension reduction and graphics in regression, Communications in Statistics - Theory and Methods, vol.29, pp.2109-2121, 2000. ,
Dimension Reduction in Binary Response Regression, Journal of the American Statistical Association, vol.94, pp.1187-1200, 1999. ,
Discussion of 'Sliced Inverse Regression', Journal of the American Statistical Association, vol.86, pp.328-332, 1991. ,
Support-vector networks, Machine learning, vol.20, issue.3, pp.273-297, 1995. ,
A new algorithm for estimating the effective dimension-reduction subspace, Journal of Machine Learning Research, vol.9, pp.1647-1678, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00128129
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016843
Nonparametric stochastic approximation with large stepsizes, Ann. Statist, vol.44, issue.4, pp.1363-1399, 2016. ,
Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01565514
Monotone Matrix Functions and Analytic Continuation, 1974. ,
Slicing regression: a link-free regression method, The Annals of Statistics, vol.19, pp.505-530, 1991. ,
Composite objective mirror descent, COLT, pp.14-26, 2010. ,
Agnostic learning of monomials by halfspaces is hard, SIAM Journal on Computing, vol.41, issue.6, pp.1558-1590, 2012. ,
Kernel dimension reduction in regression, The Annals of Statistics, vol.37, issue.4, pp.1871-1905, 2009. ,
Approximating semidefinite programs in sublinear time, Advances in Neural Information Processing Systems, pp.1080-1088, 2011. ,
Sublinear time algorithms for approximate semidefinite programming, Mathematical Programming, vol.158, issue.1-2, pp.329-361, 2016. ,
Markov chain Monte Carlo in practice, 1995. ,
Econometric theory, 1964. ,
Deep Learning, 2016. ,
A sublinear-time randomized approximation algorithm for matrix games, Operations Research Letters, vol.18, issue.2, pp.53-58, 1995. ,
A distribution-free theory of nonparametric regression. Springer series in statistics, 2002. ,
Large sample properties of generalized method of moments estimators, Econometrica: Journal of the Econometric Society, pp.1029-1054, 1982. ,
Beating SGD: Learning SVMs in sublinear time, Advances in Neural Information Processing Systems, pp.1233-1241, 2011. ,
Mirror prox algorithm for multiterm composite minimization and semi-separable problems, Computational Optimization and Applications, vol.61, issue.2, pp.275-319, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01335905
Negative binomial regression, 2011. ,
Simultaneous Equations and Canonical Correlation Theory, Econometrica, vol.27, pp.245-256, 1959. ,
Semiparametric methods in econometrics, vol.131, 2012. ,
Direct estimation of the index coefficient in a single index model, The Annals of Statistics, vol.29, issue.3, pp.595-623, 2001. ,
An asymptotic theory for sliced inverse regression, The Annals of Statistics, vol.20, issue.2, pp.1040-1061, 1992. ,
Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, vol.6, pp.695-709, 2005. ,
Independent Component Analysis, vol.46, 2004. ,
Score function features for discriminative learning: Matrix and tensor framework, 2014. ,
, Generalization Bounds for Neural Networks through Tensor Factorization, 2015.
Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, pp.315-323, 2013. ,
First-order methods for nonsmooth convex large-scale optimization, I: General purpose methods. Optimization for Machine Learning, pp.121-148, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00981863
First-order methods for nonsmooth convex large-scale optimization, II: Utilizing problem's structure. Optimization for Machine Learning, pp.149-183, 2011. ,
Solving variational inequalities with stochastic mirror-prox algorithm, Stochastic Systems, vol.1, issue.1, pp.17-58, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00318043
On the optimum rate of transmitting information, Probability and information theory, pp.126-169, 1969. ,
Probabilistic Graphical Models: Principles and Techniques -Adaptive Computation and Machine Learning, 2009. ,
Extragradient method for finding saddle points and other problems, Matekon, vol.13, issue.4, pp.35-49, 1977. ,
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. ICML, 2001. ,
An optimal method for stochastic composite optimization, Mathematical Programming, vol.133, issue.1-2, pp.365-397, 2012. ,
Adaptive estimation of a quadratic functional by model selection, The Annals of Statistics, vol.28, issue.5, pp.1302-1338, 2000. ,
On some asymptotic properties of maximum likelihood estimates and related Bayes estimates, Univ. California Pub. Statist, vol.1, pp.277-330, 1953. ,
Theory of point estimation, 2006. ,
Sliced Inverse Regression for Dimensional Reduction, Journal of the American Statistical Association, vol.86, pp.316-327, 1991. ,
On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma, Journal of the American Statistical Association, vol.87, pp.1025-1039, 1992. ,
Regression analysis under link violation, The Annals of Statistics, vol.17, pp.1009-1052, 1989. ,
UCI machine learning repository, 2013. ,
On consistency and sparsity for sliced inverse regression in high dimensions, The Annals of Statistics, vol.46, issue.2, pp.580-610, 2018. ,
Generalized linear models, European Journal of Operational Research, vol.16, issue.3, pp.285-292, 1984. ,
Generalized linear models, vol.37, 1989. ,
Spectral k-support norm regularization, Advances in Neural Information Processing Systems, 2014. ,
Markov chains and stochastic stability, 1993. ,
Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, vol.93, issue.2, pp.273-299, 1965. ,
, Machine Learning: A Probabilistic Perspective, 2012.
Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM Journal on Optimization, vol.15, issue.1, pp.229-251, 2004. ,
Accuracy certificates for computational problems with convex structure, Mathematics of Operations Research, vol.35, issue.1, pp.52-78, 2010. ,
Problem complexity and method efficiency in optimization, 1983. ,
Smooth minimization of non-smooth functions. Mathematical programming, vol.103, pp.127-152, 2005. ,
Gradient methods for minimizing composite objective function, 2007. ,
Introductory lectures on convex optimization: A basic course, vol.87, 2013. ,
On first-order algorithms for $\ell_1$/nuclear norm minimization, Acta Numerica, vol.22, pp.509-575, 2013. ,
A method for solving the convex programming problem with convergence rate O(1/k^2), Dokl. Akad. Nauk SSSR, vol.269, pp.543-547, 1983. ,
Efficient first-order algorithms for adaptive signal denoising, Proceedings of the 35th ICML conference, vol.80, pp.3946-3955, 2018. ,
Stochastic variance reduction methods for saddle-point problems, Advances in Neural Information Processing Systems, pp.1416-1424, 2016. ,
, LSHTC: A benchmark for large-scale text classification, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01691460
Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992. ,
Gaussian Processes for Machine Learning, 2006. ,
A stochastic approximation method, The Annals of Mathematical Statistics, vol.22, pp.400-407, 1951. ,
Monotone operators and the proximal point algorithm, SIAM journal on control and optimization, vol.14, issue.5, pp.877-898, 1976. ,
, Convex analysis, 2015.
Falkon: An optimal large scale kernel method, Advances in Neural Information Processing Systems, pp.3891-3901, 2017. ,
Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-00860051
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and beyond, 2001. ,
Understanding machine learning: From theory to algorithms, 2014. ,
Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013. ,
Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical programming, vol.127, pp.3-30, 2011. ,
Lectures on stochastic programming: modeling and theory, 2009. ,
Kernel Methods for Pattern Analysis, 2004. ,
Bregman divergence for stochastic variance reduction: saddle-point and adversarial prediction, Advances in Neural Information Processing Systems, pp.6031-6041, 2017. ,
Fast projections onto mixed-norm balls with applications, Data Mining and Knowledge Discovery, vol.25, issue.2, pp.358-377, 2012. ,
Injective Hilbert space embeddings of probability measures, Proc. COLT, 2008. ,
Estimation of the Mean of a Multivariate Normal Distribution, The Annals of Statistics, vol.9, pp.1135-1151, 1981. ,
Matrix perturbation theory (computer science and scientific computing), 1990. ,
Consistent estimation of scaled coefficients, Econometrica, vol.54, pp.1461-1481, 1986. ,
Introduction to Nonparametric Estimation, 2009. ,
Minimax sparse principal subspace estimation in high dimensions, The Annals of Statistics, vol.41, issue.6, pp.2905-2947, 2013. ,
On directional regression for dimension reduction, J. Amer. Statist. Ass., 2007. ,
Sliced regression for dimension reduction, Journal of the American Statistical Association, vol.103, issue.482, pp.811-821, 2008. ,
Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems, pp.682-688, 2001. ,
Sketching as a tool for numerical linear algebra, Foundations and Trends® in Theoretical Computer Science, vol.10, issue.1-2, pp.1-157, 2014. ,
An adaptive estimation of dimension reduction space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.64, issue.3, pp.363-410, 2002. ,
Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, vol.11, pp.2543-2596, 2010. ,
DSCOVR: Randomized primal-dual block coordinate algorithms for asynchronous distributed optimization, 2017. ,
General distribution theory of the concomitants of order statistics, The Annals of Statistics, vol.5, pp.996-1002, 1977. ,
A useful variant of the Davis-Kahan theorem for statisticians, Biometrika, vol.102, issue.2, pp.315-323, 2015. ,
The strong convexity of von Neumann's entropy. Unpublished note, 2013. ,
On the identifiability of additive index models, Statistica Sinica, vol.21, issue.4, pp.1901-1911, 2011. ,
Statistica Sinica, vol.5, pp.727-736, 1995. ,