Skip to Main content Skip to Navigation

Generic acceleration schemes for gradient-based optimization in machine learning

Abstract : Optimization problems arise naturally in machine learning for supervised problems. A typical example is the empirical risk minimization (ERM) formulation, which aims to find the best a posteriori estimator minimizing the regularized risk on a given dataset. The current challenge is to design efficient optimization algorithms that are able to handle large amounts of data in high-dimensional feature spaces. Classical optimization methods such as the gradient descent algorithm and its accelerated variants are computationally expensive under this setting, because they require to pass through the entire dataset at each evaluation of the gradient. This was the motivation for the recent development of incremental algorithms. By loading a single data point (or a minibatch) for each update, incremental algorithms reduce the computational cost per-iteration, yielding a significant improvement compared to classical methods, both in theory and in practice. A natural question arises: is it possible to further accelerate these incremental methods? We provide a positive answer by introducing several generic acceleration schemes for first-order optimization methods, which is the main contribution of this manuscript. In chapter 2, we develop a proximal variant of the Finito/MISO algorithm, which is an incremental method originally designed for smooth strongly convex problems. In order to deal with the non-smooth regularization penalty, we modify the update by introducing an additional proximal step. The resulting algorithm enjoys a similar linear convergence rate as the original algorithm, when the problem is strongly convex. In chapter 3, we introduce a generic acceleration scheme, called Catalyst, for accelerating gradient-based optimization methods in the sense of Nesterov. Our approach applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. The Catalyst algorithm can be viewed as an inexact accelerated proximal point algorithm, applying a given optimization method to approximately compute the proximal operator at each iteration. The key for achieving acceleration is to appropriately choose an inexactness criteria and control the required computational effort. We provide a global complexity analysis and show that acceleration is useful in practice. In chapter 4, we present another generic approach called QNing, which applies Quasi-Newton principles to accelerate gradient-based optimization methods. The algorithm is a combination of inexact L-BFGS algorithm and the Moreau-Yosida regularization, which applies to the same class of functions as Catalyst. To the best of our knowledge, QNing is the first Quasi-Newton type algorithm compatible with both composite objectives and the finite sum setting. We provide extensive experiments showing that QNing gives significant improvement over competing methods in large-scale machine learning problems. We conclude the thesis by extending the Catalyst algorithm into the nonconvex setting. This is a joint work with Courtney Paquette and Dmitriy Drusvyatskiy, from University of Washington, and my PhD advisors. The strength of the approach lies in the ability of the automatic adaptation to convexity, meaning that no information about the convexity of the objective function is required before running the algorithm. When the objective is convex, the proposed approach enjoys the same convergence result as the convex Catalyst algorithm, leading to acceleration. When the objective is nonconvex, it achieves the best known convergence rate to stationary points for first-order methods. Promising experimental results have been observed when applying to sparse matrix factorization problems and neural network models.
Complete list of metadatas

Cited literature [168 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Tuesday, September 4, 2018 - 2:19:06 PM
Last modification on : Wednesday, October 14, 2020 - 4:16:32 AM
Long-term archiving on: : Wednesday, December 5, 2018 - 3:49:28 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01867598, version 1



Hongzhou Lin. Generic acceleration schemes for gradient-based optimization in machine learning. Machine Learning [cs.LG]. Université Grenoble Alpes, 2017. English. ⟨NNT : 2017GREAM069⟩. ⟨tel-01867598⟩



Record views


Files downloads