Skip to Main content Skip to Navigation

Asynchronous optimization for machine learning

Abstract : The impressive breakthroughs of the last two decades in the field of machine learning can be in large part attributed to the explosion of computing power and available data. These two limiting factors have been replaced by a new bottleneck: algorithms. The focus of this thesis is thus on introducing novel methods that can take advantage of high data quantity and computing power. We present two independent contributions. First, we develop and analyze novel fast optimization algorithms which take advantage of the advances in parallel computing architecture and can handle vast amounts of data. We introduce a new framework of analysis for asynchronous parallel incremental algorithms, which enable correct and simple proofs. We then demonstrate its usefulness by performing the convergence analysis for several methods, including two novel algorithms. Asaga is a sparse asynchronous parallel variant of the variance-reduced algorithm Saga which enjoys fast linear convergence rates on smooth and strongly convex objectives. We prove that it can be linearly faster than its sequential counterpart, even without sparsity assumptions. ProxAsaga is an extension of Asaga to the more general setting where the regularizer can be non-smooth. We prove that it can also achieve a linear speedup. We provide extensive experiments comparing our new algorithms to the current state-of-art. Second, we introduce new methods for complex structured prediction tasks. We focus on recurrent neural networks (RNNs), whose traditional training algorithm for RNNs – based on maximum likelihood estimation (MLE) – suffers from several issues. The associated surrogate training loss notably ignores the information contained in structured losses and introduces discrepancies between train and test times that may hurt performance. To alleviate these problems, we propose SeaRNN, a novel training algorithm for RNNs inspired by the “learning to search” approach to structured prediction. SeaRNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error than the MLE objective. We demonstrate improved performance over MLE on three challenging tasks, and provide several subsampling strategies to enable SeaRNN to scale to large-scale tasks, such as machine translation. Finally, after contrasting the behavior of SeaRNN models to MLE models, we conduct an in-depth comparison of our new approach to the related work.
Complete list of metadatas

Cited literature [110 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Wednesday, January 29, 2020 - 9:15:08 AM
Last modification on : Tuesday, September 22, 2020 - 3:46:07 AM
Long-term archiving on: : Thursday, April 30, 2020 - 1:01:45 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01950576, version 2



Rémi Leblond. Asynchronous optimization for machine learning. Machine Learning [cs.LG]. PSL Research University, 2018. English. ⟨NNT : 2018PSLEE057⟩. ⟨tel-01950576v2⟩



Record views


Files downloads