Nonsmooth optimization for statistical learning with structured matrix regularization

Federico Pierucci 1, 2
2 Thoth - Apprentissage de modèles à partir de données massives
LJK - Laboratoire Jean Kuntzmann, Inria Grenoble - Rhône-Alpes
Abstract: Training machine learning methods boils down to solving optimization problems whose objective functions often decompose into two parts: a) the empirical risk, built upon the loss function, whose shape is determined by the performance metric and the noise assumptions; b) the regularization penalty, built upon a norm or a gauge function, whose structure is determined by the prior information available for the problem at hand.

Common loss functions, such as the hinge loss for binary classification, or more advanced loss functions, such as the one arising in classification with reject option, are non-smooth. Sparse regularization penalties, such as the (vector) l1-penalty or the (matrix) nuclear-norm penalty, are also non-smooth. However, basic non-smooth optimization algorithms, such as subgradient methods or bundle-type methods, do not leverage the composite structure of the objective. The goal of this thesis is to study doubly non-smooth learning problems (with non-smooth loss functions and non-smooth regularization penalties) and first-order optimization algorithms that leverage the composite structure of non-smooth objectives.

In the first chapter, we introduce new regularization penalties, called the group Schatten norms, which generalize the standard Schatten norms to block-structured matrices. We establish the main properties of the group Schatten norms using tools from convex analysis and linear algebra; in particular, we retrieve some convex envelope properties. We discuss several potential applications of the group nuclear norm, in collaborative filtering, database compression, and multi-label image tagging.

In the second chapter, we present a survey of smoothing techniques that allow us to use first-order optimization algorithms designed for composite objectives decomposing into a smooth part and a non-smooth part.
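One natural block-wise construction in the spirit of the group Schatten norms described above can be sketched as follows. This is a minimal illustration, not the thesis's exact definition: the helper names `schatten_norm` and `group_schatten_norm`, and the choice of aggregating blockwise Schatten p-norms with an outer l_q norm, are assumptions made here for concreteness.

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten p-norm of a matrix: the l_p norm of its singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

def group_schatten_norm(blocks, p, q):
    """Hypothetical block-structured penalty: the l_q norm of the vector of
    Schatten p-norms computed on each block of the matrix."""
    vals = np.array([schatten_norm(B, p) for B in blocks])
    return np.sum(vals ** q) ** (1.0 / q)

# With p = q = 1 this reduces to a "group nuclear norm":
# the sum of the nuclear norms of the blocks.
blocks = [np.eye(2), np.eye(3)]
print(group_schatten_norm(blocks, p=1, q=1))  # 2 + 3 = 5.0
```

With p = q = 1 such a penalty encourages each block to be low-rank while also driving entire blocks to zero, which is the kind of structured prior the abstract mentions for collaborative filtering and multi-label image tagging.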
We also show how smoothing can be applied to the loss function corresponding to top-k accuracy, used in ranking and multi-class classification problems. We outline some first-order algorithms that can be combined with this smoothing technique: i) conditional gradient algorithms; ii) proximal gradient algorithms; iii) incremental gradient algorithms.

In the third chapter, we further study conditional gradient algorithms for solving doubly non-smooth optimization problems. We show that adaptive smoothing combined with the standard conditional gradient algorithm yields new conditional gradient algorithms with the expected theoretical convergence guarantees. We present promising experimental results in collaborative filtering for movie recommendation and in image categorization.
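The two ingredients combined in the third chapter, smoothing a non-smooth loss and taking conditional gradient steps over a nuclear-norm ball, can each be sketched in a few lines. This is an illustrative sketch under standard textbook definitions, not the thesis's algorithm: it uses the Moreau envelope of the scalar hinge loss (a standard smoothing) and the well-known linear minimization oracle of the nuclear-norm ball (rank-one update from the top singular pair). The function names and the radius parameter `tau` are assumptions made here.

```python
import numpy as np

def smoothed_hinge(z, mu):
    """Moreau envelope of the hinge loss t -> max(0, 1 - t) with parameter mu:
    zero for z >= 1, quadratic around the kink, affine for z < 1 - mu."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    mid = (z < 1) & (z >= 1 - mu)
    out[mid] = (1 - z[mid]) ** 2 / (2 * mu)
    low = z < 1 - mu
    out[low] = 1 - z[low] - mu / 2
    return out

def fw_step_nuclear(grad, tau):
    """Linear minimization oracle over the nuclear-norm ball of radius tau:
    argmin_{||S||_* <= tau} <grad, S> = -tau * u1 v1^T,
    where (u1, v1) is the top singular pair of grad."""
    U, s, Vt = np.linalg.svd(grad)
    return -tau * np.outer(U[:, 0], Vt[0, :])
```

A conditional gradient iteration then moves the current iterate toward the rank-one atom returned by `fw_step_nuclear`, so iterates stay low-rank by construction; the adaptive part of the thesis's scheme, decreasing the smoothing parameter mu along the iterations, is not reproduced here.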

Cited literature: 96 references

https://tel.archives-ouvertes.fr/tel-01572186
Contributor : Abes Star
Submitted on : Friday, January 12, 2018 - 4:22:06 PM
Last modification on : Wednesday, March 13, 2019 - 3:51:28 PM
Long-term archiving on : Saturday, May 5, 2018 - 9:55:25 PM

File

PIERUCCI_2017_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01572186, version 2

Citation

Federico Pierucci. Nonsmooth optimization for statistical learning with structured matrix regularization. Data Structures and Algorithms [cs.DS]. Université Grenoble Alpes, 2017. English. ⟨NNT : 2017GREAM024⟩. ⟨tel-01572186v2⟩
