
On efficient methods for high-dimensional statistical estimation

Dmitry Babichev 1, 2
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
Abstract : In this thesis we consider several aspects of parameter estimation for statistics and machine learning, together with optimization techniques applicable to these problems. The goal of parameter estimation is to find the unknown hidden parameters that govern the data, for example the parameters of an unknown probability density. Constructing estimators through optimization problems is only one side of the coin: finding the optimal value of the parameter is often itself an optimization problem that must be solved using various optimization techniques. Fortunately, these optimization problems are convex for a wide class of problems, and we can exploit their structure to get fast convergence rates. The first main contribution of the thesis is to develop moment-matching techniques for multi-index non-linear regression problems. We consider the classical non-linear regression problem, which is infeasible in high dimensions due to the curse of dimensionality. We combine two existing techniques, ADE and SIR, to develop a hybrid method that avoids some of the weaknesses of its parents. In the second main contribution we use a special type of averaging for stochastic gradient descent. We consider conditional exponential families (such as logistic regression), where the goal is to find the unknown value of the parameter. Classical approaches, such as SGD with constant step-size, are known to converge only to some neighborhood of the optimal value of the parameter, even with averaging. We propose averaging the moment parameters, which we call prediction functions. For finite-dimensional models this type of averaging can lead to negative error, i.e., this approach provides an estimator better than any linear estimator can ever achieve. The third main contribution of this thesis deals with Fenchel-Young losses. We consider multi-class linear classifiers with losses of a certain type, such that their dual conjugate has a direct product of simplices as a support.
We show that, for multi-class SVM losses with smart matrix-multiplication sampling techniques, our approach has a sublinear iteration complexity, i.e., each iteration costs only O(n+d+k) for k classes, d features and n samples, whereas all existing techniques have higher complexity.

Cited literature : 133 references
Contributor : Abes Star
Submitted on : Thursday, June 18, 2020 - 4:45:25 PM
Last modification on : Tuesday, September 22, 2020 - 3:49:33 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02433016, version 2



Dmitry Babichev. On efficient methods for high-dimensional statistical estimation. Machine Learning [stat.ML]. PSL Research University, 2019. English. ⟨NNT : 2019PSLEE032⟩. ⟨tel-02433016v2⟩


