Regularization Path Algorithm for Statistical Learning

Abstract : The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, a set of parameters has to be chosen, each of which corresponds to a different degree of "complexity". In this situation, model selection becomes a search for the optimal "complexity", which allows us to estimate a model that generalizes well. This model selection problem can be summarized as the computation of one or more hyperparameters that define the model complexity, in contrast to the parameters that specify a model within the chosen complexity class.

The usual approach to determining these hyperparameters is a "grid search": given a set of candidate values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach that consists of computing the complete set of solutions for all hyperparameter values. This set is called the regularization path. It can be shown that for the problems we are interested in, which are parametric quadratic programs (PQP), the corresponding regularization path is piecewise linear. Moreover, computing it is no more complex than computing a single PQP solution.
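As a rough illustration of why a piecewise linear path removes the need for a grid, consider a toy one-dimensional parametric quadratic program, min over x of 0.5*(x - a)^2 + lam*|x|. This problem is not taken from the thesis and is far simpler than the SVM problems it treats; it is chosen only because its path is known in closed form. The function names below (`soft_threshold_path`, `evaluate_path`, `solve_direct`) are hypothetical. The sketch stores the path as its breakpoints and recovers the solution for any lam by linear interpolation.

```python
from bisect import bisect_right


def solve_direct(a, lam):
    """Solve min_x 0.5*(x - a)**2 + lam*|x| for one fixed lam >= 0.

    The closed-form solution is sign(a) * max(|a| - lam, 0).
    """
    magnitude = max(abs(a) - lam, 0.0)
    return magnitude if a >= 0 else -magnitude


def soft_threshold_path(a):
    """Return the breakpoints (lam, x) of the whole solution path.

    The path is linear on [0, |a|] and identically zero afterwards,
    so two breakpoints describe it for every lam >= 0.
    """
    return [(0.0, float(a)), (abs(a), 0.0)]


def evaluate_path(breakpoints, lam):
    """Read the solution at lam off the stored path by linear interpolation."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    lams = [b[0] for b in breakpoints]
    if lam >= lams[-1]:
        # Past the last breakpoint the solution stays constant.
        return breakpoints[-1][1]
    i = bisect_right(lams, lam) - 1
    (l0, x0), (l1, x1) = breakpoints[i], breakpoints[i + 1]
    t = (lam - l0) / (l1 - l0)
    return x0 + t * (x1 - x0)


if __name__ == "__main__":
    a = 3.0
    path = soft_threshold_path(a)
    # A grid search would call solve_direct once per candidate value;
    # the path answers every query from the same two breakpoints.
    for lam in (0.0, 0.5, 1.0, 3.0, 4.0):
        print(lam, solve_direct(a, lam), evaluate_path(path, lam))
```

In this toy case the path costs no more to build than a single solve. For the SVM problems in the thesis the breakpoints must be found by an algorithm that tracks changes in the active set, which this sketch does not attempt.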

This thesis is organized in three chapters. The first introduces the general setting of a learning problem within the Support Vector Machine (SVM) framework, together with the theory and algorithms that allow us to find a solution. The second deals with supervised learning problems for classification and ranking using the SVM framework. It is shown that the regularization path of these problems is piecewise linear, and alternative proofs to that of Rosset (2004) are given via the subdifferential. These results lead to the corresponding algorithms for solving these supervised problems. The third deals with semi-supervised learning problems, followed by unsupervised learning problems. For semi-supervised learning, a sparsity constraint is introduced along with the corresponding regularization path algorithm. Graph-based dimensionality reduction methods are used for the unsupervised learning problems. Our main contribution is a novel algorithm that chooses the number of nearest neighbors adaptively, in contrast to classical approaches based on a fixed number of neighbors.
Document type : Theses

Cited literature [20 references]

https://tel.archives-ouvertes.fr/tel-00422854
Contributor : Karina Zapien
Submitted on : Thursday, October 8, 2009 - 1:27:19 PM
Last modification on : Tuesday, February 5, 2019 - 11:44:21 AM
Long-term archiving on : Tuesday, June 15, 2010 - 10:34:34 PM

Identifiers

  • HAL Id : tel-00422854, version 1

Citation

Zapien Karina. Regularization Path Algorithm for Statistical Learning. Computer Science [cs]. INSA de Rouen, 2009. English. ⟨tel-00422854⟩
