Model selection via cross-validation in density estimation, regression, and change-points detection

Abstract : This thesis studies a family of resampling algorithms referred to as cross-validation, and in particular one of them, named leave-$p$-out. Although extensively used in practice, these algorithms remain poorly understood, especially in the non-asymptotic framework. Our analysis of the leave-$p$-out algorithm is carried out in both density estimation and regression. Its main concern is to better understand how cross-validation depends on the cardinality $p$ of the test set it relies on. From a general point of view, cross-validation is devoted to estimating the risk of an estimator. The leave-$p$-out is usually intractable because of its prohibitive computational complexity. However, we turn it into a feasible procedure thanks to closed-form formulas for the risk estimator of a wide range of widespread estimators. In addition, the question of model selection via cross-validation is considered through two approaches. The first one relies on the optimal estimation of the risk in terms of a bias-variance tradeoff, which results in a density estimation procedure based on a fully data-driven choice of $p$. This procedure is successfully applied to the multiple testing problem. The second approach is related to the interpretation of cross-validation as a penalized criterion. The quality of the leave-$p$-out procedure is theoretically assessed through oracle inequalities as well as an adaptivity result in the density estimation setup. The change-point detection problem is another concern of this work. It is explored through an extensive simulation study based on theoretical considerations. From this, we propose a fully resampling-based procedure, which makes it possible to deal with the hard problem of heteroscedasticity while keeping a reasonable computational complexity.
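To make the computational issue raised in the abstract concrete, here is a minimal sketch of a brute-force leave-$p$-out risk estimate. It is not the closed-form formula of the thesis. It assumes a regular histogram density estimator on [0, 1] and the usual L2 contrast (squared norm of the estimate minus twice its average value on the test points). The function names `bin_index`, `histogram_heights`, and `lpo_risk` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import comb

import numpy as np


def bin_index(x, n_bins):
    """Index of the regular bin of [0, 1] containing each point of x."""
    # Clip so that x == 1.0 falls in the last bin.
    return np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1)


def histogram_heights(train, n_bins):
    """Heights of a regular histogram density estimate on [0, 1]."""
    counts = np.bincount(bin_index(train, n_bins), minlength=n_bins)
    return counts * n_bins / len(train)  # count / (n_train * bin_width)


def lpo_risk(sample, n_bins, p):
    """Brute-force leave-p-out estimate of the L2 risk (up to a constant).

    Enumerates all C(n, p) test sets, so it is only usable for small n and p.
    """
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    total = 0.0
    for test in combinations(range(n), p):
        test = list(test)
        mask = np.ones(n, dtype=bool)
        mask[test] = False  # training set = the n - p remaining points
        heights = histogram_heights(sample[mask], n_bins)
        # Squared L2 norm of the histogram: sum of heights^2 times bin width.
        sq_norm = np.sum(heights ** 2) / n_bins
        # Average value of the estimate at the p held-out points.
        test_mean = np.mean(heights[bin_index(sample[test], n_bins)])
        total += sq_norm - 2.0 * test_mean
    return total / comb(n, p)
```

The loop visits every one of the $\binom{n}{p}$ test sets, which is the prohibitive cost the abstract refers to; a closed-form expression for the same average would remove the enumeration entirely.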
Document type :
Theses
Mathematics. Université Paris Sud - Paris XI, 2008. English


https://tel.archives-ouvertes.fr/tel-00346320
Contributor : Alain Celisse
Submitted on : Thursday, December 11, 2008 - 1:48:42 PM
Last modification on : Wednesday, February 29, 2012 - 12:48:05 PM

Identifiers

  • HAL Id : tel-00346320, version 1

Citation

Alain Celisse. Model selection via cross-validation in density estimation, regression, and change-points detection. Mathematics. Université Paris Sud - Paris XI, 2008. English. <tel-00346320>

Metrics

Record views : 563

Document downloads : 227