Model selection via cross-validation in density estimation, regression, and change-points detection

Abstract : In this thesis, we study a family of resampling algorithms referred to as cross-validation, and especially one of them, named leave-$p$-out. Although extensively used in practice, these algorithms remain poorly understood, especially in the non-asymptotic framework. Our analysis of the leave-$p$-out algorithm is carried out in both density estimation and regression. Its main goal is to better understand how cross-validation depends on the cardinality $p$ of the test set it relies on. From a general point of view, cross-validation aims at estimating the risk of an estimator. Because of its usually prohibitive computational complexity, the leave-$p$-out is intractable in general. However, we turn it into a feasible procedure thanks to closed-form formulas for the leave-$p$-out risk estimator of a wide range of widely used estimators. Moreover, model selection via cross-validation is considered through two approaches. The first one relies on the optimal estimation of the risk in terms of a bias-variance tradeoff, which results in a density estimation procedure based on a fully data-driven choice of $p$. This procedure is successfully applied to the multiple testing problem. The second approach relies on the interpretation of cross-validation as a penalized criterion. The quality of the leave-$p$-out procedure is theoretically assessed through oracle inequalities, as well as through an adaptivity result in the density estimation setup. Change-points detection is another concern of this work. It is explored through an extensive simulation study grounded in theoretical considerations. From this study, we propose a fully resampling-based procedure that can deal with the hard problem of heteroscedasticity while keeping the computational complexity reasonable.
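To illustrate why closed-form formulas make the leave-$p$-out tractable, here is a minimal sketch for one simple case, assuming the least-squares contrast $\gamma(s,x)=\|s\|_2^2-2s(x)$ and a histogram estimator on a fixed partition with bin widths $\omega_\lambda$ and bin counts $n_\lambda$. Averaging the contrast over all $\binom{n}{p}$ train/test splits, using hypergeometric moments of the training counts, gives $\widehat{R}_p = \frac{2n-p}{(n-1)(n-p)}\sum_\lambda \frac{n_\lambda}{n\,\omega_\lambda} - \frac{n(n-p+1)}{(n-1)(n-p)}\sum_\lambda \frac{(n_\lambda/n)^2}{\omega_\lambda}$. This derivation and the helper names (`bin_counts`, `lpo_histogram_closed_form`, `lpo_histogram_brute_force`, `select_num_bins`) are illustrative assumptions, not code or notation taken from the thesis. The brute-force function enumerates every split to check the closed form on a tiny sample, and `select_num_bins` shows one hypothetical way to use it for model selection among regular histograms on $[0,1)$.

```python
from itertools import combinations


def bin_counts(xs, edges):
    """Hypothetical helper: count points of xs in each half-open bin [edges[k], edges[k + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in xs:
        for k in range(len(edges) - 1):
            if edges[k] <= x < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def lpo_histogram_closed_form(xs, edges, p):
    """Hypothetical helper: leave-p-out L2 risk estimate of a histogram, one pass over the bins."""
    n = len(xs)
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    total = 0.0
    for count, left, right in zip(bin_counts(xs, edges), edges[:-1], edges[1:]):
        width = right - left
        # Both terms of the closed form (derived from hypergeometric moments of the counts).
        total += (2 * n - p) * count / (n * width)
        total -= (n - p + 1) * count ** 2 / (n * width)
    return total / ((n - 1) * (n - p))


def lpo_histogram_brute_force(xs, edges, p):
    """Average the L2 contrast over all C(n, p) train/test splits (exponential cost)."""
    n = len(xs)
    m = n - p  # training-set size
    widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
    scores = []
    for test_idx in combinations(range(n), p):
        test_set = set(test_idx)
        train = [xs[i] for i in range(n) if i not in test_set]
        test = [xs[i] for i in test_idx]
        # Histogram heights fitted on the training set only.
        heights = [c / (m * w) for c, w in zip(bin_counts(train, edges), widths)]
        sq_norm = sum(h * h * w for h, w in zip(heights, widths))
        # Mean value of the fitted density at the test points.
        mean_eval = sum(h * c for h, c in zip(heights, bin_counts(test, edges))) / p
        scores.append(sq_norm - 2 * mean_eval)
    return sum(scores) / len(scores)


def select_num_bins(xs, p, candidates):
    """Hypothetical helper: number of regular bins on [0, 1) minimizing the LPO risk."""
    def risk(d):
        edges = [k / d for k in range(d + 1)]
        return lpo_histogram_closed_form(xs, edges, p)
    return min(candidates, key=risk)


if __name__ == "__main__":
    xs = [0.05, 0.12, 0.33, 0.41, 0.47, 0.58, 0.71, 0.93]
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    for p in range(1, len(xs)):
        # Closed form and exhaustive enumeration should agree up to rounding.
        print(p, lpo_histogram_closed_form(xs, edges, p), lpo_histogram_brute_force(xs, edges, p))
    print("selected number of bins:", select_num_bins(xs, p=2, candidates=[1, 2, 4, 8]))
```

The enumeration costs $\binom{n}{p}$ contrast evaluations, whereas the closed form needs a single pass over the bins; this gap is what makes the naive leave-$p$-out intractable for moderate $n$ and $p$.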
Document type : Theses
Mathematics [math]. Université Paris Sud - Paris XI, 2008. English
Contributor : Alain Celisse
Submitted on : Thursday, December 11, 2008 - 1:48:42 PM
Last modification on : Thursday, October 1, 2015 - 12:41:01 PM


  • HAL Id : tel-00346320, version 1



Alain Celisse. Model selection via cross-validation in density estimation, regression, and change-points detection. Mathematics [math]. Université Paris Sud - Paris XI, 2008. English. <tel-00346320>



