
Statistiques discrètes et Statistiques bayésiennes en grande dimension (Discrete Statistics and High-Dimensional Bayesian Statistics)

Abstract: In this PhD thesis we present the results we obtained in three linked fields: data compression for infinite alphabets, infinite-dimensional Bayesian statistics, and multivariate multinomial mixture models.

The first part deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on classes of stationary memoryless sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes. An asymptotic equivalent of the minimax redundancy of these classes is obtained. We then provide an approximately maximin prior distribution and propose an adaptive algorithm whose maximum redundancy is asymptotically equivalent to the minimax redundancy.

The second part deals with the asymptotic normality of posterior distributions (Bernstein-von Mises theorems) in several nonparametric and semiparametric frameworks. We first consider Gaussian linear regression models in which the number of regressors increases with the sample size. Two kinds of Bernstein-von Mises theorems are obtained in this framework: nonparametric theorems for the parameter itself, and semiparametric theorems for functionals of the parameter. We apply them to the Gaussian sequence model and to the regression of functions in Sobolev and Hölder regularity classes, for which we obtain the minimax convergence rates. In these applications, the Bayesian estimators of functionals are adaptive. We also obtain a nonparametric Bernstein-von Mises theorem for exponential models of increasing dimension.

In the last part we consider the problem of estimating the number of components and the relevant variables in a multivariate multinomial mixture, in order to perform unsupervised classification. Such models arise in particular with multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained.
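To make the ingredients of a penalized maximum likelihood criterion concrete, here is a minimal Python sketch for a multivariate multinomial mixture (a latent class model). It assumes that the variables are conditionally independent given the component, and that variables outside a hypothetical set `relevant` share the same distribution across components. The function names, the parameter layout, and the three penalty shapes (AIC-like, BIC-like, and a generic `kappa * D`) are illustrative assumptions. They are not the criterion proposed in the thesis, whose exact penalty is not stated here.

```python
import math
from typing import Sequence


def n_free_params(K: int, levels: Sequence[int], relevant: set) -> int:
    """Dimension D of a K-component multinomial mixture in which only the
    variables whose index is in `relevant` have component-specific
    distributions (an assumption of this sketch)."""
    d = K - 1  # mixing proportions sum to one
    for l, r in enumerate(levels):
        # each categorical distribution on r levels has r - 1 free probabilities;
        # relevant variables get one distribution per component, others a shared one
        d += (K if l in relevant else 1) * (r - 1)
    return d


def log_likelihood(data, weights, probs) -> float:
    """data: list of tuples of category indices, one entry per variable.
    weights[k]: mixing proportion of component k.
    probs[k][l][c]: P(variable l = c | component k).
    For irrelevant variables the caller should pass identical rows
    probs[k][l] for every k; this function does not enforce it."""
    total = 0.0
    for x in data:
        # variables are conditionally independent given the component
        mix = sum(
            w * math.prod(probs[k][l][c] for l, c in enumerate(x))
            for k, w in enumerate(weights)
        )
        total += math.log(mix)
    return total


def penalized_criterion(loglik: float, dim: int, n: int,
                        kind: str = "bic", kappa: float = 1.0) -> float:
    """Smaller is better. Criteria are halved so the fit term is -loglik."""
    if kind == "aic":
        pen = dim                      # AIC / 2
    elif kind == "bic":
        pen = 0.5 * dim * math.log(n)  # BIC / 2
    elif kind == "linear":
        pen = kappa * dim              # generic kappa * D shape (hypothetical)
    else:
        raise ValueError(f"unknown penalty kind: {kind}")
    return -loglik + pen
```

Model selection then amounts to fitting each candidate (number of components, set of relevant variables), typically by EM, and keeping the candidate with the smallest criterion value.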
In practice, the criterion must be calibrated using the slope heuristics, in an automatic data-driven procedure. Using simulated data, we found that this procedure improves the performance of model selection compared with classical criteria such as BIC and AIC. The procedures are implemented in freely available software.
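The slope heuristics can be implemented in several ways. The sketch below uses one common variant, a data-driven estimate of the slope; it is an assumption, since the exact calibration procedure of the thesis is not described here. For each candidate model m, `contrast[m]` is the negative maximized log-likelihood and `pen_shape[m]` a penalty shape such as the model dimension. For the most complex models the contrast decreases roughly linearly in the penalty shape; minus that slope estimates the minimal penalty constant, and the final penalty is taken as twice the minimal one. The function name and the `frac_complex` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def slope_heuristic_select(contrast: Sequence[float],
                           pen_shape: Sequence[float],
                           frac_complex: float = 0.5) -> Tuple[int, float]:
    """Return (index of the selected model, estimated minimal constant kappa_min).

    contrast[m]  : -max log-likelihood of model m
    pen_shape[m] : penalty shape of model m, e.g. its dimension D_m
    frac_complex : fraction of models, by largest pen_shape, used for the slope fit
    """
    order = sorted(range(len(pen_shape)), key=lambda m: pen_shape[m])
    n_fit = max(2, int(len(order) * frac_complex))
    fit = order[-n_fit:]  # the most complex models
    xs: List[float] = [pen_shape[m] for m in fit]
    ys: List[float] = [contrast[m] for m in fit]

    # ordinary least-squares slope of contrast against penalty shape
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("penalty shapes of the complex models are all equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # the contrast should decrease with complexity, so kappa_min = -slope
    kappa_min = max(-slope, 0.0)

    # final penalty: twice the minimal penalty
    crit = [contrast[m] + 2 * kappa_min * pen_shape[m] for m in range(len(contrast))]
    best = min(range(len(crit)), key=crit.__getitem__)
    return best, kappa_min


# Usage with made-up contrasts for models of dimension 1..8 (synthetic numbers):
dims = [1, 2, 3, 4, 5, 6, 7, 8]
contrasts = [80, 65, 50, 49, 48, 47, 46, 45]
best, kappa = slope_heuristic_select(contrasts, dims)
print(best, kappa)  # 2 1.0
```

On these synthetic values the fitted slope over the four largest models is -1, so the selected model is the one of dimension 3, where the contrast stops dropping sharply.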
Contributor: Dominique Bontemps
Submitted on: Tuesday, February 1, 2011 - 5:37:52 PM
Last modification on: Sunday, June 26, 2022 - 11:53:18 AM
Long-term archiving on: Tuesday, November 6, 2012 - 1:10:17 PM


  • HAL Id: tel-00561749, version 1



Dominique Bontemps. Statistiques discrètes et Statistiques bayésiennes en grande dimension. Mathematics [math]. Université Paris Sud - Paris XI, 2010. French. ⟨tel-00561749⟩


