Statistiques discrètes et Statistiques bayésiennes en grande dimension

Abstract : This PhD thesis presents results in three related fields: data compression for infinite alphabets, infinite-dimensional Bayesian statistics, and multivariate multinomial mixture models. The first part deals with universal lossless coding over a countably infinite alphabet. It focuses on classes of stationary memoryless sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes. An asymptotic equivalent of the minimax redundancy of such classes is obtained. An approximately maximin prior distribution is then provided, and an adaptive algorithm is proposed whose maximum redundancy is asymptotically equivalent to the minimax redundancy. The second part deals with the asymptotic normality of posterior distributions (Bernstein-von Mises theorems) in several nonparametric and semiparametric frameworks. We first consider Gaussian linear regression models in which the number of regressors increases with the sample size. Two kinds of Bernstein-von Mises theorems are obtained in this framework: nonparametric theorems for the parameter itself, and semiparametric theorems for functionals of the parameter. We apply them to the Gaussian sequence model and to the regression of functions in Sobolev and Hölder regularity classes, for which we obtain the minimax convergence rates. In our applications, the Bayesian estimators of functionals are adaptive. We also obtain a nonparametric Bernstein-von Mises theorem for exponential models of increasing dimension. In the last part, we consider the problem of estimating the number of components and selecting the relevant variables in a multivariate multinomial mixture, in order to perform unsupervised classification. Such models arise in particular when dealing with multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained. The criterion used in practice is calibrated with the slope heuristics, in an automatic data-driven procedure. On simulated data, this procedure improves the performance of the selection compared with classical criteria such as BIC and AIC. The procedures are implemented in freely available software.
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-00561749
Contributor : Dominique Bontemps
Submitted on : Tuesday, February 1, 2011 - 5:37:52 PM
Last modification on : Thursday, January 11, 2018 - 6:12:18 AM
Long-term archiving on : Tuesday, November 6, 2012 - 1:10:17 PM

Identifiers

  • HAL Id : tel-00561749, version 1

Citation

Dominique Bontemps. Statistiques discrètes et Statistiques bayésiennes en grande dimension. Mathematics [math]. Université Paris Sud - Paris XI, 2010. French. ⟨tel-00561749⟩

Metrics

Record views : 432
File downloads : 397