Habilitation à diriger des recherches

Résumé des Travaux en Statistique et Applications des Statistiques

Abstract : The present report surveys the essentials of my research activity since my PhD thesis [53], which was
mainly devoted to extending the use of recent advances in Computational Harmonic Analysis (such
as wavelet analysis) for adaptive nonparametric estimation methods in the i.i.d. setting to statistical
estimation based on Markovian data. As explained at length in [123], certain concentration of
measure properties (i.e. deviation probability and moment inequalities over functional classes,
specifically tailored for nonlinear approximation) are crucially required for taking advantage of
these analytical tools in statistical settings and getting estimation procedures with convergence
rates surpassing those of older methods. In [53] (see also [54], [55] and [56]), the regenerative
method (refer to [185]), consisting in dividing Harris Markov sample paths into asymptotically
i.i.d. blocks, has been crucially exploited for establishing the required probabilistic results, the
long term behavior of Markov processes being governed by certain renewal processes (the blocks
being actually determined by renewal times). Once an estimator has been constructed, however, estimating
the accuracy (measured by the variance, particular quantiles or any functional of the distribution
function) of the computed statistic becomes of crucial importance. In this respect and beyond its
practical simplicity (it consists in resampling data by making i.i.d. draws in the original data sample
and recomputing the statistic from the bootstrap data sample), the bootstrap is known to have major
theoretical advantages over asymptotic normal approximation in the i.i.d. setting (it automatically
approximates the second-order structure in the Edgeworth expansion of the distribution of the statistic).
I then turned naturally to the problem of extending the popular bootstrap procedure to Markovian
data. Through the work Patrice Bertail and I have jointly carried out, the regenerative method
proved to be not only a powerful analytical tool for proving probabilistic limit theorems
or inequalities, but also of practical use for statistical estimation: our proposed bootstrap
generalization is based on resampling (a random number of) regeneration data blocks (or
approximations of the latter) so as to mimic the renewal structure of the data. This method has
also been shown to be advantageous for many other statistical purposes. The first part of the
report strives to present the principle of regeneration-based statistical methods for Harris Markov
chains, as well as some of the various results obtained this way, in a comprehensive manner.
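As a rough illustration of the regeneration-based resampling idea described above, the following Python sketch splits a sample path at its visits to a known atom and resamples the resulting blocks with replacement. It is a simplified sketch under stated assumptions, not the procedure of the cited works: the chain is assumed to have a known atom (here the state 0), the statistic is the sample mean, the function names are hypothetical, and the stopping rule (draw blocks until the resampled path is at least as long as the original) is a choice made for this example only.

```python
import random


def regeneration_blocks(path, atom):
    """Split a path into blocks that start at a visit to `atom` and end just
    before the next visit. The segment before the first visit and the segment
    from the last visit onward are dropped (they are not complete cycles)."""
    hits = [i for i, x in enumerate(path) if x == atom]
    return [path[a:b] for a, b in zip(hits, hits[1:])]


def regenerative_bootstrap_means(path, atom, n_boot=200, seed=0):
    """Return `n_boot` bootstrap replicates of the sample mean, obtained by
    drawing regeneration blocks i.i.d. with replacement (assumed stopping rule:
    stop once the resampled length reaches the original block-path length)."""
    rng = random.Random(seed)
    blocks = regeneration_blocks(path, atom)
    if not blocks:
        raise ValueError("the path must visit the atom at least twice")
    n = sum(len(b) for b in blocks)
    means = []
    for _ in range(n_boot):
        total, length = 0.0, 0
        # The number of blocks drawn is random, since block lengths differ.
        while length < n:
            block = rng.choice(blocks)
            total += sum(block)
            length += len(block)
        means.append(total / length)
    return means


# Toy path on the states {0, 1, 2}; state 0 plays the role of the atom.
toy_path = [0, 1, 2, 0, 1, 0, 2, 2, 1, 0, 1]
toy_replicates = regenerative_bootstrap_means(toy_path, atom=0, n_boot=100, seed=1)
```

The spread of `toy_replicates` (for instance its empirical variance or quantiles) can then serve as an estimate of the accuracy of the sample mean. Resampling whole blocks rather than single observations is what preserves the dependence structure within each regeneration cycle.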
The second part of the report is devoted to the problem of learning how to order instances,
instead of classifying them only, in a supervised setting. This difficult problem is of practical
importance in many areas, ranging from medical diagnosis to information retrieval (IR), and raises
challenging theoretical and algorithmic questions, with no entirely satisfactory answers yet. A possible
approach to this subject consists in reducing the problem to a pairwise classification problem,
as suggested by a popular criterion (namely, the AUC criterion) widely used for evaluating the
relevance of an ordering. In this context some results have been obtained in joint work with
Gabor Lugosi and Nicolas Vayatis, involving the study of U-processes, the major novelty being
that the natural estimates of the risk here take the form of a U-statistic. However,
in many applications such as IR, only the top-ranked instances are effectively scanned, and a criterion
corresponding to such local ranking problems, as well as methods for computing optimal ordering
rules with respect to it, are crucially needed. Further developments in this direction have been considered in an (ongoing) series of works in collaboration with Nicolas Vayatis.
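To make the pairwise reduction concrete, the Python sketch below computes two empirical quantities for binary labels: the AUC of a scoring rule, and a pairwise ranking risk written as an average over all ordered pairs of observations, which is the form of a U-statistic of order two. The definitions used are the usual textbook ones and are an assumption of this sketch, since the abstract does not spell them out; the function names are hypothetical.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked in the right order by the
    scores, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))


def empirical_ranking_risk(scores, labels):
    """Average, over all ordered pairs (i, j) with i != j, of the indicator
    that the scores order the pair against the labels: a U-statistic of
    order two with kernel 1{(y_i - y_j) * (s_i - s_j) < 0}."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two observations are required")
    discordant = 0
    for i in range(n):
        for j in range(n):
            if i != j and (labels[i] - labels[j]) * (scores[i] - scores[j]) < 0:
                discordant += 1
    return discordant / (n * (n - 1))


# Toy data: five instances with binary labels and real-valued scores.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.1]
toy_labels = [1, 0, 1, 1, 0]
toy_auc = empirical_auc(toy_scores, toy_labels)
toy_risk = empirical_ranking_risk(toy_scores, toy_labels)
```

When the scores have no ties, the two quantities are linked by `risk = 2 * n_pos * n_neg * (1 - auc) / (n * (n - 1))`, so minimizing this pairwise risk amounts to maximizing the AUC.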
Finally, the last part of the report reflects my interest in practical applications of probabilistic
concepts and statistical tools. My personal background led me to consider first applications in
finance. Although historical approaches are not preferred in this domain, I have been progressively
convinced that nonparametric statistics could play a major role in analyzing massive (very
high-dimensional and high-frequency) financial data, in order to detect hidden structure in them
and take advantage of it in risk assessment or portfolio selection, for instance. As an
illustration, the works I have carried out with Skander Slim in that direction are briefly
described in this third part. Recently, I also happened to meet applied mathematicians and scientists
working in other fields that may naturally interface with applied probability and statistics. Hence,
applications to toxicology, and in particular to dietary exposure to toxic chemicals, have also been one
of my concerns this past year, which I have spent in the multidisciplinary research unit Metarisk
of the National Institute for Agricultural Research (INRA), entirely dedicated to dietary risk analysis. I could
thus make use of my skills in Markov modelling for proposing a stochastic model describing the
temporal evolution of the total body burden of a chemical (in such a way that both the toxicokinetics and
the dietary behavior may be taken into account) and adequate inference methods for the latter in
a joint work with P. Bertail and J. Tressou. This line of research is still going on and will hopefully
provide practical insight and guidance for dietary contamination control in public health practice.
It is also briefly presented in this last part. Besides, I currently have the great opportunity to work
on the modelling of the AIDS epidemic with H. de Arazoza, B. Auvert, P. Bertail, R. Lounes and C.
Tran, based on the available Cuban epidemic data, which form one of the best-documented databases on
any HIV epidemic. While such a research project (taking place in the framework of the ACI-NIM
"Epidemic Modelling") aims at providing a numerical model (for computing incidence predictions
on short horizons for instance, so as to plan the quantity of antiretrovirals required), it also poses
very challenging probabilistic and statistical problems, ranging from proving the existence of
a quasi-stationary distribution describing the long-term behavior of the epidemic to the difficulties
encountered due to the incomplete character of the available epidemic data. Unfortunately, they
are not discussed here: presenting the wide variety of mathematical problems arising in this project
without distorting it would have deserved a whole report.
Document type :
Habilitation à diriger des recherches

Cited literature : 198 references

https://tel.archives-ouvertes.fr/tel-00138299
Contributor : Stephan Clémençon
Submitted on : Thursday, March 29, 2007 - 1:34:58 PM
Last modification on : Wednesday, December 9, 2020 - 3:10:37 PM
Long-term archiving on : Friday, September 21, 2012 - 1:25:22 PM

Identifiers

  • HAL Id : tel-00138299, version 1
  • PRODINRA : 251816

Citation

Stéphan Clémençon. Résumé des Travaux en Statistique et Applications des Statistiques. Mathématiques [math]. Université de Nanterre - Paris X, 2006. ⟨tel-00138299⟩
