Abstract: This manuscript provides a mathematical study of the multiple testing problem in settings motivated by modern applications, for which the number of variables is much larger than the sample size. As we will see, this problem depends strongly on the nature of the data and on the desired interpretation of the results.
Chapter 1 is a broad introduction to multiple testing, intended to be accessible to a possibly non-specialist reader; it includes a presentation of some high-dimensional genomic data. The necessary probabilistic material is then introduced in Chapter 2, while Chapters 3, 4, 5 and 6 are guided by the findings [P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P15, P16] listed on page 81. Compared to the original papers, however, I have tried to simplify and unify the studies as much as possible. An effort has also been made to present self-contained proofs when possible. Let us also mention that, as upstream work, I wrote a survey paper [P13]. The overlap between that survey and the present manuscript nevertheless turns out to be minor.
The publications [P1, P2, P3, P6, P7, P14] correspond to research carried out essentially during my PhD at Paris-Sud University and INRA. The papers [P11, P15] are related to my postdoctoral position at the VU University Amsterdam. I developed the work [P4, P5, P8, P9, P10, P12, P13, P16] afterwards, as a “maître de conférences” at the Pierre et Marie Curie University in Paris.
Throughout this manuscript, we will see that while the multiple testing problem occurs in various practical and concrete situations, it relies on an astonishingly wide variety of theoretical concepts, such as combinatorics, resampling, empirical processes, concentration inequalities and positive dependence, among others. This symbiosis between theory and practice explains the worldwide success of multiple testing, which has become a prominent research area of contemporary statistics.