Statistiques de scan : théorie et application à l'épidémiologie

Abstract : The concept of cluster means the aggregation of events in time and / or space. In many areas, experts observe certain aggregations of events and the question arises whether these aggregations can be considered normal (by chance) or not. From a probabilistic point of view, normality can be described by a null hypothesis of random distribution of events.The detection of clusters of events is an area of statistics that has particularly spread over the past decades. First, the scientific community has focused on developing methods for the one-dimensional framework (eg time) and then subsequently extended these methods to the multidimensional case, especially two-dimensional (space). Of all the methods for detecting clusters of events, three major types of tests can be distinguished. The first type concerns global tests that detect an overall tendency to aggregation, without locating any clusters. The second type corresponds to the focused tests that are used when a priori knowledge is used to define a point source (date or spatial location) and to test the aggregation around it. The third type includes the cluster detection tests that allow localization, without a priori, cluster of events and test their statistical significance. In this thesis, we focused on the latter category, especially to methods based on scan statistics.These methods have emerged in the early 1960s and can detect clusters of events and determine their \"normal" appearance (coincidence) or "abnormal". The detection step is performed by scanning through a window, namely scanning window, the studied area (discrete or continuous, time, space), in which the events are observed. This detection step leads to a set of windows, each defining a potential cluster. A scan statistic is a random variable defined as the window with the maximum number of events observed.Scan statistics are used as a test statistic to check the independence and belonging to a given distribution of observations, against an alternative hypothesis supporting the existence of cluster within the studied region. Moreover, the main difficulty lies in determining the distribution of scan statistics under the null hypothesis. Indeed, since it is defined as the maximum of a sequence of dependent random variables, the dependence is due to the recovery of different windows scan, it exists only in very rare cases explicit solutions. Also, a piece of literature is focused on the development of methods (exact formulas and approximations) to determine the distribution of scan statistics. Moreover, in the two-dimensional framework, the scanning window can take various geometric shapes (rectangular, circular, ...) that could have an influence on the approximation of the distribution of the scan statistic. However, to our knowledge, no study has evaluated this influence. In the spatial context, the spatial scan statistics developed by M. Kulldorff are the most commonly used methods for spatial cluster detection. The principle of these methods lies in scanning the studied area with circular windows and selecting the most likely cluster maximizing a likelihood ratio test statistic. Statistical inference of the latter is achieved through Monte Carlo simulations. However, in the case of huge databases and / or when important accuracy of the critical probability associated with the detected cluster is required, Monte Carlo simulations are extremely time-consuming.First , we evaluated the influence of the scanning window shape on the distribution of two dimensional discrete scan statistics. A simulation study performed with squared, rectangular and discrete circle scanning windows has highlighted the fact that the distributions of the associated scan statistics are very close each to other but significantly different. The power of the scan statistics is related to the shape of the scanning window and that of the existing cluster under alternative hypothesis through out a simulation study. [...]
Keywords : Scan statistics
Document type :
Complete list of metadatas

Cited literature [124 references]  Display  Hide  Download
Contributor : Abes Star <>
Submitted on : Wednesday, June 11, 2014 - 5:42:08 PM
Last modification on : Tuesday, July 3, 2018 - 11:43:15 AM
Long-term archiving on : Thursday, September 11, 2014 - 12:55:29 PM


Version validated by the jury (STAR)


  • HAL Id : tel-01004929, version 1



Mickaël Genin. Statistiques de scan : théorie et application à l'épidémiologie. Médecine humaine et pathologie. Université du Droit et de la Santé - Lille II, 2013. Français. ⟨NNT : 2013LIL2S029⟩. ⟨tel-01004929⟩



Record views


Files downloads