
Towards a Robust View of Geometric Inference (Vers une vision robuste de l'inférence géométrique)

Claire Brécheteau 1, 2, 3
1 DATASHAPE - Understanding the Shape of Data
CRISAM - Inria Sophia Antipolis - Méditerranée, Inria Saclay - Ile de France
2 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract: It is essential to establish effective and robust methods for extracting relevant information from datasets. We focus on datasets that can be represented as point clouds in a metric space, e.g. the Euclidean space R^d, and that are generated according to some distribution. Of the natural questions that arise when one has access to data, three are addressed in this thesis.

The first question concerns the comparison of two sets of points: how can one decide whether two datasets have been generated according to similar distributions? We build a statistical test that allows one to decide whether two point clouds have been generated from distributions that are equal up to a rigid transformation (e.g. a symmetry, translation or rotation).

The second question is about the decomposition of a set of points into clusters: given a point cloud, how does one form relevant clusters? Often, this consists of selecting a set of k representatives and associating every point with its closest representative (in a sense to be defined). We develop methods suited to data sampled according to a mixture of k distributions, possibly with outliers.

Finally, when the data cannot be grouped naturally into k clusters, e.g. when they are generated in a close neighborhood of some submanifold of R^d, a more relevant question is the following: how can one build a system of k representatives, with k large, from which the submanifold can be recovered? This last question is related to the problems of quantization and compact set inference. To address it, we introduce and study a modification of the k-means method adapted to the presence of outliers, in the context of quantization.

The answers brought in this thesis are of two types, theoretical and algorithmic. The methods we develop are based on continuous objects built from distributions and sub-measures. Statistical studies allow us to measure how close the empirical objects are to their continuous counterparts.
These methods are easy to implement in practice once samples of points are available. The main tool in this thesis is the distance-to-measure function, which was originally introduced to make topological data analysis work in the presence of outliers.
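To make the main tool concrete, here is a minimal Python sketch of the empirical distance-to-measure (DTM), assuming its usual k-nearest-neighbour form: with mass parameter m = k/n, the value at x is the square root of the mean squared distance from x to its k nearest sample points. It also suggests why DTM values suit a comparison "up to rigid transformation": the DTM values of the sample points depend only on pairwise distances. Comparing two clouds through the 1-Wasserstein distance between their sorted DTM values is one natural invariant statistic. The function names are hypothetical, equal sample sizes are assumed, and the calibration of an actual test (how the rejection threshold is chosen) is not shown and is not claimed to match the thesis.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dtm(x: Point, sample: Sequence[Point], k: int) -> float:
    """Empirical distance-to-measure at x (assumed k-NN form, m = k / n):
    root of the mean squared distance from x to its k nearest sample points."""
    sq = sorted(sum((a - b) ** 2 for a, b in zip(x, p)) for p in sample)
    return math.sqrt(sum(sq[:k]) / k)


def dtm_signature(sample: Sequence[Point], k: int) -> List[float]:
    """Sorted DTM values of the sample points themselves. They depend only on
    pairwise distances, so rotations, reflections and translations of the
    whole cloud leave them unchanged."""
    return sorted(dtm(p, sample, k) for p in sample)


def signature_distance(a: Sequence[Point], b: Sequence[Point], k: int) -> float:
    """1-Wasserstein distance between the two signatures. For two samples of
    the same size it is the mean absolute gap between sorted values."""
    sa, sb = dtm_signature(a, k), dtm_signature(b, k)
    if len(sa) != len(sb):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(u - v) for u, v in zip(sa, sb)) / len(sa)
```

For example, a cloud and a rotated, translated copy of it have identical signatures, whereas a rescaled copy does not, since scaling changes distances.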
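The abstract does not detail the modified k-means method, so the following is only a generic illustration of how k-means can be made robust to outliers. It sketches trimmed k-means, a standard technique that is not claimed to be the thesis's algorithm. At each Lloyd iteration, every point is assigned to its nearest centre, only the q points with the smallest distances are kept, and each centre is recomputed as the mean of its kept points. The function name, the parameter q and the user-supplied initial centres are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def trimmed_kmeans(points: Sequence[Point], centers: Sequence[Point],
                   q: int, n_iter: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd-type trimmed k-means: at each step only the q points closest to
    their nearest centre are kept; the others are treated as outliers and
    ignored when the centres are recomputed. Returns the centres and the
    indices of the points trimmed at the last assignment step."""
    centers = [tuple(c) for c in centers]
    outliers: List[int] = []
    for _ in range(n_iter):
        # Nearest centre and squared distance for every point.
        nearest = []
        for i, p in enumerate(points):
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            nearest.append((sq_dist(p, centers[j]), i, j))
        # Keep the q best-fitting points; trim the rest.
        nearest.sort()
        kept, trimmed = nearest[:q], nearest[q:]
        outliers = sorted(i for _, i, _ in trimmed)
        # Recompute each centre as the mean of its kept points.
        new_centers = []
        for j, c in enumerate(centers):
            members = [points[i] for _, i, jj in kept if jj == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centers.append(c)  # an empty cluster keeps its centre
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, outliers
```

With two tight groups of four points, one distant point and q = 8, the distant point is trimmed instead of dragging a centre towards it.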

Cited literature: 145 references
Contributor: Claire Brécheteau
Submitted on: Wednesday, October 17, 2018 - 3:52:58 PM
Last modification on: Wednesday, September 16, 2020 - 5:30:46 PM
Long-term archiving on: Friday, January 18, 2019 - 2:54:13 PM


Files produced by the author(s)


  • HAL Id: tel-01897787, version 1


Claire Brécheteau. Vers une vision robuste de l'inférence géométrique [Towards a robust view of geometric inference]. Statistics [math.ST]. Université Paris Sud (Paris 11) - Université Paris Saclay, 2018. French. ⟨tel-01897787v1⟩


