
Efficient Approximations of High-Dimensional Data

Abstract : In this thesis, we study approximations of set systems (X,S), where X is a base set and S consists of subsets of X called ranges. Given a finite set system, our goal is to construct a small subset of X such that each range is "well-approximated". In particular, for a given parameter epsilon in (0,1), we say that a subset A of X is an epsilon-approximation of (X,S) if for any range R in S, the fractions |A cap R|/|A| and |R|/|X| are epsilon-close, that is, they differ by at most epsilon.

Research on such approximations started in the 1950s, with random sampling being the key tool for showing their existence. Since then, the notion of approximations has become a fundamental structure across several communities: learning theory, statistics, combinatorics and algorithms. A breakthrough in the study of approximations dates back to 1971, when Vapnik and Chervonenkis studied set systems with finite VC dimension, which turned out to be a key parameter for characterising their complexity. For instance, if a set system (X,S) has VC dimension d, then a uniform sample of O(d/epsilon^2) points is an epsilon-approximation of (X,S) with high probability. Importantly, the size of the approximation depends only on epsilon and d, and it is independent of the input sizes |X| and |S|.

In the first part of this thesis, we give a modular, self-contained, intuitive proof of the above uniform sampling guarantee.

In the second part, we give an improvement of a 30-year-old algorithmic bottleneck: constructing matchings with low crossing number. This can be applied to build approximations with improved guarantees.

Finally, we answer a 30-year-old open problem of Blumer et al. by proving tight lower bounds on the VC dimension of unions of half-spaces, a geometric set system that appears in several applications, e.g. coreset constructions.
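The definition of an epsilon-approximation in the abstract can be checked directly on a small example. The following is a minimal sketch, not taken from the thesis: the function names (max_discrepancy, is_eps_approximation, sample_size, interval_ranges) are hypothetical, the set system of integer intervals (VC dimension 2) is chosen only for illustration, and the constant c in the sample size is an assumption, since the abstract only states the O(d/epsilon^2) bound without its hidden constant.

```python
import math
import random


def max_discrepancy(X, A, ranges):
    """Largest value of | |A cap R|/|A| - |R|/|X| | over the given ranges."""
    A_set = set(A)
    worst = 0.0
    for R in ranges:
        frac_sample = len(A_set & R) / len(A_set)
        frac_full = len(R) / len(X)
        worst = max(worst, abs(frac_sample - frac_full))
    return worst


def is_eps_approximation(X, A, ranges, eps):
    """True if A is an eps-approximation of (X, ranges) per the definition above."""
    return max_discrepancy(X, A, ranges) <= eps


def sample_size(d, eps, c=1.0):
    """Sample size c * d / eps^2, rounded up.

    The constant c is an assumption for illustration; the abstract gives
    only the asymptotic bound O(d / eps^2).
    """
    return math.ceil(c * d / eps ** 2)


def interval_ranges(n):
    """All intervals {i, ..., j} on the base set {0, ..., n-1} (VC dimension 2)."""
    return [set(range(i, j + 1)) for i in range(n) for j in range(i, n)]


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed so the run is repeatable
    n, d, eps = 100, 2, 0.2
    X = list(range(n))
    ranges = interval_ranges(n)
    m = min(n, sample_size(d, eps))  # uniform sample without replacement
    A = rng.sample(X, m)
    print("sample size:", m)
    print("max discrepancy:", max_discrepancy(X, A, ranges))
    print("is eps-approximation:", is_eps_approximation(X, A, ranges, eps))
```

Whether a particular random sample passes the check depends on the draw and on the assumed constant c; the guarantee quoted in the abstract is probabilistic and asymptotic, so a single run of this sketch illustrates the definition rather than verifying the bound.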
Contributor : ABES STAR
Submitted on : Thursday, September 22, 2022 - 11:42:32 AM
Last modification on : Monday, October 3, 2022 - 11:25:30 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03783594, version 1



Mónika Csikós. Efficient Approximations of High-Dimensional Data. Logic [math.LO]. Université Gustave Eiffel, 2022. English. ⟨NNT : 2022UEFL2004⟩. ⟨tel-03783594⟩


