
Efficient Sequential Learning in Structured and Constrained Environments

Daniele Calandriello 1
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : The main advantage of non-parametric models is that the accuracy of the model (degrees of freedom) adapts to the number of samples. The main drawback is the so-called "curse of kernelization": to learn the model we must first compute a similarity matrix among all samples, which requires quadratic space and time and is infeasible for large datasets. Nonetheless, the underlying effective dimension (effective d.o.f.) of the dataset is often much smaller than its size, and we can replace the dataset with a subset (dictionary) of highly informative samples. Unfortunately, fast data-oblivious selection methods (e.g., uniform sampling) almost always discard useful information, while data-adaptive methods that provably construct an accurate dictionary, such as ridge leverage score (RLS) sampling, have a quadratic time/space cost. In this thesis we introduce a new single-pass streaming RLS sampling approach that sequentially constructs the dictionary, where each step compares a new sample only with the current intermediate dictionary and not with all past samples. We prove that the size of all intermediate dictionaries scales only with the effective dimension of the dataset, and therefore guarantee a per-step time and space complexity independent of the number of samples. This reduces the overall time required to construct provably accurate dictionaries from quadratic to near-linear, or even logarithmic when parallelized. Finally, for many non-parametric learning problems (e.g., K-PCA, graph SSL, online kernel learning) we show that the generated dictionaries can be used to compute approximate solutions in near-linear time that are both provably accurate and empirically competitive.
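To make the quantities in the abstract concrete, the following is a minimal sketch of the exact, batch version of RLS sampling, i.e. the quadratic-cost baseline that the thesis aims to avoid, not the streaming algorithm it introduces. It assumes the standard definitions: for a kernel matrix K and regularization gamma, the ridge leverage score of sample i is the i-th diagonal entry of K (K + gamma I)^{-1}, and the effective dimension is the sum of these scores. The function names, the Gaussian kernel, the Bernoulli sampling scheme, and the `oversampling` parameter are illustrative assumptions and do not come from the thesis.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) similarity between the rows of X and the rows of Y."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def ridge_leverage_scores(K, gamma):
    """Exact RLS: tau_i = [K (K + gamma I)^{-1}]_{ii}.

    Needs the full n x n matrix K, hence quadratic space and
    (at least) quadratic time in the number of samples n.
    """
    n = K.shape[0]
    # (K + gamma I)^{-1} K has the same diagonal as K (K + gamma I)^{-1}
    # because the two factors commute.
    return np.diag(np.linalg.solve(K + gamma * np.eye(n), K)).copy()


def sample_dictionary(tau, oversampling, rng):
    """Keep sample i independently with probability min(1, oversampling * tau_i).

    Returns the kept indices and importance weights 1 / p_i.
    """
    p = np.minimum(1.0, oversampling * tau)
    keep = rng.random(tau.shape[0]) < p
    idx = np.flatnonzero(keep)
    return idx, 1.0 / p[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three tight clusters: many samples, but few "informative" directions.
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
    X = np.vstack([c + 0.1 * rng.normal(size=(100, 2)) for c in centers])

    K = gaussian_kernel(X, X)
    tau = ridge_leverage_scores(K, gamma=1.0)
    idx, weights = sample_dictionary(tau, oversampling=5.0, rng=rng)

    print("samples:", X.shape[0])
    print("effective dimension:", tau.sum())
    print("dictionary size:", idx.size)
```

In this sketch the expected dictionary size is bounded by the oversampling factor times the effective dimension, so it tracks the effective dimension and not the number of samples. The cost is hidden in `ridge_leverage_scores`, which still builds and factorizes the full similarity matrix.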
Contributor : Daniele Calandriello
Submitted on : Friday, June 15, 2018 - 7:42:45 PM
Last modification on : Friday, December 11, 2020 - 6:44:05 PM
Long-term archiving on : Monday, September 17, 2018 - 10:41:24 AM


  • HAL Id : tel-01816904, version 1


Daniele Calandriello. Efficient Sequential Learning in Structured and Constrained Environments. Machine Learning [cs.LG]. Inria Lille Nord Europe - Laboratoire CRIStAL - Université de Lille, 2017. English. ⟨tel-01816904⟩


