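Paramètres d'ordre et sélection de modèles en apprentissage : caractérisation des modèles et sélection d'attributs (Order parameters and model selection in machine learning: model characterization and feature selection)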

Abstract : This thesis addresses model selection in machine learning from two points of view. The first part focuses on relational kernel methods. Kernel methods aim to avoid propositionalizing the instances and to bridge the gap between relational and propositional problems. The thesis examines this objective in a particular case: the multiple instance problem, which is considered intermediate between relational and propositional problems. Concretely, we determine under which conditions the averaging kernel used for multiple instance problems makes it possible to reconstruct the target concept. This study follows the standard outline of phase transition studies and relies on a new criterion to test the efficiency of the propositionalization induced by the averaging kernel.

The second part of the thesis focuses on feature selection. One way to solve multiple instance problems, as presented in the first part, is to construct a propositionalization in which each instance of the problem yields a feature. This propositionalization produces a huge number of features, which calls for selecting a subset containing only the relevant ones. The second part therefore presents a new framework for feature selection. Feature selection is formalized as a reinforcement learning problem, leading to a provably optimal though intractable selection policy. This optimal policy is approximated using a one-player game approach that relies on the Monte-Carlo tree search algorithm UCT (Upper Confidence bound applied to Trees) proposed by Kocsis and Szepesvári (2006). The Feature Uct SElection (FUSE) algorithm extends UCT to deal with (i) a finite unknown horizon (the target number of relevant features) and (ii) the huge branching factor of the search tree, which reflects the size of the feature set. Finally, a frugal reward function is proposed as a rough but unbiased estimate of the relevance of a feature subset. A proof of concept of FUSE is shown on benchmark data sets.
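
To make the averaging kernel mentioned in the abstract more concrete, here is a minimal Python sketch of one common form of it: the kernel between two bags of instances is the mean of a base kernel over all pairs of instances. The abstract does not give the exact definition or normalization used in the thesis, so the Gaussian (RBF) base kernel, the division by the product of the bag sizes, and the names `rbf_kernel`, `averaging_kernel`, and `gamma` are all assumptions made for illustration.

```python
import math
from typing import Sequence

Instance = Sequence[float]
Bag = Sequence[Instance]


def rbf_kernel(x: Instance, y: Instance, gamma: float = 1.0) -> float:
    """Gaussian base kernel between two instances (an assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def averaging_kernel(bag_a: Bag, bag_b: Bag, gamma: float = 1.0) -> float:
    """Mean of the base kernel over all instance pairs from the two bags.

    Assumed normalization: divide by |bag_a| * |bag_b|.
    """
    if not bag_a or not bag_b:
        raise ValueError("bags must be non-empty")
    total = sum(rbf_kernel(x, y, gamma) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))


# Two toy bags: each bag is a set of 2-dimensional instances.
bag_1 = [(0.0, 0.0), (1.0, 0.0)]
bag_2 = [(0.0, 0.0)]
print(averaging_kernel(bag_1, bag_2))
```

Because each bag is summarized by an average over its instances, this kernel implicitly maps a bag to a single propositional representation, which is the kind of induced propositionalization whose efficiency the first part of the thesis studies.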
Document type : Theses

https://tel.archives-ouvertes.fr/tel-00549090
Contributor : Romaric Gaudel
Submitted on : Tuesday, December 21, 2010 - 11:35:49 AM
Last modification on : Thursday, December 3, 2020 - 1:22:02 PM
Long-term archiving on : Monday, November 5, 2012 - 2:45:11 PM

Identifiers

  • HAL Id : tel-00549090, version 1

Citation

Romaric Gaudel. Paramètres d'ordre et sélection de modèles en apprentissage : caractérisation des modèles et sélection d'attributs. Autre [cs.OH]. Université Paris Sud - Paris XI, 2010. Français. ⟨tel-00549090⟩

Metrics

  • Record views : 679
  • File downloads : 691