Arbres de décision et forêts aléatoires pour variables groupées (Decision trees and random forests for grouped variables)

Abstract : In many supervised learning problems, the inputs have a known or obvious group structure. In this context, a prediction rule that takes the group structure into account can be more relevant, for both prediction accuracy and interpretation, than an approach based only on the individual variables. The goal of this thesis is to develop tree-based methods adapted to grouped variables. We propose two new tree-based approaches that use the group structure to build decision trees. The first approach builds binary decision trees for classification problems. A split of a node is defined by the choice of both a splitting group and a linear combination of the inputs belonging to that group. The second method, which can be used for both regression and classification, builds a non-binary tree in which each split is itself a binary tree. Both approaches build a maximal tree, which is then pruned. To this end, we propose two pruning strategies, one of which generalizes the minimal cost-complexity pruning algorithm. Since decision trees are known to be unstable, we also introduce a random forest method that handles groups of inputs. Beyond prediction, these new methods can also be used to perform group variable selection, thanks to the introduction of measures of group importance. This thesis is supplemented by an independent part in which we consider the unsupervised framework. We introduce a new clustering algorithm. Under some classical regularity and sparsity assumptions, we obtain the rate of convergence of the clustering risk for the proposed algorithm.
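
Illustration : The group-wise split described in the abstract (choose a splitting group, then a linear combination of that group's inputs) can be sketched as follows. This is a minimal sketch under stated assumptions and not the thesis's actual algorithm: the abstract does not say how the linear combination or the split criterion is chosen, so the Fisher discriminant direction and the Gini impurity used here are assumptions, and the names `gini`, `fisher_direction`, `best_group_split` and the toy groups `"a"` and `"b"` are hypothetical.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector (labels 0/1)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def fisher_direction(X, y):
    """Two-class Fisher discriminant direction (an assumed choice of linear combination)."""
    X0, X1 = X[y == 0], X[y == 1]
    c0, c1 = X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)
    # Within-class scatter, with a small ridge term for numerical stability.
    Sw = c0.T @ c0 + c1.T @ c1 + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def best_group_split(X, y, groups):
    """Return (group, weights, threshold, impurity decrease) of the best split, or None."""
    parent = gini(y)
    if parent == 0.0:  # pure node: nothing to split
        return None
    best = None
    for name, cols in groups.items():
        w = fisher_direction(X[:, cols], y)
        z = X[:, cols] @ w  # one score per observation for this group
        s = np.unique(z)
        # Candidate thresholds: midpoints between consecutive distinct scores.
        for t in (s[:-1] + s[1:]) / 2.0:
            left = z <= t
            n_left = left.sum()
            child = (n_left * gini(y[left]) + (len(y) - n_left) * gini(y[~left])) / len(y)
            gain = parent - child
            if best is None or gain > best[3]:
                best = (name, w, t, gain)
    return best

# Toy data: group "a" is pure noise, group "b" carries the class signal.
rng = np.random.default_rng(0)
n = 200
y = np.arange(n) % 2
noise = rng.normal(size=(n, 2))
signal = 0.1 * rng.normal(size=(n, 2)) + y[:, None]
X = np.hstack([noise, signal])
groups = {"a": [0, 1], "b": [2, 3]}

result = best_group_split(X, y, groups)
print(result[0], result[3])
```

On this toy data the search should favour group `"b"`, since only its variables depend on the class label. A full tree would apply the same search recursively to each child node and then prune the result.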
Document type :
Theses
Cited literature [114 references]

https://tel.archives-ouvertes.fr/tel-01977246
Contributor : Abes Star
Submitted on : Thursday, January 10, 2019 - 4:15:07 PM
Last modification on : Friday, July 10, 2020 - 4:17:59 PM
Long-term archiving on : Thursday, April 11, 2019 - 5:42:41 PM

File

POTERIE_Audrey.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01977246, version 1

Citation

Audrey Poterie. Arbres de décision et forêts aléatoires pour variables groupées. Statistiques [math.ST]. INSA de Rennes, 2018. Français. ⟨NNT : 2018ISAR0011⟩. ⟨tel-01977246⟩

Metrics

Record views

333

Files downloads

2270