
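Optimisation combinatoire pour la sélection de variables en régression en grande dimension : Application en génétique animale (Combinatorial optimization for variable selection in high-dimensional regression: application to animal genetics)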

Julie Hamon 1, 2
1 DOLPHIN - Parallel Cooperative Multi-criteria Optimization
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : Advances in high-throughput sequencing and genotyping technologies make it possible to measure large amounts of genomic information. The aim of this work, dedicated to animal genomic selection, is to select a subset of relevant genetic markers to predict a quantitative trait, in a context where the number of genotyped animals is much smaller than the number of markers studied. This thesis first presents a state of the art of existing methods for addressing this problem. We then propose to tackle variable selection in high-dimensional regression by combining combinatorial optimization methods with statistical models. We begin by experimentally configuring two combinatorial optimization methods, iterated local search and a genetic algorithm, each combined with multiple linear regression, and we evaluate their relevance. In animal genomics, family relationships between animals are known and can carry important information. Since our approach is flexible, we propose an adaptation that accounts for these family relationships through a mixed model. Moreover, overfitting is particularly acute in such data because of the large imbalance between the number of variables studied and the number of animals available, so we propose an improvement of our approach that reduces this overfitting. The proposed approaches are validated on data from the literature as well as on real data from Gènes Diffusion.
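The abstract pairs a combinatorial search over marker subsets with a regression model. As a minimal sketch, and not the thesis's implementation, the Python below runs an iterated local search over subsets of columns of a marker matrix `X` and scores each subset by the BIC of an ordinary least-squares fit. The BIC criterion, the add/remove neighbourhood, the random-flip perturbation, the size cap `max_size`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def bic_score(X, y, subset):
    """Assumed criterion: BIC of an OLS fit (intercept + selected markers); lower is better."""
    n = len(y)
    cols = [np.ones(n)] + [X[:, j] for j in sorted(subset)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n + 1e-12) + A.shape[1] * np.log(n)


def local_search(X, y, subset, max_size):
    """Toggle one marker at a time, keeping any move that lowers the BIC."""
    current = set(subset)
    best = bic_score(X, y, current)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = current ^ {j}  # add marker j if absent, remove it if present
            if not cand or len(cand) > max_size:
                continue
            score = bic_score(X, y, cand)
            if score < best:
                current, best, improved = cand, score, True
    return current, best


def iterated_local_search(X, y, max_size=5, n_iter=20, kick=3, seed=0):
    """Perturb the best subset, re-run local search, keep the result if it is better."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    start = {int(rng.integers(p))}
    best_set, best_score = local_search(X, y, start, max_size)
    for _ in range(n_iter):
        # Perturbation ("kick"): toggle a few randomly chosen markers.
        perturbed = set(best_set)
        for j in rng.choice(p, size=kick, replace=False):
            perturbed ^= {int(j)}
        # Repair so the subset stays non-empty and within the size cap.
        while len(perturbed) > max_size:
            perturbed.discard(int(rng.choice(sorted(perturbed))))
        if not perturbed:
            perturbed.add(int(rng.integers(p)))
        cand_set, cand_score = local_search(X, y, perturbed, max_size)
        # Acceptance criterion: only move to strictly better local optima.
        if cand_score < best_score:
            best_set, best_score = cand_set, cand_score
    return best_set, best_score


# Usage on synthetic data: 40 "animals", 100 "markers", two true effects.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 100))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(40)
selected, score = iterated_local_search(X, y, max_size=8)
print(sorted(selected))
```

Capping the subset size is one simple guard when animals are far fewer than markers. The genetic algorithm, the mixed-model adaptation and the overfitting correction described in the abstract are not shown here.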
Cited literature [102 references]

https://tel.archives-ouvertes.fr/tel-00920205
Contributor : Julie Hamon
Submitted on : Wednesday, December 18, 2013 - 8:17:55 AM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Document(s) archived on : Wednesday, March 19, 2014 - 5:44:51 AM

Identifiers

  • HAL Id : tel-00920205, version 1

Citation

Julie Hamon. Optimisation combinatoire pour la sélection de variables en régression en grande dimension : Application en génétique animale. Applications [stat.AP]. Université des Sciences et Technologie de Lille - Lille I, 2013. French. ⟨tel-00920205⟩
