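Conception d'heuristiques d'optimisation pour les problèmes de grande dimension : application à l'analyse de données de puces à ADN [Design of optimization heuristics for high-dimensional problems: application to DNA microarray data analysis]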

Abstract : This PhD thesis addresses the recent challenge of solving high-dimensional problems. We present methods designed to solve them and their application to feature selection problems in the data mining field. In the first part of the thesis, we introduce the challenges of solving high-dimensional problems. We mainly investigate line search methods, because we consider them particularly suitable for such problems. We then present the methods we developed based on this principle: CUS, EUS and EM323. We emphasize, in particular, the very high convergence speed of CUS and EUS and their simplicity of implementation. The EM323 method is a hybridization of EUS with a one-dimensional optimization algorithm developed by F. Glover, the 3-2-3 algorithm. We show that the results of EM323 are more accurate, especially on non-separable problems, which are the weak point of line-search-based methods. In the second part, we focus on data mining problems, especially those arising in microarray data analysis. The objectives are to classify data and to predict the behavior of new samples. A collaboration with the Tenon Hospital in Paris allows us to analyze its private breast cancer data. To this end, we develop an exact method, called delta-test, enhanced by a method designed to automatically select the optimal number of variables. We then develop a heuristic, named ABEUS, based on optimizing the performance of the DLDA classifier. The results obtained on publicly available data show that our methods manage to select very small subsets of variables, which is an important criterion for avoiding overfitting.
Document type :
Theses
Other [cs.OH]. Université Paris-Est, 2011. French. <NNT : 2011PEST1022>


https://tel.archives-ouvertes.fr/tel-00676449
Contributor : Abes Star
Submitted on : Monday, March 5, 2012 - 1:52:31 PM
Last modification on : Tuesday, August 26, 2014 - 4:22:10 PM

File

TH2011PEST1022_complete.pdf

Identifiers

  • HAL Id : tel-00676449, version 1

Citation

Vincent Gardeux. Conception d'heuristiques d'optimisation pour les problèmes de grande dimension : application à l'analyse de données de puces à ADN. Other [cs.OH]. Université Paris-Est, 2011. French. <NNT : 2011PEST1022>. <tel-00676449>

Metrics

Record views : 359

Document downloads : 1677