
Process segmentation/clustering. Application to the analysis of CGH microarray data.

Abstract : This thesis is devoted to the development of a new statistical model for segmentation/clustering problems. The objective is to partition the data into homogeneous regions and to cluster these regions into a finite number of groups. Segmentation/clustering problems are traditionally studied with hidden Markov models. We propose an alternative model which combines segmentation models and mixture models.

We construct our model in the Gaussian case and we propose a generalization to discrete dependent variables. The parameters of the model are estimated by maximum likelihood with a hybrid algorithm based on dynamic programming and on the EM algorithm. We study a new model selection problem which is the simultaneous selection of the number of clusters and of the number of segments. We propose a heuristic for this choice.
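As an illustration of the dynamic-programming ingredient mentioned above, the following is a minimal sketch and not the thesis's actual algorithm. It assumes a fixed number of segments and a least-squares criterion (Gaussian noise with a common variance), and it leaves out the EM step that would assign segments to clusters. The function name `segment_least_squares` and its arguments are hypothetical.

```python
def segment_least_squares(y, n_segments):
    """Split y into n_segments contiguous segments with minimal squared error.

    Illustrative sketch only: segmentation step alone, no clustering/EM.
    Returns (bounds, total_cost), where segment k is y[bounds[k]:bounds[k + 1]].
    """
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums of y and y^2 give the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):
        # Sum of squared deviations from the mean over y[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + seg_cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Walk the back-pointers to recover the segment boundaries.
    bounds = [n]
    j = n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        bounds.append(j)
    bounds.reverse()
    return bounds, best[n_segments][n]
```

Calling `segment_least_squares(signal, 3)` on a list of numbers returns the boundaries of the three segments and the residual sum of squares. The triple loop costs on the order of `n_segments * n**2` operations, which is the usual price of exact dynamic-programming segmentation.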

Our model is applied to the analysis of CGH (Comparative Genomic Hybridization) microarray data. This technique measures the copy number of thousands of genes across the genome in a single experiment. Our method allows us to localize deleted or amplified regions along chromosomes. We also propose an application to the analysis of DNA sequences, to identify regions that are homogeneous in nucleotide composition.
Contributor : Franck Picard
Submitted on : Friday, November 24, 2006 - 1:33:10 PM
Last modification on : Friday, October 23, 2020 - 4:33:48 PM
Long-term archiving on : Thursday, September 20, 2012 - 3:00:54 PM


  • HAL Id : tel-00116025, version 1
  • PRODINRA : 252126



Franck Picard. Process segmentation/clustering. Application to the analysis of CGH microarray data. Mathematics [math]. Université Paris Sud - Paris XI, 2005. English. ⟨tel-00116025⟩


