Process segmentation/clustering. Application to the analysis of CGH microarray data.

Abstract : This thesis is devoted to the development of a new statistical model for segmentation/clustering problems. The objective is to partition the data into homogeneous regions and to cluster these regions into a finite number of groups. Segmentation/clustering problems are traditionally studied with hidden Markov models. We propose an alternative model which combines segmentation models and mixture models.
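One way to make the combination of segmentation and mixture models concrete is sketched below. The notation is an assumption introduced here for illustration and is not taken from this record: $y_1,\dots,y_n$ are the observations, $t_0 < t_1 < \dots < t_K$ are the breakpoints defining $K$ segments $I_k = (t_{k-1}, t_k]$, each segment belongs to one of $P$ clusters with proportions $\pi_p$, and $f(\cdot\,;\theta_p)$ is the density attached to cluster $p$ (for instance Gaussian with mean $m_p$ and variance $s_p^2$).

```latex
% Illustrative formulation (assumed notation): observations within a segment
% share a single hidden cluster label, and labels are drawn independently
% across segments with probabilities \pi_1, \dots, \pi_P.
\log L_{K,P}(T, \pi, \theta)
  = \sum_{k=1}^{K} \log \left( \sum_{p=1}^{P} \pi_p
      \prod_{t \in I_k} f(y_t ; \theta_p) \right),
\qquad \sum_{p=1}^{P} \pi_p = 1 .
```

Under this reading, the breakpoints $T$ carry the segmentation part of the model and the proportions $\pi$ together with the parameters $\theta$ carry the mixture part.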

We construct our model in the Gaussian case and we propose a generalization to discrete dependent variables. The parameters of the model are estimated by maximum likelihood with a hybrid algorithm based on dynamic programming and on the EM algorithm. We study a new model selection problem which is the simultaneous selection of the number of clusters and of the number of segments. We propose a heuristic for this choice.
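The dynamic programming ingredient can be illustrated in isolation. The sketch below is not the hybrid algorithm of the thesis: it assumes a Gaussian signal with a piecewise-constant mean, a common variance, and a fixed number of segments, and it omits the EM step and the clustering of segments entirely. The function names `segment_cost_matrix` and `best_segmentation` are hypothetical.

```python
import numpy as np

def segment_cost_matrix(y):
    """cost[i, j] = sum of squared deviations of y[i..j] from its own mean."""
    n = len(y)
    s1 = np.concatenate(([0.0], np.cumsum(y)))             # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(np.square(y))))  # prefix sums of squares
    cost = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i, n):
            total = s1[j + 1] - s1[i]
            cost[i, j] = (s2[j + 1] - s2[i]) - total * total / (j - i + 1)
    return cost

def best_segmentation(y, n_segments):
    """Return (segment start indices, minimal total within-segment cost)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    cost = segment_cost_matrix(y)
    # best[k, j]: minimal cost of splitting y[0..j] into k + 1 segments
    best = np.full((n_segments, n), np.inf)
    back = np.zeros((n_segments, n), dtype=int)
    best[0] = cost[0]
    for k in range(1, n_segments):
        for j in range(k, n):
            # the last segment starts at t, for t = k, ..., j
            candidates = best[k - 1, k - 1:j] + cost[k:j + 1, j]
            t = int(np.argmin(candidates)) + k
            best[k, j] = candidates[t - k]
            back[k, j] = t
    # backtrack from the last observation to recover the segment starts
    starts, j = [], n - 1
    for k in range(n_segments - 1, 0, -1):
        t = int(back[k, j])
        starts.append(t)
        j = t - 1
    starts.append(0)
    return starts[::-1], float(best[n_segments - 1, n - 1])

starts, total_cost = best_segmentation([0, 0, 0, 5, 5, 5, 1, 1], n_segments=3)
```

In a hybrid scheme of the kind the abstract describes, a step like this would alternate with EM updates of the mixture parameters, with the per-segment cost replaced by a quantity derived from the mixture likelihood. That coupling is not shown here.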

Our model is applied to the analysis of CGH (Comparative Genomic Hybridization) microarray data. This technique measures the copy number of thousands of genes along the genome in a single experiment. Our method allows us to localize deleted or amplified regions along chromosomes. We also propose an application to the analysis of DNA sequences, for the identification of regions that are homogeneous in terms of nucleotide composition.
Document type : Thesis

https://tel.archives-ouvertes.fr/tel-00116025
Contributor : Franck Picard
Submitted on : Friday, November 24, 2006 - 1:33:10 PM
Last modification on : Wednesday, November 29, 2017 - 4:02:37 PM
Long-term archiving on : Thursday, September 20, 2012 - 3:00:54 PM

Identifiers

  • HAL Id : tel-00116025, version 1

Citation

Franck Picard. Process segmentation/clustering. Application to the analysis of CGH microarray data. Mathematics [math]. Université Paris Sud - Paris XI, 2005. English. ⟨tel-00116025⟩