Model selection for unsupervised classification. Choosing the number of classes.

Abstract: The work reported here takes place in the statistical framework of model-based clustering. We focus in particular on choosing the number of classes and on the ICL model selection criterion. A fruitful approach to studying ICL theoretically is to consider a contrast tied to the clustering objective. This entails defining and studying a new estimator and new model selection criteria. Practical methods for computing them are provided; these also apply to computing the usual maximum likelihood estimator in mixture models. The slope heuristics is applied to calibrate the penalized criteria considered, so its theoretical foundations are recalled in detail and two approaches for applying it are studied. Another approach to model-based clustering is also considered, in which each class may itself be modeled by a Gaussian mixture. A methodology is proposed, notably to address the question of which components should be merged. Finally, a criterion is proposed for choosing a number of components (when identified with the number of classes) that is related to a known external classification.
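The abstract centers on the ICL criterion. As a rough illustration only, the sketch below computes the standard entropy-penalized form of ICL, log L(θ̂) - ENT - (ν/2) log n, alongside BIC, from quantities assumed to come from an already-fitted Gaussian mixture: the maximized log-likelihood, the posterior membership probabilities t_ik, and the number of free parameters ν. It does not implement the new estimator or criteria developed in the thesis, and the function names (`mixture_param_count`, `entropy`, `bic`, `icl`) are hypothetical.

```python
import math
from typing import Sequence


def mixture_param_count(k: int, d: int) -> int:
    """Free parameters of a k-component Gaussian mixture in dimension d with
    unrestricted covariances: (k - 1) proportions, k*d means, k*d*(d+1)/2 covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def entropy(resp: Sequence[Sequence[float]]) -> float:
    """ENT = -sum_i sum_k t_ik * log(t_ik), using the convention 0 * log 0 = 0."""
    ent = 0.0
    for row in resp:
        for t in row:
            if t > 0.0:
                ent -= t * math.log(t)
    return ent


def bic(loglik: float, n: int, n_params: int) -> float:
    """BIC on the log-likelihood scale (larger is better)."""
    return loglik - 0.5 * n_params * math.log(n)


def icl(loglik: float, resp: Sequence[Sequence[float]], n_params: int) -> float:
    """Entropy form of ICL: BIC minus the entropy of the soft classification.
    `resp` holds one row of posterior probabilities t_ik per observation."""
    return bic(loglik, len(resp), n_params) - entropy(resp)


if __name__ == "__main__":
    # Hypothetical values for illustration only (not results from the thesis).
    resp = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    nu = mixture_param_count(k=2, d=1)
    print(bic(-10.0, len(resp), nu), icl(-10.0, resp, nu))
```

Because ENT is nonnegative, ICL never exceeds BIC, and the two coincide when every observation is assigned to a component with certainty. ICL therefore favors models whose classes are well separated, which is why it is often preferred when the goal is clustering rather than density estimation.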
Document type: Thesis

https://tel.archives-ouvertes.fr/tel-00461550
Contributor: Jean-Patrick Baudry
Submitted on: Thursday, March 4, 2010 - 6:48:09 PM
Last modification on: Wednesday, November 29, 2017 - 9:34:26 AM
Long-term archiving on: Friday, June 18, 2010 - 10:19:15 PM

Identifiers

  • HAL Id: tel-00461550, version 1


Citation

Jean-Patrick Baudry. Sélection de modèle pour la classification non supervisée. Choix du nombre de classes. Mathematics [math]. Université Paris Sud - Paris XI, 2009. French. ⟨tel-00461550⟩
