Estimation et sélection en classification semi-supervisée (Estimation and selection in semi-supervised classification)

Vincent Vandewalle 1, 2 
2 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
Abstract : This thesis addresses semi-supervised classification from a decision-making perspective. We are interested in the model choice problem when models are estimated from both labeled data and a large amount of unlabeled data. We focus on generative models, for which semi-supervised classification arises naturally, unlike the predictive framework, which requires additional unnatural assumptions. After reviewing the state of the art in semi-supervised classification, we describe how the parameters of a classification model are estimated from labeled and unlabeled data with the EM algorithm. Our contributions on model selection are detailed in the two following chapters. In Chapter 3, we present a statistical test in which unlabeled data are used to test the model. In Chapter 4, we present a model selection criterion, AIC_cond, derived from the AIC criterion from a predictive point of view. We prove the asymptotic convergence of this criterion, which is particularly well suited to the semi-supervised setting, and show its good practical performance compared with cross-validation and other penalized likelihood criteria. The second part of the thesis, not directly connected with the semi-supervised setting, considers multinomial models for the classification of qualitative variables. We designed these models to address the limitations of the parsimonious multinomial models proposed in the MIXMOD program. For this setting, we propose a BIC-type criterion that specifically takes into account the complexity of the constrained multinomial models.
Contributor : Vincent Vandewalle
Submitted on : Thursday, January 14, 2010 - 1:44:37 PM
Last modification on : Friday, October 7, 2022 - 3:45:32 AM
Long-term archiving on : Thursday, June 17, 2010 - 10:46:33 PM


  • HAL Id : tel-00447141, version 1


Vincent Vandewalle. Estimation et sélection en classification semi-supervisée. Mathématiques [math]. Université des Sciences et Technologie de Lille - Lille I, 2009. Français. ⟨tel-00447141⟩


