
Estimation et sélection en classification semi-supervisée (Estimation and selection in semi-supervised classification)

Vincent Vandewalle 1, 2
2 SELECT - Model selection in statistical learning
LMO - Laboratoire de Mathématiques d'Orsay, Inria Saclay - Ile de France
Abstract : This thesis addresses semi-supervised classification from a decision-making perspective. We are interested in the problem of model choice when models are estimated from both labeled data and a large amount of unlabeled data. We focus on generative models, for which semi-supervised classification arises naturally, unlike the predictive framework, which requires additional unnatural assumptions. After reviewing the state of the art in semi-supervised classification, we describe how the parameters of a classification model are estimated from labeled and unlabeled data with the EM algorithm. Our contributions to model selection are detailed in the two following chapters. In Chapter 3, we present a statistical test in which unlabeled data are used to test the model. In Chapter 4, we present a model selection criterion, AIC_cond, derived from the AIC criterion from a predictive point of view. We prove the asymptotic convergence of this criterion, which is particularly well suited to the semi-supervised setting, and show its good practical performance compared with cross-validation and other penalized likelihood criteria. In a second part of the thesis, not directly connected with the semi-supervised setting, multinomial models for the classification of qualitative variables are considered. We designed these models to address the limitations of the parsimonious multinomial models proposed in the MIXMOD program. For this setting, we propose a BIC-type criterion that specifically takes into account the complexity of the constrained multinomial models.
Contributor : Vincent Vandewalle
Submitted on : Thursday, January 14, 2010 - 1:44:37 PM
Last modification on : Friday, November 27, 2020 - 2:18:02 PM
Long-term archiving on : Thursday, June 17, 2010 - 10:46:33 PM


  • HAL Id : tel-00447141, version 1



Vincent Vandewalle. Estimation et sélection en classification semi-supervisée. Mathématiques [math]. Université des Sciences et Technologie de Lille - Lille I, 2009. Français. ⟨tel-00447141⟩


