Theses

Contributions to the estimation of probabilistic discriminative models: semi-supervised learning and feature selection

Nataliya Sokolovska 1
1 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : In this thesis, we investigate the use of parametric probabilistic models for classification tasks in natural language processing. We focus in particular on discriminative models, such as logistic regression and its generalization, conditional random fields (CRFs). Discriminative probabilistic models directly model the conditional probability of a class given an observation. Logistic regression has been widely used due to its simplicity and effectiveness. Conditional random fields make it possible to take structural dependencies into account and are therefore used for structured output prediction. In this study, we address two aspects of modern machine learning, namely semi-supervised learning and model selection, in the context of CRFs. The contribution of this thesis is twofold. First, we consider the framework of semi-supervised learning, propose a novel semi-supervised estimator, and show that it is preferable to standard logistic regression. Second, we study model selection approaches for discriminative models, in particular for CRFs, and propose to penalize CRFs with the elastic net. Since the penalty term is not differentiable at zero, we consider coordinate-wise optimization. A comparison with the performance of other methods demonstrates the competitiveness of CRFs penalized by the elastic net.
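As an illustration of the last point, the following minimal sketch shows why a coordinate-wise scheme copes with a penalty that is not differentiable at zero: each one-dimensional subproblem has a closed-form solution via soft-thresholding. It is not the thesis's algorithm. It assumes plain binary logistic regression rather than a CRF, and it uses a per-coordinate quadratic upper bound on the log-loss, which is a choice made here for simplicity. The names `soft_threshold`, `elastic_net_logreg`, `l1`, `l2` and `n_sweeps` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Closed-form minimizer of 0.5*(w - z)**2 + t*|w| over w."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_logreg(X, y, l1=0.1, l2=0.1, n_sweeps=100):
    """Coordinate-wise fit of binary logistic regression (y in {0, 1}).

    Objective (an assumption of this sketch):
        mean log-loss + l1 * ||w||_1 + 0.5 * l2 * ||w||_2^2
    """
    n, d = X.shape
    w = np.zeros(d)
    # The sigmoid's derivative is at most 1/4, which bounds the curvature
    # of the mean log-loss along each coordinate.
    curvature = 0.25 * (X ** 2).sum(axis=0) / n
    scores = X @ w  # kept in sync with w after every coordinate update
    for _ in range(n_sweeps):
        for j in range(d):
            if curvature[j] == 0.0:
                continue  # all-zero feature: its weight stays at 0
            probs = 0.5 * (1.0 + np.tanh(0.5 * scores))  # stable sigmoid
            grad_j = X[:, j] @ (probs - y) / n
            # Minimize the quadratic upper bound plus the elastic net
            # penalty in coordinate j; the L1 part gives the threshold.
            w_new = soft_threshold(curvature[j] * w[j] - grad_j, l1)
            w_new /= curvature[j] + l2
            scores += (w_new - w[j]) * X[:, j]
            w[j] = w_new
    return w
```

With a sufficiently large `l1`, every coordinate is thresholded to exactly zero, which is the feature selection effect of the L1 part of the penalty; the `l2` part only shrinks the surviving weights.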
Cited literature [183 references]

https://tel.archives-ouvertes.fr/tel-00557662
Contributor : Nataliya Sokolovska
Submitted on : Wednesday, January 19, 2011 - 4:30:32 PM
Last modification on : Friday, April 10, 2020 - 2:10:22 AM
Document(s) archived on : Tuesday, November 6, 2012 - 11:55:12 AM

Identifiers

  • HAL Id : tel-00557662, version 1

Citation

Nataliya Sokolovska. Contributions to the estimation of probabilistic discriminative models: semi-supervised learning and feature selection. Computer Science [cs]. Ecole nationale supérieure des telecommunications - ENST, 2010. English. ⟨tel-00557662⟩
