Optimisation convexe pour la cosegmentation

Armand Joulin 1, 2
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract : People and most animals have a natural ability to see the world and understand it effortlessly. The apparent simplicity of this task suggests that this ability is, to some extent, mechanical, i.e., does not require high-level thinking or profound reasoning. This observation suggests that visual perception of the world should be reproducible on a mechanical device such as a computer. Computer vision is the field of research dedicated to creating a form of visual perception on computers. The first work on computer vision dates from the 1950s, but the computing power needed to process and analyze visual data was not available at that time. It is only recently that improvements in computing power and storage capacity have allowed this field to really emerge. On the one hand, constant progress in computer vision has made it possible to develop dedicated solutions to practical or industrial problems. Detecting human faces, tracking people in crowded areas, and detecting defects on production lines are industrial applications where computer vision is used. On the other hand, when it comes to creating a general visual perception for computers, it is probably fair to say that less progress has been made, and the community is still struggling with fundamental problems. One of these problems is to reproduce our ability to group the visual input data recorded by an optical device into meaningful regions. This procedure, called segmentation, separates a scene into meaningful entities (e.g., objects or actions). Segmentation seems not only natural but essential for people to fully understand a given scene, but it is still very challenging for a computer. One reason is the difficulty of clearly identifying what "meaningful" should be, i.e., depending on the scene or the situation, a region may have different interpretations.
In this thesis, we focus on the segmentation task and try to avoid this fundamental difficulty by considering segmentation as a weakly supervised learning problem. Instead of segmenting images according to some predefined definition of "meaningful" regions, we develop methods to segment multiple images jointly into entities that repeatedly appear across the set of images. In other words, we define "meaningful" regions from a statistical point of view: they are regions that appear frequently in a dataset, and we design procedures to discover them. This leads us to design models whose scope goes beyond this application to vision. Our approach takes its roots in the field of machine learning, whose goal is to design efficient methods to retrieve and/or learn common patterns in data. The field of machine learning has also gained in popularity in recent decades due to improvements in computing power and the ever-growing size of available databases. In this thesis, we focus on methods tailored to retrieving hidden information from poorly annotated data, i.e., data with incomplete or partial annotations. In particular, given a specific segmentation task defined by a set of images, we aim at segmenting the images and learning a related model so as to segment unannotated images. Finally, our research drives us to explore the field of numerical optimization so as to design algorithms especially tailored to our problems. In particular, many numerical problems considered in this thesis cannot be solved by off-the-shelf software because of the complexity of their formulation. We use and adapt recently developed tools to approximate these problems by solvable ones. We illustrate the promise of our formulations and algorithms on other general applications in fields besides computer vision. In particular, we show that our work may also be used in text classification and the discovery of cell configurations.
Document type :
Theses
Cited literature [133 references]

https://tel.archives-ouvertes.fr/tel-00826236
Contributor : Abes Star
Submitted on : Monday, May 27, 2013 - 10:57:10 AM
Last modification on : Wednesday, January 30, 2019 - 10:43:29 AM
Document(s) archived on : Tuesday, April 4, 2017 - 11:26:02 AM

File

Joulin2012.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-00826236, version 1

Citation

Armand Joulin. Optimisation convexe pour la cosegmentation. General Mathematics [math.GM]. École normale supérieure de Cachan - ENS Cachan, 2012. English. ⟨NNT : 2012DENS0086⟩. ⟨tel-00826236⟩

Metrics

Record views : 768

File downloads : 330