Vers l’universalité des représentations visuelle et multimodales (Towards the Universality of Visual and Multimodal Representations)

Abstract: Because of its key societal, economic and cultural stakes, Artificial Intelligence (AI) is a hot topic. One of its main goals is to develop systems that facilitate the daily life of humans, with applications such as household robots, industrial robots, autonomous vehicles and much more. The rise of AI is largely due to the emergence of tools based on deep neural networks, which make it possible to simultaneously learn the representation of the data (traditionally hand-crafted) and the task to solve (traditionally learned with statistical models). This resulted from the conjunction of theoretical advances, growing computational capacity and the availability of large amounts of annotated data. A long-standing goal of AI is to design machines inspired by humans, capable of perceiving the world and interacting with humans in an evolutionary way. In this thesis, we categorize work on AI into the two following learning approaches: (i) Specialization: learn representations from a few specific tasks, with the goal of carrying out very specific tasks (specialized in a certain field) with a very good level of performance; (ii) Universality: learn representations from several general tasks, with the goal of performing as many tasks as possible in different contexts. While specialization has been extensively explored by the deep-learning community, only a few implicit attempts have been made towards universality. The goal of this thesis is thus to explicitly address the problem of improving universality with deep-learning methods, for image and text data. We have addressed this topic in two different forms: through the implementation of methods that improve universality (“universalizing methods”); and through the establishment of a protocol to quantify the universality of a representation.
Concerning universalizing methods, we proposed three technical contributions: (i) in the context of large semantic representations, we proposed a method to reduce redundancy between detectors through adaptive thresholding and the relations between concepts; (ii) in the context of neural-network representations, we proposed an approach that increases the number of detectors without increasing the amount of annotated data; (iii) in the context of multimodal representations, we proposed a method to preserve the semantics of unimodal representations in multimodal ones. Regarding the quantification of universality, we proposed to evaluate universalizing methods in a transfer-learning scheme, since this scheme is well suited to assessing the universal ability of representations. This also led us to propose a new framework as well as new quantitative evaluation criteria for universalizing methods.
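The transfer-learning evaluation idea can be illustrated with a generic linear-probe sketch: freeze a feature extractor learned on source tasks, then train only a light classifier on a target task it never saw, taking the probe's accuracy as a proxy for how transferable the representation is. Everything below is illustrative (synthetic data, a random projection standing in for a pretrained network), not the thesis's actual protocol or criteria.

```python
# Hedged sketch of a transfer-learning evaluation of a representation.
# Assumptions (not from the thesis): a fixed random projection plays the
# role of the frozen pretrained extractor, and the target task is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_features(inputs):
    """Stand-in for a frozen pretrained network: fixed projection + ReLU."""
    projection = np.random.default_rng(42).normal(size=(inputs.shape[1], 64))
    return np.maximum(inputs @ projection, 0.0)

# Synthetic "target task": two classes with shifted means.
X = np.concatenate([rng.normal(0.0, 1.0, (200, 128)),
                    rng.normal(0.8, 1.0, (200, 128))])
y = np.array([0] * 200 + [1] * 200)

# The representation stays frozen; only the linear probe is trained.
F = extract_features(X)
F_tr, F_te, y_tr, y_te = train_test_split(F, y, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(F_tr, y_tr)
accuracy = probe.score(F_te, y_te)
print(f"linear-probe accuracy on target task: {accuracy:.2f}")
```

Comparing such probe accuracies across several unseen target tasks is one simple way to turn "universality" into a measurable quantity; a more universal representation should transfer well to more of them.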

Cited literature: 78 references

https://tel.archives-ouvertes.fr/tel-01828934
Contributor: Abes Star
Submitted on: Tuesday, July 3, 2018 - 3:42:06 PM
Last modification on: Thursday, April 11, 2019 - 8:09:04 AM
Document(s) archived on: Monday, October 1, 2018 - 8:47:56 AM

File

77286_TAMAAZOUSTI_2018_archiva...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01828934, version 1

Citation

Youssef Tamaazousti. Vers l’universalité des représentations visuelle et multimodales. Sciences de l'information et de la communication. Université Paris-Saclay, 2018. Français. ⟨NNT : 2018SACLC038⟩. ⟨tel-01828934⟩

Metrics

Record views: 1302
File downloads: 372