Service interruption on Monday 11 July from 12:30 to 13:00: all the sites of the CCSD (HAL, EpiSciences, SciencesConf, AureHAL) will be inaccessible (network hardware connection).
Skip to Main content Skip to Navigation

Deep generative models : over-generalisation and mode-dropping

Thomas Lucas 1, 2 
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract : This dissertation explores the topic of generative modelling of natural images,which is the task of fitting a data generating distribution.Such models can be used to generate artificial data resembling the true data, or to compress images.Latent variable models, which are at the core of our contributions, seek to capture the main factors of variations of an image into a variable that can be manipulated.In particular we build on two successful latent variable generative models, the generative adversarial network (GAN) and Variational autoencoder (VAE) models.Recently GANs significantly improved the quality of images generated by deep models, obtaining very compelling samples.Unfortunately these models struggle to capture all the modes of the original distribution, ie they do not cover the full variability of the dataset.Conversely, likelihood based models such as VAEs typically cover the full variety of the data well and provide an objective measure of coverage.However these models produce samples of inferior visual quality that are more easily distinguished from real ones.The work presented in this thesis strives for the best of both worlds: to obtain compelling samples while modelling the full support of the distribution.To achieve that, we focus on i) the optimisation problems used and ii) practical model limitations that hinder performance.The first contribution of this manuscript is a deep generative model that encodes global image structure into latent variables, built on the VAE, and autoregressively models low level detail.We propose a training procedure relying on an auxiliary loss function to control what information is captured by the latent variables and what information is left to an autoregressive decoder.Unlike previous approaches to such hybrid models, ours does not need to restrict the capacity of the autoregressive decoder to prevent degenerate models that ignore the latent variables.The second contribution builds on the standard GAN model, which trains a discriminator network to provide feedback to a generative network.The discriminator usually assesses the quality of individual samples, which makes it hard to evaluate the variability of the data.Instead we propose to feed the discriminator with emph{batches} that mix both true and fake samples, and train it to predict the ratio of true samples in the batch.These batches work as approximations of the distribution of generated images and allows the discriminator to approximate distributional statistics.We introduce an architecture that is well suited to solve this problem efficiently,and show experimentally that our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets.The mutual shortcomings of VAEs and GANs can in principle be addressed by training hybrid models that use both types of objective.In our third contribution, we show that usual parametric assumptions made in VAEs induce a conflict between them, leading to lackluster performance of hybrid models.We propose a solution based on deep invertible transformations, that trains a feature space in which usual assumptions can be made without harm.Our approach provides likelihood computations in image space while being able to take advantage of adversarial training.It obtains GAN-like samples that are competitive with fully adversarial models while improving likelihood scores over existing hybrid models at the time of publication, which is a significant advancement.
Complete list of metadata
Contributor : ABES STAR :  Contact
Submitted on : Thursday, January 7, 2021 - 3:51:30 PM
Last modification on : Friday, March 25, 2022 - 9:44:34 AM
Long-term archiving on: : Thursday, April 8, 2021 - 7:34:07 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03102554, version 1



Thomas Lucas. Deep generative models : over-generalisation and mode-dropping. Artificial Intelligence [cs.AI]. Université Grenoble Alpes [2020-..], 2020. English. ⟨NNT : 2020GRALM049⟩. ⟨tel-03102554⟩



Record views


Files downloads