Theses

Deep Learning for 3D Shape Modelling

Roman Klokov 1, 2
2 Thoth - Apprentissage de modèles à partir de données massives
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract : Applying deep learning to geometric 3D data poses various challenges for researchers. The complex nature of geometric 3D data allows it to be represented in different forms: occupancy grids, point clouds, meshes, implicit functions, etc. Each of these representations has already spawned streams of deep neural network models capable of processing and predicting the corresponding data samples for further use in various data recognition, generation, and modification tasks.
Modern deep learning models force researchers to make various design choices associated with their architectures, learning algorithms, and other specific aspects of the chosen applications. Often, these choices are made with the help of heuristics and best-practice methods discovered through numerous costly experimental evaluations. Probabilistic modeling provides an alternative to these methods that makes it possible to formalize machine learning tasks in a meaningful manner and to develop probability-based training objectives. This thesis explores combinations of deep learning based methods and probabilistic modeling applied to geometric 3D data.
The first contribution explores how probabilistic modeling can be applied to the single-view 3D shape inference task. We propose a family of probabilistic models, Probabilistic Reconstruction Networks (PRNs), which treats the task as image-conditioned generation and introduces a global latent variable encoding shape geometry information. We explore different image conditioning options and two different training objectives based on Monte Carlo and variational approximations of the model likelihood. The parameters of every distribution are predicted from the input images by multi-layered convolutional and fully connected neural networks. All the options in the family of models are evaluated on the single-view 3D occupancy grid inference task, using synthetic shapes and the corresponding image renderings from randomized viewpoints.
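As an illustration only, the two kinds of training objective mentioned in the abstract can be written down for a generic image-conditioned latent variable model. The notation here is an assumption and is not taken from the thesis: s denotes a shape, i an input image, z the global latent variable, K the number of Monte Carlo samples, and q_φ an approximate posterior. Whether the decoder sees the image directly (p(s | z, i)) or only through the latent variable (p(s | z)) is one of the conditioning options the abstract alludes to; the more general form is shown.

```latex
\begin{align}
% Marginal likelihood of a shape given an image
p_\theta(s \mid i) &= \int p_\theta(s \mid z, i)\, p_\theta(z \mid i)\, dz, \\
% Monte Carlo approximation using K samples from the conditional prior
\log p_\theta(s \mid i) &\approx \log \frac{1}{K} \sum_{k=1}^{K} p_\theta(s \mid z_k, i),
  \qquad z_k \sim p_\theta(z \mid i), \\
% Variational lower bound with an approximate posterior q_\phi
\log p_\theta(s \mid i) &\ge \mathbb{E}_{q_\phi(z \mid s, i)}\big[\log p_\theta(s \mid z, i)\big]
  - \mathrm{KL}\big(q_\phi(z \mid s, i) \,\|\, p_\theta(z \mid i)\big).
\end{align}
```

These are the standard textbook forms of the two approximations; the exact objectives used for PRNs may differ in detail.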
We show that conditioning the latent variable prior on the input images is sufficient to achieve competitive and state-of-the-art single-view 3D shape inference performance for point cloud based and voxel based metrics, respectively. We additionally demonstrate that the probabilistic objective based on the variational approximation of the likelihood allows the model to obtain better results than the Monte Carlo based approximation.
The second contribution proposes a probabilistic model for 3D point cloud generation. It treats point clouds as distributions over exchangeable variables and uses de Finetti’s representation theorem to define a global latent variable model with conditionally independent distributions for the coordinates of each point. To model these point distributions, a novel type of conditional normalizing flow is proposed, based on discrete coupling of point coordinate dimensions. These flows update the coordinates of each point sample multiple times by dividing them into two groups and inferring the updates for one group of coordinates from the other group and, additionally, from a global latent variable sample, by means of multi-layered fully connected neural networks with parameters shared across all points. We also extend our Discrete Point Flow Networks (DPFNs) from generation to the single-view inference task by conditioning the global latent variable prior in a manner similar to the PRNs from the first contribution. The resulting generative performance demonstrates that DPFNs produce sets of samples of similar quality and diversity to the state of the art based on continuous normalizing flows, while being approximately 30 times faster in both training and sampling. Results in autoencoding and single-view inference tasks show competitive and state-of-the-art performance for the Chamfer distance, F-score, and earth mover’s distance similarity metrics for point clouds.
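The coupling idea described for the second contribution can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DPFN implementation: the affine (scale and shift) form of the update, the tanh-bounded log-scale, the layer sizes, and all function names (`init_mlp`, `mlp`, `coupling_forward`, `coupling_inverse`) are hypothetical choices made for the example. It only shows how one group of coordinates can be updated from the other group and a global latent sample, with one network shared by all points, while remaining exactly invertible.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (illustrative sizes)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the network row-wise, so the same parameters serve every point."""
    for layer, (w, b) in enumerate(params):
        x = x @ w + b
        if layer < len(params) - 1:
            x = np.tanh(x)
    return x

def _scale_and_shift(kept, z, params, n_updated):
    """Predict per-point log-scale and shift from kept coordinates and latent z."""
    cond = np.concatenate([kept, np.tile(z, (kept.shape[0], 1))], axis=1)
    out = mlp(params, cond)
    # Assumption: bound the log-scale with tanh for numerical stability.
    return np.tanh(out[:, :n_updated]), out[:, n_updated:]

def coupling_forward(points, z, params, keep):
    """Update the coordinates where keep is False; leave the others unchanged."""
    log_scale, shift = _scale_and_shift(points[:, keep], z, params, int((~keep).sum()))
    updated = points.copy()
    updated[:, ~keep] = points[:, ~keep] * np.exp(log_scale) + shift
    # Log-determinant of the Jacobian, one value per point.
    return updated, log_scale.sum(axis=1)

def coupling_inverse(updated, z, params, keep):
    """Invert coupling_forward; the kept coordinates were never modified."""
    log_scale, shift = _scale_and_shift(updated[:, keep], z, params, int((~keep).sum()))
    points = updated.copy()
    points[:, ~keep] = (updated[:, ~keep] - shift) * np.exp(-log_scale)
    return points

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))        # five 3D points
z = rng.normal(size=4)                  # global latent sample (dimension assumed)
keep = np.array([True, False, False])   # x conditions the update of y and z
params = init_mlp(rng, [1 + 4, 16, 4])  # input: 1 kept coord + latent; output: 2 log-scales + 2 shifts
moved, log_det = coupling_forward(points, z, params, keep)
restored = coupling_inverse(moved, z, params, keep)
```

Stacking several such layers with different `keep` masks lets every coordinate be updated in turn, which is one common way coupling flows are composed; how the thesis arranges its layers is not stated in the abstract.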

https://tel.archives-ouvertes.fr/tel-03667371
Contributor : Abes Star
Submitted on : Friday, May 13, 2022 - 1:23:14 PM
Last modification on : Saturday, May 14, 2022 - 3:40:11 AM

File

KLOKOV_2021_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03667371, version 1

Collections

STAR | LJK | INRIA2 | INRIA | CNRS | UGA

Citation

Roman Klokov. Deep Learning for 3D Shape Modelling. Neural and Evolutionary Computing [cs.NE]. Université Grenoble Alpes [2020-..], 2021. English. ⟨NNT : 2021GRALM060⟩. ⟨tel-03667371⟩
