Learning Deep Representations : Toward a better new understanding of the deep learning paradigm

Ludovic Arnold 1
1 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : Since 2006, deep learning algorithms which rely on deep architectures with several layers of increasingly complex representations have been able to outperform state-of-the-art methods in several settings. Deep architectures can be very efficient in terms of the number of parameters required to represent complex operations which makes them very appealing to achieve good generalization with small amounts of data. Although training deep architectures has traditionally been considered a difficult problem, a successful approach has been to employ an unsupervised layer-wise pre-training step to initialize deep supervised models. First, unsupervised learning has many benefits w.r.t. generalization because it only relies on unlabeled data which is easily found. Second, the possibility to learn representations layer by layer instead of all layers at once improves generalization further and reduces computational time. However, deep learning is a very recent approach and still poses a lot of theoretical and practical questions concerning the consistency of layer-wise learning with many layers and difficulties such as evaluating performance, performing model selection and optimizing layers. In this thesis we first discuss the limitations of the current variational justification for layer-wise learning which does not generalize well to many layers. We ask if a layer-wise method can ever be truly consistent, i.e. capable of finding an optimal deep model by training one layer at a time without knowledge of the upper layers. We find that layer-wise learning can in fact be consistent and can lead to optimal deep generative models. To do this, we introduce the Best Latent Marginal (BLM) upper bound, a new criterion which represents the maximum log-likelihood of a deep generative model where the upper layers are unspecified. We prove that maximizing this criterion for each layer leads to an optimal deep architecture, provided the rest of the training goes well. Although this criterion cannot be computed exactly, we show that it can be maximized effectively by auto-encoders when the encoder part of the model is allowed to be as rich as possible. This gives a new justification for stacking models trained to reproduce their input and yields better results than the state-of-the-art variational approach. Additionally, we give a tractable approximation of the BLM upper-bound and show that it can accurately estimate the final log-likelihood of models. Taking advantage of these theoretical advances, we propose a new method for performing layer-wise model selection in deep architectures, and a new criterion to assess whether adding more layers is warranted. As for the difficulty of training layers, we also study the impact of metrics and parametrization on the commonly used gradient descent procedure for log-likelihood maximization. We show that gradient descent is implicitly linked with the metric of the underlying space and that the Euclidean metric may often be an unsuitable choice as it introduces a dependence on parametrization and can lead to a breach of symmetry. To mitigate this problem, we study the benefits of the natural gradient and show that it can restore symmetry, regrettably at a high computational cost. We thus propose that a centered parametrization may alleviate the problem with almost no computational overhead.
Document type :
Theses
Complete list of metadatas

Cited literature [221 references]  Display  Hide  Download

https://tel.archives-ouvertes.fr/tel-00842447
Contributor : Abes Star <>
Submitted on : Monday, July 8, 2013 - 3:42:12 PM
Last modification on : Monday, December 9, 2019 - 5:24:06 PM
Long-term archiving on: Wednesday, October 9, 2013 - 4:23:29 AM

Identifiers

  • HAL Id : tel-00842447, version 1

Collections

Citation

Ludovic Arnold. Learning Deep Representations : Toward a better new understanding of the deep learning paradigm. Other [cs.OH]. Université Paris Sud - Paris XI, 2013. English. ⟨NNT : 2013PA112103⟩. ⟨tel-00842447⟩

Share

Metrics

Record views

2687

Files downloads

21883