A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985. ,
DOI : 10.1207/s15516709cog0901_7
No unbiased estimator of the variance of k-fold cross-validation, Journal of Machine Learning Research, vol.5, pp.1089-1105, 2004. ,
Scaling learning algorithms towards AI, Large-Scale Kernel Machines, 2007. ,
Empirical Bernstein stopping, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983
Parallel neural computing based on network duplicating, Parallel Algorithms for Digital Image Processing, Computer Vision and Neural Networks, pp.305-340, 1993. ,
Statistical Learning Theory, 1998. ,
A New Learning Algorithm for Mean Field Boltzmann Machines, Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2002. ,
DOI : 10.1007/3-540-46084-5_57
Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems 19, pp.153-160, 2007. ,
Justifying and Generalizing Contrastive Divergence, Neural Computation, vol.21, issue.6, pp.1601-1621, 2009. ,
Unsupervised feature learning and deep learning: A review and new perspectives, CoRR, abs/1206, 2012. ,
Random search for hyper-parameter optimization, Journal of Machine Learning Research, vol.13, pp.281-305, 2012. ,
Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems 24, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00642998
Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988. ,
Bayesian back-propagation, Complex Systems, vol.5, pp.603-643, 1991. ,
Elements of Information Theory, Wiley-Interscience, 2006. ,
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, pp.1-38, 1977. ,
Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002. ,
DOI : 10.1162/089976602760128018
Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006. ,
DOI : 10.1126/science.1127647
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007. ,
DOI : 10.1145/1273496.1273556
Exploring strategies for training deep neural networks, The Journal of Machine Learning Research, vol.10, pp.1-40, 2009. ,
Representational power of restricted Boltzmann machines and deep belief networks, Neural Computation, vol.20, pp.1631-1649, 2008. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998. ,
DOI : 10.1109/5.726791
Learning stochastic feedforward networks, 1990. ,
Annealed importance sampling, 1998. ,
A generative process for sampling contractive auto-encoders, International Conference on Machine Learning, 2012. ,
Deep Boltzmann machines, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.448-455, 2009. ,
On the quantitative analysis of deep belief networks, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.872-879, 2008. ,
DOI : 10.1145/1390156.1390266
Information processing in dynamical systems: foundations of harmony theory, Parallel Distributed Processing, pp.194-281, 1986. ,
Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008. ,
DOI : 10.1145/1390156.1390294
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2238
On the Convergence Properties of the EM Algorithm, The Annals of Statistics, vol.11, issue.1, pp.95-103, 1983. ,
DOI : 10.1214/aos/1176346060
would like to acknowledge the Dagstuhl Seminar No 10361 on the Theory of Evolutionary Computation for inspiring their work on natural gradients and beyond. This work was partially supported by the ANR-2010-COSI-002 grant (SIMINOLE) of the French National Research Agency.
A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985. ,
DOI : 10.1207/s15516709cog0901_7
Bidirectional Relation between CMA Evolution Strategies and Natural Evolution Strategies, Proceedings of Parallel Problem Solving from Nature -PPSN XI, pp.154-163, 2010. ,
DOI : 10.1007/978-3-642-15844-5_16
Convergence of the Continuous Time Trajectories of Isotropic Evolution Strategies on Monotonic $\mathcal{C}^2$-composite Functions, Lecture Notes in Computer Science, vol.7491, issue.1, pp.42-51, 2012. ,
DOI : 10.1007/978-3-642-32937-1_5
Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998. ,
DOI : 10.1162/089976698300017746
Methods of information geometry, volume 191 of Translations of Mathematical Monographs, 2000. ,
Weighted multirecombination evolution strategies, Theoretical Computer Science, pp.18-37, 2006. ,
DOI : 10.1016/j.tcs.2006.04.003
URL : http://doi.org/10.1016/j.tcs.2006.04.003
Population based incremental learning: A method for integrating genetic search based function optimization and competitive learning, 1994. ,
Removing the Genetics from the Standard Genetic Algorithm, Proceedings of ICML'95, pp.38-46, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50014-1
Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems 19, pp.153-160, 2007. ,
Unsupervised feature learning and deep learning: A review and new perspectives, CoRR, abs/1206, 2012. ,
Selection and Reinforcement Learning for Combinatorial Optimization, Parallel Problem Solving from Nature PPSN VI, pp.601-610, 2000. ,
DOI : 10.1007/3-540-45356-3_59
An adaptive scheme for real function optimization acting as a selection operator, 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks. Proceedings of the First IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (Cat. No.00EX448), pp.140-149, 2000. ,
DOI : 10.1109/ECNN.2000.886229
Boltzmann machine for population-based incremental learning, ECAI, pp.198-202, 2002. ,
The Theory of Evolution Strategies. Natural Computing Series, 2001. ,
Addressing sampling errors and diversity loss in UMDA, Proceedings of the 9th annual conference on Genetic and evolutionary computation , GECCO '07, pp.508-515, 2007. ,
DOI : 10.1145/1276958.1277068
Informative geometry of probability spaces, Exposition. Math, vol.4, issue.4, pp.347-378, 1986. ,
Elements of Information Theory, Wiley-Interscience, 2006. ,
A Tutorial on the Cross-Entropy Method, Annals of Operations Research, vol.134, issue.1, pp.19-67, 2005. ,
DOI : 10.1007/s10479-005-5724-z
Parallel tempering for training of restricted Boltzmann machines, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010. ,
Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift, Evolutionary Computation, vol.13, issue.1, pp.29-42, 2005. ,
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.8825
Unsupervised Learning, Advanced Lectures on Machine Learning, pp.72-112, 2004. ,
Exponential natural evolution strategies, Proceedings of the 12th annual conference on Genetic and evolutionary computation, GECCO '10, pp.393-400, 2010. ,
DOI : 10.1145/1830483.1830557
The CMA evolution strategy: a comparing review, Advances on Estimation of Distribution Algorithms, pp.75-102, 2006. ,
Evaluating the CMA Evolution Strategy on Multimodal Test Functions, Parallel Problem Solving from Nature PPSN VIII, pp.282-291, 2004. ,
DOI : 10.1007/978-3-540-30217-9_29
Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001. ,
DOI : 10.1162/106365601750190398
Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002. ,
DOI : 10.1162/089976602760128018
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
"Direct Search" Solution of Numerical and Statistical Problems, Journal of the ACM, vol.8, issue.2, pp.212-229, 1961. ,
DOI : 10.1145/321062.321069
Improving Evolution Strategies through Active Covariance Matrix Adaptation, 2006 IEEE International Conference on Evolutionary Computation, pp.2814-2821, 2006. ,
DOI : 10.1109/CEC.2006.1688662
An Invariant Form for the Prior Probability in Estimation Problems, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.186, issue.1007, pp.453-461, 1946. ,
DOI : 10.1098/rspa.1946.0056
Numerical solution of stochastic differential equations, Applications of Mathematics, vol.23 ,
Information theory and statistics, 1968. ,
Estimation of distribution algorithms: A new tool for evolutionary computation, 2002. ,
DOI : 10.1007/978-1-4615-1539-5
Topmoumoute online natural gradient algorithm, NIPS, 2007. ,
An information geometry perspective on estimation of distribution algorithms, Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, GECCO '08, pp.2081-2088, 2008. ,
DOI : 10.1145/1388969.1389026
Towards the geometry of estimation of distribution algorithms based on the exponential family, Proceedings of the 11th workshop proceedings on Foundations of genetic algorithms, FOGA '11, pp.230-242, 2011. ,
DOI : 10.1145/1967654.1967675
A simplex method for function minimization, The Computer Journal, pp.308-313, 1965. ,
A survey of optimization by building and using probabilistic models, Proceedings of the 2000 American Control Conference. ACC (IEEE Cat. No.00CH36334), pp.5-20, 2002. ,
DOI : 10.1109/ACC.2000.879173
Information and the Accuracy Attainable in the Estimation of Statistical Parameters, Bull. Calcutta Math. Soc, vol.37, pp.81-91, 1945. ,
DOI : 10.1007/978-1-4612-0919-5_16
Evolutionsstrategie '94. Frommann-Holzboog Verlag, 1994. ,
Analyse. II, volume 43 of Collection Enseignement des Sciences [Collection: The Teaching of Science], Calcul différentiel et équations différentielles, 1992. ,
URL : https://hal.archives-ouvertes.fr/tel-00308504
Evolution and Optimum Seeking. Sixth-generation computer technology series, 1995. ,
Acceleration techniques for the backpropagation algorithm, Neural Networks, pp.110-119, 1990. ,
DOI : 10.1007/3-540-52255-7_32
Information processing in dynamical systems: foundations of harmony theory, Parallel Distributed Processing, pp.194-281, 1986. ,
Efficient natural evolution strategies, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, GECCO '09, pp.539-546, 2009. ,
DOI : 10.1145/1569901.1569976
On the Convergence of Pattern Search Algorithms, SIAM Journal on Optimization, vol.7, issue.1, pp.1-25, 1997. ,
DOI : 10.1137/S1052623493250780
Notes on information geometry and evolutionary processes, eprint arXiv:nlin/0408040, 2004. ,
EEDA: A New Robust Estimation of Distribution Algorithms, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00070802
The genitor algorithm and selection pressure: Why rank-based allocation of reproductive trials is best, Proceedings of the third international conference on Genetic algorithms, pp.116-121, 1989. ,
Optimisation de la topologie pour les réseaux de neurones profonds [Topology optimization for deep neural networks], 17e congrès francophone AFRIF-AFIA Reconnaissance des Formes et Intelligence Artificielle, 2010. ,
Unsupervised layer-wise model selection in deep neural networks, 19th European Conference on Artificial Intelligence, Lisbon, Portugal, pp.915-920, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00488338
Sylvain Chevallier, and Hélène Paugam-Moisy. An introduction to deep learning, European Symposium on Artificial Neural Networks, 2011. ,
Information-geometric optimization algorithms: A unifying picture via invariance principles, ArXiv e-prints, 2011. ,
Layer-wise learning of deep generative models, ArXiv e-prints, 2012. ,
A Learning Algorithm for Boltzmann Machines, Cognitive Science, vol.9, issue.1, pp.147-169, 1985. ,
DOI : 10.1207/s15516709cog0901_7
Learning the structure of deep sparse graphical models, Journal of Machine Learning Research -Proceedings Track, vol.9, pp.1-8, 2010. ,
What regularized auto-encoders learn from the data generating distribution, ArXiv e-prints, 2012. ,
Natural Gradient Works Efficiently in Learning, Neural Computation, vol.10, issue.2, pp.251-276, 1998. ,
DOI : 10.1162/089976698300017746
Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons, Neural Computation, vol.12, issue.6, pp.1399-1409, 2000. ,
Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning, 1994. ,
The BellKor solution to the Netflix Prize, 2007. ,
Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, vol.2, issue.1, 2009. ,
DOI : 10.1561/2200000006
Deep Learning of Representations, 2013. ,
DOI : 10.1007/978-3-642-36657-4_1
Justifying and Generalizing Contrastive Divergence, Neural Computation, vol.21, issue.6, pp.1601-1621, 2009. ,
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.334.5982
Understanding the difficulty of training deep feedforward neural networks, Proceedings of AISTATS 2010, pp.249-256, 2010. ,
Scaling learning algorithms towards AI, Large-Scale Kernel Machines, 2007. ,
Deep generative stochastic networks trainable by backprop, ArXiv e-prints, 2013. ,
The curse of highly variable functions for local kernel machines, Advances in Neural Information Processing Systems 18, 2006. ,
Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems 19, pp.153-160, 2007. ,
Unsupervised feature learning and deep learning: A review and new perspectives, CoRR, abs/1206, 2012. ,
Generalized denoising auto-encoders as generative models, ArXiv e-prints, 2013. ,
Random search for hyper-parameter optimization, Journal of Machine Learning Research, vol.13, pp.281-305, 2012. ,
Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00642998
Neural Networks for Pattern Recognition, 1995. ,
The expectation maximization algorithm: A short tutorial, 2004. ,
Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol.59, issue.4-5, pp.291-294, 1988. ,
Quickly Generating Representative Samples from an RBM-Derived Process, Neural Computation, vol.23, issue.8, pp.2053-2073, 2011. ,
On contrastive divergence learning, Artificial Intelligence and Statistics, 2005. ,
A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines, Proceedings of the NIPS 2012 Workshop on Deep Learning and Unsupervised Feature Learning, 2012. ,
DOI : 10.1007/978-3-642-40728-4_14
Deep neural networks segment neuronal membranes in electron microscopy images, NIPS, pp.2852-2860 ,
Multi-column deep neural network for traffic sign classification, Neural Networks, vol.32, pp.333-338 ,
DOI : 10.1016/j.neunet.2012.02.023
Deep big simple neural nets excel on handwritten digit recognition, 2010. ,
An analysis of single-layer networks in unsupervised feature learning, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2011. ,
A unified architecture for natural language processing, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390177
Unsupervised models of images by spike-and-slab RBMs, Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp.1145-1152, 2011. ,
The spike and slab restricted Boltzmann machine, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.233-241 ,
Probability, frequency and reasonable expectation, American Journal of Physics, vol.14, pp.1-13, 1946. ,
Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems (MCSS), pp.303-314, 1989. ,
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42 ,
DOI : 10.1109/TASL.2011.2134090
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, pp.1-38, 1977. ,
Binary coding of speech spectrograms using a deep auto-encoder, INTERSPEECH, pp.1692-1695, 2010. ,
Parallel tempering for training of restricted Boltzmann machines, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010. ,
On tracking the partition function, Advances in Neural Information Processing Systems 24, pp.2501-2509, 2011. ,
Metric-free natural gradient for joint-training of Boltzmann machines, CoRR, abs/1301, 2013. ,
The difficulty of training deep architectures and the effect of unsupervised pre-training, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009. ,
Continuous sigmoidal belief networks trained using slice sampling, NIPS, pp.452-458, 1996. ,
Variational learning in nonlinear Gaussian belief networks, Neural Computation, vol.11, issue.1, pp.193-213, 1999. ,
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, vol.40, issue.4, pp.193-202, 1980. ,
DOI : 10.1007/BF00344251
Unsupervised Learning, Advanced Lectures on Machine Learning, pp.72-112, 2004. ,
Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS) ,
URL : https://hal.archives-ouvertes.fr/hal-00752497
Measuring invariances in deep networks, Advances in Neural Information Processing Systems 22, pp.646-654, 2009. ,
Spike-and-slab sparse coding for unsupervised feature discovery, CoRR, abs/1201, 2012. ,
Joint training of deep Boltzmann machines for classification, ArXiv e-prints, 2013. ,
Offline Arabic Handwriting Recognition with Multidimensional Recurrent Neural Networks, Guide to OCR for Arabic Scripts, pp.297-313 ,
DOI : 10.1007/978-1-4471-4072-6_12
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, NIPS, pp.545-552, 2008. ,
Theory and use of the EM algorithm, Found. Trends Signal Process, pp.223-296 ,
The CMA evolution strategy: A tutorial, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-01297037
The human brain in numbers: a linearly scaled-up primate brain, Frontiers in Human Neuroscience, vol.3, issue.00031, p.31, 2009. ,
DOI : 10.3389/neuro.09.031.2009
Connectionist learning procedures, Artificial Intelligence, vol.40, issue.1-3, pp.185-234, 1989. ,
DOI : 10.1016/0004-3702(89)90049-0
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.216.5594
Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002. ,
DOI : 10.1162/089976602760128018
A Practical Guide to Training Restricted Boltzmann Machines, 2010. ,
Reducing the dimensionality of data with neural networks, Science, vol.313, issue.5786, pp.504-507, 2006. ,
A Fast Learning Algorithm for Deep Belief Nets, Neural Computation, vol.18, issue.7, pp.1527-1554, 2006. ,
DOI : 10.1162/neco.2006.18.7.1527
Improving neural networks by preventing co-adaptation of feature detectors, 2012. ,
A quantitative description of membrane current and its application to conduction and excitation in nerve, The Journal of Physiology, vol.117, issue.4, pp.500-544, 1952. ,
DOI : 10.1113/jphysiol.1952.sp004764
Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989. ,
DOI : 10.1016/0893-6080(89)90020-8
Estimation of non-normalized statistical models by score matching, J. Mach. Learn. Res, vol.6, pp.695-709, 2005. ,
What is the best multi-stage architecture for object recognition?, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459469
Fast inference in sparse coding algorithms with applications to object recognition, CoRR, abs/1010, 2010. ,
Learning convolutional feature hierarchies for visual recognition, Advances in Neural Information Processing Systems 23, pp.1090-1098 ,
Optimization by Simulated Annealing, Science, vol.220, issue.4598, pp.671-680, 1983. ,
DOI : 10.1126/science.220.4598.671
Convolutional deep belief networks on CIFAR-10, 2010. ,
Learning multiple layers of features from tiny images, 2009. ,
Classification using discriminative restricted Boltzmann machines, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.536-543, 2008. ,
DOI : 10.1145/1390156.1390224
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.8286
An empirical evaluation of deep architectures on problems with many factors of variation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.473-480, 2007. ,
DOI : 10.1145/1273496.1273556
Exploring strategies for training deep neural networks, The Journal of Machine Learning Research, vol.10, pp.1-40, 2009. ,
Learning algorithms for the classification restricted Boltzmann machine, J. Mach. Learn. Res, vol.13, pp.643-669 ,
Building high-level features using large scale unsupervised learning, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013. ,
DOI : 10.1109/ICASSP.2013.6639343
Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems 2, pp.396-404, 1990. ,
Representational power of restricted Boltzmann machines and deep belief networks, Neural Computation, vol.20, issue.6, pp.1631-1649, 2008. ,
A fast natural Newton method, ICML, pp.623-630, 2010. ,
Topmoumoute online natural gradient algorithm, Advances in Neural Information Processing Systems, 2007. ,
Generalization and network design strategies, Connectionism in Perspective, 1989. ,
Convolutional networks for images, speech, and time-series, The Handbook of Brain Theory and Neural Networks, 1995. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998. ,
DOI : 10.1109/5.726791
Efficient backprop, Neural Networks: Tricks of the Trade, 1998. ,
Sparse deep belief net model for visual area V2, Advances in Neural Information Processing Systems, 2007. ,
Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.609-616, 2009. ,
DOI : 10.1145/1553374.1553453
Unsupervised feature learning for audio classification using convolutional deep belief networks, Advances in Neural Information Processing Systems 22, pp.1096-1104, 2009. ,
Inductive principles for restricted Boltzmann machine learning, Journal of Machine Learning Research -Proceedings Track, vol.9, pp.509-516, 2010. ,
Deep learning via Hessian-free optimization, Proceedings of the 27th Annual International Conference on Machine Learning, pp.735-742, 2010. ,
Learning recurrent neural networks with Hessian-free optimization, Lise Getoor and Tobias Scheffer (eds.), Proceedings of the 28th Annual International Conference on Machine Learning, pp.1033-1040, 2011. ,
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.296.4704
Better Digit Recognition with a Committee of Simple Neural Nets, 2011 International Conference on Document Analysis and Recognition, pp.1250-1254, 2011. ,
DOI : 10.1109/ICDAR.2011.252
Non-linear latent factor models for revealing structure in high-dimensional data, 2008. ,
Unsupervised Learning of Image Transformations, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007. ,
DOI : 10.1109/CVPR.2007.383036
Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines, Neural Computation, vol.22, issue.6, pp.1473-1492, 2010. ,
Unsupervised and transfer learning challenge: a deep learning approach, JMLR W&CP: Proceedings of the Unsupervised and Transfer Learning challenge and workshop, pp.97-110 ,
Perceptrons: An introduction to computational geometry, 1969. ,
Deep Boltzmann Machines and the Centering Trick, LNCS, vol.7700, 2012. ,
Evaluating probabilities under high-dimensional latent variable models, Advances in Neural Information Processing Systems, 2009. ,
3D object recognition with deep belief nets, Advances in Neural Information Processing Systems 22, pp.1339-1347, 2009. ,
Rectified linear units improve restricted Boltzmann machines, ICML '10: Proceedings of the 27th international conference on Machine learning, pp.807-814, 2010. ,
Probabilistic inference using Markov chain Monte Carlo methods, 1993. ,
Annealed importance sampling, 1998. ,
Sparse autoencoder. CS294A Lecture notes, 2011. ,
Multimodal deep learning, ICML, pp.689-696, 2011. ,
Numerical optimization, 2006. ,
DOI : 10.1007/b98874
Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature, vol.381, pp.607-609, 1996. ,
Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Research, pp.3311-3325, 1997. ,
Deep belief networks using discriminative features for phone recognition, ICASSP, pp.5060-5063, 2011. ,
Factored 3-way restricted Boltzmann machines for modeling natural images, Journal of Machine Learning Research -Proceedings Track, vol.9, pp.621-628, 2010. ,
Contractive auto-encoders: Explicit invariance during feature extraction, ICML, pp.833-840, 2011. ,
A generative process for sampling contractive auto-encoders, International Conference on Machine Learning, 2012. ,
Monte Carlo Statistical Methods (Springer Texts in Statistics), 2005. ,
Learning internal representations by error propagation, Parallel distributed processing: explorations in the microstructure of cognition, pp.318-362, 1986. ,
Learning and evaluating Boltzmann machines, 2008. ,
Learning in Markov random fields using tempered transitions, Advances in Neural Information Processing Systems 22, pp.1598-1606, 2009. ,
Deep Boltzmann machines, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.448-455, 2009. ,
Semantic hashing, International Journal of Approximate Reasoning, vol.50, issue.7, pp.969-978, 2009. ,
DOI : 10.1016/j.ijar.2008.11.006
A better way to pretrain deep Boltzmann machines, NIPS, pp.2456-2464 ,
On the quantitative analysis of deep belief networks, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.872-879, 2008. ,
DOI : 10.1145/1390156.1390266
Generative versus discriminative training of RBMs for classification of fMRI images, NIPS, pp.1409-1416, 2008. ,
Higher-order Boltzmann machines, AIP Conference Proceedings, pp.398-403, 1986. ,
DOI : 10.1063/1.36246
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.1626
Information processing in dynamical systems: foundations of harmony theory, Parallel Distributed Processing, pp.194-281, 1986. ,
Multimodal learning with deep Boltzmann machines, NIPS, pp.2231-2239, 2012. ,
Deep, Narrow Sigmoid Belief Networks Are Universal Approximators, Neural Computation, vol.20, issue.11, pp.2629-2636, 2008. ,
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.5204
Generating text with recurrent neural networks, Lise Getoor and Tobias Scheffer (eds.), Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pp.1017-1024, 2011. ,
On autoencoders and score matching for energy based models, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pp.1201-1208, 2011. ,
Factored conditional restricted Boltzmann machines for modeling motion style, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pp.1025-1032, 2009. ,
Modeling human motion using binary latent variables, Advances in Neural Information Processing Systems 19, pp.1345-1352, 2007. ,
In all likelihood, deep belief is not enough, Journal of Machine Learning Research, vol.12, pp.3071-3096, 2011. ,
Training restricted Boltzmann machines using approximations to the likelihood gradient, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1064-1071, 2008. ,
DOI : 10.1145/1390156.1390290
Using fast weights to improve persistent contrastive divergence, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.1033-1040, 2009. ,
DOI : 10.1145/1553374.1553506
A Connection Between Score Matching and Denoising Autoencoders, Neural Computation, vol.23, issue.7, pp.1661-1674, 2011. ,
Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008. ,
DOI : 10.1145/1390156.1390294
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2238
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res, vol.11, pp.3371-3408, 2010. ,
No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation, vol.1, issue.1, pp.67-82, 1997. ,
On the convergence properties of the EM algorithm, The Annals of Statistics, pp.95-103, 1983. ,