M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.

M. Al-shedivat, A. G. Wilson, Y. Saatchi, Z. Hu, and E. P. Xing, Learning scalable deep kernels with recurrent structure, Journal of Machine Learning Research, vol.18, p.37, 2017.

M. Anitescu, J. Chen, and L. Wang, A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem, SIAM Journal on Scientific Computing, vol.34, issue.1, pp.240-262, 2012.

A. Asuncion and D. J. Newman, UCI machine learning repository, 2007.

H. Avron, K. L. Clarkson, and D. P. Woodruff, Faster kernel ridge regression using sketching and preconditioning, SIAM Journal on Matrix Analysis and Applications, vol.38, issue.4, pp.1116-1138, 2017.

H. Avron, M. Kapralov, C. Musco, C. Musco, A. Velingker et al., Random Fourier features for kernel ridge regression: Approximation bounds and statistical guarantees, 34th International Conference on Machine Learning (ICML), pp.253-262, 2017.

H. Avron and S. Toledo, Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix, Journal of the ACM (JACM), vol.58, issue.2, p.34, 2011.

E. Awad, S. Levine, M. Kleiman-weiner, S. Dsouza, J. B. Tenenbaum et al., Blaming humans in autonomous vehicle accidents: Shared responsibility across levels of automation, 2018.

O. Axelsson, Iterative Solution Methods, 1994.

S. Bartels, J. Cockayne, I. C. Ipsen, and P. Hennig, Probabilistic linear solvers: A unifying view, 2018.

S. Bartels and P. Hennig, Probabilistic approximate least-squares, 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.676-684, 2016.

M. Bauer, M. Van-der-wilk, and C. E. Rasmussen, Understanding probabilistic sparse Gaussian process approximations, Advances in Neural Information Processing Systems 29 (NeurIPS), pp.1533-1541, 2016.

A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. Siskind, Automatic differentiation in machine learning: A survey, Journal of Machine Learning Research, vol.18, p.43, 2017.

C. Benoît, Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues. Application de la méthode à la résolution d'un système défini d'équations linéaires, Bulletin géodésique, vol.2, issue.1, pp.67-77, 1924.

K. Bimbraw, Autonomous cars: Past, present and future - a review of the developments in the last century, the present scenario and the expected future of autonomous vehicle technology, 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO), pp.191-198, 2015.

D. M. Blei, A. Kucukelbir, and J. D. Mcauliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

K. Blomqvist, S. Kaski, and M. Heinonen, Deep convolutional Gaussian processes, 2018.

E. V. Bonilla, K. Krauth, and A. Dezfouli, Generic inference in latent Gaussian process models, 2016.

J. Bonnefon, A. Shariff, and I. Rahwan, The social dilemma of autonomous vehicles, Science, vol.352, issue.6293, pp.1573-1576, 2016.

N. Bostrom, Superintelligence: Paths, Dangers, Strategies, 2014.

C. Boutsidis, P. Drineas, P. Kambadur, E. Kontopoulou, and A. Zouzias, A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix, Linear Algebra and its Applications, vol.533, pp.95-117, 2017.

F. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic, Rejoinder for "Probabilistic integration: A role in statistical computation?", 2018.

J. Bruna and S. Mallat, Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.

T. D. Bui, D. Hernández-lobato, J. M. Hernández-lobato, Y. Li, and R. E. Turner, Deep Gaussian processes for regression using approximate expectation propagation, 33rd International Conference on Machine Learning (ICML), pp.1472-1481, 2016.

T. D. Bui, J. Yan, and R. E. Turner, A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation, Journal of Machine Learning Research, vol.18, p.72, 2017.

K. Chalupka, C. K. I. Williams, and I. Murray, A framework for evaluating approximation methods for Gaussian process regression, Journal of Machine Learning Research, vol.14, issue.1, pp.333-350, 2013.

J. Chen, R. Monga, S. Bengio, and R. Jozefowicz, Revisiting distributed synchronous SGD, Workshop Track, 4th International Conference on Learning Representations (ICLR), 2016.

T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, Project Adam: Building an efficient and scalable deep learning training system, 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp.571-582, 2014.

Y. Cho and L. K. Saul, Kernel methods for deep learning, Advances in Neural Information Processing Systems 22 (NeurIPS), pp.342-350, 2009.

J. Cockayne, C. J. Oates, and M. A. Girolami, A Bayesian conjugate gradient method, 2018.

L. Csató and M. Opper, Sparse on-line Gaussian processes, Neural Computation, vol.14, issue.3, pp.641-668, 2002.

K. Cutajar, E. V. Bonilla, P. Michiardi, and M. Filippone, Random feature expansions for deep Gaussian processes, 34th International Conference on Machine Learning (ICML), pp.884-893, 2017.

K. Cutajar, M. A. Osborne, J. P. Cunningham, and M. Filippone, Preconditioning kernel matrices, 33rd International Conference on Machine Learning (ICML), pp.2529-2538, 2016.

K. Cutajar, M. Pullin, A. Damianou, N. D. Lawrence, and J. Gonzalez, Deep Gaussian processes for multi-fidelity modeling, Bayesian Deep Learning Workshop, vol.31, 2018.

Z. Dai, A. Damianou, J. González, and N. D. Lawrence, Variational auto-encoded deep Gaussian processes, 4th International Conference on Learning Representations, 2016.

A. Damianou, Deep Gaussian processes and variational propagation of uncertainty, 2015.

A. Damianou and N. D. Lawrence, Deep Gaussian processes, 16th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.207-215, 2013.

A. Davies, Effective Implementation of Gaussian Process Regression for Machine Learning, 2014.

T. A. Davis and Y. Hu, The University of Florida Sparse Matrix Collection, ACM Transactions on Mathematical Software (TOMS), vol.38, issue.1, p.1, 2011.

B. De-finetti, Theory of Probability, first of two volumes translated from Teoria Delle Probabilità, 1974.

M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. De-freitas, Predicting parameters in deep learning, Advances in Neural Information Processing Systems 26 (NeurIPS), pp.2148-2156, 2013.

A. Dezfouli and E. V. Bonilla, Scalable inference for Gaussian process models with black-box likelihoods, Advances in Neural Information Processing Systems 28 (NeurIPS), pp.1414-1422, 2015.

R. Domingues, P. Michiardi, J. Zouaoui, and M. Filippone, Deep Gaussian process autoencoders for novelty detection, Machine Learning, Special Issue of the ECML/PKDD Journal Track, 2018.

K. Dong, D. Eriksson, H. Nickisch, D. Bindel, and A. G. Wilson, Scalable log determinants for Gaussian process kernel learning, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.6330-6340, 2017.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

M. M. Dunlop, M. A. Girolami, A. M. Stuart, and A. L. Teckentrup, How deep are deep Gaussian processes?, Journal of Machine Learning Research, vol.19, issue.54, pp.1-46, 2018.

D. Duvenaud, Automatic model construction with Gaussian processes, 2014.

D. Duvenaud, O. Rippel, R. P. Adams, and Z. Ghahramani, Avoiding pathologies in very deep networks, 17th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.202-210, 2014.

B. Efron, Bootstrap methods: Another look at the jackknife, The Annals of Statistics, vol.7, issue.1, pp.1-26, 1979.

L. Ekenberg and J. Thorbiörnson, Second-order decision analysis, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.9, issue.01, pp.13-37, 2001.

T. W. Evans and P. B. Nair, Exploiting structure for fast kernel learning, SIAM International Conference on Data Mining, (SDM), pp.414-422, 2018.

M. Filippone and R. Engler, Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the unbiased linear system solver (ULISSE), 32nd International Conference on Machine Learning (ICML), pp.1015-1024, 2015.

M. Filippone, M. Zhong, and M. Girolami, A comparative evaluation of stochastic-based inference methods for Gaussian process models, Machine Learning, vol.93, pp.93-114, 2013.

J. K. Fitzsimons, K. Cutajar, M. Filippone, M. A. Osborne, and S. J. Roberts, Bayesian inference of log determinants, 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.

J. K. Fitzsimons, D. Granziol, K. Cutajar, M. A. Osborne, M. Filippone et al., Entropic trace estimates for log determinants, Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pp.323-338, 2017.

J. K. Fitzsimons, M. A. Osborne, S. J. Roberts, and J. F. Fitzsimons, Improved stochastic trace estimation using mutually unbiased bases, 34th Conference on Uncertainty in Artificial Intelligence (UAI), 2018.

S. Flaxman, A. Wilson, D. Neill, H. Nickisch, and A. Smola, Fast Kronecker inference in Gaussian processes with non-Gaussian likelihoods, 32nd International Conference on Machine Learning (ICML), pp.607-616, 2015.

R. Föll, B. Haasdonk, M. Hanselmann, and H. Ulmer, Deep recurrent Gaussian process with variational sparse spectrum approximation, 2017.

C. B. Frey and M. A. Osborne, The future of employment: How susceptible are jobs to computerisation?, Technological Forecasting and Social Change, vol.114, pp.254-280, 2017.

Y. Gal and Z. Ghahramani, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, 33rd International Conference on Machine Learning (ICML), pp.1050-1059, 2016.

Y. Gal and R. Turner, Improving the Gaussian process sparse spectrum approximation by representing uncertainty in frequency inputs, 32nd International Conference on Machine Learning (ICML), pp.655-664, 2015.

Y. Gal, M. Van-der-wilk, and C. E. Rasmussen, Distributed variational inference in sparse Gaussian process regression and latent variable models, Advances in Neural Information Processing Systems 27 (NeurIPS), pp.3257-3265, 2014.

J. R. Gardner, G. Pleiss, D. Bindel, K. Q. Weinberger, and A. G. Wilson, GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration, Advances in Neural Information Processing Systems 31 (NeurIPS), pp.7587-7597, 2018.

C. J. Geoga, M. Anitescu, and M. L. Stein, Scalable Gaussian process computations using hierarchical matrices, 2018.

S. Gershgorin, Über die Abgrenzung der Eigenwerte einer Matrix, Izvestija Akademii Nauk SSSR, Serija Matematika, vol.7, issue.3, pp.749-754, 1931.

Z. Ghahramani, Bayesian non-parametrics and the probabilistic approach to modelling, Philosophical Transactions of the Royal Society A, vol.371, p.20110553, 2013.

E. Gilboa, Y. Saatçi, and J. P. Cunningham, Scaling multidimensional Gaussian processes using projected additive approximations, 30th International Conference on Machine Learning (ICML), pp.454-461, 2013.

G. H. Golub and C. F. Van-loan, Matrix computations, 1996.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

A. Graves, Practical variational inference for neural networks, Advances in Neural Information Processing Systems 24 (NeurIPS), pp.2348-2356, 2011.

R. M. Gray, Entropy and Information Theory, 1990.

N. Halko, P. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, vol.53, issue.2, pp.217-288, 2011.

I. Han, D. Malioutov, and J. Shin, Large-scale log-determinant computation through stochastic Chebyshev expansions, 32nd International Conference on Machine Learning (ICML), pp.908-917, 2015.

R. Henao and O. Winther, Predictive active set selection methods for Gaussian processes, Neurocomputing, vol.80, pp.10-18, 2012.

P. Hennig, Probabilistic interpretation of linear solvers, SIAM Journal on Optimization, vol.25, issue.1, pp.234-260, 2015.

P. Hennig, M. A. Osborne, and M. Girolami, Probabilistic numerics and uncertainty in computations, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol.471, 2015.

J. Hensman, N. Durrande, and A. Solin, Variational Fourier features for Gaussian processes, Journal of Machine Learning Research, vol.18, p.52, 2017.
URL: https://hal.archives-ouvertes.fr/emse-01411206

J. Hensman, N. Fusi, and N. D. Lawrence, Gaussian processes for big data, 29th Conference on Uncertainty in Artificial Intelligence (UAI), 2013.

J. Hensman and N. D. Lawrence, Nested variational compression in deep Gaussian processes, 2014.

J. Hensman, A. G. Matthews, M. Filippone, and Z. Ghahramani, MCMC for variationally sparse Gaussian processes, Advances in Neural Information Processing Systems 28 (NeurIPS), pp.1648-1656, 2015.

J. Hensman, A. G. Matthews, and Z. Ghahramani, Scalable variational Gaussian process classification, 18th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.351-360, 2015.

J. M. Hernández-lobato and R. P. Adams, Probabilistic backpropagation for scalable learning of Bayesian neural networks, 32nd International Conference on Machine Learning (ICML), pp.1861-1869, 2015.

M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, vol.49, pp.409-436, 1952.

Q. M. Hoang, T. N. Hoang, and K. H. Low, A generalized stochastic variational Bayesian hyperparameter learning framework for sparse spectrum Gaussian process regression, 31st Conference on Artificial Intelligence (AAAI), pp.2007-2014, 2017.

M. D. Hoffman, D. M. Blei, C. Wang, and J. W. Paisley, Stochastic variational inference, Journal of Machine Learning Research, vol.14, issue.1, pp.1303-1347, 2013.

F. Huszar and D. Duvenaud, Optimally-weighted herding is Bayesian quadrature, 28th Conference on Uncertainty in Artificial Intelligence (UAI), pp.377-386, 2012.

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Machine Learning, vol.37, pp.183-233, 1999.

E. F. Kaasschieter, Preconditioned conjugate gradients for solving singular systems, Journal of Computational and Applied Mathematics, vol.24, issue.1-2, pp.265-275, 1988.

S. Kamthe and M. P. Deisenroth, Data-efficient reinforcement learning with probabilistic model predictive control, 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pp.1701-1710, 2018.

A. Kendall and Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.5580-5590, 2017.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 3rd International Conference on Learning Representations, 2015.

D. P. Kingma, T. Salimans, and M. Welling, Variational dropout and the local reparameterization trick, Advances in Neural Information Processing Systems 28 (NeurIPS), pp.2575-2583, 2015.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, 2nd International Conference on Learning Representations, 2014.

Y. Kom Samo and S. Roberts, Generalized spectral kernels, 2015.

K. Krauth, E. V. Bonilla, K. Cutajar, and M. Filippone, AutoGP: Exploring the capabilities and limitations of Gaussian process models, 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.

M. Krzywinski and N. Altman, Importance of being uncertain, Nature Methods, vol.10, pp.809-810, 2013.

V. Kumar, V. Singh, P. K. Srijith, and A. Damianou, Deep Gaussian processes with convolutional kernels, 2018.

M. Kuss and C. E. Rasmussen, Assessing approximate inference for binary Gaussian process classification, Journal of Machine Learning Research, vol.6, pp.1679-1704, 2005.

N. D. Lawrence and A. J. Moore, Hierarchical Gaussian process latent variable models, 24th International Conference on Machine Learning (ICML), pp.481-488, 2007.

M. Lázaro-gredilla, J. Quinonero-candela, C. E. Rasmussen, and A. R. Figueiras-Vidal, Sparse spectrum Gaussian process regression, Journal of Machine Learning Research, vol.11, pp.1865-1881, 2010.

Q. V. Le, T. Sarlós, and A. J. Smola, Fastfood -computing Hilbert space expansions in loglinear time, 30th International Conference on Machine Learning (ICML), pp.244-252, 2013.

Y. Lecun, Y. Bengio, and G. E. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.

Y. Lecun and C. Cortes, MNIST handwritten digit database, 2010.

J. Lee, J. Sohl-dickstein, J. Pennington, R. Novak, S. Schoenholz et al., Deep neural networks as Gaussian processes, 6th International Conference on Learning Representations, 2018.

M. Li, D. G. Andersen, A. J. Smola, and K. Yu, Communication efficient distributed machine learning with the parameter server, Advances in Neural Information Processing Systems 27 (NeurIPS), pp.19-27, 2014.

Z. C. Lipton, The mythos of model interpretability, Communications of the ACM, vol.61, issue.10, pp.36-43, 2018.

H. Liu, Y. Ong, X. Shen, and J. Cai, When Gaussian process meets big data: A review of scalable GPs, 2018.

G. Loosli, S. Canu, and L. Bottou, Training invariant support vector machines using selective sampling, Large Scale Kernel Machines, pp.301-320, 2007.

M. Lorenzi and M. Filippone, Constraining the dynamics of deep probabilistic models, 35th International Conference on Machine Learning (ICML), pp.3233-3242, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01843006

C. Louizos and M. Welling, Structured and efficient variational deep learning with matrix Gaussian posteriors, 33rd International Conference on Machine Learning (ICML), pp.1708-1716, 2016.

D. J. Mackay, Bayesian interpolation, Neural Computation, vol.4, issue.3, pp.415-447, 1992.

D. J. Mackay, Hyperparameters: optimize, or integrate out?, Maximum Entropy and Bayesian Methods, pp.43-60, 1996.

D. J. Mackay, Introduction to Gaussian processes, Neural Networks and Machine Learning, pp.133-166, 1998.

D. J. Mackay, Information theory, inference and learning algorithms, chapter 28, pp.343-355, 2003.

J. Mairal, P. Koniusz, Z. Harchaoui, and C. Schmid, Convolutional kernel networks, Advances in Neural Information Processing Systems 27 (NeurIPS), pp.2627-2635, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01005489

A. G. Matthews, J. Hensman, R. E. Turner, and Z. Ghahramani, On sparse variational methods and the Kullback-Leibler divergence between stochastic processes, 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.231-239, 2016.

A. G. Matthews, J. Hron, M. Rowland, R. E. Turner, and Z. Ghahramani, Gaussian process behaviour in wide deep neural networks, 6th International Conference on Learning Representations (ICLR), 2018.

A. G. Matthews, M. Van-der-wilk, T. Nickson, K. Fujii, A. Boukouvalas et al., GPflow: A Gaussian process library using TensorFlow, Journal of Machine Learning Research, vol.18, issue.6, pp.1-40, 2017.

T. P. Minka, A Family of Algorithms for Approximate Bayesian Inference, 2001.

M. D. Morris, The design and analysis of computer experiments, Journal of the American Statistical Association, vol.99, issue.468, pp.1203-1204, 2004.

I. Murray, R. P. Adams, and D. J. Mackay, Elliptical slice sampling, 13th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.541-548, 2010.

C. A. Nader, N. Ayache, P. Robert, and M. Lorenzi, Alzheimer's disease modelling and staging through independent Gaussian process analysis of spatio-temporal brain changes, Understanding and Interpreting Machine Learning in Medical Image Computing Applications -First International Workshops MLCN, DLF, and iMIMIC, Held in Conjunction with MICCAI, pp.3-14, 2018.

R. M. Neal, Bayesian learning via stochastic dynamics, Advances in Neural Information Processing Systems 5 (NeurIPS), pp.475-482, 1992.

R. M. Neal, Bayesian Learning for Neural Networks, 1995.

H. Nickisch and C. E. Rasmussen, Approximations for binary Gaussian process classification, Journal of Machine Learning Research, vol.9, pp.2035-2078, 2008.

Y. Notay, Flexible conjugate gradients, SIAM Journal on Scientific Computing, vol.22, issue.4, pp.1444-1460, 2000.

A. O'Hagan, Bayes-Hermite quadrature, Journal of Statistical Planning and Inference, vol.29, pp.245-260, 1991.

M. Opper and C. Archambeau, The variational Gaussian approximation revisited, Neural Computation, vol.21, issue.3, pp.786-792, 2009.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, Autodiff Workshop, 31st Conference on Neural Information Processing Systems (NeurIPS), 2017.

G. Petrone, Optimization under Uncertainty: theory, algorithms and industrial applications, 2011.

J. Platt, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, Advances in Large Margin Classifiers, pp.61-74, 1999.

G. Pleiss, J. R. Gardner, K. Q. Weinberger, and A. G. Wilson, Constant-time predictive distributions for Gaussian processes, 35th International Conference on Machine Learning (ICML), pp.4111-4120, 2018.

J. Quinonero-candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research, vol.6, pp.1939-1959, 2005.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems 20 (NeurIPS), pp.1177-1184, 2008.

R. Ranganath, S. Gerrish, and D. Blei, Black box variational inference, 17th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.814-822, 2014.

C. E. Rasmussen and Z. Ghahramani, Occam's razor, Advances in Neural Information Processing Systems 13 (NeurIPS), pp.294-300, 2000.

C. E. Rasmussen and Z. Ghahramani, Bayesian Monte Carlo, Advances in Neural Information Processing Systems 15 (NeurIPS), pp.489-496, 2002.

C. E. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, 2006.

S. Remes, M. Heinonen, and S. Kaski, Non-stationary spectral kernels, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.4642-4651, 2017.

D. J. Rezende, S. Mohamed, and D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, 31st International Conference on Machine Learning (ICML), pp.1278-1286, 2014.

H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, vol.22, pp.400-407, 1951.

S. Roberts, M. A. Osborne, M. Ebden, S. Reece, N. Gibson et al., Gaussian processes for time-series modelling, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.371, 2013.

A. Rudi and L. Rosasco, Generalization properties of learning with random features, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.3218-3228, 2017.

W. Rudin, Fourier Analysis on Groups, Wiley-Interscience, 1990.

Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2003.

Y. Saatçi, Scalable inference for structured Gaussian process models, 2012.

T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, Low-rank matrix factorization for deep neural network training with high-dimensional output targets, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6655-6659, 2013.

H. Salimbeni and M. P. Deisenroth, Doubly stochastic variational inference for deep Gaussian processes, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.4591-4602, 2017.

H. Salimbeni, S. Eleftheriadis, and J. Hensman, Natural gradients in practice: Non-conjugate variational inference in Gaussian process models, 21st International Conference on Artificial Intelligence and Statistics (AISTATS), pp.689-697, 2018.

B. Schölkopf, Support vector learning, 1997.

M. Seeger, C. K. Williams, and N. D. Lawrence, Fast forward selection to speed up sparse Gaussian process regression, 9th International Workshop on Artificial Intelligence and Statistics, 2003.

J. Shawe-taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.

Z. Shen, M. Heinonen, and S. Kaski, Harmonizable mixture kernels with variational Fourier features, 2018.

B. W. Silverman, Some aspects of the spline smoothing approach to non-parametric regression curve fitting, Journal of the Royal Statistical Society. Series B (Methodological), vol.47, issue.1, pp.1-52, 1985.

P. Y. Simard, D. Steinkraus, and J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis, 7th International Conference on Document Analysis and Recognition, vol.2, p.958, 2003.

A. J. Smola and P. L. Bartlett, Sparse greedy Gaussian process regression, Advances in Neural Information Processing Systems 13 (NeurIPS), pp.619-625, 2001.

E. Snelson and Z. Ghahramani, Sparse Gaussian processes using pseudo-inputs, Advances in Neural Information Processing Systems 18 (NeurIPS), pp.1257-1264, 2005.

E. Snelson and Z. Ghahramani, Local and global sparse Gaussian process approximations, 11th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.524-531, 2007.

B. V. Srinivasan, Q. Hu, N. A. Gumerov, R. Murtugudde, and R. Duraiswami, Preconditioned Krylov solvers for kernel regression, 2014.

S. Sundararajan and S. S. Keerthi, Predictive approaches for choosing hyperparameters in Gaussian processes, Neural Computation, vol.13, issue.5, pp.1103-1118, 2001.

S. Sundararajan and S. S. Keerthi, Predictive approaches for Gaussian process classifier model selection, 2012.

L. S. Tan, V. M. Ong, D. J. Nott, and A. Jasra, Variational inference for sparse spectrum Gaussian process regression, Statistics and Computing, vol.26, issue.6, pp.1243-1261, 2016.

M. K. Titsias, Variational learning of inducing variables in sparse Gaussian processes, 12th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.567-574, 2009.

J. Ton, S. Flaxman, D. Sejdinovic, and S. Bhatt, Spatial mapping with Gaussian processes and nonstationary Fourier features, Spatial Statistics, vol.28, pp.59-78, 2018.

S. Ubaru, J. Chen, and Y. Saad, Fast estimation of tr(f(A)) via stochastic Lanczos quadrature, SIAM Journal on Matrix Analysis and Applications, vol.38, issue.4, pp.1075-1099, 2017.

M. Van-der-wilk, C. E. Rasmussen, and J. Hensman, Convolutional Gaussian processes, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.2845-2854, 2017.

J. Vanhatalo, J. Riihimäki, J. Hartikainen, P. Jylänki, V. Tolvanen et al., GPstuff: Bayesian modeling with Gaussian processes, Journal of Machine Learning Research, vol.14, issue.1, pp.1175-1179, 2013.

A. Vehtari, T. Mononen, V. Tolvanen, T. Sivula, and O. Winther, Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models, Journal of Machine Learning Research, vol.17, p.38, 2016.

G. Wahba, Spline Models for Observational Data, Society for Industrial and Applied Mathematics, 1990.

K. A. Wang, G. Pleiss, J. R. Gardner, K. Q. Weinberger, and A. G. Wilson, Exact Gaussian processes on a million data points, 2019.

Y. Wang, M. Brubaker, B. Chaib-draa, and R. Urtasun, Sequential inference for deep Gaussian process, 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.694-703, 2016.

M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, 28th International Conference on Machine Learning (ICML), pp.681-688, 2011.

C. Williams, C. Rasmussen, A. Schwaighofer, and V. Tresp, Observations on the Nyström method for Gaussian process prediction, 2002.

C. K. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems 13 (NeurIPS), pp.682-688, 2001.

A. G. Wilson and R. Adams, Gaussian process kernels for pattern discovery and extrapolation, 30th International Conference on Machine Learning (ICML), pp.1067-1075, 2013.

A. G. Wilson, E. Gilboa, A. Nehorai, and J. P. Cunningham, Fast kernel learning for multidimensional pattern extrapolation, Advances in Neural Information Processing Systems 27 (NeurIPS), pp.3626-3634, 2014.

A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing, Deep kernel learning, 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.370-378, 2016.

A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing, Stochastic variational deep kernel learning, Advances in Neural Information Processing Systems 29 (NeurIPS), pp.2594-2602, 2016.

A. G. Wilson, D. A. Knowles, and Z. Ghahramani, Gaussian process regression networks, 29th International Conference on Machine Learning (ICML), pp.1139-1146, 2012.

A. G. Wilson and H. Nickisch, Kernel interpolation for scalable structured Gaussian processes (KISS-GP), 32nd International Conference on Machine Learning (ICML), pp.1775-1784, 2015.

J. Wu and P. I. Frazier, Continuous-fidelity Bayesian optimization with knowledge gradient, BayesOpt Workshop, vol.30, 2017.

J. Wu, M. Poloczek, A. G. Wilson, and P. Frazier, Bayesian optimization with gradients, Advances in Neural Information Processing Systems 30 (NeurIPS), pp.5267-5278, 2017.

F. X. Yu, A. T. Suresh, K. M. Choromanski, D. N. Holtmann-rice, and S. Kumar, Orthogonal random features, Advances in Neural Information Processing Systems 29 (NeurIPS), pp.1975-1983, 2016.

C. Zhang, J. Bütepage, H. Kjellström, and S. Mandt, Advances in variational inference, 2017.

J. Zhang, A. May, T. Dao, and C. Ré, Low-precision random Fourier features for memory-constrained kernel approximation, 2018.

W. Zhu, J. Miao, L. Qing, and X. Chen, Deep trans-layer unsupervised networks for representation learning, 2015.