6.2.2 Control and value function learning by double DNN

Numerical applications

Discussion
Algorithms
… (ii) take a very small learning rate parameter for the Adam optimizer. Doing so, one obtains stable estimates of the value function and of the optimal control.
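To make the role of the learning rate concrete, here is a minimal NumPy sketch of the standard Adam update (Kingma and Ba), applied to a toy quadratic loss rather than to the neural networks of this chapter. The function name `adam_minimize`, the value `lr=1e-3`, the moment parameters and the number of steps are illustrative assumptions, not the settings used in the numerical tests.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=10_000):
    """Run the Adam update rule for n_steps from theta0, given a gradient oracle."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)  # exponential moving average of gradients
    v = np.zeros_like(theta)  # exponential moving average of squared gradients
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g**2
        m_hat = m / (1.0 - beta1**t)  # bias corrections for the zero initialization
        v_hat = v / (1.0 - beta2**t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy loss ||theta - target||^2, whose gradient is 2 (theta - target).
target = np.array([1.0, -2.0])
theta = adam_minimize(lambda th: 2.0 * (th - target), np.zeros(2))
```

Since each coordinate moves by an amount of order `lr` per step, a very small learning rate damps the oscillations of the iterates, at the price of more iterations.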

… the pseudo-code of an algorithm based on the quantization and k-nearest-neighbor methods, called Qknn, which serves as the benchmark for all the low-dimensional control problems considered in Section 6.3 to test NNContPI, ClassifPI, Hybrid-Now and Hybrid-Later. Comparisons of Algorithm 6.5 with other well-known algorithms on various low-dimensional control problems are performed in [Bal+19].
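Algorithm 6.5 itself is not reproduced here. As a hedged illustration of its k-nearest-neighbor ingredient only, the sketch below estimates a regression function at arbitrary query points by averaging the responses of the k closest sample points, using a brute-force Euclidean search in NumPy. The function name `knn_regress` and its signature are hypothetical; a practical implementation would rather rely on a tree-based or approximate nearest-neighbor index such as the one of Muja and Lowe listed in the references.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k):
    """k-nearest-neighbor estimate of E[Y | X = x] at each query point.

    x_train: (n,) or (n, d) sample points; y_train: (n,) responses;
    x_query: (m,) or (m, d) evaluation points.
    """
    y_train = np.asarray(y_train, dtype=float)
    x_train = np.asarray(x_train, dtype=float).reshape(len(y_train), -1)
    x_query = np.asarray(x_query, dtype=float).reshape(-1, x_train.shape[1])
    # Squared Euclidean distance between every query point and every sample point.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    # Column indices of the k smallest distances in each row (unordered).
    nearest = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    # Average the responses of the k nearest samples.
    return y_train[nearest].mean(axis=1)

# Usage on a 1-D sample grid 0, 1, ..., 9 with responses y = 2x.
sample_points = np.arange(10.0)
values = 2.0 * sample_points
estimate = knn_regress(sample_points, values, [4.6], k=2)  # neighbors 5 and 4 -> array([9.])
```

With `k=1` the estimator reduces to the nearest-neighbor projection onto the sample points.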

… We also consider grids $\Gamma_k$, $k = 0, \ldots$ … We consider an $L$-optimal quantizer of the noise $\varepsilon_n$, i.e. a discrete random variable $\hat{\varepsilon}_n$ valued in a grid $\{e_1, \ldots, e_L\}$ of $L$ points in $E$, with weights $p_1, \ldots, p_L$.
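Once the grid $\{e_1, \ldots, e_L\}$ and the weights $p_1, \ldots, p_L$ are available, an expectation of a function $\varphi$ of the noise is replaced by the finite sum $\sum_{l=1}^{L} p_l\, \varphi(e_l)$. Optimal quantizers are usually precomputed; as an assumption for this sketch, a one-dimensional standard Gaussian noise is quantized by Lloyd's fixed-point algorithm run on Monte Carlo samples, which yields an approximately (not exactly) $L$-optimal quantizer. The names `lloyd_quantizer` and `quantized_expectation` are hypothetical.

```python
import numpy as np

def lloyd_quantizer(samples, L, n_iter=50):
    """Approximate an L-point optimal quantizer of a 1-D law from Monte Carlo samples.

    Returns the grid (e_1, ..., e_L) and the weights p_l = P(noise falls in cell l).
    """
    samples = np.asarray(samples, dtype=float)
    # Start from empirical quantiles, a reasonable initial grid.
    grid = np.quantile(samples, (np.arange(L) + 0.5) / L)
    for _ in range(n_iter):
        # Voronoi (nearest-point) assignment of every sample to a grid point.
        cell = np.argmin(np.abs(samples[:, None] - grid[None, :]), axis=1)
        counts = np.bincount(cell, minlength=L)
        sums = np.bincount(cell, weights=samples, minlength=L)
        # Lloyd step: move each grid point to the mean of its cell
        # (a grid point whose cell is empty is left unchanged).
        filled = counts > 0
        grid[filled] = sums[filled] / counts[filled]
    weights = counts / counts.sum()
    return grid, weights

def quantized_expectation(phi, grid, weights):
    """Approximate E[phi(noise)] by sum_l p_l * phi(e_l)."""
    return float(np.sum(weights * phi(grid)))

rng = np.random.default_rng(0)
grid, weights = lloyd_quantizer(rng.standard_normal(100_000), L=10)
mean_hat = quantized_expectation(lambda e: e, grid, weights)
second_moment_hat = quantized_expectation(lambda e: e**2, grid, weights)
```

Because the final grid points are the centroids of their cells, the quantized mean equals the sample mean, and the quantized second moment falls slightly below the sample second moment (close to 1), the gap being the quantization distortion.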

Introduction

3 Deep learning-based schemes for semi-linear PDEs

Convergence analysis

Deep learning schemes for high-dimensional PDEs

7.5 Numerical results

7.5.2 PDEs with unbounded solution and more complex structure

Therefore, by minimizing this quadratic loss function over $\theta$ via SGD, based on simulations of $(X_{t_i}, X_{t_{i+1}}, \Delta W_{t_i})$ (called training data in the machine learning language), one expects the neural networks $U_i$ and $Z_i$ to learn …
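The minimization described above can be illustrated on a deliberately simple case. The sketch below assumes a single time step of length `dt`, a one-dimensional state with $dX = dW$ (so $\sigma = 1$), a zero driver $f$ and terminal condition $g(x) = x^2$, for which $u(0, x) = x^2 + \mathrm{dt}$ and $\sigma^{\top} D_x u(0, x) = 2x$ are known in closed form. The networks $U_i$ and $Z_i$ are replaced by hypothetical linear-in-features models (`phi_u`, `phi_z`) so that the output can be checked, and minibatch SGD with iterate averaging is applied to the empirical loss $\mathbb{E}\,|g(X_{t_{i+1}}) - U_i(X_{t_i}) - Z_i(X_{t_i})\,\Delta W_{t_i}|^2$. The step size, batch size and number of steps are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.25  # length of the single time step t_i = 0 -> t_{i+1} = T (toy assumption)

def phi_u(x):
    """Features of the hypothetical stand-in U(x) = a0 + a1 x + a2 x^2."""
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def phi_z(x):
    """Features of the hypothetical stand-in Z(x) = b0 + b1 x."""
    return np.stack([np.ones_like(x), x], axis=1)

def train_dbdp1(n_steps=4000, batch=256, lr=0.05):
    theta = np.zeros(5)      # (a0, a1, a2, b0, b1)
    theta_avg = np.zeros(5)  # Polyak average over the second half of the run
    for step in range(n_steps):
        x = rng.standard_normal(batch)                 # simulated X_{t_i}
        dw = np.sqrt(dt) * rng.standard_normal(batch)  # simulated Delta W_{t_i}
        y_next = (x + dw) ** 2                         # g(X_{t_{i+1}}) with dX = dW
        # U(X_{t_i}) + Z(X_{t_i}) * Delta W_{t_i} is linear in theta.
        design = np.hstack([phi_u(x), phi_z(x) * dw[:, None]])
        residual = y_next - design @ theta             # driver f = 0
        grad = -2.0 * design.T @ residual / batch      # gradient of the batch mean squared residual
        theta -= lr * grad
        if step >= n_steps // 2:
            theta_avg += theta / (n_steps - n_steps // 2)
    return theta_avg

theta = train_dbdp1()
a0, a1, a2, b0, b1 = theta  # expected close to (dt, 0, 1, 0, 2)
```

With actual neural networks the loss is no longer linear in the parameters, but the same empirical quadratic loss is minimized over the network weights.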

Similarly, the second scheme, DBDP2, which uses neural networks only for the value functions, learns $u(t_i, \cdot)$ by means of the neural network $U_i$, and $\sigma^{\top}(t_i, \cdot) D_x u(t_i, \cdot)$ via $\sigma^{\top}(t_i, \cdot) D_x U_i$. Rigorous arguments for the convergence of these schemes are derived in the next section. The advantages of our two schemes …
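In the second scheme, the Z-component is not a separate network but the scaled gradient of $U_i$. On a hypothetical toy problem (one step of length `dt`, $dX = \sigma\, dW$ with $\sigma = 1$, zero driver, $g(x) = x^2$), the sketch below takes $U(x) = a_0 + a_1 x + a_2 x^2$, whose gradient $a_1 + 2 a_2 x$ is available in closed form, and fits the loss $\mathbb{E}\,|g(X_{t_{i+1}}) - U(X_{t_i}) - \sigma D_x U(X_{t_i})\,\Delta W_{t_i}|^2$. Because this stand-in is linear in its parameters, ordinary least squares gives the exact empirical minimizer and is used here in place of SGD; with a neural network, $D_x U_i$ would be obtained by automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, sigma, n = 0.25, 1.0, 200_000  # toy setting: one step, dX = sigma dW, f = 0 (assumption)

x = rng.standard_normal(n)                 # simulated X_{t_i}
dw = np.sqrt(dt) * rng.standard_normal(n)  # simulated Delta W_{t_i}
y_next = (x + sigma * dw) ** 2             # g(X_{t_{i+1}}) with g(x) = x^2

# Hypothetical stand-in U(x) = a0 + a1 x + a2 x^2, so D_x U(x) = a1 + 2 a2 x and
# U(x) + sigma D_x U(x) dw = a0 * 1 + a1 * (x + sigma dw) + a2 * (x^2 + 2 sigma x dw).
design = np.stack([np.ones(n), x + sigma * dw, x**2 + 2.0 * sigma * x * dw], axis=1)

# Exact minimizer of the empirical quadratic loss (used here in place of SGD).
(a0, a1, a2), *_ = np.linalg.lstsq(design, y_next, rcond=None)

def z_hat(x_query):
    """Learned sigma * D_x U, i.e. the DBDP2-style estimate of the Z-component."""
    return sigma * (a1 + 2.0 * a2 * x_query)
```

The fitted coefficients should be close to $a_0 = \sigma^2\,\mathrm{dt}$, $a_1 = 0$, $a_2 = 1$, so that $U$ recovers $u(0, x) = x^2 + \mathrm{dt}$ and $\sigma D_x U$ recovers $2x$ without a separate Z-network.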

F. Abergel, A. Anane, A. Chakraborti, A. Jedidi, and I. Muni Toke, Limit Order Books, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02177394

R. Almgren and N. Chriss, Optimal execution of portfolio transactions, Journal of Risk, vol.3, pp.5-40, 2001.

F. Abergel, C. Huré, and H. Pham, Algorithmic trading in a microstructural limit order book model, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01514987

C. Alasseur, A. Balata, S. Ben-aziza, A. Maheshwari, P. Tankov et al., Regression Monte Carlo for microgrid management, ESAIM: Proceedings and Surveys, vol.65, 2019.

M. Avellaneda and S. Stoikov, High-frequency trading in a limit order book, Quantitative Finance, vol.8, pp.217-224, 2007.

R. Avikainen, On irregular functionals of SDEs and the Euler scheme, Finance and Stochastics, vol.13, pp.381-401, 2009.

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research, vol.18, pp.1-53, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01098505

A. Balata, C. Huré, M. Laurière, H. Pham, and I. Pimentel, A class of finite-dimensional numerically solvable McKean-Vlasov control problems, ESAIM: Proceedings and Surveys, vol.65, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01718751

N. Baradel, B. Bouchard, D. Evangelista, and O. Mounjid, Optimal inventory management and order book modeling, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01710301

B. Bouchard and J. F. Chassagneux, Discrete-time approximation for continuously and discretely reflected BSDEs, Stochastic Processes and their Applications, vol.118, pp.2269-2293, 2008.

A. Bismuth, O. Guéant, and J. Pu, Portfolio choice, portfolio liquidation, and portfolio transition under drift uncertainty, 2016.

C. Bender, C. Gärtner, and N. Schweizer, Pathwise Dynamic Programming, Mathematics of Operations Research, 2018.

A. Bachouch, C. Huré, N. Langrené, and H. Pham, Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01949221

D. Bertsimas, L. Kogan, and A. W. Lo, Hedging derivative securities and incomplete markets: an ε-arbitrage approach, Operations Research, vol.49, pp.372-397, 2001.

D. Belomestny, A. Kolodko, and J. Schoenmakers, Regression methods for stochastic control problems and their convergence analysis, SIAM Journal on Control and Optimization, vol.48, pp.3562-3588, 2010.

R. Buckdahn, J. Li, S. Peng, and C. Rainer, Mean-field stochastic differential equations and associated PDEs, The Annals of Probability, vol.45, pp.824-878, 2017.

V. Bally and G. Pagès, Error analysis of the quantization algorithm for obstacle problems, Stochastic Processes and their Applications, vol.106, pp.1-40, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00103987

A. Balata and J. Palczewski, Regress-Later Monte Carlo for optimal control of Markov processes, 2017.

A. Balata and J. Palczewski, Regress-Later Monte-Carlo for optimal inventory control with applications in energy, 2018.

N. Bäuerle and U. Rieder, Markov Decision Processes with Applications to Finance, Springer, 2011.

C. Bayer, M. Redmann, and J. Schoenmakers, Dynamic programming for optimal stopping via pseudo-regression, 2018.

P. Brémaud, Point Processes and Queues: Martingale Dynamics, 1981.

C. Bender and J. Steiner, Least-squares Monte Carlo for backward SDEs, Numerical methods in finance, pp.257-289, 2012.

D. Belomestny, J. Schoenmakers, V. Spokoiny, and Y. Tavyrikov, Optimal stopping via reinforced regression, 2018.

B. Bouchard and N. Touzi, Discrete-time approximation and Monte-Carlo simulation of backward stochastic differential equations, Stochastic Processes and their applications, vol.111, pp.175-206, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00103046

B. Bouchard and X. Warin, Monte-Carlo valuation of American options: facts and new algorithms to improve existing methods, Numerical methods in finance, pp.215-255, 2012.

P. Cardaliaguet, Notes on mean field games. Tech. rep. from P.-L. Lions' lectures at Collège de France, 2010.

J. Chassagneux, D. Crisan, and F. Delarue, A probabilistic approach to classical solutions of the master equation for large population equilibria, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01144845

R. Carmona, J. Fouque, and L. Sun, Mean field games and systemic risk, Communications in Mathematical Sciences, vol.13, issue.4, pp.911-933, 2015.

A. Cartea and S. Jaimungal, Modeling Asset Prices for Algorithmic and High Frequency Trading, Applied Mathematical Finance, vol.20, issue.6, pp.512-547, 2010.

A. Cartea and S. Jaimungal, Risk metrics and fine tuning of high-frequency trading strategies, Mathematical Finance, vol.25, issue.3, pp.576-611, 2013.

A. Cartea, S. Jaimungal, and J. Ricci, Buy Low Sell High: a High Frequency Trading Perspective, SIAM Journal on Financial Mathematics, pp.415-444, 2014.

R. Carmona and M. Ludkovski, Valuation of energy storage: an optimal switching approach, Quantitative Finance, vol.10, pp.359-374, 2010.

Q. Chan-Wai-Nam, J. Mikael, and X. Warin, Machine Learning for semi linear PDEs, 2018.

A. Cartea, S. Jaimungal, and J. Penalva, Algorithmic and High-frequency trading, 2015.

J. Chassagneux and A. Richou, Numerical simulation of quadratic BSDEs, The Annals of Applied Probability, vol.26, pp.262-304, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00990555

R. Cont, S. Stoikov, and R. Talreja, A stochastic model for order book dynamics, Operations Research, vol.58, pp.549-563, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00497666

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, vol.2, pp.303-314, 1989.

S. El Aoud and F. Abergel, A stochastic control approach for options market making, World Scientific Publishing Company, vol.1, issue.1, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01061852

W. E, J. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics and Statistics, vol.5, pp.349-380, 2017.

N. El Karoui, C. Kapoudjian, E. Pardoux, S. Peng, and M. C. Quenez, Reflected Solutions of Backward SDEs, and related obstacle problems for PDEs, Annals of Probability, vol.25, pp.702-737, 1997.

P. Fodra and H. Pham, High Frequency trading and asymptotics for small risk aversion in a Markov renewal model, SIAM Journal of Financial Mathematics, vol.6, issue.1, pp.656-684, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01261362

P. Fodra and H. Pham, Semi Markov model for market microstructure, Applied Mathematical Finance, vol.22, issue.3, pp.261-295, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00819269

L. Fiorin, G. Pagès, and A. Sagna, Product Markovian quantization of a diffusion process with applications to finance, Methodology and Computing in Applied Probability, pp.1-32, 2018.

A. Géron, Deep Learning avec TensorFlow, 2017.

I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, 2016.

J. Guyon and P. Henry-Labordère, Uncertain volatility model: a Monte-Carlo approach, SSRN, 2010.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, 2002.

S. Graf and H. Luschgy, Foundations of quantization for probability distributions, vol.1730, 2000.

O. Guéant, C. Lehalle, and J. Fernandez-Tapia, Dealing with the Inventory Risk, Mathematics and Financial Economics, vol.7, pp.477-507, 2012.

E. Gobet, J. Lemor, and X. Warin, A regression-based Monte Carlo method to solve backward stochastic differential equations, The Annals of Applied Probability, vol.15, pp.2172-2202, 2005.

E. Gobet, Monte-Carlo methods and stochastic processes: from linear to non-linear, 2016.

F. Guilbaud and H. Pham, Optimal High Frequency Trading with limit and market orders, Quantitative Finance, vol.13, pp.79-94, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00603385

J. Gatheral and A. Schied, Dynamical models of market impact and algorithms for order execution, Handbook on Systemic Risk, pp.579-602, 2013.

O. Guéant, The Financial Mathematics of Market Liquidity: From optimal execution to market making, 2016.

P. Glasserman and B. Yu, Simulation for American options: Regression Now or Regression Later?, pp.213-226, 2002.

J. Han and W. E, Deep learning approximation for stochastic control problems, 2016.

B. Heymann, J. F. Bonnans, P. Martinon, F. J. Silva, F. Lanas et al., Continuous optimal control approaches to microgrid energy management, Energy Systems, vol.9, pp.59-77, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01129393

J. Han, A. Jentzen, and W. E, Overcoming the curse of dimensionality: Solving high-dimensional partial differential equations using deep learning, 2017.

P. Henry-Labordère, N. Oudjane, X. Tan, N. Touzi, and X. Warin, Branching diffusion representation of semilinear PDEs and Monte Carlo approximation, Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01429549

P. Henry-Labordère, Deep primal-dual algorithm for BSDEs: applications of machine learning to CVA and IM, SSRN 3071506, 2017.

J. Han and J. Long, Convergence of the deep BSDE method for coupled FBSDEs, 2018.

W. Huang, C. Lehalle, and M. Rosenbaum, Simulating and analyzing order book data: The queue-reactive model, Journal of the American Statistical Association, vol.110, issue.509, pp.107-122, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01172326

K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, vol.4, pp.251-257, 1991.

C. Huré, H. Pham, A. Bachouch, and N. Langrené, Deep neural networks algorithms for stochastic control problems on finite horizon: convergence analysis, 2018.

C. Huré, H. Pham, and X. Warin, Some machine learning schemes for high-dimensional nonlinear PDEs, 2019.

T. Ho and H. Stoll, Optimal dealer pricing under transactions and return uncertainty, Journal of Financial Economics, vol.9, pp.47-73, 1979.

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, vol.2, issue.5, pp.359-366, 1989.

K. Hornik, M. Stinchcombe, and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Networks, vol.3, pp.551-560, 1990.

R. Hildebrand, J. Schoenmakers, J. Zhang, and F. Dickmann, Regression based duality approach to optimal control with application to hydro electricity storage, 2016.

A. Jacquier and H. Liu, Optimal Liquidation in a level-I Limit order book for large tick stocks, SIAM Journal on Financial Mathematics, vol.9, pp.875-906, 2018.

P. Jaillet, D. Lamberton, and B. Lapeyre, Variational Inequalities and the Pricing of American Options, Acta Applicandae Mathematicae, vol.21, issue.3, pp.263-289, 1990.
URL : https://hal.archives-ouvertes.fr/hal-01667008

D. R. Jiang and W. B. Powell, An approximate dynamic programming algorithm for monotone value functions, Operations Research, vol.63, issue.6, pp.1489-1511, 2015.

M. Kohler, A. Krzyżak, and N. Todorovic, Pricing of high-dimensional American options by neural networks, Mathematical Finance, vol.20, pp.383-410, 2010.

I. Kharroubi, N. Langrené, and H. Pham, A numerical algorithm for fully nonlinear HJB equations: an approach by control randomization, Monte Carlo Methods and Applications, vol.20, pp.145-165, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01019472

M. Kohler, Nonparametric regression with additional measurement errors in the dependent variable, Journal of Statistical Planning and Inference, vol.136, issue.10, pp.3339-3361, 2006.

A. N. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables, Mathematics and Its Applications, p.25, 1991.

S. Kou, X. Peng, and X. Xu, EM algorithm and stochastic control

Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, pp.436-444, 2015.

J. Lemor, E. Gobet, and X. Warin, Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations, Bernoulli, vol.12, issue.5, pp.889-916, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00394976

Y. Li, Deep reinforcement learning: an overview, 2017.

P.-L. Lions, Théorie des jeux de champ moyen et applications (mean field games), 2012.

M. Ludkovski and A. Maheshwari, Simulation methods for stochastic storage problems: a statistical learning perspective, 2018.

C. Lehalle, O. Mounjid, and M. Rosenbaum, Optimal liquidity-based trading tactics, 2018.

F. A. Longstaff and E. S. Schwartz, Valuing American options by simulation: a simple least-squares approach, The Review of Financial Studies, vol.14, pp.113-147, 2001.

L. Massoulié, Stability results for a general class of interacting point processes dynamics, and applications, Stochastic Processes and their Applications, vol.75, pp.1-30, 1998.

V. Mnih, K. Kavukcuoglu, D. Silver, and A. A. Rusu, Human-level control through deep reinforcement learning, Nature, vol.518, pp.529-533, 2015.

M. Muja and D. Lowe, Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, International Conference on Computer Vision Theory and Applications, 2009.

M. Nielsen, Neural networks and deep learning

S. Nadarajah, F. Margot, and N. Secomandi, Comparison of least squares Monte Carlo methods with applications to energy real options, European Journal of Operational Research, vol.256, pp.196-204, 2017.

D. P. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

H. Pham, Linear quadratic optimal control of conditional McKean-Vlasov equation with random coefficients and applications, Probability, Uncertainty and Quantitative Risk, vol.1, p.7, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01305929

W. B. Powell, Approximate dynamic programming: solving the curses of dimensionality, 2011.

E. Pardoux and S. Peng, Adapted solution of a backward stochastic differential equation, Systems & Control Letters, vol.14, pp.55-61, 1990.

G. Pagès, H. Pham, and J. Printems, An optimal Markovian quantization algorithm for multi-dimensional stochastic control problems, Stochastics and Dynamics, vol.4, pp.501-545, 2004.

G. Pagès, H. Pham, and J. Printems, Optimal quantization methods and applications to numerical problems in finance, Handbook of computational and numerical methods in finance, pp.253-297, 2004.

H. Pham and X. Wei, Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics, SIAM Journal on Control and Optimization, vol.55, pp.1069-1101, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01302289

A. Richou, Étude théorique et numérique des équations différentielles stochastiques rétrogrades, 2010.

A. Richou, Numerical simulation of BSDEs with drivers of quadratic growth, The Annals of Applied Probability, vol.21, pp.1933-1964, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00443704

L. C. G. Rogers, Monte Carlo Valuation of American Options, Mathematical Finance, vol.12, pp.271-286, 2002.

I. Rosu, A dynamic model of the limit order book, Review of Financial Studies, vol.22, pp.4601-4641, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00515873

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, 2016.

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang et al., Mastering the game of Go without human knowledge, Nature, vol.550, 2017.

J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics, vol.375, pp.1339-1364, 2018.

X. Warin, Monte Carlo for high-dimensional degenerated Semi Linear and Full Non Linear PDEs, 2018.

X. Warin, Nesting Monte Carlo for high-dimensional Non Linear PDEs, Monte Carlo Methods and Applications, 2018.

A. C. Wilson, R. Roelofs, M. Stern, N. Srebro, and B. Recht, The marginal value of adaptive gradient methods in machine learning, 31st Conference on Neural Information Processing Systems, 2017.

J. Zhang, A numerical scheme for BSDEs, The Annals of Applied Probability, vol.14, pp.459-488, 2004.