, its performance relies on the consistency of the parameter estimation, which may or may not occur. We exhibit this behavior through numerical experiments and discuss it in Sec