How to derive a Fano-type inequality: an example

6.3.2 Any lower bound on kl leads to a Fano-type inequality

Main applications

Other applications, with N = 1 pair of distributions

6.5.1 A simple proof of Cramér's theorem for Bernoulli distributions

6.5.2 Distribution-dependent posterior concentration

6.6.1 On the "generalized Fano's inequality" of Chen et al., 2016

6.6.2 Comparison to Birgé, 2005

6.7.1 Proofs of the convexity inequalities (6.11) and (6.12)

6.7.2 Proofs of the refined Pinsker's inequality and of its consequence

6.7.3 An improved Bretagnolle-Huber inequality

6.8.1 Two toy applications of the continuous Fano's inequality

6.8.2 From Bayesian posteriors to point estimators

6.8.4 Proofs of basic facts about f-divergences

On Jensen's inequality

The results recalled and re-proved in this section were stated in the main body of the chapter (Sections 6.2 and 6.3) for Kullback-Leibler divergences, which are a special case of f-divergences with f(x) = x log x. We restate them in greater generality and, to that end, first recall the definition of f-divergences. Note that these f-divergences will be further [...]

R. Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995.

S. M. Ali and S. D. Silvey, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society. Series B. Methodological, vol.28, pp.131-142, 1966.

A. Antos, V. Grover, and C. Szepesvári, Active learning in multi-armed bandits, International Conference on Algorithmic Learning Theory, pp.287-302, 2008.

J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT), COLT'09, pp.217-226, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882

J. Audibert and S. Bubeck, Best arm identification in multi-armed bandits, 23rd Conference on Learning Theory (COLT), p.13, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer and R. Ortner, UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica, vol.61, issue.1, pp.55-65, 2010.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, pp.235-256, 2002.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, The nonstochastic multiarmed bandit problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.

A. Baransi, O. Maillard, and S. Mannor, Sub-sampling for multi-armed bandits, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, vol.8724, pp.115-131, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01025651

R. E. Barlow, D. J. Bartholomew, J. M. Bremner, and H. D. Brunk, Statistical inference under order restrictions, 1973.

D. Berend and A. Kontorovich, On the concentration of the missing mass, Electronic Communications in Probability, vol.18, issue.3, pp.1-7, 2013.

C. Berge, Topological Spaces: including a treatment of multi-valued functions, vector spaces, and convexity. Courier Corporation, 1963.

L. Birgé, A new lower bound for multiple hypothesis testing, IEEE Transactions on Information Theory, vol.51, issue.4, pp.1611-1615, 2005.

J. M. Borwein and A. S. Lewis, Duality relationships for entropy-like minimization problems, SIAM Journal on Control and Optimization, vol.29, issue.2, pp.325-338, 1991.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities. A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00751496

S. Boyd, L. Xiao, and A. Mutapcic, Subgradient methods. Lecture notes of EE392o, 2003.

J. Bretagnolle and C. Huber, Estimation des densités : risque minimax, Séminaire de Probabilités de Strasbourg, vol.12, pp.342-363, 1978.

J. Bretagnolle and C. Huber, Estimation des densités : risque minimax, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol.47, pp.119-137, 1979.

S. Bubeck, Bandits Games and Clustering Foundations, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00845565

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multiarmed bandit problems, Foundations and Trends in Machine Learning, vol.5, pp.1-122, 2012.

S. Bubeck and C. Liu, Prior-free and prior-dependent regret bounds for Thompson sampling, Advances in Neural Information Processing Systems, pp.638-646, 2013.

S. Bubeck and A. Slivkins, The best of both worlds: stochastic and adversarial bandits, Conference on Learning Theory, pp.42-43, 2012.

S. Bubeck, V. Perchet, and P. Rigollet, Bounded regret in stochastic multi-armed bandits, Proceedings of the 26th Annual Conference on Learning Theory (COLT), JMLR W&CP, vol.30, pp.122-134, 2013.

S. Bubeck, V. Perchet, and P. Rigollet, Erratum: The proof of Theorem 8 is not correct. We do not know if the theorem holds true, 2013.

A. Burnetas and M. Katehakis, Optimal adaptive policies for sequential allocation problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.

C. Calabro, The Exponential Complexity of Satisfiability Problems, 2009.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.

A. Carpentier and A. Locatelli, Tight (lower) bounds for the fixed budget best arm identification bandit problem, Conference on Learning Theory, pp.590-604, 2016.

R. Cerf and P. Petit, A short proof of Cramér's theorem in R, The American Mathematical Monthly, vol.118, pp.925-931, 2011.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. Helmbold, R. Schapire et al., How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997.

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz, Minimizing regret with label-efficient prediction, IEEE Transactions on Information Theory, vol.51, pp.2152-2162, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00007537

S. Chen, T. Lin, I. King, M. R. Lyu, and W. Chen, Combinatorial pure exploration of multi-armed bandits, Advances in Neural Information Processing Systems, vol.27, pp.379-387, 2014.

X. Chen, A. Guntuboyina, and Y. Zhang, On Bayes risk lower bounds, Journal of Machine Learning Research, vol.17, issue.219, pp.1-58, 2016.

H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, The Annals of Mathematical Statistics, vol.23, issue.4, pp.493-507, 1952.

H. Chernoff, Sequential design of experiments, The Annals of Mathematical Statistics, vol.30, issue.3, pp.755-770, 1959.

Y. Chow and H. Teicher, Probability Theory, 1988.

R. Combes and A. Proutière, Unimodal bandits without smoothness, 2014.

T. Cover and J. Thomas, Elements of information theory, 2006.

W. Cowan and M. Katehakis, Asymptotically optimal sequential experimentation under generalized ranking, 2015.

H. Cramér, Sur un nouveau théorème limite de la théorie des probabilités, Actualités Scientifiques et Industrielles, vol.736, pp.5-23, 1938.

I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, vol.8, pp.85-108, 1963.

I. Csiszár, Sanov property, generalized I-projection and a conditional limit theorem, The Annals of Probability, pp.768-793, 1984.

I. Csiszár and F. Matus, Information projections revisited, IEEE Transactions on Information Theory, vol.49, issue.6, pp.1474-1490, 2003.

R. Degenne and V. Perchet, Anytime optimal algorithms in stochastic multi-armed bandits, Proceedings of the 2016 International Conference on Machine Learning, ICML'16, pp.1587-1595, 2016.

J. Duchi, Lecture Notes for Statistics 311/Electrical Engineering 377, pp.41-48, 2014.

J. Duchi and M. Wainwright, Distance-based and continuum Fano inequalities with applications to statistical estimation, 2013.

R. Durrett, Probability: Theory and Examples, 2010.

E. Even-Dar, S. Mannor, and Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, International Conference on Computational Learning Theory, pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18

M. Faure, P. Gaillard, B. Gaujal, and V. Perchet, Online learning and game theory. A quick overview with recent results and applications, ESAIM: Proceedings and Surveys, vol.51, pp.246-271, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01237039

T. Ferguson, Mathematical statistics: A decision theoretic approach, Probability and Mathematical Statistics, vol.1, 1967.

T. S. Ferguson, A Bayesian analysis of some nonparametric problems, The Annals of Statistics, pp.209-230, 1973.
DOI : 10.1214/aos/1176342360
URL : https://doi.org/10.1214/aos/1176342360

M. Frisén, Unimodal regression, The Statistician, pp.479-485, 1986.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, pp.359-376, 2011.

A. Garivier and E. Kaufmann, Optimal best arm identification with fixed confidence, Conference on Learning Theory, pp.998-1027, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01273838

A. Garivier, E. Kaufmann, and T. Lattimore, On explore-then-commit strategies, Advances in Neural Information Processing Systems 29 (NIPS 2016), pp.784-792, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322906

Z. Geng and N. Shi, Algorithm AS 257: Isotonic regression for umbrella orderings, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol.39, issue.3, pp.397-402, 1990.
DOI : 10.2307/2347399

M. C. Genovese, P. Durez, H. B. Richards, J. Supronik, E. Dokoupilova et al., Efficacy and safety of secukinumab in patients with rheumatoid arthritis: a phase II, dose-finding, double-blind, randomised, placebo controlled study, Annals of the Rheumatic Diseases, vol.72, issue.6, pp.863-869, 2013.

S. Ghosal, J. Ghosh, and A. van der Vaart, Convergence rates of posterior distributions, Annals of Statistics, vol.28, issue.2, pp.500-531, 2000.
DOI : 10.1214/aos/1016218228
URL : https://doi.org/10.1214/aos/1016218228

J. Gittins, K. Glazebrook, and R. Weber, Multi-armed bandit allocation indices, 2011.
DOI : 10.1002/9780470980033
URL : https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470980033.fmatter

J. C. Gittins, Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society. Series B (Methodological), pp.148-177, 1979.
DOI : 10.1111/j.2517-6161.1979.tb01068.x

R. Gray, Entropy and Information Theory, 2011.

A. Guntuboyina, Lower bounds for the minimax risk using f-divergences, and applications, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2386-2399, 2011.

A. Gushchin, On Fano's lemma and similar inequalities for the minimax risk, Probability Theory and Mathematical Statistics, vol.67, pp.26-37, 2003.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics, 2002.

T. Han and S. Verdú, Generalizing the Fano inequality, IEEE Transactions on Information Theory, vol.40, issue.4, pp.1247-1251, 1994.

H. Harari-Kermadec, Vraisemblance empirique généralisée et estimation semiparamétrique, 2006.

W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.

M. Hoffmann, J. Rousseau, and J. Schmidt-hieber, On adaptive posterior concentration rates, Annals of Statistics, vol.43, issue.5, pp.2259-2295, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01251098

J. Honda and A. Takemura, An asymptotically optimal bandit algorithm for bounded support models, COLT, pp.67-79, 2010.

J. Honda and A. Takemura, Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, Journal of Machine Learning Research, vol.16, pp.3721-3756, 2015.

A. Hoorfar and M. Hassani, Inequalities on the Lambert W function and hyperpower function, Journal of Inequalities in Pure and Applied Mathematics, vol.9, issue.2, p.51, 2008.

X. Hu, Maximum-likelihood estimation under bound restriction and order and uniform bound restrictions, Statistics & Probability Letters, vol.35, issue.2, pp.165-171, 1997.

I. Ibragimov and R. Has'minskii, Statistical Estimation: Asymptotic Theory, vol.16, 1981.

I. Ibragimov and R. Has'minskii, Bounds for the risks of non-parametric regression estimates, Theory of Probability and its Applications, vol.27, pp.84-99, 1982.

I. Ibragimov and R. Has'minskii, Asymptotic bounds on the quality of the nonparametric regression estimation in L p, Journal of Mathematical Sciences, vol.24, issue.5, pp.540-550, 1984.

M. Iltis, Sharp asymptotics of large deviations in R^d, Journal of Theoretical Probability, vol.8, issue.3, pp.501-522, 1995.

C. Jiang, Online Advertisements and Multi-Armed Bandits, 2015.

E. Kaufmann, On Bayesian index policies for sequential resource allocation, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01251606

E. Kaufmann, O. Cappé, and A. Garivier, On Bayesian upper confidence bounds for bandit problems, Artificial Intelligence and Statistics, pp.592-600, 2012.

E. Kaufmann, O. Cappé, and A. Garivier, On the complexity of best-arm identification in multi-armed bandit models, The Journal of Machine Learning Research, vol.17, issue.1, pp.1-42, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01024894

M. Kearns and L. Saul, Large deviation methods for approximate probabilistic inference, Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (UAI'98), pp.311-319, 1998.

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp.1448-1456, 2013.

S. Kulkarni and G. Lugosi, Minimax lower bounds for the two-armed bandit problem, IEEE Transactions on Automatic Control, vol.45, pp.711-714, 2000.

J. Kwon and V. Perchet, Gains and losses are fundamentally different in regret minimization: The sparse case, Journal of Machine Learning Research, vol.17, issue.229, pp.1-32, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01265075

T. L. Lai, Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics, pp.1091-1114, 1987.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8

T. Lattimore, Optimally confident UCB: Improved regret for finite-armed bandits, 2015.

T. Lattimore, Regret analysis of the anytime optimally confident UCB algorithm, 2016.

T. Lattimore, A scale free algorithm for stochastic bandits with bounded kurtosis, Advances in Neural Information Processing Systems, pp.1583-1592, 2017.

T. Lattimore, Refining the confidence level for optimistic bandit strategies, submitted, 2018.

L. Le Cam, Asymptotic methods in statistical decision theory, Springer Series in Statistics, 1986.
DOI : 10.1007/978-1-4612-4946-7

L. Le Cam and G. Yang, Asymptotics in statistics: some basic concepts, Springer Series in Statistics, 2000.

C. Le Tourneau, J. J. Lee, and L. L. Siu, Dose escalation methods in phase I cancer clinical trials, JNCI: Journal of the National Cancer Institute, vol.101, issue.10, pp.708-720, 2009.

E. Lehmann and G. Casella, Theory of Point Estimation, 1998.

A. Locatelli, M. Gutzeit, and A. Carpentier, An optimal algorithm for the thresholding bandit problem, Proceedings of the 33rd International Conference on Machine Learning, pp.1690-1698, 2016.

S. Magureanu, R. Combes, and A. Proutière, Lipschitz bandits: Regret lower bound and optimal algorithms, Proceedings of The 27th Conference on Learning Theory, pp.975-999, 2014.

O. Maillard, R. Munos, and G. Stoltz, A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences, Proceedings of the 24th Annual Conference on Learning Theory, pp.497-514, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00574987

S. Mannor and J. N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

P. Massart, Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, vol.1896, 2007.

P. Ménard and A. Garivier, A minimax and asymptotically optimal algorithm for stochastic bandits, Proceedings of the 2017 Algorithmic Learning Theory Conference, ALT'17, 2017.

R. Munos, From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning, Foundations and Trends in Machine Learning, vol.7, pp.1-129, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00747575

R. Mureika, T. Turner, and P. Wollan, An algorithm for unimodal isotonic regression, with application to locating a maximum, Univ. New Brunswick Dept. Math., 1992.

E. Ordentlich and M. Weinberger, A distribution dependent refinement of Pinsker's inequality, IEEE Transactions on Information Theory, vol.51, issue.5, pp.1836-1840, 2005.

A. Owen, Empirical likelihood ratio confidence regions, The Annals of Statistics, pp.90-120, 1990.
DOI : 10.1214/aos/1176347494
URL : https://doi.org/10.1214/aos/1176347494

C. Pandit and S. Meyn, Worst-case large-deviation asymptotics with application to queueing and information theory, Stochastic Processes and their Applications, vol.116, pp.724-756, 2006.
DOI : 10.1016/j.spa.2005.11.003
URL : https://doi.org/10.1016/j.spa.2005.11.003

L. Pardo, Statistical Inference Based on Divergence Measures, 2006.

T. Robertson, F. T. Wright, and R. L. Dykstra, Order restricted statistical inference, 1988.

R. Rockafellar, Convex Analysis, 1972.

D. Russo, Simple Bayesian algorithms for best arm identification, Conference on Learning Theory, pp.1417-1418, 2016.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, p.248, 2016.

M. Simchowitz, K. Jamieson, and B. Recht, The simulator: Understanding adaptive sampling in the moderate-confidence regime, 2017.

G. Stoltz, An introduction to the prediction of individual sequences: (1) oracle inequalities; (2) prediction with partial monitoring, Chevaleret, 2007.

Q. F. Stout, Optimal algorithms for unimodal regression, 2000.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, vol.1, 1998.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933.

A. Tsybakov, Introduction to Nonparametric Estimation, 2009.

T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. J. Weinberger, Inequalities for the L1 deviation of the empirical distribution, 2003.

Y. Wu, A. György, and C. Szepesvári, Online learning with Gaussian payoffs and side observations, Advances in Neural Information Processing Systems 28 (NIPS 2015), pp.1360-1368, 2015.

A. Xu and M. Raginsky, Information-theoretic lower bounds on Bayes risk in decentralized estimation, 2016.

Y. Yang and A. Barron, Information-theoretic determination of minimax rates of convergence, Annals of Statistics, vol.27, issue.5, pp.1564-1599, 1999.

B. Yu, Assouad, Fano, and Le Cam, Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, pp.423-435, 1997.