How to derive a Fano-type inequality: an example
6.3.2 Any lower bound on kl leads to a Fano-type inequality
6.5.1 A simple proof of Cramér's theorem for Bernoulli distributions
6.5.2 Distribution-dependent posterior concentration
Other applications, with N = 1 pair of distributions
6.6.1 On the "generalized Fano's inequality" (2016)
6.6.2 Comparison to Birgé (2005)
6.7.1 Proofs of the convexity inequalities (6.11) and (6.12)
6.7.2 Proofs of the refined Pinsker's inequality and of its consequence
6.7.3 An improved Bretagnolle-Huber inequality
6.8.1 Two toy applications of the continuous Fano's inequality
6.8.2 From Bayesian posteriors to point estimators
6.8.4 Proofs of basic facts about f-divergences
On Jensen's inequality
The results recalled and re-proved in this section were stated in the main body of the chapter (Sections 6.2 and 6.3) for Kullback-Leibler divergences, which are a special case of f-divergences with f(x) = x log x. We restate them in greater generality and, to that end, first recall the definition of f-divergences. Note that these f-divergences will be further ...
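For orientation, here is a sketch of the standard definition of f-divergences in the simplest case only, assuming that P is absolutely continuous with respect to Q. The general case needs an extra term for the singular part of P, which is omitted here. The notation Div_f, P and Q is chosen for illustration and may differ from the chapter's.

```latex
% Assumption: P << Q, and f is convex on (0, +infinity) with f(1) = 0.
\[
  \mathrm{Div}_f(P, Q) \;=\; \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right) \mathrm{d}Q .
\]
% Special case f(x) = x log x: the Kullback-Leibler divergence is recovered.
\[
  \mathrm{Div}_f(P, Q)
  \;=\; \int \frac{\mathrm{d}P}{\mathrm{d}Q} \,\log\frac{\mathrm{d}P}{\mathrm{d}Q}\; \mathrm{d}Q
  \;=\; \int \log\frac{\mathrm{d}P}{\mathrm{d}Q}\; \mathrm{d}P
  \;=\; \mathrm{KL}(P, Q) .
\]
```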
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995. ,
A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society. Series B. Methodological, vol.28, pp.131-142, 1966. ,
Active learning in multi-armed bandits, International Conference on Algorithmic Learning Theory, pp.287-302, 2008. ,
Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT), COLT'09, pp.217-226, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00834882
Best arm identification in multi-armed bandits, COLT 2010: 23rd Conference on Learning Theory, p.13, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00654404
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica, vol.61, issue.1, pp.55-65, 2010. ,
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, pp.235-256, 2002. ,
The nonstochastic multiarmed bandit problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002. ,
Sub-sampling for multi-armed bandits, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, vol.8724, pp.115-131, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01025651
Statistical inference under order restrictions, 1973. ,
On the concentration of the missing mass, Electronic Communications in Probability, vol.18, issue.3, pp.1-7, 2013. ,
Topological Spaces: including a treatment of multi-valued functions, vector spaces, and convexity. Courier Corporation, 1963. ,
A new lower bound for multiple hypothesis testing, IEEE Transactions on Information Theory, vol.51, issue.4, pp.1611-1615, 2005. ,
Duality relationships for entropy-like minimization problems, SIAM Journal on Control and Optimization, vol.29, issue.2, pp.325-338, 1991. ,
Concentration inequalities. A nonasymptotic theory of independence, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00751496
Subgradient methods. Lecture notes of EE392o, 2003. ,
Estimation des densités : risque minimax, Séminaire de Probabilités de Strasbourg, vol.12, pp.342-363, 1978. ,
Estimation des densités : risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol.47, pp.119-137, 1979. ,
Bandits Games and Clustering Foundations, 2010. ,
URL : https://hal.archives-ouvertes.fr/tel-00845565
Regret analysis of stochastic and nonstochastic multiarmed bandit problems. Foundations and Trends in Machine Learning, vol.5, pp.1-122, 2012. ,
Prior-free and prior-dependent regret bounds for Thompson sampling, Advances in Neural Information Processing Systems, pp.638-646, 2013. ,
The best of both worlds: stochastic and adversarial bandits, Conference on Learning Theory, pp.42-43, 2012. ,
Bounded regret in stochastic multi-armed bandits, Proceedings of the 26th Annual Conference on Learning Theory (COLT), vol.30, pp.122-134, 2013. ,
The proof of Theorem 8 is not correct. We do not know if the theorem holds true, 2013. ,
Optimal adaptive policies for sequential allocation problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996. ,
The Exponential Complexity of Satisfiability Problems, 2009. ,
Kullback-Leibler upper confidence bounds for optimal sequential allocation, Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013. ,
Tight (lower) bounds for the fixed budget best arm identification bandit problem, Conference on Learning Theory, pp.590-604, 2016. ,
A short proof of Cramér's theorem in R. The American Mathematical Monthly, vol.118, pp.925-931, 2011. ,
, Prediction, Learning, and Games, 2006.
How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997. ,
Minimizing regret with label-efficient prediction, IEEE Transactions on Information Theory, vol.51, pp.2152-2162, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-00007537
Combinatorial pure exploration of multi-armed bandits, Advances in Neural Information Processing Systems, vol.27, pp.379-387, 2014. ,
On Bayes risk lower bounds, Journal of Machine Learning Research, vol.17, issue.219, pp.1-58, 2016. ,
A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, The Annals of Mathematical Statistics, vol.23, issue.4, pp.493-507, 1952. ,
Sequential design of experiments, The Annals of Mathematical Statistics, vol.30, issue.3, pp.755-770, 1959. ,
, Probability Theory, 1988.
Unimodal bandits without smoothness, 2014. ,
Elements of information theory, 2006. ,
Asymptotically optimal sequential experimentation under generalized ranking, 2015. ,
Sur un nouveau théorème limite de la théorie des probabilités, Actualités Scientifiques et Industrielles, vol.736, pp.5-23, 1938. ,
Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, vol.8, pp.85-108, 1963. ,
Sanov property, generalized I-projection and a conditional limit theorem, The Annals of Probability, pp.768-793, 1984. ,
Information projections revisited, IEEE Transactions on Information Theory, vol.49, issue.6, pp.1474-1490, 2003. ,
Anytime optimal algorithms in stochastic multi-armed bandits, Proceedings of the 2016 International Conference on Machine Learning, ICML'16, pp.1587-1595, 2016. ,
, Lecture Notes for Statistics 311/Electrical Engineering 377, pp.41-48, 2014.
Distance-based and continuum Fano inequalities with applications to statistical estimation, 2013. ,
, Probability: Theory and Examples, 2010.
PAC bounds for multi-armed bandit and Markov decision processes, International Conference on Computational Learning Theory, pp.255-270, 2002. ,
DOI : 10.1007/3-540-45435-7_18
Online learning and game theory. A quick overview with recent results and applications, ESAIM: Proceedings and Surveys, vol.51, pp.246-271, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01237039
Mathematical statistics: A decision theoretic approach, Probability and Mathematical Statistics, vol.1, 1967. ,
A Bayesian analysis of some nonparametric problems, The Annals of Statistics, pp.209-230, 1973. ,
DOI : 10.1214/aos/1176342360
URL : https://doi.org/10.1214/aos/1176342360
Unimodal regression. The Statistician, pp.479-485, 1986. ,
The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, pp.359-376, 2011. ,
Optimal best arm identification with fixed confidence, Conference on Learning Theory, pp.998-1027, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01273838
On explore-then-commit strategies, Advances in Neural Information Processing Systems 29 (NIPS 2016), pp.784-792, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01322906
Algorithm AS 257: isotonic regression for umbrella orderings, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol.39, issue.3, pp.397-402, 1990. ,
DOI : 10.2307/2347399
Efficacy and safety of secukinumab in patients with rheumatoid arthritis: a phase II, dose-finding, double-blind, randomised, placebo-controlled study, Annals of the Rheumatic Diseases, vol.72, issue.6, pp.863-869, 2013. ,
Convergence rates of posterior distributions, Annals of Statistics, vol.28, issue.2, pp.500-531, 2000. ,
DOI : 10.1214/aos/1016218228
URL : https://doi.org/10.1214/aos/1016218228
Multi-armed bandit allocation indices, 2011. ,
DOI : 10.1002/9780470980033
URL : https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470980033.fmatter
Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society. Series B (Methodological), pp.148-177, 1979. ,
DOI : 10.1111/j.2517-6161.1979.tb01068.x
Entropy and Information Theory, 2011. ,
Lower bounds for the minimax risk using f-divergences, and applications, IEEE Transactions on Information Theory, vol.57, issue.4, pp.2386-2399, 2011. ,
On Fano's lemma and similar inequalities for the minimax risk, Probability Theory and Mathematical Statistics, vol.67, pp.26-37, 2003. ,
A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics, 2002. ,
Generalizing the Fano inequality, IEEE Transactions on Information Theory, vol.40, issue.4, pp.1247-1251, 1994. ,
, Vraisemblance empirique généralisée et estimation semiparamétrique, 2006.
Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963. ,
On adaptive posterior concentration rates, Annals of Statistics, vol.43, issue.5, pp.2259-2295, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01251098
An asymptotically optimal bandit algorithm for bounded support models, COLT, pp.67-79, 2010. ,
Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, Journal of Machine Learning Research, vol.16, pp.3721-3756, 2015. ,
Inequalities on the Lambert W function and hyperpower function, Journal of Inequalities in Pure and Applied Mathematics, vol.9, issue.2, p.51, 2008. ,
Maximum-likelihood estimation under bound restriction and order and uniform bound restrictions, Statistics & Probability Letters, vol.35, issue.2, pp.165-171, 1997. ,
Statistical Estimation: Asymptotic Theory, vol.16, 1981. ,
Bounds for the risks of non-parametric regression estimates, Theory of Probability and its Applications, vol.27, pp.84-99, 1982. ,
Asymptotic bounds on the quality of the nonparametric regression estimation in L_p, Journal of Mathematical Sciences, vol.24, issue.5, pp.540-550, 1984. ,
Sharp asymptotics of large deviations in R^d, Journal of Theoretical Probability, vol.8, issue.3, pp.501-522, 1995. ,
Online Advertisements and Multi-Armed Bandits, 2015. ,
On Bayesian index policies for sequential resource allocation, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01251606
On Bayesian upper confidence bounds for bandit problems, Artificial Intelligence and Statistics, pp.592-600, 2012. ,
On the complexity of best-arm identification in multi-armed bandit models, The Journal of Machine Learning Research, vol.17, issue.1, pp.1-42, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01024894
Large deviation methods for approximate probabilistic inference, Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence (UAI'98), pp.311-319, 1998. ,
Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp.1448-1456, 2013. ,
Minimax lower bounds for the two-armed bandit problem, IEEE Transactions on Automatic Control, vol.45, pp.711-714, 2000. ,
Gains and losses are fundamentally different in regret minimization: The sparse case, Journal of Machine Learning Research, vol.17, issue.229, pp.1-32, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01265075
Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics, pp.1091-1114, 1987. ,
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8
Optimally confident UCB: Improved regret for finite-armed bandits, 2015. ,
Regret analysis of the anytime optimally confident UCB algorithm, 2016. ,
A scale free algorithm for stochastic bandits with bounded kurtosis, Advances in Neural Information Processing Systems, pp.1583-1592, 2017. ,
Refining the confidence level for optimistic bandit strategies, 2018. Submitted. ,
Asymptotic methods in statistical decision theory, Springer Series in Statistics, 1986. ,
DOI : 10.1007/978-1-4612-4946-7
Asymptotics in statistics: some basic concepts. Springer Series in Statistics, 2000. ,
Dose escalation methods in phase I cancer clinical trials, JNCI: Journal of the National Cancer Institute, vol.101, issue.10, pp.708-720, 2009. ,
Theory of Point Estimation, 1998. ,
An optimal algorithm for the thresholding bandit problem, Proceedings of the 33rd International Conference on Machine Learning, pp.1690-1698, 2016. ,
Lipschitz bandits: Regret lower bound and optimal algorithms, Proceedings of The 27th Conference on Learning Theory, pp.975-999, 2014. ,
A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences, Proceedings of the 24th annual Conference On Learning Theory, pp.497-514, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00574987
The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004. ,
Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, vol.1896, 2007. ,
A minimax and asymptotically optimal algorithm for stochastic bandits, Proceedings of the 2017 Algorithmic Learning Theory Conference, ALT'17, 2017. ,
From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning, Foundations and Trends in Machine Learning, vol.7, pp.1-129, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00747575
An algorithm for unimodal isotonic regression, with application to locating a maximum, Univ. New Brunswick Dept. Math, 1992. ,
A distribution dependent refinement of Pinsker's inequality, IEEE Transactions on Information Theory, vol.51, issue.5, pp.1836-1840, 2005. ,
Empirical likelihood ratio confidence regions. The Annals of Statistics, pp.90-120, 1990. ,
DOI : 10.1214/aos/1176347494
URL : https://doi.org/10.1214/aos/1176347494
Worst-case large-deviation asymptotics with application to queueing and information theory. Stochastic processes and their applications, vol.116, pp.724-756, 2006. ,
DOI : 10.1016/j.spa.2005.11.003
URL : https://doi.org/10.1016/j.spa.2005.11.003
Statistical Inference Based on Divergence Measures, 2006. ,
Order restricted statistical inference, 1988. ,
, Convex Analysis, 1972.
Simple Bayesian algorithms for best arm identification, Conference on Learning Theory, pp.1417-1418, 2016. ,
Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, p.248, 2016. ,
The simulator: Understanding adaptive sampling in the moderate-confidence regime, 2017. ,
An introduction to the prediction of individual sequences: (1) oracle inequalities; (2) prediction with partial monitoring, Chevaleret, 2007. ,
Optimal algorithms for unimodal regression, 2000. ,
Reinforcement learning: An introduction, vol.1, 1998. ,
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933. ,
Introduction to Nonparametric Estimation, 2009. ,
Inequalities for the L1 deviation of the empirical distribution, 2003. ,
Online learning with Gaussian payoffs and side observations, Advances in Neural Information Processing Systems 28 (NIPS 2015), pp.1360-1368, 2015. ,
Information-theoretic lower bounds on Bayes risk in decentralized estimation, 2016. ,
Information-theoretic determination of minimax rates of convergence, Annals of Statistics, vol.27, issue.5, pp.1564-1599, 1999. ,
, Research Papers in Probability and Statistics, pp.423-435, 1997.