Y. Abbasi-Yadkori, Forced-exploration based algorithms for playing in bandits with large action sets, 2009.

J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, COLT, pp.263-274, 2008.

R. Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.04, pp.1054-1078, 1995.

R. Agrawal, The Continuum-Armed Bandit Problem, SIAM Journal on Control and Optimization, vol.33, issue.6, pp.1926-1951, 1995.
DOI : 10.1137/S0363012992237273

C. Allenberg, P. Auer, L. Györfi, and G. Ottucsák, Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring, ALT, pp.229-243, 2006.
DOI : 10.1007/11894841_20

A. Antos, V. Grover, and C. Szepesvári, Active Learning in Multi-armed Bandits, Proc. of the 19th International Conference on Algorithmic Learning Theory, pp.329-343, 2008.
DOI : 10.1007/978-3-540-87987-9_25

D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, SODA, 2007.

J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, 22nd annual conference on learning theory, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882

J. Audibert and S. Bubeck, Minimax policies for bandits games, 2009.

J. Audibert, R. Munos, and C. Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
DOI : 10.1016/j.tcs.2009.01.016

URL : https://hal.archives-ouvertes.fr/hal-00711069

J. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, 23rd annual conference on learning theory, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of IEEE 36th Annual Foundations of Computer Science, pp.322-331, 1995.
DOI : 10.1109/SFCS.1995.492488

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2003.
DOI : 10.1137/S0097539701398375

P. Auer, R. Ortner, and C. Szepesvári, Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 20th Conference on Learning Theory, pp.454-468, 2007.
DOI : 10.1007/978-3-540-72927-3_33

P. Auer, T. Jaksch, and R. Ortner, Near-optimal regret bounds for reinforcement learning, Advances in Neural Information Processing Systems 21, pp.89-96, 2009.

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002.

B. Awerbuch and R. D. Kleinberg, Adaptive routing with end-to-end feedback, Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, STOC '04, pp.45-53, 2004.
DOI : 10.1145/1007352.1007367

J. Batstone, J. Keller, I. Angelidaki, S. V. Kalyuzhnyi, S. G. Pavlostathis et al., Anaerobic digestion model no, 2002.

S. Ben-David, A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering, Machine Learning, pp.243-257, 2007.
DOI : 10.1007/s10994-006-0587-3

S. Ben-David and U. von Luxburg, Relating clustering stability to properties of cluster boundaries, COLT, 2008.

S. Ben-David, U. von Luxburg, and D. Pál, A sober look at clustering stability. Stability of k-means clustering, COLT, 2006.

S. Ben-David, D. Pál, and S. Shalev-Shwartz, Agnostic online learning, 22nd annual conference on learning theory, 2009.

D. A. Berry, R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp, Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997.
DOI : 10.1214/aos/1069362389

P. Billingsley, Convergence of Probability Measures, 1968.
DOI : 10.1002/9780470316962

L. Bottou and Y. Bengio, Convergence properties of the k-means algorithm, NIPS, 1995.

S. Bubeck and R. Munos, Open loop optimistic planning, 23rd annual conference on learning theory, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119

S. Bubeck and U. von Luxburg, Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions, JMLR, vol.10, pp.657-698, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00185780

S. Bubeck, R. Munos, and G. Stoltz, Pure Exploration in Multi-armed Bandits Problems, Proc. of the 20th International Conference on Algorithmic Learning Theory, 2009.

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in finitely-armed and continuously-armed bandits, 2009.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 22, pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797

J. Buhmann, Empirical risk approximation: An induction principle for unsupervised learning, 1998.

N. Cesa-Bianchi, Analysis of two gradient-based algorithms for on-line regression, Proceedings of the tenth annual conference on Computational learning theory, COLT '97, pp.392-411, 1999.
DOI : 10.1145/267460.267492

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, 22nd annual conference on learning theory, 2009.
DOI : 10.1016/j.jcss.2012.01.001

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire et al., How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997.
DOI : 10.1145/258128.258179

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz, Minimizing Regret With Label Efficient Prediction, IEEE Transactions on Information Theory, vol.51, issue.6, pp.2152-2162, 2005.
DOI : 10.1109/TIT.2005.847729

URL : https://hal.archives-ouvertes.fr/hal-00007537

D. Chakrabarti, R. Kumar, F. Radlinski, and E. Upfal, Mortal multi-armed bandits, Advances in Neural Information Processing Systems 22, pp.273-280, 2009.

H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus, Simulation-based Algorithms for Markov Decision Processes, 2007.

G. Chaslot, M. H. Winands, H. Herik, J. Uiterwijk, and B. Bouzy, Progressive Strategies for Monte-Carlo Tree Search, New Mathematics and Natural Computation, vol.04, issue.03, pp.343-357, 2008.
DOI : 10.1142/S1793005708001094

E. Cope, Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces, 2004.

P. Coquelin and R. Munos, Bandit algorithms for tree search, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

T. M. Cover and J. A. Thomas, Elements of information theory, 1991.

A. Czumaj and C. Sohler, Sublinear-time approximation algorithms for clustering via random sampling, Random Structures & Algorithms, vol.30, issue.1-2, pp.226-256, 2007.

V. Dani, T. P. Hayes, and S. M. Kakade, Stochastic linear optimization under bandit feedback, COLT, pp.355-366, 2008.

S. Dasgupta and L. Schulman, A probabilistic analysis of EM for mixtures of separated, spherical Gaussians, Journal of Machine Learning Research, vol.8, pp.203-226, 2007.

L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation, 2001.
DOI : 10.1007/978-1-4613-0125-7

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

J. L. Doob, Stochastic Processes, 1953.

E. Even-Dar, S. Mannor, and Y. Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes, Proceedings of the 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18

M. A. Figueiredo and A. K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.3, pp.381-396, 2002.
DOI : 10.1109/34.990138

S. Filippi, O. Cappé, and A. Garivier, Regret bounds for opportunistic channel access, Available on Arxiv, 2009.

H. Finnsson and Y. Bjornsson, Simulation-based approach to general game playing, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, pp.259-264, 2008.

D. Foster and R. Vohra, Calibrated Learning and Correlated Equilibrium, Games and Economic Behavior, vol.21, issue.1-2, pp.40-55, 1997.
DOI : 10.1006/game.1997.0595

C. Fraley and A. Raftery, How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis, The Computer Journal, vol.41, issue.8, pp.578-588, 1998.
DOI : 10.1093/comjnl/41.8.578

D. A. Freedman, On tail probabilities for martingales, The Annals of Probability, pp.100-118, 1975.

J. Fritz, Distribution-free exponential error bound for nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.21, issue.5, pp.552-557, 1975.
DOI : 10.1109/TIT.1975.1055443

M. Garey, D. Johnson, and H. Witsenhausen, The complexity of the generalized Lloyd-Max problem (Corresp.), IEEE Transactions on Information Theory, vol.28, issue.2, pp.255-256, 1982.
DOI : 10.1109/TIT.1982.1056488

A. Garivier and E. Moulines, On upper-confidence bound policies for non-stationary bandit problems, ArXiv e-prints, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00281392

S. Gelly and D. Silver, Achieving master level play in 9×9 computer Go, Proceedings of AAAI, pp.1537-1540, 2008.

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531

URL : https://hal.archives-ouvertes.fr/inria-00164003

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

J. C. Gittins, Multi-armed Bandit Allocation Indices. Wiley-Interscience series in systems and optimization, 1989.
DOI : 10.1002/9780470980033

P. Grünwald, The minimum description length principle, 2007.

S. Guattery and G. Miller, On the Quality of Spectral Separators, SIAM Journal on Matrix Analysis and Applications, vol.19, issue.3, pp.701-719, 1998.
DOI : 10.1137/S0895479896312262

A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, J. Mach. Learn. Res, vol.8, pp.2369-2403, 2007.

J. Hartigan, Consistency of Single Linkage for High-Density Clusters, Journal of the American Statistical Association, vol.76, issue.374, pp.388-394, 1981.
DOI : 10.1080/00401706.1972.10488943

J. Hartigan, Statistical theory in clustering, Journal of Classification, vol.45, issue.B, pp.63-76, 1985.
DOI : 10.1007/BF01908064

D. Hochbaum and D. Shmoys, A Best Possible Heuristic for the k-Center Problem, Mathematics of Operations Research, vol.10, issue.2, pp.180-184, 1985.
DOI : 10.1287/moor.10.2.180

URL : https://hal.archives-ouvertes.fr/hal-00897097

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1214/aoms/1177730491

J. Hren and R. Munos, Optimistic Planning of Deterministic Systems, European Workshop on Reinforcement Learning, 2008.
DOI : 10.1007/978-3-540-89722-4_12

URL : https://hal.archives-ouvertes.fr/hal-00830182

D. Hsu, W. S. Lee, and N. Rong, What makes some POMDP problems easy to approximate?, Neural Information Processing Systems, 2007.

M. Inaba, N. Katoh, and H. Imai, Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering, Proceedings of the tenth annual symposium on Computational geometry, SCG '94, pp.332-339, 1994.
DOI : 10.1145/177424.178042

URL : https://hal.archives-ouvertes.fr/in2p3-01333933

P. Indyk, Sublinear time algorithms for metric space problems, Proceedings of the thirty-first annual ACM symposium on Theory of computing, STOC '99, pp.428-434, 1999.
DOI : 10.1145/301250.301366

S. Jegelka, Statistical learning theory approaches to clustering, Master's thesis, 2007.

S. M. Kakade, On the Sample Complexity of Reinforcement Learning, 2003.

R. Kannan, S. Vempala, and A. Vetta, On clusterings, Journal of the ACM, vol.51, issue.3, pp.497-515, 2004.
DOI : 10.1145/990308.990313

M. Kearns, Y. Mansour, and A. Y. Ng, A sparse sampling algorithm for near-optimal planning in large Markovian decision processes, Machine Learning, pp.193-208, 2002.

R. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, 18th Advances in Neural Information Processing Systems, 2004.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proceedings of the fourtieth annual ACM symposium on Theory of computing, STOC '08, 2008.
DOI : 10.1145/1374376.1374475

R. D. Kleinberg, A. Niculescu-Mizil, and Y. Sharma, Regret bounds for sleeping experts and bandits, COLT, pp.343-354, 2008.
DOI : 10.1007/s10994-010-5178-7

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 15th European Conference on Machine Learning, pp.282-293, 2006.
DOI : 10.1007/11871842_29

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

T. Lange, V. Roth, M. Braun, and J. Buhmann, Stability-Based Validation of Clustering Solutions, Neural Computation, vol.16, issue.6, pp.1299-1323, 2004.
DOI : 10.1093/bioinformatics/17.4.309

A. Lazaric and R. Munos, Hybrid stochastic-adversarial online learning, 22nd annual conference on learning theory, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00830168

N. Littlestone and M. K. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994.
DOI : 10.1006/inco.1994.1009

K. Liu and Q. Zhao, A Restless Bandit Formulation of Opportunistic Access: Indexablity and Index Policy, 2008 5th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2008.
DOI : 10.1109/SAHCNW.2008.12

O. Madani, D. Lizotte, and R. Greiner, The budgeted multi-armed bandit problem, Open problems session, pp.643-645, 2004.

S. Mannor and J. N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

O. Maron and A. W. Moore, Hoeffding races: Accelerating model selection search for classification and function approximation, NIPS, pp.59-66, 1993.

P. Massart, École d'Été de Probabilités de Saint-Flour XXXIII - 2003, 2006.

C. McDiarmid, On the method of bounded differences, pp.148-188, 1989.
DOI : 10.1017/CBO9781107359949.008

G. McLachlan and D. Peel, Finite Mixture Models, 2004.
DOI : 10.1002/0471721182

H. B. McMahan and A. Blum, Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, Proceedings of the 17th Annual Conference on Learning Theory, pp.109-123, 2004.
DOI : 10.1007/978-3-540-27819-1_8

N. Mishra, D. Oblinger, and L. Pitt, Sublinear time approximate clustering, Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-01), pp.439-447, 2001.

V. Mnih, C. Szepesvári, and J. Audibert, Empirical Bernstein stopping, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.672-679, 2008.
DOI : 10.1145/1390156.1390241

URL : https://hal.archives-ouvertes.fr/hal-00834983

M. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E, vol.74, issue.3, p.036104, 2006.
DOI : 10.1103/PhysRevE.74.036104

N. Srebro, G. Shakhnarovich, and S. Roweis, An investigation of computational and informational limits in Gaussian mixture clustering, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143953

R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy, The effectiveness of Lloyd-type methods for the k-means problem, FOCS, 2006.

S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski, Bandits for taxonomies: A model-based approach, Proceedings of the Seventh SIAM International Conference on Data Mining, 2007.

S. Pandey, D. Chakrabarti, and D. Agarwal, Multi-armed bandit problems with dependent arms, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.721-728, 2007.
DOI : 10.1145/1273496.1273587

D. Pollard, Strong Consistency of $K$-Means Clustering, The Annals of Statistics, vol.9, issue.1, pp.135-140, 1981.
DOI : 10.1214/aos/1176345339

A. Rakhlin and A. Caponnetto, Stability of k-means clustering, Advances in Neural Information Processing Systems 19, 2007.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

M. P. Schadd, M. H. Winands, H. J. van den Herik, and H. Aldewereld, Addressing NP-complete puzzles with Monte-Carlo methods, Proceedings of the AISB 2008 Symposium on Logic and the Simulation of Interaction and Reasoning, The Society for the Study of Artificial Intelligence and Simulation of Behaviour, pp.55-61, 2008.

K. Schlag, Eleven tests needed for a recommendation, 2006.

O. Shamir and N. Tishby, Cluster stability for finite samples, NIPS, 2008.

O. Shamir and N. Tishby, Stability and model selection in k-means clustering, COLT, 2008.
DOI : 10.1007/s10994-010-5177-8

O. Shamir and N. Tishby, On the reliability of clustering stability in the large sample regime, NIPS, 2008.

A. Slivkins and E. Upfal, Adapting to a changing environment: the Brownian restless bandits, COLT, pp.343-354, 2008.

D. Spielman and S. Teng, Spectral partitioning works: Planar graphs and finite element meshes, 37th Annual Symposium on Foundations of Computer Science, pp.96-105, 1996.
DOI : 10.1016/j.laa.2006.07.020

G. Stoltz, Incomplete Information and Internal Regret in Prediction of Individual Sequences, 2005.
URL : https://hal.archives-ouvertes.fr/tel-00009759

M. J. Streeter and S. F. Smith, A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem, Principles and Practice of Constraint Programming (CP), pp.560-574, 2006.
DOI : 10.1007/11889205_40

M. J. Streeter and S. F. Smith, An asymptotically optimal algorithm for the max k-armed bandit problem, AAAI, 2006.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933.

A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes, 1996.
DOI : 10.1007/978-1-4757-2545-2

V. Vapnik, The Nature of Statistical Learning Theory, 1995.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.
DOI : 10.1007/s11222-007-9033-z

U. von Luxburg and S. Ben-David, Towards a statistical theory of clustering, PASCAL workshop on Statistics and Optimization of Clustering, 2005.

U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, The Annals of Statistics, vol.36, issue.2, pp.555-586, 2008.
DOI : 10.1214/009053607000000640

U. von Luxburg, S. Bubeck, S. Jegelka, and M. Kaufmann, Consistent minimization of clustering objective functions, Advances in Neural Information Processing Systems (NIPS) 21, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00185777

D. Wagner and F. Wagner, Between Min Cut and Graph Bisection, Proceedings of the 18th International Symposium on Mathematical Foundations of Computer Science (MFCS), pp.744-750, 1993.
DOI : 10.1007/3-540-57182-5_65

C. Wang, S. R. Kulkarni, and H. V. Poor, Bandit problems with side observations, IEEE Transactions on Automatic Control, vol.50, issue.3, pp.338-355, 2005.
DOI : 10.1109/TAC.2005.844079

Y. Wang, J. Y. Audibert, and R. Munos, Algorithms for infinitely many-armed bandits, Advances in Neural Information Processing Systems 21, pp.1729-1736, 2009.

P. Whittle, Activity allocation in a changing world, Journal of Applied Probability, p.27, 1988.

M. Wong and T. Lane, A kth nearest neighbor clustering procedure, J.R. Statist.Soc B, vol.45, issue.3, pp.362-368, 1983.

C. Yu, J. Chuang, B. Gerkey, G. Gordon, and A. Y. Ng, Open loop plans in POMDPs, 2005.

Z. Zhang, B. Dai, and A. Tung, Estimating local optimums in EM algorithm over Gaussian mixture model, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390312

Q. Zhao, L. Tong, and A. Swami, Decentralized cognitive MAC for dynamic spectrum access, New Frontiers in Dynamic Spectrum Access Networks, 2005. DySPAN 2005. 2005 First IEEE International Symposium on, pp.224-232, 2005.