Forced-exploration based algorithms for playing in bandits with large action sets, 2009. ,
Competing in the dark: An efficient algorithm for bandit linear optimization, COLT, pp.263-274, 2008. ,
Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995. ,
The Continuum-Armed Bandit Problem, SIAM Journal on Control and Optimization, vol.33, issue.6, pp.1926-1951, 1995. ,
DOI : 10.1137/S0363012992237273
Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring, ALT, pp.229-243, 2006. ,
DOI : 10.1007/11894841_20
Active Learning in Multi-armed Bandits, Proc. of the 19th International Conference on Algorithmic Learning Theory, pp.329-343, 2008. ,
DOI : 10.1007/978-3-540-87987-9_25
k-means++: the advantages of careful seeding, SODA, 2007. ,
Minimax policies for adversarial and stochastic bandits, 22nd annual conference on learning theory, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00834882
Minimax policies for bandits games, 2009. ,
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009. ,
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069
Best arm identification in multi-armed bandits, 23rd annual conference on learning theory, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00654404
Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of IEEE 36th Annual Foundations of Computer Science, pp.322-331, 1995. ,
DOI : 10.1109/SFCS.1995.492488
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2003. ,
DOI : 10.1137/S0097539701398375
Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 20th Conference on Learning Theory, pp.454-468, 2007. ,
DOI : 10.1007/978-3-540-72927-3_33
Near-optimal regret bounds for reinforcement learning, Advances in Neural Information Processing Systems 21, pp.89-96, 2009. ,
Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002. ,
Adaptive routing with end-to-end feedback, Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, STOC '04, pp.45-53, 2004. ,
DOI : 10.1145/1007352.1007367
Anaerobic digestion model no, 2002. ,
A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering, Machine Learning, pp.243-257, 2007. ,
DOI : 10.1007/s10994-006-0587-3
Relating clustering stability to properties of cluster boundaries, COLT, 2008. ,
A sober look at clustering stability; Stability of k-means clustering, COLT, 2006. ,
Agnostic online learning, 22nd annual conference on learning theory, 2009. ,
Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997. ,
DOI : 10.1214/aos/1069362389
Convergence of Probability Measures, 1968. ,
DOI : 10.1002/9780470316962
Convergence properties of the k-means algorithm, NIPS, 1995. ,
Open loop optimistic planning, 23rd annual conference on learning theory, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00943119
Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions, JMLR, vol.10, pp.657-698, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00185780
Pure Exploration in Multi-armed Bandits Problems, Proc. of the 20th International Conference on Algorithmic Learning Theory, 2009. ,
Pure exploration in finitely-armed and continuously-armed bandits, 2009. ,
Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 22, pp.201-208, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00329797
Empirical risk approximation: An induction principle for unsupervised learning, 1998. ,
Analysis of two gradient-based algorithms for on-line regression, Proceedings of the tenth annual conference on Computational learning theory, COLT '97, pp.392-411, 1999. ,
DOI : 10.1145/267460.267492
Combinatorial bandits, 22nd annual conference on learning theory, 2009. ,
DOI : 10.1016/j.jcss.2012.01.001
How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997. ,
DOI : 10.1145/258128.258179
Minimizing Regret With Label Efficient Prediction, IEEE Transactions on Information Theory, vol.51, issue.6, pp.2152-2162, 2005. ,
DOI : 10.1109/TIT.2005.847729
URL : https://hal.archives-ouvertes.fr/hal-00007537
Mortal multi-armed bandits, Advances in Neural Information Processing Systems 22, pp.273-280, 2009. ,
Simulation-based Algorithms for Markov Decision Processes, 2007. ,
Progressive Strategies for Monte-Carlo Tree Search, New Mathematics and Natural Computation, vol.04, issue.03, pp.343-357, 2008. ,
DOI : 10.1142/S1793005708001094
Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces, 2004. ,
Bandit algorithms for tree search, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00150207
Elements of information theory, 1991. ,
Sublinear-time approximation algorithms for clustering via random sampling, Random Structures & Algorithms, vol.30, issue.1-2, pp.226-256, 2007. ,
Stochastic linear optimization under bandit feedback, COLT, pp.355-366, 2008. ,
A probabilistic analysis of EM for mixtures of separated, spherical Gaussians, Journal of Machine Learning Research, vol.8, pp.203-226, 2007. ,
Combinatorial Methods in Density Estimation, 2001. ,
DOI : 10.1007/978-1-4613-0125-7
A Probabilistic Theory of Pattern Recognition, 1996. ,
DOI : 10.1007/978-1-4612-0711-5
Stochastic Processes, 1953. ,
PAC Bounds for Multi-armed Bandit and Markov Decision Processes, Proceedings of the 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002. ,
DOI : 10.1007/3-540-45435-7_18
Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.3, pp.381-396, 2002. ,
DOI : 10.1109/34.990138
Regret bounds for opportunistic channel access, arXiv preprint, 2009. ,
Simulation-based approach to general game playing, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, pp.259-264, 2008. ,
Calibrated Learning and Correlated Equilibrium, Games and Economic Behavior, vol.21, issue.1-2, pp.40-55, 1997. ,
DOI : 10.1006/game.1997.0595
How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis, The Computer Journal, vol.41, issue.8, pp.578-588, 1998. ,
DOI : 10.1093/comjnl/41.8.578
On tail probabilities for martingales, The Annals of Probability, pp.100-118, 1975. ,
Distribution-free exponential error bound for nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.21, issue.5, pp.552-557, 1975. ,
DOI : 10.1109/TIT.1975.1055443
The complexity of the generalized Lloyd-Max problem (Corresp.), IEEE Transactions on Information Theory, vol.28, issue.2, pp.255-256, 1982. ,
DOI : 10.1109/TIT.1982.1056488
On upper-confidence bound policies for non-stationary bandit problems, ArXiv e-prints, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00281392
Achieving master level play in 9×9 computer Go, Proceedings of AAAI, pp.1537-1540, 2008. ,
Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007. ,
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003
Modification of UCT with patterns in Monte-Carlo Go, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00117266
Multi-armed Bandit Allocation Indices, Wiley-Interscience series in systems and optimization, 1989. ,
DOI : 10.1002/9780470980033
The minimum description length principle, 2007. ,
On the Quality of Spectral Separators, SIAM Journal on Matrix Analysis and Applications, vol.19, issue.3, pp.701-719, 1998. ,
DOI : 10.1137/S0895479896312262
The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, vol.8, pp.2369-2403, 2007. ,
Consistency of Single Linkage for High-Density Clusters, Journal of the American Statistical Association, vol.76, issue.374, pp.388-394, 1981. ,
DOI : 10.1080/00401706.1972.10488943
Statistical theory in clustering, Journal of Classification, vol.2, issue.1, pp.63-76, 1985. ,
DOI : 10.1007/BF01908064
A Best Possible Heuristic for the k-Center Problem, Mathematics of Operations Research, vol.10, issue.2, pp.180-184, 1985. ,
DOI : 10.1287/moor.10.2.180
URL : https://hal.archives-ouvertes.fr/hal-00897097
Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963. ,
DOI : 10.1214/aoms/1177730491
Optimistic Planning of Deterministic Systems, European Workshop on Reinforcement Learning, 2008. ,
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182
What makes some POMDP problems easy to approximate?, Neural Information Processing Systems, 2007. ,
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering, Proceedings of the tenth annual symposium on Computational geometry, SCG '94, pp.332-339, 1994. ,
DOI : 10.1145/177424.178042
URL : https://hal.archives-ouvertes.fr/in2p3-01333933
Sublinear time algorithms for metric space problems, Proceedings of the thirty-first annual ACM symposium on Theory of computing, STOC '99, pp.428-434, 1999. ,
DOI : 10.1145/301250.301366
Statistical learning theory approaches to clustering, Master's thesis, 2007. ,
On the Sample Complexity of Reinforcement Learning, 2003. ,
On clusterings, Journal of the ACM, vol.51, issue.3, pp.497-515, 2004. ,
DOI : 10.1145/990308.990313
A sparse sampling algorithm for near-optimal planning in large Markovian decision processes, Machine Learning, pp.193-208, 2002. ,
Nearly tight bounds for the continuum-armed bandit problem, 18th Advances in Neural Information Processing Systems, 2004. ,
Multi-armed bandits in metric spaces, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC '08, 2008. ,
DOI : 10.1145/1374376.1374475
Regret bounds for sleeping experts and bandits, COLT, pp.343-354, 2008. ,
DOI : 10.1007/s10994-010-5178-7
Bandit Based Monte-Carlo Planning, Proceedings of the 15th European Conference on Machine Learning, pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
Stability-Based Validation of Clustering Solutions, Neural Computation, vol.16, issue.6, pp.1299-1323, 2004. ,
DOI : 10.1093/bioinformatics/17.4.309
Hybrid stochastic-adversarial online learning, 22nd annual conference on learning theory, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00830168
The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994. ,
DOI : 10.1006/inco.1994.1009
A Restless Bandit Formulation of Opportunistic Access: Indexablity and Index Policy, 2008 5th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, 2008. ,
DOI : 10.1109/SAHCNW.2008.12
The budgeted multi-armed bandit problem, Open problems session, pp.643-645, 2004. ,
The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004. ,
Hoeffding races: Accelerating model selection search for classification and function approximation, NIPS, pp.59-66, 1993. ,
École d'Été de Probabilités de Saint-Flour XXXIII - 2003, 2006. ,
On the method of bounded differences, pp.148-188, 1989. ,
DOI : 10.1017/CBO9781107359949.008
Finite Mixture Models, 2004. ,
DOI : 10.1002/0471721182
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, Proceedings of the 17th Annual Conference on Learning Theory, pp.109-123, 2004. ,
DOI : 10.1007/978-3-540-27819-1_8
Sublinear time approximate clustering, Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-01), pp.439-447, 2001. ,
Empirical Bernstein stopping, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.672-679, 2008. ,
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983
Finding community structure in networks using the eigenvectors of matrices, Physical Review E, vol.74, issue.3, p.36104, 2006. ,
DOI : 10.1103/PhysRevE.74.036104
An investigation of computational and informational limits in Gaussian mixture clustering, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006. ,
DOI : 10.1145/1143844.1143953
The effectiveness of Lloyd-type methods for the k-means problem, FOCS, 2006. ,
Bandits for taxonomies: A model-based approach, Proceedings of the Seventh SIAM International Conference on Data Mining, 2007. ,
Multi-armed bandit problems with dependent arms, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.721-728, 2007. ,
DOI : 10.1145/1273496.1273587
Strong Consistency of $K$-Means Clustering, The Annals of Statistics, vol.9, issue.1, pp.135-140, 1981. ,
DOI : 10.1214/aos/1176345339
Stability of k-means clustering, Advances in Neural Information Processing Systems 19, 2007. ,
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952. ,
DOI : 10.1090/S0002-9904-1952-09620-8
Addressing NP-complete puzzles with Monte-Carlo methods, Proceedings of the AISB 2008 Symposium on Logic and the Simulation of Interaction and Reasoning, The Society for the Study of Artificial Intelligence and Simulation of Behaviour, pp.55-61, 2008. ,
Eleven tests needed for a recommendation, 2006. ,
Cluster stability for finite samples, NIPS, 2008. ,
Stability and model selection in k-means clustering, COLT, 2008. ,
DOI : 10.1007/s10994-010-5177-8
On the reliability of clustering stability in the large sample regime, NIPS, 2008. ,
Adapting to a changing environment: the Brownian restless bandits, COLT, pp.343-354, 2008. ,
Spectral partitioning works: Planar graphs and finite element meshes, 37th Annual Symposium on Foundations of Computer Science, pp.96-105, 1996. ,
DOI : 10.1016/j.laa.2006.07.020
Incomplete Information and Internal Regret in Prediction of Individual Sequences, 2005. ,
URL : https://hal.archives-ouvertes.fr/tel-00009759
A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem, Principles and Practice of Constraint Programming (CP), pp.560-574, 2006. ,
DOI : 10.1007/11889205_40
An asymptotically optimal algorithm for the max k-armed bandit problem, AAAI, 2006. ,
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933. ,
Weak Convergence and Empirical Processes, 1996. ,
DOI : 10.1007/978-1-4757-2545-2
The Nature of Statistical Learning Theory, 1995. ,
A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007. ,
DOI : 10.1007/s11222-007-9033-z
Towards a statistical theory of clustering, PASCAL workshop on Statistics and Optimization of Clustering, 2005. ,
Consistency of spectral clustering, The Annals of Statistics, vol.36, issue.2, pp.555-586, 2008. ,
DOI : 10.1214/009053607000000640
Consistent minimization of clustering objective functions, Advances in Neural Information Processing Systems (NIPS) 21, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00185777
Between Min Cut and Graph Bisection, Proceedings of the 18th International Symposium on Mathematical Foundations of Computer Science (MFCS), pp.744-750, 1993. ,
DOI : 10.1007/3-540-57182-5_65
Bandit problems with side observations, IEEE Transactions on Automatic Control, vol.50, issue.3, pp.338-355, 2005. ,
DOI : 10.1109/TAC.2005.844079
Algorithms for infinitely many-armed bandits, Advances in Neural Information Processing Systems 21, pp.1729-1736, 2009. ,
Activity allocation in a changing world, Journal of Applied Probability, p.27, 1988. ,
A kth nearest neighbor clustering procedure, J. R. Statist. Soc. B, vol.45, issue.3, pp.362-368, 1983. ,
Open loop plans in POMDPs, 2005. ,
Estimating local optimums in EM algorithm over Gaussian mixture model, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390312
Decentralized cognitive MAC for dynamic spectrum access, First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN 2005), pp.224-232, 2005. ,