Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved algorithms for linear stochastic bandits, Neural Information Processing Systems, 2011.

J. D. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: an efficient algorithm for bandit linear optimization, Conference on Learning Theory, 2008.

S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, International Conference on Machine Learning, 2013.

C. Allenberg, P. Auer, L. Györfi, and G. Ottucsák, Hannan consistency in on-line learning in case of unbounded losses under partial monitoring, Algorithmic Learning Theory, 2006.
DOI : 10.1007/11894841_20

N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour, From bandits to experts: A tale of domination and independence, Neural Information Processing Systems, 2013.

N. Alon, N. Cesa-Bianchi, O. Dekel, and T. Koren, Online learning with feedback graphs: Beyond bandits, Conference on Learning Theory, 2015.

J. Audibert and S. Bubeck, Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654356

J. Audibert, S. Bubeck, and G. Lugosi, Regret in online combinatorial optimization, Mathematics of Operations Research, vol.39, issue.1, 2014.
DOI : 10.1287/moor.2013.0598

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, 2002.

P. Auer and R. Ortner, UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica, 2010.
DOI : 10.1007/s10998-010-3055-6

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, 2002.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The non-stochastic multiarmed bandit problem, SIAM Journal on Computing, 2002.

P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences, vol.64, issue.1, 2002.
DOI : 10.1006/jcss.2001.1795

B. Awerbuch and R. D. Kleinberg, Adaptive routing with end-to-end feedback, Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC '04, 2004.
DOI : 10.1145/1007352.1007367

M. Babaioff, Y. Sharma, and A. Slivkins, Characterizing truthful multi-armed bandit mechanisms, SIAM Journal on Computing, 2014.
DOI : 10.1137/120878768
URL : http://arxiv.org/pdf/0812.2291

G. Bartók, D. Pál, and C. Szepesvári, Minimax regret of finite partial-monitoring games in stochastic environments, Conference on Learning Theory, 2011.

G. Bartók, D. P. Foster, D. Pál, A. Rakhlin, and C. Szepesvári, Partial monitoring: Classification, regret bounds, and algorithms, Mathematics of Operations Research, 2014.

M. Belkin, I. Matveeva, and P. Niyogi, Regularization and semi-supervised learning on large graphs, Conference on Computational Learning Theory, 2004.

M. Belkin, P. Niyogi, and V. Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research, 2006.

A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire, Contextual bandit algorithms with supervised learning guarantees, International Conference on Artificial Intelligence and Statistics, 2011.

D. Billsus, M. J. Pazzani, and J. Chen, A learning agent for wireless news access, Proceedings of the 5th International Conference on Intelligent User Interfaces, IUI '00, 2000.
DOI : 10.1145/325737.325768

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, 2012.
DOI : 10.1561/2200000024

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, X-armed bandits, Journal of Machine Learning Research, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00450235

S. Bubeck, N. Cesa-Bianchi, and S. M. Kakade, Towards minimax policies for online linear optimization with bandit feedback, Conference on Learning Theory, 2012.

S. Buccapatnam, A. Eryilmaz, and N. B. Shroff, Stochastic bandits with side observations on networks, International Conference on Measurement and Modeling of Computer Systems, 2014.

S. Caron, B. Kveton, M. Lelarge, and S. Bhagat, Leveraging side observations in stochastic bandits, Conference on Uncertainty in Artificial Intelligence, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01270324

A. Carpentier and M. Valko, Revealing graph bandits for maximizing local influence, International Conference on Artificial Intelligence and Statistics, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01304020

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
DOI : 10.1017/CBO9780511546921

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, 2012.
DOI : 10.1016/j.jcss.2012.01.001

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth, How to use expert advice, Journal of the ACM, 1997.
DOI : 10.1145/167088.167198

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz, Minimizing regret with label efficient prediction, IEEE Transactions on Information Theory, 2005.
DOI : 10.1109/tit.2005.847729
URL : https://hal.archives-ouvertes.fr/hal-00007537

N. Cesa-Bianchi, S. Shalev-Shwartz, and O. Shamir, Online learning of noisy data with kernels, Conference on Learning Theory, 2010.

N. Cesa-Bianchi, C. Gentile, and G. Zappella, A gang of bandits, Neural Information Processing Systems, 2013.

O. Chapelle and L. Li, An empirical evaluation of Thompson sampling, Neural Information Processing Systems, 2011.

D. H. Chau, A. Kittur, J. I. Hong, and C. Faloutsos, Apolo: Making sense of large network data by combining rich user interaction and machine learning, Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, CHI '11, 2011.
DOI : 10.1145/1978942.1978967

W. Chen, Y. Wang, and Y. Yuan, Combinatorial multi-armed bandit: General framework and applications, International Conference on Machine Learning, 2013.

W. Chu, L. Li, L. Reyzin, and R. E. Schapire, Contextual bandits with linear payoff functions, International Conference on Artificial Intelligence and Statistics, 2011.

A. Cohen, T. Hazan, and T. Koren, Online learning with feedback graphs without the graphs, International Conference on Machine Learning, 2016.

R. Combes and A. Proutière, Unimodal bandits: Regret lower bounds and optimal algorithms, International Conference on Machine Learning, 2014.
DOI : 10.1145/2745844.2745847

URL : https://hal.archives-ouvertes.fr/hal-01092662

V. Dani, T. P. Hayes, and S. M. Kakade, Stochastic linear optimization under bandit feedback, Conference on Learning Theory, 2008.

T. Desautels, A. Krause, and J. Burdick, Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization, International Conference on Machine Learning, 2012.

L. Devroye, G. Lugosi, and G. Neu, Prediction by random-walk perturbation, Conference on Learning Theory, 2013.

M. Fang and D. Tao, Networked bandits with disjoint linear payoffs, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, 2014.
DOI : 10.1145/2623330.2623672

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, vol.55, issue.1, 1997.
DOI : 10.1006/jcss.1997.1504

C. Gentile, S. Li, and G. Zappella, Online clustering of bandits, International Conference on Machine Learning, 2014.

Q. Gu and J. Han, Online spectral learning on a graph with bandit feedback, IEEE International Conference on Data Mining, 2014.
DOI : 10.1109/ICDM.2014.72

L. Györfi and G. Ottucsák, Sequential prediction of unbounded stationary time series, IEEE Transactions on Information Theory, vol.53, issue.5, 2007.
DOI : 10.1109/TIT.2007.894660

A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, 2007.

M. K. Hanawal, V. Saligrama, M. Valko, and R. Munos, Cheap bandits, International Conference on Machine Learning, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01153540

J. Hannan, Approximation to Bayes risk in repeated play, Contributions to the Theory of Games, 1957.

R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1990.

M. Hutter and J. Poland, Prediction with expert advice by following the perturbed leader for general weights, Algorithmic Learning Theory, 2004.
DOI : 10.1007/978-3-540-30215-5_22

M. Jamali and M. Ester, A matrix factorization technique with trust propagation for recommendation in social networks, Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, 2010.
DOI : 10.1145/1864708.1864736

D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich, Recommender systems: An introduction, 2010.
DOI : 10.1017/CBO9780511763113

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, 2005.

R. H. Keshavan, A. Montanari, and S. Oh, Matrix completion from a few entries, IEEE International Symposium on Information Theory, 2009.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, 2008.
DOI : 10.1145/1374376.1374475

T. Kocák, G. Neu, M. Valko, and R. Munos, Efficient learning by implicit exploration in bandit problems with side observations, Neural Information Processing Systems, 2014.

T. Kocák, M. Valko, R. Munos, and S. Agrawal, Spectral Thompson sampling, AAAI Conference on Artificial Intelligence, 2014.

T. Kocák, M. Valko, R. Munos, B. Kveton, and S. Agrawal, Spectral bandits for smooth graph functions with applications in recommender systems, AAAI Workshop on Sequential Decision-Making with Big Data, 2014.

T. Kocák, G. Neu, and M. Valko, Online learning with noisy side observations, International Conference on Artificial Intelligence and Statistics, 2016.

T. Kocák, G. Neu, and M. Valko, Online learning with Erdős-Rényi side-observation graphs, Conference on Uncertainty in Artificial Intelligence, 2016.

W. M. Koolen, M. K. Warmuth, and J. Kivinen, Hedging structured concepts, Conference on Learning Theory, 2010.

N. Korda, B. Szörényi, and S. Li, Distributed clustering of linear bandits in peer-to-peer networks, International Conference on Machine Learning, 2016.

I. Koutis, G. L. Miller, and D. Tolliver, Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing, Computer Vision and Image Understanding, 2011.

L. Li, W. Chu, J. Langford, and R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th International Conference on World Wide Web, WWW '10, 2010.
DOI : 10.1145/1772690.1772758

S. Li, C. Gentile, A. Karatzoglou, and G. Zappella, Online context-dependent clustering in recommendations based on exploration-exploitation algorithms, arXiv preprint, 2015.

N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation, 1994.
DOI : 10.1016/b978-0-08-094829-4.50035-0

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, 2007.

Y. Ma, T. Huang, and J. Schneider, Active search and bandits on graphs using sigma-optimality, Conference on Uncertainty in Artificial Intelligence, 2015.

S. Mannor and O. Shamir, From bandits to experts: On the value of side-observations, Neural Information Processing Systems, 2011.

H. B. McMahan and A. Blum, Online geometric optimization in the bandit setting against an adaptive adversary, Conference on Learning Theory, 2004.
DOI : 10.1007/978-3-540-27819-1_8

M. McPherson, L. Smith-Lovin, and J. Cook, Birds of a feather: Homophily in social networks, Annual Review of Sociology, vol.27, issue.1, 2001.
DOI : 10.1146/annurev.soc.27.1.415

S. K. Narang, A. Gadde, and A. Ortega, Signal processing techniques for interpolation in graph structured data, IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6638704

G. Neu and G. Bartók, An efficient algorithm for learning with semi-bandit feedback, Algorithmic Learning Theory, 2013.
DOI : 10.1007/978-3-642-40935-6_17

S. Pandey, D. Chakrabarti, and D. Agarwal, Multi-armed bandit problems with dependent arms, Proceedings of the 24th International Conference on Machine Learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273587
URL : http://www.cs.cmu.edu/~spandey/publications/dependent-bandit.pdf

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, 1952.

E. M. Schwartz, Optimizing adaptive marketing experiments with the multi-armed bandit, 2013.

Y. Seldin, P. Bartlett, K. Crammer, and Y. Abbasi-Yadkori, Prediction with limited advice and multiarmed bandits with paid observations, International Conference on Machine Learning, 2014.

A. Slivkins, Contextual bandits with similarity information, Conference on Learning Theory, 2009.

N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, International Conference on Machine Learning, 2010.

E. Takimoto and M. K. Warmuth, Path kernels and multiplicative updates, Journal of Machine Learning Research, 2003.
DOI : 10.1007/3-540-45435-7_6
URL : http://www.cse.ucsc.edu/~manfred/pubs/J55.pdf

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, 1933.

M. Valko, N. Korda, R. Munos, I. Flaounas, and N. Cristianini, Finite-time analysis of kernelised contextual bandits, Conference on Uncertainty in Artificial Intelligence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00826946

M. Valko, R. Munos, B. Kveton, and T. Kocák, Spectral bandits for smooth graph functions, International Conference on Machine Learning, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00986818

V. Vovk, Aggregating strategies, Proceedings of the Third Annual Workshop on Computational Learning Theory, 1990.
DOI : 10.1016/B978-1-55860-146-8.50032-1

M. Wainwright, STAT 210B: Advanced mathematical statistics, lecture notes, 2015.

Y. Wu, A. György, and C. Szepesvári, Online learning with Gaussian payoffs and side observations, Neural Information Processing Systems, 2015.

J. Y. Yu and S. Mannor, Unimodal bandits, International Conference on Machine Learning, 2011.

X. Zhu, Semi-supervised learning literature survey, 2008.