Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved algorithms for linear stochastic bandits, Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), 2011.

S. D. Ahipaşaoğlu, Solving ellipsoidal inclusion and optimal experimental design problems: theory and algorithms, PhD thesis, Cornell University, 2009.

A. Antos, V. Grover, and C. Szepesvári, Active learning in multi-armed bandits, Proceedings of the 19th International Conference on Algorithmic Learning Theory (ALT), 2008.
DOI : 10.1007/978-3-540-87987-9_25

A. Antos, V. Grover, and C. Szepesvári, Active learning in heteroscedastic noise, Theoretical Computer Science, vol.411, issue.29-30, pp.2712-2728, 2010.
DOI : 10.1016/j.tcs.2010.04.007

J. Audibert, R. Munos, and C. Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
DOI : 10.1016/j.tcs.2009.01.016

URL : https://hal.archives-ouvertes.fr/hal-00711069

J. Audibert, S. Bubeck, and R. Munos, Best Arm Identification in Multi-Armed Bandits, Proceedings of the 23rd Conference on Learning Theory (COLT), pp.13-30, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

R. Bechhofer, J. Kiefer, and M. Sobel, Sequential Identification and Ranking Procedures, The University of Chicago Press, 1968.

P. J. Bickel and E. Levina, Regularized estimation of large covariance matrices, The Annals of Statistics, vol.36, issue.1, pp.199-227, 2008.
DOI : 10.1214/009053607000000758

J. Bien and R. J. Tibshirani, Sparse estimation of a covariance matrix, Biometrika, vol.98, issue.4, pp.807-820, 2011.
DOI : 10.1093/biomet/asr054

M. Bouhtou, S. Gaubert, and G. Sagnol, Submodularity and randomized rounding techniques for optimal experimental design, Electronic Notes in Discrete Mathematics, vol.36, pp.679-686, 2010.
DOI : 10.1016/j.endm.2010.05.086

S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

E. Brunskill and L. Li, Sample complexity of multi-task reinforcement learning, Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI), 2013.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), pp.23-37, 2009.

S. Bubeck, T. Wang, and N. Viswanathan, Multiple identifications in multi-armed bandits, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.258-265, 2013.

M. Burger and S. J. Osher, A survey on level set methods for inverse problems and optimal design, European Journal of Applied Mathematics, vol.16, issue.2, pp.263-301, 2005.
DOI : 10.1017/S0956792505006182

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119SUPP

A. Carpentier, A. Lazaric, M. Ghavamzadeh, R. Munos, and P. Auer, Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits, Proceedings of the Twenty-Second International Conference on Algorithmic Learning Theory, pp.189-203, 2011.
DOI : 10.1007/978-3-642-24412-4_17

URL : https://hal.archives-ouvertes.fr/hal-00659696

W. Chu, L. Li, L. Reyzin, and R. E. Schapire, Contextual bandits with linear payoff functions, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

R. Collobert and J. Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.
DOI : 10.1145/1390156.1390177

K. Crammer, M. Kearns, and J. Wortman, Learning from multiple sources, Journal of Machine Learning Research, vol.9, pp.1757-1774, 2008.

V. Dani, T. P. Hayes, and S. M. Kakade, Stochastic Linear Optimization under Bandit Feedback, COLT 2008, pp.355-366, 2008.

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

V. Fedorov, Theory of Optimal Experiments, Academic Press, 1972.

W. A. Fuller and J. N. Rao, Estimation for a Linear Regression Model with Unknown Diagonal Covariance Matrix, The Annals of Statistics, vol.6, issue.5, pp.1149-1158, 1978.
DOI : 10.1214/aos/1176344317

V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck, Multi-bandit best arm identification, Advances in Neural Information Processing Systems 24 (NIPS), pp.2222-2230, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00632523

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best arm identification: A unified approach to fixed budget and fixed confidence, Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747005

C. Gentile, S. Li, and G. Zappella, Online clustering of bandits, Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.

M. G. Azar, A. Lazaric, and E. Brunskill, Sequential transfer in multi-armed bandit with finite set of models, Advances in Neural Information Processing Systems 26 (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00924025

M. D. Hoffman, B. Shahriari, and N. de Freitas, On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), pp.365-374, 2014.

J. Honda and A. Takemura, An asymptotically optimal policy for finite support models in the multiarmed bandit problem, Machine Learning, pp.361-391, 2011.
DOI : 10.1007/s10994-011-5257-4

K. G. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, lil'UCB: An optimal exploration algorithm for multi-armed bandits, Proceedings of the 27th Conference on Learning Theory (COLT), 2014.

C. Jennison, I. M. Johnstone, and B. W. Turnbull, Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means, Statistical Decision Theory and Related Topics III, pp.55-86, 1982.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, PAC subset selection in stochastic multi-armed bandits, Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

Z. Karnin, T. Koren, and O. Somekh, Almost optimal exploration in multi-armed bandits, Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

E. Kaufmann and S. Kalyanakrishnan, Information complexity in bandit subset selection, Proceedings of the 26th Conference on Learning Theory (COLT), pp.228-251, 2013.

E. Kaufmann, O. Cappé, and A. Garivier, On the complexity of best-arm identification in multi-armed bandit models, Journal of Machine Learning Research, vol.17, pp.1-42, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01024894

J. Kiefer and J. Wolfowitz, The equivalence of two extremum problems, Canadian Journal of Mathematics, vol.12, pp.363-366, 1960.
DOI : 10.4153/CJM-1960-030-4

I. Kuzborskij and F. Orabona, Learning by transferring from auxiliary hypotheses, 2014.

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

T. L. Lai and C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics, vol.10, issue.1, pp.154-166, 1982.
DOI : 10.1214/aos/1176345697

A. Lazaric and M. Restelli, Transfer from Multiple MDPs, Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS), pp.1746-1754, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00772620

A. Lazaric, M. Restelli, and A. Bonarini, Transfer of samples in batch reinforcement learning, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.544-551, 2008.
DOI : 10.1145/1390156.1390225

L. Li, W. Chu, J. Langford, and R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th international conference on World wide web, WWW '10, pp.661-670, 2010.
DOI : 10.1145/1772690.1772758

J. J. Lim, R. Salakhutdinov, and A. Torralba, Transfer learning by borrowing examples for multiclass object detection, Advances in Neural Information Processing Systems 24 (NIPS), 2011.

A. Llamosi, A. Mezine, F. d'Alché-Buc, V. Letort, and M. Sebag, Experimental design in dynamical system identification: A bandit-based active learning approach, Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp.306-321, 2014.
DOI : 10.1007/978-3-662-44851-9_20

URL : https://hal.archives-ouvertes.fr/hal-01109775

O. Maillard and S. Mannor, Latent bandits, Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
URL : https://hal.archives-ouvertes.fr/hal-00926281

Y. Mansour, M. Mohri, and A. Rostamizadeh, Domain adaptation: Learning bounds and algorithms, COLT 2009 -The 22nd Conference on Learning Theory, p.115, 2009.

O. Maron and A. Moore, Hoeffding races: Accelerating model selection search for classification and function approximation, Advances in Neural Information Processing Systems, 1993.

J. Merikoski and R. Kumar, Inequalities for spreads of matrix sums and products, Applied Mathematics E-Notes, vol.4, pp.150-159, 2004.

R. Munos, From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning, Foundations and Trends in Machine Learning, vol.7, issue.1, pp.1-129, 2014.
DOI : 10.1561/2200000038

URL : https://hal.archives-ouvertes.fr/hal-00747575

S. J. Pan and Q. Yang, A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.10, pp.1345-1359, 2010.
DOI : 10.1109/TKDE.2009.191

E. Paulson, A sequential procedure for selecting the population with the largest mean from $k$ normal populations, The Annals of Mathematical Statistics, vol.35, issue.1, pp.174-180, 1964.
DOI : 10.1214/aoms/1177703739

A. Pentina, V. Sharmanska, and C. H. Lampert, Curriculum learning of multiple tasks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299188

F. Pukelsheim, Optimal Design of Experiments, Classics in Applied Mathematics, vol.50, Society for Industrial and Applied Mathematics, 2006.
DOI : 10.1137/1.9780898719109

F. Pukelsheim and B. Torsney, Optimal Weights for Experimental Designs on Linearly Independent Support Points, The Annals of Statistics, vol.19, issue.3, pp.1614-1625, 1991.
DOI : 10.1214/aos/1176348265

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

G. Sagnol, Approximation of a maximum-submodular-coverage problem involving spectral functions, with application to experimental designs, Discrete Applied Mathematics, vol.161, issue.1-2, pp.258-276, 2013.
DOI : 10.1016/j.dam.2012.07.016

R. Sibson, Discussion of a paper by H. P. Wynn, Journal of the Royal Statistical Society, Series B, pp.181-183, 1972.

S. Silvey, Discussion of a paper by H. P. Wynn, Journal of the Royal Statistical Society, Series B, pp.174-175, 1972.

M. Soare, A. Lazaric, and R. Munos, Active learning in linear stochastic bandits, NIPS 2013 Workshop on Bayesian Optimization in Theory and Practice, 2013.

M. Soare, O. Alsharif, A. Lazaric, and J. Pineau, Multi-task linear bandits, NIPS 2014 Workshop on Transfer and Multi-task Learning, 2014.

M. Soare, A. Lazaric, and R. Munos, Best-arm identification in linear bandits, Advances in Neural Information Processing Systems 27 (NIPS), pp.828-836, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01075701

N. Srinivas, A. Krause, S. Kakade, and M. Seeger, Gaussian process optimization in the bandit setting: No regret and experimental design, Proceedings of the 27th International Conference on Machine Learning (ICML), pp.1015-1022, 2010.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, vol.10, issue.1, pp.1633-1685, 2009.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI : 10.1093/biomet/25.3-4.285

D. M. Titterington, Optimal design: Some geometrical aspects of D-optimality, Biometrika, vol.62, issue.2, pp.313-320, 1975.
DOI : 10.1093/biomet/62.2.313

M. Todd, Minimum-Volume Ellipsoids: Theory and Algorithms, Society for Industrial and Applied Mathematics, 2016.
DOI : 10.1137/1.9781611974386

D. P. Wiens and P. Li, V-optimal designs for heteroscedastic regression, Journal of Statistical Planning and Inference, vol.145, pp.125-138, 2014.
DOI : 10.1016/j.jspi.2013.09.007

W. K. Wong and R. Cook, Heteroscedastic g-optimal designs, Journal of the Royal Statistical Society. Series B, vol.55, issue.4, pp.871-880, 1993.

K. Yu, J. Bi, and V. Tresp, Active learning via transductive experimental design, Proceedings of the 23rd International Conference on Machine Learning (ICML), pp.1081-1088, 2006.
DOI : 10.1145/1143844.1143980

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.3755

Y. Yu, Monotonic convergence of a general algorithm for computing optimal designs, The Annals of Statistics, 2010.