J. Abernethy, E. Hazan, and A. Rakhlin, Competing in the dark: An efficient algorithm for bandit linear optimization, 21st Annual Conference on Learning Theory -COLT 2008, pp.263-274, 2008.

A. Agarwal, P. L. Bartlett, and M. Dama, Optimal allocation strategies for the dark pool problem, AISTATS, volume 9 of JMLR Proceedings, pp.9-16

E. Agichtein, E. Brill, S. Dumais, and R. Ragno, Learning user interaction models for predicting web search result preferences, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval , SIGIR '06, pp.3-10, 2006.
DOI : 10.1145/1148170.1148175

S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multiarmed bandit problem, Journal of Machine Learning Research -Proceedings Track, vol.23, pp.39-40, 2012.

S. Agrawal and N. Goyal, Further optimal regret bounds for Thompson sampling, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pp.99-107, 2013.
DOI : 10.1145/3088510

N. Ailon, Z. Karnin, and T. Joachims, Reducing dueling bandits to cardinal bandits, ICML 2014 JMLR Proceedings, pp.856-864, 2014.

D. Angluin and P. Laird, Learning from noisy examples, Machine Learning, vol.2, issue.4, pp.343-370, 1988.
DOI : 10.1007/BF00116829

J.-Y. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, COLT, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

N. F. Awad and M. S. Krishnan, The Personalization Privacy Paradox: An Empirical Evaluation of Information Transparency and the Willingness to Be Profiled Online for Personalization, MIS Quarterly, vol.30, issue.1, pp.13-28, 2006.
DOI : 10.2307/25148715

G. Bartók, The Role of Information in Online Learning

G. Bartók, A near-optimal algorithm for finite partial-monitoring games against adversarial opponents, Proc. COLT, 2013.

G. Bartók, D. Pál, and C. Szepesvári, Minimax regret of finite partial-monitoring games in stochastic environments, Conference on Learning Theory, 2011.

G. Bartók, D. P. Foster, D. Pál, A. Rakhlin, and C. Szepesvári, Partial Monitoring - Classification, Regret Bounds, and Algorithms, Mathematics of Operations Research, vol.39, issue.4, pp.967-997, 2014.
DOI : 10.1287/moor.2014.0663

R. E. Bechhofer, A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs, Biometrics, vol.14, issue.3, pp.408-429, 1958.
DOI : 10.2307/2527883

J. Bennett and S. Lanning, The Netflix Prize, Proceedings of the KDD Cup Workshop, pp.3-6, 2007.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, pp.1-122

S. Bubeck, R. Munos, and G. Stoltz, Pure Exploration in Multi-armed Bandits Problems, ALT, pp.23-37, 2009.
URL : http://arxiv.org/pdf/0802.2655v2.pdf

J. A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten, and V. Shmatikov, "You Might Also Like:" Privacy Risks of Collaborative Filtering, 2011 IEEE Symposium on Security and Privacy, pp.231-246, 2011.
DOI : 10.1109/SP.2011.40

C. Callison-Burch, P. Koehn, C. Monz, and O. F. Zaidan, Findings of the 2011 workshop on statistical machine translation, Proceedings of the Sixth Workshop on Statistical Machine Translation, pp.22-64, 2011.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119SUPP

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.
DOI : 10.1017/CBO9780511546921

N. Cesa-Bianchi, Y. Mansour, and G. Stoltz, Improved second-order bounds for prediction with expert advice, Machine Learning, vol.56, issue.2, pp.321-352, 2007.
DOI : 10.1007/s10994-006-5001-7

URL : https://hal.archives-ouvertes.fr/hal-00007539

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz, Minimizing regret with label efficient prediction, IEEE Trans. Inform. Theory, vol.51, pp.77-92, 2005.
DOI : 10.1109/tit.2005.847729

URL : https://hal.archives-ouvertes.fr/hal-00007537

T. Chan, E. Shi, and D. Song, Private and Continual Release of Statistics, ACM Transactions on Information and System Security, vol.14, issue.3, pp.1-26, 2011.
DOI : 10.1145/2043621.2043626

O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue, Large-scale validation and analysis of interleaved search evaluation, ACM Transactions on Information Systems, vol.30, issue.1, p.6, 2012.
DOI : 10.1145/2094072.2094078

I. Charon and O. Hudry, An updated survey on the linear ordering problem for weighted or unweighted tournaments, Annals of Operations Research, vol.29, issue.2, pp.107-158, 2010.
DOI : 10.1007/BF01180541

R. K. Chellappa and R. G. Sin, Personalization versus privacy: An empirical examination of the online consumer's dilemma, Inf. Technol. and Management, vol.6, issue.2-3, pp.181-202, 2005.

M. J. Culnan, Protecting Privacy Online: Is Self-Regulation Working?, Journal of Public Policy & Marketing, vol.19, issue.1, pp.20-26, 2000.
DOI : 10.1509/jppm.19.1.20.16944

O. Dekel, O. Shamir, and L. Xiao, Learning to classify with missing and corrupted features, Machine Learning, pp.149-178, 2010.

F. Denis, Algorithmic Learning Theory, 9th International Conference, p.98

F. Denis, R. Gilleron, and M. Tommasi, Text Classification from Positive and Unlabeled Examples, Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU'02, pp.1927-1934, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00538889

F. Denis, A. Laurent, R. Gilleron, and M. Tommasi, Text classification and co-training from positive and unlabeled examples, Proceedings of the ICML 2003 Workshop: The Continuum from Labeled to Unlabeled Data, pp.80-87, 2003.

F. Denis, R. Gilleron, and F. Letouzey, Learning from positive and unlabeled examples, Algorithmic Learning Theory (ALT 2000), 11th International Conference, pp.70-83, 2000.
DOI : 10.1016/j.tcs.2005.09.007

URL : https://hal.archives-ouvertes.fr/inria-00538887

M. Dudík, K. Hofmann, R. E. Schapire, A. Slivkins, and M. Zoghi, Contextual dueling bandits, Proceedings of The 28th Conference on Learning Theory, Proceedings of Machine Learning Research, pp.563-587, 2015.

C. Dwork, Differential Privacy, 33rd International Colloquium on Automata, Languages and Programming, pp.1-12, 2006.
DOI : 10.1007/11787006_1

C. Dwork, The Differential Privacy Frontier (Extended Abstract), Theory of Cryptography, pp.496-502
DOI : 10.1109/FOCS.2008.38

C. Dwork, Differential privacy in new settings, Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 2010.
URL : https://www.microsoft.com/en-us/research/publication/differential-privacy-in-new-settings

C. Dwork and A. Roth, The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci., vol.9, pp.211-407, 2014.

C. Dwork, F. McSherry, K. Nissim, and A. Smith, Calibrating Noise to Sensitivity in Private Data Analysis, Proceedings of the 3rd Theory of Cryptography Conference, pp.265-284, 2006.
DOI : 10.1007/11681878_14

C. Dwork, M. Naor, O. Reingold, G. N. Rothblum, and S. Vadhan, On the complexity of differentially private data release, Proceedings of the 41st annual ACM symposium on Symposium on theory of computing, STOC '09, pp.381-390, 2009.
DOI : 10.1145/1536414.1536467

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

U. Feige, Y. Mansour, and R. E. Schapire, Learning and inference in the presence of corrupted inputs, Proceedings of The 28th Conference on Learning Theory, COLT 2015, pp.637-657, 2015.

A. Flaxman, A. T. Kalai, and H. B. McMahan, Online convex optimization in the bandit setting: gradient descent without a gradient, 2004.

D. P. Foster and A. Rakhlin, No internal regret via neighborhood watch

B. Frénay and M. Verleysen, Classification in the Presence of Label Noise: A Survey, IEEE Transactions on Neural Networks and Learning Systems, vol.25, issue.5, pp.845-869, 2014.
DOI : 10.1109/TNNLS.2013.2292894

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

Y. Freund and R. E. Schapire, Adaptive Game Playing Using Multiplicative Weights, Games and Economic Behavior, vol.29, issue.1-2, pp.79-103, 1999.
DOI : 10.1006/game.1999.0738

URL : http://www.cs.princeton.edu/~schapire/papers/FreundScYY.pdf

Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer, An efficient boosting algorithm for combining preferences, J. Mach. Learn. Res, vol.4, pp.933-969, 2003.

P. Gajane, T. Urvoy, and F. Clérot, A relative exponential weighing algorithm for adversarial utility-based dueling bandits, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp.6-11, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01225614

A. Garivier, G. Stoltz, and P. Ménard, Explore first, exploit next: The true shape of regret in bandit problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324

M. de Gemmis, L. Iaquinta, P. Lops, C. Musto, F. Narducci et al., Preference learning in recommender systems, Preference Learning (PL-09) ECML/PKDD-09 Workshop, 2009.

A. Globerson and S. Roweis, Nightmare at test time, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.353-360, 2006.
DOI : 10.1145/1143844.1143889

J. Harper, It's modern trade: Web users get as much as they give, The Wall Street Journal, 2010.

P. Jain, P. Kothari, and A. Thakurta, Differentially private online learning, COLT 2012 -The 25th Annual Conference on Learning Theory, pp.24-25, 2012.

T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski et al., Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search, ACM Transactions on Information Systems, vol.25, issue.2, 2007.
DOI : 10.1145/1229179.1229181

A. Kalai and S. Vempala, Efficient algorithms for online decision problems, Journal of Computer and System Sciences, vol.71, issue.3, pp.291-307, 2005.
DOI : 10.1016/j.jcss.2004.10.016

URL : http://www-math.mit.edu/~vempala/papers/online.ps

Z. Karnin, T. Koren, and O. Somekh, Almost optimal exploration in multi-armed bandits, Proceedings of the 30th International Conference on Machine Learning (ICML-13) Conference Proceedings, pp.1238-1246, 2013.

R. M. Karp and R. Kleinberg, Noisy binary search and its applications, SODA 2007, SIAM Proceedings, pp.881-890, 2007.

E. Kaufmann, O. Cappé, and A. Garivier, On the complexity of best-arm identification in multi-armed bandit models, Journal of Machine Learning Research, vol.17, issue.1, pp.1-42, 2016.

M. Kearns and M. Li, Learning in the Presence of Malicious Errors, SIAM Journal on Computing, vol.22, issue.4, pp.807-837, 1993.
DOI : 10.1137/0222052

J. Kim, A method for limiting disclosure in microdata based on random noise and transformation, Proceedings of the Survey Research Methods, pp.370-374, 1986.

J. J. Kim and W. E. Winkler, Multiplicative noise for masking continuous data, Statistical Research Division, US Bureau of the Census, 2003.

R. D. Kleinberg and F. T. Leighton, The value of knowing a demand curve: Bounds on regret for online posted-price auctions, FOCS, pp.594-605, 2003.

A. Korolova, Privacy Violations Using Microtargeted Ads: A Case Study, 2010 IEEE International Conference on Data Mining Workshops, pp.474-482, 2010.
DOI : 10.1109/ICDMW.2010.137

M. Kosinski, D. Stillwell, and T. Graepel, Private traits and attributes are predictable from digital records of human behavior, Proceedings of the National Academy of Sciences, pp.5802-5805, 2013.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

URL : https://doi.org/10.1016/0196-8858(85)90002-8

W. S. Lee and B. Liu, Learning with positive and unlabeled examples using weighted logistic regression, Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp.448-455, 2003.

X. Li and B. Liu, Learning to classify texts using positive and unlabeled data, Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pp.587-592, 2003.

Y. Lindell and E. Omri, A practical application of differential privacy to personalized online advertising, IACR Cryptology ePrint Archive, Report 2011/152, 2011.

N. Littlestone and M. K. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994.
DOI : 10.1006/inco.1994.1009

URL : https://doi.org/10.1006/inco.1994.1009

M. L. Littman, Algorithms for sequential decision making, 1996.

B. Liu, W. S. Lee, P. S. Yu, and X. Li, Partially supervised classification of text documents, Proceedings of the Nineteenth International Conference on Machine Learning, ICML '02, pp.387-394, 2002.

B. Liu, Y. Dai, X. Li, W. S. Lee, and P. S. Yu, Building text classifiers using positive and unlabeled examples, Third IEEE International Conference on Data Mining, p.179, 2003.
DOI : 10.1109/ICDM.2003.1250918

URL : http://array.bioengr.uic.edu/~yangdai/pub/liub_classifiers.pdf

T. Liu, Learning to Rank for Information Retrieval, Foundations and Trends® in Information Retrieval, vol.3, issue.3, pp.225-331, 2009.
DOI : 10.1561/1500000016

T. Liu, J. Xu, T. Qin, W. Xiong, and H. Li, LETOR: Benchmark dataset for research on learning to rank for information retrieval, SIGIR, 2007.

N. Mishra and A. Thakurta, (Nearly) optimal differentially private stochastic multi-arm bandits, Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI 2015, pp.592-601, 2015.

K. Mivule, Utilizing noise addition for data privacy, an overview

N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari, Learning with noisy labels, Advances in Neural Information Processing Systems 26, pp.1196-1204, 2013.

E. Paulson, A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations, The Annals of Mathematical Statistics, vol.35, issue.1, pp.174-180
DOI : 10.1214/aoms/1177703739

A. Piccolboni and C. Schindelhauer, Discrete prediction games with arbitrary feedback and loss, COLT/EuroCOLT, pp.208-223, 2001.
DOI : 10.1007/3-540-44581-1_14

F. Radlinski and T. Joachims, Active exploration for learning rankings from clickthrough data, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining , KDD '07, pp.570-579, 2007.
DOI : 10.1145/1281192.1281254

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407
DOI : 10.1214/aoms/1177729586

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, pp.386-408, 1958.
DOI : 10.1037/h0042519

J. A. Sáez, M. Galar, J. Luengo, and F. Herrera, Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition, Knowledge and Information Systems, vol.38, issue.1, pp.179-206, 2014.

Y. Seldin, C. Szepesvári, P. Auer, and Y. Abbasi-Yadkori, Evaluation and analysis of the performance of the EXP3 algorithm in stochastic environments, EWRL JMLR Proceedings, pp.103-116, 2012.

A. G. Thakurta and A. D. Smith, (Nearly) optimal algorithms for private online learning in full-information and bandit settings, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, pp.2733-2741, 2013.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Bulletin of the AMS, vol.25, pp.285-294, 1933.

A. C. Y. Tossou and C. Dimitrakakis, Algorithms for differentially private multi-armed bandits, 13th International Conference on Artificial Intelligence, 2016.

T. Urvoy, F. Clérot, R. Féraud, and S. Naamane, Generic exploration and K-armed voting bandits, Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Proceedings, pp.91-99, 2013.

A. Wald, On Cumulative Sums of Random Variables, The Annals of Mathematical Statistics, vol.15, issue.3, pp.283-296, 1944.
DOI : 10.1214/aoms/1177731235

Y. Wang, X. Wu, and D. Hu, Using randomized response for differential privacy preserving data collection, Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference, EDBT/ICDT Workshops 2016, 2016.

S. L. Warner, Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias, Journal of the American Statistical Association, vol.60, issue.309, p.63, 1965.

H. Wu and X. Liu, Double Thompson sampling for dueling bandits, Advances in Neural Information Processing Systems 29, p.649, Curran Associates, Inc., 2016.
URL : http://papers.nips.cc/paper/6157-double-thompson-sampling-for-dueling-bandits

H. Yu, J. Han, and K. Chang, PEBL, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining , KDD '02, pp.239-248, 2002.
DOI : 10.1145/775047.775083

Y. Yue and T. Joachims, Beat the mean bandit, ICML 2011, pp.241-248, 2011.

Y. Yue, J. Broder, R. Kleinberg, and T. Joachims, The K-armed dueling bandits problem, Journal of Computer and System Sciences, vol.78, issue.5, pp.1538-1556, 2012.
DOI : 10.1016/j.jcss.2011.12.028

Y. Yue and T. Joachims, Interactively optimizing information retrieval systems as a dueling bandits problem, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.1201-1208, 2009.
DOI : 10.1145/1553374.1553527

B. Zhang and W. Zuo, Learning from Positive and Unlabeled Examples: A Survey, 2008 International Symposiums on Information Processing, pp.650-654, 2008.
DOI : 10.1109/ISIP.2008.79

D. Zhang, A simple probabilistic approach to learning from positive and unlabeled examples, Proc. of the 5th Annual UK Workshop on Computational Intelligence, 2005.

X. Zhu and X. Wu, Class Noise vs. Attribute Noise: A Quantitative Study, Artificial Intelligence Review, vol.3, issue.4, pp.177-210, 2003.

M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp.928-936, 2003.

M. Zoghi, S. Whiteson, R. Munos, and M. de Rijke, Relative upper confidence bound for the k-armed dueling bandit problem, ICML 2014 JMLR Proceedings, pp.10-18, 2014.

M. Zoghi, S. Whiteson, M. de Rijke, and R. Munos, Relative confidence sampling for efficient on-line ranker evaluation, Proceedings of the 7th ACM international conference on Web search and data mining, WSDM '14, pp.73-82, 2014.
DOI : 10.1145/2556195.2556256

M. Zoghi, S. Whiteson, and M. de Rijke, MergeRUCB, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pp.17-26, 2015.