Data-driven evaluation of Contextual Bandit algorithms and applications to Dynamic Recommendation

Olivier Nicol

Thesis, Year: 2014

Évaluation par les données d'algorithmes de bandits contextuels avec applications à la recommandation dynamique

Olivier Nicol

Role: Author

Sequential Learning

Laboratoire d'Informatique Fondamentale de Lille

Abstract

The context of this thesis is dynamic recommendation. Recommendation is the action, for an intelligent system, of supplying a user of an application with personalized content so as to enhance what is referred to as "user experience", e.g. recommending a product on a merchant website or an article on a blog. Recommendation is considered dynamic when the content to recommend or user tastes evolve rapidly, e.g. news recommendation. Many applications of interest to us generate a tremendous amount of data through their millions of online users. Nevertheless, using this data to evaluate a new recommendation technique, or even to compare two dynamic recommendation algorithms, is far from trivial. This is the problem we consider here. Some approaches have already been proposed. Nonetheless, they have not been studied thoroughly, either from a theoretical point of view (unquantified bias, loose convergence bounds...) or from an empirical one (experiments on private data only). In this work we start by filling many gaps in the theoretical analysis. Then we comment on the results of an experiment of unprecedented scale in this area: a public challenge we organized. This challenge, along with some complementary experiments, revealed an unexpected source of huge bias: time acceleration. The rest of this work tackles this issue. We show that a bootstrap-based approach significantly reduces this bias and, more importantly, makes it possible to control it.

This thesis was carried out in the context of dynamic recommendation. Recommendation is the action of providing personalized content to a user of an application in order to improve their use of it, e.g. recommending a product on a merchant website or an article on a blog. Recommendation is considered dynamic when the content to recommend or the users' tastes evolve rapidly, e.g. news recommendation. Many of the applications we are interested in generate enormous amounts of data thanks to their millions of users on the Internet. Nevertheless, using this data to evaluate a new recommendation technique or to compare two recommendation algorithms is far from trivial. This is the problem we consider here. Some approaches have already been proposed. Nevertheless, they have been studied very little, both theoretically (unquantified bias, rather loose convergence bound...) and empirically (experiments on private data). In this work we begin by filling many gaps in the theoretical analysis. We then discuss the very surprising results of a very large-scale experiment: a competition open to the public that we organized. This competition allowed us to highlight a considerable source of bias that is constantly present in practice: time acceleration. The rest of this work tackles this problem. We show that a bootstrap-based approach makes it possible to reduce and, above all, to control this bias.

Keywords

(Contextual) bandit games; Offline evaluation; Data-driven evaluation; Non-stationary environment; Recommendation; News recommendation; Dynamic recommendation; Bias/variance/concentration analysis; Bootstrapping (statistical method of estimation of estimator properties); Cross-validation; Bayesian inference; Classification; Bias versus Variance trade-off; Exploration versus Exploitation dilemma; Entangled validation; Data expansion; Replay methodologies.

Domains

Machine Learning [stat.ML]

Main file

phd_nicol.pdf (8.79 MB)

Contributor: Philippe Preux

https://theses.hal.science/tel-01297407

Submitted on: Monday, April 4, 2016, 11:58:04

Last modified on: Friday, March 24, 2023, 14:53:02

Long-term archiving on: Monday, November 14, 2016, 15:45:24

Dates and versions

tel-01297407, version 1 (04-04-2016)

Identifiers

HAL Id: tel-01297407, version 1

Cite

Olivier Nicol. Data-driven evaluation of Contextual Bandit algorithms and applications to Dynamic Recommendation. Machine Learning [stat.ML]. Université de Lille I, 2014. English. ⟨NNT : ⟩. ⟨tel-01297407⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS CRISTAL INRIA2 CRISTAL-SEQUEL

350 views

1080 downloads
