Probability and Statistics news" [PS-news: http://groups.google.fr/group/maths-ps-news] to help broadcast job announcements and conference events related to mathematical probability and statistics, as the Google group "Machine-Learning news" [MLnews: http://groups.google.fr/group/ml-news] does successfully for the machine learning community. The goal is to provide a tool that facilitates communication within and between the two strong communities of Probability and of Statistics at a worldwide scale.
Relative entropy inverse reinforcement learning, Proceedings of the 14th international conference on Artificial Intelligence and Statistics, 2011. ,
Optimal strategies and minimax lower bounds for online convex games, Servedio and Zhang, p.65, 2008. ,
Competing in the dark: An efficient algorithm for bandit linear optimization, pp.263-274, 2008. ,
Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Computer and System Sciences, vol.66, issue.4, pp.671-687, 2003. ,
DOI : 10.1016/S0022-0000(03)00025-4
Application de l'apprentissage par renforcement à la gestion du risque, Conférences Francophone sur l'Apprentissage Automatique, 2003. ,
Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, Proceedings of the 38th annual ACM Symposium on Theory of computing, STOC '06, pp.557-563, 2006. ,
Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring, Proceedings of the 17th international conference on Algorithmic Learning Theory, pp.229-243, 2006. ,
DOI : 10.1007/11894841_20
PAC-Bayesian bounds for randomized empirical risk minimizers, Mathematical Methods of Statistics, vol.17, issue.4, pp.279-304, 2008. ,
DOI : 10.3103/S1066530708040017
URL : https://hal.archives-ouvertes.fr/hal-00354922
PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electronic Journal of Statistics, vol.5, issue.0, 2010. ,
DOI : 10.1214/11-EJS601
URL : https://hal.archives-ouvertes.fr/hal-00465801
Methods of Information Geometry, volume 191 of Translations of Mathematical monographs, 2000. ,
Learning on graph with laplacian regularization, pp.25-32, 2007. ,
An introduction to MCMC for machine learning, Machine Learning, pp.5-43, 2003. ,
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Random Wavelet Series, Communications in Mathematical Physics, vol.227, issue.3, pp.483-514, 2002. ,
DOI : 10.1007/s002200200630
URL : https://hal.archives-ouvertes.fr/hal-00012098
Combining pac-bayesian and generic chaining bounds, Journal of Machine Learning Research, vol.8, issue.116, pp.863-889, 2007. ,
Minimax policies for adversarial and stochastic bandits, In Dasgupta and Klivans, p.85, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00834882
Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, vol.11, issue.37, pp.2785-2836, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00654356
Robust linear regression through PAC-Bayesian truncation, p.142, 2010. ,
Robust linear least squares regression, 2010. ,
DOI : 10.1214/11-AOS918SUPP
URL : https://hal.archives-ouvertes.fr/hal-00522534
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009. ,
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069
Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, issue.22, pp.397-422, 2003. ,
Logarithmic online regret bounds for undiscounted reinforcement learning, Proceedings of the 20th conference on advances in Neural Information Processing Systems, NIPS '06, pp.49-56, 2006. ,
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica, vol.5, issue.1-2, pp.55-65, 2010. ,
DOI : 10.1007/s10998-010-3055-6
Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of IEEE 36th Annual Foundations of Computer Science, pp.322-331, 1995. ,
DOI : 10.1109/SFCS.1995.492488
Adaptive and Self-Confident On-Line Learning Algorithms, Journal of Computer and System Sciences, vol.64, issue.1, p.68, 2000. ,
DOI : 10.1006/jcss.2001.1795
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2003. ,
DOI : 10.1137/S0097539701398375
Online linear optimization and adaptive routing, Journal of Computer and System Sciences, vol.74, issue.1, pp.97-114, 2008. ,
DOI : 10.1016/j.jcss.2007.04.016
URL : http://doi.org/10.1016/j.jcss.2007.04.016
Weighted sums of certain dependent random variables, Tohoku Mathematical Journal, vol.19, issue.3, pp.357-367, 1967. ,
DOI : 10.2748/tmj/1178243286
Residual algorithms: Reinforcement learning with function approximation, Proceedings of the 12th International Conference on Machine Learning, ICML '95, pp.30-37, 1995. ,
A Simple Proof of the Restricted Isometry Property for Random Matrices, Constructive Approximation, vol.159, issue.2, pp.253-263, 2008. ,
DOI : 10.1007/s00365-007-9003-x
Approximation and learning by greedy algorithms, The Annals of Statistics, vol.36, issue.1, pp.64-94, 2008. ,
DOI : 10.1214/009053607000000631
Regal: a regularization based algorithm for reinforcement learning in weakly communicating mdps, Proceedings of the 25th conference on Uncertainty in Artificial Intelligence, UAI '09, pp.35-42, 2009. ,
Convexity, classification, and risk bounds, Journal of the American Statistical Association, p.183, 2003. ,
Adaptive online gradient descent, pp.65-72, 2007. ,
High-probability regret bounds for bandit online linear optimization, Servedio and Zhang, pp.335-342, 2008. ,
Reinforcement learning and chess, pp.91-116, 2001. ,
Regularization and Semi-supervised Learning on Large Graphs, Proceedings of the 17th annual Conference On Learning Theory, pp.624-638, 2004. ,
DOI : 10.1007/978-3-540-27819-1_43
On Manifold Regularization, Proceedings of the 8th international conference on Artificial Intelligence and Statistics, AI&Stats '05, pp.181-193, 2005. ,
Relating clustering stability to properties of cluster boundaries, pp.379-390, 2008. ,
A Sober Look at Clustering Stability, pp.5-19, 2006. ,
DOI : 10.1007/11776420_4
Probability Inequalities for the Sum of Independent Random Variables, Journal of the American Statistical Association, vol.18, issue.297, pp.33-45, 1962. ,
DOI : 10.1214/aoms/1177730437
Exponential inequalities for self-normalized martingales with applications, The Annals of Applied Probability, vol.18, issue.5, pp.1848-1869, 2008. ,
DOI : 10.1214/07-AAP506
URL : https://hal.archives-ouvertes.fr/hal-00165219
Reinforcement learning for weakly coupled mdps and an application to planetary rover control, 2001. ,
On a modification of Chebyshev's inequality and of the error formula of Laplace. Original publication, Ann. Sci. Inst. Sav. Ukraine, Sect. Math, vol.1, issue.31, p.108, 1924. ,
Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997. ,
DOI : 10.1214/aos/1069362389
Small ball estimates for Brownian motion under a weighted sup-norm, Studia Sci. Math. Hung, pp.1-2, 2001. ,
DOI : 10.1556/SScMath.36.2000.1-2.17
Stochastic Optimal Control (The Discrete Time Case), p.208, 1978. ,
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
Minimum Contrast Estimators on Sieves: Exponential Bounds and Rates of Convergence, Bernoulli, vol.4, issue.3, pp.329-375, 1998. ,
DOI : 10.2307/3318720
Pattern Recognition and Machine Learning (Information Science and Statistics), p.193, 2006. ,
On the rate of convergence of regularized boosting classifiers, Journal of Machine Learning Research, vol.4, pp.861-894, 2003. ,
Statistical performance of support vector machines, The Annals of Statistics, pp.489-531, 2008. ,
Learning from labeled and unlabeled data using graph mincuts, Proceedings of the 18th International Conference on Machine Learning, ICML '01, pp.19-26, 2001. ,
From External to Internal Regret, pp.621-636, 2005. ,
DOI : 10.1007/11503415_42
From External to Internal Regret, Journal of Machine Learning Research, vol.8, pp.1307-1324, 2007. ,
Combining labeled and unlabeled data with co-training, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, pp.92-100, 1998. ,
DOI : 10.1145/279943.279962
Theory of Classification: a Survey of Some Recent Advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005. ,
DOI : 10.1051/ps:2005018
URL : https://hal.archives-ouvertes.fr/hal-00017923
Ondelettes et espaces de Besov, Revista Matemática Iberoamericana, vol.11, issue.3, pp.477-512, 1995. ,
DOI : 10.4171/RMI/181
Least-squares temporal difference learning, Proceedings of the 16th International Conference on Machine Learning, pp.49-56, 1999. ,
Linear least-squares algorithms for temporal difference learning, Machine Learning Journal, vol.22, pp.33-57, 1996. ,
R-max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol.3, pp.213-231, 2003. ,
Efficient co-regularised least squares regression, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.137-144, 2006. ,
DOI : 10.1145/1143844.1143862
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.7014
Bandits Games and Clustering Foundations, pp.95-96, 2010. ,
URL : https://hal.archives-ouvertes.fr/tel-00845565
Online optimization of X-armed bandits, p.65, 2008. ,
Pure Exploration in Multi-armed Bandits Problems, pp.23-37, 2009. ,
Sparse grids, Acta Numerica, vol.13, p.150, 2004. ,
Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996. ,
DOI : 10.1006/aama.1996.0007
The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol.346, issue.9-10, pp.589-592, 2008. ,
DOI : 10.1016/j.crma.2008.03.014
Sparsity and incoherence in compressive sampling, Inverse Problems, vol.23, issue.163, pp.969-985, 2007. ,
The Dantzig selector: statistical estimation when p is much larger than n, Annals of Statistics, vol.35, issue.6, pp.2313-2351, 2007. ,
Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, vol.52, issue.2, pp.489-509, 2006. ,
DOI : 10.1109/TIT.2005.862083
Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, vol.7, issue.8, pp.1207-1223, 2006. ,
DOI : 10.1002/cpa.20124
Robust principal component analysis? CoRR, abs/0912, 2009. ,
Functional learning through kernel. arXiv, oct, p.135, 2009. ,
Optimal Rates for the Regularized Least-Squares Algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007. ,
DOI : 10.1007/s10208-006-0196-8
Statistical Learning Theory and Stochastic Optimization, p.140, 2004. ,
DOI : 10.1007/b99352
URL : https://hal.archives-ouvertes.fr/hal-00104952
Model selection via cross-validation in density estimation, regression, and change-points detection, 2008. ,
URL : https://hal.archives-ouvertes.fr/tel-00346320
Potential-based algorithms in on-line prediction and game theory, Machine Learning, vol.51, issue.3, pp.239-261, 2003. ,
DOI : 10.1023/A:1022901500417
Prediction, Learning, and Games, p.68, 2006. ,
DOI : 10.1017/CBO9780511546921
Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, p.64, 2009. ,
DOI : 10.1016/j.jcss.2012.01.001
How to use expert advice, Proceedings of the 25th annual ACM Symposium on Theory Of Computing, STOC '93, pp.382-391, 1993. ,
How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997. ,
DOI : 10.1145/258128.258179
Minimizing Regret With Label Efficient Prediction, IEEE Transactions on Information Theory, vol.51, issue.6, pp.77-92, 2005. ,
DOI : 10.1109/TIT.2005.847729
URL : https://hal.archives-ouvertes.fr/hal-00007537
Mortal multi-armed bandits, pp.273-280, 2008. ,
Online model learning in adversarial Markov decision processes, Proceedings of the 9th international conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, pp.1583-1584, 2010. ,
Probability Theory, p.45, 1988. ,
Bandit problems with side observations, IEEE Transactions on Automatic Control, vol.50, issue.3, pp.338-355, 2005. ,
DOI : 10.1109/TAC.2005.844079
Matrix multiplication via arithmetic progressions, Proceedings of the nineteenth annual ACM conference on Theory of computing, STOC '87, pp.1-6, 1987. ,
DOI : 10.1145/28395.28396
URL : http://doi.org/10.1016/s0747-7171(08)80013-2
Computing Elo ratings of move patterns in the game of Go, ICGA Journal, vol.30, issue.4, pp.198-208, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00149859
Elements of information theory, 1991. ,
Sur un nouveau théorème-limite de la théorie des probabilités, Actualités Scientifiques et Industrielles, vol.736, pp.5-23, 1938. ,
DOI : 10.1007/978-3-642-40607-2_8
Improving elevator performance using reinforcement learning, Advances in Neural Information Processing Systems, pp.1017-1023, 1996. ,
Sanov property, generalized I-projection and a conditional limit theorem, The Annals of Probability, pp.768-793, 1984. ,
Sparse regression learning by aggregation and langevin monte-carlo. Arxiv preprint arXiv:0903.1223, p.164, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00362471
Aggregation by exponential weighting, sharp pac-bayesian bounds and sparsity, Machine Learning Journal, vol.72, pp.39-61, 2008. ,
The price of bandit information for online optimization, pp.345-352, 2008. ,
Stochastic linear optimization under bandit feedback, pp.355-366, 2008. ,
Random projection trees and low dimensional manifolds, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC '08, pp.537-546, 2008. ,
DOI : 10.1145/1374376.1374452
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.3236
An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures & Algorithms, vol.22, pp.60-65, 2003. ,
A general class of exponential inequalities for martingales and ratios, The Annals of Probability, pp.537-564, 1999. ,
Self-normalized processes: Exponential inequalities, moment bounds and iterated logarithm laws, The Annals of Probability, pp.1902-1933, 2004. ,
A flexible noise model for designing maps, Proceedings of the Vision Modeling and Visualization Conference 2001, VMV '01, pp.299-308 ,
Feynman-Kac formulae : genealogical and interacting particle systems with applications, 2004. ,
Nonparametric regression with martingale increment errors, Stochastic Processes and their Applications, vol.121, issue.12, p.114, 2010. ,
DOI : 10.1016/j.spa.2011.08.002
URL : https://hal.archives-ouvertes.fr/hal-00530581
Large deviations techniques and applications. Elearn, p.110, 1998. ,
DOI : 10.1007/978-1-4612-5320-4
Nonlinear approximation, Acta Numerica, vol.41, issue.2, p.148, 1997. ,
DOI : 10.1007/BF02274662
A Probabilistic Theory of Pattern Recognition, p.78, 1996. ,
DOI : 10.1007/978-1-4612-0711-5
Mesures dominantes et théorème de Sanov, Annales de l'Institut Henri Poincaré - Probabilités et Statistiques, pp.365-373, 1992. ,
Compressed sensing, IEEE Transactions on Information Theory, vol.52, issue.173, pp.1289-1306, 2006. ,
Uncertainty principles and signal recovery, SIAM Journal on Applied Mathematics, vol.49, issue.3, pp.906-931, 1989. ,
Minimum variance importance sampling via population Monte Carlo, ESAIM: Probability and Statistics, issue.11, p.70, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00070316
Real Analysis and Probability, p.124, 1989. ,
DOI : 10.1017/CBO9780511755347
Random Wavelet Series Based on a Tree-Indexed Markov Chain, Communications in Mathematical Physics, vol.41, issue.12, pp.451-477, 2008. ,
DOI : 10.1007/s00220-008-0504-7
Eigenvalues and Condition Numbers of Random Matrices, SIAM Journal on Matrix Analysis and Applications, vol.9, issue.4, pp.543-560, 1988. ,
DOI : 10.1137/0609045
Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, Proceedings of the 45th IEEE Conference on Decision and Control, pp.65-72, 2006. ,
DOI : 10.1109/CDC.2006.377527
URL : https://hal.archives-ouvertes.fr/hal-00121732
Regularized policy iteration, pp.441-448, 2008. ,
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, Proceedings of the American Control Conference, p.226, 2009. ,
Error propagation for approximate policy and value iteration, p.218, 2010. ,
Best Pinsker Bound equals Taylor Polynomial of Degree $49$, Computational Technologies, vol.8, issue.111, pp.3-14, 2003. ,
Stratégies optimistes en apprentissage par renforcement, pp.37-43, 2010. ,
Online convex optimization in the bandit setting: gradient descent without a gradient, Proceedings of the 16th annual ACM-SIAM Symposium On Discrete Algorithms, SODA '05, pp.385-394, 2005. ,
A pac-style model for learning from labeled and unlabeled data, pp.111-126, 2005. ,
Compressive Sensing ,
DOI : 10.1007/978-3-642-27795-5_6-5
Asymptotic calibration, Biometrika, vol.85, issue.23, pp.379-390, 1996. ,
Regret in the on-line decision problem, Games and Economic Behavior, vol.29, issue.24, pp.7-35, 1999. ,
Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0 < q \leq 1$, Applied and Computational Harmonic Analysis, vol.26, issue.3, pp.395-407, 2009. ,
DOI : 10.1016/j.acha.2008.09.001
Decomposition of Besov Spaces, Indiana University Mathematics Journal, issue.34, p.134, 1985. ,
DOI : 10.1515/9781400827268.385
On tail probabilities for martingales, The Annals of Probability, pp.100-118, 1975. ,
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, EuroCOLT '95: Proceedings of the 2nd European conference on COmputational Learning Theory, pp.23-37, 1995. ,
DOI : 10.1006/jcss.1997.1504
Deviation bounds. Private communication, p.113, 2011. ,
The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of the 24th annual Conference On Learning Theory, pp.37-41, 2011. ,
Context tree selection: A unifying view, Stochastic Processes and their Applications, vol.121, issue.11, p.113 ,
DOI : 10.1016/j.spa.2011.06.012
Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007. ,
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003
LSTD with random projections, pp.721-729, 2010. ,
LSPI with random projections, p.236 ,
Markov Chain Monte Carlo in Practice, p.69, 1996. ,
Multi-armed Bandit Allocation Indices, p.25, 1989. ,
DOI : 10.1002/9780470980033
Bayesian approach to feature selection and parameter tuning for support vector machine classifiers, Neural Networks, vol.18, issue.5-6, pp.693-701, 2005. ,
DOI : 10.1016/j.neunet.2005.06.044
Transport inequalities - a survey, Markov Processes and Related Fields, pp.635-736, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00515419
Approximation algorithms for budgeted learning problems, Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, STOC '07, pp.104-113, 2007. ,
DOI : 10.1145/1250790.1250807
Information acquisition and exploitation in multichannel wireless systems, IEEE Transactions on Information Theory, p.18, 2007. ,
Approximation algorithms for restless bandit problems. CoRR, abs/0711, p.18, 2007. ,
A distribution-free theory of nonparametric regression, pp.146-154, 2002. ,
DOI : 10.1007/b97848
A Simple Adaptive Procedure Leading to Correlated Equilibrium, Econometrica, vol.68, issue.5, pp.1127-1150, 2000. ,
DOI : 10.1111/1468-0262.00153
Logarithmic regret algorithms for online convex optimization, pp.499-513, 2006. ,
Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.1, issue.301, pp.13-30, 1963. ,
DOI : 10.1214/aoms/1177730491
An asymptotically optimal bandit algorithm for bounded support models, Proceedings of the 23rd annual Conference On Learning Theory, pp.67-79, 2010. ,
An asymptotically optimal policy for finite support models in the multiarmed bandit problem, Machine Learning, vol.28, issue.3, pp.50-59 ,
DOI : 10.1007/s10994-011-5257-4
Feature Reinforcement Learning: Part I. Unstructured MDPs, Journal of Artificial General Intelligence, vol.1, issue.1, pp.3-24, 2009. ,
DOI : 10.2478/v10229-011-0002-8
URL : http://arxiv.org/abs/0906.1713
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.99, issue.249, pp.1563-1600, 2010. ,
Gaussian Hilbert spaces, p.136, 1997. ,
DOI : 10.1017/CBO9780511526169
Efficient bandit algorithms for online multiclass prediction, pp.440-447, 2008. ,
Non-stochastic bandit slate problems, pp.1054-1062, 2010. ,
Sleeping experts and bandits with stochastic action availability and adversarial rewards, Proceedings of the 12th international conference on Artificial Intelligence and Statistics, number 5 in AI&Stats '09, pp.272-279, 2009. ,
Near-optimal reinforcement learning in polynomial time, Machine Learning, vol.49, issue.2/3, pp.209-232, 2002. ,
DOI : 10.1023/A:1017984413808
Automatic basis function construction for approximate dynamic programming and reinforcement learning, pp.449-456, 2006. ,
Regret bounds for sleeping experts and bandits, Servedio and Zhang, pp.425-436, 2008. ,
DOI : 10.1007/s10994-010-5178-7
Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, pp.2593-2656, 2006. ,
The Dantzig selector and sparsity oracle inequalities, Bernoulli, vol.15, issue.3, pp.799-828, 2009. ,
DOI : 10.3150/09-BEJ187
Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.521-528, 2009. ,
DOI : 10.1145/1553374.1553442
Diffusion kernels on graphs and other discrete input spaces, Proceedings of the 19th International Conference on Machine Learning, ICML '02, pp.315-322, 2002. ,
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, issue.218, pp.1107-1149, 2003. ,
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.40, pp.4-22, 1985. ,
Hybrid stochastic-adversarial online learning, p.19, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00830168
Finite-sample analysis of least-squares policy iteration, p.238, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00528596
Finite-sample analysis of LSTD, Fürnkranz and Joachims, p.219, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00482189
The Concentration of Measure Phenomenon. Mathematical surveys and monographs, p.107, 2001. ,
Universal Intelligence: A Definition of Machine Intelligence, Minds and Machines, vol.28, issue.1, pp.391-444, 2007. ,
DOI : 10.1007/s11023-007-9079-x
A wide range no-regret theorem. Game theory and information, p.81, 2003. ,
Markov Chains and Mixing Times, p.69, 2008. ,
DOI : 10.1090/mbk/058
Approximation, metric entropy and small ball estimates for Gaussian measures, Annals of Probability, vol.27, pp.1556-1578, 1999. ,
Dense fast random projections and lean Walsh transforms, Proceedings of the 11th international workshop, APPROX 2008, and 12th international workshop, RANDOM 2008, on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques, APPROX '08 / RANDOM '08, pp.512-522, 2008. ,
Gaussian random functions, p.132, 1995. ,
The weighted majority algorithm, Proceedings of the 30th annual Symposium on Foundations of Computer Science, pp.256-261, 1989. ,
Sparse Temporal Difference Learning Using LASSO, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.352-359, 2007. ,
DOI : 10.1109/ADPRL.2007.368210
URL : https://hal.archives-ouvertes.fr/inria-00117075
Contextual multi-armed bandits, Proceedings of the 13th international conference on Artificial Intelligence and Statistics, pp.485-492, 2010. ,
Representation policy iteration, Proceedings of the 21st conference on Uncertainty in Artificial Intelligence, UAI '05, pp.372-379, 2005. ,
Compressed least-squares regression, Bengio, pp.1213-1221, 2009. ,
Scrambled objects for least-squares regression, pp.1549-1557, 2010. ,
Online learning in adversarial Lipschitz environments, Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, ECML PKDD'10, pp.305-320, 2010. ,
Adaptive bandits: Towards the best history-dependent strategy, To appear in Proceedings of the 14th international conference on Artificial Intelligence and Statistics, p.255, 2011. ,
Complexity versus agreement for many views, pp.232-246, 2009. ,
Finite sample analysis of bellman residual minimization, Asian Conference on Machine Learning, p.255, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00830212
Finite-time analysis of multiarmed bandits problems with kullback-leibler divergences, To appear in Proceedings of the 24th annual Conference On Learning Theory, p.255, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00574987
Distribution of eigenvalues for some sets of random matrices, Mathematics of the USSR-Sbornik, pp.457-483, 1967. ,
Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.34, issue.1/2/3, pp.215-238, 2005. ,
DOI : 10.1007/s10479-005-5732-z
Error bounds for approximate policy iteration, Proceedings of the 19th International Conference on Machine Learning, ICML '03, pp.560-567, 2003. ,
Performance bounds in Lp norm for approximate value iteration, SIAM Journal of Control and Optimization, p.208, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00124685
Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, issue.218, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Randomized interior point methods for sampling and optimization, The Annals of Applied Probability, vol.26, issue.1, p.70, 2009. ,
DOI : 10.1214/15-AAP1104
Random walk approach to regret minimization, pp.1777-1785, 2010. ,
Algorithms for inverse reinforcement learning, Proceedings of the 17th International Conference on Machine Learning, ICML '00, pp.663-670, 2000. ,
Online regret bounds for markov decision processes with deterministic transitions, pp.123-137, 2009. ,
Multi-armed bandit problems with dependent arms, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007. ,
DOI : 10.1145/1273496.1273587
Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.737-744, 2007. ,
DOI : 10.1145/1273496.1273589
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5750
Smo-style algorithms for learning using privileged information, DMIN, pp.235-241, 2010. ,
Feature selection using regularization in approximate linear programs for Markov decision processes, Fürnkranz and Joachims, pp.871-878, 2010. ,
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments, Theoretical Computer Science, vol.397, issue.1-3, pp.77-93, 2008. ,
DOI : 10.1016/j.tcs.2008.02.024
Empirical processes: theory and applications. NSF-CBMS regional conference series in probability and statistics, p.109, 1990. ,
Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics), p.218, 2007. ,
Markov Decision Processes: Discrete Stochastic Dynamic Programming, p.208, 1994. ,
Random features for large-scale kernel machines, p.148, 2007. ,
Uniform approximation of functions with random bases, 2008 46th Annual Allerton Conference on Communication, Control, and Computing, p.149, 2008. ,
DOI : 10.1109/ALLERTON.2008.4797607
Online learning: Beyond regret. ArXiv e-prints, nov 2010, p.179 ,
Bayesian inverse reinforcement learning, Proceedings of the 20th international joint conference on Artificial Intelligence, pp.2586-2591, 2007. ,
Compressive Sensing and Structured Random Matrices. Theoretical Foundations and Numerical Methods for Sparse Recovery, pp.169-172, 2010. ,
Sparse legendre expansions via l_1 minimization, Arxiv preprint, p.165, 2010. ,
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952. ,
DOI : 10.1090/S0002-9904-1952-09620-8
Semi-Supervised Learning with Multiple Views, p.193, 2008. ,
The rademacher complexity of co-regularized kernel classes, Proceedings of the Eleventh ICAIS, pp.186-188, 2007. ,
The Littlewood–Offord problem and invertibility of random matrices, Advances in Mathematics, vol.218, issue.2, pp.600-633, 2008. ,
DOI : 10.1016/j.aim.2008.01.010
On sparse reconstruction from Fourier and Gaussian measurements, Communications on Pure and Applied Mathematics, vol.61, issue.8, pp.1025-1045, 2008. ,
DOI : 10.1002/cpa.20227
Non-asymptotic theory of random matrices: extreme singular values. ArXiv e-prints, mar 2010, p.233 ,
Linearly Parameterized Bandits, Mathematics of Operations Research, vol.35, issue.2, pp.395-411, 2010. ,
DOI : 10.1287/moor.1100.0446
URL : http://arxiv.org/abs/0812.3465
On the possibility of learning in reactive environments with arbitrary dependence, Theoretical Computer Science, vol.405, issue.3, pp.274-284, 2008. ,
DOI : 10.1016/j.tcs.2008.06.039
URL : https://hal.archives-ouvertes.fr/hal-00639569
Theory of reproducing Kernels and its applications, Longman Scientific & Technical, p.135, 1988. ,
On the probability of large deviations of random magnitudes, pp.4211-4255, 1957. ,
Improved Approximation Algorithms for Large Matrices via Random Projections, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pp.143-152, 2006. ,
DOI : 10.1109/FOCS.2006.37
Should one compute the temporal difference fix point or minimize the bellman residual? the unified oblique projection view, Fürnkranz and Joachims, p.208, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00537403
Anticipatory Behavior in Adaptive Learning Systems, chapter Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, pp.48-76, 2009. ,
A Generalized Representer Theorem, pp.416-426, 2001. ,
DOI : 10.1007/3-540-44581-1_27
Generalized polynomial approximations in markovian decision processes, Journal of Mathematical Analysis and Applications, vol.110, pp.568-582, 1985. ,
PAC-Bayesian analysis of martingales and multiarmed bandits. ArXiv e-prints, 0121. ,
PAC-Bayesian analysis of the exploration-exploitation trade-off. ArXiv e-prints, 0121. ,
Online Learning, p.65, 2007. ,
DOI : 10.1017/CBO9781107298019.022
Handbook of Learning and Approximate Dynamic Programming ,
DOI : 10.1109/9780470544785
Processus décisionnels de Markov en intelligence artificielle, volume 1 - principes généraux et applications, IC2 - informatique et systèmes d'information, 2008. ,
An RKHS for multi-view learning and manifold co-regularization, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.976-983, 2008. ,
DOI : 10.1145/1390156.1390279
A co-regularization approach to semi-supervised learning with multiple views, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, ACM International Conference Proceeding Series. Workshop on Learning with Multiple Views, p.193, 2005. ,
Kernels and Regularization on Graphs, Conference On Learning Theory and 7th Kernel Workshop, pp.144-158, 2003. ,
DOI : 10.1007/978-3-540-45167-9_12
An information theoretic framework for multi-view learning, pp.403-414, 2008. ,
A new concentration result for regularized risk minimizers. IMS Lecture notes monograph series, pp.260-183, 2006. ,
Incomplete Information and Internal Regret in Prediction of Individual Sequences, p.81, 2005. ,
URL : https://hal.archives-ouvertes.fr/tel-00009759
Contributions to the sequential prediction of arbitrary sequences: applications to the theory of repeated games and empirical studies of the performance of the aggregation of experts. Habilitation à diriger des recherches, p.29, 2011. ,
PAC model-free reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.881-888, 2006. ,
DOI : 10.1145/1143844.1143955
Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems, pp.1038-1044, 1996. ,
Reinforcement Learning: An Introduction, p.227, 0209. ,
Online learning with random representations, Proceedings of the 10th International Conference on Machine Learning, ICML '93, pp.314-321, 1993. ,
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, pp.1-103, 2010. ,
DOI : 10.2200/S00268ED1V01Y201005AIM009
The generic chaining: upper and lower bounds of stochastic processes. Springer monographs in mathematics, p.115, 2005. ,
Random matrices: Universality of ESDs and the circular law, The Annals of Probability, vol.38, issue.5, pp.2023-2065, 2010. ,
DOI : 10.1214/10-AOP534
Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, pp.58-68, 1995. ,
DOI : 10.1145/203330.203343
Optimistic linear programming gives logarithmic regret for irreducible mdps, p.242, 2007. ,
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933. ,
On the theory of apportionment, American Journal of Mathematics, vol.57, issue.6, pp.450-456, 1935. ,
Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B, vol.58, issue.1, pp.267-288, 1996. ,
Solution of incorrectly formulated problems and the regularization method, Soviet Math Dokl, vol.4, pp.1035-1038, 1963. ,
Optimal Rates of Aggregation, Proceedings of the 16th annual Conference On Learning Theory, pp.303-313, 2003. ,
DOI : 10.1007/978-3-540-45167-9_23
URL : https://hal.archives-ouvertes.fr/hal-00104867
The deterministic lasso, Seminar für Statistik, Eidgenössische Technische Hochschule (ETH) Zürich, p.164, 2007. ,
On the conditions used to prove oracle results for the lasso, Electronic Journal of Statistics, vol.3, pp.1360-1392, 2009. ,
A new learning paradigm: Learning using privileged information, Neural Networks, vol.22, issue.5-6, pp.544-557, 2009. ,
The Random Projection Method, p.226, 2004. ,
Algorithms for infinitely many-armed bandits, pp.1729-1736, 2008. ,
Learning from Delayed Rewards, King's College, p.218, 1989. ,
Semi-supervised protein classification using cluster kernels, Bioinformatics, vol.21, issue.15, pp.3241-3247, 2005. ,
DOI : 10.1093/bioinformatics/bti497
Multi-armed bandits and the Gittins index, Journal of the Royal Statistical Society. Series B (Methodological), vol.42, issue.2, pp.143-149, 1980. ,
On the Distribution of the Roots of Certain Symmetric Matrices, The Annals of Mathematics, vol.67, issue.2, pp.325-327, 1958. ,
DOI : 10.2307/1970008
Tight performance bounds on greedy policies based on imperfect value functions, Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, p.208, 1994. ,
Sparse grids, Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, p.150, 1990. ,
Compressed Spectral Clustering, 2009 IEEE International Conference on Data Mining Workshops, pp.344-349, 2009. ,
DOI : 10.1109/ICDMW.2009.22
On model selection consistency of Lasso, Journal of Machine Learning Research, vol.7, pp.2541-2563, 2006. ,
Learning with local and global consistency, Proceedings of the 17th conference on advances in Neural Information Processing Systems, NIPS '03, pp.321-328, 2003. ,
Models of cooperative teaching and learning, Journal of Machine Learning Research, vol.12, pp.349-384, 2011. ,
Online convex programming and generalized infinitesimal gradient ascent, Proceedings of the 20th International Conference on Machine Learning, ICML '03, pp.928-936, 2003. ,