1. Algorithms for reinforcement learning [2010]
 Szepesvári, Csaba.
 Cham, Switzerland : Springer, ©2010.
 Description
 Book — 1 online resource (xii, 89 pages) : illustrations
 Summary

 Markov Decision Processes -- Value Prediction Problems -- Control -- For Further Exploration.
 (source: Nielsen Book Data)
2. Algorithms for reinforcement learning [2010]
 Szepesvári, Csaba.
 San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2010.
 Description
 Book — 1 electronic text (xii, 89 p.) : ill.
 Summary

 Markov Decision Processes -- Value Prediction Problems -- Control -- For Further Exploration.
 (source: Nielsen Book Data)
3. Bandit algorithms [2020]
 Lattimore, Tor, 1987- author.
 Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2020
 Description
 Book — 1 online resource
 Summary

 1. Introduction
 2. Foundations of probability
 3. Stochastic processes and Markov chains
 4. Finite-armed stochastic bandits
 5. Concentration of measure
 6. The explore-then-commit algorithm
 7. The upper confidence bound algorithm
 8. The upper confidence bound algorithm: asymptotic optimality
 9. The upper confidence bound algorithm: minimax optimality
 10. The upper confidence bound algorithm: Bernoulli noise
 11. The Exp3 algorithm
 12. The Exp3-IX algorithm
 13. Lower bounds: basic ideas
 14. Foundations of information theory
 15. Minimax lower bounds
 16. Asymptotic and instance dependent lower bounds
 17. High probability lower bounds
 18. Contextual bandits
 19. Stochastic linear bandits
 20. Confidence bounds for least squares estimators
 21. Optimal design for least squares estimators
 22. Stochastic linear bandits with finitely many arms
 23. Stochastic linear bandits with sparsity
 24. Minimax lower bounds for stochastic linear bandits
 25. Asymptotic lower bounds for stochastic linear bandits
 26. Foundations of convex analysis
 27. Exp3 for adversarial linear bandits
 28. Follow the regularized leader and mirror descent
 29. The relation between adversarial and stochastic linear bandits
 30. Combinatorial bandits
 31. Non-stationary bandits
 32. Ranking
 33. Pure exploration
 34. Foundations of Bayesian learning
 35. Bayesian bandits
 36. Thompson sampling
 37. Partial monitoring
 38. Markov decision processes.
 (source: Nielsen Book Data)