1. Algorithms for reinforcement learning [2010]
 Szepesvári, Csaba.
 Cham, Switzerland : Springer, ©2010.
 Description
 Book — 1 online resource (xii, 89 pages) : illustrations
 Summary

 Markov Decision Processes -- Value Prediction Problems -- Control -- For Further Exploration.
 (source: Nielsen Book Data)
2. Algorithms for reinforcement learning [2010]
 Szepesvári, Csaba.
 San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2010.
 Description
 Book — 1 electronic text (xii, 89 p.) : ill.
 Summary

 Markov Decision Processes -- Value Prediction Problems -- Control -- For Further Exploration.
 (source: Nielsen Book Data)
3. Bandit algorithms [2020]
 Lattimore, Tor, 1987- author.
 Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2020
 Description
 Book — 1 online resource
 Summary

 1. Introduction
 2. Foundations of probability
 3. Stochastic processes and Markov chains
 4. Finite-armed stochastic bandits
 5. Concentration of measure
 6. The explore-then-commit algorithm
 7. The upper confidence bound algorithm
 8. The upper confidence bound algorithm: asymptotic optimality
 9. The upper confidence bound algorithm: minimax optimality
 10. The upper confidence bound algorithm: Bernoulli noise
 11. The Exp3 algorithm
 12. The Exp3-IX algorithm
 13. Lower bounds: basic ideas
 14. Foundations of information theory
 15. Minimax lower bounds
 16. Asymptotic and instance dependent lower bounds
 17. High probability lower bounds
 18. Contextual bandits
 19. Stochastic linear bandits
 20. Confidence bounds for least squares estimators
 21. Optimal design for least squares estimators
 22. Stochastic linear bandits with finitely many arms
 23. Stochastic linear bandits with sparsity
 24. Minimax lower bounds for stochastic linear bandits
 25. Asymptotic lower bounds for stochastic linear bandits
 26. Foundations of convex analysis
 27. Exp3 for adversarial linear bandits
 28. Follow the regularized leader and mirror descent
 29. The relation between adversarial and stochastic linear bandits
 30. Combinatorial bandits
 31. Non-stationary bandits
 32. Ranking
 33. Pure exploration
 34. Foundations of Bayesian learning
 35. Bayesian bandits
 36. Thompson sampling
 37. Partial monitoring
 38. Markov decision processes.
 (source: Nielsen Book Data)