Results 1 - 4
1. Algorithms for reinforcement learning [2010]
- Szepesvári, Csaba.
- Cham, Switzerland : Springer, ©2010.
- Description
- Book — 1 online resource (xii, 89 pages) : illustrations
- Summary
-
- Markov Decision Processes. Value Prediction Problems. Control. For Further Exploration.
- (source: Nielsen Book Data)
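The "Value Prediction Problems" part of the record above refers to estimating state values from sampled transitions. A minimal sketch of tabular TD(0) on a toy random-walk chain follows; the environment (states 1..n with terminals at both ends, reward 1 at the right end) and the function name are illustrative assumptions, not taken from the book.

```python
import random

def td0_random_walk(n=5, episodes=500, alpha=0.1, seed=0):
    """TD(0) value prediction on an undiscounted random walk.

    States 1..n are non-terminal; 0 and n+1 are terminal with value 0.
    Reward 1 is received on reaching the right terminal, else 0.
    This toy chain is an illustrative assumption for the sketch.
    """
    rng = random.Random(seed)
    V = [0.0] * (n + 2)            # V[0] and V[n+1] stay 0 (terminal)
    for _ in range(episodes):
        s = (n + 1) // 2           # start each episode in the middle
        while 1 <= s <= n:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == n + 1 else 0.0
            V[s] += alpha * (r + V[s2] - V[s])   # TD(0) update, gamma = 1
            s = s2
    return V[1:n + 1]
```

For this chain the true values are i / (n + 1) for state i, so the learned estimates should increase from left to right.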
2. Algorithms for reinforcement learning [2010]
- Szepesvári, Csaba.
- San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2010.
- Description
- Book — 1 electronic text (xii, 89 p.) : ill.
- Summary
-
- Markov Decision Processes. Value Prediction Problems. Control. For Further Exploration.
- (source: Nielsen Book Data)
3. Bandit algorithms [2020]
- Lattimore, Tor, 1987- author.
- Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2020
- Description
- Book — 1 online resource
- Summary
-
- 1. Introduction
- 2. Foundations of probability
- 3. Stochastic processes and Markov chains
- 4. Finite-armed stochastic bandits
- 5. Concentration of measure
- 6. The explore-then-commit algorithm
- 7. The upper confidence bound algorithm
- 8. The upper confidence bound algorithm: asymptotic optimality
- 9. The upper confidence bound algorithm: minimax optimality
- 10. The upper confidence bound algorithm: Bernoulli noise
- 11. The Exp3 algorithm
- 12. The Exp3-IX algorithm
- 13. Lower bounds: basic ideas
- 14. Foundations of information theory
- 15. Minimax lower bounds
- 16. Asymptotic and instance dependent lower bounds
- 17. High probability lower bounds
- 18. Contextual bandits
- 19. Stochastic linear bandits
- 20. Confidence bounds for least squares estimators
- 21. Optimal design for least squares estimators
- 22. Stochastic linear bandits with finitely many arms
- 23. Stochastic linear bandits with sparsity
- 24. Minimax lower bounds for stochastic linear bandits
- 25. Asymptotic lower bounds for stochastic linear bandits
- 26. Foundations of convex analysis
- 27. Exp3 for adversarial linear bandits
- 28. Follow the regularized leader and mirror descent
- 29. The relation between adversarial and stochastic linear bandits
- 30. Combinatorial bandits
- 31. Non-stationary bandits
- 32. Ranking
- 33. Pure exploration
- 34. Foundations of Bayesian learning
- 35. Bayesian bandits
- 36. Thompson sampling
- 37. Partial monitoring
- 38. Markov decision processes.
- (source: Nielsen Book Data)
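Chapters 7-10 of the record above treat the upper confidence bound algorithm for finite-armed stochastic bandits. A minimal sketch of UCB1 on Bernoulli arms follows; the function name, the Bernoulli reward model, and the specific exploration constant are illustrative assumptions, not a definitive implementation from the book.

```python
import math
import random

def ucb1(means_true, horizon, seed=0):
    """UCB1 on Bernoulli arms with success probabilities `means_true`.

    Plays each arm once, then picks the arm maximizing
    empirical mean + sqrt(2 log t / pulls) each round.
    """
    rng = random.Random(seed)
    k = len(means_true)
    counts = [0] * k
    est = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1                      # initialization: play each arm once
        else:
            i = max(range(k),
                    key=lambda a: est[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if rng.random() < means_true[i] else 0.0
        counts[i] += 1
        est[i] += (r - est[i]) / counts[i]  # incremental mean update
    return counts, est
```

Over a long enough horizon the arm with the highest true mean accumulates most of the pulls, which is the sublinear-regret behavior the UCB chapters analyze.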
4. Performance of nonlinear approximate adaptive controllers [2003]
- French, Mark.
- Chichester ; Hoboken, NJ : Wiley, c2003.
- Description
- Book — xv, 396 p. : ill. ; 26 cm.
- Summary
-
- Preface. Introduction. Approximation Theory. Uncertainty Modelling, Control Design and System Performance. The Chain of Integrators. Function Approximator Designs for the Integrator Chain. Resolution Divergence. Resolution Scaling. Strict Feedback Systems. Output Feedback Control. Comparison to Alternative Designs. Conclusions and Outlook. Appendix A: Lyapunov's Direct Method. Appendix B: Functional Bounds from System Identification. References. Index.
- (source: Nielsen Book Data)
| SAL3 (off-campus storage) | Status |
|---|---|
| Stacks: TJ217 .F75 2003 | Available |
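The chain of integrators that several chapters of the last record build control designs around is the system x1' = x2, ..., xn' = u. A minimal forward-Euler simulation sketch follows; the function name and the stabilizing feedback law used as the default input are illustrative assumptions, not a design from the book.

```python
def simulate_integrator_chain(n=2, u=lambda t, x: -x[0] - x[1],
                              dt=0.01, steps=1000):
    """Euler-integrate a chain of n integrators x1' = x2, ..., xn' = u.

    The default feedback u = -x1 - x2 stabilizes the n = 2 chain
    (a double integrator); it is an illustrative choice only.
    """
    x = [1.0] * n                          # arbitrary nonzero initial state
    t = 0.0
    for _ in range(steps):
        v = u(t, x)
        dx = [x[i + 1] for i in range(n - 1)] + [v]   # chain dynamics
        x = [x[i] + dt * dx[i] for i in range(n)]     # Euler step
        t += dt
    return x
```

With the default feedback, the closed-loop matrix [[0, 1], [-1, -1]] has eigenvalues with negative real part, so the state decays toward the origin over the 10-second horizon simulated here.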