1. Reinforcement learning : an introduction [2018]
- Sutton, Richard S. author.
- Second edition. - Cambridge, Massachusetts : The MIT Press, [2018]
- Description
- Book — xxii, 526 pages : color illustrations ; 24 cm.
- Summary
-
- Machine generated contents note: 1.Introduction
- 1.1.Reinforcement Learning
- 1.2.Examples
- 1.3.Elements of Reinforcement Learning
- 1.4.Limitations and Scope
- 1.5.An Extended Example: Tic-Tac-Toe
- 1.6.Summary
- 1.7.Early History of Reinforcement Learning
- I.Tabular solution methods
- 2.Multi-armed Bandits
- 2.1.A k-armed Bandit Problem
- 2.2.Action-value Methods
- 2.3.The 10-armed Testbed
- 2.4.Incremental Implementation
- 2.5.Tracking a Nonstationary Problem
- 2.6.Optimistic Initial Values
- 2.7.Upper-Confidence-Bound Action Selection
- 2.8.Gradient Bandit Algorithms
- 2.9.Associative Search (Contextual Bandits)
- 2.10.Summary
- 3.Finite Markov Decision Processes
- 3.1.The Agent-Environment Interface
- 3.2.Goals and Rewards
- 3.3.Returns and Episodes
- 3.4.Unified Notation for Episodic and Continuing Tasks
- 3.5.Policies and Value Functions
- 3.6.Optimal Policies and Optimal Value Functions
- 3.7.Optimality and Approximation
- 3.8.Summary
- 4.Dynamic Programming
- 4.1.Policy Evaluation (Prediction)
- 4.2.Policy Improvement
- 4.3.Policy Iteration
- 4.4.Value Iteration
- 4.5.Asynchronous Dynamic Programming
- 4.6.Generalized Policy Iteration
- 4.7.Efficiency of Dynamic Programming
- 4.8.Summary
- 5.Monte Carlo Methods
- 5.1.Monte Carlo Prediction
- 5.2.Monte Carlo Estimation of Action Values
- 5.3.Monte Carlo Control
- 5.4.Monte Carlo Control without Exploring Starts
- 5.5.Off-policy Prediction via Importance Sampling
- 5.6.Incremental Implementation
- 5.7.Off-policy Monte Carlo Control
- 5.8.*Discounting-aware Importance Sampling
- 5.9.*Per-decision Importance Sampling
- 5.10.Summary
- 6.Temporal-Difference Learning
- 6.1.TD Prediction
- 6.2.Advantages of TD Prediction Methods
- 6.3.Optimality of TD(0)
- 6.4.Sarsa: On-policy TD Control
- 6.5.Q-learning: Off-policy TD Control
- 6.6.Expected Sarsa
- 6.7.Maximization Bias and Double Learning
- 6.8.Games, Afterstates, and Other Special Cases
- 6.9.Summary
- 7.n-step Bootstrapping
- 7.1.n-step TD Prediction
- 7.2.n-step Sarsa
- 7.3.n-step Off-policy Learning
- 7.4.*Per-decision Methods with Control Variates
- 7.5.Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm
- 7.6.*A Unifying Algorithm: n-step Q(σ)
- 7.7.Summary
- 8.Planning and Learning with Tabular Methods
- 8.1.Models and Planning
- 8.2.Dyna: Integrated Planning, Acting, and Learning
- 8.3.When the Model Is Wrong
- 8.4.Prioritized Sweeping
- 8.5.Expected vs. Sample Updates
- 8.6.Trajectory Sampling
- 8.7.Real-time Dynamic Programming
- 8.8.Planning at Decision Time
- 8.9.Heuristic Search
- 8.10.Rollout Algorithms
- 8.11.Monte Carlo Tree Search
- 8.12.Summary of the Chapter
- 8.13.Summary of Part I: Dimensions
- II.Approximate solution methods
- 9.On-policy Prediction with Approximation
- 9.1.Value-function Approximation
- 9.2.The Prediction Objective (VE)
- 9.3.Stochastic-gradient and Semi-gradient Methods
- 9.4.Linear Methods
- 9.5.Feature Construction for Linear Methods
- 9.5.1.Polynomials
- 9.5.2.Fourier Basis
- 9.5.3.Coarse Coding
- 9.5.4.Tile Coding
- 9.5.5.Radial Basis Functions
- 9.6.Selecting Step-Size Parameters Manually
- 9.7.Nonlinear Function Approximation: Artificial Neural Networks
- 9.8.Least-Squares TD
- 9.9.Memory-based Function Approximation
- 9.10.Kernel-based Function Approximation
- 9.11.Looking Deeper at On-policy Learning: Interest and Emphasis
- 9.12.Summary
- 10.On-policy Control with Approximation
- 10.1.Episodic Semi-gradient Control
- 10.2.Semi-gradient n-step Sarsa
- 10.3.Average Reward: A New Problem Setting for Continuing Tasks
- 10.4.Deprecating the Discounted Setting
- 10.5.Differential Semi-gradient n-step Sarsa
- 10.6.Summary
- 11.*Off-policy Methods with Approximation
- 11.1.Semi-gradient Methods
- 11.2.Examples of Off-policy Divergence
- 11.3.The Deadly Triad
- 11.4.Linear Value-function Geometry
- 11.5.Gradient Descent in the Bellman Error
- 11.6.The Bellman Error is Not Learnable
- 11.7.Gradient-TD Methods
- 11.8.Emphatic-TD Methods
- 11.9.Reducing Variance
- 11.10.Summary
- 12.Eligibility Traces
- 12.1.The λ-return
- 12.2.TD(λ)
- 12.3.n-step Truncated λ-return Methods
- 12.4.Redoing Updates: Online λ-return Algorithm
- 12.5.True Online TD(λ)
- 12.6.*Dutch Traces in Monte Carlo Learning
- 12.7.Sarsa(λ)
- 12.8.Variable λ and γ
- 12.9.Off-policy Traces with Control Variates
- 12.10.Watkins's Q(λ) to Tree-Backup(λ)
- 12.11.Stable Off-policy Methods with Traces
- 12.12.Implementation Issues
- 12.13.Conclusions
- 13.Policy Gradient Methods
- 13.1.Policy Approximation and its Advantages
- 13.2.The Policy Gradient Theorem
- 13.3.REINFORCE: Monte Carlo Policy Gradient
- 13.4.REINFORCE with Baseline
- 13.5.Actor-Critic Methods
- 13.6.Policy Gradient for Continuing Problems
- 13.7.Policy Parameterization for Continuous Actions
- 13.8.Summary
- III. Looking deeper
- 14.Psychology
- 14.1.Prediction and Control
- 14.2.Classical Conditioning
- 14.2.1.Blocking and Higher-order Conditioning
- 14.2.2.The Rescorla-Wagner Model
- 14.2.3.The TD Model
- 14.2.4.TD Model Simulations
- 14.3.Instrumental Conditioning
- 14.4.Delayed Reinforcement
- 14.5.Cognitive Maps
- 14.6.Habitual and Goal-directed Behavior
- 14.7.Summary
- 15.Neuroscience
- 15.1.Neuroscience Basics
- 15.2.Reward Signals, Reinforcement Signals, Values, and Prediction Errors
- 15.3.The Reward Prediction Error Hypothesis
- 15.4.Dopamine
- 15.5.Experimental Support for the Reward Prediction Error Hypothesis
- 15.6.TD Error/Dopamine Correspondence
- 15.7.Neural Actor-Critic
- 15.8.Actor and Critic Learning Rules
- 15.9.Hedonistic Neurons
- 15.10.Collective Reinforcement Learning
- 15.11.Model-based Methods in the Brain
- 15.12.Addiction
- 15.13.Summary
- 16.Applications and Case Studies
- 16.1.TD-Gammon
- 16.2.Samuel's Checkers Player
- 16.3.Watson's Daily-Double Wagering
- 16.4.Optimizing Memory Control
- 16.5.Human-level Video Game Play
- 16.6.Mastering the Game of Go
- 16.6.1.AlphaGo
- 16.6.2.AlphaGo Zero
- 16.7.Personalized Web Services
- 16.8.Thermal Soaring
- 17.Frontiers
- 17.1.General Value Functions and Auxiliary Tasks
- 17.2.Temporal Abstraction via Options
- 17.3.Observations and State
- 17.4.Designing Reward Signals
- 17.5.Remaining Issues
- 17.6.Reinforcement Learning and the Future of Artificial Intelligence.
(source: Nielsen Book Data)
- Online
Engineering Library (Terman): Stacks, Q325.6 .R45 2018 (checked out)
2. Reinforcement learning : an introduction [1998]
- Sutton, Richard S.
- Cambridge, Mass. : MIT Press, c1998.
- Description
- Book — xviii, 322 p. : ill. ; 24 cm.
- Summary
-
Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. This text aims to provide a clear and simple account of the key ideas and algorithms of reinforcement learning. The discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part one defines the reinforcement learning problem in terms of Markov decision processes. Part two provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part three presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning. The two final chapters present case studies and consider the future of reinforcement learning.
(source: Nielsen Book Data)
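The summary above describes the core loop the book formalizes: an agent repeatedly acts in an uncertain environment and adjusts its behavior to maximize cumulative reward. As a purely illustrative sketch (not taken from the book), the snippet below shows a tabular method of the kind covered in its middle chapters, Q-learning with ε-greedy exploration; the environment interface (`reset()`, `step(action)`, `num_actions`) and all parameter values are assumptions made for illustration.

```python
# Illustrative tabular Q-learning sketch (not from the book).
# The environment interface (reset/step/num_actions) is hypothetical.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[s][a] estimates the expected discounted return of taking action a in state s.
    Q = defaultdict(lambda: [0.0] * env.num_actions)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.randrange(env.num_actions)
            else:
                action = max(range(env.num_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference (Q-learning) update toward the bootstrapped target.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q
```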
Engineering Library (Terman): Stacks, Q325.6 .S88 1998 (status unknown)
- Sugiyama, Masashi, 1974- author.
- Boca Raton, Florida : CRC Press, [2015]
- Description
- Book — 1 online resource
- Summary
-
- 1. Introduction
- 2. Model-free policy iteration
- 3. Model-free policy search
- 4. Model-based reinforcement learning
- Sugiyama, Masashi, 1974- author.
- Boca Raton, Florida : CRC Press, [2015]
- Description
- Book — 1 online resource : text file, PDF.
- Summary
-
- Introduction to Reinforcement Learning. Model-Free Policy Iteration. Policy Iteration with Value Function Approximation. Basis Design for Value Function Approximation. Sample Reuse in Policy Iteration. Active Learning in Policy Iteration. Robust Policy Iteration. Model-Free Policy Search. Direct Policy Search by Gradient Ascent. Direct Policy Search by Expectation-Maximization. Policy-Prior Search. Model-Based Reinforcement Learning. Transition Model Estimation. Dimensionality Reduction for Transition Model Estimation.
- (source: Nielsen Book Data)
- Introduction to Reinforcement Learning: Reinforcement Learning; Mathematical Formulation; Structure of the Book (Model-Free Policy Iteration; Model-Free Policy Search; Model-Based Reinforcement Learning)
- MODEL-FREE POLICY ITERATION
- Policy Iteration with Value Function Approximation: Value Functions (State Value Functions; State-Action Value Functions); Least-Squares Policy Iteration (Immediate-Reward Regression; Algorithm; Regularization; Model Selection); Remarks
- Basis Design for Value Function Approximation: Gaussian Kernels on Graphs (MDP-Induced Graph; Ordinary Gaussian Kernels; Geodesic Gaussian Kernels; Extension to Continuous State Spaces); Illustration (Setup; Geodesic Gaussian Kernels; Ordinary Gaussian Kernels; Graph-Laplacian Eigenbases; Diffusion Wavelets); Numerical Examples (Robot-Arm Control; Robot-Agent Navigation); Remarks
- Sample Reuse in Policy Iteration: Formulation; Off-Policy Value Function Approximation (Episodic Importance Weighting; Per-Decision Importance Weighting; Adaptive Per-Decision Importance Weighting; Illustration); Automatic Selection of Flattening Parameter (Importance-Weighted Cross-Validation; Illustration); Sample-Reuse Policy Iteration (Algorithm; Illustration); Numerical Examples (Inverted Pendulum; Mountain Car); Remarks
- Active Learning in Policy Iteration: Efficient Exploration with Active Learning (Problem Setup; Decomposition of Generalization Error; Estimation of Generalization Error; Designing Sampling Policies; Illustration); Active Policy Iteration (Sample-Reuse Policy Iteration with Active Learning; Illustration); Numerical Examples; Remarks
- Robust Policy Iteration: Robustness and Reliability in Policy Iteration (Robustness; Reliability); Least Absolute Policy Iteration (Algorithm; Illustration; Properties); Numerical Examples; Possible Extensions (Huber Loss; Pinball Loss; Deadzone-Linear Loss; Chebyshev Approximation; Conditional Value-At-Risk); Remarks
- MODEL-FREE POLICY SEARCH
- Direct Policy Search by Gradient Ascent: Formulation; Gradient Approach (Gradient Ascent; Baseline Subtraction for Variance Reduction; Variance Analysis of Gradient Estimators); Natural Gradient Approach (Natural Gradient Ascent; Illustration); Application in Computer Graphics: Artist Agent (Sumi-e Painting; Design of States, Actions, and Immediate Rewards; Experimental Results); Remarks
- Direct Policy Search by Expectation-Maximization: Expectation-Maximization Approach; Sample Reuse (Episodic Importance Weighting; Per-Decision Importance Weighting; Adaptive Per-Decision Importance Weighting; Automatic Selection of Flattening Parameter; Reward-Weighted Regression with Sample Reuse); Numerical Examples; Remarks
- Policy-Prior Search: Formulation; Policy Gradients with Parameter-Based Exploration (Policy-Prior Gradient Ascent; Baseline Subtraction for Variance Reduction; Variance Analysis of Gradient Estimators; Numerical Examples); Sample Reuse in Policy-Prior Search (Importance Weighting; Variance Reduction by Baseline Subtraction; Numerical Examples); Remarks
- MODEL-BASED REINFORCEMENT LEARNING
- Transition Model Estimation: Conditional Density Estimation (Regression-Based Approach; ε-Neighbor Kernel Density Estimation; Least-Squares Conditional Density Estimation); Model-Based Reinforcement Learning; Numerical Examples (Continuous Chain Walk; Humanoid Robot Control); Remarks
- Dimensionality Reduction for Transition Model Estimation: Sufficient Dimensionality Reduction; Squared-Loss Conditional Entropy (Conditional Independence; Dimensionality Reduction with SCE; Relation to Squared-Loss Mutual Information); Numerical Examples (Artificial and Benchmark Datasets; Humanoid Robot); Remarks
- References; Index.
- (source: Nielsen Book Data)
Reinforcement learning is a mathematical framework for developing computer agents that can learn an optimal behavior by relating generic reward signals with their past actions. With numerous successful applications in business intelligence, plant control, and gaming, the RL framework is ideal for decision making in unknown environments with large amounts of data. Supplying an up-to-date and accessible introduction to the field, Statistical Reinforcement Learning: Modern Machine Learning Approaches presents fundamental concepts and practical algorithms of statistical reinforcement learning from the modern machine learning viewpoint. It covers various types of RL approaches, including model-based and model-free approaches, policy iteration, and policy search methods. The book covers the range of reinforcement learning algorithms from a modern perspective, lays out the associated optimization problems for each reinforcement learning scenario covered, and provides a thought-provoking statistical treatment of reinforcement learning algorithms. It also covers approaches recently introduced in the data mining and machine learning fields to provide a systematic bridge between RL and data mining/machine learning researchers. It presents state-of-the-art results, including dimensionality reduction in RL and risk-sensitive RL. Numerous illustrative examples are included to help readers understand the intuition and usefulness of reinforcement learning techniques. This book is an ideal resource for graduate-level students in computer science and applied statistics programs, as well as researchers and engineers in related fields.
(source: Nielsen Book Data)
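Since the catalog summary above highlights policy search alongside policy iteration, here is a minimal, hedged sketch of a direct policy-search update in the REINFORCE style. It is not taken from the book; the linear softmax policy, the `env` interface, and all parameter names are assumptions made only to illustrate what a policy-gradient method looks like in code.

```python
# Illustrative REINFORCE-style policy-gradient sketch (not the book's algorithms).
# The environment interface and the linear softmax policy are assumptions.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reinforce(env, num_actions, num_features, episodes=200, lr=0.01, gamma=0.99):
    theta = np.zeros((num_actions, num_features))  # linear policy parameters
    for _ in range(episodes):
        feats, acts, rews = [], [], []
        phi, done = env.reset(), False  # phi: feature vector describing the state
        while not done:
            probs = softmax(theta @ phi)
            action = np.random.choice(num_actions, p=probs)
            phi_next, reward, done = env.step(action)
            feats.append(phi); acts.append(action); rews.append(reward)
            phi = phi_next
        # Monte Carlo return G_t, then gradient ascent on G_t * grad log pi(a_t | s_t).
        G = 0.0
        for t in reversed(range(len(rews))):
            G = rews[t] + gamma * G
            probs = softmax(theta @ feats[t])
            grad_log = -np.outer(probs, feats[t])  # -pi(b|s) * phi for every action b
            grad_log[acts[t]] += feats[t]          # +phi for the action actually taken
            theta += lr * G * grad_log
    return theta
```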
- EWRL 2008 (2008 : Villeneuve-d'Ascq, France)
- Berlin ; New York : Springer, 2008.
- Description
- Book — xii, 281 p. : ill. (some col.) ; 24 cm.
- Summary
-
This book constitutes the revised and selected papers of the 8th European Workshop on Reinforcement Learning, EWRL 2008, which took place in Villeneuve d'Ascq, France, during June 30 - July 3, 2008. The 21 papers presented were carefully reviewed and selected from 61 submissions. They are dedicated to the field of reinforcement learning and to current research within it.
(source: Nielsen Book Data)
6. Synchronous Reinforcement Learning-Based Control for Cognitive Autonomy [electronic resource] [2020]
- Vamvoudakis, Kyriakos G.
- Norwell, MA : Now Publishers, 2020
- Description
- Book — 1 online resource (183 p.).
- Summary
-
- 1. Introduction
- 2. Optimal Regulation
- 3. Game-Theoretic Learning
- 4. Model-Free RL with Q-Learning
- 5. Model-Based and Model-Free Intermittent RL
- 6. Bounded Rationality and Non-Equilibrium RL in Games
- 7. Applications to Autonomous Vehicles
- 8. Concluding Remarks
- Acknowledgements
- References.
- (source: Nielsen Book Data)
- [First edition]. - [Place of publication not identified] : Manning Publications, [2021]
- Description
- Video — 1 online resource (1 video file (39 min.)) : sound, color. Sound: digital. Digital: video file.
- Summary
-
Learn how deep reinforcement learning works by focusing on Q-networks and policy gradients over a simple example.
- Rao, Ashwin (Mathematics professor), author.
- First edition - Boca Raton : Chapman & Hall, CRC Press, Taylor & Francis Group, 2023
- Description
- Book — xxii, 499 pages : illustrations ; 27 cm
- Summary
-
"Foundations of Reinforcement Learning with Applications in Finance aims to demystify Reinforcement Learning, and to make it a practically useful tool for those studying and working in applied areas - especially finance. Reinforcement Learning is emerging as a viable and powerful technique for solving a variety of complex problems across industries that involve Sequential Optimal Decisioning under Uncertainty. Its penetration in high-profile problems like self-driving cars, robotics, and strategy games points to a future where Reinforcement Learning algorithms will have decisioning abilities far superior to humans. But when it comes getting educated in this area, there seems to be a reluctance to jump right in, because Reinforcement Learning appears to have acquired a reputation for being mysterious and exotic. Even technical people will often claim that the subject involves "advanced math" and "complicated engineering", erecting a psychological barrier to entry against otherwise interested students. This book seeks to overcome that barrier, and to introduce the foundations of Reinforcement Learning in a way that balances depth of understanding with clear, minimally technical delivery. Features Focus on the foundational theory underpinning Reinforcement Learning Suitable as a primary text for courses in Reinforcement Learning, but also as supplementary reading for applied/financial mathematics, programming, and other related courses Suitable for a professional audience of quantitative analysts or industry specialists Blends theory/mathematics, programming/algorithms and real-world financial nuances while always striving to maintain simplicity and to build intuitive understanding"-- Provided by publisher
- Online
SAL3 (off-campus storage): Stacks, HG152 .R36 2023 (available)
- Li, Shengbo Eben, author.
- Singapore : Springer, 2023.
- Description
- Book — 1 online resource
- Summary
-
- Chapter 1 Introduction of Reinforcement Learning
- Chapter 2 Principles of RL Problems
- Chapter 3 Model-free Indirect RL: Monte Carlo
- Chapter 4 Model-Free Indirect RL: Temporal-Difference
- Chapter 5 Model-based Indirect RL: Dynamic Programming
- Chapter 6 Indirect RL with Function Approximation
- Chapter 7 Direct RL with Policy Gradient
- Chapter 8 Infinite Horizon Approximate Dynamic Programming
- Chapter 9 Finite Horizon ADP and State Constraints
- Chapter 10 Deep Reinforcement Learning
- Chapter 11 Advanced RL Topics.
- Kunczik, Leonhard, author.
- Wiesbaden : Springer Vieweg, [2022]
- Description
- Book — 1 online resource : illustrations
- Summary
-
- Motivation: Complex Attacker-Defender Scenarios - The eternal conflict
- The Information Game - A special Attacker-Defender Scenario
- Reinforcement Learning and Bellman's Principle of Optimality
- Quantum Reinforcement Learning - Connecting Reinforcement Learning and Quantum Computing
- Approximation in Quantum Computing
- Advanced Quantum Policy Approximation in Policy Gradient Reinforcement Learning
- Applying Quantum REINFORCE to the Information Game
- Evaluating quantum REINFORCE on IBM's Quantum Hardware
- Future Steps in Quantum Reinforcement Learning for Complex Scenarios
- Conclusion.
- (source: Nielsen Book Data)
- Sadhu, Arup Kumar, author.
- Piscataway, NJ : IEEE Press ; Hoboken, NJ : John Wiley & Sons, Inc., 2021.
- Description
- Book — 1 online resource (xxii, 296 pages) : illustrations (some color)
- Summary
-
- PREFACE; ACKNOWLEDGEMENT
- CHAPTER 1 INTRODUCTION: MULTI-AGENT COORDINATION BY REINFORCEMENT LEARNING AND EVOLUTIONARY ALGORITHMS
- 1.1 INTRODUCTION
- 1.2 SINGLE AGENT PLANNING: Terminologies used in single agent planning; Single agent search-based planning algorithms (Dijkstra's algorithm; A* (A-star) Algorithm; D* (D-star) Algorithm; Planning by STRIPS-like language); Single agent reinforcement learning (Multi-Armed Bandit Problem; Dynamic programming and Bellman equation; Correlation between reinforcement learning and Dynamic programming; Single agent Q-learning; Single agent planning using Q-learning)
- 1.3 MULTI-AGENT PLANNING AND COORDINATION: Terminologies related to multi-agent coordination; Classification of multi-agent system; Game theory for multi-agent coordination (Nash equilibrium (NE): Pure strategy NE (PSNE), Mixed strategy NE (MSNE); Correlated equilibrium (CE); Static game examples); Correlation among RL, DP, and GT
- 1.3.5 Classification of MARL: Cooperative multi-agent reinforcement learning (Static: Independent Learner (IL) and Joint Action Learner (JAL), Frequency maximum Q-value (FMQ) heuristic; Dynamic: Team-Q, Distributed-Q, Optimal Adaptive Learning, Sparse cooperative Q-learning (SCQL), Sequential Q-learning (SQL), Frequency of the maximum reward Q-learning (FMRQ)); Competitive multi-agent reinforcement learning (Minimax-Q Learning; Heuristically-accelerated multi-agent reinforcement learning); Mixed multi-agent reinforcement learning (Static: Belief-based Learning rule, Fictitious play, Meta strategy, Adapt When Everybody is Stationary, Otherwise Move to Equilibrium (AWESOME), Hyper-Q, Direct policy search based, Fixed learning rate (Infinitesimal Gradient Ascent (IGA), Generalized Infinitesimal Gradient Ascent (GIGA)), Variable learning rate (Win or Learn Fast-IGA (WoLF-IGA), GIGA-Win or Learn Fast (GIGA-WoLF)); Dynamic: Equilibrium dependent (Nash-Q Learning, Correlated-Q Learning (CQL), Asymmetric-Q Learning (AQL), Friend-or-Foe Q-learning, Negotiation-based Q-learning, MAQL with equilibrium transfer), Equilibrium independent (Variable learning rate: Win or Learn Fast Policy hill-climbing (WoLF-PHC), Policy Dynamic based Win or Learn Fast (PD-WoLF); Fixed learning rate: Non-Stationary Converging Policies (NSCP), Extended Optimal Response Learning (EXORL)))
- 1.3.6 Coordination and planning by MAQL
- 1.3.7 Performance analysis of MAQL and MAQL-based coordination
- 1.4 COORDINATION BY OPTIMIZATION ALGORITHM: Particle Swarm Optimization (PSO) Algorithm; Firefly Algorithm (FA) (Initialization; Attraction to Brighter Fireflies; Movement of Fireflies); Imperialist Competitive Algorithm (ICA) (Initialization; Selection of Imperialists and Colonies; Formation of Empires; Assimilation of Colonies; Revolution; Imperialistic Competition: Total Empire Power Evaluation, Reassignment of Colonies and Removal of Empire, Union of Empires); Differential evolutionary (DE) algorithm (Initialization; Mutation; Recombination; Selection); Offline optimization; Performance analysis of optimization algorithms (Friedman test; Iman-Davenport test)
- 1.5 SCOPE OF THE BOOK
- 1.6 SUMMARY
- References
- CHAPTER 2 IMPROVE CONVERGENCE SPEED OF MULTI-AGENT Q-LEARNING FOR COOPERATIVE TASK-PLANNING
- 2.1 INTRODUCTION
- 2.2 LITERATURE REVIEW
- 2.3 PRELIMINARIES: Single agent Q-learning; Multi-agent Q-learning
- 2.4 PROPOSED MULTI-AGENT Q-LEARNING: Two useful properties
- 2.5 PROPOSED FCMQL ALGORITHMS AND THEIR CONVERGENCE ANALYSIS: Proposed FCMQL algorithms; Convergence analysis of the proposed FCMQL algorithms
- 2.6 FCMQL-BASED COOPERATIVE MULTI-AGENT PLANNING
- 2.7 EXPERIMENTS AND RESULTS
- 2.8 CONCLUSIONS
- 2.9 SUMMARY
- 2.10 APPENDIX 2.1
- 2.11 APPENDIX 2.2
- References
- CHAPTER 3 CONSENSUS Q-LEARNING FOR MULTI-AGENT COOPERATIVE PLANNING
- 3.1 INTRODUCTION
- 3.2 PRELIMINARIES: Single agent Q-learning; Equilibrium-based multi-agent Q-learning
- 3.3 CONSENSUS
- 3.4 PROPOSED CONSENSUS Q-LEARNING AND PLANNING: Consensus Q-learning; Consensus-based multi-robot planning
- 3.5 EXPERIMENTS AND RESULTS: Experimental setup; Experiments for CoQL; Experiments for consensus-based planning
- 3.6 CONCLUSIONS
- 3.7 SUMMARY
- References
- CHAPTER 4 AN EFFICIENT COMPUTING OF CORRELATED EQUILIBRIUM FOR COOPERATIVE Q-LEARNING BASED MULTI-AGENT PLANNING
- 4.1 INTRODUCTION
- 4.2 SINGLE-AGENT Q-LEARNING AND EQUILIBRIUM BASED MAQL: Single Agent Q-learning; Equilibrium based MAQL
- 4.3 PROPOSED COOPERATIVE MULTI-AGENT Q-LEARNING AND PLANNING: Proposed schemes with their applicability; Immediate rewards in Scheme-I and -II; Scheme-I induced MAQL; Scheme-II induced MAQL; Algorithms for scheme-I and II; Constraint QL-I/QL-II; Convergence; Multi-agent planning
- 4.4 COMPLEXITY ANALYSIS: Complexity of Correlated Q-Learning (Space Complexity; Time Complexity); Complexity of the proposed algorithms (Space Complexity; Time Complexity); Complexity comparison (Space complexity; Time complexity)
- 4.5 SIMULATION AND EXPERIMENTAL RESULTS: Experimental platform (Simulation; Hardware); Experimental approach (Learning phase; Planning phase); Experimental results
- 4.6 CONCLUSION
- 4.7 SUMMARY
- 4.8 APPENDIX
- References
- CHAPTER 5 A MODIFIED IMPERIALIST COMPETITIVE ALGORITHM FOR MULTI-AGENT STICK-CARRYING APPLICATION
- 5.1 INTRODUCTION
- 5.2 PROBLEM FORMULATION FOR MULTI-ROBOT STICK-CARRYING
- 5.3 PROPOSED HYBRID ALGORITHM: An Overview of Imperialist Competitive Algorithm (ICA) (Initialization; Selection of Imperialists and Colonies; Formation of Empires; Assimilation of Colonies; Revolution; Imperialistic Competition: Total Empire Power Evaluation, Reassignment of Colonies and Removal of Empire, Union of Empires)
- 5.4 AN OVERVIEW OF FIREFLY ALGORITHM (FA): Initialization; Attraction to Brighter Fireflies; Movement of Fireflies
- 5.5 PROPOSED IMPERIALIST COMPETITIVE FIREFLY ALGORITHM: Assimilation of Colonies (Attraction to Powerful Colonies; Modification of Empire Behavior; Union of Empires)
- 5.6 SIMULATION RESULTS: Comparative Framework; Parameter Settings; Analysis on Explorative Power of ICFA; Comparison of Quality of the Final Solution; Performance Analysis
- 5.7 COMPUTER SIMULATION AND EXPERIMENT: Average total path deviation (ATPD); Average Uncovered Target Distance (AUTD); Experimental Setup in Simulation Environment; Experimental Results in Simulation Environment; Experimental Setup with Khepera Robots; Experimental Results with Khepera Robots
- 5.8 CONCLUSION
- 5.9 SUMMARY
- 5.10 APPENDIX 5.1
- References
- CHAPTER 6 CONCLUSIONS AND FUTURE DIRECTIONS
- 6.1 CONCLUSIONS
- 6.2 FUTURE DIRECTIONS.
- (source: Nielsen Book Data)
- Bernstein, Andrey, author.
- Golden, CO : National Renewable Energy Laboratory, 2019
- Description
- Book — 1 online resource (8 pages) : color illustrations
- Kober, Jens, author.
- Cham : Springer, 2014.
- Description
- Book — 1 online resource (xvi, 191 pages) : illustrations (some color). Digital: text file, PDF.
- Summary
-
- Reinforcement Learning in Robotics: A Survey
- Movement Templates for Learning of Hitting and Batting
- Policy Search for Motor Primitives in Robotics
- Reinforcement Learning to Adjust Parameterized Motor Primitives to New Situations
- Learning Prioritized Control of Motor Primitives.
- Description
- Book
- Online
SAL3 (off-campus storage): Stacks, 126620 (available)
- Jiang, Z.-P. (Zhong-Ping), author.
- Hanover, MA : Now Publishers, [2020]
- Description
- Book — 1 online resource (viii, 109 pages)
- Summary
-
- 1. Introduction
- 2. Learning-Based Control of Continuous-Time Dynamical Systems
- 3. Learning-Based Control of Large-Scale Interconnected Systems
- 4. Learning-Based Output Regulation
- 5. Applications
- 6. Perspective and Future Work
- Acknowledgements
- References.
- (source: Nielsen Book Data)
- [First edition]. - [Place of publication not identified] : Manning Publications, [2020]
- Description
- Video — 1 online resource (1 video file (1 hr., 12 min.)) : sound, color. Sound: digital. Digital: video file.
- Summary
-
An overview of a reinforcement learning multi-agent (soccer) environment.
- Frommberger, Lutz.
- Heidelberg ; New York : Springer-Verlag, 2010.
- Description
- Book — 1 online resource (xvii, 174 pages) : illustrations Digital: text file; PDF.
- Summary
-
- Foundations of Reinforcement Learning
- Abstraction and Knowledge Transfer in Reinforcement Learning
- Qualitative State Space Abstraction
- Generalization and Transfer Learning with Qualitative Spatial Abstraction
- RLPR
- An Aspectualizable State Space Representation
- Empirical Evaluation
- Summary and Outlook.
- Description
- Book
- Online
SAL3 (off-campus storage): Stacks, 133876 (available)
- EWRL 2008 (2008 : Villeneuve d'Ascq, France)
- Berlin ; New York : Springer, 2008.
- Description
- Book — xii, 281 p. : ill.
- Dimitrakakis, Christos, author.
- Cham, Switzerland : Springer, 2022.
- Description
- Book — 1 online resource : illustrations (black and white).
- Summary
-
- Introduction
- Subjective probability and utility
- Decision problems
- Estimation.
- (source: Nielsen Book Data)