Librarian View
Last updated in SearchWorks on November 24, 2023 9:17pm
LEADER 06737cam a2200985Ia 4500
001
a14250301
003
SIRSI
006
m o d
007
cr |||||||||||
008
100715s2010 sz a ob 000 0 eng d
035
a| (Sirsi) a14250301
040
a| AZS
b| eng
e| pn
c| AZS
d| AZS
d| UMC
d| UWO
d| PUL
d| CEF
d| OCLCQ
d| OCLCO
d| QE2
d| EBLCP
d| E7B
d| YDXCP
d| DEBSZ
d| N$T
d| UKMGB
d| UMI
d| OCLCQ
d| COO
d| MERUC
d| OCLCQ
d| Z5A
d| RIU
d| OCLCQ
d| OCLCF
d| MERER
d| WYU
d| OCLCQ
d| YOU
d| TKN
d| OL$
d| OCLCQ
d| LEAUB
d| UKAHL
d| LVT
d| OCLCO
d| KSU
d| GW5XE
d| CSt
016
7
a| 015806633
2| Uk
020
a| 9781608454938
q| (electronic bk.)
020
a| 1608454932
q| (electronic bk.)
020
a| 1608454924
020
a| 9781608454921
020
z| 1608454924
020
z| 9781608454921
020
a| 9783031015519
q| (electronic bk.)
020
a| 3031015517
q| (electronic bk.)
024
7
a| 10.2200/S00268ED1V01Y201005AIM009
2| doi
024
3
a| 9781608454921
024
7
a| 10.1007/978-3-031-01551-9
2| doi
035
a| (OCoLC)647995927
z| (OCoLC)707877247
z| (OCoLC)785779112
z| (OCoLC)785947355
z| (OCoLC)861345348
z| (OCoLC)987457496
z| (OCoLC)1027305703
z| (OCoLC)1044286082
z| (OCoLC)1053120427
z| (OCoLC)1067055383
z| (OCoLC)1086525294
z| (OCoLC)1229610890
037
a| CL0500000322
b| Safari Books Online
050
4
a| Q325.6
b| .S94 2010
072
7
a| COM
x| 005030
2| bisacsh
072
7
a| COM
x| 004000
2| bisacsh
082
0
4
a| 006.31
2| 22
049
a| MAIN
100
1
a| Szepesvári, Csaba.
245
1
0
a| Algorithms for reinforcement learning /
c| Csaba Szepesvári.
260
a| Cham, Switzerland :
b| Springer,
c| ©2010.
300
a| 1 online resource (xii, 89 pages) :
b| illustrations
336
a| text
b| txt
2| rdacontent
337
a| computer
b| c
2| rdamedia
338
a| online resource
b| cr
2| rdacarrier
490
1
a| Synthesis lectures on artificial intelligence and machine learning,
x| 1939-4616 ;
v| #9
504
a| Includes bibliographical references (pages 73-88).
505
0
a| 1. Markov decision processes -- Preliminaries -- Markov decision processes -- Value functions -- Dynamic programming algorithms for solving MDPs.
520
3
a| Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects by influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications it can address, ranging from problems in artificial intelligence to operations research and control engineering. In this book, we focus on those reinforcement learning algorithms that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state-of-the-art algorithms, and discuss their theoretical properties and limitations.
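The abstract and the contents note above point to dynamic programming algorithms for solving MDPs. As an illustrative sketch only, the short Python/NumPy program below runs tabular value iteration on a made-up three-state, two-action MDP (the transition probabilities, rewards, and discount factor are hypothetical and are not drawn from the book), repeatedly applying the Bellman optimality backup until the value function stops changing.

import numpy as np

# Hypothetical toy MDP (not from the book): 3 states, 2 actions.
# P[a][s][s'] is the transition probability, R[a][s] the expected reward.
P = np.array([
    [[0.7, 0.3, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],   # action 0
    [[0.2, 0.8, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]],   # action 1
])
R = np.array([
    [1.0, 0.0, 0.0],   # action 0
    [0.5, 2.0, 0.0],   # action 1
])
gamma = 0.9  # discount factor

# Value iteration: apply the Bellman optimality backup
#   V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') * V(s') ]
V = np.zeros(3)
for _ in range(1000):
    Q = R + gamma * (P @ V)      # Q[a, s]: one-step lookahead value
    V_new = Q.max(axis=0)        # greedy backup over actions
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)        # greedy policy w.r.t. the final Q
print("V*:", np.round(V, 3), "policy:", policy)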
650
0
a| Reinforcement learning
x| Mathematical models.
650
0
a| Machine learning.
650
0
a| Markov processes.
650
2
a| Markov Chains
650
6
a| Apprentissage par renforcement (Intelligence artificielle)
x| Modèles mathématiques.
650
6
a| Apprentissage automatique.
650
6
a| Processus de Markov.
650
7
a| COMPUTERS
x| Enterprise Applications
x| Business Intelligence Tools.
2| bisacsh
650
7
a| COMPUTERS
x| Intelligence (AI) & Semantics.
2| bisacsh
650
7
a| Machine learning.
2| fast
0| (OCoLC)fst01004795
650
7
a| Markov processes.
2| fast
0| (OCoLC)fst01010347
653
a| Reinforcement learning
653
a| Markov Decision Processes
653
a| Temporal difference learning
653
a| Stochastic approximation
653
a| Two-timescale stochastic approximation
653
a| Monte-Carlo methods
653
a| Simulation optimization
653
a| Function approximation
653
a| Stochastic gradient methods
653
a| Least-squares methods
653
a| Overfitting
653
a| Bias-variance tradeoff
653
a| Online learning
653
a| Active learning
653
a| Planning
653
a| Simulation
653
a| PAC-learning
653
a| Q-learning
653
a| Actor-critic methods
653
a| Policy gradient
653
a| Natural gradient
776
0
8
i| Print version:
a| Szepesvári, Csaba.
t| Algorithms for reinforcement learning.
d| [San Rafael, Calif.] : Morgan & Claypool, ©2010
z| 1608454924
w| (OCoLC)671466144
830
0
a| Synthesis lectures on artificial intelligence and machine learning ;
v| #9.
856
4
0
z| Available to Stanford-affiliated users.
u| https://link.springer.com/10.1007/978-3-031-01551-9
x| WMS
y| SpringerLink
x| Provider: Springer
x| purchased
x| eLoaderURL
x| sp4
x| spocn647995927
994
a| 92
b| STF
905
0
a| Markov Decision Processes -- Value Prediction Problems -- Control -- For Further Exploration.
1| Nielsen
x| 9781608454921
x| 20220711
920
b| Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects by influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications it can address, ranging from problems in artificial intelligence to operations research and control engineering. In this book, we focus on those reinforcement learning algorithms that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state-of-the-art algorithms, and discuss their theoretical properties and limitations.
1| Nielsen
x| 9781608454921
x| 20220711
596
a| 22
035
a| (Sirsi) spocn647995927
999
f
f
i| 021b91cf-c68f-50f5-86d9-2fbdbbe20ee7
s| 7ad057d3-1e58-5401-9478-c115c03d1520
Holdings JSON
{ "holdings": [ { "id": "ee38eecd-3ca5-5f0b-b7e1-43ca80c785d6", "hrid": "ah14250301_1", "notes": [ ], "_version": 1, "metadata": { "createdDate": "2023-08-21T21:26:31.279Z", "updatedDate": "2023-08-21T21:26:31.279Z", "createdByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766", "updatedByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766" }, "sourceId": "f32d531e-df79-46b3-8932-cdd35f7a2264", "boundWith": null, "formerIds": [ ], "illPolicy": null, "instanceId": "021b91cf-c68f-50f5-86d9-2fbdbbe20ee7", "holdingsType": { "id": "996f93e2-5b5e-4cf2-9168-33ced1f95eed", "name": "Electronic", "source": "folio" }, "holdingsItems": [ ], "callNumberType": null, "holdingsTypeId": "996f93e2-5b5e-4cf2-9168-33ced1f95eed", "electronicAccess": [ ], "bareHoldingsItems": [ ], "holdingsStatements": [ ], "statisticalCodeIds": [ ], "administrativeNotes": [ ], "effectiveLocationId": "b0a1a8c3-cc9a-487c-a2ed-308fc3a49a91", "permanentLocationId": "b0a1a8c3-cc9a-487c-a2ed-308fc3a49a91", "suppressFromDiscovery": false, "holdingsStatementsForIndexes": [ ], "holdingsStatementsForSupplements": [ ], "location": { "effectiveLocation": { "id": "b0a1a8c3-cc9a-487c-a2ed-308fc3a49a91", "code": "SUL-ELECTRONIC", "name": "online resource", "campus": { "id": "c365047a-51f2-45ce-8601-e421ca3615c5", "code": "SUL", "name": "Stanford Libraries" }, "details": { }, "library": { "id": "c1a86906-ced0-46cb-8f5b-8cef542bdd00", "code": "SUL", "name": "SUL" }, "isActive": true, "institution": { "id": "8d433cdd-4e8f-4dc1-aa24-8a4ddb7dc929", "code": "SU", "name": "Stanford University" } }, "permanentLocation": { "id": "b0a1a8c3-cc9a-487c-a2ed-308fc3a49a91", "code": "SUL-ELECTRONIC", "name": "online resource", "campus": { "id": "c365047a-51f2-45ce-8601-e421ca3615c5", "code": "SUL", "name": "Stanford Libraries" }, "details": { }, "library": { "id": "c1a86906-ced0-46cb-8f5b-8cef542bdd00", "code": "SUL", "name": "SUL" }, "isActive": true, "institution": { "id": "8d433cdd-4e8f-4dc1-aa24-8a4ddb7dc929", "code": "SU", "name": "Stanford University" } } } } ], "items": [ ] }
FOLIO JSON
{ "pieces": [ null ], "instance": { "id": "021b91cf-c68f-50f5-86d9-2fbdbbe20ee7", "hrid": "a14250301", "notes": [ { "note": "Includes bibliographical references (pages 73-88)", "staffOnly": false, "instanceNoteTypeId": "86b6e817-e1bc-42fb-bab0-70e7547de6c1" }, { "note": "1. Markov decision processes -- Preliminaries -- Markov decision processes -- Value functions -- Dynamic programming algorithms for solving MDPs", "staffOnly": false, "instanceNoteTypeId": "5ba8e385-0e27-462e-a571-ffa1fa34ea54" }, { "note": "Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations", "staffOnly": false, "instanceNoteTypeId": "10e2e11b-450f-45c8-b09b-0f819999966e" } ], "title": "Algorithms for reinforcement learning / Csaba Szepesvári.", "series": [ "Synthesis lectures on artificial intelligence and machine learning, 1939-4616 ; #9", "Synthesis lectures on artificial intelligence and machine learning ; #9" ], "source": "MARC", "_version": 1, "editions": [ ], "metadata": { "createdDate": "2023-08-21T21:22:27.688Z", "updatedDate": "2023-08-21T21:22:27.688Z", "createdByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766", "updatedByUserId": "58d0aaf6-dcda-4d5e-92da-012e6b7dd766" }, "statusId": "9634a5ab-9228-4703-baf2-4d12ebc77d56", "subjects": [ "Reinforcement learning Mathematical models", "Machine learning", "Markov processes", "Markov Chains", "Apprentissage par renforcement (Intelligence artificielle) Modèles mathématiques", "Apprentissage automatique", "Processus de Markov", "COMPUTERS Enterprise Applications Business Intelligence Tools", "COMPUTERS Intelligence (AI) & Semantics" ], "languages": [ "eng" ], "indexTitle": "Algorithms for reinforcement learning", "identifiers": [ { "value": "(Sirsi) a14250301", "identifierTypeId": "7e591197-f335-4afb-bc6d-a6d76ca3bace" }, { "value": "9781608454938 (electronic bk.)", "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "1608454932 (electronic bk.)", "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "1608454924", "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "9781608454921", "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "1608454924", "identifierTypeId": "fcca2643-406a-482a-b760-7a7f8aec640e" }, { "value": "9781608454921", "identifierTypeId": "fcca2643-406a-482a-b760-7a7f8aec640e" }, { "value": "9783031015519 (electronic bk.)", "identifierTypeId": 
"8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "3031015517 (electronic bk.)", "identifierTypeId": "8261054f-be78-422d-bd51-4ed9f33c3422" }, { "value": "10.2200/S00268ED1V01Y201005AIM009 doi", "identifierTypeId": "2e8b3b6c-0e7d-4e48-bca2-b0b23b376af5" }, { "value": "10.2200/S00268ED1V01Y201005AIM009", "identifierTypeId": "ebfd00b6-61d3-4d87-a6d8-810c941176d5" }, { "value": "10.2200/S00268ED1V01Y201005AIM009", "identifierTypeId": "1795ea23-6856-48a5-a772-f356e16a8a6c" }, { "value": "9781608454921", "identifierTypeId": "2e8b3b6c-0e7d-4e48-bca2-b0b23b376af5" }, { "value": "9781608454921", "identifierTypeId": "ebfd00b6-61d3-4d87-a6d8-810c941176d5" }, { "value": "9781608454921", "identifierTypeId": "1795ea23-6856-48a5-a772-f356e16a8a6c" }, { "value": "10.1007/978-3-031-01551-9 doi", "identifierTypeId": "2e8b3b6c-0e7d-4e48-bca2-b0b23b376af5" }, { "value": "10.1007/978-3-031-01551-9", "identifierTypeId": "ebfd00b6-61d3-4d87-a6d8-810c941176d5" }, { "value": "10.1007/978-3-031-01551-9", "identifierTypeId": "1795ea23-6856-48a5-a772-f356e16a8a6c" }, { "value": "(OCoLC)647995927", "identifierTypeId": "439bfbae-75bc-4f74-9fc7-b2a2d47ce3ef" }, { "value": "(OCoLC)707877247", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)785779112", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)785947355", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)861345348", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)987457496", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1027305703", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1044286082", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1053120427", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1067055383", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1086525294", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(OCoLC)1229610890", "identifierTypeId": "fc4e3f2a-887a-46e5-8057-aeeb271a4e56" }, { "value": "(Sirsi) spocn647995927", "identifierTypeId": "7e591197-f335-4afb-bc6d-a6d76ca3bace" } ], "publication": [ { "place": "Cham, Switzerland", "publisher": "Springer", "dateOfPublication": "©2010" } ], "contributors": [ { "name": "Szepesvári, Csaba", "primary": true, "contributorTypeId": "9f0a2cf0-7a9b-45a2-a403-f68d2850d07c", "contributorNameTypeId": "2b94c631-fca9-4892-a730-03ee529ffe2a" } ], "catalogedDate": "2022-07-02", "staffSuppress": false, "instanceTypeId": "6312d172-f0cf-40f6-b27d-9fa8feaf332f", "previouslyHeld": false, "classifications": [ { "classificationNumber": "Q325.6 .S94 2010", "classificationTypeId": "ce176ace-a53e-4b4d-aa89-725ed7b2edac" }, { "classificationNumber": "006.31", "classificationTypeId": "42471af9-7d25-4f3a-bf78-60d29dcf463b" } ], "instanceFormats": [ ], "electronicAccess": [ { "uri": "https://link.springer.com/10.1007/978-3-031-01551-9", "name": "Resource", "linkText": "SpringerLink", "publicNote": "Available to Stanford-affiliated users", "relationshipId": "f5d0068e-6272-458e-8a81-b85e7b9a14aa" } ], "holdingsRecords2": [ ], "modeOfIssuanceId": "9d18a02f-5897-4c31-9106-c9abb5c7ae8b", "publicationRange": [ ], "statisticalCodes": [ ], "alternativeTitles": [ ], "discoverySuppress": false, "instanceFormatIds": [ "f5e8210f-7640-459b-a71f-552567f92369" ], "publicationPeriod": { "start": 2010 }, "statusUpdatedDate": 
"2023-08-21T21:22:26.994+0000", "statisticalCodeIds": [ ], "administrativeNotes": [ ], "physicalDescriptions": [ "1 online resource (xii, 89 pages) : illustrations" ], "publicationFrequency": [ ], "suppressFromDiscovery": false, "natureOfContentTermIds": [ ] }, "holdingSummaries": [ { "poLineId": null, "orderType": null, "orderStatus": null, "poLineNumber": null, "orderSentDate": null, "orderCloseReason": null, "polReceiptStatus": null } ] }