Temporal-Difference-Learning: temporal-difference learning and basic reinforcement learning demos in MATLAB


Contents of `Temporal-Difference-Learning-master.zip` (21 files):

* `Temporal-Difference-Learning-master/`
  * `ReadMe.md` (3 KB)
  * `src/`
    * `DemoGUI.m` (40 KB)
    * `GridWorldSARSA.m` (5 KB)
    * `DemoGUI.fig` (9 KB)
    * `WindyGridWorldQLearning.m` (5 KB)
    * `FindColBaseCenter.m` (535 B)
    * `DrawActionOnCell.m` (1 KB)
    * `CliffWalkingQLearning.m` (5 KB)
    * `DrawTextOnCell.m` (493 B)
    * `WindyGridWorldSARSA.m` (5 KB)
    * `GenerateRandomWalkSequence.m` (888 B)
    * `PredictionRandomWalk.m` (3 KB)
    * `DrawWindyEpisodeState.m` (798 B)
    * `DrawEpisodeState.m` (614 B)
    * `DrawCliffEpisodeState.m` (721 B)
    * `PredictionRandomWalkAlphaEffect.m` (2 KB)
    * `RLRandomWalk.m` (2 KB)
    * `FindCellCenter.m` (445 B)
    * `GridWorldQLearning.m` (5 KB)
    * `CliffWalkingSARSA.m` (5 KB)
    * `DrawGrid.m` (937 B)

# Temporal-Difference Learning Demos in MATLAB

This package contains MATLAB code that demonstrates selected examples of *temporal-difference learning* methods in *prediction problems* and in *reinforcement learning*.

To begin:

* Run `DemoGUI.m`
* Start with the set of predefined demos: select one and press *Go*
* Modify demos: select one of the predefined demos, and modify the options

Feel free to distribute or use the package, especially for educational purposes. I personally learned a great deal from the cliff-walking example.

The repository for the package is hosted on [GitHub](https://github.com/sinairv/Temporal-Difference-Learning).

## Why temporal difference learning is important

A quotation from *R. S. Sutton* and *A. G. Barto*, from their book *Reinforcement Learning: An Introduction* ([here](http://www.cs.ualberta.ca/~sutton/book/ebook/node60.html)):

> If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning.

Many basic reinforcement learning algorithms such as *Q-Learning* and *SARSA* are in essence *temporal difference learning methods*.

## Demos

* *Prediction random walk*: see how precisely we can predict the probability of visiting nodes.
* *RL random walk*: see how the RL-generated random-walk policy converges to the computed probabilities.
* *Simple grid world (with and without king moves)*: see how the RL-generated policy helps the agent find the goal over time (*king moves* means moving along the four main directions and the diagonals, i.e., the way the king moves in chess).
* *Windy grid world*: the wind pushes the agent away from the destination its actions aim for. See how RL solves this problem.
* *Cliff walking*: the agent should reach its destination while avoiding the cliffs. A truly instructive example, which shows the differences between *on-policy* and *off-policy* learning algorithms.

## References

[1] Sutton, R. S., "Learning to predict by the methods of temporal differences," *Machine Learning*, pp. 9-44, 1988 (available [online](http://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88.pdf))

[2] Sutton, R. S. and Barto, A. G., "Reinforcement learning: An introduction," 1998 (available [online](http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html))

[3] Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement learning: A survey," *Journal of Artificial Intelligence Research*, Vol. 4, pp. 237-285, 1996 (available [online](http://www.jair.org/media/301/live-301-1562-jair.pdf))

## Contact

Copyright (c) 2011 Sina Iravanian - licensed under MIT.

Homepage: [sinairv.github.io](https://sinairv.github.io)

GitHub: [github.com/sinairv](https://github.com/sinairv)

Twitter: [@sinairv](http://www.twitter.com/sinairv)

## Screenshots

Prediction random walk demo:

![Prediction random walk demo](http://sinairv.github.io/temporal-difference-learning/images/PrdRandomWalk.png)

RL random walk demo:

![RL random walk demo](http://sinairv.github.io/temporal-difference-learning/images/RLRandomWalk.png)

Simple grid-world demo:

![Simple grid-world demo](http://sinairv.github.io/temporal-difference-learning/images/GridWorlds.png)
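
## Example: on-policy vs. off-policy updates

The cliff-walking demo is described as showing the difference between *on-policy* and *off-policy* learning. The script below is a minimal sketch of that difference for a single transition. It is not taken from the package's source files. The table `Q`, the step size `alpha`, the discount `gamma`, and the transition values are all made-up illustration values, and the variable names are hypothetical. It only applies the standard textbook SARSA and Q-learning update rules once each to the same starting table.

```matlab
% Hypothetical one-step example; NOT the code in CliffWalkingSARSA.m or
% CliffWalkingQLearning.m. All numbers below are assumed for illustration.
alpha = 0.5;   % learning rate (assumed value)
gamma = 1.0;   % discount factor (assumed value)

% Action values for 2 states x 2 actions (rows: states, columns: actions).
Q = [0.0  0.0;    % state 1
     1.0 -2.0];   % state 2

% One observed transition: in state s take action a, receive reward r, and
% land in sNext, where the behaviour policy (e.g. epsilon-greedy) happens
% to choose the exploratory action aNext.
s = 1; a = 1; r = -1; sNext = 2; aNext = 2;

% SARSA (on-policy): bootstrap from the action that will actually be taken.
sarsaTarget = r + gamma * Q(sNext, aNext);
Qsarsa = Q;
Qsarsa(s, a) = Q(s, a) + alpha * (sarsaTarget - Q(s, a));

% Q-learning (off-policy): bootstrap from the greedy action in sNext,
% regardless of which action the behaviour policy takes next.
qlTarget = r + gamma * max(Q(sNext, :));
Qql = Q;
Qql(s, a) = Q(s, a) + alpha * (qlTarget - Q(s, a));

fprintf('SARSA update:      Q(s,a) = %.2f\n', Qsarsa(s, a));
fprintf('Q-learning update: Q(s,a) = %.2f\n', Qql(s, a));
```

With these assumed numbers, the SARSA update moves `Q(s,a)` to -1.5 because it accounts for the poor exploratory action, while the Q-learning update leaves it at 0 because it assumes the greedy action will follow. This is the usual explanation for why the two methods can prefer different paths near the cliff.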