# tetris-ai
A bot that plays [tetris](https://en.wikipedia.org/wiki/Tetris) using deep reinforcement learning.
## Demo
First 10000 points, after some training.
![Demo - First 10000 points](./demo.gif)
## How does it work
#### Reinforcement Learning
At first, the agent plays random moves, saving each state and the reward it received in a bounded queue (the replay memory). At the end of every episode (game), the agent trains itself (using a neural network) on a random sample of the replay memory. As more and more games are played, the agent becomes smarter, achieving higher and higher scores.
Since a reinforcement-learning agent tends to stick with the first good 'path' it discovers, an exploration variable (which decreases over time) was also introduced, so that the agent sometimes picks a random action instead of the one it considers best. This way, it can discover new 'paths' that lead to higher scores.
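The exploration schedule described above can be sketched as follows (a minimal illustration; the function names and the linear decay shape are assumptions for this sketch, not the repo's exact code):

```python
import random

def linear_epsilon(episode, end_episode, start=1.0, end=0.0):
    """Decay epsilon linearly from `start` to `end` over `end_episode` episodes."""
    if episode >= end_episode:
        return end
    return start + (end - start) * (episode / end_episode)

def epsilon_greedy(best_action, all_actions, epsilon):
    """Explore a random action with probability epsilon, otherwise exploit
    the action the network currently rates best."""
    if random.random() < epsilon:
        return random.choice(all_actions)
    return best_action
```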
#### Training
The training is based on the [Q-learning algorithm](https://en.wikipedia.org/wiki/Q-learning). Instead of training the network only on the current state and the reward obtained, Q-learning (which considers the transition from the current state to the next one) is used to estimate the best possible score of each state **considering future rewards**, i.e., the algorithm is not greedy. This allows the agent to make moves that yield no immediate reward in exchange for a bigger one later on (e.g. waiting to clear multiple lines instead of a single one).
The neural network is then updated with these targets (for a play with reward *reward* that moves from *state* to *next_state*, the latter having an expected value of *Q_next_state*, obtained from the network's own prediction):
- if *state* is not the terminal state (the game's last round): *Q_state* = *reward* + *discount* × *Q_next_state*
- otherwise: *Q_state* = *reward*
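In code, the update target amounts to (a sketch using the names above):

```python
def q_target(reward, q_next_state, discount=0.95, terminal=False):
    """Bellman target for the state that was played."""
    if terminal:  # last round: there is no future reward to propagate
        return reward
    return reward + discount * q_next_state
```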
#### Best Action
Most deep Q-learning implementations output a vector of values for a given state, where each position of the vector maps to an action (e.g. left, right, ...) and the position with the highest value is selected.
The strategy implemented here is slightly different: for each round of Tetris, the resulting states of all possible moves are collected, each state is fed to the neural network to predict its score, and the action whose resulting state gets the highest prediction is played.
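This per-state evaluation can be sketched as (names are illustrative; `predict` stands in for the network's prediction call):

```python
def best_action(possible_moves, predict):
    """possible_moves: mapping from each action to the state (feature vector)
    that playing it would produce.
    predict: callable returning the network's predicted value of a state.
    Returns the action whose resulting state is rated highest."""
    return max(possible_moves, key=lambda action: predict(possible_moves[action]))
```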
#### Game State
Several attributes were considered for training the network. Since there were many, after several tests it was concluded that only the first four below were necessary:
- **Number of lines cleared**
- **Number of holes**
- **Bumpiness** (sum of the difference between heights of adjacent pairs of columns)
- **Total Height**
- Max height
- Min height
- Max bumpiness
- Next piece
- Current piece
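The four chosen features can be computed directly from the board grid. A minimal sketch (assuming the board is a list of rows, top row first, with 0 for empty cells; lines cleared comes from the game logic itself):

```python
def column_heights(board):
    """Height of each column (distance from its topmost filled cell to the floor)."""
    rows, cols = len(board), len(board[0])
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if board[r][c]:
                h = rows - r
                break
        heights.append(h)
    return heights

def holes(board):
    """Count empty cells that have at least one filled cell above them."""
    rows, cols = len(board), len(board[0])
    count = 0
    for c in range(cols):
        seen_block = False
        for r in range(rows):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                count += 1
    return count

def bumpiness(heights):
    """Sum of absolute height differences between adjacent columns."""
    return sum(abs(a - b) for a, b in zip(heights, heights[1:]))
```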
#### Game Score
Each block placed yields 1 point. Clearing lines yields *number_lines_cleared*² × *board_width* points. Losing a game subtracts 1 point.
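As a sketch, the per-move reward described above (the default board width of 10 is an assumption here):

```python
def move_reward(lines_cleared, board_width=10, game_over=False):
    """1 point for placing the piece, plus lines_cleared**2 * board_width
    for any cleared lines, minus 1 if the move lost the game."""
    r = 1 + lines_cleared ** 2 * board_width
    if game_over:
        r -= 1
    return r
```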
## Implementation
All the code is written in `Python`. The neural network uses the `Keras` framework with `Tensorflow` as the backend.
#### Internal Structure
The agent is a deep neural network with a configurable number of layers, neurons per layer, activation functions, loss function, optimizer, etc. The defaults are: 2 hidden layers (32 neurons each); `ReLU` activations for the hidden layers and `Linear` for the output; `Mean Squared Error` as the loss function; `Adam` as the optimizer; `Epsilon` (exploration) decaying from 1 to 0 over the first 75% of episodes; and a `Discount` of 0.95 (the weight given to future rewards relative to immediate ones).
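The default architecture is small enough to write out as a plain NumPy forward pass (a sketch for illustration only; the actual repo builds this with Keras, and the weights here are placeholders):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def predict_value(state, params):
    """Default architecture: 4 input features -> two hidden layers of
    32 ReLU units -> a single linear output (the predicted state value)."""
    h = relu(state @ params["W1"] + params["b1"])
    h = relu(h @ params["W2"] + params["b2"])
    return h @ params["W3"] + params["b3"]
```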
#### Training
For training, the replay queue had a size of 20000, with a random sample of 512 transitions selected for training at the end of each episode, using 1 epoch.
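An end-of-episode training pass with these settings can be sketched as follows (`fit` and `predict` stand in for the Keras model's calls; the transition tuple layout is an assumption):

```python
import random
from collections import deque

def train_step(memory, fit, predict, batch_size=512, discount=0.95):
    """Sample a batch of (state, reward, next_state, done) transitions
    from the replay memory and fit the network on Bellman targets."""
    if len(memory) < batch_size:
        return  # not enough experience collected yet
    batch = random.sample(memory, batch_size)
    states, targets = [], []
    for state, reward, next_state, done in batch:
        target = reward if done else reward + discount * predict(next_state)
        states.append(state)
        targets.append(target)
    fit(states, targets, epochs=1)

memory = deque(maxlen=20000)  # replay memory: oldest transitions are dropped
```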
#### Requirements
- Tensorflow (`tensorflow-gpu==1.14.0`, CPU version can be used too)
- Tensorboard (`tensorboard==1.14.0`)
- Keras (`Keras==2.2.4`)
- Opencv-python (`opencv-python==4.1.0.25`)
- Numpy (`numpy==1.16.4`)
- Pillow (`Pillow==5.4.1`)
- Tqdm (`tqdm==4.31.1`)
## Results
For 2000 episodes, with epsilon reaching its final value at episode 1500, the agent kept playing for too long around episode 1460, so it had to be terminated. Here is a chart of the maximum score every 50 episodes, up to episode 1450:
![results](./results.svg)
Note: Decreasing the `epsilon_end_episode` could make the agent achieve better results in a smaller number of episodes.
## Useful Links
#### Deep Q Learning
- PythonProgramming - https://pythonprogramming.net/q-learning-reinforcement-learning-python-tutorial/
- Keon - https://keon.io/deep-q-learning/
- Towards Data Science - https://towardsdatascience.com/self-learning-ai-agents-part-ii-deep-q-learning-b5ac60c3f47
#### Tetris
- Code My Road - https://codemyroad.wordpress.com/2013/04/14/tetris-ai-the-near-perfect-player/ (uses evolutionary strategies)