IQN-FQF Code: A Brief Analysis (IQN resources, CSDN Library)

29 files in total: 21 py, 4 yaml, 1 txt


144 views · uploaded 2024-04-21 15:37:44 · 31KB ZIP


fqf-iqn-qrdqn.pytorch-master.zip (29 files)

- fqf-iqn-qrdqn.pytorch-master/
  - train_qrdqn.py 1KB
  - fqf_iqn_qrdqn/
    - agent/
      - \_\_init\_\_.py 157B
      - fqf_agent.py 11KB
      - qrdqn_agent.py 5KB
      - base_agent.py 8KB
      - iqn_agent.py 6KB
    - utils.py 3KB
    - \_\_init\_\_.py 0B
    - network.py 7KB
    - model/
      - \_\_init\_\_.py 124B
      - fqf.py 3KB
      - iqn.py 2KB
      - base_model.py 321B
      - qrdqn.py 3KB
    - memory/
      - \_\_init\_\_.py 98B
      - per.py 3KB
      - segment_tree.py 2KB
      - base.py 5KB
    - env.py 10KB
  - LICENSE 1KB
  - train_fqf.py 1KB
  - requirements.txt 72B
  - .gitignore 39B
  - train_iqn.py 1KB
  - README.md 3KB
  - config/
    - fqf.yaml 611B
    - iqn.yaml 541B
    - iqn-rainbow.yaml 537B
    - qrdqn.yaml 509B

# FQF, IQN and QR-DQN in PyTorch

This is a PyTorch implementation of Fully parameterized Quantile Function (FQF)[[1]](#references), Implicit Quantile Networks (IQN)[[2]](#references) and Quantile Regression DQN (QR-DQN)[[3]](#references). I tried to make the algorithms easy for readers to understand. Please let me know if you have any questions. Pull requests are also welcome.

**UPDATE**
- 2020.6.9
    - Bump torch up to 1.5.0.
- 2020.5.10
    - Refactor code.
    - Fix Prioritized Experience Replay and Noisy Networks.
    - Test IQN with Rainbow's components.

## Setup

If you are using Anaconda, first create the virtual environment.

```bash
conda create -n fqf python=3.8 -y
conda activate fqf
```

You can install the Python libraries using pip.

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

If you are using a CUDA version other than 10.2, you may need to install PyTorch separately. See the [instructions](https://pytorch.org/get-started/locally/) for more details.

## Examples

You can train an FQF agent using the hyperparameters [here](https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/config/fqf.yaml).

```
python train_fqf.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/fqf.yaml
```

You can also train an IQN or QR-DQN agent in the same way. Note that results are logged against the number of frames, which equals the number of agent steps multiplied by 4 (e.g. 100M frames means 25M agent steps).

## Results

Results of the examples (without n-step rewards, double Q-learning, dueling networks or noisy nets) are shown below; they are comparable to (if not better than) those in the paper. Scores are evaluated after every 1M frames (250k agent steps). Results are averaged over 2 seeds and visualized with min/max. **Note that I reported the "mean" score, not the "best" score as in the paper.** Also, I only trained for a limited number of frames due to limited resources (e.g. 100M frames instead of 200M).

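The frame accounting used in this README (logged frames = agent steps × 4) is easy to get wrong when reading configs and plots side by side. The following is a minimal sketch of that conversion; `FRAMES_PER_STEP`, `steps_to_frames` and `frames_to_steps` are hypothetical names chosen for illustration and are not taken from the repository.

```python
# Hypothetical helpers, not part of the repository. They assume the
# convention stated in this README: 1 agent step = 4 logged frames.
FRAMES_PER_STEP = 4


def steps_to_frames(steps: int) -> int:
    """Return the number of logged frames for a given number of agent steps."""
    return steps * FRAMES_PER_STEP


def frames_to_steps(frames: int) -> int:
    """Return the number of agent steps for a given number of logged frames."""
    if frames % FRAMES_PER_STEP != 0:
        raise ValueError("frames must be a multiple of FRAMES_PER_STEP")
    return frames // FRAMES_PER_STEP


if __name__ == "__main__":
    print(frames_to_steps(100_000_000))  # 25000000 (100M frames -> 25M steps)
    print(frames_to_steps(1_000_000))    # 250000 (evaluation interval)
    print(steps_to_frames(7_500_000))    # 30000000 (7.5M steps -> 30M frames)
```

Under this assumption, a step count in a config file (such as a `num_steps` value) is multiplied by 4 to obtain the frame count shown on the x-axis of the plots.
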
### BreakoutNoFrameskip-v4

I tested FQF, IQN and QR-DQN on `BreakoutNoFrameskip-v4` for 30M frames to check that the algorithms worked.

<img src="https://user-images.githubusercontent.com/37267851/75846342-5a49bb00-5e1f-11ea-911c-ae287d45426f.png" width=700>

### BerzerkNoFrameskip-v4

I also tested FQF and IQN on `BerzerkNoFrameskip-v4` for 100M frames to see the difference between FQF's performance and IQN's, which is quite obvious on this task.

<img src="https://user-images.githubusercontent.com/37267851/75846243-0ccd4e00-5e1f-11ea-9c03-b93e7b505dc8.png" width=700>

### IQN-Rainbow

I also tested IQN with Rainbow's components on `PongNoFrameskip-v4` (just 1 seed). Note that I decreased `num_steps` to 7500000 (30M frames), but kept `start_steps` the same.

<img src="https://user-images.githubusercontent.com/37267851/81501233-340a3500-9312-11ea-8384-4b9c0b660583.png" width=700>

## TODO

- [ ] Implement risk-averse policies for IQN.
- [ ] Test FQF-Rainbow agent.

## References

[[1]](https://arxiv.org/abs/1911.02140) Yang, Derek, et al. "Fully Parameterized Quantile Function for Distributional Reinforcement Learning." Advances in Neural Information Processing Systems. 2019.

[[2]](https://arxiv.org/abs/1806.06923) Dabney, Will, et al. "Implicit Quantile Networks for Distributional Reinforcement Learning." arXiv preprint. 2018.

[[3]](https://arxiv.org/abs/1710.10044) Dabney, Will, et al. "Distributional Reinforcement Learning with Quantile Regression." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
