# rubik
Learning how to solve a Rubik's cube using Reinforcement Learning
## Status
The model is learning something. I tried tweaking its structure but could not get
the loss below 18, which seems quite high. Even so, it's good enough to solve
cubes scrambled with 5 rotations using just a 1-depth greedy search.
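The 1-depth greedy search above can be sketched as follows. This is a minimal sketch, not the repo's actual solver: `apply_move`, `value`, and `is_solved` are hypothetical callables standing in for the real cube and model APIs.

```python
def greedy_solve(state, moves, apply_move, value, is_solved, max_steps=30):
    """Depth-1 greedy search: at each step, look one move ahead and
    take the move whose successor the value function scores highest."""
    for _ in range(max_steps):
        if is_solved(state):
            return True, state
        # Expand all legal moves and pick the best-scoring successor.
        successors = [apply_move(state, m) for m in moves]
        state = max(successors, key=value)
    return is_solved(state), state
```

Note that each step calls `value` once per successor, which is exactly where batching model calls (see next steps) would pay off.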
Next steps:
- batch calls to the model in get_td_value_examples and in the greedy solver.
- implement A*.
- investigate the model's behavior more:
* more metrics than the loss (e.g. average L1 error)
* slice metrics by the label: are we better at cubes closer to or further from
a solved state?
- weight training examples by 1/{# of rotations done to scramble}.
- implement the model that has both a value head and a policy head
- implement MCTS.
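The first item, batching model calls, amounts to stacking the encoded states and running one forward pass instead of N. A minimal sketch, assuming the model accepts an `(N, features)` array; `model` and `encode` are hypothetical stand-ins for the real network and state encoder:

```python
import numpy as np

def batched_values(model, encode, states):
    """Score many cube states with a single model call.

    Instead of looping `model(encode(s))` per state (one forward pass
    each), stack the encodings into one (N, features) batch and run a
    single forward pass.
    """
    batch = np.stack([encode(s) for s in states])  # shape (N, features)
    return np.asarray(model(batch))                # shape (N,)
```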
## References
- Agostinelli, F., McAleer, S., Shmakov, A. et al. Solving the Rubik’s cube with
deep reinforcement learning and search. Nat Mach Intell 1, 356–363 (2019).
https://doi.org/10.1038/s42256-019-0070-z
* DeepCubeA.
* DNN learns value function using TD(0)
* more complex network: 2 fully connected, then 4 residual blocks.
* cube represented as a one-hot vector of size 6 for each sticker (54
stickers total).
* uses A* for search (with a tweak).
- McAleer, Stephen, et al. "Solving the Rubik's Cube with Approximate Policy
Iteration." (2018). https://openreview.net/forum?id=Hyfn2jCcKm
* DeepCube
* DNN with both a value head (trained using TD(0)) and a policy head
* training set sampled by scrambling the solved cube; each example is
weighted by 1/{distance to solved}.
* simple model (sequence of fully connected layers)
* MCTS to search
* trick: use the same network weights for the labels, only update them when
the loss gets below a threshold.
* cube represented as binary 20x24 array
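Two of the DeepCube ideas above, weighting each example by 1/{distance to solved} and taking labels from a frozen copy of the network, can be sketched together. This is an illustrative sketch, not the paper's code: `frozen_value` and `successors_fn` are hypothetical callables, and a Bellman backup with unit move cost is assumed.

```python
import numpy as np

def td_targets_and_weights(scramble_states, scramble_depths, frozen_value,
                           successors_fn, move_cost=1.0):
    """TD(0) labels and per-example loss weights.

    - Labels come from `frozen_value`, a frozen copy of the network
      that is only refreshed once the training loss drops below a
      threshold (the trick noted above).
    - Each example is weighted by 1 / (scramble depth), so states near
      the solved cube dominate the loss.
    """
    targets, weights = [], []
    for state, depth in zip(scramble_states, scramble_depths):
        # One-step Bellman backup: best successor value minus the move cost.
        best = max(frozen_value(s) for s in successors_fn(state))
        targets.append(best - move_cost)
        weights.append(1.0 / depth)
    return np.array(targets), np.array(weights)
```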
### Not read yet
- Brunetto, R. & Trunda, O. Deep heuristic-learning in the Rubik’s cube domain:
an experimental evaluation. Proc. ITAT 1885, 57–64 (2017).
- Johnson, C. G. Solving the Rubik’s cube with learned guidance functions. In
Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI)
2082–2089 (IEEE, 2018).