# rubik
Learning how to solve a Rubik's cube using Reinforcement Learning
## Status
The model is learning something. I tried tweaking its structure but could not get
the loss below 18, which seems quite high. Even so, it's good enough to solve
cubes scrambled with 5 rotations using just a 1-depth greedy search.
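The 1-depth greedy search above can be sketched as follows. This is a minimal sketch, not the repo's actual solver: `apply_move`, `value`, and `is_solved` are hypothetical callables standing in for the real cube and model APIs.

```python
def greedy_solve(state, moves, apply_move, value, is_solved, max_steps=30):
    """Depth-1 greedy search: at each step, look one move ahead and
    take the move whose successor the value function scores highest."""
    for _ in range(max_steps):
        if is_solved(state):
            return True, state
        # Expand all legal moves and pick the best-scoring successor.
        successors = [apply_move(state, m) for m in moves]
        state = max(successors, key=value)
    return is_solved(state), state
```

Note that each step calls `value` once per successor, which is exactly where batching model calls (see next steps) would pay off.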
Next steps:
- batch calls to the model in get_td_value_examples and in the greedy solver.
- implement A*.
- investigate the model's behavior more:
* more metrics than the loss (e.g. average L1 error)
* slice metrics by the label: are we better at cubes closer to or further from
a solved state?
- weight training examples by 1/{# of rotations done to scramble}.
- implement the model that has both a value head and a policy head
- implement MCTS.
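The first item, batching model calls, amounts to stacking the encoded states and running one forward pass instead of N. A minimal sketch, assuming the model accepts an `(N, features)` array; `model` and `encode` are hypothetical stand-ins for the real network and state encoder:

```python
import numpy as np

def batched_values(model, encode, states):
    """Score many cube states with a single model call.

    Instead of looping `model(encode(s))` per state (one forward pass
    each), stack the encodings into one (N, features) batch and run a
    single forward pass.
    """
    batch = np.stack([encode(s) for s in states])  # shape (N, features)
    return np.asarray(model(batch))                # shape (N,)
```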
## References
- Agostinelli, F., McAleer, S., Shmakov, A. et al. Solving the Rubik’s cube with
deep reinforcement learning and search. Nat Mach Intell 1, 356–363 (2019).
https://doi.org/10.1038/s42256-019-0070-z
* DeepCubeA.
* DNN learns value function using TD(0)
* more complex network: 2 fully connected, then 4 residual blocks.
* cube represented as a one-hot vector of size 6 for each sticker (54
stickers total).
* uses A* for search (with a tweak).
- McAleer, Stephen, et al. "Solving the Rubik's Cube with Approximate Policy
Iteration." (2018). https://openreview.net/forum?id=Hyfn2jCcKm
* DeepCube
* DNN with both a value head (trained using TD(0)) and a policy head
* training set sampled by scrambling the solved cube; each example is
weighted by 1/{distance to solved}.
* simple model (sequence of fully connected layers)
* MCTS to search
* trick: use the same network weights for the labels, only update them when
the loss gets below a threshold.
* cube represented as binary 20x24 array
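Two of the DeepCube ideas above, weighting each example by 1/{distance to solved} and taking labels from a frozen copy of the network, can be sketched together. This is an illustrative sketch, not the paper's code: `frozen_value` and `successors_fn` are hypothetical callables, and a Bellman backup with unit move cost is assumed.

```python
import numpy as np

def td_targets_and_weights(scramble_states, scramble_depths, frozen_value,
                           successors_fn, move_cost=1.0):
    """TD(0) labels and per-example loss weights.

    - Labels come from `frozen_value`, a frozen copy of the network
      that is only refreshed once the training loss drops below a
      threshold (the trick noted above).
    - Each example is weighted by 1 / (scramble depth), so states near
      the solved cube dominate the loss.
    """
    targets, weights = [], []
    for state, depth in zip(scramble_states, scramble_depths):
        # One-step Bellman backup: best successor value minus the move cost.
        best = max(frozen_value(s) for s in successors_fn(state))
        targets.append(best - move_cost)
        weights.append(1.0 / depth)
    return np.array(targets), np.array(weights)
```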
### Not read yet
- Brunetto, R. & Trunda, O. Deep heuristic-learning in the Rubik’s cube domain:
an experimental evaluation. Proc. ITAT 1885, 57–64 (2017).
- Johnson, C. G. Solving the Rubik’s cube with learned guidance functions. In
Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI)
2082–2089 (IEEE, 2018).