PyPI official download | pyqlearning-1.1.3.tar.gz resource - CSDN Library


Uploaded 2022-01-29 11:33:49, 31KB, GZ

25 files in total

py: 17

txt: 4

pkg-info: 2

pyqlearning-1.1.3.tar.gz (25 files)

pyqlearning-1.1.3

PKG-INFO 29KB

pyqlearning

annealing_model.py 6KB

annealingmodel

simulated_annealing.py 5KB

__init__.py 0B

costfunctionable

greedy_q_learning_cost.py 2KB

boltzmann_q_learning_cost.py 2KB

__init__.py 0B

simulatedannealing

adaptive_simulated_annealing.py 2KB

__init__.py 0B

cost_functionable.py 380B

beta_dist.py 3KB

qlearning

__init__.py 0B

greedy_q_learning.py 2KB

boltzmann_q_learning.py 3KB

__init__.py 0B

thompson_sampling.py 1KB

q_learning.py 13KB

setup.cfg 38B

setup.py 1KB

README.md 25KB

pyqlearning.egg-info

PKG-INFO 29KB

requires.txt 13B

SOURCES.txt 926B

top_level.txt 12B

dependency_links.txt 1B

# Reinforcement Learning Library: pyqlearning

`pyqlearning` is a Python library for implementing Reinforcement Learning, especially Q-Learning.

## Installation

Install using pip:

```sh
pip install pyqlearning
```

### Source code

The source code is currently hosted on GitHub.

- [accel-brain-code/Reinforcement-Learning](https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning)

### Python package index (PyPI)

Installers for the latest released version are available at the Python package index.

- [pyqlearning : Python Package Index](https://pypi.python.org/pypi/pyqlearning/)

### Dependencies

- numpy: v1.13.3 or higher.
- pandas: v0.22.0 or higher.

## Description

In Reinforcement Learning problem settings, Q-Learning is a kind of Temporal Difference learning (TD Learning), which can be regarded as a hybrid of the Monte Carlo method and the Dynamic Programming method. Like the Monte Carlo method, a TD Learning algorithm can learn from experience without a model of the environment. Like the Dynamic Programming method, it bootstraps, updating each estimate from other learned estimates.

This library distinguishes two variants of Q-Learning: Epsilon Greedy Q-Learning and Boltzmann Q-Learning. These algorithms are functionally equivalent, but their structures should be distinguished conceptually.

The Epsilon Greedy Q-Learning algorithm is a typical off-policy algorithm. In this paradigm, *stochastic* searching and *deterministic* searching coexist through the hyperparameter <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/epsilon.gif" />, which is the probability that the agent searches greedily. Greedy searching is *deterministic* in the sense that the agent's policy follows the selection that maximizes the Q-Value.
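The TD update and the greedy/stochastic mix described above can be illustrated with a minimal sketch. This is not `pyqlearning`'s API: the names `select_action`, `q_update`, `greedy_rate`, `alpha` and `gamma`, and the four-state corridor environment, are assumptions made up for illustration. Following the text, `greedy_rate` is the probability of searching greedily.

```python
import numpy as np


def select_action(q_row, greedy_rate, rng):
    """Pick an action index from one row of a Q-table.

    With probability `greedy_rate` the choice is greedy (deterministic argmax);
    otherwise it is uniformly random (stochastic search).
    """
    if rng.random() < greedy_rate:
        return int(np.argmax(q_row))
    return int(rng.integers(len(q_row)))


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning (TD) update in place.

    The target bootstraps from the best Q-value of the next state, whatever
    action the behaviour policy takes next; this is the off-policy part.
    """
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])


# Hypothetical toy corridor: states 0..3, actions 0 = left, 1 = right,
# reward 1.0 on reaching the terminal state 3.
rng = np.random.default_rng(0)
q_table = np.zeros((4, 2))
for _ in range(200):
    state = 0
    while state != 3:
        action = select_action(q_table[state], greedy_rate=0.7, rng=rng)
        next_state = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if next_state == 3 else 0.0
        q_update(q_table, state, action, reward, next_state)
        state = next_state

# Greedy action learned for each non-terminal state.
print(np.argmax(q_table[:3], axis=1))
```

In this toy setting the learned greedy action for every non-terminal state is "right", even though the agent often moved left while exploring.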
The Boltzmann Q-Learning algorithm is based on the Boltzmann action selection mechanism, where the probability <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/x_i.gif" /> of selecting the action <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.gif" /> is given by

<div><img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/boltzmann_action_selection.gif" /></div>

where the temperature <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_gt_0.gif" /> controls the exploration/exploitation tradeoff. For <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_to_0.gif" /> the agent always acts greedily and chooses the strategy corresponding to the maximum Q-value, which is pure *deterministic* exploitation, whereas for <img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_to_infty.gif" /> the agent’s strategy is completely random, which is pure *stochastic* exploration.

The Q-Learning paradigm has many variable parts and functional extensions. Viewing them from the perspective of *commonality/variability analysis*, in order to practice object-oriented design, this library provides an abstract class that defines the skeleton of a Q-Learning algorithm in an operation and defers some steps of concrete variants, such as Epsilon Greedy Q-Learning and Boltzmann Q-Learning, to client subclasses. The abstract class lets subclasses redefine certain steps of a Q-Learning algorithm without changing the algorithm's structure.

## Documentation

Full documentation is available at [https://code.accel-brain.com/Reinforcement-Learning/](https://code.accel-brain.com/Reinforcement-Learning/). It covers functional reusability, functional scalability, and functional extensibility.
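Returning to the Boltzmann action selection rule from the Description section: the formula survives here only as an image, so the sketch below assumes the standard softmax form, p_i = exp(Q_i / T) / Σ_k exp(Q_k / T). The function name `boltzmann_probabilities` and the sample Q-values are hypothetical, not part of `pyqlearning`.

```python
import numpy as np


def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q-values: p_i = exp(Q_i / T) / sum_k exp(Q_k / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(q_values, dtype=float) / temperature
    scaled -= scaled.max()  # shift for numerical stability; probabilities are unchanged
    weights = np.exp(scaled)
    return weights / weights.sum()


q_values = [1.0, 2.0, 3.0]

# Low temperature: almost all probability mass on the best action (exploitation).
cold = boltzmann_probabilities(q_values, temperature=0.01)

# High temperature: close to a uniform distribution (exploration).
hot = boltzmann_probabilities(q_values, temperature=1000.0)

# Sample one action at an intermediate temperature.
rng = np.random.default_rng(0)
action = int(rng.choice(len(q_values), p=boltzmann_probabilities(q_values, 1.0)))
```

Comparing `cold` and `hot` shows the two limits described in the text: the first is nearly one-hot on the maximum Q-value, the second is nearly uniform.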
## Tutorial: Simple Maze Solving by Q-Learning (Jupyter notebook)

[search_maze_by_q_learning.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/search_maze_by_q_learning.ipynb) is a Jupyter notebook which demonstrates a simple maze solving algorithm based on Epsilon-Greedy Q-Learning or on Q-Learning loosely coupled with a Deep Boltzmann Machine (DBM).

This demonstration focuses on Q-Learning *loosely coupled with a Deep Boltzmann Machine* (DBM). As the API documentation of the [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern) library points out, a DBM is functionally equivalent to a stacked auto-encoder. The main function observed here is dimensionality reduction (or pre-training): in this case, the DBM reduces the dimensionality of the reward value matrix.

Q-Learning loosely coupled with a DBM is a more effective way to solve the maze. Pre-training by the DBM allows the Q-Learning agent to abstract features of the reward value matrix and to observe the map from a bird's-eye view, so the agent can reach the goal in fewer trials.

As shown in the image below, the state-action value function and the parameter settings can be designed to correspond with the optimality of the route.
<div align="center">
<table style="border: none;">
<tr>
<td width="45%" align="center">
<a href="https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/search_maze_by_q_learning.ipynb" target="_top"><img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/maze_map.png" /></a>
<p>Maze map</p>
</td>
<td width="45%" align="center">
<a href="https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/search_maze_by_q_learning.ipynb" target="_top"><img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/feature_point.png" /></a>
<p>Feature Points in the maze map</p>
</td>
</tr>
<tr>
<td width="45%" align="center">
<a href="https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/search_maze_by_q_learning.ipynb" target="_top"><img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/fail_searched.png" /></a>
<p>The result of searching by Epsilon-Greedy Q-Learning</p>
</td>
<td width="45%" align="center">
<a href="https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/search_maze_by_q_learning.ipynb" target="_top"><img src="https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/maze_q_learning_result.png" /></a>
<p>The result of searching by Q-Learning, loosely coupled with Deep Boltzmann Machine.</p>
</td>
</tr>
</table>
</div>

## Tutorial: Complexity of Hyperparameters, or how can hyperparameters be decided?

There are many hyperparameters that we have to set before the actual searching and learning process begins. Each parameter should be decided in relation to Reinforcement Learning theory, and each can cause side effects in model training. Because of this complexity, so-called hyperparameter tuning becomes a burden for data scientists and R&D engineers, both from a theoretical point of view and at the implementation level.
This issue can be considered a combinatorial optimization problem, that is, an optimization problem in which an optimal solution has to be identified from a finite set of solutions. The solutions are normally discrete or can be converted into discrete ones. This is an important topic studied in operations research and in fields such as software engineering, artificial intelligence (AI), and machine learning. For instance, the travelling salesman problem is one of the most popular combinatorial optimization problems.

In this problem setting, this library provides an Annealing Model to search for the optimal combination of hyperparameters. For instance, Simulated Annealing is a probabilistic, single-solution-based search method inspired by the annealing process in metallurgy. Annealing is a physical process referred to as
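The idea of searching hyperparameter combinations with Simulated Annealing can be sketched in plain Python. This is a generic illustration under stated assumptions, not the library's Annealing Model: the function `simulated_annealing`, the geometric cooling schedule, the grids `alphas` and `gammas`, and the stand-in `toy_cost` are all hypothetical. In practice the cost would come from actually running Q-Learning with the candidate hyperparameters.

```python
import math
import random


def simulated_annealing(cost, grids, n_steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Minimise cost(combo), where combo holds one index per hyperparameter grid."""
    rng = random.Random(seed)
    current = [rng.randrange(len(grid)) for grid in grids]
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        # Neighbour: move one randomly chosen hyperparameter to an adjacent grid value.
        candidate = list(current)
        axis = rng.randrange(len(grids))
        shifted = candidate[axis] + rng.choice((-1, 1))
        candidate[axis] = min(max(shifted, 0), len(grids[axis]) - 1)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


# Hypothetical grids for a learning rate and a discount rate.
alphas = [0.01, 0.05, 0.1, 0.5, 1.0]
gammas = [0.5, 0.7, 0.9, 0.99]


def toy_cost(combo):
    """Stand-in for 'train with these hyperparameters and measure the error'."""
    alpha, gamma = alphas[combo[0]], gammas[combo[1]]
    return (alpha - 0.1) ** 2 + (gamma - 0.9) ** 2


best, best_cost = simulated_annealing(toy_cost, [alphas, gammas])
print(alphas[best[0]], gammas[best[1]], best_cost)
```

The search keeps a single current solution, which is what "single-solution-based" refers to. Accepting occasional worse moves at high temperature is what lets it escape local minima before the temperature drops.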
