Q-Learning Algorithm


Uploaded 2012-04-18 09:00:30 · RAR archive, 174KB


Q学习算法.rar (2 files)

q learning-2.doc 243KB

q learning-1.doc 67KB

In the previous sections of this tutorial, we modeled the environment and the reward system for our agent. This section describes a learning algorithm called Q-learning (which is a simplification of reinforcement learning).

We have modeled the environment reward system as matrix R.

Now we need to put a similar matrix, named Q, in the brain of our agent. It represents the memory of what the agent has learned through many experiences. Each row of matrix Q represents the current state of the agent, and each column represents the action that leads to the next state.

In the beginning, the agent knows nothing, so we initialize Q as a zero matrix. In this example, for simplicity of explanation, we assume the number of states is known (six). In the more general case, you can start with a zero matrix of a single cell. It is a simple task to add more columns and rows to the Q matrix when a new state is found.
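As a minimal sketch of that last remark, the Q matrix can be stored as a list of lists and grown by one row and one column each time a new state is discovered. The helper name `add_state` is an assumption made for this illustration, not something defined in the tutorial.

```python
def add_state(q):
    """Return a new square matrix with one extra zero row and zero column."""
    grown = [row + [0.0] for row in q]      # extend every existing row by one column
    grown.append([0.0] * (len(q) + 1))      # append the row for the new state
    return grown


q = [[0.0]]        # zero matrix of a single cell
q = add_state(q)   # a second state is found
q = add_state(q)   # a third state is found
print(len(q), len(q[0]))
```

Existing entries keep their values, so nothing learned so far is lost when the matrix grows.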

The transition rule of Q-learning is a very simple formula:

$$Q(\text{state}, \text{action}) = R(\text{state}, \text{action}) + \gamma \cdot \max[Q(\text{next state}, \text{all actions})]$$

The formula means that an entry of matrix Q (where the row represents the state and the column represents the action) equals the corresponding entry of matrix R plus the learning parameter $\gamma$ multiplied by the maximum value of Q over all actions in the next state.
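The rule described above can be sketched as a single function. This is an illustration under stated assumptions: the name `q_update`, the argument name `gamma` for the learning parameter, and the convention that an action's column index equals the index of the next state are choices made here, and the tiny two-state matrices exist only to exercise the function.

```python
def q_update(q, r, state, action, gamma):
    """Set Q[state][action] = R[state][action] + gamma * max(Q[next_state])."""
    next_state = action   # assumption: taking action j moves the agent to state j
    q[state][action] = r[state][action] + gamma * max(q[next_state])
    return q[state][action]


# Hypothetical two-state example: moving into state 1 pays 100.
r = [[0, 100],
     [0, 100]]
q = [[0.0, 0.0],
     [0.0, 0.0]]

first = q_update(q, r, state=0, action=1, gamma=0.8)   # reward only, Q is still zero
second = q_update(q, r, state=1, action=0, gamma=0.8)  # no reward, discounted future value
print(first, second)
```

The second call shows how a value learned for one state propagates backwards to a state that leads to it.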

The Q-learning algorithm goes as follows:

1. Set the parameter $\gamma$ and the environment reward matrix R

2. Initialize matrix Q as a zero matrix

3. For each episode:

o Select a random initial state

o Do while the goal state has not been reached

- Select one among all possible actions for the current state

- Using this possible action, consider going to the next state

- Get the maximum Q value of this next state based on all possible actions

- Compute $Q(\text{state}, \text{action}) = R(\text{state}, \text{action}) + \gamma \cdot \max[Q(\text{next state}, \text{all actions})]$

- Set the next state as the current state

End Do

End For
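The steps above can be sketched in Python as follows. Several things here are assumptions rather than part of the tutorial text: the function name `train`, the value 0.8 for the learning parameter, the use of -1 to mark a missing door, and the rows of R for rooms A, C, and E (only the rows for B, D, and F can be read off this section).

```python
import random

GOAL = 5  # state F, assuming states A..F are indexed 0..5

# Hypothetical reward matrix: rows B, D, F follow the text; rows A, C, E
# and the -1 "not possible" marker are assumptions made for this sketch.
R = [
    [-1, -1, -1, -1,  0,  -1],   # A
    [-1, -1, -1,  0, -1, 100],   # B
    [-1, -1, -1,  0, -1,  -1],   # C
    [-1,  0,  0, -1,  0,  -1],   # D
    [ 0, -1, -1,  0, -1, 100],   # E
    [-1,  0, -1, -1,  0, 100],   # F
]


def possible_actions(r, state):
    """Actions with a non-negative reward are the ones the agent may take."""
    return [a for a, reward in enumerate(r[state]) if reward >= 0]


def train(r, gamma, episodes, goal, seed=0):
    rng = random.Random(seed)                   # seeded for repeatable runs
    n = len(r)
    q = [[0.0] * n for _ in range(n)]           # step 2: Q starts as a zero matrix
    for _ in range(episodes):                   # step 3: one pass per episode
        state = rng.randrange(n)                # random initial state
        while state != goal:                    # do while goal not reached
            action = rng.choice(possible_actions(r, state))
            next_state = action                 # an action means "go to that state"
            best_next = max(q[next_state][a] for a in possible_actions(r, next_state))
            q[state][action] = r[state][action] + gamma * best_next
            state = next_state                  # next state becomes current state
    return q


q = train(R, gamma=0.8, episodes=500, goal=GOAL)
for name, row in zip("ABCDEF", q):
    print(name, [round(v, 1) for v in row])
```

Because the loop checks for the goal before acting, an episode that starts in the goal state ends immediately, so the row of Q for the goal state is never updated in this sketch.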

The agent uses the above algorithm to learn from experience, or training. Each episode is equivalent to one training session. In each training session, the agent explores the environment (represented by matrix R) and collects the reward (or none) until it reaches the goal state. The purpose of the training is to enhance the 'brain' of our agent, which is represented by the Q matrix. More training gives a better Q matrix that the agent can use to move in an optimal way. Once the Q matrix has been enhanced, instead of exploring around and going back and forth to the same room, the agent will find the fastest route to the goal state.

The parameter $\gamma$ ranges from 0 to 1 ($0 \le \gamma \le 1$). If $\gamma$ is closer to zero, the agent tends to consider only the immediate reward. If $\gamma$ is closer to one, the agent gives future reward greater weight and is willing to delay the reward.
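To make the effect of this parameter concrete: applying the update rule repeatedly along a path means a reward that lies k transitions ahead reaches the current Q value multiplied by the parameter raised to the power k. The sketch below only illustrates that weighting. The function name `discounted` and the two sample parameter values are assumptions chosen for the comparison.

```python
def discounted(reward, steps, gamma):
    """Weight of a reward that lies `steps` transitions in the future."""
    return reward * gamma ** steps


for gamma in (0.1, 0.9):
    weights = [round(discounted(100, k, gamma), 2) for k in range(4)]
    print("gamma =", gamma, "->", weights)
```

With the small parameter the weight collapses after one step, so only the immediate reward matters. With the large parameter a reward three steps away still carries most of its value.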

To use the Q matrix, the agent traces the sequence of states from the initial state to the goal state. The algorithm is as simple as finding the action that gives the maximum Q for the current state:

Algorithm to utilize the Q matrix

Input: Q matrix, initial state

1. Set current state = initial state

2. From the current state, find the action that produces the maximum Q value

3. Set current state = next state

4. Go to 2 until current state = goal state
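A minimal sketch of these four steps is shown below. The function name `best_path`, the `max_steps` guard against an untrained matrix, and the numbers in `Q` are all assumptions: the values are hypothetical results of training with a learning parameter of 0.8 on an assumed six-room layout, not figures taken from this section.

```python
STATES = "ABCDEF"

# Hypothetical trained Q matrix (rows = current state, columns = action).
Q = [
    [0.0,  0.0,  0.0,  0.0, 80.0,   0.0],   # A
    [0.0,  0.0,  0.0, 64.0,  0.0, 100.0],   # B
    [0.0,  0.0,  0.0, 64.0,  0.0,   0.0],   # C
    [0.0, 80.0, 51.2,  0.0, 80.0,   0.0],   # D
    [64.0, 0.0,  0.0, 64.0,  0.0, 100.0],   # E
    [0.0,  0.0,  0.0,  0.0,  0.0,   0.0],   # F (goal)
]


def best_path(q, start, goal, max_steps=100):
    """Follow the highest-valued action from each state until the goal."""
    path = [start]                                   # step 1
    state = start
    while state != goal and len(path) <= max_steps:  # step 4
        row = q[state]
        state = max(range(len(row)), key=row.__getitem__)  # step 2, then step 3
        path.append(state)
    return path


route = best_path(Q, STATES.index("C"), STATES.index("F"))
print(" -> ".join(STATES[s] for s in route))
```

When two actions tie, as they do in row D here, Python's `max` returns the first one, so the route goes through B rather than E. Both routes have the same length.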

Let us fix a value for the learning parameter $\gamma$ and take room B as the initial state.

First we set matrix Q as a zero matrix:

$$Q = \mathbf{0}_{6 \times 6}$$

We again refer to the instant reward matrix R that represents the environment.

Look at the second row (state B) of matrix R. There are two possible actions for the current state B: go to state D or go to state F. By random selection, we choose to go to F as our action.
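Reading the possible actions off a row of R can be sketched as below. The matrix is partly an assumption: the rows for B, D, and F match what this walkthrough says about those rooms, while the rows for A, C, and E and the use of -1 for "no door" are filled in hypothetically so that the example is complete.

```python
import random

STATES = "ABCDEF"

# Hypothetical reward matrix; only rows B, D, F are confirmed by the text.
R = [
    [-1, -1, -1, -1,  0,  -1],   # A (assumed)
    [-1, -1, -1,  0, -1, 100],   # B
    [-1, -1, -1,  0, -1,  -1],   # C (assumed)
    [-1,  0,  0, -1,  0,  -1],   # D
    [ 0, -1, -1,  0, -1, 100],   # E (assumed)
    [-1,  0, -1, -1,  0, 100],   # F
]


def possible_actions(r, state):
    """Column indices whose reward is non-negative, i.e. reachable rooms."""
    return [a for a, reward in enumerate(r[state]) if reward >= 0]


from_b = possible_actions(R, STATES.index("B"))
print([STATES[a] for a in from_b])          # the rooms reachable from B

chosen = random.choice(from_b)              # random selection among them
print("chosen action: go to", STATES[chosen])
```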

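Now suppose we are in state F. Look at the sixth row of reward matrix R (i.e. state F). It has three possible actions: go to state B, E, or F.

$$Q(B, F) = R(B, F) + \gamma \cdot \max[Q(F, B), Q(F, E), Q(F, F)] = 100 + \gamma \cdot 0 = 100$$

Since matrix Q is still zero, $Q(F, B)$, $Q(F, E)$, and $Q(F, F)$ are all zero. The result of the computation for $Q(B, F)$ is therefore 100, which comes entirely from the instant reward.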

The next state, F, now becomes the current state. Because F is the goal state, we finish one episode. Our agent's brain now contains the updated matrix Q, in which $Q(B, F) = 100$ and every other entry is still zero.

For the next episode, we start with a random initial state. This time, for instance, we have state D as our initial state.

Look at the fourth row of matrix R. It has three possible actions: go to state B, C, or E. By random selection, we choose to go to state B as our action.

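Now we imagine that we are in state B. Look at the second row of reward matrix R (i.e. state B). It has two possible actions: go to state D or state F. Then we compute the Q value:

$$Q(D, B) = R(D, B) + \gamma \cdot \max[Q(B, D), Q(B, F)] = 0 + \gamma \cdot \max(0, 100) = 100\gamma$$

We use the updated matrix Q from the last episode: $Q(B, D) = 0$ and $Q(B, F) = 100$. The result of the computation is $Q(D, B) = 100\gamma$ because the reward $R(D, B)$ is zero. The Q matrix now has two nonzero entries, $Q(B, F) = 100$ and $Q(D, B) = 100\gamma$.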

The next state, B, now becomes the current state. We repeat the inner loop of the Q-learning algorithm because state B is not the goal state.

For the new loop, the current state is state B. Refer again to the state diagram that represents the instant reward matrix R.

There are two possible actions from the current state B: go to state D or go to state F. By lucky draw, the action selected is to go to state F.

Now we think of state F, which has three possible actions: go to state B, E, or F. We compute the Q value using the maximum value over these possible actions:

$$Q(B, F) = R(B, F) + \gamma \cdot \max[Q(F, B), Q(F, E), Q(F, F)] = 100 + \gamma \cdot 0 = 100$$

The entries $Q(F, B)$, $Q(F, E)$, and $Q(F, F)$ of the updated Q matrix are all zero. The result of the computation for $Q(B, F)$ is again 100 because of the instant reward. This result does not change the Q matrix.

Because F is the goal state, we finish this episode. Our agent's brain now contains the updated matrix Q, whose only nonzero entries are $Q(B, F) = 100$ and $Q(D, B) = 100\gamma$.
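The two episodes walked through above (B to F, then D to B to F) can be replayed in a few lines to check the arithmetic. This is a sketch under assumptions: the value 0.8 for the learning parameter is chosen here because the number is not stated in this section, and the rows of R for rooms A, C, and E, along with the -1 marker, are hypothetical.

```python
GAMMA = 0.8          # assumed value for the learning parameter
STATES = "ABCDEF"

# Hypothetical reward matrix; only rows B, D, F are confirmed by the text.
R = [
    [-1, -1, -1, -1,  0,  -1],   # A (assumed)
    [-1, -1, -1,  0, -1, 100],   # B
    [-1, -1, -1,  0, -1,  -1],   # C (assumed)
    [-1,  0,  0, -1,  0,  -1],   # D
    [ 0, -1, -1,  0, -1, 100],   # E (assumed)
    [-1,  0, -1, -1,  0, 100],   # F
]


def update(q, state, action):
    """Apply the Q-learning rule for one move from `state` to `action`."""
    s, a = STATES.index(state), STATES.index(action)
    possible = [j for j, reward in enumerate(R[a]) if reward >= 0]
    q[s][a] = R[s][a] + GAMMA * max(q[a][j] for j in possible)


q = [[0.0] * 6 for _ in range(6)]

# Episode 1 visits B then F; episode 2 visits D, B, then F.
for episode in (["B", "F"], ["D", "B", "F"]):
    for state, action in zip(episode, episode[1:]):
        update(q, state, action)

nonzero = {
    (STATES[i], STATES[j]): value
    for i, row in enumerate(q)
    for j, value in enumerate(row)
    if value
}
print(nonzero)
```

Only the two entries touched by the episodes become nonzero. Every other cell of Q stays at its initial value until later episodes visit it.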
