Intelligent Driving Technology, Class 软英 1702, 20175188, Ma Hongsheng (马洪升)
Reinforcement Learning: Q-Learning
1 Project Introduction
1.1 Background
Reinforcement learning is a machine learning method in which an agent interacts
with an environment and receives feedback in the form of rewards and penalties.
Through a series of decisions and state transitions, the agent tries to reach a
preset goal. A classic example is training a mouse (the agent) to find the shortest
path to a piece of cake in a maze (the environment). The agent pursues its goal by
exploring new actions and by exploiting its past experience. It may fail over and
over again, but after a long period of trial and error it can finally find a
solution to the problem.
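The balance between exploration and exploitation can be sketched in a few lines. The example below uses an epsilon-greedy rule, which is one common choice and is an assumption here rather than something this section prescribes; the function name choose_action, the parameter epsilon, and the sample values are all hypothetical.

```python
import random

def choose_action(action_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: try a uniformly random action.
        return rng.randrange(len(action_values))
    # Exploitation: reuse past experience by taking the best-valued action so far.
    return max(range(len(action_values)), key=lambda a: action_values[a])

# Hypothetical value estimates for four moves (for example up, down, left, right).
values = [0.1, 0.7, 0.3, 0.2]
greedy_choice = choose_action(values, epsilon=0.0)   # never explores
random_choice = choose_action(values, epsilon=1.0)   # always explores
```

With epsilon set to 0 the agent only exploits and always picks index 1 for these values; with epsilon set to 1 it only explores. Values in between trade one off against the other.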
The accumulated reward is maximized when the agent consistently chooses the best
action in each state over the long run. In short, an algorithm with reward
feedback can induce an agent to achieve a goal by continually collecting rewards.
In addition, the agent may have to endure many penalties (negative rewards) on the
way to its goal. For example, the mouse in the maze receives a small penalty for
every legal move, because we want it to take the shortest possible route to the
target cell; without that penalty, wandering around the maze at will would cost it
nothing. The shortest path to the cake can still be long and convoluted, so the
agent (the mouse) may have to endure many penalties before it finally collects the
delayed reward at the goal.
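The effect of the per-move penalty can be checked with simple arithmetic. The sketch below is illustrative only: the penalty of -0.04 per move and the reward of 1.0 for reaching the cake are assumed values, and episode_return is a hypothetical helper, not part of the project code.

```python
def episode_return(num_moves, step_penalty=-0.04, goal_reward=1.0):
    """Total reward of an episode that reaches the cake on move number num_moves."""
    # Every move before the last one only costs the penalty;
    # the last move enters the target cell and earns the delayed reward.
    return step_penalty * (num_moves - 1) + goal_reward

short_route = episode_return(10)   # a direct route
long_route = episode_return(30)    # a route with detours

# Without a penalty, both routes would look equally good to the agent.
no_penalty_short = episode_return(10, step_penalty=0.0)
no_penalty_long = episode_return(30, step_penalty=0.0)
```

Under these assumed values the 10-move route scores higher than the 30-move route, while with a zero penalty the two are indistinguishable, which is why the penalty pushes the agent toward the shortest path.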
1.2 Maze Problem
The maze problem is a staple of data structure and algorithm research. The
well-known Dijkstra shortest-path algorithm is still one of the most practical
methods for solving it. Because the maze problem is so intuitive, it is also well
suited to demonstrating and testing reinforcement learning techniques.
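For comparison with the learning-based approach, a minimal sketch of Dijkstra's algorithm on a grid maze is shown below. The grid encoding (1 for a free cell, 0 for a wall), the unit cost per move, and the function name dijkstra_maze are assumptions made for illustration.

```python
import heapq

def dijkstra_maze(maze, start, goal):
    """Return the fewest moves from start to goal, or None if goal is unreachable.

    maze[r][c] == 1 marks a free cell and 0 marks a wall (assumed encoding).
    """
    rows, cols = len(maze), len(maze[0])
    dist = {start: 0}
    heap = [(0, start)]                      # (distance so far, cell)
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue                         # stale queue entry, skip it
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 1:
                nd = d + 1                   # every legal move costs one step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None

maze = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
print(dijkstra_maze(maze, (0, 0), (2, 3)))   # 5 moves for this grid
```

Unlike a reinforcement learning agent, this search needs the full map of the maze in advance; the agent has to discover the same route from rewards alone.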