Eötvös Loránd University
Faculty of Science
Institute of Mathematics
Neural network-based control of a
non-linear dynamical system
Supervisor: Dr. Ferenc Izsák, Habil. Associate Professor
Author: Károly Csurilla, Postgraduate student
Budapest, 2023
Abstract
This thesis aims to be an entry point to reinforcement learning control. We
apply the DDPG (Deep Deterministic Policy Gradient) algorithm to the cart-pole
system, a classical non-linear dynamical system often used for benchmarking
control algorithms. The resulting Matlab framework contains the end-to-end
system specification, from the derivation of its differential equations (as a
proxy for a "real" system) to the real-time interactive visualisation of its trajectories.
To improve training efficiency, special attention was paid to trajectory generation
and the reward structure.
Contents
1 Introduction 2
2 Reinforcement learning foundations 3
2.1 Markov decision process . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Value functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Deterministic Policy Gradient Theorem . . . . . . . . . . . . . . . . 9
2.4 DDPG algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Cart pole environment 15
3.1 Dynamical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Environment signals . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Reward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4 Agents 26
4.1 Agent architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Training process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.1 Stable agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.2 Unstable agent . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5 Summary 36
Bibliography 37
Chapter 1
Introduction
Control engineering is the discipline concerned with actuation methods that make
a dynamical system converge to a desired state or follow a prescribed trajectory.
A further basic requirement of such a procedure is robustness against
environmental and modelling uncertainties. Since its formalisation in the late
19th century, control theory has turned out to be a melting pot of real and complex
analysis, linear algebra and probability theory.
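As a minimal illustration of this goal, consider the following Matlab sketch, which is not part of the thesis framework: a proportional feedback law drives a scalar single-integrator system towards a reference state, with the gain and the step size chosen purely for demonstration.

% Illustrative sketch (assumed values, not from the thesis framework):
% proportional feedback drives the single-integrator system x' = u to xRef.
xRef = 1;                      % desired state
x    = 0;                      % initial state
Kp   = 2;                      % proportional gain (assumed value)
dt   = 0.01;                   % Euler time step
for k = 1:500
    u = Kp*(xRef - x);         % control action proportional to the error
    x = x + dt*u;              % explicit Euler update of the state
end
fprintf('Final state: %.4f (reference %.1f)\n', x, xRef);

In such a simple linear setting a hand-tuned gain is sufficient; the non-linear cart-pole system of the later chapters is exactly the kind of problem where this manual design becomes difficult.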
With the resurgence of neural networks in the 1990s, it was natural to use them
as substitutes for controllers designed with classical methods. However, embedding
them in a learning scheme was prone to the vanishing gradient problem, similarly
to recurrent neural networks. It took recent breakthroughs in reinforcement
learning to provide a stable training setting, which has proved especially
effective in tackling high-dimensional, non-linear problems where controllers
designed by human intuition are infeasible.
Chapter 2 aims to give a concise but self-contained introduction to reinforcement
learning, covering the basic results necessary for understanding the Deep
Deterministic Policy Gradient (DDPG) algorithm, one of the stepping stones of
continuous control. It is followed by the description of the cart-pole environment
in Chapter 3, where the dynamical model, the trajectories and the reward equations
are discussed. Finally, Chapter 4 collects the training and controller performance
results, with a few concluding remarks in Chapter 5.
Chapter 2
Reinforcement learning foundations
Reinforcement learning can be regarded as a learning framework in which an
agent is conditioned to improve continuously by sequentially interacting with a
stochastic environment [1]. As an area of machine learning, it focuses on exploring
and exploiting (maximising) the reward mechanism of the environment, instead
of mimicking observation-action pairs produced by a supposedly optimal policy
(as is the case in supervised learning).
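The interaction loop described above can be summarised in a few lines of Matlab. The toy one-dimensional environment, the random policy and the reward below are assumptions made purely for illustration and are unrelated to the cart-pole framework of the later chapters.

% Minimal sketch of the agent-environment interaction loop (illustrative only):
% a random policy acts on a toy one-dimensional environment and collects rewards.
rng(0);                                   % reproducibility
numEpisodes   = 5;
maxSteps      = 50;
episodeReturn = zeros(numEpisodes, 1);
for episode = 1:numEpisodes
    state = 0;                            % environment reset
    for t = 1:maxSteps
        action = 2*rand() - 1;            % "policy": random action in [-1, 1]
        state  = state + 0.1*action;      % toy transition dynamics
        reward = -abs(state);             % reward for staying near the origin
        episodeReturn(episode) = episodeReturn(episode) + reward;
        if abs(state) > 1                 % episode ends once the state leaves [-1, 1]
            break;
        end
    end
end
disp(episodeReturn.');                    % return collected by the random policy

A learning agent replaces the random action selection with a policy that is improved from the observed rewards; the actor-critic machinery of the following sections provides one way of doing so.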
Reinforcement learning is such a vast area that it cannot reasonably be covered
within the scope of this work (Figure 2.1). Instead, we focus on the
derivation of the DDPG algorithm, the central method that we apply to
the cart-pole problem. As these reinforcement learning algorithms have numerous
hyperparameters, it is crucial to understand the theory behind them.
Figure 2.1: Reinforcement learning methodology (based on [2]).

DDPG is a model-free, online, off-policy actor-critic reinforcement learning