Model-free optimal control design for a class of linear discrete-time systems with multiple delays using adaptive dynamic programming


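This resource covers linear discrete-time systems, multiple time delays, model-free optimal control design, adaptive dynamic programming (ADP), Q-learning, and the value iteration (VI) algorithm.

A linear discrete-time system is one whose dynamics are expressed as an iteration over discrete time steps. Such systems are common in computing and digital signal processing. The state changes only at discrete instants, for example at the end of each sampling period.

A time delay means that the current output of a system depends not only on the current input but also on inputs at earlier instants. Delays occur in many fields, including biology, chemistry, economics, mechanics, electrical engineering, physics and the engineering sciences; the latency of packet transmission in a communication network is one example. In a control system, delays can appear in the state, control and output variables. A multiple-delay system contains several delays of different lengths. For a linear discrete-time system with multiple delays, the control law has to account for the effect of states and controls at several past instants on the current output, which makes controller design harder.

Model-free optimal control design is a control method that does not need an accurate mathematical model of the system. In practice a model may be hard to obtain, or may carry large errors, because of unknown physical properties, external disturbances or parameter variations. Model-free methods instead use measured input/output data to approximate or estimate the system's dynamic behavior, and design the optimal control from that data.

Adaptive dynamic programming (ADP) is an optimal control technique that combines dynamic programming with ideas from machine learning to solve optimal control problems for complex systems, including systems with delays or uncertainty. ADP iteratively updates the control policy so that a performance index (such as a cost function) is minimized. Its core idea is to improve the policy step by step through interaction with the environment.

Q-learning is an ADP method that belongs to reinforcement learning. It learns a Q-function (a Q-table in the finite case) that represents the optimal value function, balancing exploration and exploitation, and derives the optimal policy from it. Q-learning needs no system model; it learns directly from interaction with the environment. Value iteration (VI) is a dynamic programming algorithm for computing the optimal value function: it repeatedly updates the value-function estimate until the estimate converges to the optimal solution.

The paper applies ADP to the optimal control design of a class of linear discrete-time systems with multiple delays. The design does not rely on an accurate system model; it uses observed input/output data. By combining a Q-learning-like method with value iteration, the authors obtain a model-free optimal control that minimizes the given cost functional, and they use several numerical examples to verify the method. The result is a control framework, based on Q-learning and value iteration, for model-free optimal control of linear discrete-time systems with multiple delays.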
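To make the value iteration idea concrete, here is a minimal sketch for the simplest possible case: a scalar, delay-free system x(k+1) = a x(k) + b u(k) with stage cost q x² + r u². This is a model-based illustration only; the paper's scheme works from measured input/output data, and the function names and numbers below are assumptions chosen for the example.

```python
def greedy_gain(a, b, r, p):
    """Feedback gain k that minimizes r*u^2 + p*(a*x + b*u)^2 over u = -k*x."""
    return a * b * p / (r + b * b * p)


def value_iteration(a, b, q, r, tol=1e-12, max_iter=10000):
    """Value iteration for x(k+1) = a*x(k) + b*u(k), cost sum of q*x^2 + r*u^2.

    The value function is kept as V_j(x) = p_j * x^2, so each Bellman backup
    reduces to a scalar Riccati recursion on p_j.
    """
    p = 0.0  # start from V_0 = 0
    for _ in range(max_iter):
        k = greedy_gain(a, b, r, p)
        p_next = q + a * a * p - a * b * p * k  # Bellman backup
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, greedy_gain(a, b, r, p)


if __name__ == "__main__":
    # Hypothetical open-loop unstable plant (|a| > 1), unit weights.
    p_star, k_star = value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
    print("value coefficient:", p_star)
    print("feedback gain:", k_star)
    print("closed-loop pole:", 1.2 - 1.0 * k_star)
```

Starting from a zero value function needs no initial stabilizing policy, which is one reason value iteration is often preferred over policy iteration when the plant is unknown.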


Model-free optimal control design for a class of linear discrete-time systems with multiple delays using adaptive dynamic programming

Jilie Zhang, Huaguang Zhang (a, b), Yanhong Luo, Tao Feng

a School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, P.R. China

b State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang, Liaoning 110819, P.R. China

Article info

Article history: Received 9 July 2013; received in revised form 4 November 2013; accepted 16 December 2013; communicated by D. Liu; available online 24 January 2014.

Keywords: Model-free optimal control; Discrete-time delay system; Optimal control; Adaptive dynamic programming

Abstract

In this paper, a model-free optimal control scheme is proposed for a class of linear discrete-time systems with multiple delays in the state, control and output vectors. The optimal control can be obtained by adaptive dynamic programming (ADP) using only measured input/output data from the system. First, we specify the class of systems that we address. Then, a model-free optimal control is designed to minimize the given cost functional by ADP, which combines a Q-learning-like method with a value iteration (VI) algorithm, using only the measured input/output data. Finally, several numerical examples are given to illustrate the effectiveness of our approach.

1. Introduction

Since systems with time-delay phenomena are ubiquitous in various research fields, such as biology, chemistry, economics, mechanics, electrical engineering, physics and the engineering sciences [1–4], optimal control has been a key topic for time-delay problems over the past several decades [5–7]. In fact, the optimal control of time-delay systems is an infinite-dimensional control problem [8], which is hard to solve. However, because adaptive (approximate) dynamic programming (ADP) is a powerful tool for solving optimal control problems [9–11], ADP-based optimal control has attracted considerable attention from researchers.

In recent years, ADP has been used to design optimal controls for control systems [12–18,20,32–34]. The optimal control problem for continuous-time systems is studied in [12–15,17,20,32,34], while Refs. [16,18,33] design optimal controls for discrete-time systems. However, to the best of our knowledge, ADP-based optimal control results for time-delay systems are rare; only a few relevant results exist, such as [19,21,22]. In [21], an optimal control scheme for nonlinear systems with delays is proposed using a new iterative ADP algorithm. In [19], a new iterative heuristic dynamic programming (HDP) algorithm is proposed to solve the optimal control problem for a class of nonlinear discrete time-delay systems with saturating actuators; local and global optimization search processes are developed within the iterative HDP algorithm. Later, Ref. [22] designs the optimal control for tracking control systems with a novel HDP iteration algorithm that contains state updating, control policy iteration and performance index iteration. However, most of the above results design the optimal control for time-delay systems whose dynamics are known.

Although ADP algorithms for the optimal control of time-delay systems have made some progress, how to design a model-free optimal control for time-delay systems is still an open problem. For the simpler case without delays, Lewis has contributed a model-free optimal control design [23], but little research focuses on model-free optimal control for systems with multiple time delays. We therefore use a control obtained by the method in [23] to drive time-delay systems rather than delay-free ones. Namely, we design the optimal control for the equivalent delay-free system by the method in [23] and then use it to drive the original system. The system must, however, satisfy certain conditions. Although the systems we address are not general, this is progress toward model-free optimal control design for time-delay systems in the ADP field. The other contribution is that we identify a class of systems with delays that can be driven by an optimal control without delays.

In this paper, we not only extend the necessary and sufficient conditions in [24] to linear discrete-time systems with multiple delays by using the bicausal change of coordinates method [25], but also design the model-free optimal control for systems with multiple delays by using the state estimator design approach based on measured input/output data in [26,27].

Neurocomputing 135 (2014) 163–170. Journal homepage: www.elsevier.com/locate/neucom. http://dx.doi.org/10.1016/j.neucom.2013.12.038

Corresponding author. E-mail addresses: jilie0226@163.com (J. Zhang), hgzhang@ieee.org (H. Zhang), neuluo@gmail.com (Y. Luo), sunnyfengtao@163.com (T. Feng).

Several factors motivate our research on model-free optimal control for systems with multiple delays. First, since the system model is usually unknown, model-based optimal control is not available in practical situations, so an optimal control that does not depend on knowledge of the system is very useful in practice. Second, model-free optimal control design based on ADP for time-delay systems is an open problem; to our knowledge there are no relevant results at present, and our work addresses the optimal control problem for systems with multiple delays for the first time. Finally, seeking a model-free optimal control for systems with delays is difficult, because the form of the optimal cost functional for systems with multiple time delays in the state, input and output vectors cannot be predicted as it can for linear systems without delays.

Here, the model-free optimal control for a class of systems with multiple delays is successfully designed by ADP, using the measured input/output data from the systems.

This paper is organized as follows. Section 3 specifies the class of systems that we address. In Section 4, a model-free optimal control for systems with multiple delays is designed by ADP, using the measured input/output data. Finally, several numerical examples are given to illustrate the effectiveness of our approach.

Note: In this paper, we use $Q > 0$ ($Q < 0$, $Q \geq 0$, $Q \leq 0$) to denote that the matrix $Q$ is positive definite (negative definite, positive semidefinite, negative semidefinite).

2. System description and preliminaries

2.1. Problem description

We now concentrate on linear discrete-time systems with multiple delays in the state, input and output vectors. The system model is given by

$$x(k+1) = \sum_{i=0}^{\tau_A} A_i x(k-i) + \sum_{j=0}^{\tau_B} B_j u(k-j), \tag{1a}$$

$$y(k) = \sum_{h=0}^{\tau_C} C_h x(k-h), \tag{1b}$$

where the state $x(k) \in \mathbb{R}^n$, the input $u(k) \in \mathbb{R}^m$ and the measurement output vector $y(k) \in \mathbb{R}^q$; $A_i \in \mathbb{R}^{n \times n}$ $(i = 0, \ldots, \tau_A)$, $B_j \in \mathbb{R}^{n \times m}$ $(j = 0, \ldots, \tau_B)$ and $C_h \in \mathbb{R}^{q \times n}$ $(h = 0, \ldots, \tau_C)$; $\tau_A$, $\tau_B$ and $\tau_C \in \mathbb{N}$.

Assumption 1. The system (1) is controllable and observable.

Problem 1. The control objective is to find a control $u(k)$ that minimizes the following cost functional subject to the system (1):

$$J(x(k)) = \sum_{i=k}^{\infty} \ell(x(i), u(i)), \tag{2}$$

where $\ell(x(i), u(i)) = y(i)^T Q y(i) + u(i)^T R u(i)$ is a utility function, and $Q$ and $R$ are constant weight matrices such that $Q = Q^T > 0$ and $R = R^T > 0$.
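As a rough illustration of the problem setup, the sketch below simulates a scalar instance of a system with state, input and output delays and accumulates a finite-horizon truncation of the quadratic cost. The function name, the coefficient values and the convention that inputs before time 0 are zero are all assumptions made for this example; they are not taken from the paper.

```python
from collections import deque


def simulate_cost(A, B, C, q, r, policy, x_hist, steps):
    """Truncated cost sum_{k=0}^{steps-1} q*y(k)^2 + r*u(k)^2 for a scalar system.

    A, B, C : lists of scalar coefficients A_i, B_j, C_h (list index = delay).
    x_hist  : [x(0), x(-1), ...], at least max(len(A), len(C)) entries.
    policy  : maps the state history (newest first) to the input u(k).
    Inputs before time 0 are assumed to be zero.
    """
    xs = deque(x_hist, maxlen=len(x_hist))      # xs[i] holds x(k - i)
    us = deque([0.0] * len(B), maxlen=len(B))   # us[j] holds u(k - j)
    cost = 0.0
    for _ in range(steps):
        u = policy(list(xs))
        us.appendleft(u)                        # oldest input drops off the right
        y = sum(c * xs[h] for h, c in enumerate(C))
        cost += q * y * y + r * u * u
        x_next = (sum(a * xs[i] for i, a in enumerate(A))
                  + sum(b * us[j] for j, b in enumerate(B)))
        xs.appendleft(x_next)                   # oldest state drops off the right
    return cost


if __name__ == "__main__":
    # Hypothetical coefficients with one-step delays in state, input and output.
    cost = simulate_cost(A=[0.5, 0.2], B=[1.0, 0.5], C=[1.0, 0.3],
                         q=1.0, r=1.0,
                         policy=lambda hist: -0.3 * hist[0],
                         x_hist=[1.0, 0.0], steps=50)
    print("truncated cost:", cost)
```

Comparing the truncated cost of different feedback laws on the same initial history is a simple way to see what the optimal control is meant to minimize.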

2.2. Preliminaries

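We define $\nabla$ as the delay operator, i.e., $\nabla^i x(k) = x(k-i)$, with $i \in \mathbb{N}$. Let $\mathbb{R}[\nabla]$ be the ring of polynomials in $\nabla$ with coefficients in $\mathbb{R}$. Polynomial matrices in $\nabla$ may be written as

$$M[\nabla] = M_0 + M_1 \nabla + \cdots + M_{r_M} \nabla^{r_M},$$

where $M_i$ $(i = 0, \ldots, r_M)$ are constant matrices and $r_M \in \mathbb{N}$. Addition and multiplication are defined as usual, with $M_i = 0$ for $i > r_M$ and $N_i = 0$ for $i > r_N$:

$$M[\nabla] + N[\nabla] = \sum_{i=0}^{\sup[r_M, r_N]} (M_i + N_i) \nabla^i,$$

$$M[\nabla] N[\nabla] = \sum_{i=0}^{r_M} \sum_{j=0}^{r_N} M_i N_j \nabla^{i+j}.$$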
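The operations on polynomial matrices in the delay operator can be sketched in a few lines. In this sketch a polynomial matrix is stored as a list of coefficient matrices, with the list index equal to the power of ∇; this representation and all function names are assumptions made for illustration. The product keeps the coefficient order M_i N_j.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def poly_add(M, N):
    """Coefficient-wise sum; the shorter polynomial is padded with zero matrices."""
    rows, cols = len(M[0]), len(M[0][0])
    out = []
    for i in range(max(len(M), len(N))):
        Mi = M[i] if i < len(M) else zeros(rows, cols)
        Ni = N[i] if i < len(N) else zeros(rows, cols)
        out.append(mat_add(Mi, Ni))
    return out


def poly_mul(M, N):
    """Product: the coefficient of nabla^(i+j) accumulates M_i N_j."""
    rows, cols = len(M[0]), len(N[0][0])
    out = [zeros(rows, cols) for _ in range(len(M) + len(N) - 1)]
    for i, Mi in enumerate(M):
        for j, Nj in enumerate(N):
            out[i + j] = mat_add(out[i + j], mat_mul(Mi, Nj))
    return out


if __name__ == "__main__":
    I2 = [[1, 0], [0, 1]]
    T = [I2, [[0, 1], [0, 0]]]        # T[nabla]     = [[1,  nabla], [0, 1]]
    T_inv = [I2, [[0, -1], [0, 0]]]   # candidate inverse [[1, -nabla], [0, 1]]
    print(poly_mul(T, T_inv))         # identity plus zero higher-order terms
```

The demo multiplies a polynomial matrix by a candidate polynomial inverse; obtaining the identity with vanishing higher-order coefficients is the property used in the definition of a unimodular matrix.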

Definition 1 (Unimodular matrix). A polynomial matrix $A \in \mathbb{R}[\nabla]^{n \times n}$ is said to be unimodular if it has a polynomial inverse on the same ring.

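Definition 2 (Smith invariant factors [24]). Every $m \times n$ matrix polynomial $P(\lambda)$ of rank $r$ is equivalent to the matrix $S(\lambda)$:

$$P(\lambda) = U_1(\lambda) S(\lambda) U_2(\lambda)$$

with

$$S(\lambda) = \begin{bmatrix} \Delta(\lambda) & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Delta(\lambda) = \mathrm{diag}[d_1(\lambda), \ldots, d_r(\lambda)],$$

such that $d_i(\lambda)$ is divisible by $d_{i-1}(\lambda)$ for $i = 2, \ldots, r$, for some unimodular matrices $U_1(\lambda)$ and $U_2(\lambda)$. The matrix polynomial $S(\lambda)$ is called the Smith form of $P(\lambda)$ and the diagonal elements $d_i(\lambda)$ are the invariant factors.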
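As a small worked example of Definition 2 (the matrix is chosen here for illustration and does not come from the paper), the invariant factors can be read off from the determinantal divisors $D_i(\lambda)$, the monic greatest common divisors of all $i \times i$ minors, through $d_i = D_i / D_{i-1}$ with $D_0 = 1$:

```latex
\[
P(\lambda) = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix},
\qquad
D_1(\lambda) = \gcd(\lambda,\, 1,\, 0,\, \lambda) = 1,
\qquad
D_2(\lambda) = \det P(\lambda) = \lambda^2,
\]
\[
d_1(\lambda) = \frac{D_1}{D_0} = 1,
\qquad
d_2(\lambda) = \frac{D_2}{D_1} = \lambda^2,
\qquad
S(\lambda) = \begin{bmatrix} 1 & 0 \\ 0 & \lambda^2 \end{bmatrix}.
\]
```

Here $d_2$ is divisible by $d_1$, as the definition requires, and $P(\lambda)$ is not unimodular because its determinant is not a nonzero constant.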

Definition 3 (Change of coordinates [25]). Consider the time-delay system (1) with state coordinates $x(k)$. Then

$$z(k) = T[\nabla] x(k) \quad \text{with } T[\nabla] \in \mathbb{R}[\nabla]^{n \times n} \tag{3}$$

is a causal change of coordinates if the Smith invariant factors of $T[\nabla]$ have the form $\nabla^{\tau_i}$ for some $\tau_i \in \mathbb{N}$. If $\tau_i = 0$, $\forall i = 1, \ldots, n$, then the change of coordinates is said to be bicausal.
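A minimal illustration of Definition 3 for $n = 2$ (the matrices are chosen here as an example and are not taken from the paper):

```latex
\[
T[\nabla] = \begin{bmatrix} 1 & \nabla \\ 0 & 1 \end{bmatrix},
\qquad
\det T[\nabla] = 1,
\qquad
T^{-1}[\nabla] = \begin{bmatrix} 1 & -\nabla \\ 0 & 1 \end{bmatrix},
\]
\[
\begin{aligned}
z_1(k) &= x_1(k) + x_2(k-1), & z_2(k) &= x_2(k), \\
x_1(k) &= z_1(k) - z_2(k-1), & x_2(k) &= z_2(k).
\end{aligned}
\]
```

Both invariant factors equal $1 = \nabla^0$, so this change of coordinates is bicausal: $x(k)$ is recovered from present and past values of $z$. By contrast, $T[\nabla] = \nabla I_2$ has invariant factors $\nabla, \nabla$ and is causal but not bicausal, since recovering $x(k) = z(k+1)$ needs a future value of $z$.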

Definition 4 (Delay-free equivalence system model). The linear discrete-time system (1) and the delay-free system model

$$z(k+1) = \bar{A} z(k) + \bar{B} u(k), \tag{4a}$$

$$y(k) = \bar{C} z(k) \tag{4b}$$

are equivalent if there exists a unimodular matrix $T[\nabla]$ such that (3) holds.
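The equivalence in Definition 4 can be checked numerically on a small hand-built example. The sketch below assumes a two-state delay-free pair (A_bar, B_bar) and the unimodular matrix T[∇] = [[1, ∇], [0, 1]], and builds the delayed system from A[∇] = T⁻¹[∇] A_bar T[∇] and B[∇] = T⁻¹[∇] B_bar. All matrices, names and numbers are illustrative assumptions, not the paper's examples. The check starts at k = 1, once the stored history has been produced by the dynamics themselves rather than set arbitrarily.

```python
def simulate_delayed(A0, A1, B0, B1, x_prev, x_now, u_prev, inputs):
    """Run x(k+1) = A0 x(k) + A1 x(k-1) + B0 u(k) + B1 u(k-1), two states.

    Returns [x(-1), x(0), x(1), ...], so list index j stores x(j - 1).
    """
    xs = [x_prev, x_now]
    for u in inputs:
        x_next = [
            sum(A0[r][c] * xs[-1][c] + A1[r][c] * xs[-2][c] for c in range(2))
            + B0[r] * u + B1[r] * u_prev
            for r in range(2)
        ]
        xs.append(x_next)
        u_prev = u
    return xs


def to_z(x_now, x_prev):
    """z(k) = T[nabla] x(k) with T = [[1, nabla], [0, 1]]."""
    return [x_now[0] + x_prev[1], x_now[1]]


def max_residual(xs, inputs, A_bar, B_bar):
    """Largest violation of z(k+1) = A_bar z(k) + B_bar u(k) over k >= 1."""
    worst = 0.0
    for k in range(1, len(inputs)):
        z_now = to_z(xs[k + 1], xs[k])        # xs[k + 1] is x(k)
        z_next = to_z(xs[k + 2], xs[k + 1])
        for r in range(2):
            pred = sum(A_bar[r][c] * z_now[c] for c in range(2)) + B_bar[r] * inputs[k]
            worst = max(worst, abs(z_next[r] - pred))
    return worst


# Assumed delay-free model.
A_bar = [[0.5, 0.2], [0.0, 0.3]]
B_bar = [1.0, 0.5]
# Delayed model: A[nabla] = T^-1 A_bar T and B[nabla] = T^-1 B_bar.
A0 = [[0.5, 0.2], [0.0, 0.3]]
A1 = [[0.0, 0.2], [0.0, 0.0]]   # (a11 - a22) in the top-right entry
B0 = [1.0, 0.5]
B1 = [-0.5, 0.0]

inputs = [1.0, -0.5, 0.25, 0.0, 0.75]
xs = simulate_delayed(A0, A1, B0, B1, [0.3, -0.7], [1.0, 0.4], 0.2, inputs)

if __name__ == "__main__":
    print("max residual:", max_residual(xs, inputs, A_bar, B_bar))
```

A residual at rounding-error level indicates that the transformed state follows the delay-free dynamics, which is what allows a controller designed for the delay-free model to drive the delayed system.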

3. Necessary and sufﬁcient conditions

Following [24], we extend the results for continuous-time systems with only state and input delays in [24] to discrete-time systems with multiple delays in the state, input and output vectors. The following lemma states the result.

Lemma 1. The original system (1) is equivalent to the delay-free

system (4), if and only if there exist T

A R

nn

for i ¼ 0; 1; …; τ

, with

r τ

¼ supðτ

; ðn 1Þτ

Þ, such that:

(a) ∑

i ¼ 0

 i

¼ A

for κ ¼ 0; …; τ

þτ

, with A

¼ A

;

(b) ∑

i ¼ 0

 i

¼ B

for κ ¼ 0; 1; …; τ

þτ

, with B

¼ B

; B

¼ 0;

κ 4 0;

i ¼ 0

 i

¼ C

for κ ¼ 0; 1; …; τ

þτ

, with C

¼ C

and

¼ 0; 8κ 4 0;

(d) detð∑

i ¼ 0

∇

ÞA R\f0g.

Proof. If the conditions of Lemma 1 hold, system (1) is equivalent to system (4). This equivalence is proved next.

