IET Control Theory & Applications
Research Article

Online fault compensation control based on policy iteration algorithm for a class of affine non-linear systems with actuator failures

IET Control Theory Appl., 2016, Vol. 10 Iss. 15, pp. 1816-1823
ISSN 1751-8644
Received on 29th October 2015; revised on 16th April 2016; accepted on 3rd May 2016; E-First on 2nd August 2016
doi: 10.1049/iet-cta.2015.1105
www.ietdl.org
Bo Zhao¹, Derong Liu², Yuanchun Li³
¹State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
²School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
³Department of Control Science and Engineering, Changchun University of Technology, Changchun 130012, People's Republic of China
E-mail: derong@ustb.edu.cn
Abstract: In this study, a novel online fault compensation control scheme based on policy iteration (PI) algorithm is developed
for a class of affine non-linear systems with actuator failures. The control scheme consists of a PI algorithm and a fault
compensator. For fault-free dynamic models, the PI algorithm is developed to solve the Hamilton–Jacobi–Bellman equation by
constructing a critic neural network, and then the approximate optimal control policy can be derived directly. Meanwhile, the
actuator failure is reconstructed adaptively to achieve online fault compensation without the fault detection and isolation
mechanism. The closed-loop system is proved to be asymptotically stable via Lyapunov's direct method. Two numerical
simulation examples are given to demonstrate the effectiveness of the proposed fault compensation control scheme.
1 Introduction
With the fast development of science and technology, industrial
applications are becoming increasingly complex and large scale.
Consequently, the occurrence of failures is inevitable as the
number of components increases. Malfunctions of a component
may not only degrade control performance, but also result in the
loss of system reliability and safety. It should be emphasised that
about 60% of control performance degradation in industrial
systems is caused by actuator and sensor failures and equipment
fouling [1]. To achieve higher reliability and better control
performance, a large amount of research efforts on fault tolerant
control (FTC) systems have been made during the past 30 years to
ensure stability and maintain acceptable performance. Among all kinds
of possible failures, actuator failures are considered as one of the
most critical challenges, mainly for the reason that the control
performance can be deteriorated by unexpected and unknown
actuator actions.
Many FTC approaches have been developed to deal with
actuator failures, such as linear quadratic control, intelligent
control, adaptive control and methods based on a combination of
different strategies. In general, FTC approaches can be classified
into two categories, namely passive approaches and active ones.
The main difference between them lies in whether the fault tolerant
controller depends on the fault detection and identification (FDI)
unit or not. By using passive designs, Zhou and Ren [2] proposed
an architecture that included two parts, where the feedback control
system was solely controlled by the performance controller, while
the model uncertainties and external disturbances were handled by
the robustness of the controller. Xiao et al. [3] derived an adaptive
sliding mode controller by estimating the bound of actuator faults
with an online updating law. Wang et al. [4] investigated robust
fault-tolerant control of active suspension systems with
finite-frequency constraints. The passive designs achieve
insensitivity of systems to certain possible failures through their
robustness, but the resulting controllers are often limited in the
severity of failures they can handle. On the contrary, active FTC,
which possesses stronger fault tolerant capability, achieves stability
and required performance by tuning control strategies under the
decision of an FDI unit. Zhang and Jiang [5], Hwang et al. [6] gave
some excellent reviews on fault reconfiguration methods. Different
methods for handling the reconfiguration problem have been
reported, such as multiple-model approach [7, 8], adaptive control
approach [9, 10], linear quadratic control [11, 12], pseudo-inverse
[13, 14], artificial intelligence [15], model predictive control [16,
17], linear matrix inequality [18, 19], variable structure control [20,
21] and so on. Nazari et al. [22] reconfigured a system with faulty
sensors by using a virtual sensor which is adapted to the FDI unit.
Fault accommodation strategy is another way to achieve the goal of
active FTC. Yang and Wang [23] employed a bank of T-S fuzzy
model-based FDI observers to describe particular faults, such that
one of them can track the current system state, and the
corresponding observer estimation error can converge exponentially
to zero. Yoo [24] investigated a time-delay independent fault
detection and accommodation scheme, where an approximation-
based fault accommodation design is activated to compensate for
multiple time-delay faults after the fault is detected. Based on
the fault compensation technique, Wang and Wen [25] proposed an
adaptive failure compensation control scheme with the non-linear
damping and parameter projection techniques for parametric strict
feedback non-linear systems.
It is well-known that adaptive dynamic programming (ADP) is
a powerful approximation tool to solve Hamilton–Jacobi–Bellman
(HJB) equations in non-linear systems [26]. In recent years, ADP
algorithms were further developed to solve the control problem of
continuous-time and discrete-time systems with time delays [27],
external disturbances [28], and control constraints [29], as well as
for trajectory tracking [30], coordination control [31] and so on.
These algorithms are mainly classified into heuristic dynamic
programming (HDP), dual HDP (DHP), action-dependent HDP
(ADHDP), ADDHP, globalised DHP (GDHP) and ADGDHP.
Iterative methods that can be classified into value iteration (VI)
algorithms and policy iteration (PI) algorithms are used in ADP to
solve the HJB equation indirectly. Al-Tamimi et al. [32] proved
that the iterative performance index function is a non-decreasing
sequence with an upper bound, and it converges to the optimal
performance index function, which satisfies the HJB equation. Liu
et al. [33] investigated a neuro-optimal control scheme for a class
of unknown discrete-time non-linear systems with a discount factor
in the cost function by using the GDHP technique. Zhang et al. [34]
addressed the infinite-time optimal tracking control problem by
using the greedy HDP iteration algorithm. We can conclude from
these studies that VI can remove the requirement of the initial
stabilising control, but it cannot guarantee the stability of the
system. Actually, only the converged optimal control law can be
used to control non-linear systems [35]. In contrast to VI
algorithms, the iterative performance index function of PI
algorithms converges to the optimum non-increasingly, and each of
the iterative controls stabilises the non-linear systems [36, 37].
Abu-Khalaf and Lewis [38] proposed a PI algorithm for
continuous-time non-linear systems with control constraints. Liu
and Wei [35] proposed a discrete-time PI ADP method for solving
the infinite horizon optimal control problem of non-linear systems.
Zhang et al. [31] addressed the optimal coordination control for
multiagent differential games by solving the coupled Hamilton–
Jacobi equations via a PI algorithm.
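The evaluation–improvement cycle described above can be sketched concretely for the continuous-time LQR special case, where policy evaluation reduces to a Lyapunov equation and policy improvement has a closed form (Kleinman's algorithm). The system matrices and the initial stabilising gain below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pi_lqr(A, B, Q, R, K0, iters=20):
    """Kleinman-style policy iteration for continuous-time LQR.
    Policy evaluation solves the Lyapunov equation
        Acl^T P + P Acl + Q + K^T R K = 0
    for the cost matrix of the current policy u = -K x;
    policy improvement then sets K = R^{-1} B^T P."""
    n = A.shape[0]
    K = K0.astype(float)
    for _ in range(iters):
        Acl = A - B @ K                        # closed-loop dynamics under u = -K x
        Qk = Q + K.T @ R @ K                   # running cost under the current policy
        # Vectorised Lyapunov solve: (I (x) Acl^T + Acl^T (x) I) vec(P) = -vec(Qk)
        M = np.kron(np.eye(n), Acl.T) + np.kron(Acl.T, np.eye(n))
        P = np.linalg.solve(M, -Qk.reshape(-1)).reshape(n, n)
        K = np.linalg.solve(R, B.T @ P)        # policy improvement step
    return K, P

# Illustrative unstable second-order plant; K0 stabilises A - B K0.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = pi_lqr(A, B, Q, R, K0=np.array([[3.0, 2.0]]))
```

Consistent with the PI properties cited above, each iterate stabilises the plant and the cost matrices decrease monotonically to the Riccati solution, provided the initial gain is stabilising.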
Several papers considered the fault tolerance problem by using
reinforcement learning and ADP strategies. Wang et al. [39]
developed a robust state feedback reliable control scheme
integrated with an iterative learning. By solving linear matrix
inequalities (LMIs), the developed scheme was explicitly
formulated together with an adjustable robust performance
level for batch process systems with unknown actuator failures.
Feng et al. [40] proposed a reconfigurable fault tolerant deflection
routing algorithm based on reinforcement learning for network on
chip. An optimised routing algorithm-based hierarchical Q-learning
was proposed to reduce the routing table size. He and Shayman
[41] developed a reinforcement-learning based fast algorithm for
proactive network fault management. The proactive diagnosis
information was considered to produce effective monitoring and
control policies for intelligent managers or agents. Zhu and Yuan
[42] presented a novel approach to automate recovery policy
generation with reinforcement learning techniques. It could learn a
new and locally optimal policy that outperformed the original one
based on the recovery history of the original user-defined policy.
Yen and DeLima [43, 44] proposed a supervisor making use of two
quality indices to perform fault detection and isolation based on GDHP.
Although it could reduce the reconfiguration time of the controller,
the strategy was implemented under the condition that a priori
knowledge was stored in a dynamic model bank.
In this paper, an online fault compensation control scheme based
on the PI algorithm is established to obtain the optimal control of
affine non-linear systems with actuator failures. Due to the
occurrence of actuator failures, the PI algorithm may be biased or
fail to achieve the optimal control. In order to reduce the
degradation caused by faults, a redesigned fault compensation
based PI controller is provided. The weight errors of the critic
neural network are proved to be uniformly ultimately bounded
(UUB), and the stability of the closed-loop system with actuator
failures is guaranteed via Lyapunov's approach. Different from
classic ADP algorithms, the action neural network is no longer
required in this algorithm, which reduces the computational burden
effectively. Meanwhile, the proposed FTC strategy consists of two
parts, namely the PI-based optimal control part and the online fault
compensation part. In this sense, it can be conveniently
implemented to handle fault tolerant problems.
The rest of this paper is organised as follows. In Section 2, we
present the problem statement. In Section 3, the PI algorithm for
fault-free systems is presented. A fault compensator is developed
with the adaptive technique to redesign the FTC, and then stability
analysis is given. In Section 4, two examples are provided to
demonstrate the effectiveness of the present scheme. In Section 5,
the conclusion is drawn.
2 Problem statement
Consider the following affine non-linear system with actuator
failures:

$$\dot{x}(t) = f(x(t)) + g(x(t))\bigl(u(x(t)) + u_a(t)\bigr) \qquad (1)$$

where $x \in \mathbb{R}^n$ is the system state vector, $u \in \mathbb{R}^m$ is the control input
vector, $f(\cdot)$ and $g(\cdot)$ are locally Lipschitz and differentiable in
their arguments with $f(0) = 0$, and $u_a(t) \in \mathbb{R}^m$ is an unknown
additive actuator failure. Here, let $x(0) = x_0$ be the initial state.
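To see how an additive actuator failure enters through the input channel in (1), the toy Euler simulation below tracks a scalar state before and after fault onset. The drift, input gain, feedback law and fault profile are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate(T=10.0, n=10000, t_fault=5.0):
    """Euler integration of xdot = f(x) + g(x)(u(x) + u_a(t)) with
    illustrative scalar choices: stable drift, unit input gain,
    stabilising feedback, and a constant additive fault after t_fault."""
    dt = T / n
    f = lambda x: -x                    # open-loop drift (assumed)
    g = lambda x: 1.0                   # input gain (assumed)
    u = lambda x: -2.0 * x              # stabilising feedback (assumed)
    u_a = lambda t: 0.5 if t >= t_fault else 0.0  # additive fault at onset
    x, traj = 1.0, []
    for k in range(n):
        t = k * dt
        x += dt * (f(x) + g(x) * (u(x) + u_a(t)))
        traj.append(x)
    return np.array(traj)

traj = simulate()
```

Before the fault the state decays to the origin; after onset it settles at a shifted equilibrium, which is exactly the steady-state error an uncompensated additive actuator failure produces.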
For the system (1) with $u_a(t) = 0$ (i.e. the system is fault-free),
the performance index function can be defined as

$$V(x_0) = \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau \qquad (2)$$

where $U(x, u) = x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u$ is the utility function, $U(0, 0) = 0$,
and $U(x, u) \ge 0$ for all $x$ and $u$, in which $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$
are positive definite matrices.
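As a quick numerical check of the cost definition (2) in the scalar linear case, the sketch below evaluates the integral by trapezoidal quadrature and compares it with the closed-form value implied by the exponential closed-loop trajectory. All constants are illustrative assumptions.

```python
import numpy as np

# Scalar closed loop: xdot = a*x + b*u with u = -k*x gives
# x(t) = x0 * exp((a - b*k) t), so the cost (2) with U = q x^2 + r u^2
# integrates in closed form to (q + r k^2) x0^2 / (2 (b k - a)).
a, b, q, r, k, x0 = -1.0, 1.0, 1.0, 1.0, 2.0, 1.5

acl = a - b * k                          # closed-loop pole (must be < 0)
t = np.linspace(0.0, 20.0, 200001)       # horizon long enough that e^{2*acl*t} ~ 0
x = x0 * np.exp(acl * t)
u = -k * x
util = q * x**2 + r * u**2               # utility U(x, u)

# Trapezoidal quadrature of the cost against its closed form
J_num = float(np.sum(0.5 * (util[1:] + util[:-1]) * np.diff(t)))
J_cf = (q + r * k**2) * x0**2 / (2.0 * (b * k - a))
```

The two values agree to quadrature accuracy, confirming that the infinite-horizon cost is finite for any admissible (stabilising) gain, as Definition 1 below requires.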
Remark 1: Possible failures that occur on actuators may present
many scenarios, such as partial loss of effectiveness, locked in
place, saturation and free-swing. They affect the efficiency of
actuators, i.e. the execution capability of the actuators will change.
Thus, the term $u(x(t))$ in (1) is changed directly, which affects the
considered system. In this case, actuator failures can be seen as
matched disturbances. However, the two have different physical
meanings. Generally speaking, disturbances are assumed to be
norm-bounded with known bounds and are inevitable in real
applications. On the other hand, actuator failures occur
stochastically, and are assumed to be unknown bounded functions.
Indeed, disturbances merely degrade the precision of the system
performance, whereas actuator faults may severely damage the
system.
To handle the optimal control problem, the designed feedback
control must be admissible. Before the algorithm is presented, the
definition of admissible control is introduced.
Definition 1: For system (1) with $u_a(t) = 0$, a control policy $u(x)$
is said to be admissible if $u(x)$ is continuous on a set $\Omega \subseteq \mathbb{R}^n$,
$u(0) = 0$, $u(x)$ stabilises the system, and $V(x_0)$ in (2) is finite for
all $x_0 \in \Omega$.
For any admissible control policy $u \in \Psi(\Omega)$, where $\Psi(\Omega)$
denotes the set of admissible controls, if the performance index
function

$$V(x_0) = \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau \qquad (3)$$

is continuously differentiable, then the infinitesimal version of (3)
is the Lyapunov equation

$$0 = U(x, u) + (\nabla V(x))^{\mathsf{T}}\bigl(f(x) + g(x)u\bigr) \qquad (4)$$

with $V(0) = 0$, where the term $\nabla V(x)$ denotes the partial derivative
of $V(x)$ with respect to $x$, i.e. $\nabla V(x) = \partial V(x)/\partial x$.

Define the Hamiltonian function of the problem and the optimal
performance index function as

$$H(x, u, \nabla V(x)) = U(x, u) + (\nabla V(x))^{\mathsf{T}}\bigl(f(x) + g(x)u\bigr)$$

and

$$V^*(x_0) = \min_{u \in \Psi(\Omega)} \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau. \qquad (5)$$
Let $V^*(x)$ be the optimal performance index function; then

$$0 = \min_{u \in \Psi(\Omega)} H(x, u, \nabla V^*(x)) \qquad (6)$$

where $\nabla V^*(x) = \partial V^*(x)/\partial x$. If the solution $V^*(x)$ exists and is
continuously differentiable, the optimal control can be expressed as

$$u^*(x) = -\frac{1}{2} R^{-1} g^{\mathsf{T}}(x) \nabla V^*(x). \qquad (7)$$

In general, if the system is fault-free (i.e. $u_a(t) = 0$), the solution of
(6) can be approximated by using the PI technique (see Algorithm
1).
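The expression (7) follows from stationarity of the Hamiltonian with respect to $u$, since $H$ is quadratic and convex in $u$ for positive definite $R$:

```latex
% Setting the gradient of the Hamiltonian with respect to u to zero:
\frac{\partial H}{\partial u}
  = 2 R u + g^{\mathsf{T}}(x)\,\nabla V^*(x) = 0
\quad\Longrightarrow\quad
u^*(x) = -\tfrac{1}{2} R^{-1} g^{\mathsf{T}}(x)\,\nabla V^*(x).
% Substituting u^* back into (6) gives the HJB equation in expanded form:
0 = x^{\mathsf{T}} Q x + (\nabla V^*(x))^{\mathsf{T}} f(x)
    - \tfrac{1}{4}\,(\nabla V^*(x))^{\mathsf{T}} g(x) R^{-1} g^{\mathsf{T}}(x)\,\nabla V^*(x).
```

Each term in the substituted form matches the utility and drift terms of (4) evaluated at the minimising control, which is the equation the PI algorithm solves iteratively.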