IET Control Theory & Applications
Research Article

Online fault compensation control based on policy iteration algorithm for a class of affine non-linear systems with actuator failures

IET Control Theory Appl., 2016, Vol. 10 Iss. 15, pp. 1816-1823
ISSN 1751-8644
Received on 29th October 2015; revised on 16th April 2016; accepted on 3rd May 2016; E-First on 2nd August 2016
doi: 10.1049/iet-cta.2015.1105
www.ietdl.org
Bo Zhao¹, Derong Liu², Yuanchun Li³
¹State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
²School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
³Department of Control Science and Engineering, Changchun University of Technology, Changchun 130012, People's Republic of China
E-mail: derong@ustb.edu.cn
Abstract: In this study, a novel online fault compensation control scheme based on policy iteration (PI) algorithm is developed
for a class of affine non-linear systems with actuator failures. The control scheme consists of a PI algorithm and a fault
compensator. For fault-free dynamic models, the PI algorithm is developed to solve the Hamilton–Jacobi–Bellman equation by
constructing a critic neural network, and then the approximate optimal control policy can be derived directly. Meanwhile, the
actuator failure is reconstructed adaptively to achieve online fault compensation without the fault detection and isolation
mechanism. The closed-loop system is proved to be asymptotically stable via Lyapunov's direct method. Two numerical
simulation examples are given to demonstrate the effectiveness of the proposed fault compensation control scheme.
1 Introduction
With the fast development of science and technology, industrial
applications are becoming increasingly complex and large scale.
Consequently, the occurrence of failures is inevitable as the
number of components increases. Malfunctions of a component
may not only degrade control performance, but also result in the
loss of system reliability and safety. It should be emphasised that
about 60% of control performance degradation in industrial
systems is caused by actuator and sensor failures and equipment
fouling [1]. To achieve higher reliability and better control
performance, a large amount of research efforts on fault tolerant
control (FTC) systems have been made during the past 30 years to
ensure stability and maintain acceptable performance. Among all kinds
of possible failures, actuator failures are considered as one of the
most critical challenges, mainly for the reason that the control
performance can be deteriorated by unexpected and unknown
actuator actions.
Many FTC approaches have been developed to deal with
actuator failures, such as linear quadratic control, intelligent
control, adaptive control and methods based on a combination of
different strategies. In general, FTC approaches can be classified
into two categories, namely passive approaches and active ones.
The main difference between them lies in whether the fault tolerant
controller depends on the fault detection and identification (FDI)
unit or not. By using passive designs, Zhou and Ren [2] proposed
an architecture that included two parts, where the feedback control
system was solely controlled by the performance controller, while
the model uncertainties and external disturbances were handled by
the robustness of the controller. Xiao et al. [3] derived an adaptive
sliding mode controller by estimating the bound of actuator faults
with an online updating law. Wang et al. [4] investigated robust
fault-tolerant control of active suspension systems with
finite-frequency constraints. The passive designs achieve
insensitivity of systems to certain possible failures through their
robustness, but the resulting controllers are often limited in the
severity of failures they can handle. On the contrary, active FTC,
which possesses stronger fault tolerant capability, achieves stability
and required performance by tuning control strategies under the
decision of an FDI unit. Zhang and Jiang [5], Hwang et al. [6] gave
some excellent reviews on fault reconfiguration methods. Different
methods for handling the reconfiguration problem have been
reported, such as multiple-model approach [7, 8], adaptive control
approach [9, 10], linear quadratic control [11, 12], pseudo-inverse
[13, 14], artificial intelligence [15], model predictive control [16,
17], linear matrix inequality [18, 19], variable structure control [20,
21] and so on. Nazari et al. [22] reconfigured a system with faulty
sensors by using a virtual sensor which is adapted to the FDI unit.
Fault accommodation strategy is another way to achieve the goal of
active FTC. Yang and Wang [23] employed a bank of T-S fuzzy
model-based FDI observers to describe particular faults, such that
one of them can track the current system state, and the
corresponding observer estimation error can converge exponentially
to zero. Yoo [24] investigated a time-delay independent fault
detection and accommodation scheme, where an approximation-
based fault accommodation design is activated to compensate for
multiple time-delay faults after the fault is detected. Based on
the fault compensation technique, Wang and Wen [25] proposed an
adaptive failure compensation control scheme with the non-linear
damping and parameter projection techniques for parametric strict
feedback non-linear systems.
It is well-known that adaptive dynamic programming (ADP) is
a powerful approximation tool to solve Hamilton–Jacobi–Bellman
(HJB) equations in non-linear systems [26]. In recent years, ADP
algorithms were further developed to solve the control problem of
continuous-time and discrete-time systems with time delays [27],
external disturbances [28], and control constraints [29], as well as
for trajectory tracking [30], coordination control [31] and so on.
These algorithms are mainly classified into heuristic dynamic
programming (HDP), dual HDP (DHP), action-dependent HDP
(ADHDP), ADDHP, globalised DHP (GDHP) and ADGDHP.
Iterative methods that can be classified into value iteration (VI)
algorithms and policy iteration (PI) algorithms are used in ADP to
solve the HJB equation indirectly. Al-Tamimi et al. [32] proved
that the iterative performance index function is a non-decreasing
sequence with an upper bound, and it converges to the optimal
performance index function, which satisfies the HJB equation. Liu
et al. [33] investigated a neuro-optimal control scheme for a class
of unknown discrete-time non-linear systems with a discount factor
in the cost function by using the GDHP technique. Zhang et al. [34]
addressed the infinite-time optimal tracking control problem by
using the greedy HDP iteration algorithm. We can conclude from
these studies that VI can remove the requirement of the initial
stabilising control, but it cannot guarantee the stability of the
system. Actually, only the converged optimal control law can be
used to control non-linear systems [35]. In contrast to VI
algorithms, the iterative performance index function of PI
algorithms converges to the optimum non-increasingly, and each of
the iterative controls stabilises the non-linear systems [36, 37].
Abu-Khalaf and Lewis [38] proposed a PI algorithm for
continuous-time non-linear systems with control constraints. Liu
and Wei [35] proposed a discrete-time PI ADP method for solving
the infinite horizon optimal control problem of non-linear systems.
Zhang et al. [31] addressed the optimal coordination control for
multiagent differential games by solving the coupled Hamilton–
Jacobi equations via a PI algorithm.
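The evaluation–improvement cycle described above can be sketched concretely for the continuous-time LQR special case, where policy evaluation reduces to a Lyapunov equation and policy improvement has a closed form (Kleinman's algorithm). The system matrices and the initial stabilising gain below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pi_lqr(A, B, Q, R, K0, iters=20):
    """Kleinman-style policy iteration for continuous-time LQR.
    Policy evaluation solves the Lyapunov equation
        Acl^T P + P Acl + Q + K^T R K = 0
    for the cost matrix of the current policy u = -K x;
    policy improvement then sets K = R^{-1} B^T P."""
    n = A.shape[0]
    K = K0.astype(float)
    for _ in range(iters):
        Acl = A - B @ K                        # closed-loop dynamics under u = -K x
        Qk = Q + K.T @ R @ K                   # running cost under the current policy
        # Vectorised Lyapunov solve: (I (x) Acl^T + Acl^T (x) I) vec(P) = -vec(Qk)
        M = np.kron(np.eye(n), Acl.T) + np.kron(Acl.T, np.eye(n))
        P = np.linalg.solve(M, -Qk.reshape(-1)).reshape(n, n)
        K = np.linalg.solve(R, B.T @ P)        # policy improvement step
    return K, P

# Illustrative unstable second-order plant; K0 stabilises A - B K0.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = pi_lqr(A, B, Q, R, K0=np.array([[3.0, 2.0]]))
```

Consistent with the PI properties cited above, each iterate stabilises the plant and the cost matrices decrease monotonically to the Riccati solution, provided the initial gain is stabilising.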
Several papers considered the fault tolerance problem by using
reinforcement learning and ADP strategies. Wang et al. [39]
developed a robust state feedback reliable control scheme
integrated with an iterative learning. By solving linear matrix
inequalities (LMIs), the developed scheme was explicitly
formulated together with an adjustable robust performance
level for batch process systems with unknown actuator failures.
Feng et al. [40] proposed a reconfigurable fault tolerant deflection
routing algorithm based on reinforcement learning for network on
chip. An optimised routing algorithm-based hierarchical Q-learning
was proposed to reduce the routing table size. He and Shayman
[41] developed a reinforcement-learning based fast algorithm for
proactive network fault management. The proactive diagnosis
information was considered to produce effective monitoring and
control policies for intelligent managers or agents. Zhu and Yuan
[42] presented a novel approach to automate recovery policy
generation with reinforcement learning techniques. It could learn a
new and locally optimal policy that outperformed the original one
based on the recovery history of the original user-defined policy.
Yen and DeLima [43, 44] proposed a supervisor making use of two
quality indices to perform fault detection and isolation based on GDHP.
Although it could reduce the reconfiguration time of the controller,
the strategy was implemented under the condition that a priori
knowledge was stored in a dynamic model bank.
In this paper, an online fault compensation control scheme based
on the PI algorithm is established to obtain the optimal control of
affine non-linear systems with actuator failures. Due to the
occurrence of actuator failures, the PI algorithm may be biased or
fail to achieve the optimal control. In order to reduce the
degradation caused by faults, a redesigned fault compensation
based PI controller is provided. The weight errors of the critic
neural network are proved to be uniformly ultimately bounded
(UUB), and the stability of the closed-loop system with actuator
failures is guaranteed via Lyapunov's approach. Different from
classic ADP algorithms, the action neural network is no longer
required in this algorithm, which reduces the computational burden
effectively. Meanwhile, the proposed FTC strategy consists of two
parts, namely the PI-based optimal control part and the online fault
compensation part. In this sense, it can be conveniently
implemented to handle fault tolerant problems.
The rest of this paper is organised as follows. In Section 2, we
present the problem statement. In Section 3, the PI algorithm for
fault-free systems is presented. A fault compensator is developed
with the adaptive technique to redesign the FTC, and then stability
analysis is given. In Section 4, two examples are provided to
demonstrate the effectiveness of the present scheme. In Section 5,
the conclusion is drawn.
2 Problem statement
Consider the following affine non-linear system with actuator
failures:

$$\dot{x}(t) = f(x(t)) + g(x(t))\bigl(u(x(t)) + u_a(t)\bigr) \qquad (1)$$

where $x \in \mathbb{R}^n$ is the system state vector, $u \in \mathbb{R}^m$ is the control input
vector, $f(\cdot)$ and $g(\cdot)$ are locally Lipschitz and differentiable in
their arguments with $f(0) = 0$, and $u_a(t) \in \mathbb{R}^m$ is an unknown
additive actuator failure. Here, let $x(0) = x_0$ be the initial state.
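To see how an additive actuator failure enters through the input channel in (1), the toy Euler simulation below tracks a scalar state before and after fault onset. The drift, input gain, feedback law and fault profile are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate(T=10.0, n=10000, t_fault=5.0):
    """Euler integration of xdot = f(x) + g(x)(u(x) + u_a(t)) with
    illustrative scalar choices: stable drift, unit input gain,
    stabilising feedback, and a constant additive fault after t_fault."""
    dt = T / n
    f = lambda x: -x                    # open-loop drift (assumed)
    g = lambda x: 1.0                   # input gain (assumed)
    u = lambda x: -2.0 * x              # stabilising feedback (assumed)
    u_a = lambda t: 0.5 if t >= t_fault else 0.0  # additive fault at onset
    x, traj = 1.0, []
    for k in range(n):
        t = k * dt
        x += dt * (f(x) + g(x) * (u(x) + u_a(t)))
        traj.append(x)
    return np.array(traj)

traj = simulate()
```

Before the fault the state decays to the origin; after onset it settles at a shifted equilibrium, which is exactly the steady-state error an uncompensated additive actuator failure produces.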
For the system (1) with $u_a(t) = 0$ (i.e. the system is fault-free),
the performance index function can be defined as

$$V(x_0) = \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau \qquad (2)$$

where $U(x, u) = x^{\mathsf{T}} Q x + u^{\mathsf{T}} R u$ is the utility function, $U(0, 0) = 0$,
and $U(x, u) \ge 0$ for all $x$ and $u$, in which $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$
are positive definite matrices.
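As a quick numerical check of the cost definition (2) in the scalar linear case, the sketch below evaluates the integral by trapezoidal quadrature and compares it with the closed-form value implied by the exponential closed-loop trajectory. All constants are illustrative assumptions.

```python
import numpy as np

# Scalar closed loop: xdot = a*x + b*u with u = -k*x gives
# x(t) = x0 * exp((a - b*k) t), so the cost (2) with U = q x^2 + r u^2
# integrates in closed form to (q + r k^2) x0^2 / (2 (b k - a)).
a, b, q, r, k, x0 = -1.0, 1.0, 1.0, 1.0, 2.0, 1.5

acl = a - b * k                          # closed-loop pole (must be < 0)
t = np.linspace(0.0, 20.0, 200001)       # horizon long enough that e^{2*acl*t} ~ 0
x = x0 * np.exp(acl * t)
u = -k * x
util = q * x**2 + r * u**2               # utility U(x, u)

# Trapezoidal quadrature of the cost against its closed form
J_num = float(np.sum(0.5 * (util[1:] + util[:-1]) * np.diff(t)))
J_cf = (q + r * k**2) * x0**2 / (2.0 * (b * k - a))
```

The two values agree to quadrature accuracy, confirming that the infinite-horizon cost is finite for any admissible (stabilising) gain, as Definition 1 below requires.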
Remark 1: Possible failures that occur on actuators may present
many scenarios, such as partial loss of effectiveness, locked in
place, saturation and free-swing. They affect the efficiency of
actuators, i.e. the execution capability of the actuators will change.
Thus, the term $u(x(t))$ in (1) is changed directly, which affects the
considered system. In this case, actuator failures can be seen as
matched disturbances. However, the two have different physical
meanings. Generally speaking, disturbances are assumed to be
norm-bounded with known bounds and are inevitable in real
applications. On the other hand, actuator failures occur
stochastically, and are assumed to be unknown bounded functions.
Indeed, disturbances merely degrade the precision of the system
performance, whereas actuator faults may severely damage the
system.
To handle the optimal control problem, the designed feedback
control must be admissible. Before the algorithm is presented, the
definition of admissible control is introduced.
Definition 1: For system (1) with $u_a(t) = 0$, a control policy $u(x)$
is said to be admissible if $u(x)$ is continuous on a set $\Omega \subseteq \mathbb{R}^n$,
$u(0) = 0$, $u(x)$ stabilises the system, and $V(x_0)$ in (2) is finite for
all $x_0 \in \Omega$.
For any admissible control policy $u \in \Psi(\Omega)$, where $\Psi(\Omega)$
denotes the set of admissible controls, if the performance index
function

$$V(x_0) = \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau \qquad (3)$$

is continuously differentiable, then the infinitesimal version of (3)
is the Lyapunov equation

$$0 = U(x, u) + (\nabla V(x))^{\mathsf{T}}\bigl(f(x) + g(x)u\bigr) \qquad (4)$$

with $V(0) = 0$, where the term $\nabla V(x)$ denotes the partial derivative
of $V(x)$ with respect to $x$, i.e. $\nabla V(x) = \partial V(x)/\partial x$.

Define the Hamiltonian function of the problem and the optimal
performance index function as

$$H(x, u, \nabla V(x)) = U(x, u) + (\nabla V(x))^{\mathsf{T}}\bigl(f(x) + g(x)u\bigr)$$

and

$$V^*(x_0) = \min_{u \in \Psi(\Omega)} \int_0^\infty U(x(\tau), u(\tau))\,\mathrm{d}\tau. \qquad (5)$$
Let $V^*(x)$ be the optimal performance index function; then

$$0 = \min_{u \in \Psi(\Omega)} H(x, u, \nabla V^*(x)) \qquad (6)$$

where $\nabla V^*(x) = \partial V^*(x)/\partial x$. If the solution $V^*(x)$ exists and is
continuously differentiable, the optimal control can be expressed as

$$u^*(x) = -\frac{1}{2} R^{-1} g^{\mathsf{T}}(x) \nabla V^*(x). \qquad (7)$$

In general, if the system is fault-free (i.e. $u_a(t) = 0$), the solution of
(6) can be approximated by using the PI technique (see Algorithm
1).
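The expression (7) follows from stationarity of the Hamiltonian with respect to $u$, since $H$ is quadratic and convex in $u$ for positive definite $R$:

```latex
% Setting the gradient of the Hamiltonian with respect to u to zero:
\frac{\partial H}{\partial u}
  = 2 R u + g^{\mathsf{T}}(x)\,\nabla V^*(x) = 0
\quad\Longrightarrow\quad
u^*(x) = -\tfrac{1}{2} R^{-1} g^{\mathsf{T}}(x)\,\nabla V^*(x).
% Substituting u^* back into (6) gives the HJB equation in expanded form:
0 = x^{\mathsf{T}} Q x + (\nabla V^*(x))^{\mathsf{T}} f(x)
    - \tfrac{1}{4}\,(\nabla V^*(x))^{\mathsf{T}} g(x) R^{-1} g^{\mathsf{T}}(x)\,\nabla V^*(x).
```

Each term in the substituted form matches the utility and drift terms of (4) evaluated at the minimising control, which is the equation the PI algorithm solves iteratively.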