Fuzzy Wavelet Network with Reinforcement
Learning: Application on Underactuated System
Iván S. Razo-Zapata
Instituto Tecnológico de Monterrey
Departamento de Ingeniería Eléctrica y Computación
Eugenio Garza Sada 2501, Col. Tecnológico,
Monterrey, N. L. México
Email: [email protected]
Luis E. Ramos-Velasco
Centro de Investigación en Tecnologías de Información y
Sistemas, Universidad Autónoma del Estado de Hidalgo,
Carretera Pachuca-Tulancingo, Km. 4.5,
Mineral de la Reforma, Hidalgo, México.
Tel/Fax:(+52) 7717172000 ext. 6738.
Email: [email protected]
Julio C. Ramos Fernández
Universidad Politécnica de Pachuca,
Carretera Pachuca-Cd. Sahagún, Km. 20, Rancho Luna,
Ex-Hacienda de Sta. Bárbara, Municipio de Zempoala,
Hidalgo, México.
Email: [email protected]
María A. Espejel-Rivera
Universidad la Salle Pachuca,
Campus La Concepción, Av. San Juan Bautista
de La Salle No. 1. San Juan Tilcuautla, San Agustín
Tlaxiaca, Hgo. C.P. 42160. México.
Email: aesp[email protected]
Julio Waissman-Vilanova
Universidad de Sonora,
Blvd. Encinas esquina con Rosales s/n C.P. 83000,
Hermosillo, Sonora, México.
Email: juliow[email protected]
Abstract —This paper presents a novel approach to
reinforcement learning for continuous systems. The
scheme is based on wavelet networks that approximate
the continuous state space. The structure of the
wavelet network is generated dynamically according to
the explored regions and is trained with a modified
Q-learning algorithm. The wavelet network includes a
fuzzy inference system that computes the value of the
set of possible actions, in order to deal with continuous
actions. This approach is called adaptive wavelet
reinforcement learning control (AWRLC). Simulations
applying the proposed method to underactuated
systems are performed to demonstrate the properties
of the adaptive wavelet network controller.
I. Introduction
Reinforcement learning (RL) is learning to perform
sequential decision tasks without explicit instructions, by
optimizing only a criterion that measures how well the task
is performed. The learner is not told which actions to take,
but instead must discover which actions yield the most reward
by trying them. This method is goal-directed and seems
well adapted to a class of control problems [1], [2] in
which a final goal must be reached and the problem is to
find a policy that reaches this goal [5].
The basic RL algorithms use a look-up table
to represent the value function Q(s, a). Unfortunately,
this representation is limited when working
with continuous spaces, such as those of physical systems.
Several approaches, such as function approximation
techniques, can be applied to deal with this problem.
Neural networks offer an interesting perspective due to
their ability to approximate nonlinear functions [6].
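To make the look-up table limitation concrete, the following is a minimal sketch of the standard tabular Q-learning update applied to a continuous scalar state that has been forced into bins. It is not the modified algorithm used in this paper; the helper names (`discretize`, `q_learning_update`), the bin count, the state range, and the learning parameters are all assumptions chosen for illustration.

```python
def discretize(x, low, high, bins):
    """Map a continuous scalar x in [low, high] to a bin index (assumed helper)."""
    ratio = (x - low) / (high - low)
    return min(bins - 1, max(0, int(ratio * bins)))


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One standard tabular Q-learning step on a table indexed as Q[state][action]."""
    td_target = r + gamma * max(Q[s_next])      # greedy bootstrap from next state
    Q[s][a] += alpha * (td_target - Q[s][a])    # move the estimate toward the target
    return Q


# Illustrative sizes only: 10 state bins, 3 discrete actions.
N_BINS, N_ACTIONS = 10, 3
Q = [[0.0] * N_ACTIONS for _ in range(N_BINS)]

s = discretize(0.12, -1.0, 1.0, N_BINS)        # continuous state -> table row
s_next = discretize(0.31, -1.0, 1.0, N_BINS)
q_learning_update(Q, s, a=2, r=1.0, s_next=s_next)
```

Every state inside a bin shares one table row, so the resolution of the value estimate is fixed by the grid, and the table size grows exponentially with the number of state variables. This is the limitation that motivates replacing the table with a function approximator.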
In recent years, wavelets have attracted much attention
in many scientific and engineering research areas. Wavelets
possess two features that make them especially valuable for
data analysis: they reveal local properties of the data and
they allow multiscale analysis. The local property is useful
for applications that require an online response to changes,
such as process control. Wavelets and neural networks
have been combined [7], [8] to form a class of networks,
called wavelet networks, which are capable of handling
moderately high-dimensional problems [6].
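The local property can be illustrated with a small sketch of a one-dimensional wavelet network, written as a weighted sum of translated and dilated copies of a mother wavelet. The choice of the Mexican hat wavelet, the scalar input, and the names `mexican_hat` and `wavelet_net` are assumptions made for this example only; they are not taken from the network defined in this paper.

```python
import math


def mexican_hat(z):
    """Mexican hat mother wavelet (unnormalized): (1 - z^2) * exp(-z^2 / 2)."""
    return (1.0 - z * z) * math.exp(-0.5 * z * z)


def wavelet_net(x, weights, translations, dilations):
    """Output of a 1-D wavelet network: sum_j w_j * psi((x - t_j) / d_j)."""
    return sum(
        w * mexican_hat((x - t) / d)
        for w, t, d in zip(weights, translations, dilations)
    )


# Two hypothetical wavelons, one centered at 0 and one far away at 8.
weights = [1.0, 0.5]
translations = [0.0, 8.0]
dilations = [1.0, 1.0]

y_near = wavelet_net(0.0, weights, translations, dilations)

# Changing the weight of the distant wavelon barely moves the output at x = 0.
y_near_changed = wavelet_net(0.0, [1.0, 5.0], translations, dilations)
```

Because each wavelon responds only near its own translation, adjusting one weight changes the approximation locally. This is why such a network can be grown or trained region by region as new parts of the state space are explored.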
Inspired by the theory of multi-resolution analysis of
the wavelet transform and by adaptive fuzzy wavelet
networks, an adaptive wavelet network has been proposed for
approximating action-value functions, system identification
and control [9], [15]. In [4], an adaptive fuzzy
wavelet network controller for nonlinear affine
systems is presented and tested in numerical simulations
on the inverted pendulum system.
Reinforcement learning control has been applied
to a variety of systems [17], [16], [18], [19], [22].
Most recently, actor-critic reinforcement learning and
adaptive control theory have been combined to ensure
tracking performance and stability [20], [21].
In this paper, we propose an adaptive wavelet reinforcement
learning control (AWRLC) whose design is based
WAC 2012 1569534923