RRR.zip_33rrr.comw_RRR31W日期_adaptiveimaging_single


168 views · uploaded 2022-09-20 10:26:58 · 4.17 MB ZIP


RRR.zip (5 files)

2009. An Adaptive AQM Algorithm Based on Neuron Reinforcement Learning.pdf 256KB

2015. Retrieval of Leaf Rigment Content using Wavelet-Based Prospect Inversion from Leaf Reflectance Spectra.pdf 410KB

2016. Combined Prediction Model of Quantum Genetic Grey Prediction ModelandSupport VectorMachine.pdf 387KB

2009. Prerequisites For Integrating Unsupervised And Reinforcement Learning In A Single Network Of Spiking Neurons.pdf 1.66MB

2012. Fuzzy Wavelet Network with Reinforcement Learning Application on Underactuated System.pdf 1.88MB

Fuzzy Wavelet Network with Reinforcement Learning: Application on Underactuated System

Iván S. Razo-Zapata
Instituto Tecnológico de Monterrey
Departamento de Ingeniería Eléctrica y Computación
Eugenio Garza Sada 2501, Col. Tecnológico,
Monterrey, N. L. México
Email: [email protected]

Luis E. Ramos-Velasco
Centro de Investigación en Tecnologías de Información y Sistemas, Universidad Autónoma del Estado de Hidalgo,
Carretera Pachuca-Tulancingo, Km. 4.5,
Mineral de la Reforma, Hidalgo, México.
Tel/Fax: (+52) 7717172000 ext. 6738.
Email: [email protected]

Julio C. Ramos Fernández
Universidad Politécnica de Pachuca,
Carretera Pachuca-Cd. Sahagún, Km. 20, Rancho Luna,
Ex-Hacienda de Sta. Bárbara, Municipio de Zempoala,
Hidalgo, México.
Email: [email protected]

María A. Espejel-Rivera
Universidad la Salle Pachuca,
Campus La Concepción, Av. San Juan Bautista de La Salle No. 1. San Juan Tilcuautla, San Agustín Tlaxiaca, Hgo. C.P. 42160. México.
Email: aesp[email protected]

Julio Waissman-Vilanova
Universidad de Sonora,
Blvd. Encinas esquina con Rosales s/n C.P. 83000,
Hermosillo, Sonora, México.
Email: juliow[email protected]

Abstract: This paper presents a novel reinforcement learning approach for continuous systems. The scheme uses wavelet networks to approximate the continuous state space. The structure of the wavelet network is generated dynamically according to the explored regions and is trained with a modified Q-Learning algorithm. To deal with continuous actions, the wavelet network includes a fuzzy inference system that computes the value of the action among the set of possible actions. This approach is called adaptive wavelet reinforcement learning control (AWRLC). Simulations applying the proposed method to underactuated systems demonstrate the properties of the adaptive wavelet network controller.

I. Introduction

Reinforcement learning (RL) is learning to perform sequential decision tasks without explicit instructions, only by optimizing a criterion of how well the task is performed. The learner does not know which actions to take, but instead must discover which actions yield the most reward by trying them. This method is goal-directed and seems well suited to a class of control problems [1], [2] that consist of reaching a final goal, where the problem is to find a policy that reaches this goal [5].

The basic RL algorithms use a look-up table to represent the value function Q(s, a). Unfortunately, this representation is limited when working with continuous spaces such as those of physical systems. Several approaches can be applied to deal with this problem, such as function approximation techniques. Neural networks offer an interesting perspective due to their ability to approximate nonlinear functions [6].

In recent years, wavelets have attracted much attention in many scientific and engineering research areas. Wavelets possess two features that make them especially valuable for data analysis: they reveal local properties of the data and they allow multiscale analysis. The local property is useful for applications that require an online response to changes, such as process control. Wavelets and neural networks have been combined [7], [8] to form a class of networks, so-called wavelet networks, which are capable of handling moderately high-dimensional problems [6].

Inspired by the multi-resolution analysis theory of the wavelet transform and by adaptive fuzzy wavelet networks, an adaptive wavelet network has been proposed for approximating action-value functions, system identification, and control [9], [15]. An adaptive fuzzy wavelet network controller for nonlinear affine systems is presented in [4] and tested in numerical simulations on the inverted pendulum system.

Reinforcement learning control has been applied to a variety of systems [17], [16], [18], [19], [22]. Most recently, actor-critic reinforcement learning and adaptive control theory have been combined to ensure tracking performance and stability [20], [21].

In this paper, we propose an adaptive wavelet reinforcement learning control (AWRLC) whose design is based on the promising function approximation capability of wavelet networks. The goal of the paper is to propose a control scheme based on RL algorithms and an AWRLC to control underactuated systems. To avoid bang-bang controllers, it is necessary to deal with continuous actions. A fuzzy inference system is applied to compute the value of the action among the set of possible actions. In this work, the Pendubot is used as an example to evaluate the advantages and disadvantages of AWRLC methods for the control of underactuated systems.

WAC 2012 1569534923

The work is organized as follows. Section II presents the reinforcement learning approach. Section III summarizes the background on wavelet networks, while Section IV shows the control scheme implemented in the system. Section V gives the results obtained by numerical simulation. Finally, Section VI presents conclusions and future work.

II. Reinforcement Learning

Q-Learning is a reinforcement learning method in which the learner incrementally builds a Q-function that attempts to estimate the discounted future rewards for taking actions in given states. The system is assumed to be a Markov Decision Process (MDP) [5]. In a common control task, the main objective is to maximize the total return $R_t$ expressed in (1).

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \tag{1}$$
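As a small numerical illustration of the total return in (1), the sketch below evaluates the discounted sum for a finite list of rewards. The function name `discounted_return` and the sample rewards are illustrative assumptions, and truncating the sum to a finite episode is a simplification, not something stated in the paper.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k].

    rewards[k] plays the role of r_{t+k+1} in (1); the list is finite,
    so this is a truncated version of the sum (an assumption).
    """
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With `gamma` below 1, later rewards contribute less to the return, which is the role of the discount described in this section.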

where $R_t$ is the total return at state $s_t$ and $r_t$ is the (numerical) reward value received when the system reaches the state $s_t$. In this way, the output of the Q-function for state $s_t$ and action $a_t$ is denoted by $Q(s_t, a_t)$. When action $a_t$ has been chosen and applied, the system moves to a new state, $s_{t+1}$, and a reinforcement signal, $r_{t+1}$, is received. $Q(s_t, a_t)$ is then updated by [5]:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\delta \tag{2}$$

where

$$\delta = r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)$$

Here $0 \le \alpha \le 1$ is the learning rate and $0 \le \gamma \le 1$ is the discount factor, which reduces the contribution of later rewards to the total return (1).
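The update rule (2) can be sketched for the look-up table case as follows. This is a minimal tabular illustration, not the modified Q-Learning algorithm of the paper: the function name `q_update`, the dictionary-based table, and the example states, actions, and parameter values are all assumptions made for the sketch.

```python
from collections import defaultdict


def q_update(Q, s, a, r_next, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one update Q(s, a) <- Q(s, a) + alpha * delta and return delta.

    Q is a table mapping (state, action) pairs to values.
    """
    # Greedy value of the next state: max over a' of Q(s_next, a').
    best_next = max(Q[(s_next, b)] for b in actions)
    # Temporal-difference error delta from the definition below (2).
    delta = r_next + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta


# Hypothetical two-action problem with string-labelled states.
Q = defaultdict(float)          # unseen entries default to 0.0
actions = [0, 1]
Q[("s1", 1)] = 2.0              # pretend this value was learned earlier
delta = q_update(Q, "s0", 0, r_next=1.0, s_next="s1",
                 actions=actions, alpha=0.5, gamma=0.9)
```

In this example the error is 1.0 + 0.9 * 2.0 - 0.0 = 2.8, so the entry for ("s0", 0) moves from 0.0 to 1.4. A table like this only works for discrete states and actions, which is the limitation the wavelet network is meant to address.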

III. Wavelet Networks

Wavelets are a class of functions with some interesting and special properties, among them localization in scale and time, compact support, and multiresolution analysis. The original objective of wavelet theory is to construct orthogonal bases of $L^2(\mathbb{R})$. These bases consist of translations and dilations of a single function ψ called the “mother wavelet” [10].

The structure of a wavelet network is a type of building block similar to an RBF network [8]. This building block allows unknown functions to be approximated through the concept of multi-resolution approximation. It is formed by shifting and dilating the basis function ψ (each modified version is a “daughter wavelet”) and a “father wavelet” φ. Most commonly, wavelet bases are derived using shift-invariance and dyadic dilation. Accordingly, we use the dyadic series expansion

$$\psi_{j,k}(x) = 2^{j/2}\,\psi(2^{j}x - k), \quad j, k \in \mathbb{Z} \tag{3}$$

in which the scale is an integer power of 2, used for frequency partitioning. The daughter wavelet (3) is obtained from a mother wavelet function ψ by a binary dilation (i.e., dilation by $2^j$) and a dyadic translation (of $k/2^j$).
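To make the dyadic expansion (3) concrete, the following sketch builds daughter wavelets from a mother wavelet and checks numerically that a few of them are orthonormal. The Haar wavelet is used here only because it is simple to write down; the paper does not say which mother wavelet it uses, and the helper names `haar`, `daughter`, and `inner` are assumptions of this sketch.

```python
import numpy as np


def haar(x):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    first_half = np.where((x >= 0.0) & (x < 0.5), 1.0, 0.0)
    second_half = np.where((x >= 0.5) & (x < 1.0), 1.0, 0.0)
    return first_half - second_half


def daughter(x, j, k, mother=haar):
    """Daughter wavelet psi_{j,k}(x) = 2**(j/2) * mother(2**j * x - k)."""
    x = np.asarray(x, dtype=float)
    return 2.0 ** (j / 2.0) * mother(2.0 ** j * x - k)


def inner(f, g, dx):
    """Midpoint-rule approximation of the L2 inner product on the grid."""
    return float(np.sum(f * g) * dx)


# Midpoint grid on [0, 1), which contains the support of all three wavelets.
N = 4096
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx

psi_00 = daughter(x, 0, 0)   # the mother wavelet itself
psi_10 = daughter(x, 1, 0)   # dilated by 2, supported on [0, 0.5)
psi_11 = daughter(x, 1, 1)   # same scale, translated by k / 2**j = 0.5
```

The factor `2 ** (j / 2)` keeps every daughter wavelet at unit norm, so `inner(psi_10, psi_10, dx)` is close to 1 while the inner products between different daughters are close to 0.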

Fig. 1. Structure of wavelet network.

In this way, the combination of wavelets and neural networks can handle high-dimensional problems well and makes the network easy to construct. The basic structure of a wavelet network is illustrated in Fig. 1. The operation of each layer is summarized as follows [11]:

• Using $I_i^j$ and $O_i^j$ to denote the input and output of the ith node in the jth layer, the inputs are introduced into the network in the first layer:

$$O_i^1 = I_i^1 = x_i, \quad i = 1, 2, \ldots, n$$

• The second layer consists of wavelet nodes, each corresponding to a pair (j, k) in (3). The inputs and outputs of the wavelet nodes in this layer can be described as

$$I_i^2 = (O_1^1, \ldots, O_n^1), \qquad O_i^2 = 2^{j/2}\,\psi(2^{j} I_i^2 - k), \quad i = 1, 2, \ldots, m$$

• Finally, the input-output relation in the third layer is expressed as

$$y = O^3 = \sum_{i=1}^{m} \ldots$$
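The three layers described above can be sketched as a single forward pass. Several choices here are assumptions rather than details from the paper: the input is a scalar so that (3) applies directly, the mother wavelet is the Mexican hat function, and the output is taken to be a weighted sum of the wavelet node outputs with hypothetical weights `weights`, since the output equation is cut off in the source.

```python
import numpy as np


def mexican_hat(u):
    """Mexican hat wavelet (1 - u^2) * exp(-u^2 / 2), unnormalized."""
    u = np.asarray(u, dtype=float)
    return (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)


def wavelet_network(x, nodes, weights, mother=mexican_hat):
    """Forward pass of a small three-layer wavelet network (a sketch).

    nodes   : list of (j, k) pairs, one per wavelet node in layer 2
    weights : output weights, one per wavelet node (assumed form)
    """
    # Layer 1: the input is passed through unchanged.
    x = float(x)
    # Layer 2: each node evaluates its daughter wavelet from (3).
    hidden = np.array([2.0 ** (j / 2.0) * mother(2.0 ** j * x - k)
                       for j, k in nodes])
    # Layer 3: assumed weighted sum of the wavelet node outputs.
    return float(np.dot(weights, hidden))


# Hypothetical network with three wavelet nodes.
nodes = [(0, 0), (1, 0), (1, 1)]
weights = np.array([0.5, -1.0, 2.0])
y = wavelet_network(0.0, nodes, weights)
```

At x = 0 the node (1, 1) evaluates the Mexican hat at -1, where it is zero, so only the first two nodes contribute to `y`. This locality is what allows a wavelet network to be extended node by node in the regions of the state space that are actually explored.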
