DeepRIoT: Continuous Integration and Deployment of Robotic-IoT Applications
Meixun Qu∗,†, Jie He∗,†, Zlatan Tucaković†, Ezio Bartocci†, Dejan Ničković‡, Haris Isaković†, Radu Grosu†
†Technische Universität Wien, ‡AIT Austrian Institute of Technology
ABSTRACT
We present DeepRIoT, a continuous integration and continuous deployment (CI/CD) based architecture that accelerates the learning and deployment of Robotic-IoT systems trained with deep reinforcement learning (RL). We adopted a multi-stage approach that agilely trains a multi-objective RL controller in the simulator. We then collected traces from the real robot to optimize its plant model, and used transfer learning to adapt the controller to the updated model. We automated our framework through CI/CD pipelines and, finally, at low cost, succeeded in deploying our controller on a real F1tenth car that is able to reach the goal and avoid collisions with a virtual car through mixed reality.
KEYWORDS
Deep Reinforcement Learning, Sim2Real, DevOps, CI/CD
1 INTRODUCTION
Deep Reinforcement Learning [1] (RL) has been gaining momentum in the control tasks of Robotic-IoT systems [2–4]. Although RL has achieved state-of-the-art results on several simulation benchmarks, many factors still hinder its application to real Robotic-IoT systems, for example, the sample inefficiency of RL algorithms, the limited generalization of deep neural networks, the unavoidable noise in the sensors, and the deviation of the simulation from the real system (the Sim2Real problem [5]).
Unfortunately, there is no systematic approach that takes the aforementioned problems into consideration to guide the training and deployment of RL on real Robotic-IoT systems. In this paper, we fill this gap by proposing DeepRIoT, a practical framework that aims at accelerating the learning and deployment of RL algorithms. To achieve this goal, we leverage DevOps [6] practices that integrate the process of software development (Dev) with the monitoring of the real system during its operation (Ops). In our context, the execution traces collected from the real Robotic-IoT running the RL policy are used to improve the models, and thus the RL policy, in the simulation environment. Using DevOps machinery, we can fully automate the integration of model changes w.r.t. the real system execution, the retraining of the RL policy in the simulation environment, and its continuous deployment on the real Robotic-IoT. These processes are referred to as Continuous Integration [7] (CI) and Continuous Deployment [8] (CD).
∗Both authors contributed equally to this paper.
Figure 1: DeepRIoT architecture and workflow
To accelerate RL in the simulation environment, we further enhance this process by using models of the real Robotic-IoT of different complexity.
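For concreteness, the following self-contained Python sketch (our own toy illustration; every function in it is a hypothetical stand-in, not part of the DeepRIoT implementation) shows how such a CI/CD loop could be orchestrated: train in simulation, deploy, monitor the robustness of the collected traces, and retrain on an updated plant model until the requirements are met.

```python
# Minimal, self-contained sketch of a DeepRIoT-style CI/CD loop.
# All components are toy stand-ins (hypothetical), not the authors' implementation.
import random

def train_policy(plant_gain):
    # Stand-in for Pipeline C: "training" just memorizes the current plant gain.
    return {"gain": plant_gain}

def collect_traces(policy, real_gain, n=20):
    # Stand-in for Ops: noisy executions of the policy on the real robot.
    return [real_gain - policy["gain"] + random.gauss(0, 0.01) for _ in range(n)]

def monitor_robustness(traces, tol=0.05):
    # Stand-in for a runtime monitor: robustness = tolerance minus worst error.
    return tol - max(abs(e) for e in traces)

def update_plant_model(plant_gain, traces, lr=0.5):
    # Stand-in for CI model optimization: nudge the simulated gain toward reality.
    return plant_gain + lr * (sum(traces) / len(traces))

def ci_cd_loop(real_gain=1.3, max_iters=10):
    plant_gain = 1.0                       # initial, inaccurate simulation model
    policy = train_policy(plant_gain)      # initial training in simulation
    for i in range(max_iters):
        traces = collect_traces(policy, real_gain)      # deploy and observe (CD + Ops)
        rob = monitor_robustness(traces)                # check requirements on traces
        print(f"iter {i}: robustness = {rob:.3f}")
        if rob >= 0:
            break                                       # requirements satisfied
        plant_gain = update_plant_model(plant_gain, traces)  # CI: model update
        policy = train_policy(plant_gain)               # retrain on the updated model
    return policy

if __name__ == "__main__":
    ci_cd_loop()
```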
We select a classic use case in motion planning to demonstrate our approach. We perform our experiments using F1tenth [9], an open-source autonomous vehicle platform. Our task is to teach an F1tenth car to reach the goal position from a starting point while avoiding collisions with static obstacles and other vehicles.
DeepRIoT Architecture. We sketch the architecture of DeepRIoT,
depicted in Fig. 1. The architecture consists of Pipelines (A-E).
Pipeline A. Here we specify the requirements Φ for the given robotic tasks. These requirements are given in the form of formal specifications using Signal Temporal Logic (STL) [10]. There are two major advantages of using formalized requirements. First, one can use runtime monitors to measure the degree to which the observed robotic behaviors satisfy or violate the specifications (see Pipelines C-E). Second, these same specifications are used to engineer the reward function during the agent training process [11] (Pipeline C). After selecting a suitable simulator for training the robotic tasks, we begin to construct the training environment in the context of RL, e.g., defining states, actions, and reward functions.
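As an illustration (our own sketch under assumed thresholds, not the paper's implementation), a reach-and-avoid requirement could be written as Φ = F_[0,T](dist_goal ≤ ε) ∧ G_[0,T](dist_obst ≥ d_safe), and its quantitative robustness can be reused directly as a reward signal:

```python
# Illustrative sketch (hypothetical, not the authors' code): computing a reward
# from the robustness of a reach-and-avoid STL requirement
#   Phi = F[0,T](dist_goal <= eps)  AND  G[0,T](dist_obst >= d_safe)
import numpy as np

def robustness(dist_goal, dist_obst, eps=0.3, d_safe=0.5):
    """Quantitative STL semantics over a finite trace (sequences of distances)."""
    reach = np.max(eps - np.asarray(dist_goal))      # F: best margin to the goal
    avoid = np.min(np.asarray(dist_obst) - d_safe)   # G: worst margin to obstacles
    return min(reach, avoid)                         # conjunction = minimum

def step_reward(dist_goal_t, dist_obst_t, eps=0.3, d_safe=0.5):
    """Per-step shaped reward derived from the same margins."""
    return min(eps - dist_goal_t, dist_obst_t - d_safe)

# Example: a 3-step trace approaching the goal while staying clear of obstacles.
print(robustness(dist_goal=[1.0, 0.5, 0.2], dist_obst=[1.5, 1.0, 0.8]))  # positive
```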
Pipeline B. This further constructs observers and filters for the states according to pre-defined feature rules. For example, we add Gaussian noise to the LiDAR model to make it more realistic. When we teach the car to avoid collisions with obstacles, we only consider laser scans whose lengths are within a specified range.
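A minimal sketch of such an observer (our own illustration; the noise level and range bounds are assumptions, not values from the paper):

```python
# Illustrative LiDAR observer/filter sketch (assumed parameters, not the
# authors' implementation): add Gaussian noise, then keep only beams whose
# range lies inside a region of interest.
import numpy as np

def lidar_observer(scan, sigma=0.02, r_min=0.1, r_max=3.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    noisy = scan + rng.normal(0.0, sigma, size=scan.shape)   # sensor noise model
    mask = (noisy >= r_min) & (noisy <= r_max)                # beams of interest
    return np.where(mask, noisy, r_max)                       # clamp out-of-range beams

# Example: 1080 raw beams, all reporting 5 m, reduced to the 3 m region of interest.
scan = np.full(1080, 5.0)
obs = lidar_observer(scan)
```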
Pipeline C. This is responsible for training the RL policy. It takes the inputs processed by the previous two pipelines, and starts by picking a simple kinematics model (KiModel) to simulate the dynamics of the car. Every step of training yields a four-tuple called experience that includes current state, current action, next state and