Imitation Learning (imitation_learning) [62-page PPT].zip

Archive contents: 1 file, imitation_learning.pdf (70.35 MB), packaged as imitation_learning.zip (67.78 MB). Uploaded 2019-11-14 22:43:18.

New Frontiers in Imitation Learning

Yisong Yue

Behavioral Modeling

12.4. TAXI DRIVER ROUTE PREFERENCE DATA

Figure 12.2: The collected GPS datapoints

12.4.3 Fitting to the Road Network and Segmenting

To address noise in the GPS data, we fit it to the road network using a particle filter (Thrun et al., 2005). A particle filter simulates a large number of vehicles traversing the road network, focusing its attention on the particles that best match the GPS readings. A motion model is employed to simulate the movement of the vehicle, and an observation model is employed to express the relationship between the true location of the vehicle and its GPS reading. We use a motion model based on the empirical distribution of changes in speed and a Laplace distribution for our observation model.
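The particle filter described above can be sketched in a few lines of Python. This is a toy, not the thesis's implementation: it assumes a single road parameterised by distance along it (so the "road network" collapses to one coordinate), GPS readings already projected onto that road, and a hypothetical list `speed_deltas` standing in for the empirical distribution of speed changes. The names `particle_filter` and `laplace_likelihood` and all default parameters are illustrative.

```python
import math
import random


def laplace_likelihood(gps, pos, scale):
    """Laplace density of a GPS reading given a particle's true position."""
    return math.exp(-abs(gps - pos) / scale) / (2.0 * scale)


def particle_filter(gps_readings, speed_deltas, n_particles=500, scale=5.0,
                    dt=1.0, seed=0):
    """Return a filtered position estimate for each GPS reading.

    Toy assumptions (not from the source): one road parameterised by
    distance along it, readings already projected onto that road, and
    `speed_deltas` as an empirical sample of per-step speed changes.
    """
    rng = random.Random(seed)
    # Start particles near the first reading, all at rest.
    particles = [(gps_readings[0] + rng.uniform(-scale, scale), 0.0)
                 for _ in range(n_particles)]
    estimates = []
    for gps in gps_readings:
        # Motion model: change each particle's speed by a sampled empirical
        # delta (speeds are clipped at 0, i.e. no reversing in this toy).
        moved = []
        for pos, speed in particles:
            speed = max(0.0, speed + rng.choice(speed_deltas))
            moved.append((pos + speed * dt, speed))
        # Observation model: weight each particle by the Laplace likelihood.
        weights = [laplace_likelihood(gps, pos, scale) for pos, _ in moved]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to uniform weights.
            weights = [1.0] * len(moved)
            total = float(len(moved))
        weights = [w / total for w in weights]
        estimates.append(sum(w * pos for w, (pos, _) in zip(weights, moved)))
        # Multinomial resampling keeps the particles that best match the GPS.
        particles = rng.choices(moved, weights=weights, k=n_particles)
    return estimates
```

Multinomial resampling via `random.Random.choices` is the simplest scheme; a real map matcher would also track which road segment each particle is on.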

Once fitted to the road network, we segmented our GPS traces into distinct trips. Our segmentation is based on time thresholds: position readings with a small velocity for a period of time are considered to be at the end of one trip and the beginning of a new trip. We note that this problem is particularly difficult for taxi driver data, because these drivers may often stop only long enough to let out a passenger, and this can be difficult to distinguish from stopping at a long stoplight. To address some of the potential noise, we discard trips that are too short, too noisy, or too cyclic.
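A minimal sketch of time-threshold segmentation, assuming each GPS reading comes with a timestamp (seconds) and a speed (m/s). The function name and the thresholds `speed_eps`, `min_stop`, and `min_points` are hypothetical choices for illustration. Of the three discard rules in the text, only the "too short" filter is sketched, since the source does not say how "too noisy" or "too cyclic" are measured.

```python
def segment_trips(times, speeds, speed_eps=0.5, min_stop=60.0, min_points=5):
    """Split a time-ordered trace into trips at long low-speed stops.

    Returns a list of trips, each a list of reading indices. A run of
    readings with speed below `speed_eps` whose first and last timestamps
    are at least `min_stop` seconds apart ends one trip and starts the next;
    shorter runs (e.g. a stoplight) stay inside the trip. Trips with fewer
    than `min_points` readings are discarded as too short.
    """
    trips = []
    current = []   # indices of the trip being built
    stopped = []   # indices of the ongoing low-speed run

    def close_trip():
        if len(current) >= min_points:
            trips.append(list(current))
        current.clear()

    for i, v in enumerate(speeds):
        if v < speed_eps:
            stopped.append(i)
            continue
        if stopped:
            duration = times[stopped[-1]] - times[stopped[0]]
            if duration >= min_stop:
                close_trip()             # long stop: trip boundary
            else:
                current.extend(stopped)  # short stop: keep it in the trip
            stopped.clear()
        current.append(i)
    close_trip()  # any trailing low-speed run is treated as the final stop
    return trips
```

Raising `min_stop` merges more stops into a single trip (fewer false splits at stoplights) at the cost of missing quick passenger drop-offs, which is exactly the ambiguity the paragraph describes for taxi data.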

Warm Up: Supervised Learning

• Find a function from input space X to output space Y such that the prediction error is low.

Microsoft announced today that they acquired Apple for the amount equal to the gross national product of Switzerland. Microsoft officials stated that they first wanted to buy Switzerland, but eventually were turned off by the mountains and the snowy winters…

GATACAACCTATCCCCGTATATATATTCTA
TGGGTATAGTATTAAATCAATACAACCTAT
CCCCGTATATATATTCTATGGGTATAGTAT
TAAATCAATACAACCTATCCCCGTATATAT
ATTCTATGGGTATAGTATTAAATCAGATAC
AACCTATCCCCGTATATATATTCTATGGGT
ATAGTATTAAATCACATTTA

-1

7.3
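The supervised-learning goal on this slide can be made concrete with the simplest case, assuming X and Y are both the real numbers, the hypothesis class is lines h(x) = w·x + b, and "prediction error" means mean squared error. Under those assumptions the error-minimising line has a closed form (ordinary least squares). The function names are illustrative.

```python
def fit_linear(xs, ys):
    """Fit h(x) = w*x + b by ordinary least squares (minimum mean squared error)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need at least two distinct x values")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def mse(h, xs, ys):
    """Mean squared prediction error of hypothesis h on the data."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

For example, `fit_linear([0, 1, 2, 3], [1, 3, 5, 7])` recovers h(x) = 2x + 1, so it predicts 21 at x = 10 with zero training error.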

Imitation Learning

• Input:

– Sequence of contexts/states:

• Predict:

– Sequence of actions

• Learn Using:

– Sequences of demonstrated actions

Example: Basketball Player Trajectories

• 𝑠 = location of players & ball

• 𝑎 = next location of player

• Training set: 𝐷 = {(𝑠̃, 𝑎̃)}

– 𝑠̃ = sequence of 𝑠

– 𝑎̃ = sequence of 𝑎

• Goal: learn ℎ(𝑠) → 𝑎
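One hedged way to read this slide in code: flatten the demonstrated sequences into (𝑠, 𝑎) pairs and fit ℎ(𝑠) → 𝑎 with any supervised learner, an approach usually called behavioral cloning. The sketch assumes a state is a single player's (x, y) location (the slide's state also includes the other players and the ball) and the action is that player's next location, and it uses a nearest-neighbour lookup as a stand-in learner. The slides do not prescribe a model, and all names here are hypothetical.

```python
import math


def make_dataset(trajectories):
    """Flatten demonstrated trajectories into (s_t, a_t) pairs.

    Toy assumption: a state is one player's (x, y) location and the action
    is the next location, so a_t = s_{t+1}.
    """
    pairs = []
    for traj in trajectories:
        for s, a in zip(traj[:-1], traj[1:]):
            pairs.append((s, a))
    return pairs


def nearest_neighbor_policy(pairs):
    """Return h(s) -> a that copies the action taken at the closest demonstrated state."""
    def h(s):
        _, a = min(pairs, key=lambda p: math.dist(p[0], s))
        return a
    return h


def rollout(h, s0, steps):
    """Generate a trajectory by feeding each predicted location back in as the next state."""
    traj = [s0]
    for _ in range(steps):
        traj.append(h(traj[-1]))
    return traj
```

Because `rollout` feeds each prediction back in as the next state, the generated sequence can reach states never seen in the demonstrations; there the nearest-neighbour lookup simply copies the closest demonstrated action (with the single demo (0, 0) → (1, 0) → (2, 0) → (3, 0), a 4-step rollout from (0, 0) stalls at (3, 0)).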

Generating Long-term Trajectories Using Deep Hierarchical Networks

Stephan Zheng (Caltech, stzheng@caltech.edu)
Yisong Yue (Caltech, yyue@caltech.edu)
Patrick Lucey (STATS, plucey@stats.com)

Abstract

We study the problem of modeling spatiotemporal trajectories over long time horizons using expert demonstrations. For instance, in sports, agents often choose action sequences with long-term goals in mind, such as achieving a certain strategic position. Conventional policy learning approaches, such as those based on Markov decision processes, generally fail at learning cohesive long-term behavior in such high-dimensional state spaces, and are only effective when fairly myopic decision-making yields the desired behavior. The key difficulty is that conventional models are "single-scale" and only learn a single state-action policy. We instead propose a hierarchical policy class that automatically reasons about both long-term and short-term goals, which we instantiate as a hierarchical neural network. We showcase our approach in a case study on learning to imitate demonstrated basketball trajectories, and show that it generates significantly more realistic trajectories compared to non-hierarchical baselines as judged by professional sports analysts.

1 Introduction

Figure 1: The player (green) has two macro-goals: 1) pass the ball (orange) and 2) move to the basket.

Modeling long-term behavior is a key challenge in many learning problems that require complex decision-making. Consider a sports player determining a movement trajectory to achieve a certain strategic position. The space of such trajectories is prohibitively large, and precludes conventional approaches, such as those based on simple Markovian dynamics. Many decision problems can be naturally modeled as requiring high-level, long-term macro-goals, which span time horizons much longer than the timescale of low-level micro-actions (cf. He et al. [8], Hausknecht and Stone [7]). A natural example of such macro-micro behavior occurs in spatiotemporal games, such as basketball, where players execute complex trajectories. The micro-actions of each agent are to move around the court and, if they have the ball, dribble, pass or shoot the ball. These micro-actions operate at the centisecond scale, whereas their macro-goals, such as "maneuver behind these 2 defenders towards the basket", span multiple seconds. Figure 1 depicts an example from a professional basketball game, where the player must make a sequence of movements (micro-actions) in order to reach a specific location on the basketball court (macro-goal).

Intuitively, agents need to trade off between short-term and long-term behavior: often, sequences of individually reasonable micro-actions do not form a cohesive trajectory towards a macro-goal. For instance, in Figure 1 the player (green) takes a highly non-linear trajectory towards his macro-goal of positioning near the basket. As such, conventional approaches are not well suited for these settings, as they generally use a single (low-level) state-action policy, which is only successful when myopic or short-term decision-making leads to the desired behavior.
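To make the macro-goal/micro-action split concrete, here is a hand-written two-level policy. It is not the learned hierarchical neural network of the paper: the macro level simply walks through a fixed, hypothetical list of target locations (echoing Figure 1's "pass, then move to the basket"), and the micro level takes one bounded step toward the active target. Coordinates, `max_step`, and the function names are illustrative assumptions.

```python
import math


def micro_policy(state, goal, max_step=1.0):
    """Micro-action: move at most `max_step` straight toward the active macro-goal."""
    d = math.dist(state, goal)
    if d <= max_step:
        return goal
    ratio = max_step / d
    return (state[0] + ratio * (goal[0] - state[0]),
            state[1] + ratio * (goal[1] - state[1]))


def hierarchical_rollout(start, macro_goals, max_steps, max_step=1.0):
    """Toy hierarchical policy: the macro level holds a goal until it is reached,
    then switches to the next one; the micro level acts at every step."""
    traj = [start]
    k = 0  # index of the active macro-goal
    for _ in range(max_steps):
        if k == len(macro_goals):
            break  # all macro-goals achieved
        traj.append(micro_policy(traj[-1], macro_goals[k], max_step))
        if traj[-1] == macro_goals[k]:
            k += 1  # macro-goal reached: move on to the next one
    return traj
```

With start (0, 0) and targets [(2, 0), (2, 2)], the rollout is (0, 0), (1, 0), (2, 0), (2, 1), (2, 2): every micro-step is straight, yet the overall path bends because the macro-goal changes midway, which a single myopic state-action rule with no notion of the goal sequence would not capture.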

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

