(Krishnan et al., 2015; Fraccaro et al., 2016; Buesing et al., 2018) which relies on jointly training a
filtering posterior and a local smoothing posterior. We demonstrate that on a simple task, this new
inference network and associated lower bound lead to improved likelihood compared to methods
classically used to train deep state-space models.
Following the intuition given by the sequential TD-VAE, we develop the full TD-VAE model, which
learns from temporally extended data by making jumpy predictions into the future. We show it can
be used to train consistent jumpy simulators of complex 3D environments. Finally, we illustrate how
training a filtering posterior leads to the computation of a neural belief state with a good representation of the uncertainty about the state of the environment.
2 MODEL DESIDERATA
2.1 CONSTRUCTION OF A LATENT STATE-SPACE
Autoregressive models. One of the simplest ways to model sequential data $(x_1, \ldots, x_T)$ is to use the chain rule to decompose the joint sequence likelihood as a product of conditional probabilities, i.e. $\log p(x_1, \ldots, x_T) = \sum_t \log p(x_t \mid x_1, \ldots, x_{t-1})$. This formula can be used to train an
autoregressive model of data, by combining an RNN which aggregates information from the past (recursively computing an internal state $h_t = f(h_{t-1}, x_t)$) with a conditional generative model which can score the data $x_t$ given the context $h_{t-1}$. This idea is used in handwriting synthesis (Graves,
2013), density estimation (Uria et al., 2016), image synthesis (van den Oord et al., 2016b), audio
synthesis (van den Oord et al., 2017), video synthesis (Kalchbrenner et al., 2016), generative recall
tasks (Gemici et al., 2017), and environment modeling (Oh et al., 2015; Chiappa et al., 2017).
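For concreteness, the following is a minimal sketch of this recipe, written in PyTorch with a GRU as the aggregator and a conditional Gaussian output model; the framework, the Gaussian observation model, and all layer sizes are illustrative assumptions, not the architecture of any of the cited works.

```python
# Minimal autoregressive sequence model: an RNN folds each observation into a
# context h_t = f(h_{t-1}, x_t), and a conditional Gaussian scores x_t given
# the previous context h_{t-1}. All architectural choices are illustrative.
import torch
import torch.nn as nn


class AutoregressiveModel(nn.Module):
    def __init__(self, x_dim=8, h_dim=64):
        super().__init__()
        self.rnn = nn.GRUCell(x_dim, h_dim)        # h_t = f(h_{t-1}, x_t)
        self.out = nn.Linear(h_dim, 2 * x_dim)     # mean and log-scale of p(x_t | h_{t-1})
        self.h0 = nn.Parameter(torch.zeros(h_dim))

    def log_prob(self, x):
        """x: (T, B, x_dim). Returns sum_t log p(x_t | x_1, ..., x_{t-1}) per batch element."""
        T, B, _ = x.shape
        h = self.h0.repeat(B, 1)
        total = torch.zeros(B)
        for t in range(T):
            mean, log_scale = self.out(h).chunk(2, dim=-1)
            total = total + torch.distributions.Normal(mean, log_scale.exp()).log_prob(x[t]).sum(-1)
            h = self.rnn(x[t], h)                  # fold x_t into the context
        return total


# Training maximizes the exact chain-rule log-likelihood, e.g. (x_batch is a
# hypothetical (T, B, x_dim) tensor of observations):
# loss = -AutoregressiveModel().log_prob(x_batch).mean()
```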
While these models are conceptually simple and easy to train, one potential weakness is that they only make predictions in the original observation space and do not learn a compressed representation of the data. As a result, these models tend to be computationally heavy (for video prediction, they constantly decode and re-encode single video frames). Furthermore, the model can be unstable at test time: it is trained as a next-step model (with the RNN encoding real data), but at test time it feeds its own predictions back into the RNN. Various methods have been used to alleviate this issue (Bengio et al., 2015; Lamb et al., 2016; Goyal et al., 2017; Amos et al., 2018).
State-space models. An alternative to autoregressive models is given by models that operate at a higher level of abstraction and use latent variables to model stochastic transitions between states (grounded by observation-level predictions). This makes it possible to sample state-to-state transitions only, without needing to render the observations, which can be faster and conceptually more appealing. These models
generally consist of decoder or prior networks, which detail the generative process of states and
observations, and encoder or posterior networks, which estimate the distribution of latents given the
observed data. There is a large amount of recent work on this type of model, differing in the
precise wiring of model components (Bayer & Osendorfer, 2014; Chung et al., 2015; Krishnan et al.,
2015; Archer et al., 2015; Fraccaro et al., 2016; Liu et al., 2017; Serban et al., 2017; Buesing et al.,
2018; Lee et al., 2018; Ha & Schmidhuber, 2018).
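As a minimal sketch of the decoder/prior side of such a model (in PyTorch; the Gaussian transition and decoder networks and all sizes are illustrative assumptions, not the architecture of any cited work), a latent trajectory can be rolled forward state-to-state without ever rendering observations:

```python
# Minimal latent state-space generative model: a stochastic transition
# p(z_t | z_{t-1}) and an observation decoder p(x_t | z_t). A trajectory can be
# sampled entirely in latent space; observations are decoded only when needed.
# Distributions, networks, and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class StateSpacePrior(nn.Module):
    def __init__(self, z_dim=16, x_dim=8, h_dim=64):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                   nn.Linear(h_dim, 2 * z_dim))  # p(z_t | z_{t-1})
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, 2 * x_dim))    # p(x_t | z_t)

    @staticmethod
    def _gaussian(params):
        mean, log_scale = params.chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_scale.exp())

    def rollout(self, z0, steps):
        """Sample z_1, ..., z_steps from the prior without rendering observations."""
        z, traj = z0, []
        for _ in range(steps):
            z = self._gaussian(self.trans(z)).sample()
            traj.append(z)
        return torch.stack(traj)

    def decode(self, z):
        """Render an observation distribution p(x | z) for a chosen state."""
        return self._gaussian(self.dec(z))
```

Only `rollout` is needed to simulate forward in state space; `decode` is called only when an actual observation is required.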
Let $z = (z_1, \ldots, z_T)$ be a state sequence and $x = (x_1, \ldots, x_T)$ an observation sequence. We assume a general form of state-space model, where the joint state and observation likelihood can be written as $p(x, z) = \prod_t p(z_t \mid z_{t-1}) \, p(x_t \mid z_t)$.¹ These models are commonly trained with a VAE-inspired bound, by computing a posterior $q(z \mid x)$ over the states given the observations. Often, the posterior is decomposed autoregressively: $q(z \mid x) = \prod_t q(z_t \mid z_{t-1}, \phi_t(x))$, where $\phi_t$ is a function of $(x_1, \ldots, x_t)$ for filtering posteriors or of the entire sequence $x$ for smoothing posteriors. This leads to the following lower bound:
$$\log p(x) \;\geq\; \mathbb{E}_{z \sim q(z \mid x)} \Big[ \sum_t \log p(x_t \mid z_t) + \log p(z_t \mid z_{t-1}) - \log q(z_t \mid z_{t-1}, \phi_t(x)) \Big]. \tag{1}$$
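The sketch below gives a one-sample Monte Carlo estimate of this bound with a filtering posterior, taking $\phi_t(x)$ to be the state of an RNN run over $x_1, \ldots, x_t$ (one common choice); the PyTorch framework, the specific networks, the Gaussian distributions, and all sizes are illustrative assumptions rather than any model evaluated in the paper.

```python
# One-sample Monte Carlo estimate of the bound in Equation (1), with a
# filtering posterior q(z_t | z_{t-1}, phi_t(x)) where phi_t(x) is the state of
# an RNN run over x_1, ..., x_t. Networks, distributions, and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn


def gaussian(params):
    mean, log_scale = params.chunk(2, dim=-1)
    return torch.distributions.Normal(mean, log_scale.exp())


class FilteringELBO(nn.Module):
    def __init__(self, x_dim=8, z_dim=16, h_dim=64):
        super().__init__()
        self.phi = nn.GRUCell(x_dim, h_dim)              # phi_t(x) summarizes x_1, ..., x_t
        self.prior = nn.Linear(z_dim, 2 * z_dim)         # p(z_t | z_{t-1})
        self.post = nn.Linear(z_dim + h_dim, 2 * z_dim)  # q(z_t | z_{t-1}, phi_t(x))
        self.dec = nn.Linear(z_dim, 2 * x_dim)           # p(x_t | z_t)
        self.z0 = nn.Parameter(torch.zeros(z_dim))       # so p(z_1 | z_0) plays the role of p(z_1)
        self.h0 = nn.Parameter(torch.zeros(h_dim))

    def elbo(self, x):
        """x: (T, B, x_dim). Returns the Equation (1) bound per batch element."""
        T, B, _ = x.shape
        h, z_prev = self.h0.repeat(B, 1), self.z0.repeat(B, 1)
        bound = torch.zeros(B)
        for t in range(T):
            h = self.phi(x[t], h)                        # filtering statistic phi_t(x)
            q_t = gaussian(self.post(torch.cat([z_prev, h], -1)))
            z_t = q_t.rsample()                          # reparameterized sample
            bound = bound + (gaussian(self.dec(z_t)).log_prob(x[t]).sum(-1)        # log p(x_t | z_t)
                             + gaussian(self.prior(z_prev)).log_prob(z_t).sum(-1)  # log p(z_t | z_{t-1})
                             - q_t.log_prob(z_t).sum(-1))                          # - log q(z_t | z_{t-1}, phi_t(x))
            z_prev = z_t
        return bound


# Training maximizes the bound, e.g. (x_batch is a hypothetical (T, B, x_dim) tensor):
# loss = -FilteringELBO().elbo(x_batch).mean()
```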
¹ For notational simplicity, $p(z_1 \mid z_0) = p(z_1)$. Also note the conditional distributions could be very complex, using additional latent variables, flow models, or implicit models (for instance, if a deterministic RNN with stochastic inputs is used in the decoder).