Long-term Forecasting with TiDE: Time-series Dense Encoder

Abhimanyu Das¹, Weihao Kong¹, Andrew Leach², Rajat Sen¹, and Rose Yu²,³

¹Google Research  ²Google Cloud  ³University of California, San Diego

(Authors are listed in alphabetical order.)

April 18, 2023
Abstract

Recent work has shown that simple linear models can outperform several Transformer based approaches in long term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve near optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer based model.
1 Introduction

Long-term forecasting, which is to predict several steps into the future given a long context or look-back, is one of the most fundamental problems in time series analysis, with broad applications in energy, finance, and transportation. Deep learning models [WXWL21, NNSK22] have emerged as the preferred approach for forecasting rich, multivariate, time series data, outperforming classical statistical approaches such as ARIMA or GARCH [BJRL15]. In several forecasting competitions such as the M5 competition [MSA20] and the IARAI Traffic4cast contest [KKJ+20], almost all the winning solutions are based on deep neural networks.
Various neural network architectures have been explored for forecasting, ranging from recurrent neural networks to convolutional networks to graph neural networks. For sequence modeling tasks in domains such as language, speech and vision, Transformers [VSP+17] have emerged as the most successful deep learning architecture, even outperforming recurrent neural networks (LSTMs) [HS97]. Subsequently, there has been a surge of Transformer-based forecasting papers [WXWL21, ZZP+21, ZMW+22] in the time-series community that have claimed state-of-the-art (SoTA) forecasting performance for long-horizon tasks. However, recent work [ZCZX23] has shown that these Transformer-based architectures may not be as powerful as one might expect for time series forecasting, and can be easily outperformed by a simple linear model on forecasting benchmarks. Such a linear model, however, has deficiencies since it is ill-suited for modeling non-linear dependencies among the time-series sequence and the time-independent covariates. Indeed, a very recent paper [NNSK22] proposed a new Transformer-based architecture that obtains SoTA performance for deep neural networks on the standard multivariate forecasting benchmarks.
In this paper, we present a simple and effective deep learning architecture for forecasting that
obtains superior performance when compared to existing SoTA neural network based models on the
long-term time series forecasting benchmarks. Our Multi-Layer Perceptron (MLP)-based model is
embarrassingly simple without any self-attention, recurrent or convolutional mechanism. Therefore,
it enjoys a linear computational scaling in terms of the context and horizon lengths unlike many
Transformer based solutions.
The main contributions of this work are as follows:
• We propose the Time-series Dense Encoder (TiDE) model architecture for long-term time series forecasting. TiDE encodes the past of a time-series along with covariates using dense MLPs and then decodes the time-series along with future covariates, again using dense MLPs.

• We analyze the simplest linear analogue of our model and prove that this linear model can achieve near optimal error rate in linear dynamical systems (LDS) [Kal63] when the design matrix of the LDS has maximum singular value bounded away from 1. We empirically verify this on a simulated dataset where the linear model outperforms LSTMs and Transformers.

• On popular real-world long-term forecasting benchmarks, our model achieves better or similar performance compared to prior neural network based baselines (>10% lower Mean Squared Error on the largest dataset). At the same time, TiDE is 5x faster in terms of inference and more than 10x faster in training when compared to the best Transformer based model.
2 Related Work

In this section we will focus on the prior work on deep neural network models for long-term forecasting. The comparison with classical methods such as ARIMA has been discussed in prior work [WXWL21] and the references therein. LongTrans [LJX+19] uses an attention layer with a LogSparse design to capture local information with near linear space and computational complexity. Informer [ZZP+21] uses the ProbSparse self-attention mechanism to achieve sub-quadratic dependency on the length of the context. Autoformer [WXWL21] uses trend and seasonal decomposition with a sub-quadratic self-attention mechanism. FEDFormer [ZMW+22] uses a frequency enhanced structure while Pyraformer [LYL+21] uses pyramidal self-attention that has linear complexity and can attend to different granularities.

Recently, [ZCZX23] proposed DLinear, a simple linear model that surprisingly outperforms many of the Transformer based models mentioned above. DLinear learns a linear mapping from context to horizon, pointing to deficiencies in sub-quadratic approximations to the self-attention mechanism. Indeed, a very recent model, PatchTST [NNSK22], has shown that feeding contiguous patches of time-series as tokens to the vanilla self-attention mechanism can beat the performance of DLinear on long-term forecasting benchmarks.
3 Problem Setting
Before we describe the problem setting, we will need to set up some general notation.
3.1 Notation
We will denote matrices by bold capital letters like $\mathbf{X} \in \mathbb{R}^{N \times T}$. The slice notation $i{:}j$ denotes the set $\{i, i+1, \cdots, j\}$ and $[n] := \{1, 2, \cdots, n\}$. The individual rows and columns are always treated as column vectors unless otherwise specified. We can also use sets to select sub-matrices, i.e. $\mathbf{X}[\mathcal{I}, \mathcal{J}]$ denotes the sub-matrix with rows in $\mathcal{I}$ and columns in $\mathcal{J}$. $\mathbf{X}[:, j]$ means selecting the $j$-th column while $\mathbf{X}[i, :]$ means the $i$-th row. The notation $[\mathbf{v}; \mathbf{u}]$ will denote the concatenation of the two column vectors, and the same notation can be used for matrices along a dimension.
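For readers who think in code, the slicing conventions above map directly onto array indexing. A minimal NumPy illustration (note that the indices here are 0-based, unlike the 1-based indices used in the text):

import numpy as np

N, T = 4, 6
X = np.arange(N * T).reshape(N, T)   # X in R^{N x T}

I, J = [0, 2], [1, 3, 5]
sub = X[np.ix_(I, J)]                # X[I, J]: rows in I, columns in J
col_j = X[:, 2]                      # X[:, j]: the j-th column
row_i = X[1, :]                      # X[i, :]: the i-th row

v, u = np.ones(3), np.zeros(2)
concat = np.concatenate([v, u])      # [v; u]: concatenation of two column vectors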
3.2 Multivariate Forecasting
In this section we first abstract out the core problem in long-term multivariate forecasting. There are $N$ time-series in the dataset. The look-back of the $i$-th time-series will be denoted by $\mathbf{y}^{(i)}_{1:L}$, while the horizon is denoted by $\mathbf{y}^{(i)}_{L+1:L+H}$. The task of the forecaster is to predict the horizon time-points given access to the look-back.
In many forecasting scenarios, there might be dynamic and static covariates that are known in advance. With slight abuse of notation, we will use $\mathbf{x}^{(i)}_t \in \mathbb{R}^r$ to denote the $r$-dimensional dynamic covariates of time-series $i$ at time $t$. For instance, they can be global covariates (common to all time-series) such as the day of the week or holidays, or specific to a time-series, for instance the discount of a particular product on a particular day in a demand forecasting use case. We can also have static attributes of a time-series, denoted by $\mathbf{a}^{(i)}$, such as features of a product in retail demand forecasting that do not change with time. In many applications, these covariates are vital for accurate forecasting and a good model architecture should have provisions to handle them.
The forecaster can be thought of as a function that maps the history $\mathbf{y}^{(i)}_{1:L}$, the dynamic covariates $\mathbf{x}^{(i)}_{1:L+H}$ and the static attributes $\mathbf{a}^{(i)}$ to an accurate prediction of the future, i.e.,

\[
f : \left\{\mathbf{y}^{(i)}_{1:L}\right\}_{i=1}^{N},\; \left\{\mathbf{x}^{(i)}_{1:L+H}\right\}_{i=1}^{N},\; \left\{\mathbf{a}^{(i)}\right\}_{i=1}^{N} \longrightarrow \left\{\hat{\mathbf{y}}^{(i)}_{L+1:L+H}\right\}_{i=1}^{N}. \tag{1}
\]
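To make the shapes in Equation (1) concrete, here is a minimal sketch of the forecasting interface in Python. The variable names, sizes, and the naive repeat-last-value stub are purely illustrative assumptions, not the paper's model:

import numpy as np

# Hypothetical sizes: N series, look-back L, horizon H, r dynamic covariates.
N, L, H, r = 8, 720, 96, 4

y_past = np.random.randn(N, L)          # {y^(i)_{1:L}}
x_cov  = np.random.randn(N, L + H, r)   # {x^(i)_{1:L+H}}, known over past and future
a_stat = np.random.randn(N, 3)          # {a^(i)}, static attributes

def forecaster(y_past, x_cov, a_stat):
    # Placeholder for f: a real model (e.g. TiDE) would use all three inputs;
    # this stub just repeats the last observed value to fix the output shape.
    return np.repeat(y_past[:, -1:], H, axis=1)

y_hat = forecaster(y_past, x_cov, a_stat)   # {y-hat^(i)_{L+1:L+H}}
assert y_hat.shape == (N, H)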
The accuracy of the prediction will be measured by a metric that quantifies their closeness to the actual values. For instance, if the metric is Mean Squared Error (MSE), then the goodness of fit is measured by,

\[
\mathrm{MSE}\left(\left\{\mathbf{y}^{(i)}_{L+1:L+H}\right\}_{i=1}^{N}, \left\{\hat{\mathbf{y}}^{(i)}_{L+1:L+H}\right\}_{i=1}^{N}\right) = \frac{1}{NH}\sum_{i=1}^{N} \left\|\mathbf{y}^{(i)}_{L+1:L+H} - \hat{\mathbf{y}}^{(i)}_{L+1:L+H}\right\|_2^2. \tag{2}
\]
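For reference, Equation (2) corresponds to the following short computation, sketched in NumPy with the same (N, H)-shaped arrays as above (the function name is ours, not the paper's):

import numpy as np

def mse(y_true, y_pred):
    # Eq. (2): average squared error over all N series and all H horizon steps.
    N, H = y_true.shape
    return np.sum((y_true - y_pred) ** 2) / (N * H)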
4 Model
Recently, it has been observed that simple linear models [ZCZX23] can outperform Transformer based models in several long-term forecasting benchmarks. On the other hand, linear models will fall short when there are inherent non-linearities in the dependence of the future on the past. Furthermore, linear models would not be able to model the dependence of the prediction on the covariates, as evidenced by the fact that [ZCZX23] do not use time-covariates as they hurt performance.
In this section, we introduce a simple and efficient MLP based architecture for long-term time-series forecasting. In our model we add non-linearities in the form of MLPs in a manner that can handle past data and covariates. The model is dubbed TiDE (Time-series Dense Encoder) as it encodes the past of a time-series along with covariates using dense MLPs and then decodes the encoded time-series along with future covariates.
Figure 1: Overview of TiDE architecture. The dynamic covariates per time-point are mapped to a
lower dimensional space using a feature projection step. Then the encoder combines the look-back
along with the projected covariates with the static attributes to form an encoding. The decoder
maps this encoding to a vector per time-step in the horizon. Then a temporal decoder combines
this vector (per time-step) with the projected features of that time-step in the horizon to form the
final predictions. We also add a global linear residual connection from the look-back to the horizon.
An overview of our architecture is presented in Figure 1. Our model is applied in a channel independent manner, i.e., the input to the model is the past and covariates of one time-series at a time.
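To make the data flow of Figure 1 concrete, here is a heavily simplified PyTorch sketch of the encoder-decoder structure described above. Module sizes, layer counts, and the use of plain linear layers are illustrative assumptions; the paper's full model builds these components out of stacked residual MLP blocks, so this is a sketch of the data flow rather than the exact implementation:

import torch
import torch.nn as nn

class TiDESketch(nn.Module):
    """Simplified TiDE-style encoder-decoder (illustrative, not the paper's exact blocks)."""
    def __init__(self, lookback, horizon, cov_dim, static_dim,
                 proj_dim=4, hidden=128, decoder_out=16):
        super().__init__()
        self.horizon, self.decoder_out = horizon, decoder_out
        # Feature projection: map r-dimensional covariates per time-step to a lower dimension.
        self.feature_proj = nn.Linear(cov_dim, proj_dim)
        # Dense encoder: flattened look-back + projected covariates + static attributes.
        enc_in = lookback + (lookback + horizon) * proj_dim + static_dim
        self.encoder = nn.Sequential(nn.Linear(enc_in, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        # Dense decoder: map the encoding to one vector per horizon time-step.
        self.decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, horizon * decoder_out))
        # Temporal decoder: combine the per-step vector with that step's projected covariates.
        self.temporal_decoder = nn.Linear(decoder_out + proj_dim, 1)
        # Global linear residual connection from look-back to horizon.
        self.residual = nn.Linear(lookback, horizon)

    def forward(self, y_past, covariates, static_attr):
        # y_past: (B, L); covariates: (B, L+H, r); static_attr: (B, s)
        proj = self.feature_proj(covariates)                      # (B, L+H, proj_dim)
        enc_in = torch.cat([y_past, proj.flatten(1), static_attr], dim=-1)
        encoding = self.encoder(enc_in)                           # (B, hidden)
        dec = self.decoder(encoding).view(-1, self.horizon, self.decoder_out)
        future_proj = proj[:, -self.horizon:, :]                  # projected future covariates
        out = self.temporal_decoder(torch.cat([dec, future_proj], dim=-1)).squeeze(-1)
        return out + self.residual(y_past)                        # (B, H)

# Usage with the channel-independent convention: each batch entry is one time-series.
model = TiDESketch(lookback=720, horizon=96, cov_dim=4, static_dim=3)
y_hat = model(torch.randn(2, 720), torch.randn(2, 720 + 96, 4), torch.randn(2, 3))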