A New ARMA Model Fitting Algorithm for Big Time Series Data
Luke Yerbury
Supervised by Ali Eshragh and Glen Livingston
University of Newcastle
Vacation Research Scholarships are funded jointly by the Department of Education and
Training and the Australian Mathematical Sciences Institute.
Abstract
Big-data matrix compression via non-uniform sampling schemes, such as leverage score sampling, provides an excellent alternative to naïve computation, yielding high-quality numerical implementations with strong theoretical guarantees. In the context of autoregressive (AR) time series models, a highly efficient algorithm for approximating the leverage scores of the underlying regressors has recently been established. For more general ARMA models, unlike the AR model, the likelihood function is non-convex and nonlinear, which makes the problem more complicated. One approach considered here utilises the Hannan-Rissanen (HR) algorithm, which approximates the unobserved white noise using the residuals of a high-order AR model. An investigation into the optimal order of this intermediary AR model and the effectiveness of the trimming step in the HR algorithm has been conducted.
Contents
1 Introduction
2 Background Theory
2.1 Autoregressive Moving Average Models
2.1.1 Estimating Model Orders
2.1.2 Estimating the Model Parameters
3 Results and Discussion
3.1 The Optimal Value of p̃
3.2 The Effectiveness of Trimming
4 Conclusion
5 References
6 Appendix
1 Introduction
A time series is a sequence of random variables indexed according to the order they are observed
in time. This report considers regularly sampled time series; examples include daily closing stock prices, daily sunspot counts, and monthly average temperatures in Newcastle. The primary objective of time series analysis is to forecast the future behaviour of a system via the construction of suitable models. Such modelling can be approached from either a frequency domain or a time domain perspective. The frequency domain approach exploits methods from Fourier analysis to build models that focus on the periodic variations in the data. The time domain approach observes the correlation between the time series and lagged versions of itself to build parametric functions of past and present values. The latter is more prevalent for its simplicity and typically superior performance, and the models featured in this report belong to that framework.
A time series is said to be stationary when the mean function and variance are independent of time,
and the autocovariance depends only on the time difference. Time series with this property can have
an autoregressive moving average (ARMA) model fitted to them. The autoregressive (AR) component
refers to linear regression of current values of the time series against particular lagged values where
significant correlation was identified. The moving average (MA) component involves linear regression
of current values against previous white noise terms, which are uncorrelated, zero-mean, equal-variance
random variables representing variation not explained by the series itself. Introduced by Box and
Jenkins (1976), these simple models are still widely used to great effect.
An ever increasing capacity to collect large masses of data has required reconsideration of typical
approaches to data analysis. The fitting of ARMA models involves solving many ordinary least squares
problems, and in the context of big time series data, this can create a significant computational bot-
tleneck. Randomised Numerical Linear Algebra (RandNLA) employs random sub-sampling routines
to develop improved algorithms for large-scale linear algebra problems such as matrix multiplication,
regression and low rank matrix approximations (Drineas and Mahoney, 2017). By intelligent sampling,
the matrices involved in these problems can be compressed in such a way that they still retain im-
portant properties of the original matrix. Calculations subsequently performed using the compressed
matrices are then not only more efficient, but also theoretically accurate with high probability (Ma-
honey, 2011, Woodruff, 2014). Sampling based on leverage scores has been shown to aptly identify
non-uniformity in data, ultimately providing strong theoretical guarantees and high quality numerical
implementations (Drineas et al., 2012). The LSAR algorithm, introduced by Eshragh et al. (2019),
uses leverage score based sampling to estimate the order and parameters of an AR model, a special
case of an ARMA model without the MA component. Naïvely, computation of the leverage scores in this
ordinary least squares (OLS) regression context is as costly as solving the original OLS problem. By
exploiting the Toeplitz structure of the design matrix, Eshragh et al. developed a method to recursively
calculate the leverage scores during the model fitting process. They also developed theoretical rela-
tive error bounds for these recursively calculated leverage scores with high probability, and conducted
empirical testing to demonstrate the effectiveness of the final algorithm compared to state-of-the-art
alternatives.
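In the OLS setting, the leverage scores mentioned above are the diagonal entries of the hat matrix. As a point of reference for the naïve cost, here is a minimal sketch of exact leverage score computation via a thin QR factorisation; the function name and test matrix are illustrative, not from the LSAR paper:

```python
# Exact OLS leverage scores: the diagonal of H = A (A^T A)^{-1} A^T.
# With a thin QR factorisation A = QR, H = Q Q^T, so the i-th leverage
# score is simply the squared Euclidean norm of the i-th row of Q.
import numpy as np

def leverage_scores(A):
    """Return the leverage score of each row of a tall matrix A."""
    Q, _ = np.linalg.qr(A, mode="reduced")  # thin QR: A = QR
    return np.sum(Q**2, axis=1)             # row norms of Q give H_ii

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 3))  # illustrative 1000-by-3 design matrix
scores = leverage_scores(A)
# Sanity checks: scores lie in [0, 1] and sum to rank(A) = 3.
print(scores.sum())
```

This naïve route costs as much as solving the OLS problem itself, which is precisely the bottleneck the recursive Toeplitz-based approach of Eshragh et al. avoids.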
A natural progression upon establishing LSAR is to extend the algorithm to LSARMA, allowing
for the inclusion of MA terms. This is much less straightforward due to the non-convex, non-linear
nature of the likelihood function for ARMA parameters. One solution to this problem is to use the
Hannan-Rissanen (HR) algorithm. This algorithm exploits the equivalence between invertible MA(q)
and AR(∞) models to fit a large order AR model to the data, the residuals from which can then be used
in place of the unobserved white noise in OLS parameter estimation. An optional extra trimming step
improves the original estimates. The aim of this report was to investigate the characteristics of the HR
algorithm within the context of big time series data to inform algorithmic decisions in the formulation
of LSARMA. Of primary concern was understanding how large the order of the intermediate AR model
should be for various models, and understanding the effectiveness of the trimming step. Simulated
data from known ARMA processes were used for this investigation.
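The two OLS stages of the HR algorithm described above can be sketched as follows. This is a hedged, minimal illustration rather than the report's implementation: the function name, the fixed intermediate order p_tilde = 20, and the omission of the optional trimming step are all assumptions made here for brevity.

```python
# Hannan-Rissanen sketch: (1) fit a high-order AR model whose residuals
# stand in for the unobserved white noise W_t, then (2) regress X_t on
# its own lags and the lagged residuals via ordinary least squares.
import numpy as np

def hannan_rissanen(x, p, q, p_tilde=20):
    n = len(x)
    # Step 1: long-AR fit of order p_tilde; residuals proxy W_t.
    Z = np.column_stack([x[p_tilde - j - 1:n - j - 1] for j in range(p_tilde)])
    y = x[p_tilde:]
    phi_long, *_ = np.linalg.lstsq(Z, y, rcond=None)
    w = np.zeros(n)
    w[p_tilde:] = y - Z @ phi_long
    # Step 2: OLS of X_t on x lags 1..p and residual lags 1..q.
    start = p_tilde + max(p, q)
    cols = [x[start - i:n - i] for i in range(1, p + 1)]
    cols += [w[start - j:n - j] for j in range(1, q + 1)]
    D = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(D, x[start:], rcond=None)
    return beta[:p], beta[p:]  # (phi estimates, theta estimates)

# Illustrative check on a simulated ARMA(1,1) with phi = 0.5, theta = 0.4.
rng = np.random.default_rng(1)
n = 5000
w_true = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + w_true[t] + 0.4 * w_true[t - 1]
phi_hat, theta_hat = hannan_rissanen(x, p=1, q=1)
```

On this simulated series the estimates land close to the true coefficients, illustrating why the choice of the intermediate order p_tilde (investigated in Section 3) matters: too small and the residuals poorly approximate the white noise, too large and Step 1 becomes wasteful.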
Statement of Authorship
Eshragh and Livingston devised and supervised the project. They provided initial code frameworks
which were altered and extended by Yerbury. Yerbury performed the simulations and interpreted
results with support from Livingston.
2 Background Theory
This section provides an overview of the relevant theory behind the simulations performed for this
project.
2.1 Autoregressive Moving Average Models
A time series $\{X_t;\ t = 0, \pm 1, \pm 2, \dots\}$ is a sequence of random variables indexed according to the order they are observed in time. A realisation of the random variable $X_t$ is denoted $x_t$. The time series $X_t$ is called (weakly) stationary if:
(i) the mean function $E[X_t]$ is constant, and
(ii) the autocovariance function $\mathrm{Cov}(X_t, X_{t+h})$ depends only on the lag $h$ (independent of time $t$).
A time series process is ARMA($p, q$) if it is stationary and
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \theta_1 W_{t-1} + \cdots + \theta_q W_{t-q} + W_t$$
where $\phi_p \neq 0$, $\theta_q \neq 0$, and the time series $\{W_t;\ t = 0, \pm 1, \pm 2, \dots\}$ is a Gaussian white noise process, meaning $E[W_t] = 0$ and $\mathrm{Cov}(W_t, W_s) = \delta_{ts}\,\sigma_W^2$, where $\delta_{ts}$ is the Kronecker delta.
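To make the definition concrete, the ARMA recursion above can be simulated directly with a short loop. This is an illustrative sketch only; the coefficients, burn-in length, and function name are arbitrary choices, not values from the report.

```python
# Direct simulation of X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p}
#                          + W_t + theta_1 W_{t-1} + ... + theta_q W_{t-q}
# with Gaussian white noise W_t. A burn-in is discarded so the retained
# series is effectively stationary.
import numpy as np

def simulate_arma(phi, theta, n, burn=500, seed=0):
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn
    w = rng.normal(size=total)          # Gaussian white noise, sigma_W = 1
    x = np.zeros(total)
    for t in range(max(p, q), total):
        ar = sum(phi[i] * x[t - i - 1] for i in range(p))
        ma = sum(theta[j] * w[t - j - 1] for j in range(q))
        x[t] = ar + ma + w[t]
    return x[burn:]                     # drop the burn-in transient

x = simulate_arma(phi=[0.5], theta=[0.4], n=2000)
print(x.mean())  # close to 0 for this zero-mean process
```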
Another way of expressing the above ARMA model is through the use of the backshift operator $B$. We can define $BY_t = Y_{t-1}$, which naturally extends to powers with $B^k Y_t = Y_{t-k}$. Hence we can define the autoregressive operator $\Phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and the moving average operator $\Theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ to express the model as
$$\Phi_p(B)X_t = \Theta_q(B)W_t.$$
Additionally, an ARMA($p, q$) process is said to be invertible if the time series can be written as
$$W_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}, \quad \text{where } \pi_0 = 1 \text{ and } \sum_{j=0}^{\infty} |\pi_j| < \infty.$$
Equivalently, the process is invertible if and only if the roots of the MA polynomial $\Theta_q(z)$, for $z \in \mathbb{C}$, lie outside the unit circle. Analogous to this definition, the process is said to be causal if it can be written as
$$X_t = \sum_{j=0}^{\infty} \psi_j W_{t-j}, \quad \text{where } \psi_0 = 1 \text{ and } \sum_{j=0}^{\infty} |\psi_j| < \infty,$$
or again, if the roots of the AR polynomial $\Phi_p(z)$ lie outside the unit circle.
[Shumway and Stoffer, 2017]
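These root conditions are straightforward to check numerically. Below is a hedged sketch using numpy's polynomial root finder, with coefficient signs matching the operator definitions above; all function names are illustrative.

```python
# Causality and invertibility checks via polynomial roots.
# Phi_p(z)   = 1 - phi_1 z - ... - phi_p z^p   (AR polynomial)
# Theta_q(z) = 1 + theta_1 z + ... + theta_q z^q (MA polynomial)
import numpy as np

def roots_outside_unit_circle(coeffs):
    """coeffs = [c0, c1, ..., ck] for c0 + c1 z + ... + ck z^k."""
    roots = np.roots(coeffs[::-1])  # np.roots expects highest degree first
    return bool(np.all(np.abs(roots) > 1.0))

def is_causal(phi):
    # Note the sign flip on the AR coefficients.
    return roots_outside_unit_circle(np.r_[1.0, -np.asarray(phi)])

def is_invertible(theta):
    return roots_outside_unit_circle(np.r_[1.0, np.asarray(theta)])

print(is_causal([0.5]))       # Phi(z) = 1 - 0.5z has root z = 2 -> True
print(is_invertible([1.25]))  # Theta(z) = 1 + 1.25z has root z = -0.8 -> False
```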