Chapter 6: Time Series and Forecasting
Conchi Ausín and Mike Wiper, Bayesian Methods for Big Data, Masters in Big Data
Outline and learning objectives
Time series data
data(LakeHuron)
plot.ts(LakeHuron)

[Plot: the annual LakeHuron level series against Time]
Regression fit
We could just try to fit a simple regression model to this data ...
t = c(1875:1972)
model0 = lm(LakeHuron ~ t)
f = model0$fitted.values
lines(t, f, lwd = 2, col = 'red')

The fit doesn't look too bad but ...

[Plot: the LakeHuron series with the fitted regression line overlaid]
Regression fit
r = model0$residuals
plot(t, r, col = 'red')
lines(t, rep(0, length(t)))

... the residuals seem to depend on t ...

[Plot: residuals r against t]
Regression fit
acf(r, lwd = 2, col = 'red', main = 'ACF of residuals')

... and are autocorrelated!

[Plot: ACF of the residuals against Lag]
A local level model
$$y_t = \theta_t + \nu_t, \qquad \nu_t \sim \mathrm{Normal}(0, V_t)$$
$$\theta_t = \theta_{t-1} + \omega_t, \qquad \omega_t \sim \mathrm{Normal}(0, W_t)$$
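As a quick illustration (not from the original slides; the values V = W = 1 are arbitrary), this model can be simulated in R:

set.seed(1)
n = 100
theta = cumsum(rnorm(n, 0, 1))   # theta_t = theta_{t-1} + omega_t, with W = 1
y = theta + rnorm(n, 0, 1)       # y_t = theta_t + nu_t, with V = 1
plot.ts(y)
lines(theta, col = 'red')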
Inference for the local level model
Assume that the variances $V_t$ and $W_t$ are known for all $t$. Then the model parameters are $\{\theta_t\}$. Bayesian inference starts by specifying a prior distribution for $\theta_0$, say $\theta_0 \sim \mathrm{Normal}(m_0, C_0)$. Inference can then be carried out sequentially: given $\theta_{t-1} \mid y^{t-1} \sim \mathrm{Normal}(m_{t-1}, C_{t-1})$, the one step ahead distributions are

$$\theta_t \mid y^{t-1} \sim \mathrm{Normal}(m_{t-1}, R_t), \qquad R_t = C_{t-1} + W_t,$$
$$y_t \mid y^{t-1} \sim \mathrm{Normal}(m_{t-1}, Q_t), \qquad Q_t = R_t + V_t.$$
The joint distribution of $\theta_t$ and $y_t$ is:

$$\begin{pmatrix} \theta_t \\ y_t \end{pmatrix} \,\Bigg|\, y^{t-1} \sim N\!\left( \begin{pmatrix} m_{t-1} \\ m_{t-1} \end{pmatrix}, \begin{pmatrix} R_t & R_t \\ R_t & Q_t \end{pmatrix} \right)$$
Hence

$$\theta_t \mid y^{t} \sim \mathrm{Normal}(m_t, C_t),$$

where

$$m_t = m_{t-1} + A_t e_t, \qquad A_t = R_t / Q_t, \qquad e_t = y_t - m_{t-1}, \qquad C_t = R_t - A_t^2 Q_t.$$

Equivalently, $m_t = (1 - A_t)\, m_{t-1} + A_t y_t$, a weighted average of the previous estimate and the new observation.
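As a purely illustrative numerical check (values chosen here, not taken from the slides): with $m_0 = 0$, $C_0 = 1$, $V = W = 1$ and first observation $y_1 = 2$, we have $R_1 = C_0 + W = 2$, $Q_1 = R_1 + V = 3$, $A_1 = 2/3$ and $e_1 = 2$, so $m_1 = 4/3$ and $C_1 = 2 - (2/3)^2 \times 3 = 2/3$.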
k step ahead prediction
It is straightforward to predict various steps into the future. For the local level model,

$$\theta_{t+k} \mid y^{t} \sim \mathrm{Normal}\!\left(m_t,\; C_t + \sum_{j=1}^{k} W_{t+j}\right), \qquad y_{t+k} \mid y^{t} \sim \mathrm{Normal}\!\left(m_t,\; C_t + \sum_{j=1}^{k} W_{t+j} + V_{t+k}\right).$$
Fitting the local level model to the Lake Huron data
Assume that the observation and system variances are $V_t = W_t = 1$ for all $t$.
if (!require(dlm)) install.packages("dlm")
library(dlm)
mod1 = dlmModPoly(order = 1)         # local level model; the defaults give dV = dW = 1
filt1 = dlmFilter(LakeHuron, mod1)   # run the Kalman filter
names(filt1)
filt1$y     # the data
filt1$mod   # the model used
filt1$m     # filtered means m_0, m_1, ..., m_n
dlm can be used for fitting dynamic linear models. An alternative, more
recent package is bsts.
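To obtain the k step ahead forecasts discussed earlier, one option (a sketch, not shown on the original slide) is dlmForecast applied to the filtering output:

fore1 = dlmForecast(filt1, nAhead = 10)   # forecasts for the next 10 years
fore1$f   # forecast means
fore1$Q   # forecast variances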
Selecting the variances
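One standard option for selecting the variances, offered here as a hedged sketch rather than necessarily the approach of the original slide, is maximum likelihood estimation of V and W using dlmMLE:

buildFun = function(x) dlmModPoly(order = 1, dV = exp(x[1]), dW = exp(x[2]))
fit = dlmMLE(LakeHuron, parm = c(0, 0), build = buildFun)
mod2 = buildFun(fit$par)
V(mod2)   # estimated observation variance
W(mod2)   # estimated system variance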
Dynamic linear models
$$y_t = F_t^T \theta_t + \nu_t, \qquad \nu_t \sim \mathrm{Normal}(0, V_t) \qquad (1)$$
$$\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim \mathrm{Normal}(0, W_t) \qquad (2)$$
These models are linear state space models. Equation (1) shows the behaviour of the observations $\{y_t\}$ over time. This depends on an unobserved signal $\{\theta_t\}$, whose evolution is represented by (2).
The usual features of a time series, such as trend and seasonality, can be modelled within this format.
Examples
Dynamic regression model
$$y_t = \theta_{1t} + \theta_{2t} x_t + \nu_t$$
$$\theta_t = \theta_{t-1} + \omega_t$$
Autoregressive models
The standard AR(p) model is

$$y_t = \theta_0 + \theta_1 y_{t-1} + \cdots + \theta_p y_{t-p} + \nu_t.$$

This can be expressed as a DLM by setting $F_t = (1, y_{t-1}, \ldots, y_{t-p})^T$ with $G = I_{p+1}$ and $W$ having all elements equal to 0.
Allowing $W$ to be a positive-definite variance matrix gives a time-varying autoregressive model.
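As a hedged illustration (not on the original slide; the variance values are arbitrary), the dynamic regression model above can be built with dlmModReg:

x = 1875:1972
modReg = dlmModReg(matrix(x, ncol = 1), addInt = TRUE, dV = 1, dW = c(0.01, 0.01))
filtReg = dlmFilter(LakeHuron, modReg)   # filtered paths of the intercept and slope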
Combining models
$$y_t = y_{1t} + \cdots + y_{ht}$$
and

$$G_t = \begin{pmatrix} G_{1t} & & \\ & \ddots & \\ & & G_{ht} \end{pmatrix}, \qquad W_t = \begin{pmatrix} W_{1t} & & \\ & \ddots & \\ & & W_{ht} \end{pmatrix}.$$
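In the dlm package this combination is carried out with the "+" operator on model objects (a brief sketch; the seasonal period 12 is arbitrary):

modTrendSeas = dlmModPoly(order = 2) + dlmModSeas(frequency = 12)   # linear trend + seasonal component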
Inference for a DLM: The Kalman filter
For the general model, we can carry out inference in the same way. Assume that $\theta_{t-1} \mid y^{t-1} \sim \mathrm{Normal}(m_{t-1}, C_{t-1})$. Then the standard Kalman filter recursions give:

$$\theta_t \mid y^{t-1} \sim \mathrm{Normal}(a_t, R_t), \qquad a_t = G_t m_{t-1}, \quad R_t = G_t C_{t-1} G_t^T + W_t,$$
$$y_t \mid y^{t-1} \sim \mathrm{Normal}(f_t, Q_t), \qquad f_t = F_t^T a_t, \quad Q_t = F_t^T R_t F_t + V_t,$$
$$\theta_t \mid y^{t} \sim \mathrm{Normal}(m_t, C_t), \qquad m_t = a_t + A_t e_t, \quad C_t = R_t - A_t Q_t A_t^T,$$

where $A_t = R_t F_t Q_t^{-1}$ and $e_t = y_t - f_t$.
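A minimal R sketch of these recursions for a univariate observation with time-invariant F, G, V and W (all names are illustrative; this is not code from the slides):

kalmanFilter = function(y, FF, GG, V, W, m0, C0) {
  n = length(y); p = length(m0)
  m = matrix(NA, n, p)
  mPrev = m0; CPrev = C0
  for (t in 1:n) {
    a = GG %*% mPrev                         # prediction mean a_t
    R = GG %*% CPrev %*% t(GG) + W           # prediction variance R_t
    f = crossprod(FF, a)                     # forecast mean f_t
    Q = crossprod(FF, R %*% FF) + V          # forecast variance Q_t
    A = R %*% FF / as.numeric(Q)             # gain A_t
    e = y[t] - as.numeric(f)                 # forecast error e_t
    mPrev = a + A * e                        # posterior mean m_t
    CPrev = R - A %*% t(A) * as.numeric(Q)   # posterior variance C_t
    m[t, ] = mPrev
  }
  m
}
mFilt = kalmanFilter(LakeHuron, FF = matrix(1), GG = matrix(1),
                     V = 1, W = matrix(1), m0 = 575, C0 = matrix(100))   # local level, V = W = 1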
Nowcasting example
library(bsts)
data(iclaims)

[Plot: the weekly initial claims series against Time]
Nowcasting example: model fitting and prediction
ss = AddLocalLinearTrend(list(), initial.claims$iclaimsNSA)
ss = AddSeasonal(ss, initial.claims$iclaimsNSA, nseasons = 52)
model1 = bsts(initial.claims$iclaimsNSA, state.specification = ss, niter = 1000)
plot(model1)
plot(model1, "components")
pred = predict(model1, horizon = 12)
plot(pred, plot.original = 156)
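For nowcasting proper, a regression component on contemporaneous predictor series is usually added as well; a hedged sketch using the predictor columns bundled with the iclaims data (details may differ from the original presentation):

model2 = bsts(iclaimsNSA ~ ., state.specification = ss,
              data = initial.claims, niter = 1000)
plot(model2, "coefficients")   # spike-and-slab inclusion probabilities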
State space models
$$y_t = h_t(\theta_t, \nu_t)$$
$$\theta_t = g_t(\theta_{t-1}, \omega_t)$$
Forward filtering backward sampling
One way of dealing with state space models is via MCMC algorithms. These are usually based on the so-called forward filtering backward sampling (FFBS) algorithm designed for hidden Markov models (HMMs).
However, this algorithm is computationally costly and not well suited to online inference.
Using FFBS on the Lake Huron data
library(dlm)
gibbsout = dlmGibbsDIG(LakeHuron, mod = dlmModPoly(1), shape.y = 0.1, rate.y = 0.1,
                       shape.theta = 0.1, rate.theta = 0.1, n.sample = 5000, thin = 10)
burn = 1:500                       # burn-in iterations to discard
attach(gibbsout)
ts.plot(ergMean(dV[-burn]), ylab = expression(bar(V)), xlab = 'iterations')
ts.plot(ergMean(dW[-burn]), ylab = expression(bar(W)), xlab = 'iterations')
acf(dV[-burn], lwd = 3, col = 'red')
acf(dW[-burn], lwd = 3, col = 'red')
hist(dV[-burn], probability = TRUE, nclass = 20, xlab = 'V', ylab = 'f', main = '')
lines(density(dV[-burn]), lwd = 3, col = 'red')
hist(dW[-burn], probability = TRUE, nclass = 20, xlab = 'W', ylab = 'f', main = '')
lines(density(dW[-burn]), lwd = 3, col = 'red')
Online inference: the Bayesian bootstrap filter
Given an initial distribution for $\theta_0$, the evolution and updating distributions $f(\theta_t \mid y^{t-1})$ and $f(\theta_t \mid y^t)$ will not usually have a closed form for general state space models. MCMC methods could therefore be used, but these are time consuming.

Particle filtering uses sampling importance resampling (SIR) ideas to obtain draws from $f(\theta_t \mid y^t)$ based on draws from $f(\theta_{t-1} \mid y^{t-1})$.

Recall the SIR algorithm for generating a sample from $f$ via an importance sampler $q$. Applied to the first step of the local level model, this gives:

1. Let $(\theta_0^i)_{i=1}^N$ be a sample from the prior distribution $f(\theta_0 \mid y_0)$, with initial weights $(w_0^i)_{i=1}^N$ all proportional to 1.
2. Draw $\theta_1^{i*} \sim \mathrm{Normal}(\theta_0^i, W)$.
3. Resample $(\theta_1^i)_{i=1}^N$ from $(\theta_1^{i*})_{i=1}^N$ with weights $w_1^i \propto f(y_1 \mid \theta_1^{i*}, V)$.
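A minimal R sketch of this bootstrap filter for the local level model (function and variable names are illustrative; V, W, m0 and C0 are taken as known):

bootstrapFilter = function(y, V, W, m0, C0, N = 10000) {
  theta = rnorm(N, m0, sqrt(C0))              # particles from the prior f(theta_0 | y_0)
  filtMean = numeric(length(y))
  for (t in seq_along(y)) {
    thetaStar = rnorm(N, theta, sqrt(W))      # propagate: theta_t* ~ Normal(theta_{t-1}, W)
    w = dnorm(y[t], thetaStar, sqrt(V))       # weights proportional to f(y_t | theta_t*, V)
    theta = sample(thetaStar, N, replace = TRUE, prob = w)   # resample
    filtMean[t] = mean(theta)                 # approximate E(theta_t | y_1, ..., y_t)
  }
  filtMean
}
fm = bootstrapFilter(LakeHuron, V = 1, W = 1, m0 = 575, C0 = 100)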
Using the bootstrap filter with the Lake Huron data
The graph compares the filtered predicted density of $\theta_n$ with the true density.

[Plot: density f against θ_n]
Using the bootstrap filter with the Lake Huron data
Here we see 1 step ahead predictions. This was very fast!

[Plot: the LakeHuron series with 1 step ahead predictions against Time]
Why does this scheme work?
The bootstrap filter combines old particles $\theta_{t-1}^i$ generated from $f(\theta_{t-1} \mid y^{t-1})$ with new particles $\theta_t^i$ generated from $f(\theta_t \mid \theta_{t-1}^i)$, so the combined particles $(\theta_{t-1}^i, \theta_t^i)$ are draws from $f(\theta_{t-1}, \theta_t \mid y^{t-1})$.

Then the particles are resampled with weights proportional to the likelihood,

$$w_t^i \propto f(y_t \mid \theta_t^i),$$

so that the resampled particles are approximately distributed as $f(\theta_t \mid y^t)$.
Problems and solutions
Parameter collapsing:
Over time, the resampling scheme can lead to the same parameter value being resampled almost always, with weight close to 1. Various alternative filters have been developed to counteract this problem. One example is the auxiliary particle filter, which uses the next observation to guide the resampling step.
[Plot: the exrates$USD exchange-rate series and its returns (ret), 2000 to 2012]
Bayesian inference for SV models
Given suitable prior parameter distributions, SV models can be fitted in R via MCMC using stochvol, or via filtering approaches using nimble.
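A hedged sketch of how such a fit can be obtained with stochvol for the exchange-rate returns shown above (prior settings are left at the package defaults, which may differ from the original analysis):

library(stochvol)
data(exrates)
ret = logret(exrates$USD, demean = TRUE)      # demeaned log-returns
fit = svsample(ret, draws = 5000, burnin = 1000)
plot(fit)                                     # includes the estimated volatility quantile plot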
[Plot: estimated volatilities in percent (5% / 50% / 95% posterior quantiles)]
Conclusions