Chapter 6: Time Series and Forecasting


Conchi Ausín and Mike Wiper


Department of Statistics
Universidad Carlos III de Madrid

Outline and learning objectives

To introduce Bayesian approaches to time series and forecasting.

Dynamic linear models and the Kalman filter.


State space models.
Online inference and sequential Monte Carlo.

Time series data
data(LakeHuron)
plot.ts(LakeHuron)

[Figure: time series plot of the LakeHuron data (level in feet against year).]

The graph shows the average yearly water levels in feet of Lake Huron between 1875 and 1972.
Regression fit
We could just try to fit a simple regression model to this data ...

t = c(1875:1972)
model0 = lm(LakeHuron ~ t)
f = model0$fitted.values
lines(t, f, lwd = 2, col = 'red')

[Figure: LakeHuron series with the fitted regression line.]

The fit doesn't look too bad but ...
Regression fit

r = model0$residuals
plot(t, r, col = 'red')
lines(t, rep(0, length(t)))

[Figure: regression residuals plotted against time.]

... the residuals seem to be dependent on time ...
Regression fit
acf(r, lwd = 2, col = 'red', main = 'ACF of residuals')

[Figure: ACF of the residuals, lags 0 to 15.]

... and are autocorrelated!
A local level model

One of the simplest time series models is the following:

yt = θt + νt,    νt ∼ Normal(0, Vt)
θt = θt−1 + ωt,    ωt ∼ Normal(0, Wt)

The observations vary around a local mean which moves according to a random walk.

Inference for the local level model

Assume that the variances Vt , Wt are known for all t. Then the model
parameters are {θt }. Bayesian inference starts by specifying a prior
distribution for θ0 , say θ0 ∼ Normal(m0 , C0 ). Then inference can be
carried out sequentially:

Set t = 1. Let the information up to time t − 1 be yt−1 = (y1, ..., yt−1)^T.


Calculate the prior distribution for θt as:

θt | yt−1 ∼ Normal(mt−1 , Rt ) where Rt = Ct−1 + Wt .

The one step ahead predictive distribution for yt is:

yt | yt−1 ∼ Normal(mt−1 , Qt ) where Qt = Rt + Vt .

The joint distribution of θt and yt is:
  
θt mt−1 Rt Rt
y ∼N ,
yt t−1 mt−1 Rt Qt

The posterior distribution for θt given yt = {yt−1 , yt } is:

θt | yt ∼ Normal(mt, Ct), where

mt = mt−1 + At et,
At = Rt / Qt,
et = yt − mt−1,
Ct = Rt − At^2 Qt.

Note that et is simply a prediction error term. The posterior mean formula could also be written as:

mt = (1 − At) mt−1 + At yt.
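Written out directly from these updates, a minimal R sketch of the local level filter recursion is given below; it assumes known, constant V and W, and the prior values m0 and C0 used in the last line are purely illustrative.

# Local level Kalman filter: the m_t, C_t recursion above, with known V and W.
local_level_filter = function(y, V = 1, W = 1, m0 = 0, C0 = 1e7) {
  n = length(y)
  m = numeric(n); C = numeric(n)
  m.prev = m0; C.prev = C0
  for (t in 1:n) {
    R = C.prev + W          # prior variance of theta_t
    Q = R + V               # one step ahead predictive variance
    A = R / Q               # adaptive coefficient
    e = y[t] - m.prev       # prediction error
    m[t] = m.prev + A * e   # posterior mean
    C[t] = R - A^2 * Q      # posterior variance
    m.prev = m[t]; C.prev = C[t]
  }
  list(m = m, C = C)
}
filt0 = local_level_filter(as.numeric(LakeHuron), V = 1, W = 1, m0 = 580, C0 = 100)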

k step ahead prediction
It is straightforward to predict various steps into the future as follows:

yt+k = θt+k + νt+k
     = θt+k−1 + ωt+k + νt+k
     = θt+k−2 + ωt+k−1 + ωt+k + νt+k
     ...
     = θt + Σ_{j=1}^{k} ωt+j + νt+k

Assuming that θt | yt ∼ Normal(mt, Ct), then

yt+k | yt ∼ Normal(mt, Ct + Σ_{j=1}^{k} Wt+j + Vt+k).

Fitting the local level model to the Lake Huron data

Assume that the observation and system variances are Vt = Wt = 1 for all t.
if(!require(dlm)) install.packages("dlm")
library(dlm)
mod1 = dlmModPoly(order = 1)         # local level model, default V = W = 1
filt1 = dlmFilter(LakeHuron, mod1)   # run the Kalman filter
names(filt1)
filt1$y     # the data
filt1$mod   # the model
filt1$m     # the filtered means m_t

dlm can be used for fitting dynamic linear models. An alternative, more
recent package is bsts.
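Forecasts can then be obtained from the filtered object with dlmForecast; the horizon below is just an illustrative choice.

fore1 = dlmForecast(filt1, nAhead = 10)  # 10 step ahead forecasts from the filtered model
fore1$f   # forecast means
fore1$Q   # forecast variances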

Selecting the variances

Different levels of smoothing of the series occur when different variance structures are considered.
mod2 = dlmModPoly(order = 1, dV = 10, dW = 1)   # V large relative to W: smoother filtered level
filt2 = dlmFilter(LakeHuron, mod2)
mod3 = dlmModPoly(order = 1, dV = 1, dW = 10)   # W large relative to V: filtered level tracks the data
filt3 = dlmFilter(LakeHuron, mod3)

In practice, discount factors can be used to specify the relative size of V and W.
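A common convention (one of several) is to choose a single discount factor δ ∈ (0, 1] and set Rt = Ct−1 / δ, which is equivalent to Wt = Ct−1 (1 − δ)/δ. Values of δ close to 1 give heavy smoothing, while smaller values let the level adapt more quickly.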

Dynamic linear models

The univariate, normal dynamic linear model (DLM) is:

yt = Ft^T θt + νt,    νt ∼ Normal(0, Vt)     (1)
θt = Gt θt−1 + ωt,    ωt ∼ Normal(0, Wt).    (2)

These models are linear state space models. Equation (1) shows the
behaviour of the observations {yt } over time. This depends on an
unobserved signal {θ t }, whose evolution is represented by (2).

The usual features of a time series such as trend and seasonality can be
modeled within this format.

If the matrices Ft, Gt, Vt and Wt are constants, the model is said to be time invariant.

Examples
Dynamic regression model

yt = θ1t + θ2t xt + νt
θt = θ t−1 + ω t

Autoregressive models
The standard AR(p) model is

yt = θ0 + θ1 yt−1 + θ2 yt−2 + · · · + θp yt−p + νt

This can be expressed as a DLM by setting Ft = (1, yt−1, ..., yt−p)^T with G = Ip+1 and W having all elements equal to 0.
Allowing W to be a positive variance matrix gives a time-varying autoregressive model.
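As a sketch (the coefficients and variances here are arbitrary), the dlm package builds AR models in an equivalent companion-matrix state space form, rather than the Ft = (1, yt−1, ..., yt−p)^T form above:

library(dlm)
modAR2 = dlmModARMA(ar = c(0.8, -0.2), sigma2 = 1, dV = 0)  # AR(2) as a DLM
modAR2$GG   # system matrix G (companion form)
modAR2$FF   # observation matrix F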

Combining models

The additive structure of the DLMs makes it easy to think of the observed series as originating from the sum of different components, e.g.,

yt = y1t + ... + yht

where y1t might represent a trend component, y2t a seasonal component, and so on. Then, each component, yit, might be described by a different DLM:

yit = Fit^T θit + νit,    νit ∼ N(0, Vit)
θit = Git θi,t−1 + ωit,    ωit ∼ N(0, Wit).

Combining models

By the assumption of independence of the components, yt is also a DLM described by:

Ft = (F1t | ... | Fht),    Vt = V1t + ... + Vht,

and Gt and Wt are block diagonal:

Gt = blockdiag(G1t, ..., Ght),    Wt = blockdiag(W1t, ..., Wht).
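In the dlm package this additive combination corresponds to adding model objects with '+'; the components and variances below are purely illustrative.

library(dlm)
modTrend = dlmModPoly(order = 2, dV = 1, dW = c(0.1, 0.01))      # local linear trend
modSeas = dlmModSeas(frequency = 4, dV = 0, dW = rep(0.1, 3))    # quarterly seasonal component
modFull = modTrend + modSeas   # combined DLM with block diagonal G and W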

Inference for a DLM: The Kalman filter

For the general model, we can carry out inference in the same way.
Assume that θ t−1 |yt−1 ∼ Normal(mt−1 , Ct−1 ). Then:

θt | yt−1 ∼ Normal(at, Rt), where at = Gt mt−1 and Rt = Gt Ct−1 Gt^T + Wt.

yt | yt−1 ∼ Normal(ft, Qt), where ft = Ft^T at and Qt = Ft^T Rt Ft + Vt.

(θt, yt)^T | yt−1 ∼ Normal( (at, ft)^T, [Rt, Rt Ft; Ft^T Rt, Qt] ).

θt | yt ∼ Normal(mt, Ct), where mt = at + Rt Ft Qt^{−1} et, with et = yt − ft the forecast error, and Ct = Rt − Rt Ft Qt^{−1} Ft^T Rt.

These equations are the well-known Kalman filtering system.
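A minimal sketch of one of these update steps, written directly from the equations above for univariate yt (all matrices assumed known; not taken from any package):

# One Kalman filter step: from (m_{t-1}, C_{t-1}) and y_t to (m_t, C_t).
# FF is a column vector, GG and W are matrices, V is a scalar.
kalman_step = function(m.prev, C.prev, y, FF, GG, V, W) {
  a = GG %*% m.prev                       # prior mean of theta_t
  R = GG %*% C.prev %*% t(GG) + W         # prior variance R_t
  f = drop(t(FF) %*% a)                   # one step ahead forecast mean
  Q = drop(t(FF) %*% R %*% FF) + V        # forecast variance
  e = y - f                               # forecast error
  m = a + R %*% FF * (e / Q)              # posterior mean
  C = R - (R %*% FF %*% t(FF) %*% R) / Q  # posterior variance
  list(m = m, C = C, f = f, Q = Q)
}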

Nowcasting example
library(bsts)
data(iclaims)

[Figure: time series plot of the weekly initial unemployment claims data.]

Time series of weekly unemployment claims. There are various components.
Nowcasting example: model fitting and prediction

ss = AddLocalLinearTrend(list(), initial.claims$iclaimsNSA)
ss = AddSeasonal(ss, initial.claims$iclaimsNSA, nseasons = 52)
model1 = bsts(initial.claims$iclaimsNSA, state.specification = ss, niter = 1000)
plot(model1)
plot(model1, "components")
pred = predict(model1, horizon = 12)
plot(pred, plot.original = 156)

State space models

State space models generalize the DLM by allowing for non-linear relationships:

yt = ht(θt, νt)
θt = gt(θt−1, ωt)

These are a particular example of hidden Markov models.

Forward filtering backward sampling

One way of dealing with state space models is via MCMC algorithms.
These are usually based on the so-called forward filtering backward
sampling algorithm designed for HMMs.

1. The forward filtering step is the standard normal linear analysis, giving f(θt | yt) at each t, for t = 1, ..., n.
2. The backward sampling step uses the Markov property: sample θn* from f(θn | yn) and then, for t = n − 1, ..., 1, sample θt* from f(θt | yt, θ*t+1).

Thus, a sample from the posterior of the whole state sequence is generated.

However, this algorithm is very costly and not useful for online inference.
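With known variances, the dlm package provides the backward sampling step directly; a small sketch for the local level model:

library(dlm)
ffout = dlmFilter(LakeHuron, dlmModPoly(order = 1))  # forward filtering
theta.draw = dlmBSample(ffout)                       # one backward-sampled path of the states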

Using FFBS on the Lake Huron data

library(dlm)
gibbsout = dlmGibbsDIG(LakeHuron, mod = dlmModPoly(1), shape.y = 0.1, rate.y = 0.1,
                       shape.theta = 0.1, rate.theta = 0.1, n.sample = 5000, thin = 10)
burn = 500
attach(gibbsout)
# ergodic means to assess convergence of the sampled variances
ts.plot(ergMean(dV[-(1:burn)]), ylab = expression(bar(V)), xlab = 'iterations')
ts.plot(ergMean(dW[-(1:burn)]), ylab = expression(bar(W)), xlab = 'iterations')
acf(dV[-(1:burn)], lwd = 3, col = 'red')
acf(dW[-(1:burn)], lwd = 3, col = 'red')
hist(dV[-(1:burn)], probability = TRUE, nclass = 20, xlab = 'V', ylab = 'f', main = '')
lines(density(dV[-(1:burn)]), lwd = 3, col = 'red')
hist(dW[-(1:burn)], probability = TRUE, nclass = 20, xlab = 'W', ylab = 'f', main = '')
lines(density(dW[-(1:burn)]), lwd = 3, col = 'red')

FFBS is very slow!

Online inference: the Bayesian bootstrap filter
Given an initial distribution for θ0, the evolution and updating distributions f(θt | yt−1) and f(θt | yt) will not usually have a closed form for general state space models. MCMC methods could be used, but these are time consuming.

Particle filtering uses sampling importance resampling (SIR) ideas to obtain draws from f(θt | yt) based on draws from f(θt−1 | yt−1). Remember the SIR algorithm to generate a sample from f via an importance sampler q:

1. Draw θ1*, ..., θN* from q(·).
2. Calculate unnormalized weights wi = f(θi*)/q(θi*).
3. Resample θ1, ..., θN from the set {θ1*, ..., θN*} with weights proportional to wi.

When f is a posterior distribution and we draw from the prior distribution, q, the unnormalized weight wi is the associated likelihood.
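A minimal sketch of this prior-to-posterior SIR step in R (the Normal prior, likelihood and observed value are purely illustrative):

N = 10000
theta.star = rnorm(N, mean = 0, sd = 10)     # draws from the prior q
w = dnorm(2.5, mean = theta.star, sd = 1)    # unnormalized weights: likelihood of y = 2.5
theta = sample(theta.star, size = N, replace = TRUE, prob = w)  # approximate posterior sample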
Revisiting the local level model

1. Let (θ0^i), i = 1, ..., N, be a sample from the prior distribution f(θ0 | y0), with initial weights (w0^i) all proportional to 1.
2. Then we can draw θ1^i* ∼ Normal(θ0^i, W).
3. Resample (θ1^i) from (θ1^i*) with weights w1^i = f(y1 | θ1^i*, V).

Cycling through these steps, we update the weights and the θs sequentially. This is ideal for online inference: at the end of the sequence we have (θn^i, wn^i), i = 1, ..., N, which is a summary of f(θn | yn), and when we observe further data we can simply update the particles and weights online in the same way.
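A minimal sketch of this bootstrap filter for the local level model, assuming V = W = 1 and an arbitrary Normal prior:

# Bootstrap particle filter for the local level model with known V and W.
bootstrap_filter = function(y, N = 5000, V = 1, W = 1, m0 = 580, C0 = 100) {
  theta = rnorm(N, m0, sqrt(C0))            # particles from the prior
  for (t in seq_along(y)) {
    theta.star = rnorm(N, theta, sqrt(W))   # propagate through the system equation
    w = dnorm(y[t], theta.star, sqrt(V))    # weights: likelihood of y_t
    theta = sample(theta.star, N, replace = TRUE, prob = w)  # resample
  }
  theta                                     # approximate draws from f(theta_n | y_n)
}
particles = bootstrap_filter(as.numeric(LakeHuron))
plot(density(particles), main = '', xlab = expression(theta[n]))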

Using the bootstrap filter with the Lake Huron data

[Figure: density of θn, comparing the particle filter estimate with the true filtered density.]

The graph compares the filtered predicted density of θn with the true density.
Using the bootstrap filter with the Lake Huron data

[Figure: LakeHuron series with one step ahead predictions overlaid.]

Here we see 1 step ahead predictions. This was very fast!
Why does this scheme work?

Given the Markovian nature of the system, we have

f(θt, θt−1 | yt) ∝ f(yt | θt) × f(θt | θt−1) f(θt−1 | yt−1),

where the first factor provides the resampling weights and the remaining factors describe the propagation step.

The bootstrap filter combines old particles θt−1^i generated from f(θt−1 | yt−1) with new particles θt^i generated from f(θt | θt−1^i), so the combined particles (θt−1^i, θt^i) are draws from f(θt−1, θt | yt−1). Then the particles are resampled with weights proportional to the likelihood:

wt^i ∝ f(yt | θt^i) f(θt^i | θt−1^i) f(θt−1^i | yt−1) / [ f(θt^i | θt−1^i) f(θt−1^i | yt−1) ] = f(yt | θt^i).

Problems and solutions
Parameter collapsing:
Over time, the resampling scheme can lead to the same parameter
value being resampled almost always with weight 1. Various
alternative filters have been developed to counteract this problem.
One example is the auxiliary particle filter based on

f(θt, θt−1 | yt) ∝ f(θt | θt−1, yt) × f(yt | θt−1) f(θt−1 | yt−1),

where the first factor describes the propagation step and the remaining factors provide the resampling weights.
How to estimate fixed parameters:
The filtering scheme is designed to update parameters which change over time. What can we do if, e.g., the model variances or other fixed parameters, say φ, are unknown?
The Liu-West filter looks at

f(θt, θt−1, φ | yt) ∝ f(θt, φ | θt−1, yt) × f(yt | θt−1, φ) f(θt−1, φ | yt−1),

where, again, the first factor describes the propagation step and the remaining factors provide the resampling weights.

Other filters or particle learning techniques can also be applied.


Stochastic volatility
The stochastic volatility model is:
yt = exp(ht / 2) εt
ht = µ + φ ht−1 + τ ηt

where εt and ηt are standard normally distributed shocks. This is a common model for asset returns.
[Figure: left panel, price of 1 EUR in USD (exrates$USD), 2000-2012; right panel, the demeaned log returns.]
Bayesian inference for SV models
Given suitable prior parameter distributions, SV models can be run in R via
MCMC using stochvol or using filtering approaches via nimble.
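For example, a rough sketch with stochvol (default priors; the number of draws is an illustrative choice):

library(stochvol)
data(exrates)
ret = logret(exrates$USD, demean = TRUE)            # demeaned log returns
fit = svsample(ret, draws = 5000, burnin = 1000)    # MCMC for the SV model
plot(fit)                                           # volatility quantiles and parameter posteriors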
[Figure: estimated volatilities in percent (5% / 50% / 95% posterior quantiles).]
Conclusions

In this session we have introduced Bayesian approaches to some of the most important time series models and shown that fast implementation can be achieved via sequential Monte Carlo or filtering approaches.

Thanks for listening!

