Chapter 6: Time Series and Forecasting


Conchi Ausín and Mike Wiper


Department of Statistics
Universidad Carlos III de Madrid

Outline and learning objectives

To introduce Bayesian approaches to time series and forecasting.

Dynamic linear models and the Kalman filter.


State space models.
Online inference and sequential Monte Carlo.

Time series data
data(LakeHuron)
plot.ts(LakeHuron)

[Figure: time series plot of the LakeHuron data (level in feet against year).]

The graph shows the average yearly water levels in feet of Lake Huron between 1875 and 1972.
Regression fit
We could just try to fit a simple regression model to this data ...

t = c(1875:1972)
model0 = lm(LakeHuron ~ t)
f = model0$fitted.values
lines(t, f, lwd = 2, col = 'red')

[Figure: LakeHuron series with the fitted regression line.]

The fit doesn't look too bad but ...
Regression fit

r = model0$residuals
plot(t, r, col = 'red')
lines(t, rep(0, length(t)))

[Figure: regression residuals plotted against time.]

... the residuals seem to be dependent on time ...
Regression fit
acf(r, lwd = 2, col = 'red', main = 'ACF of residuals')

[Figure: ACF of the residuals, lags 0 to 15.]

... and are autocorrelated!
A local level model

One of the simplest time series models is the following:

yt = θt + νt,    νt ∼ Normal(0, Vt)
θt = θt−1 + ωt,    ωt ∼ Normal(0, Wt)

The observations vary around a local mean which moves according to a random walk.

Inference for the local level model

Assume that the variances Vt , Wt are known for all t. Then the model
parameters are {θt }. Bayesian inference starts by specifying a prior
distribution for θ0 , say θ0 ∼ Normal(m0 , C0 ). Then inference can be
carried out sequentially:

Set t = 1. Let the information up to time t − 1 be yt−1 = (y1, ..., yt−1)^T.


Calculate the prior distribution for θt as:

θt | yt−1 ∼ Normal(mt−1 , Rt ) where Rt = Ct−1 + Wt .

The one step ahead predictive distribution for yt is:

yt | yt−1 ∼ Normal(mt−1 , Qt ) where Qt = Rt + Vt .

The joint distribution of θt and yt is:
  
θt mt−1 Rt Rt
y ∼N ,
yt t−1 mt−1 Rt Qt

The posterior distribution for θt given yt = {yt−1 , yt } is:

θt | yt ∼ Normal(mt, Ct), where

mt = mt−1 + At et,
At = Rt / Qt,
et = yt − mt−1,
Ct = Rt − At^2 Qt.

Note that et is simply a prediction error term. The posterior mean formula could also be written as:

mt = (1 − At) mt−1 + At yt.
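Written out directly from these updates, a minimal R sketch of the local level filter recursion is given below; it assumes known, constant V and W, and the prior values m0 and C0 used in the last line are purely illustrative.

# Local level Kalman filter: the m_t, C_t recursion above, with known V and W.
local_level_filter = function(y, V = 1, W = 1, m0 = 0, C0 = 1e7) {
  n = length(y)
  m = numeric(n); C = numeric(n)
  m.prev = m0; C.prev = C0
  for (t in 1:n) {
    R = C.prev + W          # prior variance of theta_t
    Q = R + V               # one step ahead predictive variance
    A = R / Q               # adaptive coefficient
    e = y[t] - m.prev       # prediction error
    m[t] = m.prev + A * e   # posterior mean
    C[t] = R - A^2 * Q      # posterior variance
    m.prev = m[t]; C.prev = C[t]
  }
  list(m = m, C = C)
}
filt0 = local_level_filter(as.numeric(LakeHuron), V = 1, W = 1, m0 = 580, C0 = 100)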

k step ahead prediction
It is straightforward to predict various steps into the future as follows:

yt+k = θt+k + νt+k
     = θt+k−1 + ωt+k + νt+k
     = θt+k−2 + ωt+k−1 + ωt+k + νt+k
     ...
     = θt + Σ_{j=1}^{k} ωt+j + νt+k

Assuming that θt | yt ∼ Normal(mt, Ct), then

yt+k | yt ∼ Normal(mt, Ct + Σ_{j=1}^{k} Wt+j + Vt+k).

Fitting the local level model to the Lake Huron data

Assume that the observation and system variances are Vt = Wt = 1 for all t.
if(!require(dlm)) install.packages("dlm")
library(dlm)
mod1 = dlmModPoly(order = 1)         # local level model, default V = W = 1
filt1 = dlmFilter(LakeHuron, mod1)   # run the Kalman filter
names(filt1)
filt1$y     # the data
filt1$mod   # the model
filt1$m     # the filtered means m_t

dlm can be used for fitting dynamic linear models. An alternative, more
recent package is bsts.
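Forecasts can then be obtained from the filtered object with dlmForecast; the horizon below is just an illustrative choice.

fore1 = dlmForecast(filt1, nAhead = 10)  # 10 step ahead forecasts from the filtered model
fore1$f   # forecast means
fore1$Q   # forecast variances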

Selecting the variances

Different levels of smoothing of the series occur when different variance structures are considered.
mod2 = dlmModPoly(order = 1, dV = 10, dW = 1)   # V large relative to W: smoother filtered level
filt2 = dlmFilter(LakeHuron, mod2)
mod3 = dlmModPoly(order = 1, dV = 1, dW = 10)   # W large relative to V: filtered level tracks the data
filt3 = dlmFilter(LakeHuron, mod3)

In practice, discount factors can be used to specify the relative size of V and W.
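A common convention (one of several) is to choose a single discount factor δ ∈ (0, 1] and set Rt = Ct−1 / δ, which is equivalent to Wt = Ct−1 (1 − δ)/δ. Values of δ close to 1 give heavy smoothing, while smaller values let the level adapt more quickly.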

Dynamic linear models

The univariate, normal dynamic linear model (DLM) is:

yt = Ft^T θt + νt,    νt ∼ Normal(0, Vt)     (1)
θt = Gt θt−1 + ωt,    ωt ∼ Normal(0, Wt).    (2)

These models are linear state space models. Equation (1) shows the
behaviour of the observations {yt } over time. This depends on an
unobserved signal {θ t }, whose evolution is represented by (2).

The usual features of a time series such as trend and seasonality can be
modeled within this format.

If the matrices Ft, Gt, Vt and Wt are constants, the model is said to be time invariant.

Examples
Dynamic regression model

yt = θ1t + θ2t xt + νt
θt = θ t−1 + ω t

Autoregressive models
The standard AR(p) model is

yt = θ0 + θ1 yt−1 + θ2 yt−2 + · · · + θp yt−p + νt

This can be expressed as a DLM by setting Ft = (1, yt−1, ..., yt−p)^T with G = Ip+1 and W having all elements equal to 0.
Allowing W to be a positive variance matrix gives a time-varying autoregressive model.
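As a sketch (the coefficients and variances here are arbitrary), the dlm package builds AR models in an equivalent companion-matrix state space form, rather than the Ft = (1, yt−1, ..., yt−p)^T form above:

library(dlm)
modAR2 = dlmModARMA(ar = c(0.8, -0.2), sigma2 = 1, dV = 0)  # AR(2) as a DLM
modAR2$GG   # system matrix G (companion form)
modAR2$FF   # observation matrix F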

Combining models

The additive structure of the DLMs makes it easy to think of the observed series as originating from the sum of different components, e.g.,

yt = y1t + ... + yht

where y1t might represent a trend component, y2t a seasonal component, and so on. Then, each component, yit, might be described by a different DLM:

yit = Fit^T θit + νit,    νit ∼ N(0, Vit)
θit = Git θi,t−1 + ωit,    ωit ∼ N(0, Wit).

Combining models

By the assumption of independence of the components, yt is also a DLM described by:

Ft = (F1t | ... | Fht),    Vt = V1t + ... + Vht,

and Gt and Wt are block diagonal:

Gt = blockdiag(G1t, ..., Ght),    Wt = blockdiag(W1t, ..., Wht).
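In the dlm package this additive combination corresponds to adding model objects with '+'; the components and variances below are purely illustrative.

library(dlm)
modTrend = dlmModPoly(order = 2, dV = 1, dW = c(0.1, 0.01))      # local linear trend
modSeas = dlmModSeas(frequency = 4, dV = 0, dW = rep(0.1, 3))    # quarterly seasonal component
modFull = modTrend + modSeas   # combined DLM with block diagonal G and W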

Inference for a DLM: The Kalman filter

For the general model, we can carry out inference in the same way.
Assume that θ t−1 |yt−1 ∼ Normal(mt−1 , Ct−1 ). Then:

θt | yt−1 ∼ Normal(at, Rt), where at = Gt mt−1 and Rt = Gt Ct−1 Gt^T + Wt.

yt | yt−1 ∼ Normal(ft, Qt), where ft = Ft^T at and Qt = Ft^T Rt Ft + Vt.

(θt, yt)^T | yt−1 ∼ Normal( (at, ft)^T, [Rt, Rt Ft; Ft^T Rt, Qt] ).

θt | yt ∼ Normal(mt, Ct), where mt = at + Rt Ft Qt^{−1} et, with et = yt − ft the forecast error, and Ct = Rt − Rt Ft Qt^{−1} Ft^T Rt.

These equations are the well-known Kalman filtering system.
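A minimal sketch of one of these update steps, written directly from the equations above for univariate yt (all matrices assumed known; not taken from any package):

# One Kalman filter step: from (m_{t-1}, C_{t-1}) and y_t to (m_t, C_t).
# FF is a column vector, GG and W are matrices, V is a scalar.
kalman_step = function(m.prev, C.prev, y, FF, GG, V, W) {
  a = GG %*% m.prev                       # prior mean of theta_t
  R = GG %*% C.prev %*% t(GG) + W         # prior variance R_t
  f = drop(t(FF) %*% a)                   # one step ahead forecast mean
  Q = drop(t(FF) %*% R %*% FF) + V        # forecast variance
  e = y - f                               # forecast error
  m = a + R %*% FF * (e / Q)              # posterior mean
  C = R - (R %*% FF %*% t(FF) %*% R) / Q  # posterior variance
  list(m = m, C = C, f = f, Q = Q)
}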

Nowcasting example
library(bsts)
data(iclaims)

[Figure: time series plot of the weekly initial unemployment claims data.]

Time series of weekly unemployment claims. There are various components.
Nowcasting example: model fitting and prediction

ss = AddLocalLinearTrend(list(), initial.claims$iclaimsNSA)
ss = AddSeasonal(ss, initial.claims$iclaimsNSA, nseasons = 52)
model1 = bsts(initial.claims$iclaimsNSA, state.specification = ss, niter = 1000)
plot(model1)
plot(model1, "components")
pred = predict(model1, horizon = 12)
plot(pred, plot.original = 156)

State space models

State space models generalize the DLM by allowing for non-linear relationships:

yt = ht(θt, νt)
θt = gt(θt−1, ωt)

These are a particular example of hidden Markov models.

Forward filtering backward sampling

One way of dealing with state space models is via MCMC algorithms.
These are usually based on the so-called forward filtering backward
sampling algorithm designed for HMMs.

1. The forward filtering step is the standard normal linear analysis, giving f(θt | yt) at each t, for t = 1, ..., n.
2. The backward sampling step uses the Markov property: sample θn* from f(θn | yn) and then, for t = n − 1, ..., 1, sample θt* from f(θt | yt, θ*t+1).

Thus, a sample from the posterior of the whole state sequence is generated.

However, this algorithm is very costly and not useful for online inference.
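With known variances, the dlm package provides the backward sampling step directly; a small sketch for the local level model:

library(dlm)
ffout = dlmFilter(LakeHuron, dlmModPoly(order = 1))  # forward filtering
theta.draw = dlmBSample(ffout)                       # one backward-sampled path of the states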

Using FFBS on the Lake Huron data

library(dlm)
gibbsout = dlmGibbsDIG(LakeHuron, mod = dlmModPoly(1), shape.y = 0.1, rate.y = 0.1,
                       shape.theta = 0.1, rate.theta = 0.1, n.sample = 5000, thin = 10)
burn = 500
attach(gibbsout)
# ergodic means to assess convergence of the sampled variances
ts.plot(ergMean(dV[-(1:burn)]), ylab = expression(bar(V)), xlab = 'iterations')
ts.plot(ergMean(dW[-(1:burn)]), ylab = expression(bar(W)), xlab = 'iterations')
acf(dV[-(1:burn)], lwd = 3, col = 'red')
acf(dW[-(1:burn)], lwd = 3, col = 'red')
hist(dV[-(1:burn)], probability = TRUE, nclass = 20, xlab = 'V', ylab = 'f', main = '')
lines(density(dV[-(1:burn)]), lwd = 3, col = 'red')
hist(dW[-(1:burn)], probability = TRUE, nclass = 20, xlab = 'W', ylab = 'f', main = '')
lines(density(dW[-(1:burn)]), lwd = 3, col = 'red')

FFBS is very slow!

Online inference: the Bayesian bootstrap filter
Given an initial distribution for θ0, the evolution and updating distributions f(θt | yt−1) and f(θt | yt) will not usually have a closed form for general state space models. MCMC methods could be used, but these are time consuming.

Particle filtering uses sampling importance resampling (SIR) ideas to obtain draws from f(θt | yt) based on draws from f(θt−1 | yt−1). Remember the SIR algorithm to generate a sample from f via an importance sampler q:

1. Draw θ1*, ..., θN* from q(·).
2. Calculate unnormalized weights wi = f(θi*)/q(θi*).
3. Resample θ1, ..., θN from the set {θ1*, ..., θN*} with weights proportional to wi.

When f is a posterior distribution and we draw from the prior distribution, q, the unnormalized weight wi is the associated likelihood.
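A minimal sketch of this prior-to-posterior SIR step in R (the Normal prior, likelihood and observed value are purely illustrative):

N = 10000
theta.star = rnorm(N, mean = 0, sd = 10)     # draws from the prior q
w = dnorm(2.5, mean = theta.star, sd = 1)    # unnormalized weights: likelihood of y = 2.5
theta = sample(theta.star, size = N, replace = TRUE, prob = w)  # approximate posterior sample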
Revisiting the local level model

1. Let (θ0^i), i = 1, ..., N, be a sample from the prior distribution f(θ0 | y0), with initial weights (w0^i) all proportional to 1.
2. Then we can draw θ1^i* ∼ Normal(θ0^i, W).
3. Resample (θ1^i) from (θ1^i*) with weights w1^i = f(y1 | θ1^i*, V).

Cycling through these steps, we update the weights and the θs sequentially. This is ideal for online inference: at the end of the sequence we have (θn^i, wn^i), i = 1, ..., N, which is a summary of f(θn | yn), and when we observe further data we can simply update the particles and weights online in the same way.
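A minimal sketch of this bootstrap filter for the local level model, assuming V = W = 1 and an arbitrary Normal prior:

# Bootstrap particle filter for the local level model with known V and W.
bootstrap_filter = function(y, N = 5000, V = 1, W = 1, m0 = 580, C0 = 100) {
  theta = rnorm(N, m0, sqrt(C0))            # particles from the prior
  for (t in seq_along(y)) {
    theta.star = rnorm(N, theta, sqrt(W))   # propagate through the system equation
    w = dnorm(y[t], theta.star, sqrt(V))    # weights: likelihood of y_t
    theta = sample(theta.star, N, replace = TRUE, prob = w)  # resample
  }
  theta                                     # approximate draws from f(theta_n | y_n)
}
particles = bootstrap_filter(as.numeric(LakeHuron))
plot(density(particles), main = '', xlab = expression(theta[n]))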

Using the bootstrap filter with the Lake Huron data

[Figure: density of θn, comparing the particle filter estimate with the true filtered density.]

The graph compares the filtered predicted density of θn with the true density.
Using the bootstrap filter with the Lake Huron data

[Figure: LakeHuron series with one step ahead predictions overlaid.]

Here we see 1 step ahead predictions. This was very fast!
Why does this scheme work?

Given the Markovian nature of the system, we have

f(θt, θt−1 | yt) ∝ f(yt | θt) × f(θt | θt−1) f(θt−1 | yt−1),

where the first factor provides the resampling weights and the remaining factors describe the propagation step.

The bootstrap filter combines old particles θt−1^i generated from f(θt−1 | yt−1) with new particles θt^i generated from f(θt | θt−1^i), so the combined particles (θt−1^i, θt^i) are draws from f(θt−1, θt | yt−1). Then the particles are resampled with weights proportional to the likelihood:

wt^i ∝ f(yt | θt^i) f(θt^i | θt−1^i) f(θt−1^i | yt−1) / [ f(θt^i | θt−1^i) f(θt−1^i | yt−1) ] = f(yt | θt^i).

Problems and solutions
Parameter collapsing:
Over time, the resampling scheme can lead to the same parameter
value being resampled almost always with weight 1. Various
alternative filters have been developed to counteract this problem.
One example is the auxiliary particle filter based on

f(θt, θt−1 | yt) ∝ f(θt | θt−1, yt) × f(yt | θt−1) f(θt−1 | yt−1),

where the first factor describes the propagation step and the remaining factors provide the resampling weights.
How to estimate fixed parameters:
The filtering scheme is designed to update parameters which change over time. What can we do if, e.g., the model variances or other fixed parameters, say φ, are unknown?
The Liu-West filter looks at

f(θt, θt−1, φ | yt) ∝ f(θt, φ | θt−1, yt) × f(yt | θt−1, φ) f(θt−1, φ | yt−1),

where, again, the first factor describes the propagation step and the remaining factors provide the resampling weights.

Other filters or particle learning techniques can also be applied.


Stochastic volatility
The stochastic volatility model is:
yt = exp(ht / 2) εt
ht = µ + φ ht−1 + τ ηt

where εt and ηt are standard normally distributed shocks. This is a common model for asset returns.
[Figure: left panel, price of 1 EUR in USD (exrates$USD), 2000-2012; right panel, the demeaned log returns.]
Bayesian inference for SV models
Given suitable prior parameter distributions, SV models can be run in R via
MCMC using stochvol or using filtering approaches via nimble.
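For example, a rough sketch with stochvol (default priors; the number of draws is an illustrative choice):

library(stochvol)
data(exrates)
ret = logret(exrates$USD, demean = TRUE)            # demeaned log returns
fit = svsample(ret, draws = 5000, burnin = 1000)    # MCMC for the SV model
plot(fit)                                           # volatility quantiles and parameter posteriors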
[Figure: estimated volatilities in percent (5% / 50% / 95% posterior quantiles).]
Conclusions

In this session we have introduced Bayesian approaches to some of the most important time series models and shown that fast implementation can be achieved via sequential Monte Carlo or filtering approaches.

Thanks for listening!

