Applied Time Series Analysis

Environmental Sciences

2020-02-03
Book package
The book uses a number of R packages and a variety of fisheries data sets.
The packages and data sets can be installed by installing our atsalibrary
package which is hosted on GitHub:
library(devtools)
devtools::install_github("nwfsc-timeseries/atsalibrary")
Authors
The authors are research scientists at the Northwest Fisheries Science Center (NWFSC). This work was conducted as part of our jobs at the NWFSC, a research center for NOAA Fisheries, which is a United States federal government agency.
Links to more code and publications can be found on our academic websites:
• http://faculty.washington.edu/eeholmes
• http://faculty.washington.edu/scheuerl
• http://faculty.washington.edu/warde
Chapter 1

Matrix Math
This chapter reviews the basic matrix math operations that you will need to
understand the course material and shows how to do these operations in R.
A script with all the R code in the chapter can be downloaded here.
1.1 Creating matrices in R

Create a 3 × 4 matrix; by default the values fill in down the columns:

matrix(1:12, 3, 4)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

matrix(1:12, 3, 4, byrow = TRUE)

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

Create a matrix with one column:

matrix(1:4, ncol = 1)

     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
Create a matrix with one row:
matrix(1:4, nrow = 1)

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4

Check the dimensions of a matrix:

A = matrix(1:6, 2, 3)
dim(A)

[1] 2 3
Get the number of rows in a matrix:
dim(A)[1]
[1] 2
nrow(A)
[1] 2
Create a 3D matrix (called array):
A = array(1:6, dim = c(2, 3, 2))
A
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

dim(A)

[1] 2 3 2
Check if an object is a matrix. A data frame is not a matrix. A vector is not
a matrix.
A = matrix(1:4, 1, 4)
A

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4

class(A)

[1] "matrix"
B = data.frame(A)
B
X1 X2 X3 X4
1 1 2 3 4
class(B)
[1] "data.frame"
C = 1:4
C
[1] 1 2 3 4
class(C)
[1] "integer"
Matrix multiplication requires that the number of columns of the left matrix equal the number of rows of the right matrix:

A = matrix(1:6, 2, 3)
B = matrix(1:6, 3, 2)
A %*% B #this works

     [,1] [,2]
[1,]   22   49
[2,]   28   64

B %*% A #this works too

     [,1] [,2] [,3]
[1,]    9   19   29
[2,]   12   26   40
[3,]   15   33   51

try(B %*% B) #this doesn't
To transpose a matrix, use t():

t(A) #this transposes A

     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6

try(A %*% A) #this won't work

A %*% t(A) #this does

     [,1] [,2]
[1,]   35   44
[2,]   44   56
1.3 Subsetting a matrix

Subsetting a matrix uses row and column indices. Create a 3 × 3 matrix and take the top-left 2 × 2 corner:

A = matrix(1:9, 3, 3)
A[1:2, 1:2]

     [,1] [,2]
[1,]    1    4
[2,]    2    5
#What does this do?
A[c(1,3),c(1,3)]
[,1] [,2]
[1,] 1 7
[2,] 3 9
#This?
A[c(1,2,1),c(2,3)]
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 4 7
If you have used matlab, you know you can say something like A[1,end] to denote the element of a matrix in row 1 and the last column. R does not have 'end'. To do the same in R, you do something like:
A=matrix(1:9, 3, 3)
A[1,ncol(A)]
[1] 7
#or
A[1,dim(A)[2]]
[1] 7
Warning: R will create vectors from subsetting matrices!
One of the really bad things that R does with matrices is create a vector if
you happen to subset a matrix to create a matrix with 1 row or 1 column.
Look at this:
A=matrix(1:9, 3, 3)
#take the first 2 rows
B=A[1:2,]
#everything is ok
dim(B)
[1] 2 3
class(B)
[1] "matrix"
#take the first row
B=A[1,]
#oh no! It should be a 1x3 matrix but it is not.
dim(B)
NULL
#It is not even a matrix any more
class(B)
[1] "integer"
#and what happens if we take the transpose?
#Oh no, it's a 1x3 matrix not a 3x1 (transpose of 1x3)
t(B)

     [,1] [,2] [,3]
[1,]    1    4    7

#and can we still multiply A by our "row" B?
A %*% B

     [,1]
[1,]   66
[2,]   78
[3,]   90

#It works? That is horrible!
This will create hard-to-find bugs in your code because you will look at B=A[1,] and everything looks fine. Why is R saying it is not a matrix? To stop R from doing this, use drop=FALSE.
B=A[1,,drop=FALSE]
#Now it is a matrix as it should be
dim(B)
[1] 1 3
class(B)
[1] "matrix"
#this fails as it should (alerting you to a problem!)
try(A%*%B)
Replace 1 element.
A=matrix(1, 3, 3)
A[1,1]=2
A
     [,1] [,2] [,3]
[1,]    2    1    1
[2,]    1    1    1
[3,]    1    1    1
Replace a row with all 2s or a string of values:

A = matrix(1, 3, 3)
A[1,] = 2
A

     [,1] [,2] [,3]
[1,]    2    2    2
[2,]    1    1    1
[3,]    1    1    1

A[1,] = 1:3
A

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    1    1
[3,]    1    1    1
1.5 Diagonal matrices and identity matrices

The diag() function can also be used to replace elements on the diagonal of a matrix:
A = matrix(3, 3, 3)
diag(A) = 1
A

     [,1] [,2] [,3]
[1,]    1    3    3
[2,]    3    1    3
[3,]    3    3    1

The diag() function can also be used to get the diagonal of a matrix:

A = matrix(1:9, 3, 3)
diag(A)

[1] 1 5 9
The identity matrix is a special kind of diagonal matrix with 1s on the diagonal. It is denoted $\mathbf{I}$. $\mathbf{I}_3$ would mean a 3 × 3 identity matrix. An identity matrix has the property that $\mathbf{A}\mathbf{I} = \mathbf{A}$ and $\mathbf{I}\mathbf{A} = \mathbf{A}$, so it is like multiplying by 1.
A = matrix(1:9, 3, 3)
I = diag(3) #shortcut for 3x3 identity matrix
A %*% I

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
1.6 Taking the inverse of a square matrix

The inverse of a matrix is denoted $\mathbf{A}^{-1}$. You can think of the inverse of a matrix like $1/a$: $1/a \times a = 1$ and $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. The inverse of a matrix does not always exist; for one thing, it has to be square. We'll be using inverses for variance-covariance matrices, and by definition (of a variance-covariance matrix) the inverse of those exists. In R, there are a couple of common ways to take the inverse of a variance-covariance matrix (or something with the same properties). solve() is probably the most common:
A = diag(3, 3) + matrix(1, 3, 3)
invA = solve(A)
invA %*% A
The Cholesky decomposition is a handy way to keep your variance-covariance matrices valid when doing a parameter search. Don't search over the raw variance-covariance matrix. Instead, search over a matrix where the lower triangle is 0 (that is what a Cholesky decomposition looks like); call it B. Your variance-covariance matrix is then t(B)%*%B.
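For example, a minimal sketch of this idea in R:

A = diag(3, 3) + matrix(1, 3, 3)
B = chol(A)   # upper triangular; the lower triangle is 0
t(B) %*% B    # recovers the original variance-covariance matrix A

The chol2inv(chol(A)) call below uses this same decomposition to take the inverse.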
A = diag(3, 3) + matrix(1, 3, 3)
invA = chol2inv(chol(A))
invA %*% A
1.7 Problems
1. Build a 4 × 3 matrix with the numbers 1 through 4 in each row.
2. Extract the elements in the 1st and 2nd rows and 1st and 2nd columns
(you’ll have a 2 × 2 matrix). Show the R code that will do this.
3. Build a 4 × 3 matrix with the numbers 1 through 12 by row (meaning
the first row will have the numbers 1 through 4 in it).
4. Extract the 3rd row of the above. Show R code to do this where you
end up with a vector and how to do this where you end up with a 1 × 3
matrix.
5. Build a 4 × 3 matrix that is all 1s except a 2 in the (2,3) element (2nd
row, 3rd column).
6. Take the transpose of the above.
7. Build a 4 × 4 diagonal matrix with 1 through 4 on the diagonal.
8. Build a 5 × 5 identity matrix.
9. Replace the diagonal in the above matrix with 2 (the number 2).
10. Build a matrix with 2 on the diagonal and 1s on the offdiagonals.
11. Take the inverse of the above.
12. Build a 3 × 3 matrix with the first 9 letters of the alphabet. First
column should be “a”, “b”, “c”. letters[1:9] gives you these letters.
13. Replace the diagonal of this matrix with the word “cat”.
14. Build a 4 × 3 matrix with all 1s. Multiply by a 3 × 4 matrix with all 2s.
15. If $\mathbf{A}$ is a 4 × 3 matrix, is $\mathbf{A}\mathbf{A}$ possible? Is $\mathbf{A}\mathbf{A}^\top$ possible? Show how to write $\mathbf{A}\mathbf{A}^\top$ in R.
16. In the equation $\mathbf{A}\mathbf{B} = \mathbf{C}$, let $\mathbf{A} = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}$. Build a $\mathbf{B}$ matrix with only 1s and 0s such that the values on the diagonal of $\mathbf{C}$ are 1, 8, 6 (in that order). Show your R code for $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{A}\mathbf{B}$.
17. Same $\mathbf{A}$ matrix as above and same equation $\mathbf{A}\mathbf{B} = \mathbf{C}$. Build a 3 × 3 $\mathbf{B}$ matrix such that $\mathbf{C} = 2\mathbf{A}$. So $\mathbf{C} = \begin{bmatrix} 2 & 8 & 14 \\ 4 & 10 & 16 \\ 6 & 12 & 18 \end{bmatrix}$. Hint: $\mathbf{B}$ is diagonal.
Chapter 2

Linear Regression
This chapter shows how to write linear regression models in matrix form.
The purpose is to get you comfortable writing multivariate linear models in
different matrix forms before we start working with time series versions of
these models. Each matrix form is an equivalent model for the data, but
written in different forms. You do not need to worry which form is better
or worse at this point. Simply get comfortable writing multivariate linear
models in different matrix forms.
A script with all the R code in the chapter can be downloaded here. The
Rmd file of this chapter can be downloaded here.
This chapter uses the stats, MARSS and datasets packages. Install those
packages, if needed, and load:
library(stats)
library(MARSS)
library(datasets)
We will work with the stackloss dataset available in the datasets package.
The dataset consists of 21 observations on the efficiency of a plant that oxidizes ammonia to nitric acid.
We will start by regressing stack loss against air flow. In R using the lm()
function this is
# the dat data.frame is defined on the first page of the
# chapter as the first 4 rows of stackloss
dat = stackloss[1:4, ]
lm(stack.loss ~ Air.Flow, data = dat)
We will write the model for all the measurements together in two different
ways, Form 1 and Form 2.
2.2 Matrix Form 1

The regression of stack loss against air flow for data point $i$ is

$$\text{stack.loss}_i = \alpha + \beta\,\text{air}_i + e_i \quad (2.1)$$

Written in matrix form (Form 1), the model for all four data points is

$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} 1 & \text{air}_1 \\ 1 & \text{air}_2 \\ 1 & \text{air}_3 \\ 1 & \text{air}_4 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} \quad (2.2)$$
You should work through the matrix algebra to make sure you understand
why Equation (2.2) is Equation (2.1) for all the i data points together.
We can write the first line of Equation (2.2) succinctly as
y = Zx + e (2.3)
where x are our parameters, y are our response variables, and Z are our
explanatory variables (with a 1 column for the intercept). The lm() function
uses Form 1, and we can recover the Z matrix for Form 1 by using the
model.matrix() function on the output from a lm() call:
fit = lm(stack.loss ~ Air.Flow, data = dat)
Z = model.matrix(fit)
Z[1:4, ]
(Intercept) Air.Flow
1 1 80
2 1 80
3 1 75
4 1 62
Note: You will not need to know how to solve linear matrix equations for this course. This section just shows you what the lm() function is doing to estimate the parameters.

Notice that $\mathbf{Z}$ is not a square matrix and its inverse does not exist, but the inverse of $\mathbf{Z}^\top\mathbf{Z}$ exists, if this is a solvable problem. We can go through the following steps to solve for $\mathbf{x}$, our parameters $\alpha$ and $\beta$ (dropping the error term, since least squares treats its expectation as zero):

$$\begin{aligned} \mathbf{y} &= \mathbf{Z}\mathbf{x} \\ \mathbf{Z}^\top\mathbf{y} &= \mathbf{Z}^\top\mathbf{Z}\,\mathbf{x} \\ (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y} &= \mathbf{x} \end{aligned}$$
Let's assume our errors, the $\mathbf{e}$, are i.i.d., which means that

$$\mathbf{e} \sim \text{MVN}\left(0,\; \begin{bmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}\right)$$
Let’s try that with R and compare to what you get with lm():
y = matrix(dat$stack.loss, ncol = 1)
Z = cbind(1, dat$Air.Flow) #or use model.matrix() to get Z
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
[1,] -11.6159170
[2,] 0.6412918
coef(fit)

(Intercept)    Air.Flow 
-11.6159170   0.6412918 
As you see, you get the same values.
We can solve for x just like before and compare to what we get with lm():
y = matrix(dat$stack.loss, ncol = 1)
Z = cbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
# or Z=model.matrix(fit2)
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
[1,] -524.904762
[2,] -1.047619
[3,] 7.619048
[4,] 5.000000
fit1.mult = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = dat)
coef(fit1.mult)

(Intercept)    Air.Flow  Water.Temp  Acid.Conc. 
-524.904762   -1.047619    7.619048    5.000000 
This form of writing a regression model will come up when you work with dynamic linear models (DLMs). With DLMs, you will be fitting models of the form $\mathbf{y}_t = \mathbf{Z}_t\mathbf{x}_t + \mathbf{e}_t$. In these models you have multiple $\mathbf{y}$ at regular time points and you allow your regression parameters, the $\mathbf{x}$, to evolve through time as a random walk.
This is just the transpose of Form 1. Work through the matrix algebra to make sure you understand why Equation (2.6) is Equation (2.1) for all the i data points together and why it is equal to the transpose of Equation (2.2). You'll need the relationship $(\mathbf{A}\mathbf{B})^\top = \mathbf{B}^\top\mathbf{A}^\top$.

Let's write Equation (2.6) as $\mathbf{y} = \mathbf{D}\mathbf{d}$, where $\mathbf{D}$ contains our parameters. Then we can solve for $\mathbf{D}$ following the steps in Section 2.2.1 but multiplying from the right instead of from the left. Work through the steps to show that $\mathbf{D} = \mathbf{y}\mathbf{d}^\top(\mathbf{d}\mathbf{d}^\top)^{-1}$.
y = matrix(dat$stack.loss, nrow = 1)
d = rbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
y %*% t(d) %*% solve(d %*% t(d))
2.3 Matrix Form 2

In this form, we have the explanatory variables in a matrix on the right of our parameter matrix as in Form 1b but we arrange everything a little differently:

$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} \alpha & \beta & 0 & 0 & 0 \\ \alpha & 0 & \beta & 0 & 0 \\ \alpha & 0 & 0 & \beta & 0 \\ \alpha & 0 & 0 & 0 & \beta \end{bmatrix} \begin{bmatrix} 1 \\ \text{air}_1 \\ \text{air}_2 \\ \text{air}_3 \\ \text{air}_4 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} \quad (2.7)$$
Work through the matrix algebra to make sure you understand why Equation
(2.7) is the same as Equation (2.1) for all the i data points together.
We will write Form 2 succinctly as
y = Zx + e (2.8)
Add columns to the $\mathbf{Z}$ matrix for each new explanatory variable:

$$\mathbf{Z} = \begin{bmatrix} \alpha & \beta_1 & 0 & 0 & 0 & \beta_2 & 0 & 0 & 0 & \beta_3 & 0 & 0 & 0 \\ \alpha & 0 & \beta_1 & 0 & 0 & 0 & \beta_2 & 0 & 0 & 0 & \beta_3 & 0 & 0 \\ \alpha & 0 & 0 & \beta_1 & 0 & 0 & 0 & \beta_2 & 0 & 0 & 0 & \beta_3 & 0 \\ \alpha & 0 & 0 & 0 & \beta_1 & 0 & 0 & 0 & \beta_2 & 0 & 0 & 0 & \beta_3 \end{bmatrix} \quad (2.10)$$

The corresponding $\mathbf{x}$ vector stacks a 1 followed by the four values of each explanatory variable in turn (ending with $\text{acid}_4$).
The number of rows of Z is always n, the number of rows of y, because the
number of rows on the left and right of the equal sign must match. The
number of columns in Z is determined by the size of x. If there is an intercept,
there is a 1 in x. Then each explanatory variable (like air flow and wind)
appears n times. So if the number of explanatory variables is k, the number
of columns in Z is 1 + k × n if there is an intercept term and k × n if there is
not.
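For example, a sketch of building such a Z in R for n = 4 data points and k = 2 explanatory variables (so 1 + 2 × 4 = 9 columns), using the list-matrix approach used later in this chapter; the names alpha, beta1 and beta2 here are illustrative:

n = 4
# list matrix so we can mix 0s and parameter names
Z = matrix(list(0), n, 1 + 2 * n)
Z[, 1] = "alpha"
diag(Z[1:n, 1 + 1:n]) = "beta1"       # columns for the 1st variable
diag(Z[1:n, 1 + n + 1:n]) = "beta2"   # columns for the 2nd variable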
Form 2 is similar to how multivariate time series models are typically written
for reading by humans (on a whiteboard or paper). In these models, we see
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}_t = \begin{bmatrix} \beta_a & \beta_b \\ \beta_a & 0.1 \\ \beta_b & \beta_a \\ 0 & \beta_a \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix}_t \quad (2.11)$$
You can just skim this section if you want, but make sure you carefully look at the code in Section 2.3.4. You will need to adapt that code for the homework.
Though you will not need any of the math discussed here for the course, this
section will help you practice matrix multiplication and will introduce you to
‘permutation’ matrices which will be handy in many other contexts.
To solve for $\alpha$ and $\beta$, we need our parameters in a column matrix like $\begin{bmatrix}\alpha \\ \beta\end{bmatrix}$. We do this by rewriting $\mathbf{Z}\mathbf{x}$ in Equation (2.8) in 'vec' form: if $\mathbf{Z}$ is a $n \times m$ matrix and $\mathbf{x}$ is a matrix with 1 column and $m$ rows, then $\mathbf{Z}\mathbf{x} = (\mathbf{x}^\top \otimes \mathbf{I}_n)\,\text{vec}(\mathbf{Z})$. The symbol $\otimes$ means Kronecker product and just ignore it since you'll never see it again in our course (or google 'kronecker product' if you are curious). The "vec" of a matrix is that matrix rearranged as a single column:
$$\text{vec}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 4 \end{bmatrix}$$
Notice how you just take each column one by one and stack them under each
other. In R, the vec is
A = matrix(1:6, nrow = 2, byrow = TRUE)
vecA = matrix(A, ncol = 1)
$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \end{bmatrix} = \begin{bmatrix} \alpha & \beta & 0 & 0 \\ \alpha & 0 & \beta & 0 \\ \alpha & 0 & 0 & \beta \end{bmatrix} \begin{bmatrix} 1 \\ \text{air}_1 \\ \text{air}_2 \\ \text{air}_3 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} \quad (2.12)$$
as before.
In the homework, you will use the R code in this section to solve for the
parameters in Form 2. Later when you are fitting multivariate time series
models, you will not solve for parameters this way but you will need to both
construct Z matrices in R and read Z matrices. The homework will give you
practice creating Z matrices in R.
#make your y and x matrices
y=matrix(dat$stack.loss, ncol=1)
x=matrix(c(1,dat$Air.Flow),ncol=1)
#make the Z matrix
n=nrow(dat) #number of rows in our data file
k=1
#Z has n rows and 1 col for intercept, and n cols for the n air data points
#a list matrix allows us to combine "characters" and numbers
Z=matrix(list(0),n,k*n+1)
Z[,1]="alpha"
diag(Z[1:n,1+1:n])="beta"
#this function creates that permutation matrix for you
P=MARSS:::convert.model.mat(Z)$free[,,1]
M=kronecker(t(x),diag(n))%*%P
solve(t(M)%*%M)%*%t(M)%*%y
[,1]
alpha -11.6159170
beta 0.6412918
coef(lm(dat$stack.loss ~ dat$Air.Flow))
(Intercept) dat$Air.Flow
-11.6159170 0.6412918
Go through this code line by line at the R command line. Look at Z. It is a list matrix that allows you to combine numbers (the 0s) with character strings (names of parameters). Look at the permutation matrix P. Try MARSS:::convert.model.mat(Z)$free and see that it returns a 3D matrix, which is why the [, , 1] appears (to get us a 2D matrix). To use more data points, you can redefine dat to say dat=stackloss to use all 21 data points.
Let’s say that the odd numbered plants are in the north and the even numbered
are in the south. We want to include this as a factor in our model that affects
the intercept. Let’s go back to just having air flow be our explanatory variable.
Now if the plant is in the north our model is
$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} \text{air}_1 & 1 & 0 \\ \text{air}_2 & 0 & 1 \\ \text{air}_3 & 1 & 0 \\ \text{air}_4 & 0 & 1 \end{bmatrix} \begin{bmatrix} \beta \\ \alpha_n \\ \alpha_s \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} \quad (2.18)$$
Notice that odd plants get αn and even plants get αs . Use model.matrix()
to see that this is the Z matrix that lm() formed. Notice the matrix output
by model.matrix() looks exactly like Z in Equation (2.18).
dat = cbind(dat, reg = rep(c("n", "s"), 2))
fit2 = lm(stack.loss ~ -1 + Air.Flow + reg, data = dat)
Z = model.matrix(fit2)
Z[1:4, ]

  Air.Flow regn regs
1       80    1    0
2       80    0    1
3       75    1    0
4       62    0    1
Or just use model.matrix(). This will save time when models are more complex. To solve for the parameters, use Z as before:

y = matrix(dat$stack.loss, ncol = 1)
solve(t(Z) %*% Z) %*% t(Z) %*% y

                 [,1]
Air.Flow    0.5358166
regn       -2.0257880
regs       -5.5429799
Compare to the output from lm() and you will see it is the same.
coef(fit2)

  Air.Flow       regn       regs 
 0.5358166 -2.0257880 -5.5429799 
y = matrix(dat$stack.loss, ncol = 1)
x = matrix(c(1, dat$Air.Flow), ncol = 1)
n = nrow(dat)
k = 1
# list matrix allows us to combine numbers and character
# strings
Z = matrix(list(0), n, k * n + 1)
Z[seq(1, n, 2), 1] = "alphanorth"
Z[seq(2, n, 2), 1] = "alphasouth"
diag(Z[1:n, 1 + 1:n]) = "beta"
P = MARSS:::convert.model.mat(Z)$free[, , 1]
M = kronecker(t(x), diag(n)) %*% P
solve(t(M) %*% M) %*% t(M) %*% y
[,1]
alphanorth -2.0257880
alphasouth -5.5429799
beta 0.5358166
Make sure you understand the code used to form the Z matrix. Also notice
that class(Z[1,3])="numeric" while class(Z[1,2])="character". This
is important. 0 in R is a number while "0" would be a character (the name
of a parameter).
The air data have been written to the right of the 1s and 0s for north/south
intercepts because that is how lm() writes this model in Form 1 and I want
to duplicate that (for teaching purposes). Also the β’s are ordered to be
alphabetical because lm() writes the Z matrix like that.
2.5 Groups of β's

Now our model is more complicated and using model.matrix() to get our Z saves us a lot of tedious matrix building. Add an owner factor to dat (alternating "s", "a" across plants, to match the β estimates below) and fit:

dat$owner = rep(c("s", "a"), 2)
fit3 = lm(stack.loss ~ -1 + Air.Flow:owner + reg, data = dat)
Z = model.matrix(fit3)
Z[1:4, ]

  regn regs Air.Flow:ownera Air.Flow:owners
1    1    0               0              80
2    0    1              80               0
3    1    0               0              75
4    0    1              62               0
Solve for the parameters as before:

y = matrix(dat$stack.loss, ncol = 1)
solve(t(Z) %*% Z) %*% t(Z) %*% y

                 [,1]
regn            -38.0
regs             -3.0
Air.Flow:ownera   0.5
Air.Flow:owners   1.0
Compare to the output from lm() and you will see it is the same.
To write this model in Form 2, we just add subscripts to the β's in our Form 2 Z matrix:

$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} \alpha_n & \beta_s & 0 & 0 & 0 \\ \alpha_s & 0 & \beta_a & 0 & 0 \\ \alpha_n & 0 & 0 & \beta_s & 0 \\ \alpha_s & 0 & 0 & 0 & \beta_a \end{bmatrix} \begin{bmatrix} 1 \\ \text{air}_1 \\ \text{air}_2 \\ \text{air}_3 \\ \text{air}_4 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} = \mathbf{Z}\mathbf{x} + \mathbf{e} \quad (2.23)$$
To estimate the parameters, we change the β’s in our Z list matrix to have
owner designations:
y = matrix(dat$stack.loss, ncol = 1)
x = matrix(c(1, dat$Air.Flow), ncol = 1)
n = nrow(dat)
k = 1
Z = matrix(list(0), n, k * n + 1)
Z[seq(1, n, 2), 1] = "alpha.n"
Z[seq(2, n, 2), 1] = "alpha.s"
diag(Z[1:n, 1 + 1:n]) = rep(c("beta.s", "beta.a"), n)[1:n]
P = MARSS:::convert.model.mat(Z)$free[, , 1]
M = kronecker(t(x), diag(n)) %*% P
solve(t(M) %*% M) %*% t(M) %*% y
[,1]
alpha.n -38.0
alpha.s -3.0
beta.s 1.0
beta.a 0.5
The parameter estimates are the same, though the β's are given in reversed order simply due to the way convert.model.mat() orders the columns in Form 2's Z.
2.6 Seasonal effect as a factor

Let's imagine that our 4 data points were taken in 4 different quarters (seasons) of the year: the first in quarter 1, the second in quarter 2, etc. We add a column to our dataframe to account for season:

dat$qtr = paste("qtr", 1:4, sep = "")
Remembering that lm() puts models in Form 1, look at the Z matrix for
Form 1:
fit4 = lm(stack.loss ~ -1 + qtr, data = dat)
Z = model.matrix(fit4)
Z[1:4, ]
  qtrqtr1 qtrqtr2 qtrqtr3 qtrqtr4
1       1       0       0       0
2       0       1       0       0
3       0       0       1       0
4       0       0       0       1
$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} = \mathbf{Z}\mathbf{x} + \mathbf{e} \quad (2.26)$$
Compare to the model that lm() is using when the intercept is included. What does this model look like written in matrix form?

fit5 = lm(stack.loss ~ qtr, data = dat)
Z = model.matrix(fit5)
Z[1:4, ]

  (Intercept) qtrqtr2 qtrqtr3 qtrqtr4
1           1       0       0       0
2           1       1       0       0
3           1       0       1       0
4           1       0       0       1
In Form 2, we can write the seasonal model as

$$\begin{bmatrix} \text{stack.loss}_1 \\ \text{stack.loss}_2 \\ \text{stack.loss}_3 \\ \text{stack.loss}_4 \end{bmatrix} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} = \mathbf{Z}\mathbf{x} + \mathbf{e} \quad (2.27)$$
2.7 Seasonal effect plus other explanatory variables*
With our 4 data points, we are limited to estimating 4 parameters. Let’s use
the full 21 data points so we can estimate some more complex models. We’ll
add an owner variable and a quarter variable to the stackloss dataset.
data(stackloss, package = "datasets")
fulldat = stackloss
n = nrow(fulldat)
fulldat = cbind(fulldat, owner = rep(c("sue", "aneesh", "joe"),
n)[1:n], qtr = paste("qtr", rep(1:4, n)[1:n], sep = ""),
reg = rep(c("n", "s"), n)[1:n])
Let’s fit a model where there is only an effect of air flow, but that effect varies
by owner and by quarter. We also want a different intercept for each quarter.
So if datapoint i is from quarter j on a plant owned by owner k, the model is

$$\text{stack.loss}_i = \alpha_j + \beta_{j,k}\,\text{air}_i + e_i \quad (2.28)$$

So there are 4 × 3 β's (4 quarters and 3 owners) and 4 α's (4 quarters).
With lm(), we fit the model as:
fit7 = lm(stack.loss ~ -1 + qtr + Air.Flow:qtr:owner, data = fulldat)
Take a look at Z for Form 1 using model.matrix(). It's not shown here since it is large:
model.matrix(fit7)
The x will be

$$\mathbf{x} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \beta_{1,a} \\ \beta_{2,a} \\ \beta_{3,a} \\ \vdots \end{bmatrix}$$
Take a look at the model matrix that lm() is using and make sure you
understand how Zx produces Equation (2.28).
Z = model.matrix(fit7)
For Form 2, our Z size doesn't change; the number of rows is n (the number of data points) and the number of columns is 1 (for the intercept) plus the number of explanatory variables times n. So in this case, we only have one explanatory variable (air flow), so Z has 1+21 columns. To allow the intercept to vary by quarter, we use $\alpha_1$ in the rows of Z where the data are from quarter 1, $\alpha_2$ where the data are from quarter 2, etc. Similarly we use the appropriate $\beta_{j,k}$ depending on the quarter and owner for that data point.
We could construct Z, x and y for Form 2 using
y=matrix(fulldat$stack.loss, ncol=1)
x=matrix(c(1,fulldat$Air.Flow),ncol=1)
n=nrow(fulldat)
k=1
Z=matrix(list(0),n,k*n+1)
#give the intercepts names based on qtr
Z[,1]=paste(fulldat$qtr)
#give the betas names based on qtr and owner
diag(Z[1:n,1+1:n])=paste("beta",fulldat$qtr,fulldat$owner,sep=".")
P=MARSS:::convert.model.mat(Z)$free[,,1]
M=kronecker(t(x),diag(n))%*%P
solve(t(M)%*%M)%*%t(M)%*%y
Note, the estimates are the same as for lm() but are not listed in the same
order.
Make sure to look at the Z and x for the models and that you understand
why they look like they do.
2.8 Models with confounded parameters*

Try adding region as another factor in your model along with quarter and fit with lm():
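A sketch of that fit (the formula mirrors the reg2 version shown just below; the coefficient output is omitted):

fit = lm(stack.loss ~ Air.Flow + reg + qtr, data = fulldat)
coef(fit)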
But why is the estimate for quarter 4 equal to NA? What if the ordering of
north and south regions was different, say 1 through 4 north, 5 through 8
south, 9 through 12 north, etc?
fulldat2 = fulldat
fulldat2$reg2 = rep(c("n", "n", "n", "n", "s", "s", "s", "s"),
3)[1:21]
fit = lm(stack.loss ~ Air.Flow + reg2 + qtr, data = fulldat2)
coef(fit)
The estimates for the confounded parameters are meaningless. So you will
need to think carefully about the model you are fitting and consider if there
are multiple parameters measuring the same thing (for example 2 intercept
parameters).
2.9 Problems
For the homework questions, we will be using part of the airquality data set in R. Load it as
data(airquality, package="datasets")
#remove any rows with NAs
airquality=na.omit(airquality)
#make Month a factor (i.e., the Month number is a name rather than a number)
airquality$Month=as.factor(airquality$Month)
#add a region factor
airquality$region = rep(c("north","south"),60)[1:111]
#Only use 5 data points for the homework so you can show the matrices easily
homeworkdat = airquality[1:5,]
c. Solve for the parameters using the code from subsection 2.3.4.
5. A model of the ozone data with only a region (north/south) effect can
be written:
fit = lm(Ozone ~ -1 + region, data = homeworkdat)
Chapter 3

Introduction to Time Series

A script with all the R code in the chapter can be downloaded here. The Rmd for this chapter can be downloaded here.
A time series is a set of observations taken sequentially in time,

$$\{x_1, x_2, x_3, \dots, x_n\}.$$

For example,

$$\{10, 31, 27, 42, 53, 15\}.$$

It can be further classified.
3.4 What is a time series model?
We use a time series model to analyze time series data. A time series model for $\{x_t\}$ is a specification of the joint distributions of a sequence of random variables $\{X_t\}$, of which $\{x_t\}$ is thought to be a realization.
3.6 Classical decomposition
The classical decomposition separates a time series into three components:

1. trend ($m_t$)
2. seasonal effect ($s_t$)
3. remainder ($e_t$)

$$x_t = m_t + s_t + e_t$$
We need a way to extract the so-called signal. One common method is via "linear filters"

$$m_t = \sum_{i=-\infty}^{\infty} \lambda_i\, x_{t+i}$$

For example, a moving average over $2a + 1$ points uses

$$m_t = \sum_{i=-a}^{a} \frac{1}{2a+1}\, x_{t+i}$$

If $a = 1$, then

$$m_t = \frac{1}{3}\,(x_{t-1} + x_t + x_{t+1})$$
Once the trend is estimated, the seasonal effect is estimated by subtraction:

$$s_t = x_t - m_t$$
Figure 3.6: Monthly airline passengers from 1949-1960 with a low filter.
Figure 3.7: Monthly airline passengers from 1949-1960 with a medium filter.
Figure 3.8: Monthly airline passengers from 1949-1960 with a high filter.
This is the seasonal effect ($s_t$), assuming $\lambda = 1/9$, but $s_t$ includes the remainder $e_t$ as well. Instead we can estimate the mean seasonal effect ($\bar{s}_t$).
## xx is the monthly airline passenger series (AirPassengers)
xx <- AirPassengers
seas_2 <- decompose(xx)$seasonal
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(seas_2, las = 1, ylab = "")
$$e_t = x_t - m_t - s_t$$
ee <- decompose(xx)$random
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(ee, las = 1, ylab = "")
Let’s repeat the decomposition with the log of the airline data.
lx <- log(AirPassengers)
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(lx, las = 1, ylab = "")
## pp is the estimated trend of the log data (not shown here)
le <- lx - pp - seas_2
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(le, las = 1, ylab = "")
Chapter 4

Basic ts functions in R
This chapter introduces you to some of the basic functions in R for plotting and analyzing univariate time series data. Many of the things you learn here will be relevant when we start examining multivariate time series as well. We will begin with the creation and plotting of time series objects in R, and then move on to decomposition, differencing, and correlation (e.g., ACF, PACF) before ending with fitting and simulation of ARMA models.
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
This chapter uses the stats package, which is often loaded by default when
you start R, the MARSS package and the forecast package. The problems
use a dataset in the datasets package. After installing the packages, if
needed, load:
library(stats)
library(MARSS)
library(forecast)
library(datasets)
The chapter uses data sets which are in the atsalibrary package. If needed,
install using the devtools package.
library(devtools)
devtools::install_github("nwfsc-timeseries/atsalibrary")
The main one is a time series of the atmospheric concentration of CO2 collected at the Mauna Loa Observatory in Hawai'i (MLCO2). The second is Northern Hemisphere land and ocean temperature anomalies from NOAA (NHTemp). The problems use a data set on hourly phytoplankton counts (hourlyphyto). Use ?MLCO2, ?NHTemp and ?hourlyphyto for information on these datasets.
Load the data.
data(NHTemp, package = "atsalibrary")
Temp <- NHTemp
data(MLCO2, package = "atsalibrary")
CO2 <- MLCO2
data(hourlyphyto, package = "atsalibrary")
pDat <- hourlyphyto
4.1 Time series plots

Time series plots are an excellent way to begin the process of understanding
what sort of process might have generated the data of interest. Traditionally,
time series have been plotted with the observed data on the y-axis and time on
the x-axis. Sequential time points are usually connected with some form of line,
but sometimes other plot forms can be a useful way of conveying important
information in the time series (e.g., barplots of sea-surface temperature
anomalies show nicely the contrasting El Niño and La Niña phenomena).
The CO2 data are stored in R as a data.frame object, but we would like
to transform the class to a more user-friendly format for dealing with time
series. Fortunately, the ts() function will do just that, and return an object
of class ts as well. In addition to the data themselves, we need to provide
ts() with 2 pieces of information about the time index for the data.
The first, frequency, is a bit of a misnomer because it does not really refer
to the number of cycles per unit time, but rather the number of observa-
tions/samples per cycle. So, for example, if the data were collected each hour
of a day then frequency=24.
The second, start, specifies the first sample in terms of (day, hour), (year,
month), etc. So, for example, if the data were collected monthly beginning
in November of 1969, then frequency=12 and start=c(1969,11). If the
data were collected annually, then you simply specify start as a scalar (e.g.,
start=1991) and omit frequency (i.e., R will set frequency=1 by default).
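For example, a minimal sketch with made-up monthly data beginning in November of 1969 (the data here are random placeholders):

## 24 months of example data starting in Nov 1969
yy <- ts(rnorm(24), frequency = 12, start = c(1969, 11))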
The Mauna Loa time series is collected monthly and begins in March of 1958,
which we can get from the data themselves, and then pass to ts().
## create a time series (ts) object from the CO2 data
co2 <- ts(data = CO2$ppm, frequency = 12, start = c(CO2[1, "year"],
CO2[1, "month"]))
Now let’s plot the data using plot.ts(), which is designed specifically for ts
objects like the one we just created above. It’s nice because we don’t need to
specify any x-values as they are taken directly from the ts object.
## plot the ts
plot.ts(co2, ylab = expression(paste("CO"[2], " (ppm)")))
Figure 4.1: Time series of the atmospheric CO2 concentration at Mauna Loa,
Hawai’i measured monthly from March 1958 to present.
Examination of the plotted time series (Figure 4.1) shows 2 obvious features: an increase in the mean over time and strong seasonal oscillations.
Before we examine the CO2 data further, however, let’s see a quick example
of how you can combine and plot multiple time series together. We’ll use the
data on monthly mean temperature anomalies for the Northern Hemisphere
(Temp). First convert Temp to a ts object.
temp.ts <- ts(data = Temp$Value, frequency = 12, start = c(1880,
1))
Before we can plot the two time series together, however, we need to line up
their time indices because the temperature data start in January of 1880,
but the CO2 data start in March of 1958. Fortunately, the ts.intersect()
function makes this really easy once the data have been transformed to ts
objects by trimming the data to a common time frame. Also, ts.union()
works in a similar fashion, but it pads one or both series with the appropriate
number of NA’s. Let’s try both.
## intersection (only overlapping times)
datI <- ts.intersect(co2, temp.ts)
## dimensions of common-time data
dim(datI)
[1] 682 2
## union (all times)
datU <- ts.union(co2, temp.ts)
## dimensions of all-time data
dim(datU)
[1] 1647 2
As you can see, the intersection of the two data sets is much smaller than
the union. If you compare them, you will see that the first 938 rows of datU
contains NA in the co2 column.
It turns out that the regular plot() function in R is smart enough to recognize
a ts object and use the information contained therein appropriately. Here’s
how to plot the intersection of the two time series together with the y-axes
on alternate sides (results are shown in Figure 4.2):
## plot the ts
plot(datI, main = "", yax.flip = TRUE)
Figure 4.2: Time series of the atmospheric CO2 concentration at Mauna Loa,
Hawai’i (top) and the mean temperature index for the Northern Hemisphere
(bottom) measured monthly from March 1958 to present.
4.2 Decomposition of time series

Plotting time series data is an important first step in analyzing their various
components. Beyond that, however, we need a more formal means for
identifying and removing characteristics such as a trend or seasonal variation.
As discussed in lecture, the decomposition model reduces a time series into 3 components:

$$x_t = m_t + s_t + e_t, \quad (4.1)$$

where $m_t$ is the trend, $s_t$ is the seasonal effect, and $e_t$ is a random error.
In lecture we discussed how linear filters are a common way to estimate trends
in time series. One of the most common linear filters is the moving average,
which for time lags from −a to a is defined as
$$\hat{m}_t = \sum_{k=-a}^{a} \frac{1}{1+2a}\, x_{t+k}. \quad (4.2)$$
This model works well for moving windows of odd-numbered lengths, but should be adjusted for even-numbered lengths by adding only $\tfrac{1}{2}$ of the 2 most extreme lags so that the filtered value at time t lines up with the original observation at time t. So, for example, in a case with monthly data such as the atmospheric CO2 concentration where a 12-point moving average would be an obvious choice, the linear filter would be

$$\hat{m}_t = \frac{\tfrac{1}{2}x_{t-6} + x_{t-5} + \dots + x_{t-1} + x_t + x_{t+1} + \dots + x_{t+5} + \tfrac{1}{2}x_{t+6}}{12} \quad (4.3)$$
It is important to note here that our time series of the estimated trend {m̂t }
is actually shorter than the observed time series by 2a units.
Conveniently, R has the built-in function filter() for estimating moving-average (and other) linear filters. In addition to specifying the time series to be filtered, we need to pass in the filter weights (see ?filter for the other options).
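The weights vector fltr used below follows Equation (4.3):

## weights for the 12-point moving average in Equation (4.3)
fltr <- c(1/2, rep(1, times = 11), 1/2)/12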
Now let's get our estimate of the trend {m̂t} with filter() and plot it:
## estimate of trend
co2.trend <- filter(co2, filter = fltr, method = "convo", sides = 2)
## plot the trend
plot.ts(co2.trend, ylab = "Trend", cex = 1)
The trend is a more-or-less smoothly increasing function over time, the average
slope of which does indeed appear to be increasing over time as well (Figure
4.3).
Figure 4.3: Time series of the estimated trend {m̂t } for the atmospheric CO2
concentration at Mauna Loa, Hawai’i.
Once we have an estimate of the trend for time t ($\hat{m}_t$) we can easily obtain an estimate of the seasonal effect at time t ($\hat{s}_t$) by subtraction

$$\hat{s}_t = x_t - \hat{m}_t, \quad (4.4)$$

which is easy to do in R:

## subtract the estimated trend
co2.1T <- co2 - co2.trend

This estimate of the seasonal effect for each time t also contains the random error $e_t$, however, which can be seen by plotting the time series and careful comparison of Equations (4.1) and (4.4).
## plot the monthly seasonal effects
plot.ts(co2.1T, ylab = "Seasonal effect", xlab = "Month", cex = 1)
Figure 4.4: Time series of seasonal effects plus random errors for the atmo-
spheric CO2 concentration at Mauna Loa, Hawai’i, measured monthly from
March 1958 to present.
We can obtain the overall seasonal effect by averaging the estimates of {ŝt }
for each month and repeating this sequence over all years.
## length of ts
ll <- length(co2.1T)
## frequency (ie, 12)
ff <- frequency(co2.1T)
## number of periods (years); %/% is integer division
periods <- ll%/%ff
## index of cumulative month
index <- seq(1, ll, by = ff) - 1
## get mean by month
mm <- numeric(ff)
for (i in 1:ff) {
mm[i] <- mean(co2.1T[index + i], na.rm = TRUE)
}
## subtract mean to make overall mean=0
mm <- mm - mean(mm)
Before we create the entire time series of seasonal effects, let’s plot them for
each month to see what is happening within a year:
## plot the monthly seasonal effects
plot.ts(mm, ylab = "Seasonal effect", xlab = "Month", cex = 1)
Figure 4.5: Estimated monthly seasonal effects for the atmospheric CO2
concentration at Mauna Loa, Hawai’i.
Finally, let’s create the entire time series of seasonal effects {ŝt }:
## create ts object for season
co2.seas <- ts(rep(mm, periods + 1)[seq(ll)], start = start(co2.1T),
frequency = ff)
The last step in completing our full decomposition model is obtaining the random errors {$\hat{e}_t$}, which we can get via simple subtraction

$$\hat{e}_t = x_t - \hat{m}_t - \hat{s}_t. \quad (4.5)$$

In R:

## subtract the trend and seasonal effects
co2.err <- co2 - co2.trend - co2.seas
Now that we have all 3 of our model components, let’s plot them together
with the observed data {xt }. The results are shown in Figure 4.6.
## plot the obs ts, trend & seasonal effect
plot(cbind(co2, co2.trend, co2.seas, co2.err), main = "", yax.flip = TRUE)
Now that we have seen how to estimate and plot the various components of a
classical decomposition model in a piecewise manner, let’s see how to do this
in one step in R with the function decompose(), which accepts a ts object
as input and returns an object of class decomposed.ts.
## decomposition of CO2 data
co2.decomp <- decompose(co2)
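Plotting the returned object shows all of the estimated components together (a minimal sketch; this is what produces Figure 4.7):

## plot the components estimated by decompose()
plot(co2.decomp)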
Figure 4.6: Time series of the observed CO2 concentration (co2) together with the estimated trend (co2.trend), seasonal effects (co2.seas), and random errors (co2.err).
Figure 4.7: Observed CO2 time series along with the trend, seasonal, and random components estimated with decompose().
The results obtained with decompose() (Figure 4.7) are identical to those
we estimated previously.
Another nice feature of the decompose() function is that it can be used for
decomposition models with multiplicative (i.e., non-additive) errors (e.g., if
the original time series had a seasonal amplitude that increased with time).
4.3 Differencing to remove a trend or seasonal effects
Recall that the first difference is $\nabla x_t = x_t - x_{t-1} = (1 - \mathbf{B})x_t$, where $\mathbf{B}$ is the backshift operator. Applying the difference operator d times gives

$$\nabla^d x_t = (1 - \mathbf{B})^d x_t, \quad (4.7)$$
So, for example, a random walk is one of the simplest and most widely used time series models, but it is not stationary. We can write a random walk model as

$$x_t = x_{t-1} + w_t, \quad (4.8)$$

where $w_t$ is Gaussian white noise.
Applying the difference operator to Equation (4.8) will yield a time series of Gaussian white noise errors $\{w_t\}$:

$$\begin{aligned} \nabla(x_t &= x_{t-1} + w_t) \\ x_t - x_{t-1} &= x_{t-1} - x_{t-1} + w_t \\ x_t - x_{t-1} &= w_t \end{aligned} \quad (4.9)$$
In R we can use the diff() function for differencing a time series, which
requires 3 arguments: x (the data), lag (the lag at which to difference), and
differences (the order of differencing; d in Equation (4.7)). For example,
first-differencing a time series will remove a linear trend (i.e., differences=1);
twice-differencing will remove a quadratic trend (i.e., differences=2). In
addition, first-differencing a time series at a lag equal to the period will
remove a seasonal trend (e.g., set lag=12 for monthly data).
Let’s use diff() to remove the trend and seasonal signal from the CO2 time
series, beginning with the trend. Close inspection of Figure 4.1 would suggest
that there is a nonlinear increase in CO2 concentration over time, so we’ll set
differences=2):
## twice-difference the CO2 data
co2.D2 <- diff(co2, differences = 2)
## plot the differenced data
plot(co2.D2, ylab = expression(paste(nabla^2, "CO"[2])))
Figure 4.8: Time series of the twice-differenced CO2 data.
We were apparently successful in removing the trend, but the seasonal effect
still appears obvious (Figure 4.8). Therefore, let’s go ahead and difference
that series at lag-12 because our data were collected monthly.
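A sketch of that lag-12 differencing (the name co2.D2D12 is illustrative):

## difference the differenced CO2 data at lag-12
co2.D2D12 <- diff(co2.D2, lag = 12)
## plot the newly differenced data
plot(co2.D2D12, ylab = expression(paste(nabla, "(", nabla^2, "CO"[2], ")")))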
Figure 4.9: Time series of the lag-12 difference of the twice-differenced CO2 data.
Now we have a time series that appears to be random errors without any
obvious trend or seasonal components (Figure 4.9).
4.4 Correlation within and among time series

The sample autocovariance of $\{x_t\}$ at lag k is

$$c_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x}) \quad (4.10)$$
Note that the sample autocovariance of $\{x_t\}$ at lag 0, $c_0$, equals the sample variance of $\{x_t\}$ calculated with a denominator of n. The sample autocorrelation function (ACF) is defined as

$$r_k = \frac{c_k}{c_0} = \text{Cor}(x_t, x_{t+k}) \quad (4.11)$$
Recall also that an approximate 95% confidence interval on the ACF can be
estimated by
$$-\frac{1}{n} \pm \frac{2}{\sqrt{n}} \quad (4.12)$$
where n is the number of data points used in the calculation of the ACF.
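For example, a quick sketch of this interval in R for the CO2 series:

## approximate 95% CI for the ACF of the co2 series
nn <- length(co2)
-1/nn + c(-2, 2)/sqrt(nn)  # lower and upper bounds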
It is important to remember two things here. First, although the confidence
interval is commonly plotted and interpreted as a horizontal line over all time
lags, the interval itself actually grows as the lag increases because the number
of data points n used to estimate the correlation decreases by 1 for every
integer increase in lag. Second, care must be exercised when interpreting the
“significance” of the correlation at various lags because we should expect, a
priori, that approximately 1 out of every 20 correlations will be significant
based on chance alone.
We can use the acf() function in R to compute the sample ACF (note that adding the option type="covariance" will return the sample autocovariance (ACVF) instead of the ACF; type ?acf for details). Calling the function by itself will automatically produce a correlogram (i.e., a plot of the autocorrelation versus time lag). The argument lag.max allows you to set the number of positive and negative lags. Let's try it for the CO2 data.
## correlogram of the CO2 data
acf(co2, lag.max = 36)
Figure 4.10: Correlogram of the CO2 data obtained with the function acf().
Now we can assign the result of acf() to a variable and then use the infor-
mation contained therein to plot the correlogram with our new plot function.
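The definition of this plot.acf() function is an assumption here; a version consistent with the plot.pacf() function defined later in this chapter is:

## better ACF plot (a reconstruction; drops the lag-0 value)
plot.acf <- function(ACFobj) {
    rr <- ACFobj$acf[-1]
    kk <- length(rr)
    nn <- ACFobj$n.used
    plot(seq(kk), rr, type = "h", lwd = 2, yaxs = "i", xaxs = "i",
        ylim = c(floor(min(rr)), 1), xlim = c(0, kk + 1), xlab = "Lag",
        ylab = "Correlation", las = 1)
    ## approximate 95% CI's per Equation (4.12)
    abline(h = -1/nn + c(-2, 2)/sqrt(nn), lty = "dashed", col = "blue")
    abline(h = 0)
}

## assign the result of acf() to a variable
co2.acf <- acf(co2, lag.max = 36)
## plot the correlogram with the new function
plot.acf(co2.acf)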
Figure 4.11: Correlogram of the CO2 data drawn with plot.acf().
Notice that all of the relevant information is still there (Figure 4.11), but
now r0 = 1 is not plotted at lag-0 and the lags on the x-axis are displayed
correctly as integers.
Before we move on to the PACF, let's look at the ACF for some deterministic time series, which will help you identify interesting properties (e.g., trends, seasonal effects) in a stochastic time series, and account for them in time series models, an important topic in this course. First, let's look at a straight line.
## length of ts
nn <- 100
## create straight line
tt <- seq(nn)
## set up plot area
par(mfrow = c(1, 2))
## plot line
plot.ts(tt, ylab = expression(italic(x[t])))
## get ACF
line.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(line.acf)
The correlogram for a straight line is itself a linearly decreasing function over
time (Figure 4.12).
Now let’s examine the ACF for a sine wave and see what sort of pattern
88 CHAPTER 4. BASIC TS FUNCTIONS IN R
Figure 4.12: Time series plot of a straight line (left) and the correlogram of
its ACF (right).
arises.
## create sine wave
tt <- sin(2 * pi * seq(nn)/12)
## set up plot area
par(mfrow = c(1, 2))
## plot line
plot.ts(tt, ylab = expression(italic(x[t])))
## get ACF
sine.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(sine.acf)
Perhaps not surprisingly, the correlogram for a sine wave is itself a sine wave
whose amplitude decreases linearly over time (Figure 4.13).
Now let’s examine the ACF for a sine wave with a linear downward trend
and see what sort of patterns arise.
## create sine wave with trend
tt <- sin(2 * pi * seq(nn)/12) - seq(nn)/50
## set up plot area
par(mfrow = c(1, 2))
## plot line
plot.ts(tt, ylab = expression(italic(x[t])))
Figure 4.13: Time series plot of a discrete sine wave (left) and the correlogram
of its ACF (right).
## get ACF
sili.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(sili.acf)
Time Lag
Figure 4.14: Time series plot of a discrete sine wave (left) and the correlogram
of its ACF (right).
The correlogram for a sine wave with a trend is itself a nonsymmetrical sine
wave whose amplitude and center decrease over time (Figure 4.14).
As we have seen, the ACF is a powerful tool in time series analysis for
identifying important features in the data. As we will see later, the ACF is
also an important diagnostic tool for helping to select the proper order of p
and q in ARMA(p,q) models.
The partial autocorrelation function (PACF) measures the linear correlation of a series $\{x_t\}$ and a lagged version of itself $\{x_{t+k}\}$ with the linear dependence of the intermediate values removed:

$$f_{kk} = \begin{cases} \text{Cor}(x_1, x_0) = r_1 & \text{if } k = 1 \\ \text{Cor}(x_k - x_k^{k-1},\, x_0 - x_0^{k-1}) & \text{if } k \geq 2 \end{cases} \quad (4.13)$$

with

$$x_k^{k-1} = \beta_1 x_{k-1} + \beta_2 x_{k-2} + \dots + \beta_{k-1} x_1; \quad (4.14a)$$

$$x_0^{k-1} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{k-1} x_{k-1}. \quad (4.14b)$$
It’s easy to compute the PACF for a variable in R using the pacf() function,
which will automatically plot a correlogram when called by itself (similar to
acf()). Let’s look at the PACF for the CO2 data.
## PACF of the CO2 data
pacf(co2, lag.max = 36)
The default plot for PACF is a bit better than for ACF, but here is another
plotting function that might be useful.
## better PACF plot
plot.pacf <- function(PACFobj) {
    rr <- PACFobj$acf
    kk <- length(rr)
    nn <- PACFobj$n.used
    plot(seq(kk), rr, type = "h", lwd = 2, yaxs = "i", xaxs = "i",
        ylim = c(floor(min(rr)), 1), xlim = c(0, kk + 1), xlab = "Lag",
        ylab = "PACF", las = 1)
    ## approximate 95% CI's per Equation (4.12) and a zero line
    ## (these closing lines complete the truncated function body)
    abline(h = -1/nn + c(-2, 2)/sqrt(nn), lty = "dashed", col = "blue")
    abline(h = 0)
}
Figure 4.15: Correlogram of the PACF for the observed atmospheric CO2
concentration at Mauna Loa, Hawai’i obtained with the function pacf().
Notice in Figure 4.15 that the partial autocorrelation at lag-1 is very high
(it equals the ACF at lag-1), but the other values at lags > 1 are relatively
small, unlike what we saw for the ACF. We will discuss this in more detail
later on in this lab.
Notice also that the PACF plot again has real-valued indices for the time
lag, but it does not include any value for lag-0 because it is impossible to
remove any intermediate autocorrelation between t and t − k when k = 0, and
therefore the PACF does not exist at lag-0. If you would like, you can use the
plot.acf() function we defined above to plot the PACF estimates because
acf() and pacf() produce identical list structures (results not shown here).
## PACF of the CO2 data
co2.pacf <- pacf(co2)
## correlogram of the CO2 data
plot.acf(co2.pacf)
As with the ACF, we will see later on how the PACF can also be used to help
identify the appropriate order of p and q in ARMA(p,q) models.
The sample cross-covariance of $\{x_t\}$ and $\{y_t\}$ at lag k is defined analogously to the autocovariance,

$$g_k^{xy} = \frac{1}{n}\sum_{t=1}^{n-k}(y_t - \bar{y})(x_{t+k} - \bar{x}), \quad (4.15)$$
but now we are estimating the correlation between a variable y and a different time-shifted variable $x_{t+k}$. The sample cross-correlation function (CCF) is then defined analogously to the ACF, such that

$$r_k^{xy} = \frac{g_k^{xy}}{\text{SD}_x\,\text{SD}_y}; \quad (4.16)$$
$\text{SD}_x$ and $\text{SD}_y$ are the sample standard deviations of $\{x_t\}$ and $\{y_t\}$, respectively. It is important to re-iterate here that $r_k^{xy} \neq r_{-k}^{xy}$, but $r_k^{xy} = r_{-k}^{yx}$. Therefore, it is very important to pay particular attention to which variable you call y (i.e., the "response") and which you call x (i.e., the "predictor").
As with the ACF, an approximate 95% confidence interval on the CCF can
be estimated by
$$-\frac{1}{n} \pm \frac{2}{\sqrt{n}} \quad (4.17)$$
where n is the number of data points used in the calculation of the CCF, and
the same assumptions apply to its interpretation.
Computing the CCF in R is easy with the function ccf() and it works just like acf(). In fact, ccf() is just a "wrapper" function that calls acf(). As an example, let's examine the CCF between sunspot activity and the number of lynx trapped in Canada, as in the classic paper by Moran (1949).
To begin, let’s get the data, which are conveniently included in the datasets
package included as part of the base installation of R. Before calculating the
CCF, however, we need to find the matching years of data. Again, we’ll use
the ts.intersect() function.
## get the matching years of sunspot data
suns <- ts.intersect(lynx, sunspot.year)[, "sunspot.year"]
## get the matching lynx data
lynx <- ts.intersect(lynx, sunspot.year)[, "lynx"]
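The plotting and CCF calls behind Figures 4.16 and 4.17 would be along these lines (a sketch; note the log of lynx, per the caption of Figure 4.17):

## plot the two time series together (Figure 4.16)
plot(cbind(suns, lynx), yax.flip = TRUE, main = "")
## CCF of sunspot activity and log lynx trappings (Figure 4.17)
ccf(suns, log(lynx), ylab = "Cross-correlation")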
From Figures 4.16 and 4.17 it looks like lynx numbers are relatively low 3-5
years after high sunspot activity (i.e., significant correlation at lags of -3 to
-5).
Figure 4.16: Time series of sunspot activity (top) and lynx trappings in
Canada (bottom) from 1821-1934.
Figure 4.17: CCF for annual sunspot activity and the log of the number of
lynx trappings in Canada from 1821-1934.
4.5 White noise (WN)

If our time series model has done an adequate job of removing all of the serial autocorrelation in the time series with trends, seasonal effects, etc., then the model residuals ($e_t = y_t - \hat{y}_t$) will be a WN sequence with the following properties for its mean ($\bar{e}$), covariance ($c_k$), and autocorrelation ($r_k$):
$$\bar{e} = 0 \qquad c_k = \text{Cov}(e_t, e_{t+k}) = \begin{cases} q & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases} \qquad r_k = \text{Cor}(e_t, e_{t+k}) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases} \quad (4.18)$$
Here are plots of the time series. Notice that on one occasion the same number
was drawn twice in a row from the Poisson distribution, which is discrete.
That is virtually guaranteed to never happen with a continuous distribution.
Figure 4.18: Time series plots of simulated Gaussian (left) and Poisson (right)
white noise.
Now let’s examine the ACF for the 2 white noise series and see if there is, in
fact, zero autocorrelation for lags ≥ 1.
## set up plot region
par(mfrow = c(1, 2))
## plot normal variates with mean
acf(GWN, main = "", lag.max = 20)
## plot Poisson variates with mean
acf(PWN, main = "", lag.max = 20)
Interestingly, the $r_k$ are all greater than zero in absolute value although they are not statistically different from zero for lags 1-20. This is because we are dealing with a sample of the distributions rather than the entire population of all random variates. As an exercise, try setting n=1e6 instead of n=100 or n=50 in the calls above to generate the WN sequences and see what effect it has on the estimation of $r_k$. It is also important to remember, as we noted above, that approximately 1 out of every 20 correlations is expected to appear significant based on chance alone.
Figure 4.19: ACF’s for the simulated Gaussian (left) and Poisson (right)
white noise shown in Figure 4.18.
4.6 Random walks (RW)

The simple random walk model is written as

$$x_t = x_{t-1} + w_t, \quad (4.19)$$
and wt is a discrete white noise series where all values are independent and
identically distributed (IID) with a mean of zero. In practice, we will almost
always assume that the wt are Gaussian white noise, such that wt ∼ N(0, q).
We will see later that a random walk is a special case of an autoregressive
model.
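The simulation itself uses Equation (4.19) in a for loop (a sketch; the names xx and ww match their reuse with cumsum() below):

set.seed(123)
## length of time series
TT <- 100
## initialize {x_t} and {w_t}
xx <- ww <- rnorm(n = TT, mean = 0, sd = 1)
## compute values 2 through TT
for (t in 2:TT) {
    xx[t] <- xx[t - 1] + ww[t]
}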
Now let’s plot the simulated time series and its ACF.
## setup plot area
par(mfrow = c(1, 2))
## plot line
plot.ts(xx, ylab = expression(italic(x[t])))
## plot ACF
plot.acf(acf(xx, plot = FALSE))
Figure 4.20: Simulated time series of a random walk model (left) and its
associated ACF (right).
$$\begin{aligned} x_t &= x_{t-1} + w_t \\ x_{t-1} &= x_{t-2} + w_{t-1} \\ x_{t-2} &= x_{t-3} + w_{t-2} \\ &\;\,\vdots \end{aligned} \quad (4.20)$$
Therefore, if we substitute $x_{t-2} + w_{t-1}$ for $x_{t-1}$ in the first equation, and then $x_{t-3} + w_{t-2}$ for $x_{t-2}$, and so on in a recursive manner, we get

$$x_t = w_t + w_{t-1} + w_{t-2} + \cdots \quad (4.21)$$

In practice, however, the time series will not start an infinite time ago, but rather at some $t = 1$, in which case we can write
$$x_t = w_1 + w_2 + \cdots + w_t = \sum_{t=1}^{T} w_t. \quad (4.22)$$
From Equation (4.22) it is easy to see that the value of an RW process at time
step t is the sum of all the random errors up through time t. Therefore, in R
we can easily simulate a realization from an RW process using the cumsum(x)
function, which does cumulative summation of the vector x over its entire
length. If we use the same errors as before, we should get the same results.
## simulate RW
x2 <- cumsum(ww)
Figure 4.21: Time series of the same random walk model formulated as
Equation (4.19) and simulated via a for loop (left), and as Equation (4.22)
and simulated via cumsum() (right).
4.7 Autoregressive (AR) models

An autoregressive model of order p, or AR(p), is defined as

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t \quad (4.23)$$

where $\{w_t\}$ is a white noise sequence with zero mean and some variance $\sigma^2$. For our purposes we usually assume that $w_t \sim \text{N}(0, q)$. Note that the random walk in Equation (4.19) is a special case of an AR(1) model where $\phi_1 = 1$ and $\phi_k = 0$ for $k \geq 2$.
Let's begin by simulating some AR(1) models and comparing their behavior. First, let's choose models with contrasting AR coefficients. Recall that in order for an AR(1) model to be stationary, $|\phi| < 1$, so we'll try 0.1 and 0.9. We'll again set the random number seed so we will get the same answers.
set.seed(456)
## list description for AR(1) model with small coef
AR.sm <- list(order = c(1, 0, 0), ar = 0.1, sd = 0.1)
## list description for AR(1) model with large coef
AR.lg <- list(order = c(1, 0, 0), ar = 0.9, sd = 0.1)
## simulate AR(1)
AR1.sm <- arima.sim(n = 50, model = AR.sm)
AR1.lg <- arima.sim(n = 50, model = AR.lg)
Figure 4.22: Time series of simulated AR(1) processes with φ = 0.1 (left) and
φ = 0.9 (right).
What do you notice about the two plots in Figure 4.22? It looks like the time
series with the smaller AR coefficient is more “choppy” and seems to stay
closer to 0 whereas the time series with the larger AR coefficient appears to
wander around more. Remember that as the coefficient in an AR(1) model
goes to 0, the model approaches a WN sequence, which is stationary in both
the mean and variance. As the coefficient goes to 1, however, the model
approaches a random walk, which is not stationary in either the mean or
variance.
Next, let's generate two AR(1) models that have the same magnitude coefficient, but opposite signs, and compare their behavior.
set.seed(123)
## list description for AR(1) model with small coef
AR.pos <- list(order = c(1, 0, 0), ar = 0.5, sd = 0.1)
## list description for AR(1) model with large coef
AR.neg <- list(order = c(1, 0, 0), ar = -0.5, sd = 0.1)
## simulate AR(1)
AR1.pos <- arima.sim(n = 50, model = AR.pos)
AR1.neg <- arima.sim(n = 50, model = AR.neg)
Now it appears like both time series vary around the mean by about the same
amount, but the model with the negative coefficient produces a much more
“sawtooth” time series. It turns out that any AR(1) model with −1 < φ < 0
will exhibit the 2-point oscillation you see here.
We can simulate higher order AR(p) models in the same manner, but care
must be exercised when choosing a set of coefficients that result in a stationary
Figure 4.23: Time series of simulated AR(1) processes with φ1 = 0.5 (left)
and φ1 = −0.5 (right).
model or else arima.sim() will fail and report an error. For example, an
AR(2) model with both coefficients equal to 0.5 is not stationary, and therefore
this function call will not work:
arima.sim(n = 100, model = list(order = c(2, 0, 0), ar = c(0.5, 0.5)))
If you try, R will respond that the “'ar' part of model is not
stationary”.
Let’s review what we learned in lecture about the general behavior of the ACF
and PACF for AR(p) models. To do so, we’ll simulate four stationary AR(p)
models of increasing order p and then examine their ACF’s and PACF’s. Let’s
use a really big n so as to make them “pure”, which will provide a much
better estimate of the correlation structure.
set.seed(123)
## the 4 AR coefficients
ARp <- c(0.7, 0.2, -0.1, -0.3)
## empty list for storing models
AR.mods <- list()
## loop over orders of p
for (p in 1:4) {
    ## assume SD=1, so not specified
    AR.mods[[p]] <- arima.sim(n = 1000, list(ar = ARp[1:p]))
}
Now that we have our four AR(p) models, lets look at plots of the time series,
ACF’s, and PACF’s.
## set up plot region
par(mfrow = c(4, 3))
## loop over orders of p
for (p in 1:4) {
plot.ts(AR.mods[[p]][1:50], ylab = paste("AR(", p, ")", sep = ""))
acf(AR.mods[[p]], lag.max = 12)
pacf(AR.mods[[p]], lag.max = 12, ylab = "PACF")
}
As we saw in lecture and is evident from our examples shown in Figure 4.24,
the ACF for an AR(p) process tails off toward zero very slowly, but the PACF
goes to zero for lags > p. This is an important diagnostic tool when trying to
identify the order of p in ARMA(p, q) models.
Figure 4.24: Simulated AR(1) through AR(4) processes (left column) with their ACFs (middle column) and PACFs (right column).

4.8 Moving-average (MA) models

A moving-average model of order q, or MA(q), is defined as

$$x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} \quad (4.24)$$

where $\{w_t\}$ is a white noise sequence with zero mean and some variance $\sigma^2$; for our purposes we usually assume that $w_t \sim \text{N}(0, q)$. Of particular note is that because MA processes are finite sums of stationary errors, they themselves are stationary.
Of interest to us are so-called “invertible” MA processes that can be expressed
as an infinite AR process with no error term. The term invertible comes from
the inversion of the backshift operator (B) that we discussed in class (i.e.,
$\mathbf{B}x_t = x_{t-1}$). So, for example, an MA(1) process with $|\theta| < 1$ is invertible because it can be written using the backshift operator as
$$\begin{gathered} x_t = w_t - \theta w_{t-1} \\ x_t = w_t - \theta\mathbf{B}w_t \\ x_t = (1 - \theta\mathbf{B})w_t, \\ \Downarrow \\ w_t = \frac{1}{1 - \theta\mathbf{B}}\,x_t \\ w_t = (1 + \theta\mathbf{B} + \theta^2\mathbf{B}^2 + \theta^3\mathbf{B}^3 + \dots)x_t \\ w_t = x_t + \theta x_{t-1} + \theta^2 x_{t-2} + \theta^3 x_{t-3} + \dots \end{gathered} \quad (4.25)$$
We can simulate MA(q) processes just as we did for AR(p) processes using
arima.sim(). Here are 3 different ones with contrasting θ’s:
set.seed(123)
## list description for MA(1) model with small coef
MA.sm <- list(order = c(0, 0, 1), ma = 0.2, sd = 0.1)
## list description for MA(1) model with large coef
MA.lg <- list(order = c(0, 0, 1), ma = 0.8, sd = 0.1)
## list description for MA(1) model with negative coef
MA.neg <- list(order = c(0, 0, 1), ma = -0.5, sd = 0.1)
## simulate MA(1)
MA1.sm <- arima.sim(n = 50, model = MA.sm)
MA1.lg <- arima.sim(n = 50, model = MA.lg)
MA1.neg <- arima.sim(n = 50, model = MA.neg)
Figure 4.25: Time series of simulated MA(1) processes with θ = 0.2 (left),
θ = 0.8 (middle), and θ = −0.5 (right).
We saw in lecture and above how the ACF and PACF have distinctive features
for AR(p) models, and they do for MA(q) models as well. Here are examples
of four MA(q) processes. As before, we’ll use a really big n so as to make
them “pure”, which will provide a much better estimate of the correlation
structure.
set.seed(123)
## the 4 MA coefficients
MAq <- c(0.7, 0.2, -0.1, -0.3)
## empty list for storing models
MA.mods <- list()
## loop over orders of q
for (q in 1:4) {
## assume SD=1, so not specified
MA.mods[[q]] <- arima.sim(n = 1000, list(ma = MAq[1:q]))
}
Now that we have our four MA(q) models, let's look at plots of the time series,
ACF’s, and PACF’s.
## set up plot region
par(mfrow = c(4, 3))
## loop over orders of q
for (q in 1:4) {
plot.ts(MA.mods[[q]][1:50], ylab = paste("MA(", q, ")", sep = ""))
acf(MA.mods[[q]], lag.max = 12)
pacf(MA.mods[[q]], lag.max = 12, ylab = "PACF")
}
Note that there is very little qualitative difference in the realizations of the four MA(q) processes (Figure 4.26). As we saw in lecture and is evident from our examples
here, however, the ACF for an MA(q) process goes to zero for lags > q, but
the PACF tails off toward zero very slowly. This is an important diagnostic
tool when trying to identify the order of q in ARMA(p, q) models.
4.9 Autoregressive moving-average (ARMA) models

ARMA(p, q) models have a rich history in the time series literature, but they
are not nearly as common in ecology as plain AR(p) models. As we discussed
in lecture, both the ACF and PACF are important tools when trying to
identify the appropriate order of p and q. Here we will see how to simulate
time series from AR(p), MA(q), and ARMA(p, q) processes, as well as fit time
series models to data based on insights gathered from the ACF and PACF.
[Figure 4.26: Simulated MA(q) time series for q = 1–4 (rows), with their ACFs and PACFs.]
An ARMA(p, q) model is written as

xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q .        (4.26)
We have already seen how to simulate AR(p) and MA(q) models with
arima.sim(); the same concepts apply to ARMA(p, q) models and therefore
we will not do that here. Instead, we will move on to fitting ARMA(p, q)
models when we only have a realization of the process (i.e., data) and do not
know the underlying parameters that generated it.
The function arima() accepts a number of arguments, but two of them are
most important:
• x a univariate time series
• order a vector of length 3 specifying the order of ARIMA(p,d,q) model
In addition, note that by default arima() will estimate an underlying mean
of the time series unless d > 0. For example, an AR(1) process with mean µ
would be written
xt = µ + φ(xt−1 − µ) + wt . (4.27)
If you know for a fact that the time series data have a mean of zero (e.g., you
already subtracted the mean from them), you should include the argument
include.mean=FALSE, which is set to TRUE by default. Note that ignoring
and not estimating a mean in ARMA(p, q) models when one exists will bias
the estimates of all other parameters.
Let’s see an example of how arima() works. First we’ll simulate an
ARMA(2,2) model and then estimate the parameters to see how well we can
recover them. In addition, we’ll add in a constant to create a non-zero mean,
which arima() reports as intercept in its output.
set.seed(123)
## ARMA(2,2) description for arima.sim()
ARMA22 <- list(order = c(2, 0, 2), ar = c(-0.7, 0.2), ma = c(0.7, 0.2))
## mean of process
mu <- 5
## simulated process (+ mean)
ARMA.sim <- arima.sim(n = 10000, model = ARMA22) + mu
## estimate parameters
arima(x = ARMA.sim, order = c(2, 0, 2))
Call:
arima(x = ARMA.sim, order = c(2, 0, 2))
Coefficients:
ar1 ar2 ma1 ma2 intercept
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125
In an ideal situation, you could examine the ACF and PACF of the time
series of interest and immediately decipher what orders of p and q must have
generated the data, but that doesn’t always work in practice. Instead, we are
often left with the task of searching over several possible model forms and
seeing which of them provides the most parsimonious fit to the data. There
are two easy ways to do this for ARIMA models in R. The first is to write a little script that loops over the possible dimensions of p and q. Let's try that
for the process we simulated above and search over orders of p and q from
0-3 (it will take a few moments to run and will likely report an error about a
“possible convergence problem”, which you can ignore).
## empty list to store model fits
ARMA.res <- list()
## set counter
cc <- 1
## loop over AR
for (p in 0:3) {
## loop over MA
for (q in 0:3) {
ARMA.res[[cc]] <- arima(x = ARMA.sim, order = c(p, 0,
q))
cc <- cc + 1
}
}
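With all 16 fits stored, one way to find the most parsimonious model is to extract the AIC from each fit and pull out the minimum. A minimal sketch (the $aic element is standard for arima() fits):

## get the AIC value for each fitted model
ARMA.AIC <- sapply(ARMA.res, function(x) x$aic)
## the model with the lowest AIC is the best
ARMA.res[[which.min(ARMA.AIC)]]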
Call:
arima(x = ARMA.sim, order = c(p, 0, q))
Coefficients:
ar1 ar2 ma1 ma2 intercept
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125
A second, easier approach is the auto.arima() function in the forecast package, which will conduct an automatic search over all possible orders of ARIMA models that you specify. For details, type ?auto.arima after loading the package. Let's repeat our search using the same criteria.
## find best ARMA(p,q) model
auto.arima(ARMA.sim, start.p = 0, max.p = 3, start.q = 0, max.q = 3)
Series: ARMA.sim
ARIMA(2,0,2) with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 mean
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125
4.10 Problems
We have seen how to do a variety of introductory time series analyses with R.
Now it is your turn to apply the information you learned here and in lecture to
complete some analyses. You have been asked by a colleague to help analyze
some time series data she collected as part of an experiment on the effects of
light and nutrients on the population dynamics of phytoplankton. Specifically,
after controlling for differences in light and temperature, she wants to know
if the natural log of population density can be modeled with some form of
ARMA(p, q) model.
The data are expressed as the number of cells per milliliter recorded every
hour for one week beginning at 8:00 AM on December 1, 2014. You can load
the data using
data(hourlyphyto, package = "atsalibrary")
pDat <- hourlyphyto
2. Plot the time series of phytoplankton density and provide a brief de-
scription of any notable features.
3. Although you do not have the actual measurements for the specific
temperature and light regimes used in the experiment, you have been
informed that they follow a regular light/dark period with accompanying
warm/cool temperatures. Thus, estimating a fixed seasonal effect is
justifiable. Also, the instrumentation is precise enough to preclude any
systematic change in measurements over time (i.e., you can assume
mt = 0 for all t). Obtain the time series of the estimated log-density
of phytoplankton absent any hourly effects caused by variation in
temperature or light. (Hint: You will need to do some decomposition.)
4. Use diagnostic tools to identify the possible order(s) of ARMA model(s)
that most likely describes the log of population density for this particular
experiment. Note that at this point you should be focusing your analysis
on the results obtained in Question 3.
5. Use some form of search to identify what form of ARMA(p, q) model best
describes the log of population density for this particular experiment.
Use what you learned in Question 4 to inform possible orders of p and
q. (Hint: if you use auto.arima(), include the additional argument
seasonal=FALSE)
6. Write out the best model in the form of Equation (4.26) using the
underscore notation to refer to subscripts (e.g., write x_t for xt ). You
can round any parameters/coefficients to the nearest hundredth. (Hint:
if the mean of the time series is not zero, refer to Eqn 1.27 in the lab
handout).
Chapter 5
Box-Jenkins method
In this chapter, you will practice selecting and fitting an ARIMA model to
catch data using the Box-Jenkins method. After fitting a model, you will
prepare simple forecasts using the forecast package.
We will use the catch landings from Greek waters (greeklandings) and
the Chinook landings (chinook) in Washington data sets for this chapter.
These datasets are in the atsalibrary package on GitHub. Install using the
devtools package.
library(devtools)
devtools::install_github("nwfsc-timeseries/atsalibrary")
library(ggplot2)
library(gridExtra)
library(reshape2)
library(tseries)
library(urca)
library(forecast)
5.2 Stationarity
We will start by looking at white noise and a stationary AR(1) process from
simulated data. White noise is simply a string of random numbers drawn
from a Normal distribution. rnorm() will return random numbers drawn
from a Normal distribution. Use ?rnorm to understand what the function
requires.
TT <- 100
y <- rnorm(TT, mean = 0, sd = 1) # 100 random numbers
op <- par(mfrow = c(1, 2))
plot(y, type = "l")
acf(y)
par(op)
We can use ggplot to plot many realizations of white noise at once, which makes it easy to see that the series fluctuate around a constant level with constant variance.
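A minimal sketch (the number of simulations and plotting details are assumptions; melt() is from the reshape2 package loaded above):

nsim <- 10
ys <- matrix(rnorm(TT * nsim), TT, nsim)
ys <- data.frame(ys)
ys$id <- 1:TT
ys2 <- reshape2::melt(ys, id.var = "id")
ggplot(ys2, aes(x = id, y = value, group = variable)) + geom_line()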
These are stationary because the variance and mean (level) do not change with time.
An AR(1) process with |b| < 1 is also stationary, though its realizations look smoother than white noise because of the autocorrelation.
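The ggplot code below assumes an AR(1) realization ar1, a coefficient theta, and a number of simulations nsim. A minimal sketch to set these up (the specific values are assumptions):

theta <- 0.8
nsim <- 10
ar1 <- arima.sim(TT, model = list(ar = theta))
plot(ar1, type = "l")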
We can use ggplot to plot 10 AR(1) time series, but we need to change the
data to a data frame.
dat <- data.frame(t = 1:TT, y = ar1)
p1 <- ggplot(dat, aes(x = t, y = y)) + geom_line() + ggtitle("AR-1") +
xlab("") + ylab("value")
ys <- matrix(0, TT, nsim)
for (i in 1:nsim) ys[, i] <- as.vector(arima.sim(TT, model = list(ar = theta)))
ys <- data.frame(ys)
ys$id <- 1:TT
Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
[Plot: a single AR(1) realization (top) and multiple AR(1) realizations (bottom).]
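The next plots compare white noise with and without an added intercept and linear trend. The objects wn, wni, wnti, trend, and intercept used below need to be set up; a minimal sketch (the specific values are assumptions):

TT <- 20
sd <- 0.1
intercept <- 0.5
trend <- 0.1
wn <- rnorm(TT, sd = sd)                  # white noise
wni <- wn + intercept                     # white noise plus intercept
wnti <- wn + trend * (1:TT) + intercept   # plus intercept and linear trend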
See how the white noise with trend is just the white noise overlaid on a linear
trend.
op <- par(mfrow = c(1, 3))
plot(wn, type = "l")
plot(trend * 1:TT)
plot(wnti, type = "l")
[Plot: wn (left), the linear trend trend * 1:TT (middle), and wnti (right).]
par(op)
[Plot: wn, wni, and wnti side by side, each against t.]
We can make a similar plot with AR(1) data. Ignore the warnings about not
knowing how to pick the scale.
beta1 <- 0.8
ar1 <- arima.sim(TT, model = list(ar = beta1), sd = sd)
ar1i <- ar1 + intercept
ar1ti <- ar1 + trend * (1:TT) + intercept
dat <- data.frame(t = 1:TT, ar1 = ar1, ar1i = ar1i, ar1ti = ar1ti)
p4 <- ggplot(dat, aes(x = t, y = ar1)) + geom_line() + ggtitle("AR1")
p5 <- ggplot(dat, aes(x = t, y = ar1i)) + geom_line() + ggtitle("with non-zero mean")
p6 <- ggplot(dat, aes(x = t, y = ar1ti)) + geom_line() + ggtitle("with linear trend")
grid.arrange(p4, p5, p6, ncol = 3)
Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
[Plot: ar1, ar1i, and ar1ti side by side, each against t.]
We will look at the anchovy data. Notice the two == in the subset call not
one =. We will use the Greek data before 1989 for the lab.
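If you have not already loaded the landings data, it can be loaded from the atsalibrary package (the same code appears in the Problems section):

data(greeklandings, package = "atsalibrary")
landings <- greeklandings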
anchovy <- subset(landings, Species == "Anchovy" & Year <= 1989)$log.metric.tons
anchovyts <- ts(anchovy, start = 1964)
[Plot of anchovyts: log catch over time, rising from roughly 8.5 to 10.]
Questions to ask.
• Does it have a trend (goes up or down)? Yes, definitely
• Does it have a non-zero mean? Yes
• Does it look like it might be stationary around a trend? Maybe
5.3 Dickey-Fuller and Augmented Dickey-Fuller tests

The null hypothesis for both the Dickey-Fuller and Augmented Dickey-Fuller tests is that the data are non-stationary. We want to REJECT the null hypothesis for this test, so we want a p-value less than 0.05 (or smaller).
Let’s start by doing the test on data that we know are stationary, white
noise. We will use an Augmented Dickey-Fuller test where we use the default
number of lags (amount of time-dependency) in our test. For a time-series of
100, this is 4.
TT <- 100
wn <- rnorm(TT) # white noise
tseries::adf.test(wn)
data: wn
Dickey-Fuller = -4.8309, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The null hypothesis is rejected.
Try a Dickey-Fuller test by setting the lags to zero. This tests against an alternative of AR(1) stationarity, whereas the default k = 4 tested against an alternative with AR(4)-type time dependence.
tseries::adf.test(wn, k = 0)
data: wn
Dickey-Fuller = -10.122, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary
Notice that the test-statistic is smaller. This is a more restrictive test and we
can reject the null with a higher significance level.
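Next, try the test on white noise with an added trend. A minimal sketch of the setup (the trend construction is an assumption):

wnt <- wn + 1:TT  # white noise with a linear trend
tseries::adf.test(wnt)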
data: wnt
Dickey-Fuller = -4.8309, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The null hypothesis is still rejected. adf.test() uses a model that allows an
intercept and trend.
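Now try the test on a random walk, which is non-stationary. A sketch (the construction is an assumption):

rw <- cumsum(rnorm(TT))  # a random walk
tseries::adf.test(rw)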
data: rw
Dickey-Fuller = -2.3038, Lag order = 4, p-value = 0.4508
alternative hypothesis: stationary
The null hypothesis is NOT rejected as the p-value is greater than 0.05.
Try a Dickey-Fuller test.
tseries::adf.test(rw, k = 0)
data: rw
Dickey-Fuller = -1.7921, Lag order = 0, p-value = 0.6627
alternative hypothesis: stationary
Notice that the test-statistic is larger.
tseries::adf.test(anchovyts)
data: anchovyts
Dickey-Fuller = -1.6851, Lag order = 2, p-value = 0.6923
alternative hypothesis: stationary
The p-value is greater than 0.05. We cannot reject the null hypothesis. The
null hypothesis is that the data are non-stationary.
Let’s first do the test on data we know is stationary, white noise. We have to
choose the type and lags. If you have no particular reason to not include an
intercept and trend, then use type="trend". This allows both intercept and
trend. When you might you have a particular reason not to use "trend"?
When you have removed the trend and/or intercept.
Next you need to chose the lags. We will use lags=0 to do the Dickey-Fuller
test. Note the number of lags you can test will depend on the amount of data
that you have. adf.test() used a default of trunc((length(x)-1)ˆ(1/3))
for the lags, but ur.df() requires that you pass in a value or use a fixed
default of 1.
lags=0 fits this model to the data; you are testing whether the effect for z.lag.1 is 0:

z.diff = gamma * z.lag.1 + intercept + trend * tt

Here z.diff means ∆yt and z.lag.1 is yt−1.
When you use summary() for the output from ur.df(), you will see the
estimated values for γ (denoted z.lag.1), intercept and trend. If you see
*** or ** on the coefficients list for z.lag.1, it indicates that the effect of
z.lag.1 is significantly different than 0 and this supports the assumption of
stationarity.
The intercept and tt estimates indicate whether there is a non-zero level (intercept) or linear trend (tt).
wn <- rnorm(TT)
test <- urca::ur.df(wn, type = "trend", lags = 0)
summary(test)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
Min 1Q Median 3Q Max
-2.2170 -0.6654 -0.1210 0.5311 2.6277
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0776865 0.2037709 0.381 0.704
z.lag.1 -1.0797598 0.1014244 -10.646 <2e-16 ***
tt 0.0004891 0.0035321 0.138 0.890
---
132 CHAPTER 5. BOX-JENKINS METHOD
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficient part of the summary indicates that z.lag.1 is significantly different from 0 (so stationary) and that there is no support for an intercept or trend.
Notice that the test statistic is LESS than the critical value for tau3 at 5
percent. This means the null hypothesis is rejected at α = 0.05, a standard
level for significance testing.
If you have already removed the trend (and/or level) from your data, ur.df() allows you to increase the power of the test by dropping the trend and/or level from the test model (via the type argument).
5.4 KPSS test

The null hypothesis for the KPSS test is that the data are stationary. For this test, we do NOT want to reject the null hypothesis. In other words, we want the p-value to be greater than 0.05, not less than 0.05.
Let’s try the KPSS test on white noise with a trend. The default is a null
hypothesis with no trend. We will change this to null="Trend".
tseries::kpss.test(wnt, null = "Trend")
data: wnt
KPSS Trend = 0.045579, Truncation lag parameter = 4, p-value = 0.1
The p-value is greater than 0.05. The null hypothesis of stationarity around
a trend is not rejected.
Let’s try the KPSS test on white noise with a trend but let’s use the default
of stationary with no trend.
tseries::kpss.test(wnt, null = "Level")
data: wnt
KPSS Level = 2.1029, Truncation lag parameter = 4, p-value = 0.01
The p-value is less than 0.05. The null hypothesis of stationarity around
a level is rejected. This is white noise around a trend so it is definitely a
stationary process but has a trend. This illustrates that you need to be
thoughtful when applying stationarity tests.
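Let's apply the KPSS test to the anchovy data, testing stationarity around a trend (the call is inferred from the output below):

tseries::kpss.test(anchovyts, null = "Trend")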
data: anchovyts
KPSS Trend = 0.14779, Truncation lag parameter = 2, p-value = 0.04851
The null is rejected (p-value less than 0.05). Again stationarity is not supported.
5.5 Dealing with non-stationarity

A common way to make a non-stationary time series stationary is differencing. Differencing a random walk, for example, gives white noise:

xt − xt−1 = et, et ∼ N(0, σ)
adf.test(diff(rw))
data: diff(rw)
Dickey-Fuller = -3.8711, Lag order = 4, p-value = 0.01834
alternative hypothesis: stationary
kpss.test(diff(rw))
data: diff(rw)
KPSS Level = 0.30489, Truncation lag parameter = 3, p-value = 0.1
If we difference random walk data, the null is rejected for the ADF test and
not rejected for the KPSS test. This is what we want.
Let’s try a single difference with the anchovy data. A single difference means
dat(t)-dat(t-1). We get this using diff(anchovyts).
diff1dat <- diff(anchovyts)
adf.test(diff1dat)
data: diff1dat
Dickey-Fuller = -3.2718, Lag order = 2, p-value = 0.09558
alternative hypothesis: stationary
kpss.test(diff1dat)
data: diff1dat
KPSS Level = 0.089671, Truncation lag parameter = 2, p-value = 0.1
If a first difference were not enough, we would try a second difference which
is the difference of a first difference.
diff2dat <- diff(diff1dat)
adf.test(diff2dat)
data: diff2dat
Dickey-Fuller = -4.8234, Lag order = 2, p-value = 0.01
The null hypothesis of a random walk is now rejected so you might think that
a 2nd difference is needed for the anchovy data. However the actual problem
is that the default for adf.test() includes a trend but we removed the trend
with our first difference. Thus we included an unneeded trend parameter in
our test. Our data are not that long and this affects the result.
Let’s repeat without the trend and we’ll see that the null hypothesis is rejected.
The number of lags is set to be what would be used by adf.test(). See
?adf.test.
k <- trunc((length(diff1dat) - 1)^(1/3))
test <- urca::ur.df(diff1dat, type = "drift", lags = k)
summary(test)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.37551 -0.13887 0.04753 0.13277 0.28223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.11062 0.06165 1.794 0.08959 .
z.lag.1 -2.16711 0.64900 -3.339 0.00365 **
z.diff.lag1 0.58837 0.47474 1.239 0.23113
z.diff.lag2 0.13273 0.25299 0.525 0.60623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.5.1 ndiffs()
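ndiffs() in the forecast package estimates the number of differences needed to make a time series pass a stationarity test. A sketch of the KPSS-based call (the parallel "adf" call below suggests this form):

forecast::ndiffs(anchovyts, test = "kpss")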
[1] 1
forecast::ndiffs(anchovyts, test = "adf")
[1] 1
One difference is required to pass both the ADF and KPSS stationarity tests.
5.7 Estimating ARMA parameters

Simulate AR(2) data and add a mean level so that the data are not mean 0.

xt = 0.8 xt−1 + 0.1 xt−2 + et
yt = xt + m
m <- 1
ar2 <- arima.sim(n = 1000, model = list(ar = c(0.8, 0.1))) +
m
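Fit the simulated data with forecast::Arima() (a sketch; include.constant = TRUE asks for the mean to be estimated):

forecast::Arima(ar2, order = c(2, 0, 0), include.constant = TRUE)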
Series: ar2
ARIMA(2,0,0) with non-zero mean
Coefficients:
ar1 ar2 mean
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332
Note that the reported mean is not the intercept in the model

yt = m + 0.8 yt−1 + 0.1 yt−2 + et

It is the m in this model:

xt = 0.8 xt−1 + 0.1 xt−2 + et
yt = xt + m
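The base arima() function reports the same estimates but labels the mean as intercept (the call is shown in the output's Call line):

arima(x = ar2, order = c(2, 0, 0), include.mean = TRUE)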
Call:
arima(x = ar2, order = c(2, 0, 0), include.mean = TRUE)
Coefficients:
ar1 ar2 intercept
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332
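With fewer data points, the estimates are less precise. A sketch simulating and fitting an AR(1) with 100 points (the simulation coefficient is an assumption):

ar1 <- arima.sim(n = 100, model = list(ar = 0.8)) + m
forecast::Arima(ar1, order = c(1, 0, 0), include.constant = TRUE)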
Series: ar1
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 mean
0.7091 0.4827
s.e. 0.0705 0.3847
Simulate ARMA(1,2)
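A sketch of the simulation and fit (the simulation coefficients are assumptions; the fitted order matches the output below):

arma12 <- arima.sim(n = 100, model = list(ar = 0.8, ma = c(0.8, 0.2))) + m
forecast::Arima(arma12, order = c(1, 0, 2), include.constant = TRUE)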
Series: arma12
ARIMA(1,0,2) with non-zero mean
Create some AR(2) data and then add missing values (NA).
ar2miss <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
ar2miss[sample(100, 50)] <- NA
plot(ar2miss, type = "l")
title("many missing values")
Fit an AR(2) model to the data with the missing values.
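A sketch of the fit (the object name fit is assumed; fitted(fit) is used below):

fit <- forecast::Arima(ar2miss, order = c(2, 0, 0))
fit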
Series: ar2miss
ARIMA(2,0,0) with non-zero mean
Coefficients:
ar1 ar2 mean
1.0625 -0.2203 -0.0586
s.e. 0.1555 0.1618 0.6061
Note fitted() does not return the expected value at time t. It is the expected
value of yt given the data up to time t − 1.
plot(ar2miss, type = "l")
title("many missing values")
lines(fitted(fit), col = "blue")
It is easy enough to get the expected value of yt for all the missing values, but we'll learn to do that when we learn the MARSS package and can apply the Kalman smoother in that package.

5.8 Estimating the ARMA orders

We will use auto.arima() in the forecast package to estimate the ARMA orders from data.
forecast::auto.arima(ar2)
Series: ar2
ARIMA(2,0,2) with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 mean
0.2795 0.5938 0.4861 -0.0943 0.9553
s.e. 1.1261 1.0413 1.1284 0.1887 0.3398
auto.arima() works with missing data too, though it might not estimate a form very close to the true model.
forecast::auto.arima(ar2miss)
Series: ar2miss
ARIMA(0,1,0)
Let’s fit to 100 simulated data sets and see how often the true (generating)
model form is selected.
save.fits <- rep(NA, 100)
for (i in 1:100) {
a2 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
fit <- auto.arima(a2, seasonal = FALSE, max.d = 0, max.q = 0)
save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
save.fits
1-0 2-0 3-0
71 22 7
auto.arima() uses AICc for selection by default. You can change that to
AIC or BIC using ic="aic" or ic="bic".
Repeat the simulation using AIC and BIC to see how the choice of the information criterion affects the model that is selected.
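A sketch of the BIC variant (only the ic argument changes):

save.fits <- rep(NA, 100)
for (i in 1:100) {
    a2 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
    fit <- auto.arima(a2, seasonal = FALSE, max.d = 0, max.q = 0, ic = "bic")
    save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)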
5.8.3 Trace=TRUE
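Setting trace = TRUE prints each candidate model and its AICc as auto.arima() searches (the long trace output is not shown; only the final coefficients appear below):

forecast::auto.arima(ar2, trace = TRUE)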
Coefficients:
ar1 ar2 ma1 ma2 mean
0.2795 0.5938 0.4861 -0.0943 0.9553
s.e. 1.1261 1.0413 1.1284 0.1887 0.3398
5.8.4 stepwise=FALSE
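By default, auto.arima() uses a fast stepwise search. Setting stepwise = FALSE searches all models within the order limits; it is slower, but here it recovered the true AR(2) form (a sketch of the call):

forecast::auto.arima(ar2, stepwise = FALSE)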
Coefficients:
ar1 ar2 mean
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332
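Now let's fit the anchovy data with auto.arima(), saving the result for the residual checks and forecasts below (the object name fit is assumed):

fit <- forecast::auto.arima(anchovyts)
fit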
Series: anchovyts
ARIMA(0,1,1) with drift
Coefficients:
ma1 drift
-0.6685 0.0542
s.e. 0.1977 0.0142
Note the sign convention for the MA coefficients. The fitted model is written with positive signs on the MA terms,

xt = et + b1 et−1 + b2 et−2

while some textbooks write the MA part with negative signs,

xt = et − θ1 et−1 − θ2 et−2

so that b1 = −θ1 and b2 = −θ2.
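A standard residual check is the Ljung-Box test (the residual object name res and lag = 10 are inferred from the output below):

res <- resid(fit)
Box.test(res, type = "Ljung-Box", lag = 10)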
Box-Ljung test
data: res
X-squared = 5.1609, df = 10, p-value = 0.8802
checkresiduals() in the forecast package will automate this test and show
some standard diagnostics plots.
forecast::checkresiduals(fit)
[checkresiduals(fit) output: a plot of the residual time series, their ACF, and a histogram, along with a Ljung-Box test printout.]
5.10 Forecast from a fitted ARIMA model

We can create a forecast from our anchovy ARIMA model using forecast(). The shading in the plot shows the 80% and 95% prediction intervals.
fr <- forecast::forecast(fit, h = 10)
plot(fr)
5.11 Seasonal ARIMA model
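If needed, load the Chinook data first (the same code appears in the Problems section):

data(chinook, package = "atsalibrary")
chinook <- chinook.month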
The Chinook data are monthly and start in January 1990. To make this into
a ts object do
chinookts <- ts(chinook$log.metric.tons, start = c(1990, 1),
frequency = 12)
start is the year and month and frequency is the number of months in the
year.
plot(chinookts)
auto.arima() will recognize that our data have a seasonal component and will fit a seasonal ARIMA model to our data by default. Let's define the training data as up to 1998 and use 1999 as the test data.
traindat <- window(chinookts, c(1990, 10), c(1998, 12))
testdat <- window(chinookts, c(1999, 1), c(1999, 12))
fit <- forecast::auto.arima(traindat)
fit
Series: traindat
ARIMA(1,0,0)(0,1,0)[12] with drift
Coefficients:
ar1 drift
0.3676 -0.0320
s.e. 0.1335 0.0127
5.13 Problems
For these problems, use the catch landings from Greek waters
(greeklandings) and the Chinook landings (chinook) in Washington
data. Load the data as follows:
data(greeklandings, package = "atsalibrary")
landings <- greeklandings
data(chinook, package = "atsalibrary")
chinook <- chinook.month
a. Do a Dickey-Fuller (DF) test using `ur.df()` and `adf.test()`. You have to set the lags to 0 to do a DF (as opposed to an ADF) test.
a. Do an Augmented Dickey-Fuller (ADF) test using `ur.df()`. How did you choose the lags?
b. Do a KPSS test using `kpss.test()`. What does the result tell you?
4. Using the anchovy 1964-1987 data, fit using auto.arima() with
trace=TRUE.
forecast::auto.arima(anchovy, trace = TRUE)
a. Fit each of the models listed using `Arima()` and show that you can produce the same AICc values shown in the trace.
b. What models are within $\Delta$AIC of 2? What is different about these models?
5. Repeat the stationarity tests and differencing tests for anchovy using
the following two time ranges: 1964-1987 and 1988-2007. The following
shows you how to subset the data:
datdf <- subset(landings, Species == "Anchovy")
dat <- ts(datdf$log.metric.tons, start = 1964)
dat64.87 <- window(dat, start = 1964, end = 1987)
a. Plot the time series for the two time periods. For the `kpss.test()`, which null should you use, "Level" or "Trend"?
b. Do the conclusions regarding stationarity and the amount of differencing needed change with the time period?
c. Fit each time period using `auto.arima()`. Do the selected models change?
d. Discuss the best models for each time period. How are they different?
e. You cannot compare the AIC values for an Arima(0,1,0) and an Arima(0,0,1). Why not?
6. For the anchovy 1964-2007 data, use auto.arima() with stepwise=FALSE
to fit models.
a. find the set of models within ∆AICc = 2 of the top model.
b. Use Arima() to fit the models with Inf or -Inf in the list. Does the
set of models within ∆AICc = 2 change?
c. Create a 5-year forecast for each of the top 3 models according to
AICc.
d. Plot the forecast with the 2014 and 2015 actual landings added as
data points.
e. The model from part b has drift. Fit this model using Arima()
without drift and compare the 2015 forecast with this model.
Chapter 6
Univariate state-space models
This chapter will show you how to fit some basic univariate state-space
models using the MARSS package, the StructTS() function, and JAGS
code. This chapter will also introduce you to the idea of writing AR(1) models
in state-space form.
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
All the data used in the chapter are in the MARSS package. The other
required packages are stats (normally loaded by default when starting R),
datasets and forecast. Install the packages, if needed, and load:
library(stats)
library(MARSS)
library(forecast)
library(datasets)
To run the JAGS code example (optional), you will also need JAGS installed
and the R2jags, rjags and coda R packages. To run the Stan code example
(optional), you will need the rstan package.
6.1 Fitting a state-space model with MARSS

The MARSS package fits univariate and multivariate state-space models written in this form:

xt = B xt−1 + u + wt where wt ∼ N(0, Q)
yt = Z xt + a + vt where vt ∼ N(0, R)        (6.1)
x0 = µ

To fit your time series model with the MARSS package, you need to put your model into the form above. The B, Z, u, a, Q, R and µ are parameters
that are (potentially) estimated. The y are your data. The x are the hidden
state(s). Everything in bold is a matrix; if it is a small bolded letter, it is a
matrix with 1 column.
Important: In the state-space model equation, y is always the data and x is a
hidden random walk estimated from the data.
A basic MARSS() call looks like fit=MARSS(y, model=list(...)). The ar-
gument model tells the function what form the parameters take. The list
has the elements with the names: B, U, Q, etc. The names correspond to the
parameters with the same names in Equation (6.1) except that µ is called x0.
tinitx indicates whether the initial x is specified at t = 0 (x0) or at t = 1 (x1).
Here’s an example. Let’s say we want to fit a univariate AR(1) model observed
with error. Here is that model:
To fit this with MARSS(), we need to write Equation (6.2) as Equation (6.1).
Equation (6.1) is in MATRIX form. In the model list, the parameters must
be written EXACTLY like they would be written for Equation (6.1). For
example, 1 is the number 1 in R. It is not a matrix:
class(1)
[1] "numeric"
6.1. FITTING A STATE-SPACE MODEL WITH MARSS 159
If you need a 1 (or 0) in your model, you need to pass in the parameter as a
1 × 1 matrix: matrix(1).
With that in mind, our model list for Equation (6.2) is:
mod.list <- list(B = matrix(1), U = matrix(0), Q = matrix("q"),
Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
tinitx = 0)
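We then fit by passing the data and model list to MARSS(). A sketch (assuming data simulated as a random walk observed with error; the values are assumptions):

q <- 0.1
r <- 0.1
n <- 100
y <- cumsum(rnorm(n, 0, sqrt(q))) + rnorm(n, 0, sqrt(r))
fit <- MARSS(y, model = mod.list)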
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 16 iterations.
Log-likelihood: -65.70444
AIC: 137.4089 AICc: 137.6589
Estimate
R.r 0.1066
Q.q 0.0578
x0.mu -0.2024
Initial states (x0) defined at t=0
6.2 Examples using the Nile River data

We will use the data from the Nile River (Figure 6.1). We will fit different flow models to the data and compare the models with AIC.
library(datasets)
dat <- as.vector(Nile)
Figure 6.1: The Nile River flow volume 1871 to 1970 (Nile dataset in R).
We will start by modeling these data as a simple average river flow with
variability around some level µ.
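A sketch of this flat level model and its fit (mirroring mod.nile.1 below but with no drift and no process variance; the object name kem.0 is used in the AICc code that follows):

mod.nile.0 <- list(B = matrix(1), U = matrix(0), Q = matrix(0),
    Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
    tinitx = 0)
kem.0 <- MARSS(dat, model = mod.nile.0)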
Output not shown, but here are the estimates and AICc.
c(coef(kem.0, type = "vector"), LL = kem.0$logLik, AICc = kem.0$AICc)
Figure 6.2 shows the fit for the flat average river flow model. Looking at the data, we might expect that a declining average river flow would be better. In MARSS form, that model would be:

xt = xt−1 + u
yt = xt + vt where vt ∼ N(0, r)
x0 = µ

where u is now the average per-year decline in river flow volume. The model is specified as follows:
mod.nile.1 <- list(B = matrix(1), U = matrix("u"), Q = matrix(0),
Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
tinitx = 0)
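Fit as before (kem.1 is used in the AICc comparison below):

kem.1 <- MARSS(dat, model = mod.nile.1)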
Looking at the flow levels, we might suspect that a model that allows the average flow to change would model the data better, and we might suspect that there have been sudden, anomalous changes in the river flow level. We will now model the average river flow at year t as a random walk, specifically an autoregressive process, which means that average river flow in year t is a function of average river flow in year t − 1.
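A sketch of this stochastic level model (the process variance q is now estimated; kem.2 is used below):

mod.nile.2 <- list(B = matrix(1), U = matrix(0), Q = matrix("q"),
    Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
    tinitx = 0)
kem.2 <- MARSS(dat, model = mod.nile.2)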
We could also use the text shortcuts to specify the model. Because R and Q
are 1 × 1 matrices, “unconstrained”, “diagonal and unequal”, “diagonal and
equal” and “equalvarcov” will all lead to a 1 × 1 matrix with one estimated
element. For a and u, the following shortcut could be used:
A <- "zero"
U <- "zero"
We can add a drift term to our random walk; the u in the process model (x) is the drift term. This causes the random walk to tend to trend up or down.
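A sketch of the stochastic level model with drift (kem.3 is used below and in the Bayesian comparisons):

mod.nile.3 <- list(B = matrix(1), U = matrix("u"), Q = matrix("q"),
    Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
    tinitx = 0)
kem.3 <- MARSS(dat, model = mod.nile.3)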
Figure 6.2 shows all the models along with their AICc values.
6.3 The StructTS function

The StructTS() function in the stats package in R will also fit the stochastic level model:
fit.sts <- StructTS(dat, type = "level")
fit.sts
Call:
StructTS(x = dat, type = "level")
Variances:
level epsilon
1469 15099
StructTS() is much, much faster for long time series. The example in
?StructTS is pretty much instantaneous with StructTS() but takes minutes
with the EM algorithm that is the default in MARSS(). With the BFGS
algorithm, it is much closer to StructTS():
[1] 1162.904

This value from the StructTS fit is the expected value of yt+1 (in this case y11, since we set t = 10) given the data up to yt (in this case, up to y10). It is called the one-step-ahead prediction.
We are not going to use the one-step ahead predictions unless we are forecasting
or doing cross-validation.
If you needed the one-step predictions from MARSS(), you can get them from
the Kalman filter output:
kf = print(kem.2, what = "kfs")
kf$xtt1[1, t]
[Figure 6.2 panels: the Nile flow data with each model's estimated level; the top panel is model 0, AICc = 1313. See the caption below.]
Figure 6.2: The Nile River flow volume with the model estimated flow rates
(solid lines). The bottom model is a stochastic level model, meaning there
isn’t one level line. Rather the level line is a distribution that has a mean
and standard deviation. The solid state line in the bottom plots is the mean
of the stochastic level and the 2 standard deviations are shown. The other
two models are deterministic level models so the state is not stochastic and
does not have a standard deviation.
6.4 Comparing models with AIC and model weights

To get the AIC or AICc values for a model fit from a MARSS fit, use fit$AIC or fit$AICc. The log-likelihood is in fit$logLik and the number
of estimated parameters in fit$num.params. For fits from other functions,
try AIC(fit) or look at the function documentation.
Let’s put the AICc values 3 Nile models together:
nile.aic = c(kem.0$AICc, kem.1$AICc, kem.2$AICc, kem.3$AICc)
Then we calculate the AICc minus the minimum AICc in our model set and compute the model weights. ∆AIC is the AIC value minus the minimum AIC value in your model set.
delAIC <- nile.aic - min(nile.aic)
relLik <- exp(-0.5 * delAIC)
aicweight <- relLik/sum(relLik)
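Collect these in a table (a sketch; the column names are assumptions):

aic.table <- data.frame(AICc = nile.aic, delAICc = delAIC,
    relLike = relLik, weight = aicweight)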
Here the table is printed using round() to limit the number of digits shown.
round(aic.table, digits = 3)
6.5 Basic diagnostics

The first diagnostic that you do with any statistical analysis is to check that your residuals correspond to your assumed error structure. We have two types of errors in a univariate state-space model: process errors, the wt, and observation errors, the vt. The residuals should not have a temporal trend. To get the residuals from most types of fits in R, you can use residuals(fit). MARSS() calls the vt the "model residuals" and the wt the "state residuals". We can plot these using the following code (Figure 6.3).
par(mfrow = c(1, 2))
resids <- residuals(kem.0)
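A sketch of the plotting calls that produce Figure 6.3 (axis labels inferred from the figure):

plot(resids$model.residuals[1, ], ylab = "model residual", xlab = "")
abline(h = 0)
plot(resids$state.residuals[1, ], ylab = "state residual", xlab = "")
abline(h = 0)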
Figure 6.3: The model and state residuals for the first 3 models.
The residual autocorrelation plots are shown in Figure 6.4. The stochastic level model
looks the best in that its model residuals (the vt ) are fine but the state model
still has problems. Clearly the state is not a simple random walk. This is not
surprising. The Aswan Low Dam was completed in 1902 and changed the
mean flow. The Aswan High Dam was completed in 1970 and also affected
the flow. You can see these perturbations in Figure 6.1.
par(mfrow = c(2, 2))
resids <- residuals(kem.0)
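A sketch of the ACF calls behind Figure 6.4:

acf(resids$model.residuals[1, ])
acf(resids$state.residuals[1, ])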
Figure 6.4: The model and state residual ACFs for the 3 models.
6.6 Fitting with JAGS

Here we show how to fit the stochastic level model, model 3 Equation (6.7),
with JAGS. This is a model where the level is a random walk with drift and
the Nile River flow is that level plus error.
library(datasets)
y <- as.vector(Nile)
This section requires that you have JAGS installed and the R2jags, rjags
and coda R packages loaded.
library(R2jags)
library(rjags)
library(coda)
The first step is to write the model for JAGS to a file (filename in model.loc):
model.loc <- "ss_model.txt"
jagsscript <- cat("
model {
# priors on parameters
mu ~ dnorm(Y1, 1/(Y1*100)); # prior on the initial level, centered on Y1
tau.q ~ dgamma(0.001,0.001); # This is inverse gamma
sd.q <- 1/sqrt(tau.q); # sd is treated as derived parameter
tau.r ~ dgamma(0.001,0.001); # This is inverse gamma
sd.r <- 1/sqrt(tau.r); # sd is treated as derived parameter
u ~ dnorm(0, 0.01);
# initial state: x is defined at t = 0, so X0 is mu
X0 <- mu
X[1] ~ dnorm(X0 + u, tau.q);
Y[1] ~ dnorm(X[1], tau.r);
for(i in 2:TT) {
predX[i] <- X[i-1]+u;
X[i] ~ dnorm(predX[i],tau.q); # Process variation
Y[i] ~ dnorm(X[i], tau.r); # Observation variation
}
172 CHAPTER 6. UNIVARIATE STATE-SPACE MODELS
}
",
file = model.loc)
Next we specify the data (and any other input) that the JAGS code needs. In
this case, we need to pass in dat and the number of time steps since that is
used in the for loop. We also specify the parameters that we want to monitor.
We need to specify at least one, but we will monitor all of them so we can plot them after fitting. Note that the hidden state is a parameter in the Bayesian context (but not in the maximum likelihood context).
jags.data <- list(Y = y, TT = length(y), Y1 = y[1])
jags.params <- c("sd.q", "sd.r", "X", "mu", "u")
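Fit the model with jags() (a sketch mirroring the jags() call used in Chapter 7; the object name mod_ss is assumed since attach.jags(mod_ss) is used below):

mod_ss <- jags(jags.data, parameters.to.save = jags.params,
    model.file = model.loc, n.chains = 3, n.burnin = 5000,
    n.thin = 1, n.iter = 10000, DIC = TRUE)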
We can then show the posteriors along with the MLEs from MARSS on top (Figure 6.5) using the code below.
attach.jags(mod_ss)
par(mfrow = c(2, 2))
hist(mu)
abline(v = coef(kem.3)$x0, col = "red")
hist(u)
abline(v = coef(kem.3)$U, col = "red")
hist(log(sd.q^2))
abline(v = log(coef(kem.3)$Q), col = "red")
hist(log(sd.r^2))
abline(v = log(coef(kem.3)$R), col = "red")
detach.jags()
Figure 6.5: The posteriors for model 3 with MLE estimates from MARSS()
shown in red.
Figure 6.6: The estimated states from the Bayesian fit along with 95% credible intervals (black and grey), with the MLE states and 95% confidence intervals in red.
6.7 Fitting with Stan

Let's fit the same stochastic level model with Stan using the rstan package.

y <- as.vector(Nile)
First we write the model. We could write this to a file (recommended), but for this example, we write it as a character object. Though the syntax is different from the JAGS code, it has many similarities. Note that, unlike JAGS, Stan does not allow any NAs in your data. Thus we have to pass in only the non-NA values along with their locations. The Nile data do not have NAs, but we want to write the code so it would work even if there were NAs.
scode <- "
data {
int<lower=0> TT;
int<lower=0> n_pos; // number of non-NA values
int<lower=0> indx_pos[n_pos]; // index of the non-NA values
vector[n_pos] y;
}
parameters {
real x0;
real u;
vector[TT] pro_dev;
real<lower=0> sd_q;
real<lower=0> sd_r;
}
transformed parameters {
vector[TT] x;
x[1] = x0 + u + pro_dev[1];
for(i in 2:TT) {
x[i] = x[i-1] + u + pro_dev[i];
}
}
model {
x0 ~ normal(y[1],10);
u ~ normal(0,2);
sd_q ~ cauchy(0,5);
sd_r ~ cauchy(0,5);
pro_dev ~ normal(0, sd_q);
for(i in 1:n_pos){
y[i] ~ normal(x[indx_pos[i]], sd_r);
176 CHAPTER 6. UNIVARIATE STATE-SPACE MODELS
}
}
generated quantities {
vector[n_pos] log_lik;
for (i in 1:n_pos) log_lik[i] = normal_lpdf(y[i] | x[indx_pos[i]], sd_r);
}
"
Then we call stan() and pass in the data, the names of the parameters we wish to have returned, and information on the number of chains, samples (iter), and thinning. The output is verbose (hidden here) and may have some warnings.
# We pass in the non-NA ys as vector
ypos <- y[!is.na(y)]
n_pos <- sum(!is.na(y)) #number on non-NA ys
indx_pos <- which(!is.na(y)) #index on the non-NAs
mod <- rstan::stan(model_code = scode, data = list(y = ypos,
TT = length(y), n_pos = n_pos, indx_pos = indx_pos), pars = c("sd_q",
"x", "sd_r", "u", "x0"), chains = 3, iter = 1000, thin = 1)
We use extract() to extract the parameters from the fitted model and we
can plot. The estimated level is x and we will plot that with the 95% credible
intervals.
pars <- rstan::extract(mod)
pred_mean <- apply(pars$x, 2, mean)
pred_lo <- apply(pars$x, 2, quantile, 0.025)
pred_hi <- apply(pars$x, 2, quantile, 0.975)
plot(pred_mean, type = "l", lwd = 3, ylim = range(c(pred_mean,
pred_lo, pred_hi)), ylab = "Nile River Level")
lines(pred_lo)
lines(pred_hi)
points(y, col = "blue")
Figure 6.7: Estimated level and 95 percent credible intervals. Blue dots are
the actual Nile River levels.
We can plot the histogram of the samples against the values estimated via
maximum likelihood.
par(mfrow = c(2, 2))
hist(pars$x0)
abline(v = coef(kem.3)$x0, col = "red")
hist(pars$u)
abline(v = coef(kem.3)$U, col = "red")
hist(log(pars$sd_q^2))
abline(v = log(coef(kem.3)$Q), col = "red")
hist(log(pars$sd_r^2))
abline(v = log(coef(kem.3)$R), col = "red")
Figure 6.9: Histogram of the parameter samples versus the estimate (red line)
from maximum likelihood.
6.9 Problems
1. Write the equations for each of these models: ARIMA(0,0,0),
ARIMA(0,1,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1). Read
the help file for the Arima() function (in the forecast package) if you
are fuzzy on the arima notation.
2. The MARSS package includes a data set of sharp-tailed grouse in
Washington. Load the data to use as follows:
library(MARSS)
dat = log(grouse[, 2])
6. The MARSS package includes a data set of gray whales. Load the
data to use as follows:
library(MARSS)
dat <- log(graywhales[, 2])
Fit a random walk with drift model observed with error to the data:
xt = xt−1 + u + wt where wt ∼ N(0, q)
yt = xt + vt where vt ∼ N(0, r) (6.19)
x0 = a
• Model 4: Process error with drift and observation error, with the observation error variance estimated.
a. Compute the AICc’s for each model and likelihood or deviance (-2
* log likelihood). Where to find these? Try names(fit). logLik()
is the standard R function to return log-likelihood from fits.
b. Calculate a table of ∆AICc values and AICc weights.
c. Show the acf of the model and state residuals for the best model.
You will need a vector of the residuals to do this. If fit is the fit
from a fit call like fit = MARSS(dat), you get the residuals using
this code:
residuals(fit)$state.residuals[1, ]
residuals(fit)$model.residuals[1, ]
This will prepare the training data and set aside the last 3 data points
for validation.
a. Fit the following four models using Arima(): ARIMA(0,0,0),
ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1).
b. Use forecast() to make 3 step ahead forecasts from each.
c. Calculate the MASE statistic for each using the accuracy() func-
tion in the forecast package. Type ?accuracy to learn how to
use this function.
d. Present the results in a table.
e. Which model is best supported based on the MASE statistic?
a. Plot MaryLee’s locations (as a line not dots). Put the latitude
locations on the y-axis and the longitude on the y-axis. You can
use rownames(dat) to see which is in which row. You can just use
plot() for the homework. But if you want, you can look at the
MARSS Manual chapter on animal movement to see how to plot
the turtle locations on a map using the maps package.
b. Analyze the data with a state-space model (movement observed
with error) using
fit0 <- MARSS(dat)
Look at the output from the above MARSS call. What is the
meaning of the parameters output from MARSS in terms of turtle
movement? What exactly is the u estimate for example? Look at
the data and think about the model you fit.
c. What assumption did the default MARSS model make about obser-
vation error and process error? What does that assumption mean
in terms of how steps in the N-S and E-W directions are related?
What does that assumption mean in terms of our assumption about
the latitudinal and longitudinal observation errors?
d. Does MaryLee move faster in the latitude direction versus longitude
direction?
e. Add MaryLee’s estimated “true” positions to your plot of her
186 CHAPTER 6. UNIVARIATE STATE-SPACE MODELS
g. Plot your state residuals (true location residuals). What are the
problems? Discuss in reference to your plot of the location data.
Here is how to get state residuals from MARSS() output:
resids <- residuals(fit0)$state.residuals
The lon residuals are in row 1 and lat residuals are in row 2 (same order as the data).
Chapter 7
MARSS models
This lab will show you how to fit multivariate state-space (MARSS) models
using the MARSS package. This class of time-series model is also called vec-
tor autoregressive state-space (VARSS) models. This chapter works through
an example which uses model selection to test different population structures
in west coast harbor seals. See Holmes et al. (2014) for a fuller version of this
example.
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here
All the data used in the chapter are in the MARSS package. For most
examples, we will use the MARSS() function to fit models via maximum-
likelihood. We also show how to fit a Bayesian model using JAGS and Stan.
For these sections you will need the R2jags, coda and rstan packages. To
run the JAGS code, you will also need JAGS installed. See Chapter 12 for
more details on JAGS and Chapter 13 for more details on Stan.
library(MARSS)
library(R2jags)
library(coda)
library(rstan)
7.1 Overview
As discussed in Chapter 6, the MARSS package fits multivariate state-space
models in this form:
xt = Bxt−1 + u + wt where wt ∼ N(0, Q)
yt = Zxt + a + vt where vt ∼ N(0, R) (7.1)
x0 = µ
where each of the bolded terms are matrices. Those that are bolded and small
(not capitalized) have one column only, so are column matrices.
To fit a multivariate time series model with the MARSS package, you need
to first determine the size and structure of each of the parameter matrices:
B, u, Q, Z, a, R and µ. This requires first writing down your model in
matrix form. We will illustrate this with a series of models for the temporal population dynamics of West coast harbor seals.
observation errors may be quite different. The regions have had different
levels of sampling; the best sampled region has only 4 years missing while the
worst has over half the years missing (Figure 7.1).
Figure 7.1: Plot of the count data from the five harbor seal regions (Jeffries et al. 2003). The numbers on each line denote the different regions: 1) Strait of Juan de Fuca (SJF), 2) San Juan Islands (SJI), 3) Eastern Bays (EBays), 4) Puget Sound (PSnd), and 5) Hood Canal (HC). Each region is an index of the total harbor seal population in each region.
The harbor seal data are included in the MARSS package as a matrix with years in column 1 and the logged counts in the other columns. Let's look at the first few years of data.
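A sketch of a quick look (assuming the harborSealWA data set, which holds the regions plotted in Figure 7.1):

print(MARSS::harborSealWA[1:8, ], digits = 3)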
7.3 A single well-mixed population

We assume that all four regional time series are observations of one population trajectory but they are scaled up or down relative to that trajectory. In effect, we think of each regional survey as an index of the total population. With this model, we do not think the regions represent independent subpopulations but rather independent observations of one population. Our model for the data, yt = Z xt + a + vt, is written as:
[ y1 ]    [ 1 ]        [ 0  ]   [ v1 ]
[ y2 ]  = [ 1 ] xt  +  [ a2 ] + [ v2 ]        (7.4)
[ y3 ]    [ 1 ]        [ a3 ]   [ v3 ]
[ y4 ]t   [ 1 ]        [ a4 ]   [ v4 ]t
Each yi is the observed time series of counts for a different region. The
a’s are the bias between the regional sample and the total population. Z
specifies which observation time series, yi , is associated with which population
trajectory, xj . In this case, Z is a matrix with 1 column since each region is
an observation of the one population trajectory.
We allow that each region could have a unique observation variance and that
the observation errors are independent between regions. We assume that the
observations errors on log(counts) are normal and thus the errors on (counts)
are log-normal. The assumption of normality is not unreasonable since these
regional counts are the sum of counts across multiple haul-outs. We specify
independent observation errors with different variances by specifying that
    [ r1  0   0   0  ]
R = [ 0   r2  0   0  ]        (7.5)
    [ 0   0   r3  0  ]
    [ 0   0   0   r4 ]
This is a diagonal matrix with unequal variances. The shortcut for this
structure in MARSS() is "diagonal and unequal".
We need to write the model in the form of Equation (7.1) with each parameter
written as a matrix. The observation model (Equation (7.4)) is already in
matrix form. Let’s write the state model in matrix form too:
and fit:
fit.0 <- MARSS(dat, model = mod.list.0)
MARSS fit is
Estimation method: kem
Estimate
A.SJI 0.79583
A.EBays 0.27528
A.PSnd -0.54335
R.(SJF,SJF) 0.02883
R.(SJI,SJI) 0.03063
R.(EBays,EBays) 0.01661
R.(PSnd,PSnd) 0.01168
U.u 0.05537
Q.q 0.00642
x0.mu 6.22810
Initial states (x0) defined at t=0
The model fits fine but look at the model residuals (Figure 7.2). They have
problems.
par(mfrow = c(2, 2))
resids <- residuals(fit.0)
for (i in 1:4) {
plot(resids$model.residuals[i, ], ylab = "model residuals",
xlab = "")
abline(h = 0)
title(rownames(dat)[i])
}
Figure 7.2: The model residuals for the first model. SJI and EBays do not
look good.
7.4 Four subpopulations with temporally uncorrelated errors

The model for one well-mixed population was not very good. Another reasonable assumption is that the different census regions are measuring four different temporally independent subpopulations. We write a model of the log subpopulation abundances for this case as:
[ x1 ]    [ 1 0 0 0 ] [ x1 ]      [ u ]   [ w1 ]
[ x2 ]  = [ 0 1 0 0 ] [ x2 ]    + [ u ] + [ w2 ]
[ x3 ]    [ 0 0 1 0 ] [ x3 ]      [ u ]   [ w3 ]
[ x4 ]t   [ 0 0 0 1 ] [ x4 ]t−1   [ u ]   [ w4 ]t

                     [ q 0 0 0 ]
where wt ∼ MVN( 0,   [ 0 q 0 0 ]  )        (7.7)
                     [ 0 0 q 0 ]
                     [ 0 0 0 q ]

and the initial states are

[ x1 ]    [ µ1 ]
[ x2 ]  = [ µ2 ]
[ x3 ]    [ µ3 ]
[ x4 ]0   [ µ4 ]
The Q matrix is diagonal with one variance value. This means that the process
variance (variance in year-to-year population growth rates) is independent
(good and bad years are not correlated) but the level of variability is the same
across regions. We made the u matrix with one u value. This means that we
assume the population growth rates are the same across regions.
Notice that we set the B matrix equal to a diagonal matrix with 1 on the
diagonal. This is the “identity” matrix and it is like a 1 but for matrices. We
do not need B for our model, but MARSS() requires a value.
[ y1 ]    [ 1 0 0 0 ] [ x1 ]    [ 0 ]   [ v1 ]
[ y2 ]  = [ 0 1 0 0 ] [ x2 ]  + [ 0 ] + [ v2 ]        (7.8)
[ y3 ]    [ 0 0 1 0 ] [ x3 ]    [ 0 ]   [ v3 ]
[ y4 ]t   [ 0 0 0 1 ] [ x4 ]t   [ 0 ]   [ v4 ]t
[ 0 1 0 0 ]
[ 1 0 0 0 ]        (7.9)
[ 0 0 0 1 ]
[ 0 0 1 0 ]
7.5 Four subpopulations with temporally correlated errors

For the next model, we use an "equalvarcov" process variance-covariance matrix:

    [ q c c c ]
Q = [ c q c c ]
    [ c c q c ]
    [ c c c q ]
This Q matrix structure means that the process variance (variance in year-to-
year population growth rates) is the same across regions and the covariance
in year-to-year population growth rates is also the same across regions.
Results are not shown, but here are the AICc values. This last model is much better:
Look at the model residuals (Figure 7.3). They are also much better.
Figure 7.3: The model residuals for the model with four temporally correlated
subpopulations.
Figure 7.4 shows the estimated states for each region using this code:
par(mfrow = c(2, 2))
for (i in 1:4) {
plot(years, fit.2$states[i, ], ylab = "log subpopulation estimate",
xlab = "", type = "l")
lines(years, fit.2$states[i, ] - 1.96 * fit.2$states.se[i,
], type = "l", lwd = 1, lty = 2, col = "red")
lines(years, fit.2$states[i, ] + 1.96 * fit.2$states.se[i,
], type = "l", lwd = 1, lty = 2, col = "red")
title(rownames(dat)[i])
}
Figure 7.4: Plot of the estimate of log harbor seals in each region. The 95%
confidence intervals on the population estimates are the dashed lines. These
are not the confidence intervals on the observations, and the observations
(the numbers) will not fall between the confidence interval lines.
For our next example, we will use MARSS models to test hypotheses about the population structure of harbor seals on the west coast. For this example, we will evaluate the support for different population structures (numbers of subpopulations) using different Zs to specify how survey regions map onto subpopulations. We will assume correlated process errors with the same magnitude of process variance and covariance. We will assume independent observation errors with equal variances at each site. We could use unequal variances, but it takes a long time to fit, so for this example the observation variances are set equal.
The dataset we will use is harborSeal, a 29-year dataset of abundance indices
for 12 regions along the U.S. west coast between 1975-2004 (Figure 7.5).
We start by setting up our data matrix. We will leave off Hood Canal.
dat <- MARSS::harborSeal
years <- dat[, "Year"]
good <- !(colnames(dat) %in% c("Year", "HoodCanal"))
sealData <- t(dat[, good])
We will evaluate the data support for the following hypotheses about the
population structure:
• H1: stock 3 subpopulations defined by management units
• H2: coast+PS 2 subpopulations defined by coastal versus WA inland
• H3: N+S 2 subpopulations defined by north and south split in the middle
of Oregon
• H4: NC+strait+PS+SC 4 subpopulations defined by N coastal, S coastal,
SJF+Georgia Strait, and Puget Sound
• H5: panmictic All regions are part of the same panmictic population
• H6: site Each of the 11 regions is a subpopulation
Figure 7.5: Plot of log counts at each survey region in the harborSeal dataset.
Each region is an index of the harbor seal abundance in that region.
                     H1            H2         H4             H5
                     pnw ps ca     coast pc   nc is ps sc    pan
Coastal Estuaries    1   0  0      1     0    1  0  0  0     1
Olympic Peninsula    1   0  0      1     0    1  0  0  0     1
Str. Juan de Fuca    0   1  0      0     1    0  1  0  0     1
San Juan Islands     0   1  0      0     1    0  1  0  0     1
Eastern Bays         0   1  0      0     1    0  0  1  0     1
Puget Sound          0   1  0      0     1    0  0  1  0     1
CA Mainland          0   0  1      1     0    0  0  0  1     1
CA Channel Islands   0   0  1      1     0    0  0  0  1     1
OR North Coast       1   0  0      1     0    1  0  0  0     1
OR South Coast       1   0  0      1     0    0  0  0  1     1
Georgia Strait       0   1  0      0     1    0  1  0  0     1
7.8 Set up the hypotheses as different models

Only the Z matrices change across our models. We will set up a base model list used for all models.
mod.list = list(
B = "identity",
U = "unequal",
Q = "equalvarcov",
Z = "placeholder",
A = "scaling",
R = "diagonal and equal",
x0 = "unequal",
tinitx = 0 )
We will use AICc and AIC weights to summarize the data support for the
different hypotheses. First we will sort the fits based on AICc:
min.AICc <- order(out.tab$AICc)
out.tab.1 <- out.tab[min.AICc, ]
The AIC weight for a model is its relative likelihood divided by the sum of
all the relative likelihoods.
out.tab.1 <- cbind(out.tab.1, AIC.weight = out.tab.1$rel.like/sum(out.tab.1$rel.like))
7.9 Fitting a MARSS model with JAGS

We will fit the model with four temporally independent subpopulations with the same population growth rate (u) and year-to-year variance (q). This is the model in Section 7.4.
The first step is to write this model in JAGS. See Chapter 12 for more
information on and examples of JAGS models.
jagsscript <- cat("
model {
U ~ dnorm(0, 0.01);
tauQ~dgamma(0.001,0.001);
Q <- 1/tauQ;
# Observation model
# The Rs are different in each site
for(i in 1:nSites) {
tauR[i]~dgamma(0.001,0.001);
R[i] <- 1/tauR[i];
}
   # Initial states
   for(i in 1:nSites) {
      X[i,1] ~ dnorm(3, 0.01); # vague normal prior
   }
   # Autoregressive process for the remaining years
   for(t in 2:nYears) {
      for(i in 1:nSites) {
         predX[i,t] <- X[i,t-1] + U;
         X[i,t] ~ dnorm(predX[i,t], tauQ);
      }
   }
for(t in 1:nYears) {
for(i in 1:nSites) {
Y[i,t] ~ dnorm(X[i,t],tauR[i]);
}
}
}
",
file = "marss-jags.txt")
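The data matrix Y below holds the four region time series. A sketch of its construction (assuming the harborSealWA data set):

data(harborSealWA, package = "MARSS")
Y <- t(harborSealWA[, c("SJF", "SJI", "EBays", "PSnd")])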
Then we write the data list, parameter list, and pass the model to the jags()
function:
jags.data <- list(Y = Y, nSites = nrow(Y), nYears = ncol(Y)) # named list
jags.params <- c("X", "U", "Q", "R")
model.loc <- "marss-jags.txt" # name of the txt file
mod_1 <- jags(jags.data, parameters.to.save = jags.params, model.file = model.loc,
    n.chains = 3, n.burnin = 5000, n.thin = 1, n.iter = 10000,
    DIC = TRUE)
Figure 7.6: Plot of the posterior means and credible intervals for the estimated
states.
detach.jags()
Let’s fit the same model as in Section 7.9 with Stan using the rstan package. If
you have not already, you will need to install the rstan package. This package
depends on a number of other packages which should install automatically
when you install rstan.
First we write the model. We could write this to a file (recommended), but for
this example, we write as a character object. Though the syntax is different
from the JAGS code, it has many similarities. Note that Stan does not allow
missing values in the data, thus we need to pass in only the non-missing
values along with the row and column indices of those values. The latter is
so we can match them to the appropriate state (x) values.
scode <- "
data {
int<lower=0> TT; // length of ts
int<lower=0> N; // num of ts; rows of y
int<lower=0> n_pos; // number of non-NA values in y
int<lower=0> col_indx_pos[n_pos]; // col index of non-NA vals
int<lower=0> row_indx_pos[n_pos]; // row index of non-NA vals
vector[n_pos] y;
}
parameters {
vector[N] x0; // initial states
real u;
vector[N] pro_dev[TT]; // refed as pro_dev[TT,N]
real<lower=0> sd_q;
real<lower=0> sd_r[N]; // obs variances are different
}
transformed parameters {
vector[N] x[TT]; // refed as x[TT,N]
for(i in 1:N){
x[1,i] = x0[i] + u + pro_dev[1,i];
for(t in 2:TT) {
x[t,i] = x[t-1,i] + u + pro_dev[t,i];
}
}
}
model {
sd_q ~ cauchy(0,5);
for(i in 1:N){
x0[i] ~ normal(y[i],10); // assume no missing y[1]
sd_r[i] ~ cauchy(0,5);
for(t in 1:TT){
pro_dev[t,i] ~ normal(0, sd_q);
}
}
u ~ normal(0,2);
for(i in 1:n_pos){
y[i] ~ normal(x[col_indx_pos[i], row_indx_pos[i]], sd_r[row_indx_pos[i]]);
}
}
generated quantities {
vector[n_pos] log_lik;
for (n in 1:n_pos) log_lik[n] = normal_lpdf(y[n] | x[col_indx_pos[n], row_indx_pos[n]], sd_r[row_indx_pos[n]]);
}
"
Then we call stan() and pass in the data, the names of the parameters we wish
to have returned, and information on the number of chains, samples (iter), and
thinning. The output is verbose (hidden here) and may have some warnings.
ypos <- Y[!is.na(Y)]
n_pos <- length(ypos) # number of non-NA ys
indx_pos <- which(!is.na(Y), arr.ind = TRUE) # indices of the non-NAs
col_indx_pos <- as.vector(indx_pos[, "col"])
row_indx_pos <- as.vector(indx_pos[, "row"])
mod <- rstan::stan(model_code = scode, data = list(y = ypos,
TT = ncol(Y), N = nrow(Y), n_pos = n_pos, col_indx_pos = col_indx_pos,
row_indx_pos = row_indx_pos), pars = c("sd_q", "x", "sd_r",
"u", "x0"), chains = 3, iter = 1000, thin = 1)
We use extract() to extract the parameters from the fitted model and then
compute the means and 95% credible intervals.
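A minimal sketch of that extraction step (the object names are assumptions):

# posterior draws of the states: iterations x TT x N
pars <- rstan::extract(mod)
# posterior means and 95% credible intervals for each state
means <- apply(pars$x, c(2, 3), mean)
lowerCI <- apply(pars$x, c(2, 3), quantile, probs = 0.025)
upperCI <- apply(pars$x, c(2, 3), quantile, probs = 0.975)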
Figure: Posterior means of the estimated states plotted against year for the
SJF, SJI, EBays, and PSnd sites.
7.11 Problems
For these questions, use the harborSealWA data set in MARSS. The data
are already logged, but you will need to remove the year column and have
time going across the columns not down the rows.
require(MARSS)
data(harborSealWA, package = "MARSS")
dat <- t(harborSealWA[, 2:6])
The sites are the Strait of Juan de Fuca (SJF 3), San Juan Islands (SJI 4), Eastern
Bays (EBays 5), Puget Sound (PSnd 6) and Hood Canal (HC 7).
1. Plot the harbor seal data. Use whatever plotting functions you wish
(e.g. ggplot(), plot(); points(); lines(), matplot()).
2. Fit a panmictic population model that assumes that each of the 5 sites
is observing one “Inland WA” harbor seal population with trend u.
Assume the observation errors are independent and identical. This
means 1 variance on diagonal and 0s on off-diagonal. This is the default
assumption for MARSS().
a. Write the Z for this model. The code to use for making a matrix
in Rmarkdown is
$$\begin{bmatrix}a & b & 0\\d & e & f\\0 & h & i\end{bmatrix}$$
b. Write the Z matrix in R using Z=matrix(...) and using the
factor shortcut for specifying Z: Z=factor(c(...)).
c. Fit the model using MARSS(). What is the estimated trend (u)?
How fast was the population increasing (percent per year) based
on this estimated u?
d. Compute the confidence intervals for the parameter estimates.
Compare the intervals using the Hessian approximation and using
a parametric bootstrap. What differences do you see between the
two approaches? Use this code:
library(broom)
tidy(fit)
# set nboot low so it doesn't take forever
tidy(fit, method="parametric",nboot=100)
e. What does an estimate of Q = 0 mean? What would the estimated
state (x) look like when Q = 0?
3. Using the same panmictic population model, compare 3 assumptions
about the observation error structure.
• The observation errors are independent with different variances.
• The observation errors are independent with the same variance.
• The observation errors are correlated with the same variance and
same correlation.
a. Write the R variance-covariance matrices for each assumption.
b. Create each R matrix in R. To combine numbers and characters
in a matrix, use a list matrix like so:
A <- matrix(list(0),3,3)
A[1,1] <- "sigma2"
c. Fit each model using MARSS() and compute the confidence intervals
(CIs) for the estimated parameters. Compare the estimated u
(the population long-term trend) along with their CIs. Does the
assumption about the observation errors change the u estimate?
d. Plot the state residuals, the ACF of the state residuals, and the
histogram of the state residuals for each fit. Are there any issues
that you see? Use this code to get your state residuals:
residuals(fit)$state.residuals[1,]
You need the [1,] since the residuals are returned as a matrix.
4. Fit a model with 3 subpopulations. 1=SJF,SJI; 2=PS,EBays; 3=HC.
The x part of the model is the population structure. Assume that the
observation errors are identical and independent (R="diagonal and
equal"). Assume that the process errors are unique and independent
(Q="diagonal and unequal"). Assume that the u are unique among
the 3 subpopulations.
a. Write the x equation. Make sure each matrix in the equation has
the right number of rows and columns.
This will put NAs in the model residuals where there is missing data.
Then do the tests on each row of resids.
resids <- residuals(fit)$model.residuals
resids[is.na(dat)] <- NA
Chapter 8

MARSS with Covariates

A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
For the chapter examples, we will use the green and bluegreen algae in the Lake
Washington plankton data set and the covariates in that dataset. This is a 32-
year time series (1962-1994) of monthly plankton counts (cells per mL) from
Lake Washington, Washington, USA with the covariates total phosphorus
and pH. lakeWAplanktonTrans is a transformed version of the raw data used
for teaching purposes. Zeros have been replaced with NAs (missing). The
logged (natural log) raw plankton counts have been standardized to a mean
of zero and variance of 1 (so logged and then z-scored). Temperature, TP and
pH were also z-scored but not logged (so z-score of the untransformed values
for these covariates). The single missing temperature value was replaced with
-1 and the single missing TP value was replaced with -0.3.
We will use the 10 years of data from 1965-1974 (Figure 8.1), a decade with
particularly high green and bluegreen algae levels.
data(lakeWAplankton, package = "MARSS")
# lakeWA
fulldat = lakeWAplanktonTrans
years = fulldat[, "Year"] >= 1965 & fulldat[, "Year"] < 1975
dat = t(fulldat[years, c("Greens", "Bluegreens")])
covariates = t(fulldat[years, c("Temp", "TP")])
Packages:
library(MARSS)
library(ggplot2)
8.1 Overview
A multivariate autoregressive state-space (MARSS) model with covariate
effects in both the process and observation components is written as:
xt = Bxt−1 + u + Cct + wt, where wt ∼ MVN(0, Q)
yt = Zxt + a + Ddt + vt, where vt ∼ MVN(0, R) (8.1)

Figure 8.1: Time series of Green and Bluegreen algae abundances in Lake
Washington along with the temperature and total phosphorus covariates.
We start with a multivariate linear regression written in MARSS form so you
see the relationship of the MARSS model to more familiar linear regression
models.
In a standard multivariate linear regression, we only have an observation
model with independent errors (the state process does not appear in the
model):
yt = a + Ddt + vt , where vt ∼ MVN(0, R) (8.2)
The elements in a are the intercepts and those in D are the slopes (effects).
We have dropped the t subscript on a and D because these will be modeled as
time-constant. Writing this out for the two plankton and the two covariates
we get:
" # " # " #" # " #
yg a β βg,tp temp v
= 1 + g,temp + 1 (8.3)
ybg t
a2 βbg,temp βbg,tp tp t−1
v2 t
Let’s fit this model with MARSS. The x part of the model is irrelevant so we
want to fix the parameters in that part of the model. We won’t set B = 0 or
Z = 0 since that might cause numerical issues for the Kalman filter. Instead
we fix them as identity matrices and fix x0 = 0 so that xt = 0 for all t.
Q <- U <- x0 <- "zero"
B <- Z <- "identity"
d <- covariates
A <- "zero"
D <- "unconstrained"
y <- dat # to show relationship between dat & the equation
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, D = D,
d = d, x0 = x0)
kem <- MARSS(y, model = model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -276.4287
Estimate
R.diag 0.706
D.(Greens,Temp) 0.367
D.(Bluegreens,Temp) 0.392
D.(Greens,TP) 0.058
D.(Bluegreens,TP) 0.535
Initial states (x0) defined at t=0
Now let's model the data as an autoregressive process observed without error.
We set R to zero and let the covariates enter through the process equation via C:
R <- A <- U <- "zero"
B <- Z <- "identity"
Q <- "equalvarcov"
C <- "unconstrained"
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
    C = C, c = covariates)
kem <- MARSS(dat, model = model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -285.0732
AIC: 586.1465 AICc: 586.8225
Estimate
Q.diag 0.7269
Q.offdiag -0.0210
x0.X.Greens -0.5189
x0.X.Bluegreens -0.2431
C.(X.Greens,Temp) -0.0434
C.(X.Bluegreens,Temp) 0.0988
C.(X.Greens,TP) -0.0589
C.(X.Bluegreens,TP) 0.0104
Initial states (x0) defined at t=0
Now, it looks like temperature has a strong negative effect on algae? Also
our log-likelihood dropped a lot. Well, the data do not look at all like a
random walk model (where B = 1), which we can see from the plot of the
data (Figure 8.1). The data are fluctuating about some mean so let’s switch
to a better autoregressive model—a mean-reverting model. To do this, we
will allow the diagonal elements of B to be something other than 1.
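The refitting call implied by the output below would be along these lines (a sketch):

# estimate the diagonal of B; off-diagonals stay at 0
model.list$B <- "diagonal and unequal"
kem <- MARSS(dat, model = model.list)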
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -236.6106
AIC: 493.2211 AICc: 494.2638
Estimate
B.(X.Greens,X.Greens) 0.1981
B.(X.Bluegreens,X.Bluegreens) 0.7672
Q.diag 0.4899
Q.offdiag -0.0221
x0.X.Greens -1.2915
x0.X.Bluegreens -0.4179
C.(X.Greens,Temp) 0.2844
C.(X.Bluegreens,Temp) 0.1655
C.(X.Greens,TP) 0.0332
C.(X.Bluegreens,TP) 0.1340
Initial states (x0) defined at t=0
Notice that the log-likelihood goes up quite a bit, which means that the
mean-reverting model fits the data much better.
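The next set of estimates comes from a refit in which the initial states are
no longer estimated (x0 does not appear among the estimates and the initial
states are defined at t = 1). A sketch of one way to do this, assuming x0 is
fixed at the data at t = 1:

# fix the initial states at the first observation rather than estimating them
model.list$tinitx <- 1
model.list$x0 <- dat[, 1, drop = FALSE]
kem <- MARSS(dat, model = model.list)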
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Algorithm ran 15 (=minit) iterations and convergence was reached.
Log-likelihood: -235.4827
AIC: 486.9653 AICc: 487.6414
Estimate
B.(X.Greens,X.Greens) 0.1980
B.(X.Bluegreens,X.Bluegreens) 0.7671
Q.diag 0.4944
Q.offdiag -0.0223
C.(X.Greens,Temp) 0.2844
C.(X.Bluegreens,Temp) 0.1655
C.(X.Greens,TP) 0.0332
C.(X.Bluegreens,TP) 0.1340
Initial states (x0) defined at t=1
Here is an example where we have both process and observation error but
the covariates only affect the process:
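The code for this fit is not shown; since no R estimate appears in the output,
the observation variance was presumably fixed. A sketch under that assumption
(the value 0.16 is an assumption, not taken from the text):

model.list <- list(B = "diagonal and unequal", U = "zero",
    Q = "equalvarcov", Z = "identity", A = "zero",
    R = diag(0.16, 2),  # assumed fixed observation variance
    C = "unconstrained", c = covariates,
    x0 = "unequal", tinitx = 1)
kem <- MARSS(dat, model = model.list)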
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 36 iterations.
Log-likelihood: -240.3694
AIC: 500.7389 AICc: 501.7815
Estimate
B.(X.Greens,X.Greens) 0.30848
B.(X.Bluegreens,X.Bluegreens) 0.76101
Q.diag 0.33923
Q.offdiag -0.00411
x0.X.Greens -0.52614
x0.X.Bluegreens -0.32836
C.(X.Greens,Temp) 0.23790
C.(X.Bluegreens,Temp) 0.16991
C.(X.Greens,TP) 0.02505
C.(X.Bluegreens,TP) 0.14183
Initial states (x0) defined at t=1
Here is the analogous example where the covariates affect the observations
rather than the process:

MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 45 iterations.
Log-likelihood: -239.5879
AIC: 499.1759 AICc: 500.2185
Estimate
B.(X.Greens,X.Greens) 0.428
B.(X.Bluegreens,X.Bluegreens) 0.859
Q.diag 0.314
Q.offdiag -0.030
x0.X.Greens -0.121
x0.X.Bluegreens -0.119
D.(Greens,Temp) 0.373
D.(Bluegreens,Temp) 0.276
D.(Greens,TP) 0.042
D.(Bluegreens,TP) 0.115
Initial states (x0) defined at t=1
Time-series data are often collected at intervals with some implicit "seasonality."
For example, quarterly earnings for a business, monthly rainfall totals, or
hourly air temperatures.
One common approach for estimating seasonal effects is to treat each one
as a fixed factor, such that the number of parameters equals the number of
“seasons” (e.g., 24 hours per day, 4 quarters per year). The plankton data
are collected monthly, so we will treat each month as a fixed factor. To fit a
model with fixed month effects, we create a 12 × T covariate matrix c with one
row for each month (Jan, Feb, . . . ) and one column for each time point. We
put a 1 in the January row for each column corresponding to a January time
point, a 1 in the February row for each column corresponding to a February
time point, and so on. All other values of c equal 0. The following code will
create such a c matrix.
# number of 'seasons' (e.g., 12 months per year)
period <- 12
# first 'season' (e.g., Jan = 1, July = 7)
per.1st <- 1
# create factors for seasons: a 12 x TT matrix of 0s and 1s
c.in <- diag(period)
for (i in 2:(ceiling(TT/period))) c.in <- cbind(c.in, diag(period))
# trim c.in to correct start & length
c.in <- c.in[, (1:TT) + (per.1st - 1)]
Next we need to set up the form of the C matrix, which defines any constraints
we want to set on the month effects. C is a 5 × 12 matrix: five taxa and
12 month effects. If we wanted each taxon to have the same month effect,
i.e. there is a common month effect across all taxa, then we have the same
value in each C column¹:
C <- matrix(month.abb, 5, 12, byrow = TRUE)
C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[2,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[3,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[4,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[5,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Notice that C only has 12 values in it, the 12 common month effects. However,
for this example, we will let each taxon have a different month effect, thus
allowing different seasonality for each taxon. For this model, we want each
value in C to be unique:
C <- "unconstrained"
¹ month.abb is an R constant that gives month abbreviations in text.
Now we can set up the model list for MARSS and fit the model (results are
not shown since they are verbose with 60 different month effects).
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
C = C, c = c.in, D = D, d = d)
seas.mod.1 <- MARSS(dat, model = model.list, control = list(maxit = 1500))
# Get the estimated seasonal effects rows are taxa, cols are
# seasonal effects
seas.1 <- coef(seas.mod.1, type = "matrix")$C
rownames(seas.1) <- phytos
colnames(seas.1) <- month.abb
The top panel in Figure 8.2 shows the estimated seasonal effects for this
model. Note that if we had set U=“unequal”, we would need to set one of the
columns of C to zero because the model would be under-determined (infinite
number of solutions). If we subtracted the mean January abundance off
each time series, we could set the January column in C to 0 and get rid of 5
estimated effects.
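The middle panel of Figure 8.2 comes from modelling the month effect as a
3rd-order polynomial in month number. A minimal sketch of how that model
could be set up and fit (the names c.m.poly and seas.mod.2 match how the
model is referenced below, but are assumptions):

# season as a 3rd-order polynomial: covariate rows 1, m, m^2, m^3
poly.order <- 3
month.cov <- matrix(1, 1, period)
for (i in 1:poly.order) month.cov <- rbind(month.cov, (1:period)^i)
# replicate the 12 monthly columns across years and trim to length TT
c.m.poly <- matrix(month.cov, poly.order + 1, TT + period, byrow = FALSE)
c.m.poly <- c.m.poly[, (1:TT) + (per.1st - 1)]
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
    C = C, c = c.m.poly, D = D, d = d)
seas.mod.2 <- MARSS(dat, model = model.list, control = list(maxit = 1500))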
The middle panel in Figure 8.2 shows the estimated seasonal effects for this
polynomial model.
Note: Setting the covariates up like this means that our covariates are collinear
since m, m², and m³ are correlated, obviously. A better approach is to use
the poly() function to create an orthogonal polynomial covariate matrix
c.m.poly.o:
month.cov.o <- cbind(1, poly(1:period, poly.order))
c.m.poly.o <- matrix(t(month.cov.o), poly.order + 1, TT + period,
byrow = FALSE)
c.m.poly.o <- c.m.poly.o[, (1:TT) + (per.1st - 1)]
The factor approach required estimating 60 effects, and the 3rd order polyno-
mial model was an improvement at only 20 parameters. A third option is to
use a discrete Fourier series, which is a combination of sine and cosine waves;
it would require only 10 parameters. Specifically, the effect of month m on
taxon i is ai × cos(2πm/p) + bi × sin(2πm/p), where p is the period (e.g., 12
months, 4 quarters), and ai and bi are contained in the i-th row of C.
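A sketch of how the Fourier covariates c.Four could be built, consistent with
the formula above (one cosine row and one sine row with period 12):

cos.t <- cos(2 * pi * seq(TT)/period)
sin.t <- sin(2 * pi * seq(TT)/period)
c.Four <- rbind(cos.t, sin.t)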
Everything else remains the same and we can fit this model as follows:
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
C = C, c = c.Four, D = D, d = d)
seas.mod.3 <- MARSS(dat, model = model.list, control = list(maxit = 1500))
The bottom panel in Figure 8.2 shows the estimated seasonal effects for this
seasonal-effects model based on a discrete Fourier series.
Figure 8.2: Estimated monthly effects for the three approaches to estimating
seasonal effects. Top panel: each month modelled as a separate fixed effect for
each taxon (60 parameters); Middle panel: monthly effects modelled as a 3rd
order polynomial (20 parameters); Bottom panel: monthly effects modelled
as a discrete Fourier series (10 parameters).
Rather than rely on our eyes to judge model fits, we should formally assess
which of the 3 approaches offers the most parsimonious fit to the data. Here
is a table of AICc values for the 3 models:
data.frame(Model = c("Fixed", "Cubic", "Fourier"), AICc = round(c(seas.mod.1$AICc,
seas.mod.2$AICc, seas.mod.3$AICc), 1))
Model AICc
1 Fixed 1188.4
2 Cubic 1144.9
3 Fourier 1127.4
The model selection results indicate that the model with monthly seasonal
effects estimated via the discrete Fourier series is the best of the 3 models.
Its AICc value is much lower than either the polynomial or fixed-effects
models.
We will examine some basic model diagnostics for these three approaches by
looking at plots of the model residuals and their autocorrelation functions
(ACFs) for all five taxa using the following code:
for (i in 1:3) {
dev.new()
modn <- paste("seas.mod", i, sep = ".")
for (j in 1:5) {
plot.ts(residuals(get(modn))$model.residuals[j, ], ylab = "Residual",
main = phytos[j])
abline(h = 0, lty = "dashed")
acf(residuals(get(modn))$model.residuals[j, ], na.action = na.pass)
}
}
For these problems, use the following code to load in 1980-1994 phytoplankton
data, covariates, and z-score all the data. Run the code below and use dat
and covars directly in your code.
Figure 8.3: Residuals for model with season modelled as a discrete Fourier
series.
library(MARSS)
spp <- c("Cryptomonas", "Diatoms", "Greens", "Unicells", "Other.algae",
"Daphnia")
yrs <- lakeWAplanktonTrans[, "Year"] %in% 1980:1994
dat <- t(lakeWAplanktonTrans[yrs, spp])
# z-score the data
avg <- apply(dat, 1, mean, na.rm = TRUE)
sd <- sqrt(apply(dat, 1, var, na.rm = TRUE))
dat <- (dat - avg)/sd
rownames(dat) = spp
# always check that the mean is 0 and the variance is 1 after
# z-scoring
apply(dat, 1, mean, na.rm = TRUE) #this should be 0
apply(dat, 1, var, na.rm = TRUE) #this should be 1
8.9 Problems
Read Section 8.8 for the data and tips on answering the questions and setting
up your models. Note the questions asking about the effects on growth rate
are asking about the C matrix in
xt = Bxt−1 + Cct + wt
The Cct + wt are the process errors and represent the growth rates (growth
above or below what you would expect given xt−1 ). Use your raw data in the
MARSS model. You do not need to difference the data to get at the growth
rates since the process model is modeling that.
1. How does month affect the mean phytoplankton population growth
rates? Show a plot of the estimated mean growth rate versus month
for each taxon using three approaches to estimate the month effect
(factor, polynomial, Fourier series). Estimate seasonal effects without
any covariate (Temp, TP) effects.
2. It is likely that both temperature and total phosphorus (TP) affect
phytoplankton population growth rates. Using MARSS models, estimate
the effect of Temp and TP on growth rates of each taxon. Leave out
the seasonal covariates from question 1, i.e. only use Temp and TP as
covariates. Make a plot of the point estimates of the Temp and TP
effects with the 95% CIs added to the plot. tidy() is an easy way to
get the parameter CIs.
3. Estimate the Temp and TP effects using B="unconstrained".
a. Compare the B matrix for the fit from question 2 and from question
3. Describe the species interactions modeled by the B matrix when
B="unconstrained". How is it different than the B matrix from
question 2? Note, you can retrieve the matrix using coef(fit,
type="matrix")$B.
b. Do the Temp and TP effects change when you use B="unconstrained"?
Make sure to look at the CIs also.
4. Using MARSS models, evaluate which (Temp or TP) is the more
important driver or if both are important. Again, leave out the seasonal
covariates from question 1, i.e. only use Temp and TP as covariates.
9. Compare the AICc’s of the 3 seasonal models from question 1 and the
4 Temp/TP models from question 5. What does this tell you about the
Temp and TP only models?
10. We cannot just fit a model with season and Temp plus TP since Temp
and TP are highly seasonal. That will cause problems if we have
something that explains season (a polynomial) and a covariate that
has seasonality. Instead, use sequential regression to fit a model with
seasonality, Temp and TP. Use a 3rd order polynomial with the poly()
function to create orthogonal season covariates and then use sequential
regression (code in problem 8) to create Temp and TP covariates that
are orthogonal to your season covariates. Fit the model and compare a
model with only season to a model with season and Temp plus TP.
11. Another approach to looking at effects of covariates which have seasonal
cycles is to examine if the seasonal anomalies of the dependent variable
can be explained by the seasonal anomalies of the independent variables.
In other words, can an unusually high February abundance (higher
than expected) be explained by an unusually high or low February
temperature? In this approach, you remove season so you do not need
to model it (with factor, polynomial, etc). The stl() function can be
used to decompose a time series using LOESS. We’ll use stl() since it
can handle missing values.
a. Decompose the Diatom time series using stl() and plot.
Use na.action=zoo::na.approx to deal with the NAs. Use
s.window="periodic". Other than that you can use the defaults.
i <- "Diatoms"
dati <- ts(dat[i, ], frequency = 12)
a <- stl(dati, "periodic", na.action = zoo::na.approx)
c. Notice that you have simply removed the seasonal cycle from the data. Using
the following code, create a matrix of seasonal anomalies for the data and the covariates.
anoms <- matrix(NA, dim(dat)[1] + dim(covars)[1], dim(dat)[2])
rownames(anoms) <- c(rownames(dat), rownames(covars))
for (i in 1:dim(dat)[1]) {
    a <- stl(ts(dat[i, ], frequency = 12), "periodic", na.action = zoo::na.approx)
    anoms[i, ] <- a$time.series[, "remainder"] + a$time.series[, "trend"]
}
for (i in 1:dim(covars)[1]) {
    a <- stl(ts(covars[i, ], frequency = 12), "periodic", na.action = zoo::na.approx)
    anoms[i + dim(dat)[1], ] <- a$time.series[, "remainder"] + a$time.series[, "trend"]
}
Chapter 9

Dynamic Linear Models
Dynamic linear models (DLMs) are a type of linear regression model, wherein
the parameters are treated as time-varying rather than static. DLMs are
used commonly in econometrics, but have received less attention in the
ecological literature (cf. Lamon et al., 1998; Scheuerell and Williams, 2005).
Our treatment of DLMs is rather cursory—we direct the reader to excellent
textbooks by Pole et al. (1994) and Petris et al. (2009) for more in-depth
treatments of DLMs. The former focuses on Bayesian estimation whereas the
latter addresses both likelihood-based and Bayesian estimation methods.
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
Data
Most of the data used in the chapter are from the MARSS package. Install
the package, if needed, and load:
library(MARSS)
The problem set uses an additional data set on spawners and recruits
(KvichakSockeye) in the atsalibrary package.
9.1 Overview
We begin our description of DLMs with a static regression model, wherein
the ith observation (response variable) is a linear function of an intercept,
predictor variable(s), and a random error term. For example, if we had one
predictor variable (f ), we could write the model as
yi = α + βfi + vi , (9.1)
yt = F⊤t θt + vt, (9.3)
θt = Gt θt−1 + wt. (9.4)

yt = F⊤t θt + et;  θt = Gθt−1 + wt   ⇓   yt = Zt xt + vt;  xt = Bxt−1 + wt (9.5)

yt = Zt xt + vt. (9.6)
yt = Zt xt + Ddt + vt. (9.7)
The simplest DLM is a stochastic level model, where the level is a random
walk without drift, and this level is observed with error. We will write it first
using regression notation where the intercept is α and then in MARSS
notation. In the latter, αt = xt .
Using this model, we can model the Nile River level and fit the model using
MARSS().
data(Nile, package = "datasets")
mod_list <- list(B = "identity", U = "zero", Q = matrix("q"),
Z = "identity", A = matrix("a"), R = matrix("r"))
fit <- MARSS(matrix(Nile, nrow = 1), mod_list)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 82 iterations.
Log-likelihood: -637.7569
AIC: 1283.514 AICc: 1283.935
Estimate
A.a -0.338
R.r 15135.796
Q.q 1381.153
x0.x0 1111.791
Figure: Flow of the River Nile plotted by year.
We can add a drift term to the level model to allow the level to tend upward
or downward with a deterministic rate η. This is a random walk with bias.
We can allow that the drift term η evolves over time along with the level. In
this case, η is modeled as a random walk along with α. This model is
" # " #
1 1 α h i
where B = ,x= and Z = 1 0 .
0 1 η
See Section 6.2 for more discussion of stochastic level models and for how to
fit this model with the StructTS() function in the stats package.
The stochastic level models in Section 9.3 do not have predictor variables
(covariates). Let’s add one predictor variable ft and write a simple DLM
where the intercept α and slope β are stochastic. We will specify that α
and β evolve according to a simple random walk. Normally x is used for the
predictor variables in a regression model, but we will avoid that since we are
using x for the state equation in a state-space model. This model is
yt = αt + βtxt + γqtr + et, where γqtr = γ1 if qtr = 1; γ2 if qtr = 2; γ3 if qtr = 3; γ4 if qtr = 4. (9.16)
We can write Equation (9.16) in matrix form. In our model for γ, we will set
the variance to 0 so that the γ does not change with time.
yt = [1 xt][α; β]t + et;  [α; β]t = [α; β]t−1 + [wα; wβ]t   ⇓   yt = Ztxt + vt;  xt = xt−1 + wt
If t is in quarter 1,

yt = [1 xt 1 0 0 0][αt; βt; γ1; γ2; γ3; γ4] (9.18)

while if t is in quarter 2,

yt = [1 xt 0 1 0 0][αt; βt; γ1; γ2; γ3; γ4] (9.19)
This would work, but we would have to have a different Zt matrix and it
might get cumbersome to keep track of the 0s and 1s. If we wanted the γ to
evolve with time, we might need to do this. However, if the γ are fixed, i.e. the
quarterly effect does not change over time, a less cumbersome approach is
possible.
We could instead keep the Zt matrix the same, but reorder the γi within x.
If t is in quarter 1,
yt = [1 xt 1 0 0 0][αt; βt; γ1; γ2; γ3; γ4] (9.20)
While if t is in quarter 2,

yt = [1 xt 1 0 0 0][αt; βt; γ2; γ3; γ4; γ1] (9.21)
G =
[ 1 0 0 0 0 0
  0 1 0 0 0 0
  0 0 0 1 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1
  0 0 1 0 0 0 ]
With this G, the γ rotate within x with each time step. If t is in quarter 1,
then t + 1 is in quarter 2, and we want γ2 to be in the 3rd row.
[α; β; γ2; γ3; γ4; γ1]t+1 = G[α; β; γ1; γ2; γ3; γ4]t + [wα; wβ; 0; 0; 0; 0]t (9.22)

[α; β; γ3; γ4; γ1; γ2]t+2 = G[α; β; γ2; γ3; γ4; γ1]t+1 + [wα; wβ; 0; 0; 0; 0]t+1 (9.23)
Let’s see an example of a DLM used to analyze real data from the literature.
Scheuerell and Williams (2005) used a DLM to examine the relationship
between marine survival of Chinook salmon and an index of ocean upwelling
strength along the west coast of the USA. Upwelling brings cool, nutrient-
rich waters from the deep ocean to shallower coastal areas. Scheuerell &
Williams hypothesized that stronger upwelling in April should create better
growing conditions for phytoplankton, which would then translate into more
zooplankton. In turn, juvenile salmon (“smolts”) entering the ocean in May
and June should find better foraging opportunities. Thus, for smolts entering
the ocean in year t,
yt = αt + βtft + vt with vt ∼ N(0, r), (9.24)

where yt is the logit-transformed survival and ft is the coastal upwelling index
(cubic meters of seawater per second per 100 m of coastline) for the month of
April in year t.
Both the intercept and slope are time varying, so

αt = αt−1 + wα,t with wα,t ∼ N(0, qα) (9.25)
βt = βt−1 + wβ,t with wβ,t ∼ N(0, qβ) (9.26)
yt = F⊤t θt + vt with vt ∼ N(0, r)
θt = Gt θt−1 + wt with wt ∼ MVN(0, Q)
θ0 = π0. (9.27)
Equation (9.27) is equivalent to our standard MARSS model:
data(SalmonSurvCUI, package = "MARSS")
years <- SalmonSurvCUI[, 1]
TT <- length(years)
# get response variable: logit(survival)
dat <- matrix(SalmonSurvCUI[, 2], nrow = 1)
Plots of logit-transformed survival and the z-scored April upwelling index are
shown in Figure 9.1.
Figure 9.1: Time series of logit-transformed survival (Logit(s)) and the
z-scored April upwelling index (CUI).
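The objects CUI.z and m used below need to be defined first; a minimal sketch,
assuming the upwelling index is the third column of SalmonSurvCUI:

# z-score the April upwelling index
CUI <- SalmonSurvCUI[, 3]
CUI.z <- matrix((CUI - mean(CUI))/sqrt(var(CUI)), nrow = 1)
# number of regression parameters: intercept + slope
m <- dim(CUI.z)[1] + 1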
Next, we need to set up the appropriate matrices and vectors for MARSS. Let’s
begin with those for the process equation because they are straightforward.
# for process eqn
B <- diag(m) ## 2x2; Identity
U <- matrix(0, nrow = m, ncol = 1) ## 2x1; both elements = 0
Q <- matrix(list(0), m, m) ## 2x2; all 0 for now
diag(Q) <- c("q.alpha", "q.beta") ## 2x2; diag = (q1,q2)
Defining the correct form for the observation model is a little more tricky,
however, because of how we model the effect(s) of predictor variables. In a
DLM, we need to use Zt (instead of dt ) as the matrix of predictor variables
that affect yt , and we use xt (instead of Dt ) as the regression parameters.
Therefore, we need to set Zt equal to an n × m × T array, where n is the
number of response variables (= 1; yt is univariate), m is the number of
regression parameters (= intercept + slope = 2), and T is the length of the
time series (= 42).
# for observation eqn
Z <- array(NA, c(1, m, TT)) ## NxMxT; empty for now
Z[1, 1, ] <- rep(1, TT) ## Nx1; 1's for intercept
Z[1, 2, ] <- CUI.z ## Nx1; predictor variable
A <- matrix(0) ## 1x1; scalar = 0
R <- matrix("r") ## 1x1; scalar = r
Lastly, we need to define our lists of initial starting values and model matri-
ces/vectors.
# only need starting values for regr parameters
inits.list <- list(x0 = matrix(c(0, 0), nrow = m))
# list of model matrices & vectors
mod.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R)
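The fitting call itself is implied by the output below; it would be along these
lines (dlm1 is the object name used by the later text):

# fit the univariate DLM with MARSS
dlm1 <- MARSS(dat, inits = inits.list, model = mod.list)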
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 115 iterations.
Log-likelihood: -40.03813
AIC: 90.07627 AICc: 91.74293
Estimate
R.r 0.15708
Q.q.alpha 0.11264
Q.q.beta 0.00564
x0.X1 -3.34023
x0.X2 -0.05388
Initial states (x0) defined at t=0
Notice that the MARSS output does not list any estimates of the regression
parameters themselves. Why not? Remember that in a DLM the matrix of
states (x) contains the estimates of the regression parameters (θ). Therefore,
we need to look in dlm1$states for the MLEs of the regression parameters,
and in dlm1$states.se for their standard errors.
Time series of the estimated intercept and slope are shown in Figure 9.2. It
appears as though the intercept is much more dynamic than the slope, as
indicated by a much larger estimate of process variance for the former (Q.q.alpha).
In fact, although the effect of April upwelling appears to be increasing over
time, it doesn’t really become important as a predictor variable until about
1990 when the approximate 95% confidence interval for the slope no longer
overlaps zero.
Figure 9.2: Time series of estimated mean states (thick lines) for the intercept
(top) and slope (bottom) parameters from the DLM specified by Equations
(9.24)–(9.27). Thin lines denote the mean ± 2 standard deviations.

9.8 Forecasting
Forecasting with a DLM involves two steps. For step 1, we estimate the
regression parameters at time t from the data up to time t − 1. From the
Kalman filter we have the distribution of θt conditioned on y1:t, where
πt = E(θt|t) and Λt = Var(θt|t). Now we can compute the distribution of θt
conditioned on y1:t−1 using the process equation for θ:

E(θt|t−1) = Gt πt−1
Var(θt|t−1) = Gt Λt−1 G⊤t + Q
For step 2, we make the prediction of yt given the predictor variables at time
t and the estimate of the regression parameters at time t. This is called
the one-step ahead prediction for the observation at time t. We will denote
the prediction of y as ŷ and we want to compute its distribution (mean and
variance). We do this using the equation for yt but substituting the expected
value of θ t|t−1 for θ t .
ŷt|t−1 = F⊤t E(θt|t−1) + et with et ∼ N(0, r) (9.34)
Our prediction of y at t has a normal distribution with mean (expected value)
and variance. The expected value of ŷt|t−1 is

E(ŷt|t−1) = F⊤t E(θt|t−1) (9.35)

and its variance is
Var(ŷt|t−1) = F⊤t Var(θt|t−1)Ft + r (9.36)
            = F⊤t (Gt Λt−1 G⊤t + Q)Ft + r (9.37)
The expectations and variance of θ t conditioned on y1:t and y1:t−1 are standard
output from the Kalman filter. Thus to produce the predictions, all we need
to do is run our DLM state-space model through a Kalman filter to get
E(θ t|t−1 ) and Var(θ t|t−1 ) and then use Equation (9.35) to compute the mean
prediction and Equation (9.36) to compute its variance.
The Kalman filter will need Ft , Gt and estimates of Q and r. The latter are
calculated by fitting the DLM to the data y1:t , using for example the MARSS()
function.
Let’s see an example with the salmon survival DLM. We will use the Kalman
filter function in the MARSS package and the DLM fit from MARSS().
Scheuerell and Williams (2005) were interested in how well upwelling could be
used to actually forecast expected survival of salmon, so let’s look at how well
our model does in that context. To do so, we need the predictive distribution
for the survival at time t given the upwelling at time t and the predicted
regression parameters at t.
In the salmon survival DLM, the Gt matrix is the identity matrix, thus
the mean and variance of the one-step ahead predictive distribution for the
observation at time t reduces to (from Equations (9.35) and (9.36))
E(ŷt|t−1) = F⊤t E(θt|t−1);  Var(ŷt|t−1) = F⊤t Var(θt|t−1)Ft + r̂ (9.39)
where Ft = [1; ft]
and ft is the upwelling index at t + 1. r̂ is the estimated observation variance
from our model fit.
Working from Equation (9.39), we can compute the expected value of the
forecast at time t and its variance using the Kalman filter. For the expectation,
we need F⊤t E(θt|t−1). F⊤t ≡ Zt, and E(θt|t−1) are the one-step-ahead
predictions of the regression parameters, stored as xtt1 in the list produced
by the MARSSkfss() function.
For the variance of the forecasts, we need F⊤t Var(θt|t−1)Ft + r̂. As with
the mean, F⊤t ≡ Zt. The variances of the one-step ahead forecasts of the
regression parameters at time t, Var(θt|t−1), are also calculated as part of
the Kalman filter algorithm—they are stored as Vtt1 in the list produced by
the MARSSkfss() function. Lastly, the observation variance r̂ was estimated
when we fit the DLM to the data using MARSS() and can be extracted from
the dlm1 fit.
Plots of the model mean forecasts with their estimated uncertainty are shown
in Figure 9.3. Nearly all of the observed values fell within the approximate
prediction interval. Notice that we have a forecasted value for the first year of
the time series (1964), which may seem at odds with our notion of forecasting
at time t based on data available only through time t − 1. In this case,
however, MARSS is actually estimating the states at t = 0 (θ 0 ), which allows
us to compute a forecast for the first time point.
Figure 9.3: Time series of logit-transformed survival data (blue dots) and
model mean forecasts (thick line). Thin lines denote the approximate 95%
prediction intervals.
Notice that we passed the DLM fit to all the data to MARSSkfss(). This
meant that the Kalman filter used estimates of Q and r using all the data
in the xtt1 and Vtt1 calculations. Thus our predictions at time t are not
entirely based on only data up to time t − 1 since the Q and r estimates were
from all the data from 1964 to 2005.
Figure 9.4: Time series of survival data (blue dots) and model mean forecasts
(thick line). Thin lines denote the approximate 95% prediction intervals.
As with other time series models, evaluation of a DLM should include diag-
nostics. In a forecasting context, we are often interested in the forecast errors,
which are simply the observed data minus the forecasts et = yt − E(yt |y1:t−1 ).
In particular, the following assumptions should hold true for et:
1. et ∼ N(0, σ²); and
2. cov(et, et−k) = 0 for all k ≠ 0.
Let’s see if our innovations meet the model assumptions. Beginning with (1),
we can use a Q-Q plot to see whether the innovations are normally distributed
with a mean of zero. We’ll use the qqnorm() function to plot the quantiles of
the innovations on the y-axis versus the theoretical quantiles from a Normal
distribution on the x-axis. If the 2 distributions are similar, the points should
fall on the line defined by y = x.
Figure 9.5: Q-Q plot of the forecast errors (innovations) for the DLM specified
in Equations (9.24)–(9.27).
The Q-Q plot (Figure 9.5) indicates that the innovations appear to be more-
or-less normally distributed (i.e., most points fall on the line). Furthermore,
it looks like the mean of the innovations is about 0, but we should use a more
reliable test than simple visual inspection. We can formally test whether the
mean of the innovations is significantly different from 0 by using a one-sample
t-test based on a null hypothesis of E(et) = 0. To do so, we will use the
function t.test() and base our inference on a significance value of α = 0.05.
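The innovations object innov used below can be pulled from the Kalman filter
output (a sketch, assuming the fitted DLM dlm1 from above):

# innovations = one-step-ahead forecast errors from the Kalman filter
kf_out <- MARSSkfss(dlm1)
innov <- kf_out$Innov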
# p-value for t-test of H0: E(innov) = 0
t.test(t(innov), mu = 0)$p.value
[1] 0.4840901
The p-value >> 0.05 so we cannot reject the null hypothesis that E(et ) = 0.
Moving on to assumption (2), we can use the sample autocorrelation function
(ACF) to examine whether the innovations covary with a time-lagged version
of themselves. Using the acf() function, we can compute and plot the
correlations of et and et−k for various values of k. Assumption (2) will be
met if none of the correlation coefficients exceed the 95% confidence intervals
defined by ±z0.975/√n.
# plot ACF of innovations
acf(t(innov), lag.max = 10)
Figure 9.6: Autocorrelation plot of the forecast errors (innovations) for the
DLM specified in Equations (9.24)–(9.27). Horizontal blue lines define the
upper and lower 95% confidence intervals.
The ACF plot (Figure 9.6) shows no significant autocorrelation in the innova-
tions at lags 1–10, so it looks like both of our model assumptions have indeed
been met.
For the homework this week we will use a DLM to examine some of the
time-varying properties of the spawner-recruit relationship for Pacific salmon.
Much work has been done on this topic, particularly by Randall Peterman and
his students and post-docs at Simon Fraser University. To do so, researchers
commonly use a Ricker model because of its relatively simple form, such that
the number of recruits (offspring) born in year t (Rt ) from the number of
spawners (parents) (St ) is
log(Rt) = log(St) + αt + δtXt + vt. (9.44)

In this case we are estimating some base-level productivity (α) plus the
time-varying effect of some covariate Xt (δt).
The data come from a large public database begun by Ransom Myers many
years ago. If you are interested, you can find lots of time series of spawning-
stock, recruitment, and harvest for a variety of fishes around the globe. Here
is the website:
https://www.ramlegacy.org/
For this exercise, we will use spawner-recruit data for sockeye salmon (On-
corhynchus nerka) from the Kvichak River in SW Alaska that span the years
1952-1989. In addition, we’ll examine the potential effects of the Pacific
Decadal Oscillation (PDO) during the salmon’s first year in the ocean, which
is widely believed to be a “bottleneck” to survival.
These data are in the atsalibrary package on GitHub. If needed, install
using the devtools package.
library(devtools)
devtools::install_github("nwfsc-timeseries/atsalibrary")
The data are a dataframe with columns for brood year (brood.yr), number
of spawners (Sp), number of recruits (Rec) and PDO at year t − 2 (PDO.t2)
and t − 3 (PDO.t3).
data(KvichakSockeye, package = "atsalibrary")
SRdata <- KvichakSockeye
# head of data file
head(SRdata)
9.11 Problems
Use the information and data in the previous section to answer the following
questions. Note that if any model is not converging, then you will need to
increase the maxit parameter in the control argument/list that gets passed
to MARSS(). For example, you might try control=list(maxit=2000).
1. Begin by fitting a reduced form of Equation (9.44) that includes only a
time-varying level (αt ) and observation error (vt ). That is,
log(Rt) = αt + log(St) + vt

or, equivalently,

log(Rt/St) = αt + vt
Chapter 10

Dynamic Factor Analysis

Here we will use the MARSS package to do Dynamic Factor Analysis (DFA),
which allows us to look for a set of common underlying processes among
a relatively large set of time series (Zuur et al., 2003). There have been a
number of recent applications of DFA to ecological questions surrounding
Pacific salmon (Stachura et al., 2014; Jorgensen et al., 2016; Ohlberger et al.,
2016) and stream temperatures (Lisi et al., 2015). For a more in-depth
treatment of potential applications of MARSS models for DFA, see Chapter
9 in the MARSS User’s Guide.
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
All the data used in the chapter are in the MARSS package. Install the
package, if needed, and load to run the code in the chapter.
library(MARSS)
10.1 Introduction
DFA is conceptually different than what we have been doing in the previous
applications. Here we are trying to explain temporal variation in a set of n
observed time series using linear combinations of a set of m hidden random
walks, where m << n. A DFA model is a type of MARSS model with the
following structure:

yt = Zxt + a + vt where vt ∼ MVN(0, R)
xt = xt−1 + wt where wt ∼ MVN(0, Q) (10.1)
This equation should look rather familiar as it is exactly the same form we
used for estimating varying numbers of processes from a set of observations in
Lesson II. The difference with DFA is that rather than fixing the elements
within Z at 1 or 0 to indicate whether an observation does or does not
correspond to a trend, we will instead estimate them as “loadings” on each of
the states/processes.
[y1; y2; y3; y4; y5]t = [z11 z12 z13; z21 z22 z23; z31 z32 z33; z41 z42 z43; z51 z52 z53][x1; x2; x3]t + [a1; a2; a3; a4; a5] + [v1; v2; v3; v4; v5]t (10.2)
[x1; x2; x3]t = [1 0 0; 0 1 0; 0 0 1][x1; x2; x3]t−1 + [w1; w2; w3]t (10.3)

[v1; v2; v3; v4; v5]t ∼ MVN(0, R), R = [r11 r12 r13 r14 r15; r12 r22 r23 r24 r25; r13 r23 r33 r34 r35; r14 r24 r34 r44 r45; r15 r25 r35 r45 r55] (10.4)

[w1; w2; w3]t ∼ MVN(0, Q), Q = [q11 q12 q13; q12 q22 q23; q13 q23 q33]. (10.5)
• in the first m − 1 rows of Z, the z-value in the j-th column and i-th
row is set to zero if j > i; and
[y1; y2; y3; y4; y5]t = [z11 0 0; z21 z22 0; z31 z32 z33; z41 z42 z43; z51 z52 z53][x1; x2; x3]t + [0; 0; 0; 0; 0] + [v1; v2; v3; v4; v5]t. (10.6)

[x1; x2; x3]t = [1 0 0; 0 1 0; 0 0 1][x1; x2; x3]t−1 + [w1; w2; w3]t (10.7)
The distribution of the observation errors would stay the same, such that
[v1; v2; v3; v4; v5]t ∼ MVN(0, R), R = [r11 r12 r13 r14 r15; r12 r22 r23 r24 r25; r13 r23 r33 r34 r35; r14 r24 r34 r44 r45; r15 r25 r35 r45 r55] (10.8)

but the distribution of the process errors would become

[w1; w2; w3]t ∼ MVN(0, I), I = [1 0 0; 0 1 0; 0 0 1]. (10.9)

For example, we could instead make R diagonal with unique variances:

R = [r1 0 0 0 0; 0 r2 0 0 0; 0 0 r3 0 0; 0 0 0 r4 0; 0 0 0 0 r5].
Any of these options for R (and other custom options as well) are available to
us in a DFA model, just as they were in the MARSS models used in previous
chapters.
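The object dat_1980 used below appears without its construction; a sketch of
how it could be built from the Lake Washington data (the 1980–1989 window
and the names yr_frst, yr_last, plank_dat, and phytoplankton are assumptions
based on how they are used later):

data(lakeWAplankton, package = "MARSS")
yr_frst <- 1980
yr_last <- 1989
plank_dat <- lakeWAplanktonTrans[lakeWAplanktonTrans[, "Year"] >= yr_frst &
    lakeWAplanktonTrans[, "Year"] <= yr_last, ]
phytoplankton <- c("Cryptomonas", "Diatoms", "Greens", "Unicells", "Other.algae")
dat_1980 <- plank_dat[, phytoplankton]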
Next, we transpose the data matrix and calculate the number of time series
and their length.
## transpose data so time goes across columns
dat_1980 <- t(dat_1980)
## get number of time series
N_ts <- dim(dat_1980)[1]
## get length of time series
TT <- dim(dat_1980)[2]
Here are time series plots of all five phytoplankton functional groups.
spp <- rownames(dat_1980)
clr <- c("brown", "blue", "darkgreen", "darkred", "purple")
cnt <- 1
par(mfrow = c(N_ts, 1), mai = c(0.5, 0.7, 0.1, 0.1), omi = c(0,
0, 0, 0))
for (i in spp) {
plot(dat[i, ], xlab = "", ylab = "Abundance index", bty = "L",
xaxt = "n", pch = 16, col = clr[cnt], type = "b")
axis(1, 12 * (0:dim(dat_1980)[2]) + 1, yr_frst + 0:dim(dat_1980)[2])
title(i)
cnt <- cnt + 1
}
Figure: Time series of the abundance indices for the five phytoplankton groups
(Cryptomonas, Diatoms, Greens, Unicells, Other.algae), 1980–1990.
Note that even though some of the model elements are scalars and vectors,
we will need to specify everything as a matrix (or array for time series of
matrices).
Notice that the code below uses some of the MARSS shortcuts
for specifying forms of vectors and matrices. We will also use the
matrix(list(),nrow,ncol) trick we learned previously.
Here we will fit the DFA model above where we have N_ts observed time
series and we want 3 hidden states. Now we need to set up the observation
model for MARSS. Here are the vectors and matrices for our first model where
each nutrient follows its own process. Recall that we will need to set the
elements in the upper right corner of Z to 0. We will assume that the observation
errors have different variances and they are independent of one another.
## 'ZZ' is loadings matrix
Z_vals <- list("z11", 0, 0, "z21", "z22", 0, "z31", "z32", "z33",
"z41", "z42", "z43", "z51", "z52", "z53")
ZZ <- matrix(Z_vals, nrow = N_ts, ncol = 3, byrow = TRUE)
ZZ
We need to specify the explicit form for all of the vectors and matrices in the
full form of the MARSS model we defined in Sec 3.1. Note that we do not
have to specify anything for the states (x) – those are elements that MARSS
will identify and estimate itself based on our definitions of the other vectors
and matrices.
## number of processes
mm <- 3
## 'BB' is identity: 1's along the diagonal & 0's elsewhere
BB <- "identity" # diag(mm)
## 'aa' is the offset/scaling
aa <- "zero"
## 'DD' and 'dd' are for covariates in the observation eqn
DD <- "zero" # matrix(0,N_ts,1)
dd <- "zero" # matrix(0,1,TT)
## 'RR' is var-cov matrix for obs errors: unequal variances, independent
RR <- "diagonal and unequal"
## 'uu' is a column vector of 0's
uu <- "zero" # matrix(0,mm,1)
## 'CC' and 'cc' are for covariates in the process eqn
CC <- "zero" # matrix(0,mm,1)
cc <- "zero" # matrix(0,1,TT)
## 'QQ' is identity
QQ <- "identity" # diag(mm)
Now it's time to fit our first DFA model. To do so, we need to create three
lists that we will need to pass to the MARSS() function:
1. A list of specifications for the model’s vectors and matrices;
2. A list of any initial values – MARSS will pick its own otherwise;
3. A list of control parameters for the MARSS() function.
## list with specifications for model vectors/matrices
mod_list <- list(Z = ZZ, A = aa, D = DD, d = dd, R = RR, B = BB,
U = uu, C = CC, c = cc, Q = QQ)
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))
## list with model control parameters
con_list <- list(maxit = 3000, allow.degen = TRUE)
## fit MARSS
dfa_1 <- MARSS(y = dat, model = mod_list, inits = init_list,
control = con_list)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 246 iterations.
Log-likelihood: -692.9795
AIC: 1425.959 AICc: 1427.42
Estimate
Z.z11 0.2738
Z.z21 0.4487
Z.z31 0.3170
Z.z41 0.4107
Z.z51 0.2553
Z.z22 0.3608
Z.z32 -0.3690
Z.z42 -0.0990
Z.z52 -0.3793
Z.z33 0.0185
Z.z43 -0.1404
Z.z53 0.1317
R.(Cryptomonas,Cryptomonas) 0.1638
R.(Diatoms,Diatoms) 0.2913
R.(Greens,Greens) 0.8621
R.(Unicells,Unicells) 0.3080
R.(Other.algae,Other.algae) 0.5000
x0.X1 0.2218
x0.X2 1.8155
x0.X3 -4.8097
Initial states (x0) defined at t=0
The DFA solution is not unique: for any invertible m × m rotation matrix H,
the rotated loadings Z* = ZH⁻¹ and rotated states x*t = Hxt describe the
data equally well.
There are many ways of doing factor rotations, but a common method is the
"varimax" rotation, which seeks a rotation matrix H that creates the largest
difference between the loadings in Z. For example, imagine that row 3 in our
estimated Z matrix was (0.2, 0.2, 0.2). That would mean that green algae
were a mixture of equal parts of processes 1, 2, and 3. If instead row 3 was
(0.8, 0.1, 0.05), this would make our interpretation of the model fits easier
because we could say that green algae followed the first process most closely.
The varimax rotation would find the H matrix that makes the rows in Z
more like (0.8, 0.1, 0.05) and less like (0.2, 0.2, 0.2).
The varimax rotation is easy to compute because R has a built-in function for
this: varimax(). Interestingly, the function returns the inverse of H, which
we need anyway.
## get the estimated ZZ
Z_est <- coef(dfa_1, type = "matrix")$Z
## get the inverse of the rotation matrix
H_inv <- varimax(Z_est)$rotmat
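The rotated loadings and states (proc_rot, used in the plots below) follow
directly from the rotation relations above; a sketch:

## rotate the loadings: Z* = Z H^-1
Z_rot <- Z_est %*% H_inv
## rotate the states: x* = H x
proc_rot <- solve(H_inv) %*% dfa_1$states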
Here are plots of the three hidden processes (left column) and the loadings
for each of phytoplankton groups (right column).
ylbl <- phytoplankton
w_ts <- seq(dim(dat)[2])
layout(matrix(c(1, 2, 3, 4, 5, 6), mm, 2), widths = c(2, 1))
## par(mfcol=c(mm,2), mai=c(0.5,0.5,0.5,0.1), omi=c(0,0,0,0))
par(mai = c(0.5, 0.5, 0.5, 0.1), omi = c(0, 0, 0, 0))
## plot the processes
for (i in 1:mm) {
    ylm <- c(-1, 1) * max(abs(proc_rot[i, ]))
    ## set up plot area
    plot(w_ts, proc_rot[i, ], type = "n", bty = "L", ylim = ylm,
        xlab = "", ylab = "", xaxt = "n")
    ## draw a zero-line and the process
    abline(h = 0, col = "gray")
    lines(w_ts, proc_rot[i, ], lwd = 2)
    axis(1, 12 * (0:dim(dat_1980)[2]) + 1, yr_frst + 0:dim(dat_1980)[2])
    mtext(paste("State", i), 3, 0.5)
}
It looks like there are strong seasonal cycles in the data, but there is some
indication of a phase difference between some of the groups. We can use
ccf() to investigate further.
par(mai = c(0.9, 0.9, 0.1, 0.1))
ccf(proc_rot[1, ], proc_rot[2, ], lag.max = 12, main = "")
Figure: The three rotated processes (left column) and the loadings of the five
phytoplankton groups on each process (right column).
Figure: Cross-correlation function between the first and second rotated
processes at lags of ±12 months.
We can plot the fits for our DFA model along with the data. The following
function will return the fitted values ± (1-α)% confidence intervals.
get_DFA_fits <- function(MLEobj, dd = NULL, alpha = 0.05) {
## empty list for results
fits <- list()
## extra stuff for var() calcs
Ey <- MARSS:::MARSShatyt(MLEobj)
## model params
ZZ <- coef(MLEobj, type = "matrix")$Z
## number of obs ts
nn <- dim(Ey$ytT)[1]
## number of time steps
TT <- dim(Ey$ytT)[2]
## get the inverse of the rotation matrix
H_inv <- varimax(ZZ)$rotmat
## check for covars
if (!is.null(dd)) {
DD <- coef(MLEobj, type = "matrix")$D
## model expectation
fits$ex <- ZZ %*% H_inv %*% MLEobj$states + DD %*% dd
} else {
## model expectation
fits$ex <- ZZ %*% H_inv %*% MLEobj$states
}
## Var in model fits
VtT <- MARSSkfss(MLEobj)$VtT
VV <- NULL
for (tt in 1:TT) {
RZVZ <- coef(MLEobj, type = "matrix")$R - ZZ %*% VtT[,
, tt] %*% t(ZZ)
SS <- Ey$yxtT[, , tt] - Ey$ytT[, tt, drop = FALSE] %*%
t(MLEobj$states[, tt, drop = FALSE])
VV <- cbind(VV, diag(RZVZ + SS %*% t(ZZ) + ZZ %*% t(SS)))
}
SE <- sqrt(VV)
## upper & lower (1-alpha)% CI
fits$up <- qnorm(1 - alpha/2) * SE + fits$ex
fits$lo <- qnorm(alpha/2) * SE + fits$ex
return(fits)
}
Here are time series of the five phytoplankton groups (points) with the mean
of the DFA fits (black line) and the 95% confidence intervals (gray lines).
## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_1)
## plot the fits
ylbl <- phytoplankton
par(mfrow = c(N_ts, 1), mai = c(0.5, 0.7, 0.1, 0.1), omi = c(0,
0, 0, 0))
for (i in 1:N_ts) {
up <- mod_fit$up[i, ]
mn <- mod_fit$ex[i, ]
lo <- mod_fit$lo[i, ]
plot(w_ts, mn, xlab = "", ylab = ylbl[i], xaxt = "n", type = "n",
cex.lab = 1.2, ylim = c(min(lo), max(up)))
axis(1, 12 * (0:dim(dat_1980)[2]) + 1, yr_frst + 0:dim(dat_1980)[2])
points(w_ts, dat[i, ], pch = 16, col = clr[i])
lines(w_ts, up, col = "darkgray")
lines(w_ts, mn, col = "black", lwd = 2)
lines(w_ts, lo, col = "darkgray")
}
We will now fit three different models that each add covariate effects (i.e.,
Temp, TP, Temp and TP) to our existing model above where m = 3 and R is
"diagonal and unequal".
mod_list = list(m = 3, R = "diagonal and unequal")
dfa_temp <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,
control = con_list, covariates = temp)
dfa_TP <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,
control = con_list, covariates = TP)
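A sketch of the third fit, with both covariates, assumed analogous to the two
calls above (dfa_both is compared by AICc below):

dfa_both <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,
    control = con_list, covariates = rbind(temp, TP))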
Next we can compare whether the addition of the covariates improves the
model fit.
print(cbind(model = c("no covars", "Temp", "TP", "Temp & TP"),
AICc = round(c(dfa_1$AICc, dfa_temp$AICc, dfa_TP$AICc, dfa_both$AICc))),
quote = FALSE)
model AICc
[1,] no covars 1427
[2,] Temp 1356
[3,] TP 1414
[4,] Temp & TP 1362
This suggests that adding temperature or phosphorus to the model, either
alone or in combination with one another, does seem to improve overall model
fit. If we were truly interested in assessing the “best” model structure that
includes covariates, however, we should examine all combinations of 1-4 trends
and different structures for R.
Now let’s try to fit a model with a dummy variable for season, and see how
that does.
cos_t <- cos(2 * pi * seq(TT)/12)
sin_t <- sin(2 * pi * seq(TT)/12)
dd <- rbind(cos_t, sin_t)
dfa_seas <- MARSS(dat_1980, model = mod_list, form = "dfa", z.score = TRUE,
control = con_list, covariates = dd)
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 384 iterations.
Log-likelihood: -713.8464
Estimate
Z.11 0.49562
Z.21 0.27206
Z.31 0.03354
Z.41 0.51692
Z.51 0.18981
Z.22 0.05290
Z.32 -0.08042
Z.42 0.06336
Z.52 0.06157
Z.33 0.02383
Z.43 0.19506
Z.53 -0.10800
R.(Cryptomonas,Cryptomonas) 0.51583
R.(Diatoms,Diatoms) 0.53296
R.(Greens,Greens) 0.60329
R.(Unicells,Unicells) 0.19787
R.(Other.algae,Other.algae) 0.52977
D.(Cryptomonas,cos_t) -0.43973
D.(Diatoms,cos_t) -0.44836
D.(Greens,cos_t) -0.66003
D.(Unicells,cos_t) -0.34898
D.(Other.algae,cos_t) -0.42773
D.(Cryptomonas,sin_t) 0.23672
D.(Diatoms,sin_t) 0.72062
D.(Greens,sin_t) -0.46019
D.(Unicells,sin_t) -0.00873
D.(Other.algae,sin_t) -0.64228
Initial states (x0) defined at t=0
dfa_seas$AICc

[1] 1484.355
The model with a dummy seasonal factor does much better than the covariate
models, but still not as well as the model with only 3 trends. The model fits
for the seasonal effects model are shown below.
## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_seas, dd = dd)
## plot the fits
ylbl <- phytoplankton
par(mfrow = c(N_ts, 1), mai = c(0.5, 0.7, 0.1, 0.1), omi = c(0,
0, 0, 0))
for (i in 1:N_ts) {
up <- mod_fit$up[i, ]
mn <- mod_fit$ex[i, ]
lo <- mod_fit$lo[i, ]
plot(w_ts, mn, xlab = "", ylab = ylbl[i], xaxt = "n", type = "n",
cex.lab = 1.2, ylim = c(min(lo), max(up)))
axis(1, 12 * (0:dim(dat_1980)[2]) + 1, yr_frst + 0:dim(dat_1980)[2])
points(w_ts, dat[i, ], pch = 16, col = clr[i])
lines(w_ts, up, col = "darkgray")
lines(w_ts, mn, col = "black", lwd = 2)
lines(w_ts, lo, col = "darkgray")
}
Figure 10.5: Data and model fits for the DFA with covariates.
10.13 Problems
For your homework this week, we will continue to investigate common trends
in the Lake Washington plankton data.
1. Fit other DFA models to the phytoplankton data with varying numbers
of trends from 1-4 (we fit a 3-trend model above). Do not include any
covariates in these models. Using R="diagonal and unequal" for the
observation errors, which of the DFA models has the most support from
the data?
Plot the model states and loadings as in Section 10.9. Describe the
general patterns in the states and the ways the different taxa load onto
those trends.
Also plot the model fits as in Section 10.10. Do they look reasonable?
Are there any particular problems or outliers?
2. How does the best model from Question 1 compare to a DFA model
with the same number of trends, but with R="unconstrained"?
Plot the model states and loadings as in Section 10.9. Describe the
general patterns in the states and the ways the different taxa load onto
those trends.
Also plot the model fits as in Section 10.10. Do they look reasonable?
Are there any particular problems or outliers?
3. Fit a DFA model that includes temperature as a covariate and 3 trends
(as in Section 10.12), but with R="unconstrained". How does this
model compare to the model with R="diagonal and unequal"? How
does it compare to the model in Question 2?
Plot the model states and loadings as in Section 10.9. Describe the
general patterns in the states and the ways the different taxa load onto
those trends.
Also plot the model fits as in Section 10.10. Do they look reasonable?
Are there any particular problems or outliers?
Chapter 11
Covariates with NAs
A script with all the R code in the chapter can be downloaded here. The
Rmd for this chapter can be downloaded here.
This chapter will use a SNOTEL dataset. These are data on snow water
equivalent at locations throughout the state of Washington. The data are
in the atsalibrary package.
data(snotel, package = "atsalibrary")
The main packages used in this chapter are MARSS and forecast.
library(MARSS)
library(forecast)
library(ggplot2)
library(ggmap)
library(broom)
The elements with superscript (v) are for the k variate states and those with
superscript (c) are for the q covariate states. The dimension of x(c) is q ×1 and
q is not necessarily equal to p, the number of covariate observation time series
in your dataset. Imagine, for example, that you have two temperature sensors
and you are combining these data. Then you have two covariate observation
time series (p = 2) but only one underlying covariate state time series (q = 1).
The matrix C is dimension k × q, and B(c) and Q(c) are dimension q × q.
The dimension of x(v) is k × 1, and B(v) and Q(v) are dimension k × k. The
dimension of x is always denoted m. If your process model includes only
variates, then k = m, but now your process model includes k variates and q
covariate states so m = k + q.
Next, we can write the observation equation in an analogous manner. To get the
log-likelihood of the non-covariate data, you will need to subtract off the
log-likelihood of the covariate model:
$$
\begin{gathered}
x^{(c)}_t = B^{(c)} x^{(c)}_{t-1} + u^{(c)} + w^{(c)}_t, \text{ where } w^{(c)}_t \sim \text{MVN}(0,\, Q^{(c)})\\
y^{(c)}_t = Z^{(c)} x^{(c)}_t + a^{(c)} + v^{(c)}_t, \text{ where } v^{(c)}_t \sim \text{MVN}(0,\, R^{(c)})
\end{gathered}
\tag{11.5}
$$
An easy way to get this log-likelihood for the covariate data only is to use the
augmented model (Equation (11.2), with terms defined as in Equation (11.4)),
but pass in missing values for the non-covariate data. The following code
shows how to do this.
y.aug = rbind(data, covariates)
fit.aug = MARSS(y.aug, model = model.aug)
fit.aug is the MLE object that can be passed to MARSSkf(). You need to
make a version of this MLE object with the non-covariate data filled with NAs
so that you can compute the log-likelihood without the covariates. This needs
to be done in the marss element since that is what is used by MARSSkf().
Below is code to do this.
fit.cov = fit.aug
fit.cov$marss$data[1:dim(data)[1], ] = NA
extra.LL = MARSSkf(fit.cov)$logLik
Note that when you fit the augmented model, the estimates of C and B(c) are
affected by the non-covariate data since the models for the non-covariate
and covariate data are estimated simultaneously and are not independent
(the covariate states affect the non-covariate states). If you want the
covariate model to be unaffected by the non-covariate data, you can fit the
covariate model separately and use the estimates for B(c) and Q(c) as fixed
values in your augmented model, as sketched below.
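A minimal sketch of that two-step approach (the object names model.cov and model.aug are illustrative):
# Fit the covariate-only state-space model first (model.cov is a model list
# for Equation (11.5); illustrative name).
fit.c <- MARSS(covariates, model = model.cov)
# Extract the estimated B and Q as numeric matrices ...
B.c <- coef(fit.c, type = "matrix")$B
Q.c <- coef(fit.c, type = "matrix")$Q
# ... and place them as fixed values in the covariate block of the augmented
# model before refitting with MARSS(y.aug, model = model.aug).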
Let’s see an example using the Washington SNOTEL data. The data we will
use is the snow water equivalent percent of normal. This represents the snow
water equivalent compared to the average value for that site on the same
day. We will look at a subset of sites in the Central Cascades in our snotel
dataset (Figure 11.1).
y <- snotelmeta
# Just use a subset
y = y[which(y$Longitude < -121.4), ]
y = y[which(y$Longitude > -122.5), ]
y = y[which(y$Latitude < 47.5), ]
y = y[which(y$Latitude > 46.5), ]
[Figure 11.1: Map of the SNOTEL sites in Washington State.]
For the first analysis, we are just going to look at February Snow Water
Equivalent (SWE). Our subset of stations is y$Station.Id. There are many
missing years among some of our stations (Figure 11.2).
swe.feb <- snotel
swe.feb <- swe.feb[swe.feb$Station.Id %in% y$Station.Id & swe.feb$Month ==
"Feb", ]
p <- ggplot(swe.feb, aes(x = Date, y = SWE)) + geom_line()
p + facet_wrap(~Station)
[Figure: panels of February SWE by station (Huckleberry Creek, Lynn Lake, Meadows Pass, Morse Lake, Rex River, Sawmill Ridge, Tinkham Creek), 1980-2010.]
Figure 11.2: Snow water equivalent time series from each SNOTEL station.
Imagine that for our study we need an estimate of SWE for all sites. We will
use the information from the sites with full data to estimate the missing SWE
for other sites. We will use a MARSS model to use all the available data.
$$
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}_t =
\begin{bmatrix} b & 0 & \dots & 0 \\ 0 & b & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}_{t-1} +
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \end{bmatrix}_t, \qquad
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \end{bmatrix}_t =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}_t +
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \end{bmatrix}_t
\tag{11.6}
$$
We set up the model for MARSS so that it is the same as (11.6). We will fix
the measurement error to be small; we could use 0 but the fitting is more
stable if we use a small variance instead. When estimating B, setting the
initial value to be at t = 1 instead of t = 0 works better.
ns <- length(unique(swe.feb$Station))
B <- "diagonal and equal"
Q <- "unconstrained"
R <- diag(0.01, ns)
U <- "zero"
A <- "unequal"
x0 <- "unequal"
mod.list.ar1 = list(B = B, Q = Q, R = R, U = U, x0 = x0, A = A,
tinitx = 1)
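The fit below uses dat.feb, a station-by-year matrix of the February SWE values. That reshaping step is not shown in this excerpt; a minimal sketch, assuming swe.feb has Station, Year, and SWE columns:
# Hypothetical reconstruction of dat.feb: rows = stations, columns = years.
library(reshape2)
dat.feb <- reshape2::acast(swe.feb, Station ~ Year, value.var = "SWE")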
Now we can fit a MARSS model and get estimates of the missing SWEs.
Convergence is slow. We set a equal to the mean of the time series to speed
convergence.
library(MARSS)
m <- apply(dat.feb, 1, mean, na.rm = TRUE)
fit.ar1 <- MARSS(dat.feb, model = mod.list.ar1, control = list(maxit = 5000),
inits = list(A = matrix(m, ns, 1)))
Let’s plot the estimated SWEs for the missing years (Figure 11.3). These
estimates use all the information about the correlation with other sites and
uses information about correlation with the prior and subsequent years. We
will use the tidy() function to get the estimates and the 95% prediction
intervals. The prediction interval is for the range of SWE values we might
observe for that site. Notice that for some sites, intervals are narrow in early
years because these sites are highly correlated with sites for which there are data.
For other sites, the uncertainty is high in early years because the sites with
data in those years are not highly correlated. There are no intervals for sites
with data. We have data for those sites, so we are not uncertain about the
observed SWE for those.
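A sketch of that plotting code, mirroring the version used below for the correlation-only model (the tidy() column names follow that example):
fit <- fit.ar1
d <- broom::tidy(fit, type = "ytT", conf.int = TRUE)
d$Year <- d$t + 1980
d$Station <- d$.rownames
ggplot(data = d) + geom_line(aes(Year, estimate)) +
    geom_point(aes(Year, y)) +
    geom_ribbon(aes(x = Year, ymin = pred.low, ymax = pred.high),
        linetype = 2, alpha = 0.2, fill = "blue") +
    facet_wrap(~Station) + xlab("") + ylab("SWE (demeaned)")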
Figure 11.3: Estimated SWEs for the missing sites with prediction intervals.
11.2.1.1 Diagnostics
Figure 11.4: State residuals for the AR(1) model. Many stations show autocorrelation at lag-1.
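These panels can be produced with the same diagnostics code used below for the correlation-only model, e.g.:
fit <- fit.ar1
par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$state.residuals, 1, acf, na.action = na.pass)
mtext("State Residuals ACF", outer = TRUE, side = 3)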
for our estimates. In this case, the state of the missing SWE values at time t
is the expected value conditioned on all the stations with data at time t given
the estimated variance-covariance matrix Q.
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{15} \end{bmatrix}_t =
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{15} \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{15} \end{bmatrix}_t, \quad
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{15} \end{bmatrix}_t \sim
\text{MVN}\left(0,\;
\begin{bmatrix}
\sigma_1 & \zeta_{1,2} & \dots & \zeta_{1,15} \\
\zeta_{2,1} & \sigma_2 & \dots & \zeta_{2,15} \\
\vdots & \vdots & \ddots & \vdots \\
\zeta_{15,1} & \zeta_{15,2} & \dots & \sigma_{15}
\end{bmatrix}\right)
\tag{11.7}
$$
Now we can fit a MARSS model and get estimates of the missing SWEs.
Convergence is slow. We set a equal to the mean of the time series to speed
convergence.
The estimated SWEs for the missing years use information about the
correlation with other sites only.
fit <- fit.corr
d <- broom::tidy(fit, type = "ytT", conf.int = TRUE)
d$Year <- d$t + 1980
d$Station <- d$.rownames
p <- ggplot(data = d) + geom_line(aes(Year, estimate)) + geom_point(aes(Year,
y)) + geom_ribbon(aes(x = Year, ymin = pred.low, ymax = pred.high),
linetype = 2, alpha = 0.2, fill = "blue") + facet_wrap(~Station) +
xlab("") + ylab("SWE (demeaned)")
p
Figure 11.5: Estimated SWEs from the expected value of the states x̂ condi-
tioned on all the data for the model with only correlation across stations at
time t.
11.2.2.1 Diagnostics
The state and model residuals have no tendency towards negative autocorre-
lation now that we removed the autoregressive component from the process
(x) model.
fit <- fit.corr
par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$state.residuals, 1, acf, na.action = na.pass)
mtext("State Residuals ACF", outer = TRUE, side = 3)
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t =
\begin{bmatrix} b_1 & 0 \\ 0 & b_2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} +
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \qquad
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{15} \end{bmatrix}_t =
\begin{bmatrix} z_{1,1} & 0 \\ z_{2,1} & z_{2,2} \\ \vdots & \vdots \\ z_{15,1} & z_{15,2} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t +
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{15} \end{bmatrix} +
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{15} \end{bmatrix}_t
$$
x0 <- "zero"
Z <- matrix(list(0), ns, 2)
Z[1:(ns * 2)] <- c(paste0("z1", 1:ns), paste0("z2", 1:ns))
Z[1, 2] <- 0
A <- "unequal"
mod.list.dfa = list(B = B, Z = Z, Q = Q, R = R, U = U, A = A,
x0 = x0)
Now we can fit a MARSS model and get estimates of the missing SWEs. We
pass in the initial value for a as the mean level so it fits easier.
library(MARSS)
m <- apply(dat.feb, 1, mean, na.rm = TRUE)
fit.dfa <- MARSS(dat.feb, model = mod.list.dfa, control = list(maxit = 1000),
inits = list(A = matrix(m, ns, 1)))
11.2.4 Diagnostics
The plots showed the estimate of the missing Feb SWE values, which is the
expected value of y conditioned on all the data. For the non-missing SWE,
this expected value is just the observation. Many times we want the model
fit for the covariate. If the measurements have observation error, the fitted
value is the estimate without this observation error.
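With MARSS, those fits can be extracted with fitted(); a minimal sketch (the type argument follows newer MARSS versions and is an assumption here):
# Model fits E(Z x_t + a), i.e. the estimate without observation error.
d <- fitted(fit.dfa, type = "ytT")
head(d)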
When we look at all months, we see that SWE is highly seasonal. Note
October and November are missing for all years.
swe.yr <- snotel
swe.yr <- swe.yr[swe.yr$Station.Id %in% y$Station.Id, ]
swe.yr$Station <- droplevels(swe.yr$Station)
We will model the seasonal differences using a periodic model. The covariates
are
period <- 12
TT <- dim(dat.yr)[2]
cos.t <- cos(2 * pi * seq(TT)/period)
sin.t <- sin(2 * pi * seq(TT)/period)
c.seas <- rbind(cos.t, sin.t)
We will create a state for the seasonal cycle and each station will have a
scaled effect of that seasonal cycle. The observations will have the seasonal
effect plus a mean, and the residuals (observation - season - mean) will be
allowed to be correlated across stations.
The estimated mean SWE at each station is E(yi |y1:T ). This is the estimate
conditioned on all the data and includes the seasonal component plus the
information from the data from other stations. Because we estimated an R
matrix with covariance, stations with data at time t help inform the value of
stations without data at time t. Only years up to 1990 are shown, but the
model is fit to all years. The stations with no data before 1990 are being
estimated based on the information in the later years when they do have
data. We did not constrain the SWE to be positive, so negative estimates are
possible and occur in the months in which we have no SWE data (because
there is no snow).
Chapter 12
JAGS
In this lab, we will work through using Bayesian methods to estimate pa-
rameters in time series models. There are a variety of software tools to do
time series analysis using Bayesian methods. A number of R packages for
time series are listed on the CRAN TimeSeries task view.
After updating to the latest version of R, install JAGS for your operating
system using the instructions here. Click on JAGS, then the most recent
folder, then the platform of your machine. You will also need the coda, rjags
and R2jags packages.
library(coda)
library(rjags)
library(R2jags)
yt = µ + et, et ∼ N(0, σ²)    (12.1)
An equivalent way to think about this model is that instead of treating the
residuals as normally distributed with mean zero, we can think of the data yt
as being normally distributed with a mean of the intercept and the same
residual standard deviation:
yt ∼ N(E[yt], σ²)    (12.2)
Remember that in linear regression models, the residual error is interpreted
as independent and identically distributed observation error.
To run the JAGS model, we will need to start by writing the model in JAGS
notation. For our linear regression model, one way to construct the model is
# 1. LINEAR REGRESSION with no covariates, so intercept only.
# The parameters are the mean 'mu' and the precision 'tau.obs'.
model.loc = "lm_intercept.txt" # illustrative file name for the JAGS model
jagsscript = cat("
model {
   # priors on parameters
   mu ~ dnorm(0, 0.01); # vague normal prior (precision 0.01)
   tau.obs ~ dgamma(0.001,0.001); # gamma prior on the precision
   sd.obs <- 1/sqrt(tau.obs); # sd is a derived quantity
   for(i in 1:N) {
      Y[i] ~ dnorm(mu, tau.obs);
   }
}
",
file = model.loc)
A couple of things to notice: JAGS is not vectorized, so we need to use for loops
(instead of matrix multiplication), and the dnorm notation means that we
assume the value (on the left) is normally distributed around a particular
mean with a particular precision (1 over the variance).
The model can briefly be summarized as follows: there are 2 parameters in the
model (the mean and variance of the observation error). JAGS is a bit funny in
that instead of giving a normal distribution the standard deviation or variance,
you pass in the precision (1/variance), so our prior on µ is pretty vague. The
precision receives a gamma prior, which is equivalent to the variance receiving
an inverse gamma prior (fairly common for standard Bayesian regression
models). We will treat the standard deviation as derived (if we know the
variance or precision, which we are estimating, we automatically know the
standard deviation). Finally, we write a model for the data yt (Y[i]). Again
we use the dnorm distribution to say that the data are normally distributed
(equivalent to our likelihood).
The function from the R2jags package that we actually use to run the model
is jags(). There is a parallel version of the function called jags.parallel()
which is useful for larger, more complex models. The details of both can be
found with ?jags or ?jags.parallel.
To actually run the model, we need to create several new objects, representing
(1) a list of data that we will pass to JAGS, (2) a vector of parameters that
we want to monitor in JAGS and have returned back to R, and (3) the name
of our text file that contains the JAGS model we wrote above. With those
three things, we can call the jags() function.
jags.data = list(Y = Wind, N = N) # named list of inputs
jags.params = c("sd.obs", "mu") # parameters to be monitored
mod_lm_intercept = jags(jags.data, parameters.to.save = jags.params,
model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
n.iter = 10000, DIC = TRUE)
Notice that the jags() function contains a number of other important argu-
ments. In general, larger is better for all arguments: we want to run multiple
MCMC chains (maybe 3 or more), and have a burn-in of at least 5000. The
total number of samples after the burn-in period is n.iter - n.burnin, which in
this case is 5000 samples per chain. Because we are doing this with 3 MCMC
chains, and the thinning rate equals 1 (meaning we are saving every sample),
we will retain a total of 15,000 posterior samples for each parameter.
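As a quick check of that arithmetic:
n.chains <- 3; n.iter <- 10000; n.burnin <- 5000; n.thin <- 1
n.chains * (n.iter - n.burnin) / n.thin # 15000 retained samples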
The saved object storing our model diagnostics can be accessed directly, and
includes some useful summary output.
mod_lm_intercept
To examine the output more closely, we can pull all of the results directly into R:
attach.jags(mod_lm_intercept)
Attaching the R2jags object allows us to work with the named parameters
directly in R. For example, we could make a histogram of the posterior
distributions of the parameters mu and sd.obs with the following code,
# Now we can make plots of posterior values
par(mfrow = c(2, 1))
hist(mu, 40, col = "grey", xlab = "Mean", main = "")
hist(sd.obs, 40, col = "grey", xlab = expression(sigma[obs]),
main = "")
Finally, we can run some useful diagnostics from the coda package on this
model output. We have written a small function to make the creation of
mcmc lists easier (an argument required by many of the diagnostics). The function is:
createMcmcList = function(jagsmodel) {
McmcArray = as.array(jagsmodel$BUGSoutput$sims.array)
McmcList = vector("list", length = dim(McmcArray)[2])
for (i in 1:length(McmcList)) McmcList[[i]] = as.mcmc(McmcArray[,
i, ])
McmcList = mcmc.list(McmcList)
return(McmcList)
}
Creating the MCMC list preserves the random samples generated from each
chain and allows you to extract the samples for a given parameter (such as µ)
from any chain you want. To extract µ from the first chain, for example, you
could use the following code. Because createMcmcList() returns a list of
mcmc objects, we can summarize and plot these directly. Figure 12.2 shows
the plot from plot(myList[[1]]).
myList = createMcmcList(mod_lm_intercept)
summary(myList[[1]])
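Other coda diagnostics take the whole list; for example, the Gelman-Rubin statistic across the three chains:
coda::gelman.diag(myList) # scale reduction factors near 1 indicate convergence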
Figure 12.1: Plot of the posteriors for the linear regression model.
Iterations = 1:5000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 5000
[Figure 12.2: Trace and density plots from plot(myList[[1]]).]
In our first model, the errors were independent in time. We are going to
modify this to model autocorrelated errors. Autocorrelated errors are widely
used in ecology and other fields – for a greater discussion, see Morris and
Doak (2002) Quantitative Conservation Biology. To make the deviations
autocorrelated, we start by defining the deviation in the first time step,
e1 = y1 − µ. The expectation of yt in each time step is then written as
E[yt] = µ + φ et−1
Like in our first model, we assume that the data follow a normal likelihood
(or equivalently that the residuals are normally distributed), yt = E[yt ] + et ,
or yt ∼ N(E[yt ], σ 2 ). Thus, it is possible to express the subsequent deviations
as et = yt − E[yt ], or equivalently as et = yt − µ − φ × et−1 . The JAGS script
for this model is:
model.loc = ("lmcor_intercept.txt")
jagsscript = cat("
model {
# priors on parameters
mu ~ dnorm(0, 0.01);
tau.obs ~ dgamma(0.001,0.001);
sd.obs <- 1/sqrt(tau.obs);
phi ~ dunif(-1,1);
tau.cor <- tau.obs / (1-phi*phi); # Var = sigma2 * (1-rho^2)
Notice several subtle changes from the simpler first model: (1) we are esti-
mating the autocorrelation parameter φ, which is assigned a Uniform(-1, 1)
prior, (2) we model the residual variance as a function of the autocorrelation,
and (3) we allow the autocorrelation to affect the predicted values predY.
One other change we can make is to add predY to the list of parameters we
want returned to R.
jags.data = list(Y = Wind, N = N)
jags.params = c("sd.obs", "predY", "mu", "phi")
mod_lmcor_intercept = jags(jags.data, parameters.to.save = jags.params,
model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
n.iter = 10000, DIC = TRUE)
It is often helpful to plot the model predictions against the raw data. You can make this plot yourself, but we have also put together a simple
function whose arguments are one of our fitted models and the raw data. The
function is:
plotModelOutput = function(jagsmodel, Y) {
# attach the model
attach.jags(jagsmodel)
x = seq(1, length(Y))
summaryPredictions = cbind(apply(predY, 2, quantile, 0.025),
apply(predY, 2, mean), apply(predY, 2, quantile, 0.975))
    plot(Y, col = "white", ylim = c(min(c(Y, summaryPredictions)),
        max(c(Y, summaryPredictions))), xlab = "",
        ylab = "95% CIs of predictions and data",
        main = paste("JAGS results:", jagsmodel$model.file))
polygon(c(x, rev(x)), c(summaryPredictions[, 1], rev(summaryPredictions[,
3])), col = "grey70", border = NA)
lines(summaryPredictions[, 2])
points(Y)
}
We can use the function to plot the predicted posterior mean with 95% CIs,
as well as the raw data. For example, try
plotModelOutput(mod_lmcor_intercept, Wind)
In this simple model, we will assume that our process of interest (in this case,
daily wind speed) exhibits no daily trend, but behaves as a random walk.
The JAGS random walk model and R script to run the AR(1) model is below:
# 4. AR(1) MODEL WITH AN ESTIMATED AR COEFFICIENT. We're
# introducing a new AR coefficient 'phi', so the model is
# y[t] ~ N(mu + phi*y[t-1], sigma^2)
model.loc = ("ar1_intercept.txt")
jagsscript = cat("
model {
   mu ~ dnorm(0, 0.01);
   tau.pro ~ dgamma(0.001,0.001);
   sd.pro <- 1/sqrt(tau.pro);
   phi ~ dnorm(0, 1);
   predY[1] <- Y[1]; # define the first prediction so predY is complete
   for(i in 2:N) {
      predY[i] <- mu + phi * Y[i-1];
      Y[i] ~ dnorm(predY[i], tau.pro);
   }
}
",
file = model.loc)
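The model can then be fit with the same pattern as the earlier models (a sketch; the monitored parameter names follow the script above):
jags.data = list(Y = Wind, N = N)
jags.params = c("sd.pro", "predY", "mu", "phi")
mod_ar1_intercept = jags(jags.data, parameters.to.save = jags.params,
    model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
    n.iter = 10000, DIC = TRUE)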
For the process model, there are a number of ways to parameterize the first
state (x1 ), and we will talk about this more in the class. For the sake of this
model, we will place a vague weakly informative prior on x1 : x1 ∼ N(0, 0.01).
Second, we need to construct an observation model linking the estimated but
unseen states of nature xt to the data yt. For simplicity, we will assume that
the observation errors are independent and identically distributed, with no
autocorrelation. Mathematically, this model is
yt ∼ N(xt , r) (12.8)
In the two above models, q is the process variance and r is the observation
error variance. The JAGS code will use the standard deviation (square root)
of these. The code to produce and fit this model is below:
# 5. MAKE THE SS MODEL: a univariate random walk, no covariates.
model.loc = ("ss_model.txt")
jagsscript = cat("
model {
# priors on parameters
mu ~ dnorm(0, 0.01);
tau.pro ~ dgamma(0.001,0.001);
sd.q <- 1/sqrt(tau.pro);
tau.obs ~ dgamma(0.001,0.001);
sd.r <- 1/sqrt(tau.obs);
   phi ~ dnorm(0,1);
   X[1] ~ dnorm(0, 0.01); # vague prior on the first state (see text)
   predY[1] <- X[1];
   Y[1] ~ dnorm(X[1], tau.obs);
for(i in 2:N) {
predX[i] <- phi*X[i-1];
X[i] ~ dnorm(predX[i],tau.pro); # Process variation
predY[i] <- X[i];
Y[i] ~ dnorm(X[i], tau.obs); # Observation variation
}
}
",
file = model.loc)
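Fitting follows the same pattern (a sketch; the monitored names follow the script above):
jags.data = list(Y = Wind, N = N)
jags.params = c("sd.q", "sd.r", "predY", "mu", "phi")
mod_ss = jags(jags.data, parameters.to.save = jags.params,
    model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
    n.iter = 10000, DIC = TRUE)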
Returning to the first example of regression with the intercept only, we will
introduce Temp as the covariate explaining our response variable Wind. Note
that to include the covariate, we (1) modify the JAGS script to include a new
coefficient—in this case beta, (2) update the predictive equation to include
the effects of the new covariate, and (3) we include the new covariate in our
named data list.
# 6. Include some covariates in a linear regression Use
# temperature as a predictor of wind
model.loc = ("lm.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
beta ~ dnorm(0,0.01);
tau.obs ~ dgamma(0.001,0.001);
sd.obs <- 1/sqrt(tau.obs);
for(i in 1:N) {
predY[i] <- mu + C[i]*beta;
Y[i] ~ dnorm(predY[i], tau.obs);
}
}
",
file = model.loc)
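A sketch of fitting this model; for the forecasts discussed next, Y would be extended with NA values (and C with assumed future temperatures):
jags.data = list(Y = Wind, N = N, C = Temp) # covariate added to the data list
jags.params = c("sd.obs", "predY", "mu", "beta")
mod_lm = jags(jags.data, parameters.to.save = jags.params,
    model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
    n.iter = 10000, DIC = TRUE)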
We can inspect the fitted model object, and see that predY contains the 3
new predictions for the forecasts from this model.
12.8 Problems
1. Fit the intercept only model from section 12.2. Set the burn-in to 3,
and when the model completes, plot the time series of the parameter
mu for the first MCMC chain.
a. Based on your visual inspection, has the MCMC chain converged?
b. What is the ACF of the first MCMC chain?
2. Increase the MCMC burn-in for the model in question 1 to a value that
you think is reasonable. After the model has converged, calculate the
Gelman-Rubin diagnostic for the fitted model object.
3. Compare the results of the plotModelOutput() function for the inter-
cept only model from section 12.2. You will need to add “predY” to your
JAGS model and to the list of parameters to monitor, and re-run the
model.
4. Modify the random walk model without drift from section 12.4 to a
random walk model with drift. The equation for this model is
yt = yt−1 + u + et, et ∼ N(0, σ²)
where u is the drift.
a. Fit the following Ricker model to these data using the following
linear form of this model with normally distributed errors:
Chapter 13
Stan
For this lab, we will use Stan for fitting models. These examples are primarily
drawn from the Stan manual and previous code from this class.
A script with all the R code in the chapter can be downloaded here.
You will need the atsar package we have written for fitting state-space time
series models with Stan. This is hosted on GitHub (nwfsc-timeseries). Install
using the devtools package.
library(devtools)
devtools::install_github("nwfsc-timeseries/atsar")
In addition, you will need the rstan, datasets, parallel and loo packages.
After installing, if needed, load the packages:
library(atsar)
library(rstan)
library(loo)
Once you have Stan and rstan installed, optimize Stan on your machine:
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
For this lab, we will use a data set on air quality in New York from the
datasets package. Load the data and create a couple of new variables for
future use.
data(airquality, package = "datasets")
Wind <- airquality$Wind # wind speed
Temp <- airquality$Temp # air temperature
An equivalent way to think about this model is that instead of treating the
residuals as normally distributed with mean zero, we can think of the data yt as
being drawn from a normal distribution with a mean of the intercept, and the
same residual standard deviation:
Yt ∼ N (E[Yt ], σ)
Fitting the model using our function is done with this code,
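A sketch of that call, following the fit_stan() usage later in the chapter (the model_name "regression" is an assumption):
lm_intercept <- atsar::fit_stan(y = as.numeric(Temp), x = rep(1, length(Temp)),
    model_name = "regression")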
But to get more detailed output for each parameter, you have to use the
extract() function,
pars <- rstan::extract(lm_intercept)
names(pars)
extract() will return the draws from the posterior for your parameters and
any derived variables specified in your stan code. In this case, our model is
yt = β × 1 + et , et ∼ N (0, σ)
so our estimated parameters are β and σ. Our stan code computed the derived
variables: predicted yt which is ŷt = β × 1 and the log-likelihood. lp__ is
the log posterior which is automatically returned.
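For example, the posterior for the intercept can be summarized directly from these draws:
mean(pars$beta) # posterior mean of the intercept
quantile(pars$beta, c(0.025, 0.975)) # 95% credible interval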
Figure 13.1: Data and predicted values for the linear regression model.
Here is a plot of the time series of beta with one chain and no burn-in. Based
on visual inspection, when does the chain converge?
pars <- rstan::extract(lm_intercept)
plot(pars$beta)
In our first model, the errors were independent in time. We’re going to modify
this to model autocorrelated errors. Autocorrelated errors are widely used in
ecology and other fields – for a greater discussion, see Morris and Doak (2002)
Quantitative Conservation Biology. To make the errors autocorrelated, we
start by defining the error in the first time step, e1 = y1 − β. The expectation
of Yt in each time step is then written as
E[Yt ] = β + φet−1
Figure 13.2: A time series of our posterior draws using one chain and no
burn-in.
σ² = ψ²(1 − φ²)
Like in our first model, we assume that the data follow a normal likelihood
(or equivalently that the residuals are normally distributed), yt = E[Yt ] + et ,
or Yt ∼ N (E[Yt ], σ). Thus, it is possible to express the subsequent deviations
as et = yt − E[Yt ], or equivalently as et = yt − β − φet−1 .
We can fit this regression with autocorrelated errors by changing the model
name to ‘regression_cor’:
lm_intercept_cor <- atsar::fit_stan(y = Temp, x = rep(1, length(Temp)),
model_name = "regression_cor", mcmc_list = list(n_mcmc = 1000,
n_burn = 1, n_chain = 1, n_thin = 1))
13.3 Random walk model
The random walk model is
yt = yt−1 + et
where et ∼ N(0, σ). Remember back to the autocorrelated model (or
MA(1) models), where we assumed that the errors et followed a random walk.
In contrast, this model assumes that the errors are independent, but that the
state of nature follows a random walk. Note also that this model as written
doesn’t include a drift term (this can be turned on / off using the est_drift
argument).
We can fit the random walk model using argument model_name = 'rw'
passed to the fit_stan() function.
rw <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "rw")
more in future lectures. The math to describe the AR(1) model is:
yt = φyt−1 + et
The fit_stan() function can fit higher order AR models, but for now we
just want to fit an AR(1) model and make a histogram of phi.
ar1 <- atsar::fit_stan(y = Temp, x = matrix(1, nrow = length(Temp),
ncol = 1), model_name = "ar", est_drift = FALSE, P = 1)
xt = φxt−1 + et , et ∼ N (0, q)
For the process model, there are a number of ways to parameterize the first
‘state’, and we’ll talk about this more in the class, but for the sake of this model,
we’ll place a vague weakly informative prior on x1, x1 ∼ N(0, 0.01). Second,
we need to construct an observation model linking the estimated but unseen
states of nature xt to the data yt. For simplicity, we’ll assume that the
observation errors are independent and identically distributed, with no
autocorrelation. Mathematically, this model is
Yt ∼ N (xt , r)
In the two above models, we’ll refer to q as the standard deviation of the
process errors and r as the standard deviation of the observation errors.
We can fit the state-space AR(1) and random walk models using the
fit_stan() function:
ss_ar <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "ss_ar")
ss_rw <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "ss_rw")
13.6 Dynamic Factor Analysis
[Figure: Lake Washington phytoplankton time series (Cryptomonas, Diatoms, Greens, Unicells, Other.algae), 1980-1990.]
We will fit multiple DFA with different numbers of trends and use leave one
out (LOO) cross-validation to choose the best model.
mod_1 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 1)
mod_2 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 2)

Warning: The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior
means and medians may be unreliable. Running the chains for more iterations
may help. See http://mc-stan.org/misc/warnings.html#bulk-ess

Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior
variances and tail quantiles may be unreliable. Running the chains for more
iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess

mod_3 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 3)
mod_4 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 4)

Warning: The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior
means and medians may be unreliable. Running the chains for more iterations
may help. See http://mc-stan.org/misc/warnings.html#bulk-ess

Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior
variances and tail quantiles may be unreliable. Running the chains for more
iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess

mod_5 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 5)

Warning: There were 15 transitions after warmup that exceeded the maximum
treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior
means and medians may be unreliable. Running the chains for more iterations
may help. See http://mc-stan.org/misc/warnings.html#bulk-ess

Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior
variances and tail quantiles may be unreliable. Running the chains for more
iterations may help. See http://mc-stan.org/misc/warnings.html#tail-ess
We will compute the Leave One Out Information Criterion (LOOIC) using
the loo package. Like AIC, lower is better.
loo::loo(loo::extract_log_lik(mod_1))$looic
[1] 1634.44
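The full comparison table can be assembled the same way (a sketch mirroring the call above):
looics <- sapply(list(mod_1, mod_2, mod_3, mod_4, mod_5),
    function(m) loo::loo(loo::extract_log_lik(m))$looic)
data.frame(trends = 1:5, LOOIC = looics)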
trends LOOIC
1 1 1634.440
2 2 1561.259
3 3 1480.432
4 4 1415.185
5 5 1402.356
13.8 Problems
1. By adapting the code in Section 13.1, fit a regression model that includes
the intercept and a slope, modeling the effect of Wind. What is the
mean wind effect you estimate?
2. Using the results from the linear regression model fit with no burn-in
(Section 13.1.1), calculate the ACF of the beta time series using acf().
Would thinning more be appropriate? How much?
3. Using the fit of the random walk model to the temperature data (Section
13.3), plot the predicted values (states) and 95% CIs.
4. To see the effect of this increased flexibility in estimating the autocorre-
lation, make a plot of the predictions from the AR(1) model (Section
13.4) and the RW model (Section 13.3).
5. Fit the univariate state-space model (Section 13.5) with and without
the autoregressive parameter φ and compare the estimated process and
observation error variances. Recall that AR(1) without the φ parameter
is a random walk.