Applied Time Series Analysis

Applied Time Series Analysis for Fisheries and
Environmental Sciences
E. E. Holmes, M. D. Scheuerell, and E. J. Ward
2020-02-03
2
Contents
1 Basic matrix math in R 11

1.1 Creating matrices in R . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Matrix multiplication, addition and transpose . . . . . . . . . 14
1.3 Subsetting a matrix . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Replacing elements in a matrix . . . . . . . . . . . . . . . . . 18
1.5 Diagonal matrices and identity matrices . . . . . . . . . . . . 20
1.6 Taking the inverse of a square matrix . . . . . . . . . . . . . . 22
1.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2 Linear regression in matrix form 27

2.1 A simple regression: one explanatory variable . . . . . . . . . 28
2.2 Matrix Form 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Matrix Form 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Groups of intercepts . . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Groups of β’s . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6 Seasonal effect as a factor . . . . . . . . . . . . . . . . . . . . 44
2.7 Seasonal effect plus other explanatory variables* . . . . . . . . 47
2.8 Models with confounded parameters* . . . . . . . . . . . . . . 48
2.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3 Introduction to time series 53

3.1 Examples of time series . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Classification of time series . . . . . . . . . . . . . . . . . . . . 55
3.3 Statistical analyses of time series . . . . . . . . . . . . . . . . 56
3.4 What is a time series model? . . . . . . . . . . . . . . . . . . . 57
3.5 Two simple and classic time series models . . . . . . . . . . . 57
3.6 Classical decomposition . . . . . . . . . . . . . . . . . . . . . . 59
3
4 CONTENTS
3.7 Decomposition on log-transformed data . . . . . . . . . . . . . 65
4 Basic time series functions in R 69

4.1 Time series plots . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Decomposition of time series . . . . . . . . . . . . . . . . . . . 73
4.3 Differencing to remove a trend or seasonal effects . . . . . . . 81
4.4 Correlation within and among time series . . . . . . . . . . . . 83
4.5 White noise (WN) . . . . . . . . . . . . . . . . . . . . . . . . 93
4.6 Random walks (RW) . . . . . . . . . . . . . . . . . . . . . . . 97
4.7 Autoregressive (AR) models . . . . . . . . . . . . . . . . . . . 100
4.8 Moving-average (MA) models . . . . . . . . . . . . . . . . . . 105
4.9 Autoregressive moving-average (ARMA) models . . . . . . . . 109
4.10 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5 Box-Jenkins method 117

5.1 Box-Jenkins method . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3 Dickey-Fuller and Augmented Dickey-Fuller tests . . . . . . . 126
5.4 KPSS test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.5 Dealing with non-stationarity . . . . . . . . . . . . . . . . . . 134
5.6 Summary: stationarity testing . . . . . . . . . . . . . . . . . . 137
5.7 Estimating ARMA parameters . . . . . . . . . . . . . . . . . . 138
5.8 Estimating the ARMA orders . . . . . . . . . . . . . . . . . . 143
5.9 Check residuals . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.10 Forecast from a fitted ARIMA model . . . . . . . . . . . . . . 149
5.11 Seasonal ARIMA model . . . . . . . . . . . . . . . . . . . . . 150
5.12 Forecast using a seasonal model . . . . . . . . . . . . . . . . . 152
5.13 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6 Univariate state-space models 157

6.1 Fitting a state-space model with MARSS . . . . . . . . . . . . 158
6.2 Examples using the Nile river data . . . . . . . . . . . . . . . 160
6.3 The StructTS function . . . . . . . . . . . . . . . . . . . . . . 164
6.4 Comparing models with AIC and model weights . . . . . . . . 167
6.5 Basic diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.6 Fitting with JAGS . . . . . . . . . . . . . . . . . . . . . . . . 171
6.7 Fitting with Stan . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.8 A random walk model of animal movement . . . . . . . . . . . 179
CONTENTS 5
6.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
7 MARSS models 187

7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.2 West coast harbor seals counts . . . . . . . . . . . . . . . . . . 188
7.3 A single well-mixed population . . . . . . . . . . . . . . . . . 190
7.4 Four subpopulations with temporally uncorrelated errors . . . 195
7.5 Four subpopulations with temporally correlated errors . . . . . 197
7.6 Using MARSS models to study spatial structure . . . . . . . . 200
7.7 Hypotheses regarding spatial structure . . . . . . . . . . . . . 200
7.8 Set up the hypotheses as different models . . . . . . . . . . . . 202
7.9 Fitting a MARSS model with JAGS . . . . . . . . . . . . . . . 204
7.10 Fitting a MARSS model with Stan . . . . . . . . . . . . . . . 208
7.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
8 MARSS models with covariates 217

8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.2 Prepare the plankton data . . . . . . . . . . . . . . . . . . . . 218
8.3 Observation-error only model . . . . . . . . . . . . . . . . . . 219
8.4 Process-error only model . . . . . . . . . . . . . . . . . . . . . 221
8.5 Both process- and observation-error . . . . . . . . . . . . . . . 225
8.6 Including seasonal effects in MARSS models . . . . . . . . . . 227
8.7 Model diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.8 Homework data and discussion . . . . . . . . . . . . . . . . . . 234
8.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9 Dynamic linear models 241

9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.2 DLM in state-space form . . . . . . . . . . . . . . . . . . . . . 243
9.3 Stochastic level models . . . . . . . . . . . . . . . . . . . . . . 244
9.4 Stochastic regression model . . . . . . . . . . . . . . . . . . . 246
9.5 DLM with seasonal effect . . . . . . . . . . . . . . . . . . . . . 247
9.6 Analysis of salmon survival . . . . . . . . . . . . . . . . . . . . 249
9.7 Fitting with MARSS() . . . . . . . . . . . . . . . . . . . . . . . 250
9.8 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.9 Forecast diagnostics . . . . . . . . . . . . . . . . . . . . . . . . 259
9.10 Homework discussion and data . . . . . . . . . . . . . . . . . . 262
9.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
6 CONTENTS
10 Dynamic Factor Analysis 267

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
10.2 Example of a DFA model . . . . . . . . . . . . . . . . . . . . . 268
10.3 Constraining a DFA model . . . . . . . . . . . . . . . . . . . . 269
10.4 Different error structures . . . . . . . . . . . . . . . . . . . . . 270
10.5 Lake Washington phytoplankton data . . . . . . . . . . . . . . 271
10.6 Fitting DFA models with the MARSS package . . . . . . . . . 273
10.7 Interpreting the MARSS output . . . . . . . . . . . . . . . . . 277
10.8 Rotating trends and loadings . . . . . . . . . . . . . . . . . . . 277
10.9 Estimated states and loadings . . . . . . . . . . . . . . . . . . 278
10.10Plotting the data and model fits . . . . . . . . . . . . . . . . . 281
10.11Covariates in DFA models . . . . . . . . . . . . . . . . . . . . 284
10.12Example from Lake Washington . . . . . . . . . . . . . . . . . 284
10.13Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
11 Covariates with Missing Values 291

11.1 Covariates with missing values or observation error . . . . . . 292
11.2 Example: Snotel Data . . . . . . . . . . . . . . . . . . . . . . 294
11.3 Modeling Seasonal SWE . . . . . . . . . . . . . . . . . . . . . 307
12 JAGS for Bayesian time series analysis 313

12.1 The airquality dataset . . . . . . . . . . . . . . . . . . . . . . 314
12.2 Linear regression with no covariates . . . . . . . . . . . . . . . 314
12.3 Regression with autocorrelated errors . . . . . . . . . . . . . . 321
12.4 Random walk time series model . . . . . . . . . . . . . . . . . 323
12.5 Autoregressive AR(1) time series models . . . . . . . . . . . . 326
12.6 Univariate state space model . . . . . . . . . . . . . . . . . . . 327
12.7 Forecasting with JAGS models . . . . . . . . . . . . . . . . . . 329
12.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
13 Stan for Bayesian time series analysis 333

13.1 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . 334
13.2 Linear regression with correlated errors . . . . . . . . . . . . . 337
13.3 Random walk model . . . . . . . . . . . . . . . . . . . . . . . 339
13.4 Autoregressive models . . . . . . . . . . . . . . . . . . . . . . 339
13.5 Univariate state-space models . . . . . . . . . . . . . . . . . . 340
13.6 Dynamic factor analysis . . . . . . . . . . . . . . . . . . . . . 341
13.7 Uncertainty intervals on states . . . . . . . . . . . . . . . . . . 345
CONTENTS 7
13.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

8 CONTENTS
Preface
This is material that was developed as part of a course we teach at the

University of Washington on applied time series analysis for fisheries and
environmental data. You can find our lectures on our course website ATSA.
Book package
The book uses a number of R packages and a variety of fisheries data sets.
The packages and data sets can be installed by installing our atsalibrary
package which is hosted on GitHub:
library(devtools)
devtools::install_github("nwfsc-timeseries/atsalibrary")
Authors
The authors are research scientists at the Northwest Fisheries Science Center
(NWFSC). This work was conducted as part of our jobs at the NWFSC, a re-
search center for NOAA Fisheries which is a United States federal government
agency.
Links to more code and publications can be found on our academic websites:
• http://faculty.washington.edu/eeholmes
• http://faculty.washington.edu/scheuerl
• http://faculty.washington.edu/warde
9
10 CONTENTS
Citation
Holmes, E. E., M. D. Scheuerell, and E. J. Ward. Applied time series

analysis for fisheries and environmental data. NOAA Fisheries, Northwest
Fisheries Science Center, 2725 Montlake Blvd E., Seattle, WA 98112. Contacts
eli.holmes@noaa.gov, eric.ward@noaa.gov, and mark.scheuerell@noaa.gov
Chapter 1
Basic matrix math in R
This chapter reviews the basic matrix math operations that you will need to
understand the course material and shows how to do these operations in R.
A script with all the R code in the chapter can be downloaded here.
1.1 Creating matrices in R
Create a 3 × 4 matrix, meaning 3 row and 4 columns, that is all 1s:

matrix(1, 3, 4)
[,1] [,2] [,3] [,4]

[1,] 1 1 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
Create a 3 × 4 matrix filled in with the numbers 1 to 12 by column (default)
and by row:
matrix(1:12, 3, 4)
[,1] [,2] [,3] [,4]

[1,] 1 4 7 10
[2,] 2 5 8 11
11
12 CHAPTER 1. MATRIX MATH
[3,] 3 6 9 12
matrix(1:12, 3, 4, byrow = TRUE)
[,1] [,2] [,3] [,4]

[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
Create a matrix with one column:
matrix(1:4, ncol = 1)
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
Create a matrix with one row:
matrix(1:4, nrow = 1)
[,1] [,2] [,3] [,4]

[1,] 1 2 3 4
Check the dimensions of a matrix
A = matrix(1:6, 2, 3)
A
[,1] [,2] [,3]

[1,] 1 3 5
[2,] 2 4 6
dim(A)
[1] 2 3
Get the number of rows in a matrix:
dim(A)[1]
[1] 2
1.1. CREATING MATRICES IN R 13
nrow(A)
[1] 2
Create a 3D matrix (called array):
A = array(1:6, dim = c(2, 3, 2))
A
, , 1
[,1] [,2] [,3]

[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]

[1,] 1 3 5
[2,] 2 4 6
dim(A)
[1] 2 3 2
Check if an object is a matrix. A data frame is not a matrix. A vector is not
a matrix.
A = matrix(1:4, 1, 4)
A
[,1] [,2] [,3] [,4]

[1,] 1 2 3 4
class(A)
[1] "matrix"
B = data.frame(A)
B
X1 X2 X3 X4
1 1 2 3 4
class(B)
[1] "data.frame"
C = 1:4
C
[1] 1 2 3 4
class(C)
[1] "integer"
1.2 Matrix multiplication, addition and

transpose
You will need to be very solid in matrix multiplication for the course. If you
haven’t done it in awhile, google ‘matrix multiplication youtube’ and you find
lots of 5min videos to remind you.
In R, you use the %*% operation to do matrix multiplication. When you do
matrix multiplication, the columns of the matrix on the left must equal the
rows of the matrix on the right. The result is a matrix that has the number
of rows of the matrix on the left and number of columns of the matrix on the
right.
(n × m)(m × p) = (n × p)
A=matrix(1:6, 2, 3) #2 rows, 3 columns

B=matrix(1:6, 3, 2) #3 rows, 2 columns
A%*%B #this works
[,1] [,2]
[1,] 22 49
[2,] 28 64
B%*%A #this works
[,1] [,2] [,3]

[1,] 9 19 29
1.2. MATRIX MULTIPLICATION, ADDITION AND TRANSPOSE 15
[2,] 12 26 40
[3,] 15 33 51
try(B%*%B) #this doesn't
Error in B %*% B : non-conformable arguments

To add two matrices use +. The matrices have to have the same dimensions.
A+A #works
[,1] [,2] [,3]

[1,] 2 6 10
[2,] 4 8 12
A+t(B) #works
[,1] [,2] [,3]

[1,] 2 5 8
[2,] 6 9 12
try(A+B) #does not work since A has 2 rows and B has 3
Error in A + B : non-conformable arrays

The transpose of a matrix is denoted A> or A0 . To transpose a matrix in R,
you use t().
t(A) #is the transpose of A
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
try(A%*%A) #this won't work
Error in A %*% A : non-conformable arguments

A%*%t(A) #this will
[,1] [,2]
[1,] 35 44
[2,] 44 56
1.3 Subsetting a matrix

To subset a matrix, we use [ ]:
#get the first and second rows of A
#it's a 2x3 matrix
A[1:2,]
[,1] [,2] [,3]

[1,] 1 4 7
[2,] 2 5 8
#get the top 2 rows and left 2 columns
A[1:2,1:2]
[,1] [,2]
[1,] 1 4
[2,] 2 5
#What does this do?
A[c(1,3),c(1,3)]
[,1] [,2]
[1,] 1 7
[2,] 3 9
#This?
A[c(1,2,1),c(2,3)]
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 4 7
If you have used matlab, you know you can say something like A[1,end] to
denote the element of a matrix in row 1 and the last column. R does not
have ‘end’. To do, the same in R you do something like:
1.3. SUBSETTING A MATRIX 17
A=matrix(1:9, 3, 3)
A[1,ncol(A)]
[1] 7
#or
A[1,dim(A)[2]]
[1] 7
Warning R will create vectors from subsetting matrices!
One of the really bad things that R does with matrices is create a vector if
you happen to subset a matrix to create a matrix with 1 row or 1 column.
Look at this:
A=matrix(1:9, 3, 3)
#take the first 2 rows
B=A[1:2,]
#everything is ok
dim(B)
[1] 2 3
class(B)
[1] "matrix"
#take the first row
B=A[1,]
#oh no! It should be a 1x3 matrix but it is not.
dim(B)
NULL
#It is not even a matrix any more
class(B)
[1] "integer"
#and what happens if we take the transpose?
#Oh no, it's a 1x3 matrix not a 3x1 (transpose of 1x3)
t(B)
[,1] [,2] [,3]

[1,] 1 4 7
#A%*%B should fail because A is (3x3) and B is (1x3)
A%*%B
[,1]
[1,] 66
[2,] 78
[3,] 90
#It works? That is horrible!
This will create hard to find bugs in your code because you will look at
B=A[1,] and everything looks fine. Why is R saying it is not a matrix! To
stop R from doing this use drop=FALSE.
B=A[1,,drop=FALSE]
#Now it is a matrix as it should be
dim(B)
[1] 1 3
class(B)
[1] "matrix"
#this fails as it should (alerting you to a problem!)
try(A%*%B)
Error in A %*% B : non-conformable arguments
1.4 Replacing elements in a matrix
Replace 1 element.
A=matrix(1, 3, 3)
A[1,1]=2
A
[,1] [,2] [,3]

1.4. REPLACING ELEMENTS IN A MATRIX 19
[1,] 2 1 1
[2,] 1 1 1
[3,] 1 1 1
Replace a row with all 1s or a string of values
A=matrix(1, 3, 3)
A[1,]=2
A
[,1] [,2] [,3]

[1,] 2 2 2
[2,] 1 1 1
[3,] 1 1 1
A[1,]=1:3
A
[,1] [,2] [,3]

[1,] 1 2 3
[2,] 1 1 1
[3,] 1 1 1
Replace group of elements. This often does not work as one expects so be
sure look at your matrix after trying something like this. Here I want to
replace elements (1,3) and (3,1) with 2, but it didn’t work as I wanted.
A=matrix(1, 3, 3)
A[c(1,3),c(3,1)]=2
A
[,1] [,2] [,3]

[1,] 2 1 2
[2,] 1 1 1
[3,] 2 1 2
How do I replace elements (1,1) and (3,3) with 2 then? It’s tedious. If you
have a lot of elements to replace, you might want to use a for loop.
A=matrix(1, 3, 3)
A[1,3]=2
A[3,1]=2
[,1] [,2] [,3]

[1,] 1 1 2
[2,] 1 1 1
[3,] 2 1 1
1.5 Diagonal matrices and identity matrices
A diagonal matrix is one that is square, meaning number of rows equals

number of columns, and it has 0s on the off-diagonal and non-zeros on the
diagonal. In R, you form a diagonal matrix with the diag() function:
diag(1,3) #put 1 on diagonal of 3x3 matrix
[,1] [,2] [,3]

[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
diag(2, 3) #put 2 on diagonal of 3x3 matrix
[,1] [,2] [,3]

[1,] 2 0 0
[2,] 0 2 0
[3,] 0 0 2
diag(1:4) #put 1 to 4 on diagonal of 4x4 matrix
[,1] [,2] [,3] [,4]

[1,] 1 0 0 0
[2,] 0 2 0 0
[3,] 0 0 3 0
[4,] 0 0 0 4
The diag() function can also be used to replace elements on the diagonal of
a matrix:
1.5. DIAGONAL MATRICES AND IDENTITY MATRICES 21
A = matrix(3, 3, 3)
diag(A) = 1
A
[,1] [,2] [,3]

[1,] 1 3 3
[2,] 3 1 3
[3,] 3 3 1
A = matrix(3, 3, 3)
diag(A) = 1:3
A
[,1] [,2] [,3]

[1,] 1 3 3
[2,] 3 2 3
[3,] 3 3 3
A = matrix(3, 3, 4)
diag(A[1:3, 2:4]) = 1
A
[,1] [,2] [,3] [,4]

[1,] 3 1 3 3
[2,] 3 3 1 3
[3,] 3 3 3 1
The diag() function is also used to get the diagonal of a matrix.
A = matrix(1:9, 3, 3)
diag(A)
[1] 1 5 9
The identity matrix is a special kind of diagonal matrix with 1s on the
diagonal. It is denoted I. I3 would mean a 3 × 3 diagonal matrix. A identity
matrix has the property that AI = A and IA = A so it is like a 1.
A = matrix(1:9, 3, 3)
I = diag(3) #shortcut for 3x3 identity matrix
A %*% I
[,1] [,2] [,3]

[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
1.6 Taking the inverse of a square matrix
The inverse of a matrix is denoted A−1 . You can think of the inverse of a
matrix like 1/a. 1/a × a = 1. A−1 A = AA−1 = I. The inverse of a matrix
does not always exist; for one it has to be square. We’ll be using inverses
for variance-covariance matrices and by definition (of a variance-covariance
matrix), the inverse of those exist. In R, there are a couple way common
ways to take the inverse of a variance-covariance matrix (or something with
the same properties). solve() is the most common probably:
A = diag(3, 3) + matrix(1, 3, 3)
invA = solve(A)
invA %*% A
[,1] [,2] [,3]

[1,] 1.000000e+00 -6.938894e-18 0
[2,] 2.081668e-17 1.000000e+00 0
[3,] 0.000000e+00 0.000000e+00 1
A %*% invA
[,1] [,2] [,3]

[1,] 1.000000e+00 -6.938894e-18 0
[2,] 2.081668e-17 1.000000e+00 0
[3,] 0.000000e+00 0.000000e+00 1
Another option is to use chol2inv() which uses a Cholesky decomposition1 :
1
The Cholesky decomposition is a handy way to keep your variance-covariance matrices
valid when doing a parameter search. Don’t search over the raw variance-covariance matrix.
Search over a matrix where the lower triangle is 0, that is what a Cholesky decomposition
looks like. Let’s call it B. Your variance-covariance matrix is t(B)%*%B.
1.6. TAKING THE INVERSE OF A SQUARE MATRIX 23
A = diag(3, 3) + matrix(1, 3, 3)
invA = chol2inv(chol(A))
invA %*% A
[,1] [,2] [,3]

[1,] 1.000000e+00 6.938894e-17 0.000000e+00
[2,] 2.081668e-17 1.000000e+00 -2.775558e-17
[3,] -5.551115e-17 0.000000e+00 1.000000e+00
A %*% invA
[,1] [,2] [,3]

[1,] 1.000000e+00 2.081668e-17 -5.551115e-17
[2,] 6.938894e-17 1.000000e+00 0.000000e+00
[3,] 0.000000e+00 -2.775558e-17 1.000000e+00
For the purpose of this course, solve() is fine.
1.7 Problems
1. Build a 4 × 3 matrix with the numbers 1 through 4 in each row.
2. Extract the elements in the 1st and 2nd rows and 1st and 2nd columns
(you’ll have a 2 × 2 matrix). Show the R code that will do this.
3. Build a 4 × 3 matrix with the numbers 1 through 12 by row (meaning
the first row will have the numbers 1 through 4 in it).
4. Extract the 3rd row of the above. Show R code to do this where you
end up with a vector and how to do this where you end up with a 1 × 3
matrix.
5. Build a 4 × 3 matrix that is all 1s except a 2 in the (2,3) element (2nd
row, 3rd column).
6. Take the transpose of the above.
7. Build a 4 × 4 diagonal matrix with 1 through 4 on the diagonal.
8. Build a 5 × 5 identity matrix.
9. Replace the diagonal in the above matrix with 2 (the number 2).
10. Build a matrix with 2 on the diagonal and 1s on the offdiagonals.
11. Take the inverse of the above.
12. Build a 3 × 3 matrix with the first 9 letters of the alphabet. First
column should be “a”, “b”, “c”. letters[1:9] gives you these letters.
13. Replace the diagonal of this matrix with the word “cat”.
14. Build a 4 × 3 matrix with all 1s. Multiply by a 3 × 4 matrix with all 2s.
15. If A is a 4 × 3 matrix, is AA possible? Is AA> possible? Show how to
write AA> in R.
h1 4 7i
16. In the equation, AB = C, let A = 2 5 8 . Build a B matrix with only
3 6 9
1s and 0s such that the values on the diagonal of C are 1, 8, 6 (in that
order). Show your R code for A, B and AB.
h 2 8 14 i AB = C. Build a 3 × 3 B
17. Same A matrix as above and same equation
matrix such that C = 2A. So C = 4 10 16 . Hint, B is diagonal.
6 12 18
1.7. PROBLEMS 25
18. Same A and AB = C equation. Build a B matrix to compute the row

sums of A. So the first ‘row sum’hwould be 1 + 4 + 7, the sum of all
12
i
elements in row 1 of A. C will be 15 , the row sums of A. Hint, B is
18
a column matrix (1 column).
19. Same A matrix as above but now equation BA = C. Build a B matrix
to compute the column sums of A. So the first ‘column sum’ would be
1 + 2 + 3. C will be a 1 × 3 matrix.
h2 1 1i
20. Let AB = C equation but A = 1 2 1 (so A=diag(3)+1). Build a B
h3i 1 1 2
matrix such that C = 3 . Hint, you need to use the inverse of A.
3
Chapter 2
Linear regression in matrix

form
This chapter shows how to write linear regression models in matrix form.
The purpose is to get you comfortable writing multivariate linear models in
different matrix forms before we start working with time series versions of
these models. Each matrix form is an equivalent model for the data, but
written in different forms. You do not need to worry which form is better
or worse at this point. Simply get comfortable writing multivariate linear
models in different matrix forms.
A script with all the R code in the chapter can be downloaded here. The
Rmd file of this chapter can be downloaded here.
Data and packages
This chapter uses the stats, MARSS and datasets packages. Install those
packages, if needed, and load:
library(stats)
library(MARSS)
library(datasets)
We will work with the stackloss dataset available in the datasets package.
The dataset consists of 21 observations on the efficiency of a plant that
27
28 CHAPTER 2. LINEAR REGRESSION
produces nitric acid as a function of three explanatory variables: air flow,

water temperature and acid concentration. We are going to use just the first
4 datapoints so that it is easier to write the matrices, but the concepts extend
to as many datapoints as you have.
data(stackloss, package = "datasets")
dat = stackloss[1:4, ] #subsetted first 4 rows
dat
Air.Flow Water.Temp Acid.Conc. stack.loss

1 80 27 89 42
2 80 27 88 37
3 75 25 90 37
4 62 24 87 28
2.1 A simple regression: one explanatory

variable
We will start by regressing stack loss against air flow. In R using the lm()
function this is
# the dat data.frame is defined on the first page of the
# chapter
lm(stack.loss ~ Air.Flow, data = dat)
This fits the following model for the i-th measurment:
stack.lossi = α + βairi + ei , where ei ∼ N(0, σ 2 ) (2.1)
We will write the model for all the measurements together in two different
ways, Form 1 and Form 2.
2.2. MATRIX FORM 1 29
2.2 Matrix Form 1

In this form, we have the explanatory variables in a matrix on the left of our
parameter matrix:
stack.loss1 1 air1 " # e1
     
stack.loss  1 air2  α e 
  
2
= +  2 (2.2)
 
stack.loss3  1 air3  β e3 
 
stack.loss4 1 air4 e4
You should work through the matrix algebra to make sure you understand
why Equation (2.2) is Equation (2.1) for all the i data points together.
We can write the first line of Equation (2.2) succinctly as
y = Zx + e (2.3)
where x are our parameters, y are our response variables, and Z are our
explanatory variables (with a 1 column for the intercept). The lm() function
uses Form 1, and we can recover the Z matrix for Form 1 by using the
model.matrix() function on the output from a lm() call:
fit = lm(stack.loss ~ Air.Flow, data = dat)
Z = model.matrix(fit)
Z[1:4, ]
(Intercept) Air.Flow
1 1 80
2 1 80
3 1 75
4 1 62
2.2.1 Solving for the parameters
Note: You will not need to know how to solve linear matrix equations for
this course. This section just shows you what the lm() function is doing to
estimate the parameters.
Notice that Z is not a square matrix and its inverse does not exist but the
inverse of Z> Z exists—if this is a solveable problem. We can go through the
following steps to solve for x, our parameters α and β.
Start with y = Zx + e and multiply by Z> on the left to get
Z> y = Z> Zx + Z> e
Multiply that by (Z> Z)−1 on the left to get
(Z> Z)−1 Z> y = (Z> Z)−1 Z> Zx + (Z> Z)−1 Z> e
(Z> Z)−1 Z> Z equals the identity matrix, thus
(Z> Z)−1 Z> y = x + (Z> Z)−1 Z> e
Move x to the right by itself, to get
(Z> Z)−1 Z> y − (Z> Z)−1 Z> e = x
Let’s assume our errors, the e, are i.i.d. which means that
 2
σ 0 0 0
 
 0 σ2 0 0  
e∼ MVN 
0, 

2

 0 0 σ 0 

0 0 0 σ2
This equation means e is drawn from a multivariate normal distribution with

a variance-covariance matrix that is diagonal with equal variances. Under
that assumption, the expected value of (Z> Z)−1 Z> e is zero. So we can solve
for x as
x = (Z> Z)−1 Z> y
Let’s try that with R and compare to what you get with lm():
y = matrix(dat$stack.loss, ncol = 1)
Z = cbind(1, dat$Air.Flow) #or use model.matrix() to get Z
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
[1,] -11.6159170
[2,] 0.6412918
coef(lm(stack.loss ~ Air.Flow, data = dat))
(Intercept) Air.Flow
-11.6159170 0.6412918
As you see, you get the same values.
2.2.2 Form 1 with multiple explanatory variables
We can easily extend Form 1 to multiple explanatory variables. Let’s say we

wanted to fit this model:
stack.lossi = α + β1 airi + β2 wateri + β3 acidi + ei (2.4)
With lm(), we can fit this with

fit1.mult = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
data = dat)
Written in matrix form (Form 1), this is
stack.loss1 1 air1 water1 acid1 α e1

      
stack.loss  1 air2 water2 acid2  β1  e2 
   
2
=   +   (2.5)
  
stack.loss3  1 air3 water3 acid3  β2  e3 

stack.loss4 1 air4 water4 acid4 β3 e4
Now Z is a matrix with 4 columns and x is a column vector with 4 rows. We

can show the Z matrix again directly from our lm() fit:
Z = model.matrix(fit1.mult)
Z
(Intercept) Air.Flow Water.Temp Acid.Conc.

1 1 80 27 89
2 1 80 27 88
3 1 75 25 90
4 1 62 24 87
attr(,"assign")
[1] 0 1 2 3
We can solve for x just like before and compare to what we get with lm():
Z = cbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
# or Z=model.matrix(fit2)
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
[1,] -524.904762
[2,] -1.047619
[3,] 7.619048
[4,] 5.000000
coef(fit1.mult)

-524.904762 -1.047619 7.619048 5.000000
Take a look at the Z we made in R. It looks exactly like what is in our model
written in matrix form (Equation (2.5)).
2.2.3 When does Form 1 arise?
This form of writing a regression model will come up when you work with
dynamic linear models (DLMs). With DLMs, you will be fitting models of
the form yt = Zt xt + et . In these models you have multiple y at regular time
points and you allow your regression parameters, the x, to evolve through
time as a random walk.
2.2.4 Matrix Form 1b: The transpose of Form 1
We could also write Form 1 as follows:

h i
stack.loss1 stack.loss2 stack.loss3 stack.loss4 =
1 1 1 1
 
h i  air air air air  h i (2.6)
1 2 3 4 
α β1 β2 β3   + e1 e2 e3 e4
wind1 wind2 wind3 wind4 

acid1 acid2 acid3 acid4

This is just the transpose of Form 1. Work through the matrix algebra to
make sure you understand why Equation (2.6) is Equation (2.1) for all the i
data points together and why it is equal to the transpose of Equation (2.2).
You’ll need the relationship (AB)> = B> A> .
Let’s write Equation (2.6) as y = Dd, where D contains our parameters.
Then we can solve for D following the steps in Section 2.2.1 but multiplying
from the right instead of from the left. Work through the steps to show that
d = yd> (dd> )−1 .
y = matrix(dat$stack.loss, nrow = 1)
d = rbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
y %*% t(d) %*% solve(d %*% t(d))
[,1] [,2] [,3] [,4]

[1,] -524.9048 -1.047619 7.619048 5
coef(fit1.mult)

-524.904762 -1.047619 7.619048 5.000000
2.3 Matrix Form 2
In this form, we have the explanatory variables in a matrix on the right of our
parameter matrix as in Form 1b but we arrange everything a little differently:
 
1
stack.loss1 α β 0 0 0  e1
     
air

1
stack.loss  α 0 β 0 0 e 
 
2   2
=  air2 
+ (2.7)
  
stack.loss3  α 0 0 β 0   e3 
   
air3 

stack.loss4 α 0 0 0 β e4
air4
Work through the matrix algebra to make sure you understand why Equation
(2.7) is the same as Equation (2.1) for all the i data points together.
We will write Form 2 succinctly as
y = Zx + e (2.8)
2.3.1 Form 2 with multiple explanatory variables
The x is a column vector of the explanatory variables. If we have more

explanatory variables, we add them to the column vector at the bottom. So
if we had air flow, water temperature and acid concentration as explanatory
variables, x looks like  
1
 air1 
 
 air2 
 
 
 air 
 3 
 air4 
 
 
water 
 1
water2  (2.9)
 
 
water3 
 
water 
 4
 acid1 
 
 
 acid 
 2 
 acid3 
 
acid4
Add columns to the Z matrix for each new variable.
α β1 0 0 0 β2 0 0 0 β3 0 0 0
 
α 0 β 0 0 0 β2 0 0 0 β3 0 0 
1
(2.10)
 
α 0 0 β1 0 0 0 β2 0 0 0 β3 0 
 
α 0 0 0 β1 0 0 0 β2 0 0 0 β3
The number of rows of Z is always n, the number of rows of y, because the
number of rows on the left and right of the equal sign must match. The
number of columns in Z is determined by the size of x. If there is an intercept,
there is a 1 in x. Then each explanatory variable (like air flow and wind)
appears n times. So if the number of explanatory variables is k, the number
of columns in Z is 1 + k × n if there is an intercept term and k × n if there is
not.
2.3.2 When does Form 2 arise?
Form 2 is similar to how multivariate time series models are typically written
for reading by humans (on a whiteboard or paper). In these models, we see
equations like this:
y1 βa βb " # e1
     
y 
 2
β
 a 0.1 x
 1
e 
2
  = + (2.11)
y3   βb βa  x2 e3 
  
t
y4 t 0 βa e4 t
In this case, yt is the set of 4 observations at time t and xt is the set of 2

explanatory variables at time t. The Z is showing how we are modeling the
effects of x1 and x2 on the ys. Notice that the effects are not consistent across
the x and y. This model would not be possible to fit with lm() but will be
easy to fit with MARSS().
2.3.3 Solving for the parameters for Form 2
You can just skim this section if you want but make sure you carefully look at
the code in
refsec-mlr-solveform2code. You will need to adapt that for the homework.
Though you will not need any of the math discussed here for the course, this
section will help you practice matrix multiplication and will introduce you to
‘permutation’ matrices which will be handy in many other contexts.
To solve for α and β, we need our parameters in a column matrix like so
[ αβ ]. We do this by rewritting Zx in Equation (2.8) in ‘vec’ form: if Z
is a n × m matrix and x is a matrix with 1 column and m rows, then
Zx = (x> ⊗ In ) vec(Z). The symbol ⊗ means Kronecker product and just
ignore it since you’ll never see it again in our course (or google ‘kronecker
product’ if you are curious). The “vec” of a matrix is that matrix rearranged
as a single column:
1
 
" #
1 2 3
vec = 
 
3 4 2
4
Notice how you just take each column one by one and stack them under each
other. In R, the vec is
A = matrix(1:6, nrow = 2, byrow = TRUE)
vecA = matrix(A, ncol = 1)
In is a n × n identity matrix, a diagonal matrix with all 0s on the off-diagonals

and all 1s on the diagonal. In R, this is simply diag(n).
To show how we solve for α and β, let’s use an example with only 3 data
points so Equation (2.7) becomes:
1
 
     
stack.loss1 α β 0 0  e1
 air1   

stack.loss2  = α 0 β 0    + e2  (2.12)
  
air2 
stack.loss3 α 0 0 β e3
air3
Using Zx = (x> ⊗ In ) vec(Z), this means

 
α
α
 
 
α
 
β 
 
 
1  0
 
  
α β 0 0  1 0 0  
 air1   0
 h i  

α 0 β 0 = 1 air 1 air 2 air 3 ⊗ 0 1

0   (2.13)
air2  0
   
α 0 0 β 0 0 1  
air3 β 
 
0
 
 
0
 
0
 
β
We need to rewrite the vec(Z) as a ‘permutation’ matrix times [ αβ ]:

   
α 1 0
α 1 0
   
   
α 1 0
   
β  0 1
  

   
0 0 0
   " #
0 0 0 α
 =
  
= Pp (2.14)
0 0 0 β

  
β  0 1
   
0 0 0
   
   
0 0 0
   
0 0 0
  

β 0 1
where P is the permutation matrix and p = [ αβ ]. Thus,

" #
α
y = Zx + e = (x> ⊗ In )P = Mp + e (2.15)
β
where M = (x> ⊗ In )P. We can solve for p, the parameters, using
(M> M)−1 M> y
as before.
2.3.4 Code to solve for parameters in Form 2
In the homework, you will use the R code in this section to solve for the
parameters in Form 2. Later when you are fitting multivariate time series
models, you will not solve for parameters this way but you will need to both
construct Z matrices in R and read Z matrices. The homework will give you
practice creating Z matrices in R.
#make your y and x matrices
y=matrix(dat$stack.loss, ncol=1)
x=matrix(c(1,dat$Air.Flow),ncol=1)
#make the Z matrix
n=nrow(dat) #number of rows in our data file
k=1
#Z has n rows and 1 col for intercept, and n cols for the n air data points
#a list matrix allows us to combine "characters" and numbers
Z=matrix(list(0),n,k*n+1)
Z[,1]="alpha"
diag(Z[1:n,1+1:n])="beta"
#this function creates that permutation matrix for you
P=MARSS:::convert.model.mat(Z)$free[,,1]
M=kronecker(t(x),diag(n))%*%P
solve(t(M)%*%M)%*%t(M)%*%y
[,1]
alpha -11.6159170
beta 0.6412918
coef(lm(dat$stack.loss ~ dat$Air.Flow))
(Intercept) dat$Air.Flow
-11.6159170 0.6412918
Go through this code line by line at the R command line. Look at Z. It
is a list matrix that allows you to combine numbers (the 0s) with charac-
ter string (names of parameters). Look at the permutation matrix P. Try
MARSS:::convert.model.mat(Z)$free and see that it returns a 3D matrix,
which is why the [„1] appears (to get us a 2D matrix). To use more data
points, you can redefine dat to say dat=stackloss to use all 21 data points.
2.4 Groups of intercepts
Let’s say that the odd numbered plants are in the north and the even numbered
are in the south. We want to include this as a factor in our model that affects
the intercept. Let’s go back to just having air flow be our explanatory variable.
Now if the plant is in the north our model is
stack.lossi = αn + βairi + ei , where ei ∼ N(0, σ 2 ) (2.16)
If the plant is in the south, our model is
stack.lossi = αs + βairi + ei , where ei ∼ N(0, σ 2 ) (2.17)
We’ll add north/south as a factor called ‘reg’ (region) to our dataframe:

dat = cbind(dat, reg = rep(c("n", "s"), n)[1:n])
dat
Air.Flow Water.Temp Acid.Conc. stack.loss reg

1 80 27 89 42 n
2 80 27 88 37 s
3 75 25 90 37 n
4 62 24 87 28 s
And we can easily fit this model with lm().
2.4. GROUPS OF INTERCEPTS 39
fit2 = lm(stack.loss ~ -1 + Air.Flow + reg, data = dat)

coef(fit2)
Air.Flow regn regs

0.5358166 -2.0257880 -5.5429799
The -1 is added to the lm() call to get rid of α. We just want the αn and αs
intercepts coming from our regions.
2.4.1 North/South intercepts in Form 1
Written in matrix form, Form 1 for this model is
stack.loss1 air1 1 0   e1
     
β
stack.loss 
2
air
2 0 1   e2 
 
=  αn  +   (2.18)
  
stack.loss3  air3 1 0 e3 

αs
stack.loss4 air4 0 1 e4
Notice that odd plants get αn and even plants get αs . Use model.matrix()
to see that this is the Z matrix that lm() formed. Notice the matrix output
by model.matrix() looks exactly like Z in Equation (2.18).
Z = model.matrix(fit2)
Z[1:4, ]
Air.Flow regn regs

1 80 1 0
2 80 0 1
3 75 1 0
4 62 0 1
We can solve for the parameters using x = (Z> Z)−1 Z> y as we did for Form 1
before by adding on the 1s and 0s columns we see in the Z matrix in Equation
(2.18). We could build this Z using the following R code:
Z = cbind(dat$Air.Flow, c(1, 0, 1, 0), c(0, 1, 0, 1))
colnames(Z) = c("beta", "regn", "regs")
Or just use model.matrix(). This will save time when models are more
complex.
Z[1:4, ]
Air.Flow regn regs

1 80 1 0
2 80 0 1
3 75 1 0
4 62 0 1
Now we can solve for the parameters:
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
Air.Flow 0.5358166
regn -2.0257880
regs -5.5429799
Compare to the output from lm() and you will see it is the same.
coef(fit2)
Air.Flow regn regs

0.5358166 -2.0257880 -5.5429799
2.4.2 North/South intercepts in Form 2
We would write this model in Form 2 as

 
1
stack.loss1 αn β 0 0 0  e1
     
air

1
stack.loss  α 0 β 0 0 e 
 
2  s   2
=  air2 
+  = Zx + e (2.19)
 
stack.loss3  αn 0 0 β 0   e3 
  
air3 

stack.loss4 αs 0 0 0 β e4
air4
To estimate the parameters, we need to be able to write a list matrix that

looks like Z in Equation (2.19). We can use the same code we used in Section
2.3.4 with Z changed to look like that in Equation (2.19).
2.5. GROUPS OF β’S 41
x = matrix(c(1, dat$Air.Flow), ncol = 1)
n = nrow(dat)
k = 1
# list matrix allows us to combine numbers and character
# strings
Z = matrix(list(0), n, k * n + 1)
Z[seq(1, n, 2), 1] = "alphanorth"
Z[seq(2, n, 2), 1] = "alphasouth"
diag(Z[1:n, 1 + 1:n]) = "beta"
P = MARSS:::convert.model.mat(Z)$free[, , 1]
M = kronecker(t(x), diag(n)) %*% P
solve(t(M) %*% M) %*% t(M) %*% y
[,1]
alphanorth -2.0257880
alphasouth -5.5429799
beta 0.5358166
Make sure you understand the code used to form the Z matrix. Also notice
that class(Z[1,3])="numeric" while class(Z[1,2])="character". This
is important. 0 in R is a number while "0" would be a character (the name
of a parameter).
2.5 Groups of β’s

Now let’s say that the plants have different owners, Sue and Aneesh, and we
want to have β for the air flow effect vary by owner. If the plant is in the
north and owned by Sue, the model is
stack.lossi = αn + βs airi + ei , where ei ∼ N(0, σ 2 ) (2.20)
If it is in the south and owned by Aneesh, the model is
stack.lossi = αs + βa airi + ei , where ei ∼ N(0, σ 2 ) (2.21)
You get the idea.

Now we need to add an operator variable as a factor in our stackloss dataframe.

Plants 1,3 are run by Sue and plants 2,4 are run by Aneesh.
dat = cbind(dat, owner = c("s", "a"))
dat
Air.Flow Water.Temp Acid.Conc. stack.loss reg owner

1 80 27 89 42 n s
2 80 27 88 37 s a
3 75 25 90 37 n s
4 62 24 87 28 s a
Since the operator names can be replicated the length of our data set, R fills
in the operator colmun by replicating our string of operator names to the
right length, conveniently (or alarmingly).
We can easily fit this model with lm() using the “:” notation.
coef(lm(stack.loss ~ -1 + Air.Flow:owner + reg, data = dat))
regn regs Air.Flow:ownera Air.Flow:owners

-38.0 -3.0 0.5 1.0
Notice that we have 4 datapoints and are estimating 4 parameters. We are
not going to be able to estimate any more parameters than data points. If
we want to estimate any more, we’ll need to use the fuller stackflow dataset
(which has 21 data points).
2.5.1 Owner β’s in Form 1
Written in Form 1, this model is

stack.loss1 1 0 0 air1 αn e1
      
stack.loss  0 1 air2 0   αs  e2 
   
2
=    +   = Zx + e (2.22)
  
stack.loss3  1 0 0 air3 βa e3 

   
stack.loss4 0 1 air4 0 βs e4
The air data have been written to the right of the 1s and 0s for north/south
intercepts because that is how lm() writes this model in Form 1 and I want
to duplicate that (for teaching purposes). Also the β’s are ordered to be
alphabetical because lm() writes the Z matrix like that.
2.5. GROUPS OF β’S 43
Now our model is more complicated and using model.matrix() to get our Z
saves us a lot tedious matrix building.
fit3 = lm(stack.loss ~ -1 + Air.Flow:owner + reg, data = dat)
Z[1:4, ]
regn regs Air.Flow:ownera Air.Flow:owners

1 1 0 0 80
2 0 1 80 0
3 1 0 0 75
4 0 1 62 0
Notice the matrix output by model.matrix() looks exactly like Z in Equation
(2.22) (ignore the attributes info). Now we can solve for the parameters:
solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
regn -38.0
regs -3.0
Air.Flow:ownera 0.5
Air.Flow:owners 1.0
Compare to the output from lm() and you will see it is the same.
2.5.2 Owner β’s in Form 2
To write this model in Form 2, we just add subscripts to the β’s in our Form
2 Z matrix:
 
1
stack.loss1 αn βs 0 0 0  e1
     
air1   

stack.loss 
2
α
 s 0 βa 0 0 
   e2 
=  air2 +  = Zx + e (2.23)

stack.loss3  αn 0 0 βs 0   e3 
  
air3 

stack.loss4 αs 0 0 0 βa e4
air4
To estimate the parameters, we change the β’s in our Z list matrix to have
owner designations:
x = matrix(c(1, dat$Air.Flow), ncol = 1)
n = nrow(dat)
k = 1
Z = matrix(list(0), n, k * n + 1)
Z[seq(1, n, 2), 1] = "alpha.n"
Z[seq(2, n, 2), 1] = "alpha.s"
diag(Z[1:n, 1 + 1:n]) = rep(c("beta.s", "beta.a"), n)[1:n]
P = MARSS:::convert.model.mat(Z)$free[, , 1]
M = kronecker(t(x), diag(n)) %*% P
solve(t(M) %*% M) %*% t(M) %*% y
[,1]
alpha.n -38.0
alpha.s -3.0
beta.s 1.0
beta.a 0.5
The parameters estimates are the same, though β’s are given in reversed
order simply due to the way convert.model.mat() is ordering the columns
in Form 2’s Z.
2.6 Seasonal effect as a factor

Let’s imagine that the data were taken consecutively in time by quarter. We
want to model the seasonal effect as an intercept change. We will drop all
other effects for now. If the data were collected in quarter 1, the model is
stack.lossi = α1 + ei , where ei ∼ N(0, σ 2 ) (2.24)
If collected in quarter 2, the model is
stack.lossi = α2 + ei , where ei ∼ N(0, σ 2 ) (2.25)
etc.
We add a column to our dataframe to account for season:
2.6. SEASONAL EFFECT AS A FACTOR 45
dat = cbind(dat, qtr = paste(rep("qtr", n), 1:4, sep = ""))

dat
Air.Flow Water.Temp Acid.Conc. stack.loss reg owner qtr

1 80 27 89 42 n s qtr1
2 80 27 88 37 s a qtr2
3 75 25 90 37 n s qtr3
4 62 24 87 28 s a qtr4
And we can easily fit this model with lm().
coef(lm(stack.loss ~ -1 + qtr, data = dat))
qtrqtr1 qtrqtr2 qtrqtr3 qtrqtr4

42 37 37 28
The -1 is added to the lm() call to get rid of α. We just want the α1 , α2 , etc.
intercepts coming from our quarters.
For comparison look at
coef(lm(stack.loss ~ qtr, data = dat))
(Intercept) qtrqtr2 qtrqtr3 qtrqtr4

42 -5 -5 -14
Why does it look like that when -1 is missing from the lm() call? Where
did the intercept for quarter 1 go and why are the other intercepts so much
smaller?
2.6.1 Seasonal intercepts written in Form 1
Remembering that lm() puts models in Form 1, look at the Z matrix for
Form 1:
fit4 = lm(stack.loss ~ -1 + qtr, data = dat)
Z[1:4, ]
qtrqtr1 qtrqtr2 qtrqtr3 qtrqtr4

1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
Written in Form 1, this model is
stack.loss1 1 0 0 0 α1 e1
      
stack.loss  0 1 0 0 α2  e2 
   
2
=    +   = Zx + e (2.26)
  
stack.loss3  0 0 1 0 α3  e3 

stack.loss4 0 0 0 1 α4 e4
Compare to the model that lm() is using when the intercept included. What
does this model look like written in matrix form?
fit5 = lm(stack.loss ~ qtr, data = dat)
Z[1:4, ]
(Intercept) qtrqtr2 qtrqtr3 qtrqtr4

1 1 0 0 0
2 1 1 0 0
3 1 0 1 0
4 1 0 0 1
2.6.2 Seasonal intercepts written in Form 2
We do not need to add 1s and 0s to our Z matrix in Form 2; we just add

subscripts to our intercepts like we did when we had north-south intercepts.
In this model, we do not have any explanatory variables (except intercept) so
our x is just a 1 × 1 matrix:
stack.loss1 α1 e1
     
stack.loss  α  h i e 
2  2  2
 =   1 +   = Zx + e (2.27)

stack.loss3  α3  e3 

stack.loss4 α4 e4
2.7. SEASONAL EFFECT PLUS OTHER EXPLANATORY VARIABLES*47
2.7 Seasonal effect plus other explanatory

variables*
With our 4 data points, we are limited to estimating 4 parameters. Let’s use
the full 21 data points so we can estimate some more complex models. We’ll
add an owner variable and a quarter variable to the stackloss dataset.
data(stackloss, package = "datasets")
fulldat = stackloss
n = nrow(fulldat)
fulldat = cbind(fulldat, owner = rep(c("sue", "aneesh", "joe"),
n)[1:n], qtr = paste("qtr", rep(1:4, n)[1:n], sep = ""),
reg = rep(c("n", "s"), n)[1:n])
Let’s fit a model where there is only an effect of air flow, but that effect varies
by owner and by quarter. We also want a different intercept for each quarter.
So if datapoint i is from quarter j on a plant owned by owner k, the model is
stack.lossi = αj + βj,k airi + ei (2.28)
So there there are 4 × 3 β’s (4 quarters and 3 owners) and 4 α’s (4 quarters).
With lm(), we fit the model as:
fit7 = lm(stack.loss ~ -1 + qtr + Air.Flow:qtr:owner, data = fulldat)
Take a look at Z for Form 1 using model.matrix(Z). It’s not shown since it
is large:
model.matrix(fit7)
The x will be  
α1
 α2 
 
 
 α3 
 
α 
 4
β1,a 
 
 
β 
 2,a 
β3,a 
 
...
Take a look at the model matrix that lm() is using and make sure you
understand how Zx produces Equation (2.28).
For Form 2, our Z size doesn’t change; number of rows is n (the number
data points) and number of columns is 1 (for intercept) plus the number of
explanatory variables times n. So in this case, we only have one explanatory
variable (air flow) so Z has 1+21 columns. To allow the intercept to vary by
quarter, we use α1 in the rows of Z where the data is from quarter 1, use α2
where the data is from quarter 2, etc. Similarly we use the appropriate βj,k
depending on the quarter and owner for that data point.
We could construct Z, x and y for Form 2 using
y=matrix(fulldat$stack.loss, ncol=1)
x=matrix(c(1,fulldat$Air.Flow),ncol=1)
n=nrow(fulldat)
k=1
Z=matrix(list(0),n,k*n+1)
#give the intercepts names based on qtr
Z[,1]=paste(fulldat$qtr)
#give the betas names based on qtr and owner
diag(Z[1:n,1+1:n])=paste("beta",fulldat$qtr,fulldat$owner,sep=".")
P=MARSS:::convert.model.mat(Z)$free[,,1]
M=kronecker(t(x),diag(n))%*%P
solve(t(M)%*%M)%*%t(M)%*%y
Note, the estimates are the same as for lm() but are not listed in the same
order.
Make sure to look at the Z and x for the models and that you understand
why they look like they do.
2.8 Models with confounded parameters*
Try adding region as another factor in your model along with quarter and fit
with lm():
2.8. MODELS WITH CONFOUNDED PARAMETERS* 49
coef(lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat))
Air.Flow regn regs qtrqtr2 qtrqtr3 qtrqtr4

1.066524 -49.024320 -44.831760 -3.066094 3.499428 NA
The estimate for quarter 1 is gone (actually it was set to 0) and the estimate
for quarter 4 is NA. Look at the Z matrix for Form 1 and see if you can figure
out the problem. Try also writing out the model for the 1st plant and you’ll
see what part of the problem is and why the estimate for quarter 1 is fixed at
0.
fit = lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat)
Z = model.matrix(fit)
But why is the estimate for quarter 4 equal to NA? What if the ordering of
north and south regions was different, say 1 through 4 north, 5 through 8
south, 9 through 12 north, etc?
fulldat2 = fulldat
fulldat2$reg2 = rep(c("n", "n", "n", "n", "s", "s", "s", "s"),
3)[1:21]
fit = lm(stack.loss ~ Air.Flow + reg2 + qtr, data = fulldat2)
coef(fit)
(Intercept) Air.Flow reg2s qtrqtr2 qtrqtr3 qtrqtr4

-45.6158421 1.0407975 -3.5754722 0.7329027 3.0389763 3.6960928
Now an estimate for quarter 4 appears.
The problem is two-fold. First by having both region and quarter intercepts,
we created models where 2 intercepts appear for one i model and we cannot
estimate both. lm() helps us out by setting one of the factor effects to 0. It
will chose the first alphabetically. But as we saw with the model where odd
numbered plants were north and even numbered were south, we can still have
a situation where one of the intercepts is non-identifiable. lm() helps us out
by alerting us to the problem by setting one to NA.
Once you start developing your own models, you will need to make sure that
all your parameters are identifiable. If they are not, your code will simply
‘chase its tail’. The code will generally take forever to converge or if you did
not try different starting conditions, it may look like it converged but actually
the estimates for the confounded parameters are meaningless. So you will
need to think carefully about the model you are fitting and consider if there
are multiple parameters measuring the same thing (for example 2 intercept
parameters).
2.9. PROBLEMS 51
2.9 Problems
For the homework questions, we will using part of the airquality data set
in R. Load that as
data(airquality, package="datasets")
#remove any rows with NAs omitted.
airquality=na.omit(airquality)
#make Month a factor (i.e., the Month number is a name rather than a number)
airquality$Month=as.factor(airquality$Month)
#add a region factor
airquality$region = rep(c("north","south"),60)[1:111]
#Only use 5 data points for the homework so you can show the matrices easily
homeworkdat = airquality[1:5,]
1. Using Form 1 y = Zx + e, write out the model, showing the Z and x

matrices, being fit by this command
fit = lm(Ozone ~ Wind + Temp, data = homeworkdat)
2. For the above model, write out the following R code.

a. Create the y and Z matrices in R.
b. Solve for x (the parameters). Show that they match what you get
from the first lm() call.
3. Add -1 to your lm() call in question 1:
fit = lm(Ozone ~ -1 + Wind + Temp, data = homeworkdat)
a. What changes in your model?

b. Write out the in Form 1 as an equation. Show the new Z and x
matrices.
c. Solve for the parameters (x) and show they match what is returned
by lm().
4. For the model for question 1,
a. Write in Form 2 as an equation.
b. Adapt the code from subsection 2.3.4 and construct new Z, y and
x in R code.
c. Solve for the parameters using the code from subsection 2.3.4.
5. A model of the ozone data with only a region (north/south) effect can
be written:
fit = lm(Ozone ~ -1 + region, data = homeworkdat)
a. Write this model in Form 1 as an equation.

b. Solve for the parameter values and show that they match what
you get from the lm() call.
6. Using the same model from question 5,
a. Write the model in Form 2 as an equation.
b. Write out the Z and x in R code.
c. Solve for the parameter values and show that they match what
you get from the lm() call. To do this, you adapt the code from
subsection 2.3.4.
7. Write the model below in Form 2 as an equation. Show the Z, y and x
matrices.
fit = lm(Ozone ~ Temp:region, data = homeworkdat)
8. Using the airquality dataset with 111 data points

a. Write the model below in Form 2.
fit = lm(Ozone ~ -1 + Temp:region + Month, data = airquality)
b. Solve for the parameters by adapting code from subsection 2.3.4.

Chapter 3
Introduction to time series
At a very basic level, a time series is a set of observations taken sequentially

in time. It is different than non-temporal data because each data point has
an order and is, typically, related to the data points before and after by some
process.
Rmd for this chapter can be downloaded here.
3.1 Examples of time series
data(WWWusage, package = "datasets")

par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(WWWusage, ylab = "", las = 1, col = "blue", lwd = 2)
data(lynx, package = "datasets")

par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(lynx, ylab = "", las = 1, col = "blue", lwd = 2)
53
54 CHAPTER 3. INTRODUCTION TO TIME SERIES
200
150
100
0 20 40 60 80 100
Time
Figure 3.1: Number of users connected to the internet
7000
6000
5000
4000
3000
2000
1000
1820 1840 1860 1880 1900 1920
Time
Figure 3.2: Number of lynx trapped in Canada from 1821-1934

3.2. CLASSIFICATION OF TIME SERIES 55
3.2 Classification of time series
A ts can be represented as a set
{x1 , x2 , x3 , . . . , xn }
For example,
{10, 31, 27, 42, 53, 15}
It can be further classified.
3.2.1 By some index set
Interval across real time; x(t)

• begin/end: t ∈ [1.1, 2.5]
Discrete time; xt
• Equally spaced: t = {1, 2, 3, 4, 5}
• Equally spaced w/ missing value: t = {1, 2, 4, 5, 6}
• Unequally spaced: t = {2, 3, 4, 6, 9}
3.2.2 By the underlying process
Discrete (eg, total # of fish caught per trawl)

Continuous (eg, salinity, temperature)
3.2.3 By the number of values recorded
Univariate/scalar (eg, total # of fish caught)

Multivariate/vector (eg, # of each spp of fish caught)
3.2.4 By the type of values recorded
Integer (eg, # of fish in 5 min trawl = 2413)

Rational (eg, fraction of unclipped fish = 47/951)
Real (eg, fish mass = 10.2 g)
Complex (eg, cos(2 π 2.43) + i sin(2 π 2.43))
3.3 Statistical analyses of time series

Most statistical analyses are concerned with estimating properties of a popu-
lation from a sample. For example, we use fish caught in a seine to infer the
mean size of fish in a lake. Time series analysis, however, presents a different
situation:
• Although we could vary the length of an observed time series, it is often
impossible to make multiple observations at a given point in time
For example, one can’t observe today’s closing price of Microsoft stock more
than once. Thus, conventional statistical procedures, based on large sample
estimates, are inappropriate.
MSFT 2016−01−04 / 2016−09−30
58 58
56 56
54 54
52 52
50 50
Jan 04 2016 Mar 01 2016 May 02 2016 Jul 01 2016 Sep 01 2016
3.4. WHAT IS A TIME SERIES MODEL? 57
3.4 What is a time series model?
We use a time series model to analyze time series data. A time series model
for {xt } is a specification of the joint distributions of a sequence of random
variables {Xt }, of which {xt } is thought to be a realization.
Here is a plot of many realizations from a time series model.
1
Xt
−1
−2
−3
0 10 20 30 40
Time
Figure 3.3: Distribution of realizations
These lines represent the distribution of possible realizations. However, we

have only one realization. The time series model allows us to use the one
realization we have to make inferences about the underlying joint distribution
from whence our realization came.
3.5 Two simple and classic time series models
White noise: xt ∼ N (0, 1)

1
Xt
−1
−2
−3
0 10 20 30 40
Time
Figure 3.4: Blue line is our one realization.
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))

matplot(ww, type = "l", lty = "solid", las = 1, ylab = expression(italic(x[t])
xlab = "Time", col = gray(0.5, 0.4))
1
xt
−1
−2
−3
0 10 20 30 40
Time
3.6. CLASSICAL DECOMPOSITION 59
Random walk: xt = xt−1 + wt , with wt ∼ N (0, 1)

par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
matplot(apply(ww, 2, cumsum), type = "l", lty = "solid", las = 1,
ylab = expression(italic(x[t])), xlab = "Time", col = gray(0.5,
0.4))
10
5
xt
−5
−10
0 10 20 30 40
Time
3.6 Classical decomposition
Model time series {xt } as a combination of
1. trend (mt )
2. seasonal component (st )
3. remainder (et )
x t = m t + st + e t
3.6.1 1. The trend (mt )
We need a way to extract the so-called signal. One common method is via
“linear filters”
∞
X
mt = λi xt+1
i=−∞
For example, a moving average
a
X 1
mt = xt+i
i=−a 2a + 1
If a = 1, then
1
mt = (xt−1 + xt + xt+1 )
3
3.6.2 Example of linear filtering
Here is a time series.

A linear filter with a = 3 closely tracks the data.
As we increase the length of data that is averaged from 1 on each side (a = 3)
to 4 on each side (a = 9), the trend line is smoother.
When we increase up to 13 points on each side (a = 27), the trend line is very
smooth.
3.6.3 2. Seasonal effect (st )
Once we have an estimate of the trend mt , we can estimate st simply by

subtraction:
st = xt − mt
600
500
400
300
200
100
1950 1952 1954 1956 1958 1960
Time
Figure 3.5: Monthly airline passengers from 1949-1960
600 λ = 1/3
500
400
300
200
100
1950 1952 1954 1956 1958 1960
Time
Figure 3.6: Monthly airline passengers from 1949-1960 with a low filter.
600 λ = 1/3
λ = 1/9
500
400
300
200
100
1950 1952 1954 1956 1958 1960
Time
Figure 3.7: Monthly airline passengers from 1949-1960 with a medium filter.
600 λ = 1/3
λ = 1/9
500
λ = 1/27
400
300
200
100
1950 1952 1954 1956 1958 1960
Time
Figure 3.8: Monthly airline passengers from 1949-1960 with a high filter.
50
−50
−100
1950 1952 1954 1956 1958 1960
Time
This is the seasonal effect (st ), assuming λ = 1/9, but, st includes the
remainder et as well. Instead we can estimate the mean seasonal effect (st ).
seas_2 <- decompose(xx)$seasonal
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(seas_2, las = 1, ylab = "")
3.6.4 3. Remainder (et )
Now we can estimate et via subtraction:
e t = xt − m t − s t
ee <- decompose(xx)$random
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(ee, las = 1, ylab = "")
60
40
20
−20
−40
1950 1952 1954 1956 1958 1960
Time
Figure 3.9: Mean seasonal effect.
60
40
20
−20
−40
1950 1952 1954 1956 1958 1960
Time
Figure 3.10: Errors.

3.7. DECOMPOSITION ON LOG-TRANSFORMED DATA 65
3.7 Decomposition on log-transformed data
Let’s repeat the decomposition with the log of the airline data.
lx <- log(AirPassengers)
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(lx, las = 1, ylab = "")
6.5
6.0
5.5
5.0
1950 1952 1954 1956 1958 1960
Time
Figure 3.11: Log monthly airline passengers from 1949-1960

3.7.1 The trend (mt )
6.5
6.0
5.5
5.0
1950 1952 1954 1956 1958 1960
Time
3.7.2 Seasonal effect (st ) with error (et )

0.3
0.2
0.1
lx − pp
0.0
−0.1
−0.2
−0.3
1950 1952 1954 1956 1958 1960
Time
3.7. DECOMPOSITION ON LOG-TRANSFORMED DATA 67
3.7.3 Mean seasonal effect (st )
0.2
0.1
0.0
−0.1
1950 1952 1954 1956 1958 1960
Time
3.7.4 Remainder (et )
le <- lx - pp - seas_2
par(mai = c(0.9, 0.9, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot.ts(le, las = 1, ylab = "")
0.15
0.10
0.05
0.00
−0.05
−0.10
−0.15
1950 1952 1954 1956 1958 1960
Time
Chapter 4
Basic time series functions in R
This chapter introduces you to some of the basic functions in R for plotting
and analyzing univariate time series data. Many of the things you learn here
will be relevant when we start examining multivariate time series as well. We
will begin with the creation and plotting of time series objects in R, and then
moves on to decomposition, differencing, and correlation (e.g., ACF, PACF)
before ending with fitting and simulation of ARMA models.
Data and packages
This chapter uses the stats package, which is often loaded by default when
you start R, the MARSS package and the forecast package. The problems
use a dataset in the datasets package. After installing the packages, if
needed, load:
library(stats)
library(MARSS)
library(forecast)
library(datasets)
The chapter uses data sets which are in the atsalibrary package. If needed,
install using the devtools package.
69
70 CHAPTER 4. BASIC TS FUNCTIONS IN R
library(devtools)
The main one is a time series of the atmospheric concentration of CO2 collected
at the Mauna Loa Observatory in Hawai’i (MLCO2). The second is Northern
Hemisphere land and ocean temperature anomalies from NOAA. (NHTemp).
The problems use a data set on hourly phytoplankton counts (hourlyphyto).
Use ?MLCO2, ?NHTemp and ?hourlyphyto for information on these datasets.
Load the data.
data(NHTemp, package = "atsalibrary")
Temp <- NHTemp
data(MLCO2, package = "atsalibrary")
CO2 <- MLCO2
data(hourlyphyto, package = "atsalibrary")
pDat <- hourlyphyto
4.1 Time series plots
Time series plots are an excellent way to begin the process of understanding
what sort of process might have generated the data of interest. Traditionally,
time series have been plotted with the observed data on the y-axis and time on
the x-axis. Sequential time points are usually connected with some form of line,
but sometimes other plot forms can be a useful way of conveying important
information in the time series (e.g., barplots of sea-surface temperature
anomolies show nicely the contrasting El Niño and La Niña phenomena).
4.1.1 ts objects and plot.ts()
The CO2 data are stored in R as a data.frame object, but we would like
to transform the class to a more user-friendly format for dealing with time
series. Fortunately, the ts() function will do just that, and return an object
of class ts as well. In addition to the data themselves, we need to provide
ts() with 2 pieces of information about the time index for the data.
4.1. TIME SERIES PLOTS 71
The first, frequency, is a bit of a misnomer because it does not really refer
to the number of cycles per unit time, but rather the number of observa-
tions/samples per cycle. So, for example, if the data were collected each hour
of a day then frequency=24.
The second, start, specifies the first sample in terms of (day, hour), (year,
month), etc. So, for example, if the data were collected monthly beginning
in November of 1969, then frequency=12 and start=c(1969,11). If the
data were collected annually, then you simply specify start as a scalar (e.g.,
start=1991) and omit frequency (i.e., R will set frequency=1 by default).
The Mauna Loa time series is collected monthly and begins in March of 1958,
which we can get from the data themselves, and then pass to ts().
## create a time series (ts) object from the CO2 data
co2 <- ts(data = CO2$ppm, frequency = 12, start = c(CO2[1, "year"],
CO2[1, "month"]))
Now let’s plot the data using plot.ts(), which is designed specifically for ts
objects like the one we just created above. It’s nice because we don’t need to
specify any x-values as they are taken directly from the ts object.
## plot the ts
plot.ts(co2, ylab = expression(paste("CO"[2], " (ppm)")))
400
CO2 (ppm)
360
320
1960 1970 1980 1990 2000 2010
Time
Figure 4.1: Time series of the atmospheric CO2 concentration at Mauna Loa,
Hawai’i measured monthly from March 1958 to present.
Examination of the plotted time series (Figure 4.1) shows 2 obvious features
that would violate any assumption of stationarity: 1) an increasing (and

perhaps non-linear) trend over time, and 2) strong seasonal patterns. (Aside:
Do you know the causes of these 2 phenomena?)
4.1.2 Combining and plotting multiple ts objects
Before we examine the CO2 data further, however, let’s see a quick example
of how you can combine and plot multiple time series together. We’ll use the
data on monthly mean temperature anomolies for the Northern Hemisphere
(Temp). First convert Temp to a ts object.
temp.ts <- ts(data = Temp$Value, frequency = 12, start = c(1880,
1))
Before we can plot the two time series together, however, we need to line up
their time indices because the temperature data start in January of 1880,
but the CO2 data start in March of 1958. Fortunately, the ts.intersect()
function makes this really easy once the data have been transformed to ts
objects by trimming the data to a common time frame. Also, ts.union()
works in a similar fashion, but it pads one or both series with the appropriate
number of NA’s. Let’s try both.
## intersection (only overlapping times)
datI <- ts.intersect(co2, temp.ts)
## dimensions of common-time data
dim(datI)
[1] 682 2
## union (all times)
datU <- ts.union(co2, temp.ts)
## dimensions of all-time data
dim(datU)
[1] 1647 2
As you can see, the intersection of the two data sets is much smaller than
the union. If you compare them, you will see that the first 938 rows of datU
contains NA in the co2 column.
4.2. DECOMPOSITION OF TIME SERIES 73
It turns out that the regular plot() function in R is smart enough to recognize
a ts object and use the information contained therein appropriately. Here’s
how to plot the intersection of the two time series together with the y-axes
on alternate sides (results are shown in Figure 4.2):
## plot the ts
plot(datI, main = "", yax.flip = TRUE)
400
360
co2
320
temp.ts
0.5
−0.5
1960 1970 1980 1990 2000 2010
Time
Figure 4.2: Time series of the atmospheric CO2 concentration at Mauna Loa,
Hawai’i (top) and the mean temperature index for the Northern Hemisphere
(bottom) measured monthly from March 1958 to present.
4.2 Decomposition of time series
Plotting time series data is an important first step in analyzing their various
components. Beyond that, however, we need a more formal means for
identifying and removing characteristics such as a trend or seasonal variation.
As discussed in lecture, the decomposition model reduces a time series into 3
components: trend, seasonal effects, and random errors. In turn, we aim to

model the random errors as some form of stationary process.
Let’s begin with a simple, additive decomposition model for a time series xt
xt = mt + st + et , (4.1)
where, at time t, mt is the trend, st is the seasonal effect, and et is a random

error that we generally assume to have zero-mean and to be correlated over
time. Thus, by estimating and subtracting both {mt } and {st } from {xt }, we
hope to have a time series of stationary residuals {et }.
4.2.1 Estimating trends
In lecture we discussed how linear filters are a common way to estimate trends
in time series. One of the most common linear filters is the moving average,
which for time lags from −a to a is defined as
a
1
X
m̂t = xt+k . (4.2)
k=−a 1 + 2a
This model works well for moving windows of odd-numbered lengths, but
should be adjusted for even-numbered lengths by adding only 12 of the 2 most
extreme lags so that the filtered value at time t lines up with the original
observation at time t. So, for example, in a case with monthly data such as
the atmospheric CO2 concentration where a 12-point moving average would
be an obvious choice, the linear filter would be
1
x
2 t−6
+ xt−5 + · · · + xt−1 + xt + xt+1 + · · · + xt+5 + 12 xt+6
m̂t = (4.3)
12
It is important to note here that our time series of the estimated trend {m̂t }
is actually shorter than the observed time series by 2a units.
Conveniently, R has the built-in function filter() for estimating moving-
average (and other) linear filters. In addition to specifying the time series to
be filtered, we need to pass in the filter weights (and 2 other arguments we

won’t worry about here–type ?filter to get more information). The easiest
way to create the filter is with the rep() function:
## weights for moving avg
fltr <- c(1/2, rep(1, times = 11), 1/2)/12
Now let’s get our estimate of the trend {m̂} with filter()} and plot it:
## estimate of trend
co2.trend <- filter(co2, filter = fltr, method = "convo", sides = 2)
## plot the trend
plot.ts(co2.trend, ylab = "Trend", cex = 1)
The trend is a more-or-less smoothly increasing function over time, the average
slope of which does indeed appear to be increasing over time as well (Figure
4.3).
400
Trend
360
320
1960 1970 1980 1990 2000 2010
Time
Figure 4.3: Time series of the estimated trend {m̂t } for the atmospheric CO2
concentration at Mauna Loa, Hawai’i.
4.2.2 Estimating seasonal effects
Once we have an estimate of the trend for time t (m̂t ) we can easily obtain
an estimate of the seasonal effect at time t (ŝt ) by subtraction
ŝt = xt − m̂t , (4.4)

which is really easy to do in R:

## seasonal effect over time
co2.1T <- co2 - co2.trend
This estimate of the seasonal effect for each time t also contains the random
error et , however, which can be seen by plotting the time series and careful
comparison of Equations (4.1) and (4.4).
## plot the monthly seasonal effects
plot.ts(co2.1T, ylab = "Seasonal effect", xlab = "Month", cex = 1)
4
Seasonal effect plus errors
2
0
−2
−4
1960 1970 1980 1990 2000 2010
Month
Figure 4.4: Time series of seasonal effects plus random errors for the atmo-
spheric CO2 concentration at Mauna Loa, Hawai’i, measured monthly from
March 1958 to present.
We can obtain the overall seasonal effect by averaging the estimates of {ŝt }
for each month and repeating this sequence over all years.
## length of ts
ll <- length(co2.1T)
## frequency (ie, 12)
ff <- frequency(co2.1T)
## number of periods (years); %/% is integer division
periods <- ll%/%ff
## index of cumulative month
index <- seq(1, ll, by = ff) - 1
## get mean by month
mm <- numeric(ff)
for (i in 1:ff) {
mm[i] <- mean(co2.1T[index + i], na.rm = TRUE)
}
## subtract mean to make overall mean=0
mm <- mm - mean(mm)
Before we create the entire time series of seasonal effects, let’s plot them for
each month to see what is happening within a year:
## plot the monthly seasonal effects
plot.ts(mm, ylab = "Seasonal effect", xlab = "Month", cex = 1)
It looks like, on average, that the CO2 concentration is highest in spring

(March) and lowest in summer (August) (Figure 4.5). (Aside: Do you know
why this is?)
3
2
Seasonal effect
1
−1 0
−3
2 4 6 8 10 12
Month
Figure 4.5: Estimated monthly seasonal effects for the atmospheric CO2
concentration at Mauna Loa, Hawai’i.
Finally, let’s create the entire time series of seasonal effects {ŝt }:
## create ts object for season
co2.seas <- ts(rep(mm, periods + 1)[seq(ll)], start = start(co2.1T),
frequency = ff)
4.2.3 Completing the model
The last step in completing our full decomposition model is obtaining the
random errors {êt }, which we can get via simple subtraction
êt = xt − m̂t − ŝt . (4.5)
Again, this is really easy in R:

## random errors over time
co2.err <- co2 - co2.trend - co2.seas
Now that we have all 3 of our model components, let’s plot them together
with the observed data {xt }. The results are shown in Figure 4.6.
## plot the obs ts, trend & seasonal effect
plot(cbind(co2, co2.trend, co2.seas, co2.err), main = "", yax.flip = TRUE)
4.2.4 Using decompose() for decomposition
Now that we have seen how to estimate and plot the various components of a
classical decomposition model in a piecewise manner, let’s see how to do this
in one step in R with the function decompose(), which accepts a ts object
as input and returns an object of class decomposed.ts.
## decomposition of CO2 data
co2.decomp <- decompose(co2)
co2.decomp is a list with the following elements, which should be familiar by

now:
• ```x``` the observed time series $\{x_t\}$
• ```seasonal``` time series of estimated seasonal component $\{\hat{s}_t
• ```figure``` mean seasonal effect (```length(figure) == frequency(x)```
• ```trend``` time series of estimated trend $\{\hat{m}_t\}$
• ```random``` time series of random errors $\{\hat{e}_t\}$
400
co2
360
320
400
co2.trend
360
320
1 2 3
co2.seas
−1
−3
1.0
co2.err
0.0
−1.0
1960 1970 1980 1990 2000 2010
Time
Figure 4.6: Time series of the observed atmospheric CO2 concentration at

Mauna Loa, Hawai’i (top) along with the estimated trend, seasonal effects,
and random errors.
• ```type``` type of error (```"additive"``` or ```"multiplicative"```)

We can easily make plots of the output and compare them to those in Figure
4.6:
## plot the obs ts, trend & seasonal effect
plot(co2.decomp, yax.flip = TRUE)
Decomposition of additive time series

400
observed
360
320
400
trend
360
320
1 2 3
seasonal
−1
−3
1.0
random
0.0
−1.0
1960 1970 1980 1990 2000 2010
Time
Figure 4.7: Time series of the observed atmospheric CO2 concentration at

Mauna Loa, Hawai’i (top) along with the estimated trend, seasonal effects,
and random errors obtained with the function decompose().
The results obtained with decompose() (Figure 4.7) are identical to those
we estimated previously.
Another nice feature of the decompose() function is that it can be used for
decomposition models with multiplicative (i.e., non-additive) errors (e.g., if
the original time series had a seasonal amplitude that increased with time).
4.3. DIFFERENCING TO REMOVE A TREND OR SEASONAL EFFECTS81
To do, so pass in the argument type="multiplicative", which is set to

type="additive" by default.
4.3 Differencing to remove a trend or sea-

sonal effects
An alternative to decomposition for removing trends is differencing. We saw

in lecture how the difference operator works and how it can be used to remove
linear and nonlinear trends as well as various seasonal features that might be
evident in the data. As a reminder, we define the difference operator as
∇xt = xt − xt−1 , (4.6)
and, more generally, for order d
∇d xt = (1 − B)d xt , (4.7)
where B is the backshift operator (i.e., Bk xt = xt−k for k ≥ 1).
So, for example, a random walk is one of the most simple and widely used
time series models, but it is not stationary. We can write a random walk
model as
xt = xt−1 + wt , with wt ∼ N(0, q). (4.8)
Applying the difference operator to Equation (4.8) will yield a time series of
Gaussian white noise errors {wt }:
∇(xt = xt−1 + wt )
xt − xt−1 = xt−1 − xt−1 + wt (4.9)
xt − xt−1 = wt
4.3.1 Using the diff() function
In R we can use the diff() function for differencing a time series, which
requires 3 arguments: x (the data), lag (the lag at which to difference), and
differences (the order of differencing; d in Equation (4.7)). For example,
first-differencing a time series will remove a linear trend (i.e., differences=1);
twice-differencing will remove a quadratic trend (i.e., differences=2). In
addition, first-differencing a time series at a lag equal to the period will
remove a seasonal trend (e.g., set lag=12 for monthly data).
Let’s use diff() to remove the trend and seasonal signal from the CO2 time
series, beginning with the trend. Close inspection of Figure 4.1 would suggest
that there is a nonlinear increase in CO2 concentration over time, so we’ll set
differences=2):
## twice-difference the CO2 data
co2.D2 <- diff(co2, differences = 2)
## plot the differenced data
plot(co2.D2, ylab = expression(paste(nabla^2, "CO"[2])))
2
1
∇2CO2
0
−1
−2
1960 1970 1980 1990 2000 2010
Time
Figure 4.8: Time series of the twice-differenced atmospheric CO2 concentration

at Mauna Loa, Hawai’i.
We were apparently successful in removing the trend, but the seasonal effect
still appears obvious (Figure 4.8). Therefore, let’s go ahead and difference
that series at lag-12 because our data were collected monthly.
4.4. CORRELATION WITHIN AND AMONG TIME SERIES 83
## difference the differenced CO2 data

co2.D2D12 <- diff(co2.D2, lag = 12)
## plot the newly differenced data
plot(co2.D2D12, ylab = expression(paste(nabla, "(", nabla^2,
"CO"[2], ")")))
2
1
∇(∇2CO2)
0
−1
−2
1960 1970 1980 1990 2000 2010
Time
Figure 4.9: Time series of the lag-12 difference of the twice-differenced

atmospheric CO2 concentration at Mauna Loa, Hawai’i.
Now we have a time series that appears to be random errors without any
obvious trend or seasonal components (Figure 4.9).
4.4 Correlation within and among time series

The concepts of covariance and correlation are very important in time series
analysis. In particular, we can examine the correlation structure of the original
data or random errors from a decomposition model to help us identify possible
form(s) of (non)stationary model(s) for the stochastic process.
4.4.1 Autocorrelation function (ACF)
Autocorrelation is the correlation of a variable with itself at differing time

lags. Recall from lecture that we defined the sample autocovariance function
(ACVF), ck , for some lag k as
1 n−k
X
ck = (xt − x̄) (xt+k − x̄) (4.10)
n t=1
Note that the sample autocovariance of {xt } at lag 0, c0 , equals the sample
variance of {xt } calculated with a denominator of n. The sample autocorrela-
tion function (ACF) is defined as
ck
rk = = Cor(xt , xt+k ) (4.11)
c0
Recall also that an approximate 95% confidence interval on the ACF can be
estimated by
1 2
− ±√ (4.12)
n n
where n is the number of data points used in the calculation of the ACF.
It is important to remember two things here. First, although the confidence
interval is commonly plotted and interpreted as a horizontal line over all time
lags, the interval itself actually grows as the lag increases because the number
of data points n used to estimate the correlation decreases by 1 for every
integer increase in lag. Second, care must be exercised when interpreting the
“significance” of the correlation at various lags because we should expect, a
priori, that approximately 1 out of every 20 correlations will be significant
based on chance alone.
We can use the acf() function in R to compute the sample ACF (note
that adding the option type="covariance" will return the sample auto-
covariance (ACVF) instead of the ACF–type ?acf for details). Calling the
function by itself will will automatically produce a correlogram (i.e., a plot
of the autocorrelation versus time lag). The argument lag.max allows you to
set the number of positive and negative lags. Let’s try it for the CO2 data.
## correlogram of the CO2 data
acf(co2, lag.max = 36)
There are 4 things about Figure 4.10 that are noteworthy:

0.8
ACF
0.4
0.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Lag
Figure 4.10: Correlogram of the observed atmospheric CO2 concentration at

Mauna Loa, Hawai’i obtained with the function acf().
1. the ACF at lag 0, r0 , equals 1 by default (i.e., the correlation of a time

series with itself)–it’s plotted as a reference point;
2. the x-axis has decimal values for lags, which is caused by R using the
year index as the lag rather than the month;
3. the horizontal blue lines are the approximate 95% CI’s; and
4. there is very high autocorrelation even out to lags of 36 months.
As an alternative to the default plots for acf objects, let’s define a new plot
function for acf objects with some better features:
## better ACF plot
plot.acf <- function(ACFobj) {
rr <- ACFobj$acf[-1]
kk <- length(rr)
nn <- ACFobj$n.used
plot(seq(kk), rr, type = "h", lwd = 2, yaxs = "i", xaxs = "i",
ylim = c(floor(min(rr)), 1), xlim = c(0, kk + 1), xlab = "Lag",
ylab = "Correlation", las = 1)
abline(h = -1/nn + c(-2, 2)/sqrt(nn), lty = "dashed", col = "blue")
abline(h = 0)
}
Now we can assign the result of acf() to a variable and then use the infor-
mation contained therein to plot the correlogram with our new plot function.
## acf of the CO2 data

co2.acf <- acf(co2, lag.max = 36)
plot.acf(co2.acf)
Series co2
1.0
0.8
0.6
ACF
0.4
0.2
0.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Lag
1.0
0.8
Correlation
0.6
0.4
0.2
0.0
0 5 10 15 20 25 30 35
Lag
Figure 4.11: Correlogram of the observed atmospheric CO2 concentration at

Mauna Loa, Hawai’i obtained with the function plot.acf().
Notice that all of the relevant information is still there (Figure 4.11), but
now r0 = 1 is not plotted at lag-0 and the lags on the x-axis are displayed
correctly as integers.
Before we move on to the PACF, let’s look at the ACF for some deterministic
time series, which will help you identify interesting properties (e.g., trends,
seasonal effects) in a stochastic time series, and account for them in time
series models–an important topic in this course. First, let’s look at a straight
line.
## length of ts
nn <- 100
## create straight line
tt <- seq(nn)
## set up plot area
par(mfrow = c(1, 2))
## plot line
plot.ts(tt, ylab = expression(italic(x[t])))
## get ACF
line.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(line.acf)
Series tt
1.0
0.8
0.6
ACF
0.4
0.2
−0.2
0 5 10 15 20
Lag
The correlogram for a straight line is itself a linearly decreasing function over
time (Figure 4.12).
Now let’s examine the ACF for a sine wave and see what sort of pattern
80 100
1.0
0.8
Correlation
60
0.6
xt
40
0.4
20
0.2
0
0.0
0 20 40 60 80 100 0 5 10 15 20
Time Lag
Figure 4.12: Time series plot of a straight line (left) and the correlogram of
its ACF (right).
arises.
## create sine wave
tt <- sin(2 * pi * seq(nn)/12)
## set up plot area
## plot line
## get ACF
sine.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(sine.acf)
Perhaps not surprisingly, the correlogram for a sine wave is itself a sine wave
whose amplitude decreases linearly over time (Figure 4.13).
Now let’s examine the ACF for a sine wave with a linear downward trend
and see what sort of patterns arise.
## create sine wave with trend
tt <- sin(2 * pi * seq(nn)/12) - seq(nn)/50
## set up plot area
## plot line
1.0
1.0
0.5
0.5
Correlation
0.0
xt
0.0
−0.5
−1.0
−1.0
0 20 40 60 80 100 0 5 10 15 20
Time Lag
Figure 4.13: Time series plot of a discrete sine wave (left) and the correlogram
of its ACF (right).
## get ACF
sili.acf <- acf(tt, plot = FALSE)
## plot ACF
plot.acf(sili.acf)
1.0
1
0.5
0
Correlation
−1
xt
0.0
−2
−0.5
−3
−1.0
0 20 40 60 80 100 0 5 10 15 20
Time Lag
Figure 4.14: Time series plot of a discrete sine wave (left) and the correlogram
of its ACF (right).
The correlogram for a sine wave with a trend is itself a nonsymmetrical sine
wave whose amplitude and center decrease over time (Figure 4.14).
As we have seen, the ACF is a powerful tool in time series analysis for
identifying important features in the data. As we will see later, the ACF is
also an important diagnostic tool for helping to select the proper order of p
and q in ARMA(p,q) models.
4.4.2 Partial autocorrelation function (PACF)
The partial autocorrelation function (PACF) measures the linear correlation

of a series {xt } and a lagged version of itself {xt+k } with the linear dependence
of {xt−1 , xt−2 , . . . , xt−(k−1) } removed. Recall from lecture that we define the
PACF as

Cor(x
1 , x0 )
= r1 if k = 1;
fk = (4.13)
Cor(xk − xk−1 , x0 − xk−1 ) if k ≥ 2;
k 0
with
xk−1
k = β1 xk−1 + β2 xk−2 + · · · + βk−1 x1 ; (4.14a)
xk−1
0 = β1 x1 + β2 x2 + · · · + βk−1 xk−1 . (4.14b)
It’s easy to compute the PACF for a variable in R using the pacf() function,
which will automatically plot a correlogram when called by itself (similar to
acf()). Let’s look at the PACF for the CO2 data.
## PACF of the CO2 data
pacf(co2, lag.max = 36)
The default plot for PACF is a bit better than for ACF, but here is another
plotting function that might be useful.
## better PACF plot
plot.pacf <- function(PACFobj) {
rr <- PACFobj$acf
kk <- length(rr)
nn <- PACFobj$n.used
plot(seq(kk), rr, type = "h", lwd = 2, yaxs = "i", xaxs = "i",
ylim = c(floor(min(rr)), 1), xlim = c(0, kk + 1), xlab = "Lag",
ylab = "PACF", las = 1)
abline(h = -1/nn + c(-2, 2)/sqrt(nn), lty = "dashed", col = "blue")

abline(h = 0)
}
0.8
Partial ACF
0.4
0.0
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Lag
Figure 4.15: Correlogram of the PACF for the observed atmospheric CO2
concentration at Mauna Loa, Hawai’i obtained with the function pacf().
Notice in Figure 4.15 that the partial autocorrelation at lag-1 is very high
(it equals the ACF at lag-1), but the other values at lags > 1 are relatively
small, unlike what we saw for the ACF. We will discuss this in more detail
later on in this lab.
Notice also that the PACF plot again has real-valued indices for the time
lag, but it does not include any value for lag-0 because it is impossible to
remove any intermediate autocorrelation between t and t − k when k = 0, and
therefore the PACF does not exist at lag-0. If you would like, you can use the
plot.acf() function we defined above to plot the PACF estimates because
acf() and pacf() produce identical list structures (results not shown here).
## PACF of the CO2 data
co2.pacf <- pacf(co2)
plot.acf(co2.pacf)
As with the ACF, we will see later on how the PACF can also be used to help
identify the appropriate order of p and q in ARMA(p,q) models.
4.4.3 Cross-correlation function (CCF)
Often we are interested in looking for relationships between 2 different time

series. There are many ways to do this, but a simple method is via examination
of their cross-covariance and cross-correlation.
We begin by defining the sample cross-covariance function (CCVF) in a

manner similar to the ACVF, in that
1 n−k
gkxy
X
= (yt − ȳ) (xt+k − x̄) , (4.15)
n t=1
but now we are estimating the correlation between a variable y and a different
time-shifted variable xt+k . The sample cross-correlation function (CCF) is
then defined analogously to the ACF, such that
gkxy
rkxy = q ; (4.16)
SDx SDy
SDx and SDy are the sample standard deviations of {xt } and {yt }, respectively.
It is important to re-iterate here that rkxy = xy
6 r−k , but rkxy = r−k
yx
. Therefore,
it is very important to pay particular attention to which variable you call y
(i.e., the “response”) and which you call x (i.e., the “predictor”).
As with the ACF, an approximate 95% confidence interval on the CCF can
be estimated by
1 2
− ±√ (4.17)
n n
where n is the number of data points used in the calculation of the CCF, and
the same assumptions apply to its interpretation.
Computing the CCF in R is easy with the function ccf() and it works just
like acf(). In fact, ccf() is just a “wrapper” function that calls acf(). As
4.5. WHITE NOISE (WN) 93
an example, let’s examine the CCF between sunspot activity and number of
lynx trapped in Canada as in the classic paper by Moran1 .
To begin, let’s get the data, which are conveniently included in the datasets
package included as part of the base installation of R. Before calculating the
CCF, however, we need to find the matching years of data. Again, we’ll use
the ts.intersect() function.
## get the matching years of sunspot data
suns <- ts.intersect(lynx, sunspot.year)[, "sunspot.year"]
## get the matching lynx data
lynx <- ts.intersect(lynx, sunspot.year)[, "lynx"]
Here are plots of the time series.

## plot time series
plot(cbind(suns, lynx), yax.flip = TRUE)
It is important to remember which of the 2 variables you call y and x when

calling ccf(x, y, ...). In this case, it seems most relevant to treat lynx
as the y and sunspots as the x, in which case we are mostly interested in
the CCF at negative lags (i.e., when sunspot activity predates inferred lynx
abundance). Furthermore, we’ll use log-transformed lynx trappings.
## CCF of sunspots and lynx
ccf(suns, log(lynx), ylab = "Cross-correlation")
From Figures 4.16 and 4.17 it looks like lynx numbers are relatively low 3-5
years after high sunspot activity (i.e., significant correlation at lags of -3 to
-5).
4.5 White noise (WN)

A time series {wt } is a discrete white noise series (DWN) if the w1 , w1 , . . . , wt
are independent and identically distributed (IID) with a mean of zero. For
most of the examples in this course we will assume that the wt ∼ N(0, q),
and therefore we refer to the time series {wt } as Gaussian white noise. If
1
Moran, P.A.P. 1949. The statistical analysis of the sunspot and lynx cycles. *J. Anim.
Ecol.* 18:115-116
120
80
suns
40
0
5000
lynx
2000
0
1820 1840 1860 1880 1900 1920
Time
Figure 4.16: Time series of sunspot activity (top) and lynx trappings in
Canada (bottom) from 1821-1934.
0.1 0.2
Cross−correlation
−0.1
−0.3
−15 −10 −5 0 5 10 15
Lag
Figure 4.17: CCF for annual sunspot activity and the log of the number of
lynx trappings in Canada from 1821-1934.
4.5. WHITE NOISE (WN) 95
our time series model has done an adequate job of removing all of the serial
autocorrelation in the time series with trends, seasonal effects, etc., then
the model residuals (et = yt − ŷt ) will be a WN sequence with the following
properties for its mean (ē), covariance (ck ), and autocorrelation (rk ):
x̄ = 0

q if k = 0
ck = Cov(et , et+k ) =
6 1
0 if k = (4.18)

1 if k = 0
rk = Cor(et , et+k ) =
6 1.
0 if k =
4.5.1 Simulating white noise
Simulating WN in R is straightforward with a variety of built-in random

number generators for continuous and discrete distributions. Once you know
R’s abbreviation for the distribution of interest, you add an r to the beginning
to get the function’s name. For example, a Gaussian (or normal) distribution
is abbreviated norm and so the function is rnorm(). All of the random number
functions require two things: the number of samples from the distribution
and the parameters for the distribution itself (e.g., mean & SD of a normal).
Check the help file for the distribution of interest to find out what parameters
you must specify (e.g., type ?rnorm to see the help for a normal distribution).
Here’s how to generate 100 samples from a normal distribution with mean of
5 and standard deviation of 0.2, and 50 samples from a Poisson distribution
with a rate (λ) of 20.
set.seed(123)
## random normal variates
GWN <- rnorm(n = 100, mean = 5, sd = 0.2)
## random Poisson variates
PWN <- rpois(n = 50, lambda = 20)
Here are plots of the time series. Notice that on one occasion the same number
was drawn twice in a row from the Poisson distribution, which is discrete.
That is virtually guaranteed to never happen with a continuous distribution.
## set up plot region

## plot normal variates with mean
plot.ts(GWN)
abline(h = 5, col = "blue", lty = "dashed")
## plot Poisson variates with mean
plot.ts(PWN)
abline(h = 20, col = "blue", lty = "dashed")
30
5.4
25
5.2
GWN
PWN
5.0
20
4.8
15
4.6
10
0 20 40 60 80 100 0 10 20 30 40 50
Time Time
Figure 4.18: Time series plots of simulated Gaussian (left) and Poisson (right)
white noise.
Now let’s examine the ACF for the 2 white noise series and see if there is, in
fact, zero autocorrelation for lags ≥ 1.
## plot normal variates with mean
acf(GWN, main = "", lag.max = 20)
## plot Poisson variates with mean
acf(PWN, main = "", lag.max = 20)
Interestingly, the rk are all greater than zero in absolute value although they
are not statistically different from zero for lags 1-20. This is because we are
dealing with a sample of the distributions rather than the entire population
of all random variates. As an exercise, try setting n=1e6 instead of n=100
or n=50 in the calls calls above to generate the WN sequences and see what
effect it has on the estimation of rk . It is also important to remember, as we
4.6. RANDOM WALKS (RW) 97
1.0
1.0
0.6
0.6
ACF
ACF
0.2
0.2
−0.2
−0.2
0 5 10 15 20 0 5 10 15 20
Lag Lag
Figure 4.19: ACF’s for the simulated Gaussian (left) and Poisson (right)
white noise shown in Figure 4.18.
discussed earlier, that we should expect that approximately 1 in 20 of the rk

will be statistically greater than zero based on chance alone, especially for
relatively small sample sizes, so don’t get too excited if you ever come across
a case like then when inspecting model residuals.
4.6 Random walks (RW)
Random walks receive considerable attention in time series analyses because

of their ability to fit a wide range of data despite their surprising simplicity.
In fact, random walks are the most simple non-stationary time series model.
A random walk is a time series {xt } where
xt = xt−1 + wt , (4.19)
and wt is a discrete white noise series where all values are independent and
identically distributed (IID) with a mean of zero. In practice, we will almost
always assume that the wt are Gaussian white noise, such that wt ∼ N(0, q).
We will see later that a random walk is a special case of an autoregressive
model.
4.6.1 Simulating a random walk
Simulating a RW model in R is straightforward with a for loop and the use

of rnorm() to generate Gaussian errors (type ?rnorm to see details on the
function and its useful relatives dnorm() and pnorm()). Let’s create 100 obs
(we’ll also set the random number seed so everyone gets the same results).
## set random number seed
set.seed(123)
## length of time series
TT <- 100
## initialize {x_t} and {w_t}
xx <- ww <- rnorm(n = TT, mean = 0, sd = 1)
## compute values 2 thru TT
for (t in 2:TT) {
xx[t] <- xx[t - 1] + ww[t]
}
Now let’s plot the simulated time series and its ACF.
## setup plot area
## plot line
plot.ts(xx, ylab = expression(italic(x[t])))
## plot ACF
plot.acf(acf(xx, plot = FALSE))
Perhaps not surprisingly based on their names, autoregressive models such as

RW’s have a high degree of autocorrelation out to long lags (Figure 4.20).
4.6.2 Alternative formulation of a random walk
As an aside, let’s use an alternative formulation of a random walk model to

see an even shorter way to simulate an RW in R. Based on our definition of a
random walk in Equation (4.19), it is easy to see that
4.6. RANDOM WALKS (RW) 99
1.0
8 10
0.5
Correlation
6
4
xt
0.0
2
−0.5
−2 0
−1.0
0 20 40 60 80 100 0 5 10 15 20
Time Lag
Figure 4.20: Simulated time series of a random walk model (left) and its
associated ACF (right).
xt = xt−1 + wt
xt−1 = xt−2 + wt−1
xt−2 = xt−3 + wt−2 (4.20)
..
.
Therefore, if we substitute xt−2 + wt−1 for xt−1 in the first equation, and then
xt−3 + wt−2 for xt−2 , and so on in a recursive manner, we get
xt = wt + wt−1 + wt−2 + · · · + wt−∞ + xt−∞ . (4.21)
In practice, however, the time series will not start an infinite time ago, but
rather at some t = 1, in which case we can write
xt = w1 + w2 + · · · + wt
T
X (4.22)
= wt .
t=1
From Equation (4.22) it is easy to see that the value of an RW process at time
step t is the sum of all the random errors up through time t. Therefore, in R
we can easily simulate a realization from an RW process using the cumsum(x)
function, which does cumulative summation of the vector x over its entire
length. If we use the same errors as before, we should get the same results.
## simulate RW
x2 <- cumsum(ww)
Let’s plot both time series to see if it worked.

## setup plot area
## plot 1st RW
plot.ts(xx, ylab = expression(italic(x[t])))
## plot 2nd RW
plot.ts(x2, ylab = expression(italic(x[t])))
8 10
8 10
6
6
4
4
xt
xt
2
2
−2 0
−2 0
0 20 40 60 80 100 0 20 40 60 80 100
Time Time
Figure 4.21: Time series of the same random walk model formulated as
Equation (4.19) and simulated via a for loop (left), and as Equation (4.22)
and simulated via cumsum() (right).
Indeed, both methods of generating a RW time series appear to be equivalent.
4.7 Autoregressive (AR) models

Autoregressive models of order p, abbreviated AR(p), are commonly used
in time series analyses. In particular, AR(1) models (and their multivariate
extensions) see considerable use in ecology as we will see later in the course.
Recall from lecture that an AR(p) model is written as
4.7. AUTOREGRESSIVE (AR) MODELS 101
xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt , (4.23)
where {wt } is a white noise sequence with zero mean and some variance σ 2 .
For our purposes we usually assume that wt ∼ N(0, q). Note that the random
walk in Equation (4.19) is a special case of an AR(1) model where φ1 = 1
and φk = 0 for k ≥ 2.
4.7.1 Simulating an AR(p) process
Although we could simulate an AR(p) process in R using a for loop just as

we did for a random walk, it’s much easier with the function arima.sim(),
which works for all forms and subsets of ARIMA models. To do so, remember
that the AR in ARIMA stands for “autoregressive”, the I for “integrated”,
and the MA for “moving-average”; we specify the order of ARIMA models as
p, d, q. So, for example, we would specify an AR(2) model as ARIMA(2,0,0),
or an MA(1) model as ARIMA(0,0,1). If we had an ARMA(3,1) model that
we applied to data that had been twice-differenced, then we would have an
ARIMA(3,2,1) model.
arima.sim() will accept many arguments, but we are interested primarily
in two of them: n and model (type ?arima.sim to learn more). The former
simply indicates the length of desired time series, but the latter is more
complex. Specifically, model is a list with the following elements:
• order a vector of length 3 containing the ARIMA(p, d, q) order
• ar a vector of length p containing the AR(p) coefficients
• ma a vector of length q containing the MA(q) coefficients
• sd a scalar indicating the std dev of the Gaussian errors
Note that you can omit the ma element entirely if you have an AR(p) model,
or omit the ar element if you have an MA(q) model. If you omit the sd
element, arima.sim() will assume you want normally distributed errors with
SD = 1. Also note that you can pass arima.sim() your own time series of
random errors or the name of a function that will generate the errors (e.g.,
you could use rpois() if you wanted a model with Poisson errors). Type
?arima.sim for more details.
Let’s begin by simulating some AR(1) models and comparing their behavior.
First, let’s choose models with contrasting AR coefficients. Recall that in
order for an AR(1) model to be stationary, φ < |1|, so we’ll try 0.1 and 0.9.
We’ll again set the random number seed so we will get the same answers.
set.seed(456)
## list description for AR(1) model with small coef
AR.sm <- list(order = c(1, 0, 0), ar = 0.1, sd = 0.1)
## list description for AR(1) model with large coef
AR.lg <- list(order = c(1, 0, 0), ar = 0.9, sd = 0.1)
## simulate AR(1)
AR1.sm <- arima.sim(n = 50, model = AR.sm)
AR1.lg <- arima.sim(n = 50, model = AR.lg)
Now let’s plot the 2 simulated series.

## setup plot region
## get y-limits for common plots
ylm <- c(min(AR1.sm, AR1.lg), max(AR1.sm, AR1.lg))
## plot the ts
plot.ts(AR1.sm, ylim = ylm, ylab = expression(italic(x)[italic(t)]),
main = expression(paste(phi, " = 0.1")))
plot.ts(AR1.lg, ylim = ylm, ylab = expression(italic(x)[italic(t)]),
main = expression(paste(phi, " = 0.9")))
φ = 0.1 φ = 0.9
1 2
1 2
−1
−1
xt
xt
−3
−3
−5
−5
0 10 20 30 40 50 0 10 20 30 40 50
Time Time
Figure 4.22: Time series of simulated AR(1) processes with φ = 0.1 (left) and
φ = 0.9 (right).
4.7. AUTOREGRESSIVE (AR) MODELS 103
What do you notice about the two plots in Figure 4.22? It looks like the time
series with the smaller AR coefficient is more “choppy” and seems to stay
closer to 0 whereas the time series with the larger AR coefficient appears to
wander around more. Remember that as the coefficient in an AR(1) model
goes to 0, the model approaches a WN sequence, which is stationary in both
the mean and variance. As the coefficient goes to 1, however, the model
approaches a random walk, which is not stationary in either the mean or
variance.
Next, let’s generate two AR(1) models that have the same magnitude coefi-
cient, but opposite signs, and compare their behavior.
set.seed(123)
## list description for AR(1) model with small coef
AR.pos <- list(order = c(1, 0, 0), ar = 0.5, sd = 0.1)
## list description for AR(1) model with large coef
AR.neg <- list(order = c(1, 0, 0), ar = -0.5, sd = 0.1)
## simulate AR(1)
AR1.pos <- arima.sim(n = 50, model = AR.pos)
AR1.neg <- arima.sim(n = 50, model = AR.neg)
OK, let’s plot the 2 simulated series.

## get y-limits for common plots
ylm <- c(min(AR1.pos, AR1.neg), max(AR1.pos, AR1.neg))
## plot the ts
plot.ts(AR1.pos, ylim = ylm, ylab = expression(italic(x)[italic(t)]),
main = expression(paste(phi[1], " = 0.5")))
plot.ts(AR1.neg, ylab = expression(italic(x)[italic(t)]), main = expression(paste(phi
" = -0.5")))
Now it appears like both time series vary around the mean by about the same
amount, but the model with the negative coefficient produces a much more
“sawtooth” time series. It turns out that any AR(1) model with −1 < φ < 0
will exhibit the 2-point oscillation you see here.
We can simulate higher order AR(p) models in the same manner, but care
must be exercised when choosing a set of coefficients that result in a stationary
φ1 = 0.5 φ1 = −0.5
3
2
2
1
1
xt
xt
0
0
−2 −1
−2 −1
0 10 20 30 40 50 0 10 20 30 40 50
Time Time
Figure 4.23: Time series of simulated AR(1) processes with φ1 = 0.5 (left)
and φ1 = −0.5 (right).
model or else arima.sim() will fail and report an error. For example, an
AR(2) model with both coefficients equal to 0.5 is not stationary, and therefore
this function call will not work:
arima.sim(n = 100, model = list(order(2, 0, 0), ar = c(0.5, 0.5)))
If you try, R will respond that the “'ar' part of model is not
stationary”.
4.7.2 Correlation structure of AR(p) processes
Let’s review what we learned in lecture about the general behavior of the ACF
and PACF for AR(p) models. To do so, we’ll simulate four stationary AR(p)
models of increasing order p and then examine their ACF’s and PACF’s. Let’s
use a really big n so as to make them “pure”, which will provide a much
better estimate of the correlation structure.
set.seed(123)
## the 4 AR coefficients
ARp <- c(0.7, 0.2, -0.1, -0.3)
## empty list for storing models
AR.mods <- list()
## loop over orders of p
for (p in 1:4) {
4.8. MOVING-AVERAGE (MA) MODELS 105
## assume SD=1, so not specified

AR.mods[[p]] <- arima.sim(n = 10000, list(ar = ARp[1:p]))
}
Now that we have our four AR(p) models, lets look at plots of the time series,
ACF’s, and PACF’s.
## loop over orders of p
for (p in 1:4) {
plot.ts(AR.mods[[p]][1:50], ylab = paste("AR(", p, ")", sep = ""))
acf(AR.mods[[p]], lag.max = 12)
pacf(AR.mods[[p]], lag.max = 12, ylab = "PACF")
}
As we saw in lecture and is evident from our examples shown in Figure 4.24,
the ACF for an AR(p) process tails off toward zero very slowly, but the PACF
goes to zero for lags > p. This is an important diagnostic tool when trying to
identify the order of p in ARMA(p, q) models.
4.8 Moving-average (MA) models
A moving-averge process of order q, or MA(q), is a weighted sum of the

current random error plus the q most recent errors, and can be written as
xt = wt + θ1 wt−1 + θ2 wt−2 + · · · + θq wt−q , (4.24)
where {wt } is a white noise sequence with zero mean and some variance σ 2 ; for
our purposes we usually assume that wt ∼ N(0, q). Of particular note is that
because MA processes are finite sums of stationary errors, they themselves
are stationary.
Of interest to us are so-called “invertible” MA processes that can be expressed
as an infinite AR process with no error term. The term invertible comes from
the inversion of the backshift operator (B) that we discussed in class (i.e.,
0.6
0.8
1
AR(1)
PACF
ACF
0.3
0.4
−1
0.0
0.0
−3
0 20 40 0 4 8 12 2 6 10
Series AR.mods[[p]] Series AR.mods[[p]]
Time Lag Lag
0.8
0.8
0
AR(2)
PACF
ACF
−2
0.4
0.4
−5
0.0
0.0
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
0.8
0.8
0
AR(3)
PACF
0.4
ACF
0.4
−2
0.0
0.0
−4
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
3
0.4
0.5
−1 1
AR(4)
PACF
ACF
−0.2
−0.5
−4
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
Figure 4.24: Time series of simulated AR(p) processes (left column) of

increasing orders from 1-4 (rows) with their associated ACF’s (center column)
and PACF’s (right column). Note that only the first 50 values of xt are
plotted.
4.8. MOVING-AVERAGE (MA) MODELS 107
Bxt = xt−1 ). So, for example, an MA(1) process with θ < |1| is invertible
because it can be written using the backshift operator as
xt = wt − θwt−1
xt = wt − θBwt
xt = (1 − θB)wt ,
⇓
(4.25)
1
wt = xt
(1 − θB)
wt = (1 + θB + θ2 B2 + θ3 B3 + . . . )xt
wt = xt + θxt−1 + θ2 xt−2 + θ3 xt−3 + . . .
4.8.1 Simulating an MA(q) process
We can simulate MA(q) processes just as we did for AR(p) processes using
arima.sim(). Here are 3 different ones with contrasting θ’s:
set.seed(123)
## list description for MA(1) model with small coef
MA.sm <- list(order = c(0, 0, 1), ma = 0.2, sd = 0.1)
## list description for MA(1) model with large coef
MA.lg <- list(order = c(0, 0, 1), ma = 0.8, sd = 0.1)
## list description for MA(1) model with large coef
MA.neg <- list(order = c(0, 0, 1), ma = -0.5, sd = 0.1)
## simulate MA(1)
MA1.sm <- arima.sim(n = 50, model = MA.sm)
MA1.lg <- arima.sim(n = 50, model = MA.lg)
MA1.neg <- arima.sim(n = 50, model = MA.neg)
with their associated plots.

## plot the ts
plot.ts(MA1.sm, ylab = expression(italic(x)[italic(t)]), main = expression(paste(thet
" = 0.2")))
plot.ts(MA1.lg, ylab = expression(italic(x)[italic(t)]), main = expression(pa

" = 0.8")))
plot.ts(MA1.neg, ylab = expression(italic(x)[italic(t)]), main = expression(p
" = -0.5")))
θ = 0.2 θ = 0.8 θ = −0.5

2
2
3
2
1
1
1
0
xt
xt
xt
0
−1
−1
−2
−2
−2
0 20 40 0 20 40 0 20 40
Time Time Time
Figure 4.25: Time series of simulated MA(1) processes with θ = 0.2 (left),
θ = 0.8 (middle), and θ = −0.5 (right).
In contrast to AR(1) processes, MA(1) models do not exhibit radically different

behavior with changing θ. This should not be too surprising given that they
are simply linear combinations of white noise.
4.8.2 Correlation structure of MA(q) processes
We saw in lecture and above how the ACF and PACF have distinctive features
for AR(p) models, and they do for MA(q) models as well. Here are examples
of four MA(q) processes. As before, we’ll use a really big n so as to make
them “pure”, which will provide a much better estimate of the correlation
structure.
set.seed(123)
## the 4 MA coefficients
MAq <- c(0.7, 0.2, -0.1, -0.3)
## empty list for storing models
MA.mods <- list()
## loop over orders of q
4.9. AUTOREGRESSIVE MOVING-AVERAGE (ARMA) MODELS 109
for (q in 1:4) {
## assume SD=1, so not specified
MA.mods[[q]] <- arima.sim(n = 1000, list(ma = MAq[1:q]))
}
Now that we have our four MA(q) models, lets look at plots of the time series,
ACF’s, and PACF’s.
## loop over orders of q
for (q in 1:4) {
plot.ts(MA.mods[[q]][1:50], ylab = paste("MA(", q, ")", sep = ""))
acf(MA.mods[[q]], lag.max = 12)
pacf(MA.mods[[q]], lag.max = 12, ylab = "PACF")
}
Note very little qualitative difference in the realizations of the four MA(q)
processes (Figure 4.26). As we saw in lecture and is evident from our examples
here, however, the ACF for an MA(q) process goes to zero for lags > q, but
the PACF tails off toward zero very slowly. This is an important diagnostic
tool when trying to identify the order of q in ARMA(p, q) models.
4.9 Autoregressive moving-average (ARMA)

models
ARMA(p, q) models have a rich history in the time series literature, but they
are not nearly as common in ecology as plain AR(p) models. As we discussed
in lecture, both the ACF and PACF are important tools when trying to
identify the appropriate order of p and q. Here we will see how to simulate
time series from AR(p), MA(q), and ARMA(p, q) processes, as well as fit time
series models to data based on insights gathered from the ACF and PACF.
We can write an ARMA(p, q) as a mixture of AR(p) and MA(q) models, such

that
0.0 0.4 0.8

2
0.2
MA(1)
PACF
ACF
0
−0.2
−2
0 20 40 0 4 8 12 2 6 10
Series MA.mods[[q]] Series MA.mods[[q]]
Time Lag Lag
4
0.0 0.4 0.8

MA(2)
PACF
0.2
2
ACF
−0.2
−2
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
0.0 0.4 0.8
2
MA(3)
0.2
PACF
ACF
0
−0.2
−2
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
1.0
2
0.2
MA(4)
PACF
ACF
0.4
0
−0.2
−0.2
−3
0 20 40 0 4 8 12 2 6 10
Time Lag Lag
Figure 4.26: Time series of simulated MA(q) processes (left column) of

increasing orders from 1-4 (rows) with their associated ACF’s (center column)
and PACF’s (right column). Note that only the first 50 values of xt are
plotted.
xt = φ1 xt−1 +φ2 xt−2 +· · ·+φp xt−p +wt +θwt−1 +θ2 wt−2 +· · ·+θq xt−q , (4.26)
and the wt are white noise.
4.9.1 Fitting ARMA(p, q) models with arima()
We have already seen how to simulate AR(p) and MA(q) models with
arima.sim(); the same concepts apply to ARMA(p, q) models and therefore
we will not do that here. Instead, we will move on to fitting ARMA(p, q)
models when we only have a realization of the process (i.e., data) and do not
know the underlying parameters that generated it.
The function arima() accepts a number of arguments, but two of them are
most important:
• x a univariate time series
• order a vector of length 3 specifying the order of ARIMA(p,d,q) model
In addition, note that by default arima() will estimate an underlying mean
of the time series unless d > 0. For example, an AR(1) process with mean µ
would be written
xt = µ + φ(xt−1 − µ) + wt . (4.27)
If you know for a fact that the time series data have a mean of zero (e.g., you
already subtracted the mean from them), you should include the argument
include.mean=FALSE, which is set to TRUE by default. Note that ignoring
and not estimating a mean in ARMA(p, q) models when one exists will bias
the estimates of all other parameters.
Let’s see an example of how arima() works. First we’ll simulate an
ARMA(2,2) model and then estimate the parameters to see how well we can
recover them. In addition, we’ll add in a constant to create a non-zero mean,
which arima() reports as intercept in its output.
set.seed(123)
## ARMA(2,2) description for arim.sim()
ARMA22 <- list(order = c(2, 0, 2), ar = c(-0.7, 0.2), ma = c(0.7,
0.2))
## mean of process
mu <- 5
## simulated process (+ mean)
ARMA.sim <- arima.sim(n = 10000, model = ARMA22) + mu
## estimate parameters
arima(x = ARMA.sim, order = c(2, 0, 2))
Call:
arima(x = ARMA.sim, order = c(2, 0, 2))
Coefficients:
ar1 ar2 ma1 ma2 intercept
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125
sigma^2 estimated as 0.9972: log likelihood = -14175.92, aic = 28363.84

It looks like we were pretty good at estimating the true parameters, but
our sample size was admittedly quite large; the estimate of the variance of
the process errors is reported as sigmaˆ2 below the other coefficients. As
an exercise, try decreasing the length of time series in the arima.sim() call
above from 10,000 to something like 100 and see what effect it has on the
parameter estimates.
4.9.2 Searching over model orders
In an ideal situation, you could examine the ACF and PACF of the time
series of interest and immediately decipher what orders of p and q must have
generated the data, but that doesn’t always work in practice. Instead, we are
often left with the task of searching over several possible model forms and
seeing which of them provides the most parsimonious fit to the data. There
are two easy ways to do this for ARIMA models in R. The first is to write a
little script that loops ove the possible dimensions of p and q. Let’s try that
for the process we simulated above and search over orders of p and q from
0-3 (it will take a few moments to run and will likely report an error about a
“possible convergence problem”, which you can ignore).
## empty list to store model fits
ARMA.res <- list()
## set counter
cc <- 1
## loop over AR
for (p in 0:3) {
## loop over MA
for (q in 0:3) {
ARMA.res[[cc]] <- arima(x = ARMA.sim, order = c(p, 0,
q))
cc <- cc + 1
}
}
Warning in arima(x = ARMA.sim, order = c(p, 0, q)): possible convergence

problem: optim gave code = 1
## get AIC values for model evaluation
ARMA.AIC <- sapply(ARMA.res, function(x) x$aic)
## model with lowest AIC is the best
ARMA.res[[which(ARMA.AIC == min(ARMA.AIC))]]
Call:
arima(x = ARMA.sim, order = c(p, 0, q))
Coefficients:
ar1 ar2 ma1 ma2 intercept
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125

It looks like our search worked, so let’s look at the other method for fitting
ARIMA models. The auto.arima() function in the forecast package will
conduct an automatic search over all possible orders of ARIMA models that
you specify. For details, type ?auto.arima after loading the package. Let’s
repeat our search using the same criteria.
## find best ARMA(p,q) model
auto.arima(ARMA.sim, start.p = 0, max.p = 3, start.q = 0, max.q = 3)
Series: ARMA.sim
ARIMA(2,0,2) with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 mean
-0.7079 0.1924 0.6912 0.2001 4.9975
s.e. 0.0291 0.0284 0.0289 0.0236 0.0125
sigma^2 estimated as 0.9977: log likelihood=-14175.92

AIC=28363.84 AICc=28363.84 BIC=28407.1
We get the same results with an increase in speed and less coding, which is nice.
If you want to see the form for each of the models checked by auto.arima()
and their associated AIC values, include the argument trace=1.
4.10. PROBLEMS 115
4.10 Problems
We have seen how to do a variety of introductory time series analyses with R.
Now it is your turn to apply the information you learned here and in lecture to
complete some analyses. You have been asked by a colleague to help analyze
some time series data she collected as part of an experiment on the effects of
light and nutrients on the population dynamics of phytoplankton. Specifically,
after controlling for differences in light and temperature, she wants to know
if the natural log of population density can be modeled with some form of
ARMA(p, q) model.
The data are expressed as the number of cells per milliliter recorded every
hour for one week beginning at 8:00 AM on December 1, 2014. You can load
the data using
data(hourlyphyto, package = "atsalibrary")
pDat <- hourlyphyto
Use the information above to do the following:

1. Convert pDat, which is a data.frame object, into a ts object. This bit
of code might be useful to get you started:
## what day of 2014 is Dec 1st?
dBegin <- as.Date("2014-12-01")
dayOfYear <- (dBegin - as.Date("2014-01-01") + 1)
2. Plot the time series of phytoplankton density and provide a brief de-
scription of any notable features.
3. Although you do not have the actual measurements for the specific
temperature and light regimes used in the experiment, you have been
informed that they follow a regular light/dark period with accompanying
warm/cool temperatures. Thus, estimating a fixed seasonal effect is
justifiable. Also, the instrumentation is precise enough to preclude any
systematic change in measurements over time (i.e., you can assume
mt = 0 for all t). Obtain the time series of the estimated log-density
of phytoplankton absent any hourly effects caused by variation in
temperature or light. (Hint: You will need to do some decomposition.)
4. Use diagnostic tools to identify the possible order(s) of ARMA model(s)
that most likely describes the log of population density for this particular
experiment. Note that at this point you should be focusing your analysis
on the results obtained in Question 3.
5. Use some form of search to identify what form of ARMA(p, q) model best
describes the log of population density for this particular experiment.
Use what you learned in Question 4 to inform possible orders of p and
q. (Hint: if you use auto.arima(), include the additional argument
seasonal=FALSE)
6. Write out the best model in the form of Equation (4.26) using the
underscore notation to refer to subscripts (e.g., write x_t for xt ). You
can round any parameters/coefficients to the nearest hundreth. (Hint:
if the mean of the time series is not zero, refer to Eqn 1.27 in the lab
handout).
Chapter 5
Box-Jenkins method
In this chapter, you will practice selecting and fitting an ARIMA model to
catch data using the Box-Jenkins method. After fitting a model, you will
prepare simple forecasts using the forecast package.
Data and packages
We will use the catch landings from Greek waters (greeklandings) and
the Chinook landings (chinook) in Washington data sets for this chapter.
These datasets are in the atsalibrary package on GitHub. Install using the
devtools package.
library(devtools)
Load the data.

data(greeklandings, package = "atsalibrary")
landings <- greeklandings
# Use the monthly data
data(chinook, package = "atsalibrary")
chinook <- chinook.month
Ensure you have the necessary packages.
117
118 CHAPTER 5. BOX-JENKINS METHOD
library(ggplot2)
library(gridExtra)
library(reshape2)
library(tseries)
library(urca)
library(forecast)
5.1 Box-Jenkins method
A. Model form selection

1. Evaluate stationarity
2. Selection of the differencing level (d) – to fix stationarity problems
3. Selection of the AR level (p)
4. Selection of the MA level (q)
B. Parameter estimation
C. Model checking
5.2 Stationarity
It is important to test and transform (via differencing) your data to ensure

stationarity when fitting an ARMA model using standard algorithms. The
standard algorithms for ARIMA models assume stationarity and we will be
using those algorithms. It possible to fit ARMA models without transforming
the data. We will cover that in later chapters. However, that is not commonly
done in the literature on forecasting with ARMA models, certainly not in the
literature on catch forecasting.
Keep in mind also that many ARMA models are stationary and you do not
want to get in the situation of trying to fit an incompatible process model
to your data. We will see examples of this when we start fitting models to
non-stationary data and random walks.
5.2. STATIONARITY 119
5.2.1 Look at stationarity in simulated data
We will start by looking at white noise and a stationary AR(1) process from
simulated data. White noise is simply a string of random numbers drawn
from a Normal distribution. rnorm() with return random numbers drawn
from a Normal distribution. Use ?rnorm to understand what the function
requires.
TT <- 100
y <- rnorm(TT, mean = 0, sd = 1) # 100 random numbers
op <- par(mfrow = c(1, 2))
plot(y, type = "l")
acf(y)
Series y
1.0
2
0.8
1
0.6
ACF
0.4
0
y
0.2
−1
−2
−0.2
0 20 40 60 80 100 0 5 10 15 20
Index Lag
par(op)
Here we use ggplot() to plot 10 white noise time series.

dat <- data.frame(t = 1:TT, y = y)
p1 <- ggplot(dat, aes(x = t, y = y)) + geom_line() + ggtitle("1 white noise time seri
xlab("") + ylab("value")
ys <- matrix(rnorm(TT * 10), TT, 10)
ys <- data.frame(ys)
ys$id = 1:TT
ys2 <- melt(ys, id.var = "id")

p2 <- ggplot(ys2, aes(x = id, y = value, group = variable)) +
geom_line() + xlab("") + ylab("value") + ggtitle("10 white noise processes
grid.arrange(p1, p2, ncol = 1)
1 white noise time series

2
1
value
−1
−2
0 25 50 75 100
10 white noise processes
2
value
−2
0 25 50 75 100
These are stationary because the variance and mean (level) does not change
with time.
An AR(1) process is also stationary.

theta <- 0.8
nsim <- 10
ar1 <- arima.sim(TT, model = list(ar = theta))
plot(ar1)
2
0
ar1
−2
−4
0 20 40 60 80 100
Time
We can use ggplot to plot 10 AR(1) time series, but we need to change the
data to a data frame.
dat <- data.frame(t = 1:TT, y = ar1)
p1 <- ggplot(dat, aes(x = t, y = y)) + geom_line() + ggtitle("AR-1") +
xlab("") + ylab("value")
ys <- matrix(0, TT, nsim)
for (i in 1:nsim) ys[, i] <- as.vector(arima.sim(TT, model = list(ar = theta)))
ys <- data.frame(ys)
ys$id <- 1:TT
ys2 <- melt(ys, id.var = "id")

p2 <- ggplot(ys2, aes(x = id, y = value, group = variable)) +
geom_line() + xlab("") + ylab("value") + ggtitle("The variance of an AR-1 process
grid.arrange(p1, p2, ncol = 1)
Don't know how to automatically pick scale for object of type ts. Defaulting to contin
AR−1
2.5
0.0
value
−2.5
−5.0
0 25 50 75 100
The variance of an AR−1 process is steady
2.5
value
0.0
−2.5
−5.0
0 25 50 75 100
5.2.2 Stationary around a linear trend
Fluctuating around a linear trend is a very common type of stationarity used

in ARMA modeling and forecasting. This is just a stationary process, like
white noise or AR(1), around an linear trend up or down.
intercept <- 0.5
trend <- 0.1
sd <- 0.5
TT <- 20
wn <- rnorm(TT, sd = sd) #white noise
wni <- wn + intercept #white noise witn interept
wnti <- wn + trend * (1:TT) + intercept
See how the white noise with trend is just the white noise overlaid on a linear
trend.
op <- par(mfrow = c(1, 3))
plot(wn, type = "l")
plot(trend * 1:TT)
plot(wnti, type = "l")
3.0
2.0
1.0
2.5
1.5
0.5
2.0
trend * 1:TT
1.5
wnti
wn
1.0
0.0
1.0
0.5
0.5
−0.5
0.0
5 10 15 20 5 10 15 20 5 10 15 20
Index Index Index
par(op)
We can make a similar plot with ggplot.

dat <- data.frame(t = 1:TT, wn = wn, wni = wni, wnti = wnti)
p1 <- ggplot(dat, aes(x = t, y = wn)) + geom_line() + ggtitle("White noise")
p2 <- ggplot(dat, aes(x = t, y = wni)) + geom_line() + ggtitle("with non-zero mean")
p3 <- ggplot(dat, aes(x = t, y = wnti)) + geom_line() + ggtitle("with linear trend")
grid.arrange(p1, p2, p3, ncol = 3)
White noise with non−zero mean with linear trend

3
1.0 1.5
0.5 1.0 2
wnti
wni
wn
0.0 0.5
1
−0.5 0.0
5 10 15 20 5 10 15 20 5 10 15 20
t t t
We can make a similar plot with AR(1) data. Ignore the warnings about not
knowing how to pick the scale.
beta1 <- 0.8
ar1 <- arima.sim(TT, model = list(ar = beta1), sd = sd)
ar1i <- ar1 + intercept
ar1ti <- ar1 + trend * (1:TT) + intercept
dat <- data.frame(t = 1:TT, ar1 = ar1, ar1i = ar1i, ar1ti = ar1ti)
p4 <- ggplot(dat, aes(x = t, y = ar1)) + geom_line() + ggtitle("AR1")
p5 <- ggplot(dat, aes(x = t, y = ar1i)) + geom_line() + ggtitle("with non-zero
p6 <- ggplot(dat, aes(x = t, y = ar1ti)) + geom_line() + ggtitle("with linear
grid.arrange(p4, p5, p6, ncol = 3)
Don't know how to automatically pick scale for object of type ts. Defaulting t
AR1 with non−zero mean with linear trend

1.2
3.0
1.4
0.8 2.5
ar1ti
ar1i
ar1
2.0
1.0
0.4
1.5
0.6
0.0 1.0
5 10 15 20 5 10 15 20 5 10 15 20
t t t
5.2.3 Greek landing data
We will look at the anchovy data. Notice the two == in the subset call not
one =. We will use the Greek data before 1989 for the lab.
anchovy <- subset(landings, Species == "Anchovy" & Year <= 1989)$log.metric.tons
anchovyts <- ts(anchovy, start = 1964)
Plot the data.

plot(anchovyts, ylab = "log catch")
10.0
9.5
log catch
9.0
8.5
1965 1970 1975 1980 1985 1990
Time
Questions to ask.
• Does it have a trend (goes up or down)? Yes, definitely
• Does it have a non-zero mean? Yes
• Does it look like it might be stationary around a trend? Maybe
5.3 Dickey-Fuller and Augmented Dickey-

Fuller tests
5.3.1 Dickey-Fuller test
The Dickey-Fuller test is testing if φ = 0 in this model of the data:

yt = α + βt + φyt−1 + et
which is written as
∆yt = yt − yt−1 = α + βt + γyt−1 + et
where yt is your data. It is written this way so we can do a linear regression
of ∆yt against t and yt−1 and test if γ is different from 0. If γ = 0, then we
have a random walk process. If not and −1 < 1 + γ < 1, then we have a
stationary process.
5.3. DICKEY-FULLER AND AUGMENTED DICKEY-FULLER TESTS127
5.3.2 Augmented Dickey-Fuller test
The Augmented Dickey-Fuller test allows for higher-order autoregressive

processes by including ∆yt−p in the model. But our test is still if γ = 0.
∆yt = α + βt + γyt−1 + δ1 ∆yt−1 + δ2 ∆yt−2 + . . .
The null hypothesis for both tests is that the data are non-stationary. We
want to REJECT the null hypothesis for this test, so we want a p-value of
less that 0.05 (or smaller).
5.3.3 ADF test using adf.test()
The adf.test() from the tseries package will do a Augmented Dickey-Fuller

test (Dickey-Fuller if we set lags equal to 0) with a trend and an intercept.
Use ?adf.test to read about this function. The function is
adf.test(x, alternative = c("stationary", "explosive"),
k = trunc((length(x)-1)^(1/3)))
x are your data. alternative="stationary" means that −2 < γ < 0
(−1 < φ < 1) and alternative="explosive" means that is outside these
bounds. k is the number of δ lags. For a Dickey-Fuller test, so only up to
AR(1) time dependency in our stationary process, we set k=0 so we have no
δ’s in our test. Being able to control the lags in our test, allows us to avoid a
stationarity test that is too complex to be supported by our data.
5.3.3.1 Test on white noise
Let’s start by doing the test on data that we know are stationary, white
noise. We will use an Augmented Dickey-Fuller test where we use the default
number of lags (amount of time-dependency) in our test. For a time-series of
100, this is 4.
TT <- 100
wn <- rnorm(TT) # white noise
tseries::adf.test(wn)
Warning in tseries::adf.test(wn): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: wn
Dickey-Fuller = -4.8309, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The null hypothesis is rejected.
Try a Dickey-Fuller test. This is testing with a null hypothesis of AR(1)
stationarity versus a null hypothesis with AR(4) stationarity when we used
the default k.
tseries::adf.test(wn, k = 0)
Warning in tseries::adf.test(wn, k = 0): p-value smaller than printed p-value
data: wn
Notice that the test-statistic is smaller. This is a more restrictive test and we
can reject the null with a higher significance level.
5.3.3.2 Test on white noise with trend
Try the test on white noise with a trend and intercept.

intercept <- 1
wnt <- wn + 1:TT + intercept
tseries::adf.test(wnt)
Warning in tseries::adf.test(wnt): p-value smaller than printed p-value

data: wnt
The null hypothesis is still rejected. adf.test() uses a model that allows an
intercept and trend.
5.3.3.3 Test on random walk
Let’s try the test on a random walk (nonstationary).

rw <- cumsum(rnorm(TT))
tseries::adf.test(rw)
data: rw
The null hypothesis is NOT rejected as the p-value is greater than 0.05.
Try a Dickey-Fuller test.
tseries::adf.test(rw, k = 0)
data: rw
Notice that the test-statistic is larger.
5.3.3.4 Test the anchovy data
tseries::adf.test(anchovyts)
data: anchovyts
The p-value is greater than 0.05. We cannot reject the null hypothesis. The
null hypothesis is that the data are non-stationary.
5.3.4 ADF test using ur.df()
The ur.df() Augmented Dickey-Fuller test in the urca package gives us a

bit more information on and control over the test.
ur.df(y, type = c("none", "drift", "trend"), lags = 1,
selectlags = c("Fixed", "AIC", "BIC"))
The ur.df() function allows us to specify whether to test stationarity around
a zero-mean with no trend, around a non-zero mean with no trend, or around
a trend with an intercept. This can be useful when we know that our data
have no trend, for example if you have removed the trend already. ur.df()
allows us to specify the lags or select them using model selection.
5.3.4.1 Test on white noise
Let’s first do the test on data we know is stationary, white noise. We have to
choose the type and lags. If you have no particular reason to not include an
intercept and trend, then use type="trend". This allows both intercept and
trend. When you might you have a particular reason not to use "trend"?
When you have removed the trend and/or intercept.
Next you need to chose the lags. We will use lags=0 to do the Dickey-Fuller
test. Note the number of lags you can test will depend on the amount of data
that you have. adf.test() used a default of trunc((length(x)-1)ˆ(1/3))
for the lags, but ur.df() requires that you pass in a value or use a fixed
default of 1.
lags=0 is fitting this model to the data. You are testing if the effect for
z.lag.1 is 0.
z.diff = gamma * z.lag.1 + intercept + trend * tt z.diff means
∆yt and z.lag.1 is yt−1 .
When you use summary() for the output from ur.df(), you will see the
estimated values for γ (denoted z.lag.1), intercept and trend. If you see
*** or ** on the coefficients list for z.lag.1, it indicates that the effect of
z.lag.1 is significantly different than 0 and this supports the assumption of
stationarity.
The intercept and tt estimates indicate where there is a non-zero level
(intercept) or linear trend (tt).
wn <- rnorm(TT)
test <- urca::ur.df(wn, type = "trend", lags = 0)
summary(test)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt)
Residuals:
Min 1Q Median 3Q Max
-2.2170 -0.6654 -0.1210 0.5311 2.6277
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0776865 0.2037709 0.381 0.704
z.lag.1 -1.0797598 0.1014244 -10.646 <2e-16 ***
tt 0.0004891 0.0035321 0.138 0.890
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.004 on 96 degrees of freedom

Multiple R-squared: 0.5416, Adjusted R-squared: 0.532
F-statistic: 56.71 on 2 and 96 DF, p-value: < 2.2e-16
Value of test-statistic is: -10.646 37.806 56.7083
Critical values for test statistics:

1pct 5pct 10pct
tau3 -4.04 -3.45 -3.15
phi2 6.50 4.88 4.16
phi3 8.73 6.49 5.47
The coefficient part of the summary indicates that z.lag.1 is different than
0 (so stationary) and no support for intercept or trend.
Notice that the test statistic is LESS than the critical value for tau3 at 5
percent. This means the null hypothesis is rejected at α = 0.05, a standard
level for significance testing.
5.3.4.2 When you might want to use ur.df()
If you remove the trend (and/or level) from your data, the ur.df() test
allows you to increase the power of the test by removing the trend and/or
level from the model.
5.4 KPSS test
The null hypothesis for the KPSS test is that the data are stationary. For
this test, we do NOT want to reject the null hypothesis. In other words, we
want the p-value to be greater than 0.05 not less than 0.05.
5.4. KPSS TEST 133
5.4.1 Test on simulated data
Let’s try the KPSS test on white noise with a trend. The default is a null
hypothesis with no trend. We will change this to null="Trend".
tseries::kpss.test(wnt, null = "Trend")
Warning in tseries::kpss.test(wnt, null = "Trend"): p-value greater than printed

p-value
KPSS Test for Trend Stationarity
data: wnt
KPSS Trend = 0.045579, Truncation lag parameter = 4, p-value = 0.1
The p-value is greater than 0.05. The null hypothesis of stationarity around
a trend is not rejected.
Let’s try the KPSS test on white noise with a trend but let’s use the default
of stationary with no trend.
tseries::kpss.test(wnt, null = "Level")
Warning in tseries::kpss.test(wnt, null = "Level"): p-value smaller than printed

p-value
KPSS Test for Level Stationarity
data: wnt
KPSS Level = 2.1029, Truncation lag parameter = 4, p-value = 0.01
The p-value is less than 0.05. The null hypothesis of stationarity around
a level is rejected. This is white noise around a trend so it is definitely a
stationary process but has a trend. This illustrates that you need to be
thoughtful when applying stationarity tests.
5.4.2 Test the anchovy data
Let’s try the anchovy data.

kpss.test(anchovyts, null = "Trend")
KPSS Test for Trend Stationarity
data: anchovyts
KPSS Trend = 0.14779, Truncation lag parameter = 2, p-value = 0.04851
The null is rejected (p-value less than 0.05). Again stationarity is not sup-
ported.
5.5 Dealing with non-stationarity

The anchovy data have failed both tests for the stationarity, the Augmented
Dickey-Fuller and the KPSS test. How do we fix this? The approach in the
Box-Jenkins method is to use differencing.
Let’s see how this works with random walk data. A random walk is non-
stationary but the difference is white noise so is stationary:
xt − xt−1 = et , et ∼ N (0, σ)
adf.test(diff(rw))
data: diff(rw)
kpss.test(diff(rw))
Warning in kpss.test(diff(rw)): p-value greater than printed p-value

5.5. DEALING WITH NON-STATIONARITY 135
data: diff(rw)
If we difference random walk data, the null is rejected for the ADF test and
not rejected for the KPSS test. This is what we want.
Let’s try a single difference with the anchovy data. A single difference means
dat(t)-dat(t-1). We get this using diff(anchovyts).
diff1dat <- diff(anchovyts)
adf.test(diff1dat)
data: diff1dat
kpss.test(diff1dat)
Warning in kpss.test(diff1dat): p-value greater than printed p-value
data: diff1dat
If a first difference were not enough, we would try a second difference which
is the difference of a first difference.
diff2dat <- diff(diff1dat)
adf.test(diff2dat)
Warning in adf.test(diff2dat): p-value smaller than printed p-value
data: diff2dat
The null hypothesis of a random walk is now rejected so you might think that
a 2nd difference is needed for the anchovy data. However the actual problem
is that the default for adf.test() includes a trend but we removed the trend
with our first difference. Thus we included an unneeded trend parameter in
our test. Our data are not that long and this affects the result.
Let’s repeat without the trend and we’ll see that the null hypothesis is rejected.
The number of lags is set to be what would be used by adf.test(). See
?adf.test.
k <- trunc((length(diff1dat) - 1)^(1/3))
test <- urca::ur.df(diff1dat, type = "drift", lags = k)
summary(test)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.37551 -0.13887 0.04753 0.13277 0.28223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.11062 0.06165 1.794 0.08959 .
z.lag.1 -2.16711 0.64900 -3.339 0.00365 **
z.diff.lag1 0.58837 0.47474 1.239 0.23113
z.diff.lag2 0.13273 0.25299 0.525 0.60623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.6. SUMMARY: STATIONARITY TESTING 137
Residual standard error: 0.207 on 18 degrees of freedom

Multiple R-squared: 0.7231, Adjusted R-squared: 0.677
F-statistic: 15.67 on 3 and 18 DF, p-value: 2.918e-05
Value of test-statistic is: -3.3391 5.848
Critical values for test statistics:

1pct 5pct 10pct
tau2 -3.75 -3.00 -2.63
phi1 7.88 5.18 4.12
5.5.1 ndiffs()
As an alternative to trying many different differences and remembering to

include or not include the trend or level, you can use the ndiffs() function
in the forecast package. This automates finding the number of differences
needed.
forecast::ndiffs(anchovyts, test = "kpss")
[1] 1
forecast::ndiffs(anchovyts, test = "adf")
[1] 1
One difference is required to pass both the ADF and KPSS stationarity tests.
5.6 Summary: stationarity testing
The basic stationarity diagnostics are the following

• Plot your data. Look for
– An increasing trend
– A non-zero level (if no trend)
– Strange shocks or steps in your data (indicating something dramatic

changed like the data collection methodology)
• Apply stationarity tests
– adf.test() p-value should be less than 0.05 (reject null)
– kpss.test() p-value should be greater than 0.05 (do not reject
null)
• If stationarity tests are failed, then try differencing to correct
– Try ndiffs() in the forecast package or manually try different
differences.
5.7 Estimating ARMA parameters

Let’s start with fitting to simulated data.
5.7.1 AR(2) data
Simulate AR(2) data and add a mean level so that the data are not mean 0.
xt = 0.8xt−1 + 0.1xt−2 + et yt = xt + m
m <- 1
ar2 <- arima.sim(n = 1000, model = list(ar = c(0.8, 0.1))) +
m
To see info on arima.sim(), type ?arima.sim.
5.7.2 Fit with Arima()
Fit an ARMA(2) with level to the data.

forecast::Arima(ar2, order = c(2, 0, 0), include.constant = TRUE)
Series: ar2
5.7. ESTIMATING ARMA PARAMETERS 139
Coefficients:
ar1 ar2 mean
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332

AIC=2827.54 AICc=2827.58 BIC=2847.17
Note, the model being fit by Arima() is not this model
yt = m + 0.8yt−1 + 0.1yt−2 + et
It is this model:
(yt − m) = 0.8(yt−1 − m) + 0.1(yt−2 − m) + et

or as written above:
xt = 0.8xt−1 + 0.1xt−2 + et yt = xt + m
We could also use arima() to fit to the data.

arima(ar2, order = c(2, 0, 0), include.mean = TRUE)
Warning in arima(ar2, order = c(2, 0, 0), include.mean = TRUE): possible

convergence problem: optim gave code = 1
Call:
arima(x = ar2, order = c(2, 0, 0), include.mean = TRUE)
Coefficients:
ar1 ar2 intercept
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332

However we will not be using arima() directly because for if we have differ-
enced data, it will not allow us to include and estimated mean level. Unless
we have transformed our differenced data in a way that ensures it is mean

zero, then we want to include a mean.
Try increasing the length of the simulated data (from 100 to 1000 say) and see
how that affects your parameter estimates. Run the simulation a few times.
5.7.3 AR(1) simulated data
ar1 <- arima.sim(n = 100, model = list(ar = c(0.8))) + m

forecast::Arima(ar1, order = c(1, 0, 0), include.constant = TRUE)
Series: ar1
Coefficients:
ar1 mean
0.7091 0.4827
s.e. 0.0705 0.3847

AIC=317.7 AICc=317.95 BIC=325.51
5.7.4 ARMA(1,2) simulated data
Simulate ARMA(1,2)
xt = 0.8xt−1 + et + 0.8et−1 + 0.2et−2
arma12 = arima.sim(n = 100, model = list(ar = c(0.8), ma = c(0.8,

0.2))) + m
forecast::Arima(arma12, order = c(1, 0, 2), include.constant = TRUE)
Series: arma12
Coefficients:
5.7. ESTIMATING ARMA PARAMETERS 141
ar1 ma1 ma2 mean

0.8138 0.8599 0.1861 0.3350
s.e. 0.0646 0.1099 0.1050 0.8145

AIC=246.03 AICc=246.67 BIC=259.06
We will up the number of data points to 1000 because models with a MA
component take a lot of data to estimate. Models with MA(>1) are not very
practical for fisheries data for that reason.
5.7.5 These functions work for data with missing val-

ues
Create some AR(2) data and then add missing values (NA).
ar2miss <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
ar2miss[sample(100, 50)] <- NA
plot(ar2miss, type = "l")
title("many missing values")
many missing values

4
2
ar2miss
0
−2
−4
0 20 40 60 80 100
Time
Fit
fit <- forecast::Arima(ar2miss, order = c(2, 0, 0))

fit
Series: ar2miss
Coefficients:
ar1 ar2 mean
1.0625 -0.2203 -0.0586
s.e. 0.1555 0.1618 0.6061

AIC=167.72 AICc=168.15 BIC=178.06
Note fitted() does not return the expected value at time t. It is the expected
value of yt given the data up to time t − 1.
plot(ar2miss, type = "l")
title("many missing values")
lines(fitted(fit), col = "blue")
many missing values

4
2
ar2miss
0
−2
−4
0 20 40 60 80 100
Time
It is easy enough to get the expected value of yt for all the missing values but
we’ll learn to do that when we learn the MARSS package and can apply the
5.8. ESTIMATING THE ARMA ORDERS 143
Kalman Smoother in that package.
5.8 Estimating the ARMA orders
We will use the auto.arima() function in forecast. This function will

estimate the level of differencing needed to make our data stationary and
estimate the AR and MA orders using AICc (or BIC if we choose).
5.8.1 Example: model selection for AR(2) data
forecast::auto.arima(ar2)
Series: ar2
Coefficients:
0.2795 0.5938 0.4861 -0.0943 0.9553
s.e. 1.1261 1.0413 1.1284 0.1887 0.3398

AIC=2831.15 AICc=2831.23 BIC=2860.59
Works with missing data too though might not estimate very close to the
true model form.
forecast::auto.arima(ar2miss)
Series: ar2miss
ARIMA(0,1,0)

AIC=166.15 AICc=166.19 BIC=168.72
5.8.2 Fitting to 100 simulated data sets
Let’s fit to 100 simulated data sets and see how often the true (generating)
model form is selected.
save.fits <- rep(NA, 100)
for (i in 1:100) {
a2 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
fit <- auto.arima(a2, seasonal = FALSE, max.d = 0, max.q = 0)
save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
save.fits
1-0 2-0 3-0
71 22 7
auto.arima() uses AICc for selection by default. You can change that to
AIC or BIC using ic="aic" or ic="bic".
Repeat the simulation using AIC and BIC to see how the choice of the infor-
mation criteria affects the model that is selected.
5.8.3 Trace=TRUE
We can set Trace=TRUE to see what models auto.arima() fit.

forecast::auto.arima(ar2, trace = TRUE)
Fitting models using approximations to speed things up...
ARIMA(2,0,2) with non-zero mean : 2824.88

ARIMA(0,0,0) with zero mean : 4602.31

Now re-fitting the best model(s) without approximations...
Best model: ARIMA(2,0,2) with non-zero mean

Series: ar2
Coefficients:
0.2795 0.5938 0.4861 -0.0943 0.9553
s.e. 1.1261 1.0413 1.1284 0.1887 0.3398

AIC=2831.15 AICc=2831.23 BIC=2860.59
5.8.4 stepwise=FALSE
We can set stepwise=FALSE to use an exhaustive search. The model may be

different than the result from the non-exhaustive search.
forecast::auto.arima(ar2, trace = TRUE, stepwise = FALSE)
Fitting models using approximations to speed things up...


Now re-fitting the best model(s) without approximations...
Best model: ARIMA(2,0,0) with non-zero mean

Series: ar2
Coefficients:
ar1 ar2 mean
0.7684 0.1387 0.9561
s.e. 0.0314 0.0314 0.3332

AIC=2827.54 AICc=2827.58 BIC=2847.17
5.8.5 Fit to the anchovy data
fit <- auto.arima(anchovyts)

fit
Series: anchovyts
ARIMA(0,1,1) with drift
Coefficients:
ma1 drift
-0.6685 0.0542
s.e. 0.1977 0.0142
sigma^2 estimated as 0.04037: log likelihood=5.39

AIC=-4.79 AICc=-3.65 BIC=-1.13
Note arima() writes a MA model like:
xt = et + b1 et−1 + b2 et−2
while many authors use this notation:
xt = et − θ1 et−1 − θ2 et−2
so the MA parameters reported by auto.arima() will be NEGATIVE of that

reported in Stergiou and Christou (1996) who analyze these same data. Note,
in Stergiou and Christou, the model is written in backshift notation on page
112. To see the model as the equation above, I translated from backshift to
non-backshift notation.
5.9 Check residuals
We can do a test of autocorrelation of the residuals with Box.test() with

fitdf adjusted for the number of parameters estimated in the fit. In our
case, MA(1) and drift parameters.
res <- resid(fit)
Box.test(res, type = "Ljung-Box", lag = 12, fitdf = 2)
Box-Ljung test
data: res
X-squared = 5.1609, df = 10, p-value = 0.8802
checkresiduals() in the forecast package will automate this test and show
some standard diagnostics plots.
forecast::checkresiduals(fit)
5.10. FORECAST FROM A FITTED ARIMA MODEL 149
Residuals from ARIMA(0,1,1) with drift
0.2
0.0
−0.2
−0.4
1965 1970 1975 1980 1985 1990
0.4 10.0
0.2 7.5
count
ACF
0.0 5.0
−0.2 2.5
−0.4 0.0
1 2 3 4 5 6 7 8 9 −0.4 0.0 0.4
Lag residuals
Ljung-Box test
data: Residuals from ARIMA(0,1,1) with drift

Q* = 1.0902, df = 3, p-value = 0.7794
Model df: 2. Total lags used: 5
5.10 Forecast from a fitted ARIMA model
We can create a forecast from our anchovy ARIMA model using forecast().
The shading is the 80% and 95% prediction intervals.
fr <- forecast::forecast(fit, h = 10)
plot(fr)
Forecasts from ARIMA(0,1,1) with drift
10.0 10.5 11.0

9.5
9.0
8.5
1965 1970 1975 1980 1985 1990 1995 2000
5.11 Seasonal ARIMA model
The Chinook data are monthly and start in January 1990. To make this into
a ts object do
chinookts <- ts(chinook$log.metric.tons, start = c(1990, 1),
frequency = 12)
start is the year and month and frequency is the number of months in the
year.
Use ?ts to see more examples of how to set up ts objects.
5.11.1 Plot seasonal data
plot(chinookts)
5.11. SEASONAL ARIMA MODEL 151
6
4
chinookts
2
0
−2
1990 2000 2010 2020 2030 2040
Time
5.11.2 auto.arima() for seasonal ts
auto.arima() will recognize that our data has season and fit a seasonal
ARIMA model to our data by default. Let’s define the training data up to
1998 and use 1999 as the test data.
traindat <- window(chinookts, c(1990, 10), c(1998, 12))
testdat <- window(chinookts, c(1999, 1), c(1999, 12))
fit <- forecast::auto.arima(traindat)
fit
Series: traindat
ARIMA(1,0,0)(0,1,0)[12] with drift
Coefficients:
ar1 drift
0.3676 -0.0320
s.e. 0.1335 0.0127

AIC=220.73 AICc=221.02 BIC=228.13
Use ?window to understand how subsetting a ts object works.
5.12 Forecast using a seasonal model

Forecasting works the same using the forecast() function.
fr <- forecast::forecast(fit, h = 12)
plot(fr)
points(testdat)
Forecasts from ARIMA(1,0,0)(0,1,0)[12] with drift

5
0
−5
1992 1994 1996 1998 2000

5.13. PROBLEMS 153
5.13 Problems
For these problems, use the catch landings from Greek waters
(greeklandings) and the Chinook landings (chinook) in Washington
data. Load the data as follows:
data(greeklandings, package = "atsalibrary")
landings <- greeklandings
data(chinook, package = "atsalibrary")
chinook <- chinook.month
1. Augmented Dickey-Fuller tests in R.

a. What is the null hypothesis for the Dickey-Fuller and Augmented
Dickey-Fuller tests?
b. How do the Dickey-Fuller and Augmented Dickey-Fuller tests
differ?
c. For adf.test(), does the test allow the data to have a non-zero
level? Does the test allow the data to be stationarity around a
trend (a linear slope)?
d. For ur.df(), what does type = “none”, “drift”, and “trend” mean?
Which one gives you the same result as adf.test()? What do you
have to set the lags equal to get the default lags in adf.test()?
e. For ur.df(), how do you determine if the null hypothesis is re-
jected?
f. For ur.df(), how do you determine if there is a significant trend
in the data? How do you determine if the intercept is different
than zero?
2. KPSS tests in R.
a. What is the null hypothesis for the KPSS test?
b. For kpss.test(), what does setting null equal to “Level” versus
“Trend” change?
3. Repeat the stationarity tests for sardine 1964-1987 in the landings data
set. Here is how to set up the data for another species.
datdf <- subset(landings, Species == "Sardine")
dat <- ts(datdf$log.metric.tons, start = 1964)
dat <- window(dat, start = 1964, end = 1987)
a. Do a Dickey-Fuller (DF) test using ùr.df()` and àdf.test()`. You have to set the
a. Do an Augmented Dickey-Fuller (ADF) test using ùr.df()`. How did you choos
b. Do a KPSS test using `kpss.test()`. What does the result tell you?
4. Using the anchovy 1964-1987 data, fit using auto.arima() with
trace=TRUE.
forecast::auto.arima(anchovy, trace = TRUE)
a. Fit each of the models listed using Àrima()` and show that you can produce
b. What models are within $\Delta$AIC of 2? What is different about these mode
5. Repeat the stationarity tests and differencing tests for anchovy using
the following two time ranges: 1964-1987 and 1988-2007. The following
shows you how to subset the data:
datdf <- subset(landings, Species == "Anchovy")
dat <- ts(datdf$log.metric.tons, start = 1964)
dat64.87 <- window(dat, start = 1964, end = 1987)
a. Plot the time series for the two time periods. For the `kpss.test()`, which
a. Do the conclusions regarding stationarity and the amount of differencing ne
c. Fit each time period using àuto.arima()`. Do the selected models change?
d. Discuss the best models for each time period. How are they different?
e. You cannot compare the AIC values for an Arima(0,1,0) and Arima(0,0,1). Why
6. For the anchovy 1964-2007 data, use auto.arima() with stepwise=FALSE
to fit models.
a. find the set of models within ∆AICc = 2 of the top model.
b. Use Arima() to fit the models with Inf or -Inf in the list. Does the
set of models within ∆AICc = 2 change?
c. Create a 5-year forecast for each of the top 3 models according to
AICc.
d. How do the forecasts differ in trend and size of prediction intervals?

7. Using the chinook data set,
a. Set up a monthly time series object for the Chinook log metric
tons catch for Jan 1990 to Dec 2015.
b. Fit a seasonal model to the Chinook Jan 1990 to Dec 1999 data
using auto.arima().
c. Create a forecast through 2015 using the model in part b.

5.13. PROBLEMS 155
d. Plot the forecast with the 2014 and 2015 actual landings added as
data points.
e. The model from part b has drift. Fit this model using Arima()
without drift and compare the 2015 forecast with this model.
Chapter 6
Univariate state-space models
This chapter will show you how to fit some basic univariate state-space
models using the MARSS package, the StructTS() function, and JAGS
code. This chapter will also introduce you to the idea of writing AR(1) models
in state-space form.
Data and packages
All the data used in the chapter are in the MARSS package. The other
required packages are stats (normally loaded by default when starting R),
datasets and forecast. Install the packages, if needed, and load:
library(stats)
library(MARSS)
library(forecast)
library(datasets)
To run the JAGS code example (optional), you will also need JAGS installed
and the R2jags, rjags and coda R packages. To run the Stan code example
(optional), you will need the rstan package.
157
158 CHAPTER 6. UNIVARIATE STATE-SPACE MODELS
6.1 Fitting a state-space model with MARSS
The MARSS package fits multivariate auto-regressive models of this form:
xt = Bxt−1 + u + wt where wt ∼ N(0, Q)

yt = Zxt + a + vt where vt ∼ N(0, R) (6.1)
x0 = µ
To fit your time series model with the MARSS package, you need to put
your model into the form above. The B, Z, u, a, Q, R and µ are parameters
that are (potentially) estimated. The y are your data. The x are the hidden
state(s). Everything in bold is a matrix; if it is a small bolded letter, it is a
matrix with 1 column.
Important: In the state-space model equation, y is always the data and x is a
hidden random walk estimated from the data.
A basic MARSS() call looks like fit=MARSS(y, model=list(...)). The ar-
gument model tells the function what form the parameters take. The list
has the elements with the names: B, U, Q, etc. The names correspond to the
parameters with the same names in Equation (6.1) except that µ is called x0.
tinitx indicates whether the initial x is specified at t = 0 so x0 or t = 1 so
x1 .
Here’s an example. Let’s say we want to fit a univariate AR(1) model observed
with error. Here is that model:
xt = bxt−1 + wt where wt ∼ N(0, q)

yt = xt + vt where vt ∼ N(0, r) (6.2)
x0 = µ
To fit this with MARSS(), we need to write Equation (6.2) as Equation (6.1).
Equation (6.1) is in MATRIX form. In the model list, the parameters must
be written EXACTLY like they would be written for Equation (6.1). For
example, 1 is the number 1 in R. It is not a matrix:
class(1)
[1] "numeric"
6.1. FITTING A STATE-SPACE MODEL WITH MARSS 159
If you need a 1 (or 0) in your model, you need to pass in the parameter as a
1 × 1 matrix: matrix(1).
With that mind, our model list for Equation (6.2) is:
mod.list <- list(B = matrix(1), U = matrix(0), Q = matrix("q"),
Z = matrix(1), A = matrix(0), R = matrix("r"), x0 = matrix("mu"),
tinitx = 0)
We can simulate some AR(1) plus error data like so

q <- 0.1
r <- 0.1
n <- 100
y <- cumsum(rnorm(n, 0, sqrt(q))) + rnorm(n, 0, sqrt(r))
And then fit with MARSS() using mod.list above:

fit <- MARSS(y, model = mod.list)
Success! abstol and log-log tests passed at 16 iterations.

Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 16 iterations.
Log-likelihood: -65.70444
AIC: 137.4089 AICc: 137.6589
Estimate
R.r 0.1066
Q.q 0.0578
x0.mu -0.2024
Initial states (x0) defined at t=0
Standard errors have not been calculated.

Use MARSSparamCIs to compute CIs and bias estimates.
If we wanted to fix q = 0.1, then Q = [0.1] (a 1 × 1 matrix with 0.1). We just
change mod.list$Q and re-fit:

mod.list$Q <- matrix(0.1)
fit <- MARSS(y, model = mod.list)
6.2 Examples using the Nile river data
We will use the data from the Nile River (Figure 6.1). We will fit different
flow models to the data and compare the models with AIC.
library(datasets)
dat <- as.vector(Nile)
1400
1200
Flow volume
1000
800
600
1880 1900 1920 1940 1960
Year
Figure 6.1: The Nile River flow volume 1871 to 1970 (Nile dataset in R).
6.2.1 Flat level model
We will start by modeling these data as a simple average river flow with
variability around some level µ.
6.2. EXAMPLES USING THE NILE RIVER DATA 161
yt = µ + vt where vt ∼ N(0, r) (6.3)

where yt is the river flow volume at year t.
We can write this model as a univariate state-space model as follows. We use
xt to model the average flow level. yt is just an observation of this flat xt .
Work through x1 , x2 , . . . starting from x0 to convince yourself that xt will
always equal µ.
xt = 1 × xt−1 + 0 + wt where wt ∼ N(0, 0)

yt = 1 × xt + 0 + vt where vt ∼ N(0, r) (6.4)
x0 = µ
The model is specified as a list as follows:

mod.nile.0 <- list(B = matrix(1), U = matrix(0), Q = matrix(0),
tinitx = 0)
We then fit the model:

kem.0 <- MARSS(dat, model = mod.nile.0)
Output not shown, but here are the estimates and AICc.
c(coef(kem.0, type = "vector"), LL = kem.0$logLik, AICc = kem.0$AICc)
R.r x0.mu LL AICc

28351.5675 919.3500 -654.5157 1313.1552
6.2.2 Linear trend in flow model
Figure 6.2 shows the fit for the flat average river flow model. Looking at the
data, we might expect that a declining average river flow would be better. In
MARSS form, that model would be:
xt = 1 × xt−1 + u + wt where wt ∼ N(0, 0)

yt = 1 × xt + 0 + vt where vt ∼ N(0, r) (6.5)
x0 = µ
where u is now the average per-year decline in river flow volume. The model
is specified as follows:
mod.nile.1 <- list(B = matrix(1), U = matrix("u"), Q = matrix(0),
tinitx = 0)
We then fit the model:

Here are the estimates, log-likelihood and AICc:

R.r U.u x0.mu LL AICc

22213.595453 -2.692106 1054.935067 -642.315910 1290.881821
Figure 6.2 shows the fits for the two models with deterministic models (flat
and declining) for mean river flow along with their AICc values (smaller AICc
is better). The AICc for the model with a declining river flow is lower by
over 20 (which is a lot).
6.2.3 Stochastic level model
Looking at the flow levels, we might suspect that a model that allows the
average flow to change would model the data better and we might suspect
that there have been sudden, and anomalous, changes in the river flow
level. We will now model the average river flow at year t as a random walk,
specifically an autoregressive process which means that average river flow is
year t is a function of average river flow in year t − 1.
xt = xt−1 + wt where wt ∼ N(0, q)

x0 = µ
As before, yt is the river flow volume at year t. xt is the mean level. The
model is specified as:
6.2. EXAMPLES USING THE NILE RIVER DATA 163
mod.nile.2 = list(B = matrix(1), U = matrix(0), Q = matrix("q"),

tinitx = 0)
We could also use the text shortcuts to specify the model. Because R and Q
are 1 × 1 matrices, “unconstrained”, “diagonal and unequal”, “diagonal and
equal” and “equalvarcov” will all lead to a 1 × 1 matrix with one estimated
element. For a and u, the following shortcut could be used:
A <- "zero"
U <- "zero"
Because x0 is 1 × 1, it could be specified as “unequal”, “equal” or “uncon-

strained”.

R.r Q.q x0.mu LL AICc

15065.6121 1425.0030 1111.6338 -637.7631 1281.7762
6.2.4 Stochastic level model with drift
We can add a drift to term to our random walk; the u in the process model
(x) is the drift term. This causes the random walk to tend to trend up or
down.
xt = xt−1 + u + wt where wt ∼ N(0, q)

x0 = µ
The model is then specified by changing U to indicate that a u is estimated:
mod.nile.3 = list(B = matrix(1), U = matrix("u"), Q = matrix("q"),
tinitx = 0)

R.r U.u Q.q x0.mu LL AICc

15585.278194 -3.248793 1088.987455 1124.044484 -637.302692 1283.026436
Figure 6.2 shows all the models along with their AICc values.
6.3 The StructTS function
The StructTS function in the stats package in R will also fit the stochastic
level model:
fit.sts <- StructTS(dat, type = "level")
fit.sts
Call:
StructTS(x = dat, type = "level")
Variances:
level epsilon
1469 15099
The estimates from StructTS() will be different (though similar) from

MARSS() because StructTS() uses x1 = y1 , that is the hidden state at
t = 1 is fixed to be the data at t = 1. That is fine if you have a long data
set, but would be disastrous for the short data sets typical in fisheries and
ecology.
StructTS() is much, much faster for long time series. The example in
?StructTS is pretty much instantaneous with StructTS() but takes minutes
with the EM algorithm that is the default in MARSS(). With the BFGS
algorithm, it is much closer to StructTS():
6.3. THE STRUCTTS FUNCTION 165
trees <- window(treering, start = 0)

fitts <- StructTS(trees, type = "level")
fitem <- MARSS(as.vector(trees), mod.nile.2)
fitbf <- MARSS(as.vector(trees), mod.nile.2, method = "BFGS")
Note that mod.nile.2 specifies a univariate stochastic level model so we can

use it just fine with other univariate data sets.
In addition, fitted(fit.sts) where fit.sts is a fit from StructTS() is

very different than fit.marss$states from MARSS().
t = 10
fitted(fit.sts)[t]
[1] 1162.904
is the expected value of yt+1 (in this case y11 since we set t = 10) given
the data up to yt (in this case, up to y10 ). It is called the one-step ahead
prediction.
We are not going to use the one-step ahead predictions unless we are forecasting
or doing cross-validation.
Typically, when we analyze fisheries and ecological data, we want to know

the estimate of the state, the xt , given ALL the data. For example, we might
need an estimate of the population size in year 1990 given a time series of
counts from 1930 to 2015. We don’t want to use only the data up to 1989;
we want to use all the information. fit.marss$states from MARSS() is the
expected value of xt given all the data. For the stochastic level model, that is
equal to the expected value of yt given all the data except yt .
If you needed the one-step predictions from MARSS(), you can get them from
the Kalman filter output:
kf = print(kem.2, what = "kfs")
kf$xtt1[1, t]
Passing in what="kfs" returns the Kalman filter/smoother output. The

expected value of xt conditioned on y1 to yt−1 is in kf$xtt1. The expected
value of xt conditioned on all the data is in kf$xtT.
1000 1400
model 0, AICc= 1313
Flow volume
600
1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970
1000 1400
model 1, AICc= 1291

Flow volume
600
1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970
1000 1400
model 2, AICc= 1282

Flow volume
600
1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970
1000 1400
model 3, AICc= 1283

Flow volume
600
1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970
Figure 6.2: The Nile River flow volume with the model estimated flow rates
(solid lines). The bottom model is a stochastic level model, meaning there
isn’t one level line. Rather the level line is a distribution that has a mean
and standard deviation. The solid state line in the bottom plots is the mean
of the stochastic level and the 2 standard deviations are shown. The other
two models are deterministic level models so the state is not stochastic and
does not have a standard deviation.
6.4. COMPARING MODELS WITH AIC AND MODEL WEIGHTS 167
6.4 Comparing models with AIC and model

weights
To get the AIC or AICc values for a model fit from a MARSS fit, use
fit$AIC or fit$AICc. The log-likelihood is in fit$logLik and the number
of estimated parameters in fit$num.params. For fits from other functions,
try AIC(fit) or look at the function documentation.
Let’s put the AICc values 3 Nile models together:
nile.aic = c(kem.0$AICc, kem.1$AICc, kem.2$AICc, kem.3$AICc)
Then we calculate the AICc minus the minus AICc in our model set and
compute the model weights. ∆AIC is the AIC values minus the minimum
AIC value in your model set.
delAIC <- nile.aic - min(nile.aic)
relLik <- exp(-0.5 * delAIC)
aicweight <- relLik/sum(relLik)
And this leads to our model weights table:

aic.table <- data.frame(AICc = nile.aic, delAIC = delAIC, relLik = relLik,
weight = aicweight)
rownames(aic.table) <- c("flat level", "linear trend", "stoc level",
"stoc level w drift")
Here the table is printed using round() to limit the number of digits shown.
round(aic.table, digits = 3)
AICc delAIC relLik weight

flat level 1313.155 31.379 0.000 0.000
linear trend 1290.882 9.106 0.011 0.007
stoc level 1281.776 0.000 1.000 0.647
stoc level w drift 1283.026 1.250 0.535 0.346
One thing to keep in mind when comparing models within a set of models is
that the model set needs to include at least one model that can fit the data
reasonably well. Reasonably well' means the model can put a fitted
line through the data. Can't all models do that? Definitely,
not. For example, the flat-level model cannot put a fitted

line through the Nile River data. It is simply impossible.
The straight trend model also cannot put a fitted line through
the flow data. So if our model set only included flat-level
and straight trend, then we might have said that the straight
trend model isbest’ even though it is just the better of two bad models.
6.5 Basic diagnostics
The first diagnostic that you do with any statistical analysis is check that
your residuals correspond to your assumed error structure. We have two
types of errors in a univariate state-space model: process errors, the wt , and
observation errors, the vt .
They should not have a temporal trend. To get the residuals from most
types of fits in R, you can use residuals(fit). MARSS() calls the vt , “model
residuals”, and the wt “state residuals”. We can plot these using the following
code (Figure 6.3).
resids <- residuals(kem.0)
MARSSresiduals.tT reported warnings. See msg element of returned residuals obj

plot(resids$model.residuals[1, ], ylab = "model residual", xlab = "",
main = "flat level")
abline(h = 0)
plot(resids$state.residuals[1, ], ylab = "state residual", xlab = "",
main = "flat level")
abline(h = 0)

The residuals should also not be autocorrelated in time. We can check the
autocorrelation with the function acf(). We won’t do this for the state
residuals for the flat level or linear trends since for those models wt = 0. The
6.5. BASIC DIAGNOSTICS 169
flat level flat level

model residual
state residual
−400
−1.0
0 20 40 60 80 100 0 20 40 60 80 100
linear trend linear trend

model residual
state residual
−400
−1.0
0 20 40 60 80 100 0 20 40 60 80 100
stoc level stoc level

model residual
state residual
−300
−40
0 20 40 60 80 100 0 20 40 60 80 100
Figure 6.3: The model and state residuals for the first 3 models.
autocorrelation plots are shown in Figure 6.4. The stochastic level model
looks the best in that its model residuals (the vt ) are fine but the state model
still has problems. Clearly the state is not a simple random walk. This is not
surprising. The Aswan Low Dam was completed in 1902 and changed the
mean flow. The Aswan High Dam was completed in 1970 and also affected
the flow. You can see these perturbations in Figure 6.1.
MARSSresiduals.tT reported warnings. See msg element of returned residuals object.

acf(resids$model.residuals[1, ], main = "flat level v(t)", na.action = na.pass)

acf(resids$model.residuals[1, ], main = "linear trend v(t)",
na.action = na.pass)

acf(resids$model.residuals[1, ], main = "stoc level v(t)", na.action = na.pass

acf(resids$state.residuals[1, ], main = "stoc level w(t)", na.action = na.pass

flat level v(t) linear trend v(t)

0.6
0.6
ACF
ACF
−0.2
−0.2
0 5 10 15 20 0 5 10 15 20
Lag Lag
stoc level v(t) stoc level w(t)

0.6
0.6
ACF
ACF
−0.2
−0.2
0 5 10 15 20 0 5 10 15 20
Lag Lag
Figure 6.4: The model and state residual acfs for the 3 models.
6.6. FITTING WITH JAGS 171
6.6 Fitting with JAGS
Here we show how to fit the stochastic level model, model 3 Equation (6.7),
with JAGS. This is a model where the level is a random walk with drift and
the Nile River flow is that level plus error.
library(datasets)
y <- as.vector(Nile)
This section requires that you have JAGS installed and the R2jags, rjags
and coda R packages loaded.
library(R2jags)
library(rjags)
library(coda)
The first step is to write the model for JAGS to a file (filename in model.loc):
model.loc <- "ss_model.txt"
jagsscript <- cat("
model {
# priors on parameters
mu ~ dnorm(Y1, 1/(Y1*100)); # normal mean = 0, sd = 1/sqrt(0.01)
tau.q ~ dgamma(0.001,0.001); # This is inverse gamma
sd.q <- 1/sqrt(tau.q); # sd is treated as derived parameter
tau.r ~ dgamma(0.001,0.001); # This is inverse gamma
sd.r <- 1/sqrt(tau.r); # sd is treated as derived parameter
u ~ dnorm(0, 0.01);
# Because init X is specified at t=0

X0 <- mu
X[1] ~ dnorm(X0+u,tau.q);
Y[1] ~ dnorm(X[1], tau.r);
for(i in 2:TT) {
predX[i] <- X[i-1]+u;
X[i] ~ dnorm(predX[i],tau.q); # Process variation
Y[i] ~ dnorm(X[i], tau.r); # Observation variation
}
}
",
file = model.loc)
Next we specify the data (and any other input) that the JAGS code needs. In
this case, we need to pass in dat and the number of time steps since that is
used in the for loop. We also specify the parameters that we want to monitor.
We need to specify at least one, but we will monitor all of them so we can
plot them after fitting. Note, that the hidden state is a parameter in the
Bayesian context (but not in the maximum likelihood context).
jags.data <- list(Y = y, TT = length(y), Y1 = y[1])
jags.params <- c("sd.q", "sd.r", "X", "mu", "u")
Now we can fit the model:

mod_ss <- jags(jags.data, parameters.to.save = jags.params, model.file = model
n.chains = 3, n.burnin = 5000, n.thin = 1, n.iter = 10000,
DIC = TRUE)
We can then show the posteriors along with the MLEs from MARSS on top
(Figure 6.5 ) using the code below.
attach.jags(mod_ss)
hist(mu)
abline(v = coef(kem.3)$x0, col = "red")
hist(u)
abline(v = coef(kem.3)$U, col = "red")
hist(log(sd.q^2))
abline(v = log(coef(kem.3)$Q), col = "red")
hist(log(sd.r^2))
abline(v = log(coef(kem.3)$R), col = "red")
detach.jags()
To plot the estimated states ( Figure 6.6 ), we write a helper function:

plotModelOutput <- function(jagsmodel, Y) {
attach.jags(jagsmodel)
6.6. FITTING WITH JAGS 173
Histogram of mu Histogram of u
Frequency
Frequency
2000
0.6
0.0
0
0 1 2 3 4 5 −20 −10 0 10 20
mu u
Histogram of log(sd.q^2) Histogram of log(sd.r^2)

Frequency
Frequency
1500
0 1500
0
3 4 5 6 7 8 9 9.0 9.5 10.0 10.5
log(sd.q^2) log(sd.r^2)
Figure 6.5: The posteriors for model 3 with MLE estimates from MARSS()
shown in red.
x <- seq(1, length(Y))

XPred <- cbind(apply(X, 2, quantile, 0.025), apply(X, 2,
mean), apply(X, 2, quantile, 0.975))
ylims <- c(min(c(Y, XPred), na.rm = TRUE), max(c(Y, XPred),
na.rm = TRUE))
plot(Y, col = "white", ylim = ylims, xlab = "", ylab = "State predictions")
polygon(c(x, rev(x)), c(XPred[, 1], rev(XPred[, 3])), col = "grey70",
border = NA)
lines(XPred[, 2])
points(Y)
}
plotModelOutput(mod_ss, y)
The following object is masked _by_ .GlobalEnv:
mu
lines(kem.3$states[1, ], col = "red")

lines(1.96 * kem.3$states.se[1, ] + kem.3$states[1, ], col = "red",
lty = 2)
lines(-1.96 * kem.3$states.se[1, ] + kem.3$states[1, ], col = "red",
lty = 2)
title("State estimate and data from\nJAGS (black) versus MARSS (red)")
State estimate and data from

JAGS (black) versus MARSS (red)
1400
1200
State predictions
1000
800
600
0 20 40 60 80 100
Figure 6.6: The estimated states from the Bayesian fit along with 95% credible
intervals (black and grey) with the MLE states and 95% condidence intervals
in red.
6.7 Fitting with Stan

Let’s fit the same model with Stan using the rstan package. If you have not
already, you will need to install the rstan package. This package depends
on a number of other packages which should install automatically when you
install rstan.
library(datasets)
library(rstan)
6.7. FITTING WITH STAN 175
y <- as.vector(Nile)
First we write the model. We could write this to a file (recommended), but for
this example, we write as a character object. Though the syntax is different
from the JAGS code, it has many similarities. Note, unlike the JAGS, the
Stan does not allow any NAs in your data. Thus we have to specify the
location of the NAs in our data. The Nile data does not have NAs, but we
want to write the code so it would work even if there were NAs.
scode <- "
data {
int<lower=0> TT;
int<lower=0> n_pos; // number of non-NA values
int<lower=0> indx_pos[n_pos]; // index of the non-NA values
vector[n_pos] y;
}
parameters {
real x0;
real u;
vector[TT] pro_dev;
real<lower=0> sd_q;
real<lower=0> sd_r;
}
transformed parameters {
vector[TT] x;
x[1] = x0 + u + pro_dev[1];
for(i in 2:TT) {
x[i] = x[i-1] + u + pro_dev[i];
}
}
model {
x0 ~ normal(y[1],10);
u ~ normal(0,2);
sd_q ~ cauchy(0,5);
sd_r ~ cauchy(0,5);
pro_dev ~ normal(0, sd_q);
for(i in 1:n_pos){
y[i] ~ normal(x[indx_pos[i]], sd_r);
}
}
generated quantities {
vector[n_pos] log_lik;
for (i in 1:n_pos) log_lik[i] = normal_lpdf(y[i] | x[indx_pos[i]], sd_r);
}
"
Then we call stan() and pass in the data, names of parameter we wish to
have returned, and information on number of chains, samples (iter), and
thinning. The output is verbose (hidden here) and may have some warnings.
# We pass in the non-NA ys as vector
ypos <- y[!is.na(y)]
n_pos <- sum(!is.na(y)) #number on non-NA ys
indx_pos <- which(!is.na(y)) #index on the non-NAs
mod <- rstan::stan(model_code = scode, data = list(y = ypos,
TT = length(y), n_pos = n_pos, indx_pos = indx_pos), pars = c("sd_q",
"x", "sd_r", "u", "x0"), chains = 3, iter = 1000, thin = 1)
We use extract() to extract the parameters from the fitted model and we
can plot. The estimated level is x and we will plot that with the 95% credible
intervals.
pars <- rstan::extract(mod)
pred_mean <- apply(pars$x, 2, mean)
pred_lo <- apply(pars$x, 2, quantile, 0.025)
pred_hi <- apply(pars$x, 2, quantile, 0.975)
plot(pred_mean, type = "l", lwd = 3, ylim = range(c(pred_mean,
pred_lo, pred_hi)), ylab = "Nile River Level")
lines(pred_lo)
lines(pred_hi)
points(y, col = "blue")
Here is a ggplot() version of the plot.

library(ggplot2)
nile <- data.frame(y = y, year = 1871:1970)
h <- ggplot(nile, aes(year))
6.7. FITTING WITH STAN 177
1200
Nile River Level
900 1000
800
700
0 20 40 60 80 100
Index
Figure 6.7: Estimated level and 95 percent credible intervals. Blue dots are
the actual Nile River levels.
h + geom_ribbon(aes(ymin = pred_lo, ymax = pred_hi), fill = "grey70") +

geom_line(aes(y = pred_mean), size = 1) + geom_point(aes(y = y),
color = "blue") + labs(y = "Nile River level")
We can plot the histogram of the samples against the values estimated via
maximum likelihood.
hist(pars$x0)
abline(v = coef(kem.3)$x0, col = "red")
hist(pars$u)
abline(v = coef(kem.3)$U, col = "red")
hist(log(pars$sd_q^2))
abline(v = log(coef(kem.3)$Q), col = "red")
hist(log(pars$sd_r^2))
abline(v = log(coef(kem.3)$R), col = "red")
1400
Nile River level 1200
1000
800
600
1875 1900 1925 1950

year
Figure 6.8: Estimated level and 95 percent credible intervals
Histogram of pars$x0 Histogram of pars$u

Frequency
Frequency
150
200
0
1090 1110 1130 1150 −6 −4 −2 0 2 4 6
pars$x0 pars$u
Histogram of log(pars$sd_q^2) Histogram of log(pars$sd_r^2)

350
Frequency
Frequency
150
0 150
0
4 5 6 7 8 9 9.0 9.5 10.0
log(pars$sd_q^2) log(pars$sd_r^2)
Figure 6.9: Histogram of the parameter samples versus the estimate (red line)
from maximum likelihood.
6.8. A RANDOM WALK MODEL OF ANIMAL MOVEMENT 179
6.8 A random walk model of animal move-

ment
A simple random walk model of movement with drift (directional movement)
but no correlation is
x1,t = x1,t−1 + u1 + w1,t , w1,t ∼ N(0, σ12 ) (6.8)
x2,t = x2,t−1 + u2 + w2,t , w2,t ∼ N(0, σ22 ) (6.9)
where x1,t is the location at time t along one axis (here, longitude) and x2,t is
for another, generally orthogonal, axis (in here, latitude). The parameter u1 is
the rate of longitudinal movement and u2 is the rate of latitudinal movement.
We add errors to our observations of location:
y1,t = x1,t + v1,t , v1,t ∼ N(0, η12 ) (6.10)
y2,t = x2,t + v2,t , v2,t ∼ N(0, η22 ), (6.11)
This model is comprised of two separate univariate state-space models. Note

that y1 depends only on x1 and y2 depends only on x2 . There are no actual
interactions between these two univariate models. However, we can write
the model down in the form of a multivariate model using diagonal variance-
covariance matrices and a diagonal design (Z) matrix. Because the variance-
covariance matrices and Z are diagonal, the x1 :y1 and x2 :y2 processes will be
independent as intended. Here are Equations (6.9) and (6.11) written as a
MARSS model (in matrix form):
" # " # " # " # " #!
x1,t x u w σ2 0
= 1,t−1 + 1 + 1,t , wt ∼ MVN 0, 1 (6.12)
x2,t x2,t−1 u2 w2,t 0 σ22
" # " #" # " # " #!

y1,t 1 0 x1,t v η2 0
= + 1,t , vt ∼ MVN 0, 1 2 (6.13)
y2,t 0 1 x2,t v2,t 0 η2
The variance-covariance matrix for wt is a diagonal matrix with unequal
variances, σ12 and σ22 . The variance-covariance matrix for vt is a diagonal
matrix with unequal variances, η12 and η22 . We can write this succinctly as
xt = xt−1 + u + wt , wt ∼ MVN(0, Q) (6.14)
yt = xt + vt , vt ∼ MVN(0, R). (6.15)
6.9 Problems
1. Write the equations for each of these models: ARIMA(0,0,0),
ARIMA(0,1,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1). Read
the help file for the Arima() function (in the forecast package) if you
are fuzzy on the arima notation.
2. The MARSS package includes a data set of sharp-tailed grouse in
Washington. Load the data to use as follows:
library(MARSS)
dat = log(grouse[, 2])
Consider these two models for the data:

• Model 1 random walk with no drift observed with no error
• Model 2 random walk with drift observed with no error
Written as a univariate state-space model, model 1 is
xt = xt−1 + wt where wt ∼ N(0, q)
x0 = a (6.16)
y t = xt
Model 2 is almost identical except with u added
x0 = a (6.17)
y t = xt
y is the log grouse count in year t.
a. Plot the data. The year is in column 1 of grouse.
b. Fit each model using MARSS().
c. Which one appears better supported given AICc?
d. Load the forecast package. Use ?auto.arima to learn what
it does. Then use auto.arima(dat) to fit the data. Next run
auto.arima(dat, trace=TRUE) to see all the ARIMA models
that the function compared. Note, ARIMA(0,1,0) is a random
6.9. PROBLEMS 181
walk with b=1. ARIMA(0,1,0) with drift would be a random walk

(b=1) with drift (with u).
e. Is the difference in the AICc values between a random walk with

and without drift comparable between MARSS() and auto.arima()?
Note when using auto.arima(), an AR(1) model of the following form
will be fit (notice the b): xt = bxt−1 + wt . auto.arima() refers to
this model xt = xt−1 + wt , which is also AR(1) but with b = 1, as
ARIMA(0,1,0). This says that the first difference of the data (that’s
the 1 in the middle) is a ARMA(0,0) process (the 0s in the 1st and 3rd
spots). So ARIMA(0,1,0) means this: xt − xt−1 = wt .
3. Create a random walk with drift time series using cumsum() and
rnorm(). Look at the rnorm() help file (?rnorm) to make sure you
know what the arguments to the rnorm() are.
dat <- cumsum(rnorm(100, 0.1, 1))
a. What is the order of this random walk written as ARIMA(p, d,

q)? “what is the order” means “what is p, d, and q. Model”order"
is how arima() and Arima() specify arima models.
b. Fit that model using Arima() in the forecast package. You’ll
need to specify the arguments order and include.drift. Use
?Arima to review what that function does if needed.
c. Write out the equation for this random walk as a univariate state-
space model. Notice that there is no observation error, but still
write this as a state-space model.
d. Fit that model with MARSS().
e. How are the two estimates from Arima() and MARSS() different?
4. The first-difference of dat used in the previous problem is:
diff.dat = diff(dat)
Use ?diff to check what the diff() function does.

a. If xt denotes a time series. What is the first difference of x? What
is the second difference?
b. What is the x model for diff.dat? Look at your answer to part

(a) and the answer to part (e).
c. Fit diff.dat using Arima(). You’ll need to change the arguments
order and include.mean.
d. Fit with MARSS(). You will need to write the model for diff.dat
as a state-space model. If you’ve done this right, the estimated
parameters using Arima() and MARSS() will now be the same.
This question should clue you into the fact that Arima() is not exactly
fitting Equation (6.1). It’s very similar, but not quite written that way.
By the way, Equation (6.1) is how structural time series observed with
error are written (state-space models). To recover the estimates that
a function like arima() or Arima() returns, you need to write your
state-space model in a specific way (as seen above).
5. Arima() will also fit what it calls an “AR(1) with drift”. An AR(1)
with drift is NOT this model:
xt = bxt−1 + u + wt where wt ∼ N(0, q) (6.18)
In the population dynamics literature, this equation is called the Gom-
pertz model and is a type of density-dependent population model.
a. Write R code to simulate Equation (6.18). Make b less than 1 and
greater than 0. Set u and x0 to whatever you want. You can use a
for loop.
b. Plot the trajectories and show that this model does not “drift”
upward or downward. It fluctuates about a mean value.
c. Hold b constant and change u. How do the trajectories change?
d. Hold u constant and change b. Make sure to use a b close to 1 and
another close to 0. How do the trajectories change?
e. Do 2 simulations each with the same wt . In one simulation, set
u = 1 and in the other u = 2. For both simulations, set x1 =
u/(1 − b). You can set b to whatever you want as long as 0 < b < 1.
Plot the 2 trajectories on the same plot. What is different?
We will fit what Arima() calls “AR(1) with drift” models in the chapter
on MARSS models with covariates.
6.9. PROBLEMS 183
6. The MARSS package includes a data set of gray whales. Load the
data to use as follows:
library(MARSS)
dat <- log(graywhales[, 2])
Fit a random walk with drift model observed with error to the data:
x0 = a
y is the whale count in year t. x is interpreted as the ‘true’ unknown

population size that we are trying to estimate.
a. Fit this model with MARSS()
b. Plot the estimated x as a line with the actual counts added
as points. x is in fit$states. It is a matrix. To plot using
plot(), you will need to change it to a vector using as.vector()
or fit$states[1,]
c. Simulate 1000 sample gray whale populstion trajectories (the x in
your model) using the estimated u and q starting at the estimated x
in 1997. You can do this with a couple for loops or write something
terse with cumsum() and apply().
d. Using these simulated trajectories, what is your estimate of the
probability that the grey whale population will be above 50,000
graywhales in 2007?
e. What kind(s) of uncertainty does your estimate above NOT in-
clude?
7. Fit the following models to the graywhales data using MARSS(). Assume
b = 1.
• Model 1 Process error only model with drift
• Model 2 Process error only model without drift
• Model 3 Process error with drift and observation error with obser-
vation error variance fixed = 0.05.
• Model 4 Process error with drift and observation error with obser-
vation error variance estimated.
a. Compute the AICc’s for each model and likelihood or deviance (-2
* log likelihood). Where to find these? Try names(fit). logLik()
is the standard R function to return log-likelihood from fits.
b. Calculate a table of ∆AICc values and AICc weights.
c. Show the acf of the model and state residuals for the best model.
You will need a vector of the residuals to do this. If fit is the fit
from a fit call like fit = MARSS(dat), you get the residuals using
this code:
residuals(fit)$state.residuals[1, ]
residuals(fit)$model.residuals[1, ]
Do the acf’s suggest any problems?

8. Evaluate the predictive accuracy of forecasts using the forecast package
using the airmiles dataset. Load the data to use as follows:
library(forecast)
dat <- log(airmiles)
n <- length(dat)
training.dat <- dat[1:(n - 3)]
test.dat <- dat[(n - 2):n]
This will prepare the training data and set aside the last 3 data points
for validation.
a. Fit the following four models using Arima(): ARIMA(0,0,0),
ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1).
b. Use forecast() to make 3 step ahead forecasts from each.
c. Calculate the MASE statistic for each using the accuracy() func-
tion in the forecast package. Type ?accuracy to learn how to
use this function.
d. Present the results in a table.
e. Which model is best supported based on the MASE statistic?
6.9. PROBLEMS 185
9. The WhaleNet Archive of STOP Data has movement data on logger-

head turtles on the east coast of the US from ARGOS tags. The
MARSS package loggerheadNoisy dataset is lat/lot data on eight
individuals, however we have corrupted this data severely by adding
random errors in order to create a “bad tag” problem (very noisy). Use
head(loggerheadNoisy) to get an idea of the data. Then load the
data on one turtle, MaryLee. MARSS needs time across the columns to
you need to use transpose the data (as shown).
turtlename <- "MaryLee"
dat <- loggerheadNoisy[which(loggerheadNoisy$turtle == turtlename),
5:6]
dat <- t(dat)
a. Plot MaryLee’s locations (as a line not dots). Put the latitude
locations on the y-axis and the longitude on the y-axis. You can
use rownames(dat) to see which is in which row. You can just use
plot() for the homework. But if you want, you can look at the
MARSS Manual chapter on animal movement to see how to plot
the turtle locations on a map using the maps package.
b. Analyze the data with a state-space model (movement observed
with error) using
fit0 <- MARSS(dat)
Look at the output from the above MARSS call. What is the
meaning of the parameters output from MARSS in terms of turtle
movement? What exactly is the u estimate for example? Look at
the data and think about the model you fit.
c. What assumption did the default MARSS model make about obser-
vation error and process error? What does that assumption mean
in terms of how steps in the N-S and E-W directions are related?
What does that assumption mean in terms of our assumption about
the latitudal and longitudinal observation errors?
d. Does MaryLee move faster in the latitude direction versus longitude
direction?
e. Add MaryLee’s estimated “true” positions to your plot of her
locations. You can use lines(x, y, col="red") (with x and

y replaced with your x and y data). The true position is the
“state”. This is in the states element of an output from MARSS
fit0$states.
f. Fit the following models with different assumptions regarding the
movement in the lat/lon direction:
• Lat/lon movements are independent but the variance is the
same
• Lat/lon movements are correlated and lat/lon variances are
different
• Lat/lon movements are correlated and the lat/lon variances
are the same.
You only need to change Q specification. Your MARSS call will now
look like the following with ... replaced with your Q specification.
fit1 <- MARSS(dat, list(Q = ...))
g. Plot your state residuals (true location residuals). What are the
problems? Discuss in reference to your plot of the location data.
Here is how to get state residuals from MARSS() output:
resids <- residuals(fit0)$state.residuals
The lon residuals are in row 1 and lat residuals are in row 2 (same o
Chapter 7
MARSS models
This lab will show you how to fit multivariate state-space (MARSS) models
using the MARSS package. This class of time-series model is also called vec-
tor autoregressive state-space (VARSS) models. This chapter works through
an example which uses model selection to test different population structures
in west coast harbor seals. See Holmes et al. (2014) for a fuller version of this
example.
Rmd for this chapter can be downloaded here
Data and packages
All the data used in the chapter are in the MARSS package. For most
examples, we will use the MARSS() function to fit models via maximum-
likelihood. We also show how to fit a Bayesian model using JAGS and Stan.
For these sectiosn you will need the R2jags, coda and rstan packages. To
run the JAGS code, you will also need JAGS installed. See Chapter 12 for
more details on JAGS and Chapter 13 for more details on Stan.
library(MARSS)
library(R2jags)
library(coda)
library(rstan)
187
188 CHAPTER 7. MULTIVARIATE STATE-SPACE MODELS
7.1 Overview
As discussed in Chapter 6, the MARSS package fits multivariate state-space
models in this form:
xt = Bxt−1 + u + wt where wt ∼ N(0, Q)
yt = Zxt + a + vt where vt ∼ N(0, R) (7.1)
x0 = µ
where each of the bolded terms are matrices. Those that are bolded and small
(not capitalized) have one column only, so are column matrices.
To fit a multivariate time series model with the MARSS package, you need
to first determine the size and structure of each of the parameter matrices:
B, u, Q, Z, a, R and µ. This requires first writing down your model in
matrix form. We will illustarte this with a series of models for the temporal
population dynamics of West coast harbor seals.
7.2 West coast harbor seals counts

In this example, we will use multivariate state-space models to combine
surveys from four survey regions to estimate the average long-term population
growth rate and the year-to-year variability in that population growth rate.
We have five regions (or sites) where harbor seals were censused from 1978-
1999 while hauled out of land1 . During the period of this dataset, harbor seals
were recovering steadily after having been reduced to low levels by hunting
prior to protection. We will assume that the underlying population process
is a stochastic exponential growth process with mean rates of increase that
were not changing through 1978-1999.
The survey methodologies were consistent throughout the 20 years of the
data but we do not know what fraction of the population that each region
represents nor do we know the observation-error variance for each region.
Given differences between the numbers of haul-outs in each region, the
1
Jeffries et al. 2003. Trends and status of harbor seals in Washington State: 1978-1999.
Journal of Wildlife Management 67(1):208–219
7.2. WEST COAST HARBOR SEALS COUNTS 189
observation errors may be quite different. The regions have had different
levels of sampling; the best sampled region has only 4 years missing while the
worst has over half the years missing (Figure 7.1).
Puget Sound Harbor Seal Surveys
region
SJF
value
SJI
EBays
7
PSnd
HC
1980 1985 1990 1995 2000

Year
Figure 7.1: Plot of the of the count data from the five harbor seal regions
(Jeffries et al. 2003). The numbers on each line denote the different regions:
1) Strait of Juan de Fuca (SJF), 2) San Juan Islands (SJI), 2) Eastern Bays
(EBays), 4) Puget Sound (PSnd), and 5) Hood Canal (HC). Each region is
an index of the total harbor seal population in each region.
7.2.1 Load the harbor seal data
The harbor seal data are included in the MARSS package as matrix with
years in column 1 and the logged counts in the other columns. Let’s look at
the first few years of data:

data(harborSealWA, package = "MARSS")
print(harborSealWA[1:8, ], digits = 3)
Year SJF SJI EBays PSnd HC

[1,] 1978 6.03 6.75 6.63 5.82 6.6
[2,] 1979 NA NA NA NA NA
[6,] 1983 6.78 7.43 7.21 NA NA
[7,] 1984 6.93 7.74 7.45 NA NA
[8,] 1985 7.16 7.53 7.26 6.60 NA
We are going to leave out Hood Canal (HC) since that region is somewhat
isolated from the others and experiencing very different conditions due to
hypoxic events and periodic intense killer whale predation. We will set up
the data as follows:
dat <- MARSS::harborSealWA
years = dat[, "Year"]
dat = dat[, !(colnames(dat) %in% c("Year", "HC"))]
dat = t(dat) #transpose to have years across columns
colnames(dat) = years
n = nrow(dat) - 1
7.3 A single well-mixed population

When we are looking at data over a large geographic region, we might make the
assumption that the different census regions are measuring a single population
if we think animals are moving sufficiently such that the whole area (multiple
regions together) is “well-mixed”. We write a model of the total population
abundance for this case as:
nt = exp(u + wt )nt−1 , (7.2)
where nt is the total count in year t, u is the mean population growth rate,
and wt is the deviation from that average in year t. We then take the log of
7.3. A SINGLE WELL-MIXED POPULATION 191
both sides and write the model in log space:
xt = xt−1 + u + wt , where wt ∼ N(0, q) (7.3)
xt = log nt . When there is one effective population, there is one x, therefore

xt is a 1 × 1 matrix. This is our state model and x is called the “state”.
This is just the jargon used in this type of model (state-space model) for the
hidden state that you are estimating from the data. “Hidden” means that
you observe this state with error.
7.3.1 The observation process
We assume that all four regional time series are observations of this one pop-
ulation trajectory but they are scaled up or down relative to that trajectory.
In effect, we think of each regional survey as an index of the total population.
With this model, we do not think the regions represent independent subpopu-
lations but rather independent observations of one population. Our model
for the data, yt = Zxt + a + vt , is written as:
y1 1 0 v1
       
 y2   1   a2   v2 
=  xt + + (7.4)
       
y3 1 a3 v3
  
       
y4 t
1 a4 v4 t
Each yi is the observed time series of counts for a different region. The
a’s are the bias between the regional sample and the total population. Z
specifies which observation time series, yi , is associated with which population
trajectory, xj . In this case, Z is a matrix with 1 column since each region is
an observation of the one population trajectory.
We allow that each region could have a unique observation variance and that
the observation errors are independent between regions. We assume that the
observations errors on log(counts) are normal and thus the errors on (counts)
are log-normal. The assumption of normality is not unreasonable since these
regional counts are the sum of counts across multiple haul-outs. We specify
independent observation errors with different variances by specifying that
v ∼ MVN(0, R), where
r1 0 0 0
 
0 r 0 0
2
R= (7.5)
 
 0 0 r3 0 

0 0 0 r4
This is a diagonal matrix with unequal variances. The shortcut for this
structure in MARSS() is "diagonal and unequal".
7.3.2 Fitting the model
We need to write the model in the form of Equation (7.1) with each parameter
written as a matrix. The observation model (Equation (7.4)) is already in
matrix form. Let’s write the state model in matrix form too:
[x]t = [1][x]t−1 + [u] + [w]t , where [w]t ∼ N(0, [q]) (7.6)
It is very simple since all terms are 1 × 1 matrices.

To fit our model with MARSS(), we set up a list which precisely describes
the size and structure of each parameter matrix. Fixed values in a matrix
are designated with their numeric value and estimated values are given a
character name and put in quotes. Our model list for a single well-mixed
population is:
mod.list.0 <- list(B = matrix(1), U = matrix("u"), Q = matrix("q"),
Z = matrix(1, 4, 1), A = "scaling", R = "diagonal and unequal",
x0 = matrix("mu"), tinitx = 0)
and fit:
fit.0 <- MARSS(dat, model = mod.list.0)

MARSS fit is
7.3. A SINGLE WELL-MIXED POPULATION 193

Log-likelihood: 21.62931
AIC: -23.25863 AICc: -19.02786
Estimate
A.SJI 0.79583
A.EBays 0.27528
A.PSnd -0.54335
R.(SJF,SJF) 0.02883
R.(SJI,SJI) 0.03063
R.(EBays,EBays) 0.01661
R.(PSnd,PSnd) 0.01168
U.u 0.05537
Q.q 0.00642
x0.mu 6.22810

We already discussed that the short-cut "diagonal and unequal" means a

diagonal matrix with each diagonal element having a different value. The
short-cut "scaling" means the form of a in Equation (7.4) with one value
set to 0 and the rest estimated. You should run the code in the list to make
sure you see that each parameter in the list has the same form as in our
mathematical equation for the model.
7.3.3 Model residuals
The model fits fine but look at the model residuals (Figure 7.2). They have
problems.
resids <- residuals(fit.0)

for (i in 1:4) {
plot(resids$model.residuals[i, ], ylab = "model residuals",
xlab = "")
abline(h = 0)
title(rownames(dat)[i])
}
SJF SJI
model residuals
model residuals
0.1
0.1
−0.2
−0.3
5 10 15 20 5 10 15 20
EBays PSnd
model residuals
model residuals
0.05
0.1
−0.20
−0.2
5 10 15 20 5 10 15 20
Figure 7.2: The model residuals for the first model. SJI and EBays do not
look good.
7.4. FOUR SUBPOPULATIONS WITH TEMPORALLY UNCORRELATED ERRORS195
7.4 Four subpopulations with temporally un-

correlated errors
The model for one well-mixed population was not very good. Another
reasonable assumption is that the different census regions are measuring four
different temporally independent subpopulations. We write a model of the
log subpopulation abundances for this case as:
x1 1 0 0 0 x1 u w1
        
x  0 1 0 0 x2 
   u w2 
 
 2
  = +
 + 
 
x3  0 0 1 0 x3  u w3 
 
x4 t 0 0 0 1 x4 t−1 u w4 t
q 0 0 0
  
 0 q 0 0 
where wt ∼ MVN 0,  (7.7)
  
 0 0 q 0

0 0 0 q
x1 µ1
   
x  µ 
 2  2
  = 
x3  µ3 
x4 0 µ4 t
The Q matrix is diagonal with one variance value. This means that the process
variance (variance in year-to-year population growth rates) is independent
(good and bad years are not correlated) but the level of variability is the same
across regions. We made the u matrix with one u value. This means that we
assume the population growth rates are the same across regions.
Notice that we set the B matrix equal to a diagonal matrix with 1 on the
diagonal. This is the “identity” matrix and it is like a 1 but for matrices. We
do not need B for our model, but MARSS() requires a value.
7.4.1 The observation process
In this model, each survey is an observation of a different x:
y1 1 0 0 0 x1 0 v1
        
y2 0 1 0 0 0
 x2  v2
       
=   +  + (7.8)
      
y3 0 0 1 0 x3  0 v3
  
     
y4 t
0 0 0 1 x4 t 0 v4 t
No a’s can be estimated since we do not have multiple observations of a given

x time series. Our R matrix doesn’t change; the observation errors are still
assumed to the independent with different variances.
Notice that our Z matrix changed. Z is specifying which yi goes to which xj .
The one we have specified means that y1 is observing x1 , y2 observes x2 , etc.
We could have set up Z like so
0 1 0 0
 
1 0 0 0
(7.9)
 
0 0 0 1
 
0 0 1 0
This would mean that y1 observes x2 , y2 observes x1 , y3 observes x4 , and y4

observes x3 . Which x goes to which y is arbitrary; we need to make sure it is
one-to-one. We will stay with Z as an identity matrix since yi observing xi
makes it easier to remember which x goes with which y.
We set up the model list for MARSS() as:

mod.list.1 <- list(B = "identity", U = "equal", Q = "diagonal and equal",
Z = "identity", A = "scaling", R = "diagonal and unequal",
x0 = "unequal", tinitx = 0)
We introduced a few more short-cuts. "equal" means all the values in

the matrix are the same. "diagonal and equal" means that the matrix is
diagonal with one value on the diagonal. "unequal" means that all values in
the matrix are different.
7.5. FOUR SUBPOPULATIONS WITH TEMPORALLY CORRELATED ERRORS197
We can then fit our model for 4 subpopulations as:

fit.1 <- MARSS::MARSS(dat, model = mod.list.1)
7.5 Four subpopulations with temporally

correlated errors
Another reasonable assumption is that the different census regions are mea-
suring different subpopulations but that the year-to-year population growth
rates are correlated (good and bad year coincide). The only parameter that
changes is the Q matrix:
q c c c
 
c q c c
Q= (7.10)
 
c c q c

c c c q
This Q matrix structure means that the process variance (variance in year-to-
year population growth rates) is the same across regions and the covariance
in year-to-year population growth rates is also the same across regions.
Set up the model list for MARSS() as:

mod.list.2 <- mod.list.1
mod.list.2$Q <- "equalvarcov"
"equalvarcov" is a shortcut for the matrix form in Equation (7.10).

Fit the model with:
fit.2 <- MARSS::MARSS(dat, model = mod.list.2)
Results are not shown, but here are the AICc. This last model is much better:
c(fit.0$AICc, fit.1$AICc, fit.2$AICc)
[1] -19.02786 -22.20194 -41.00511

7.5.2 Model residuals
Look at the model residuals (Figure 7.3). They are also much better.
SJF SJI
model residuals
model residuals
0.02
0.1
−0.06
−0.2
5 10 15 20 5 10 15 20
EBays PSnd
model residuals
model residuals
0.02
−0.15 0.00
−0.04
5 10 15 20 5 10 15 20
Figure 7.3: The model residuals for the model with four temporally correlated
subpopulations.
Figure 7.4 shows the estimated states for each region using this code:
for (i in 1:4) {
plot(years, fit.2$states[i, ], ylab = "log subpopulation estimate",
xlab = "", type = "l")
lines(years, fit.2$states[i, ] - 1.96 * fit.2$states.se[i,
], type = "l", lwd = 1, lty = 2, col = "red")
lines(years, fit.2$states[i, ] + 1.96 * fit.2$states.se[i,
], type = "l", lwd = 1, lty = 2, col = "red")
title(rownames(dat)[i])
}
7.5. FOUR SUBPOPULATIONS WITH TEMPORALLY CORRELATED ERRORS199
SJF SJI
log subpopulation estimate
8.5
7.5
8.0
7.0
7.5
6.5
7.0
1980 1985 1990 1995 1980 1985 1990 1995
EBays PSnd

7.8
7.0
7.4
6.6
7.0
6.2
5.8
6.6
1980 1985 1990 1995 1980 1985 1990 1995
Figure 7.4: Plot of the estimate of log harbor seals in each region. The 95%
confidence intervals on the population estimates are the dashed lines. These
are not the confidence intervals on the observations, and the observations
(the numbers) will not fall between the confidence interval lines.
7.6 Using MARSS models to study spatial

structure
For our next example, we will use MARSS models to test hypotheses about
the population structure of harbor seals on the west coast. For this example,
we will evaluate the support for different population structures (numbers of
subpopulations) using different Zs to specify how survey regions map onto
subpopulations. We will assume correlated process errors with the same
magnitude of process variance and covariance. We will assume independent
observations errors with equal variances at each site. We could do unequal
variances but it takes a long time to fit so for this example, the observation
variances are set equal.
The dataset we will use is harborSeal, a 29-year dataset of abundance indices
for 12 regions along the U.S. west coast between 1975-2004 (Figure 7.5).
We start by setting up our data matrix. We will leave off Hood Canal.
dat <- MARSS::harborSeal
years <- dat[, "Year"]
good <- !(colnames(dat) %in% c("Year", "HoodCanal"))
sealData <- t(dat[, good])
7.7 Hypotheses regarding spatial structure
We will evaluate the data support for the following hypotheses about the
population structure:
• H1: stock 3 subpopulations defined by management units
• H2: coast+PS 2 subpopulations defined by coastal versus WA inland
• H3: N+S 2 subpopulations defined by north and south split in the middle
of Oregon
• H4:NC+strait+PS+SC 4 subpopulations defined by N coastal, S coastal,
SJF+Georgia Strait, and Puget Sound
• H5: panmictic All regions are part of the same panmictic population
• H6: site Each of the 11 regions is a subpopulation
7.7. HYPOTHESES REGARDING SPATIAL STRUCTURE 201
CoastalEstuaries OlympicPeninsula StraitJuanDeFuca SanJuanIslands
10
EasternBays PugetSound HoodCanal CA.Mainland
10
9
value
CA.ChannelIslands OR.NorthCoast OR.SouthCoast Georgia.Strait
10
6
1980 1990 2000 1980 1990 2000 1980 1990 2000 1980 1990 2000
Year
Figure 7.5: Plot of log counts at each survey region in the harborSeal dataset.
Each region is an index of the harbor seal abundance in that region.
These hypotheses translate to these Z matrices (H6 not shown; it is an identity

matrix):
H1 H2 H4 H5
pnw ps ca coast pc nc is ps sc pan
       
Coastal Estuaries 1 0 0 1 0 1 0 0 0 1
Olympic Peninsula 1 0 0 1 0
   
1 0 0
 0
1
 
Str. Juan de Fuca 0 1 0 0 1 0 1 0 0 1
       
       
San Juan Islands 0 1 0 0 1
   
0 1 0
 0
1
 
Eastern Bays 0 1 0 0 1 0 0 1 0 1
       
       
Puget Sound 0 1 0 0 1
   
0 0 1
 0
1
 
CA Mainland 0 0 1 1 0
   
0 0 0

1 1
 
CA Channel Islands 0 0 1 1 0 0 0 0 1 1
       
       
OR North Coast 1 0 0 1 0
   
1 0 0
 0
1
 
OR South Coast 1 0 0 1 0 0 0 0 1 1
       
Georgia Strait 0 1 0 0 1 0 1 0 0 1
To tell MARSS() the form of Z, we construct the same matrix in R. For

example, for hypotheses 1, we can write:
Z.model <- matrix(0,11,3)
Z.model[c(1,2,9,10),1] <- 1 #which elements in col 1 are 1
Z.model[c(3:6,11),2] <- 1 #which elements in col 2 are 1
Z.model[7:8,3] <- 1 #which elements in col 3 are 1
Or we can use a short-cut by specifying Z as a factor that has the name of

the subpopulation associated with each row in y. For hypothesis 1, this is
Z1 <- factor(c("pnw", "pnw", rep("ps", 4), "ca", "ca", "pnw",
"pnw", "ps"))
Notice it is 11 elements in length; one element for each row of data.
7.8 Set up the hypotheses as different models
Only the Z matrices change for our model. We will set up a base model list
used for all models.
7.8. SET UP THE HYPOTHESES AS DIFFERENT MODELS 203
mod.list = list(
B = "identity",
U = "unequal",
Q = "equalvarcov",
Z = "placeholder",
A = "scaling",
R = "diagonal and equal",
x0 = "unequal",
tinitx = 0 )
Then we set up the Z matrices using the factor short-cut.

Z.models <- list(
H1 = factor(c("pnw","pnw",rep("ps",4),"ca","ca","pnw","pnw","ps")),
H2 = factor(c(rep("coast",2),rep("ps",4),rep("coast",4),"ps")),
H3 = factor(c(rep("N",6),"S","S","N","S","N")),
H4 = factor(c("nc","nc","is","is","ps","ps","sc","sc","nc","sc","is")),
H5 = factor(rep("pan",11)),
H6 = factor(1:11) #site
)
names(Z.models) <-
c("stock","coast+PS","N+S","NC+strait+PS+SC","panmictic","site")
7.8.1 Fit the models
We loop through the models, fit and store the results:

out.tab <- NULL
fits <- list()
for (i in 1:length(Z.models)) {
mod.list$Z <- Z.models[[i]]
fit <- MARSS::MARSS(sealData, model = mod.list, silent = TRUE,
control = list(maxit = 1000))
out <- data.frame(H = names(Z.models)[i], logLik = fit$logLik,
AICc = fit$AICc, num.param = fit$num.params, m = length(unique(Z.models[[i]])
num.iter = fit$numIter, converged = !fit$convergence)
out.tab <- rbind(out.tab, out)
fits <- c(fits, list(fit))

}
We will use AICc and AIC weights to summarize the data support for the
different hypotheses. First we will sort the fits based on AICc:
min.AICc <- order(out.tab$AICc)
out.tab.1 <- out.tab[min.AICc, ]
Next we add the ∆AICc values by subtracting the lowest AICc:

out.tab.1 <- cbind(out.tab.1, delta.AICc = out.tab.1$AICc - out.tab.1$AICc[1])
Relative likelihood is defined as exp(−∆AICc/2).

out.tab.1 <- cbind(out.tab.1, rel.like = exp(-1 * out.tab.1$delta.AICc/2))
The AIC weight for a model is its relative likelihood divided by the sum of
all the relative likelihoods.
out.tab.1 <- cbind(out.tab.1, AIC.weight = out.tab.1$rel.like/sum(out.tab.1$re
Let’s look at the model weights (out.tab.1):

H delta.AICc AIC.weight converged
NC+strait+PS+SC 0.00 0.979 TRUE
site 7.65 0.021 TRUE
N+S 36.97 0.000 TRUE
stock 47.02 0.000 TRUE
coast+PS 48.78 0.000 TRUE
panmictic 71.67 0.000 TRUE
7.9 Fitting a MARSS model with JAGS

Here we show you how to fit a MARSS model for the harbor seal data using
JAGS. We will focus on four time series from inland Washington and set up
the data as follows:
sites <- c("SJF", "SJI", "EBays", "PSnd")
7.9. FITTING A MARSS MODEL WITH JAGS 205
Y <- harborSealWA[, sites]

Y <- t(Y) # time across columns
We will fit the model with four temporally independent subpopulations with
the same population growth rate (u) and year-to-year variance (q). This is
the model in Section 7.4.
7.9.1 Writing the model in JAGS
The first step is to write this model in JAGS. See Chapter 12 for more
information on and examples of JAGS models.
jagsscript <- cat("
model {
U ~ dnorm(0, 0.01);
tauQ~dgamma(0.001,0.001);
Q <- 1/tauQ;
# Estimate the initial state vector of population abundances

for(i in 1:nSites) {
X[i,1] ~ dnorm(3,0.01); # vague normal prior
}
# Autoregressive process for remaining years

for(t in 2:nYears) {
predX[i,t] <- X[i,t-1] + U;
X[i,t] ~ dnorm(predX[i,t], tauQ);
}
}
# Observation model
# The Rs are different in each site
tauR[i]~dgamma(0.001,0.001);
R[i] <- 1/tauR[i];
}
for(t in 1:nYears) {
Y[i,t] ~ dnorm(X[i,t],tauR[i]);
}
}
}
",
file = "marss-jags.txt")
7.9.2 Fit the JAGS model
{#sec-mss-fit-jags}
Then we write the data list, parameter list, and pass the model to the jags()
function:
jags.data <- list(Y = Y, nSites = nrow(Y), nYears = ncol(Y)) # named list
jags.params <- c("X", "U", "Q", "R")
model.loc <- "marss-jags.txt" # name of the txt file
mod_1 <- jags(jags.data, parameters.to.save = jags.params, model.file = model.
DIC = TRUE)
7.9.3 Plot the posteriors for the estimated states
We can plot any of the variables we chose to return to R in the jags.params

list. Let’s focus on the X. When we look at the dimension of the X, we can
use the apply() function to calculate the means and 95 percent CIs of the
estimated states.
# attach.jags attaches the jags.params to our workspace
attach.jags(mod_1)
means <- apply(X, c(2, 3), mean)
upperCI <- apply(X, c(2, 3), quantile, 0.975)
lowerCI <- apply(X, c(2, 3), quantile, 0.025)
7.9. FITTING A MARSS MODEL WITH JAGS 207

nYears <- ncol(Y)
for (i in 1:nrow(means)) {
plot(means[i, ], lwd = 3, ylim = range(c(lowerCI[i, ], upperCI[i,
])), type = "n", main = colnames(Y)[i], ylab = "log abundance",
xlab = "time step")
polygon(c(1:nYears, nYears:1, 1), c(upperCI[i, ], rev(lowerCI[i,
]), upperCI[i, 1]), col = "skyblue", lty = 0)
lines(means[i, ], lwd = 3)
title(rownames(Y)[i])
}
SJF SJI
log abundance
log abundance
8.0
7.0
7.0
6.0
5 10 15 20 5 10 15 20
time step time step
EBays PSnd
log abundance
log abundance
6.5
7.4
5.0
6.6
5 10 15 20 5 10 15 20
time step time step
Figure 7.6: Plot of the posterior means and credible intervals for the estimated
states.
detach.jags()
7.10 Fitting a MARSS model with Stan
Let’s fit the same model as in Section 7.9 with Stan using the rstan package. If
you have not already, you will need to install the rstan package. This package
depends on a number of other packages which should install automatically
when you install rstan.
First we write the model. We could write this to a file (recommended), but for
this example, we write as a character object. Though the syntax is different
from the JAGS code, it has many similarities. Note that Stan does not allow
missing values in the data, thus we need to pass in only the non-missing
values along with the row and column indices of those values. The latter is
so we can match them to the appropriate state (x) values.
scode <- "
data {
int<lower=0> TT; // length of ts
int<lower=0> N; // num of ts; rows of y
int<lower=0> n_pos; // number of non-NA values in y
int<lower=0> col_indx_pos[n_pos]; // col index of non-NA vals
int<lower=0> row_indx_pos[n_pos]; // row index of non-NA vals
vector[n_pos] y;
}
parameters {
vector[N] x0; // initial states
real u;
vector[N] pro_dev[TT]; // refed as pro_dev[TT,N]
real<lower=0> sd_q;
real<lower=0> sd_r[N]; // obs variances are different
}
transformed parameters {
vector[N] x[TT]; // refed as x[TT,N]
for(i in 1:N){
x[1,i] = x0[i] + u + pro_dev[1,i];
for(t in 2:TT) {
x[t,i] = x[t-1,i] + u + pro_dev[t,i];
}
}
7.10. FITTING A MARSS MODEL WITH STAN 209
}
model {
sd_q ~ cauchy(0,5);
for(i in 1:N){
x0[i] ~ normal(y[i],10); // assume no missing y[1]
sd_r[i] ~ cauchy(0,5);
for(t in 1:TT){
pro_dev[t,i] ~ normal(0, sd_q);
}
}
u ~ normal(0,2);
for(i in 1:n_pos){
y[i] ~ normal(x[col_indx_pos[i], row_indx_pos[i]], sd_r[row_indx_pos[i]]);
}
}
generated quantities {
vector[n_pos] log_lik;
for (n in 1:n_pos) log_lik[n] = normal_lpdf(y[n] | x[col_indx_pos[n], row_indx_pos[n
}
"
Then we call stan() and pass in the data, names of parameter we wish to
have returned, and information on number of chains, samples (iter), and
thinning. The output is verbose (hidden here) and may have some warnings.
ypos <- Y[!is.na(Y)]
n_pos <- length(ypos) #number on non-NA ys
indx_pos <- which(!is.na(Y), arr.ind = TRUE) #index on the non-NAs
col_indx_pos <- as.vector(indx_pos[, "col"])
row_indx_pos <- as.vector(indx_pos[, "row"])
mod <- rstan::stan(model_code = scode, data = list(y = ypos,
TT = ncol(Y), N = nrow(Y), n_pos = n_pos, col_indx_pos = col_indx_pos,
row_indx_pos = row_indx_pos), pars = c("sd_q", "x", "sd_r",
"u", "x0"), chains = 3, iter = 1000, thin = 1)
We use extract() to extract the parameters from the fitted model and then
the means and 95% credible intervals.
pars <- rstan::extract(mod)

means <- apply(pars$x, c(2, 3), mean)
upperCI <- apply(pars$x, c(2, 3), quantile, 0.975)
lowerCI <- apply(pars$x, c(2, 3), quantile, 0.025)
colnames(means) <- colnames(upperCI) <- colnames(lowerCI) <- rownames(Y)
No id variables; using all as measure variables

SJF SJI
6
mean
EBays PSnd
1980 1985 1990 1995 2000 1980 1985 1990 1995 2000
year
Figure 7.7: Estimated level and 95 percent credible intervals.

7.11. PROBLEMS 211
7.11 Problems
For these questions, use the harborSealWA data set in MARSS. The data
are already logged, but you will need to remove the year column and have
time going across the columns not down the rows.
require(MARSS)
dat <- t(harborSealWA[, 2:6])
The sites are San Juan de Fuca (SJF 3), San Juan Islands (SJI 4), Eastern
Bays (EBays 5), Puget Sound (PSnd 6) and Hood Canal (HC 7).
1. Plot the harbor seal data. Use whatever plotting functions you wish
(e.g. ggplot(), plot(); points(); lines(), matplot()).
2. Fit a panmictic population model that assumes that each of the 5 sites
is observing one “Inland WA” harbor seal population with trend u.
Assume the observation errors are independent and identical. This
means 1 variance on diagonal and 0s on off-diagonal. This is the default
assumption for MARSS().
a. Write the Z for this model. The code to use for making a matrix
in Rmarkdown is
$$\begin{bmatrix}a & b & 0\\d & e & f\\0 & h & i\end{bmatrix}$$
b. Write the Z matrix in R using Z=matrix(...) and using the
factor short-cut for specifying Z. Z=factor(c(...).
c. Fit the model using MARSS(). What is the estimated trend (u)?
How fast was the population increasing (percent per year) based
on this estimated u?
d. Compute the confidence intervals for the parameter estimates.
Compare the intervals using the Hessian approximation and using
a parametric bootstrap. What differences do you see between the
two approaches? Use this code:
library(broom)
tidy(fit)
# set nboot low so it doesn't take forever
Figure 7.8: Regions in the harbor seal surveys

7.11. PROBLEMS 213
tidy(fit, method="parametric",nboot=100)
e. What does an estimate of Q = 0 mean? What would the estimated
state (x) look like when Q = 0?
3. Using the same panmictic population model, compare 3 assumptions
about the observation error structure.
• The observation errors are independent with different variances.
• The observation errors are independent with the same variance.
• The observation errors are correlated with the same variance and
same correlation.
a. Write the R variance-covariance matrices for each assumption.
b. Create each R matrix in R. To combine, numbers and characters
in a matrix use a list matrix like so:
A <- matrix(list(0),3,3)
A[1,1] <- "sigma2"
c. Fit each model using MARSS() and compute the confidence intervals
(CIs) for the estimated parameters. Compare the estimated u
(the population long-term trend) along with their CIs. Does the
assumption about the observation errors change the u estimate?
d. Plot the state residuals, the ACF of the state residuals, and the
histogram of the state residuals for each fit. Are there any issues
that you see? Use this code to get your state residuals:
residuals(fit)$state.residuals[1,]
You need the [1,] since the residuals are returned as a matrix.
4. Fit a model with 3 subpopulations. 1=SJF,SJI; 2=PS,EBays; 3=HC.
The x part of the model is the population structure. Assume that the
observation errors are identical and independent (R="diagonal and
equal"). Assume that the process errors are unique and independent
(Q="diagonal and unequal"). Assume that the u are unique among
the 3 subpopulation.
a. Write the x equation. Make sure each matrix in the equation has
the right number of rows and columns.
b. Write the Z matrix.

c. Write the Z in R using Z=matrix(...) and using the factor
shortcut Z=factor(c(...)).
d. Fit the model with MARSS().
e. What do the estimated u and Q imply about the population
dynamics in the 3 subpopulations?
5. Repeat the fit from Question 4 but assume that the 3 subpopulations
covary. Use Q="unconstrained".
a. What does the estimated Q matrix tell you about how the 3
subpopulation covary?
b. Compare the AICc from the model in Question 4 and the one with
Q="unconstrained". Which is more supported?
c. Fit the model with Q="equalvarcov". Is this more supported
based on AICc?
6. Develop the following alternative models for the structure of the inland
harbor seal population. For each model assume that the observation
errors are identical and independent (R="diagonal and equal"). As-
sume that the process errors covary with equal variance and covariances
(Q="equalvarcov").
• 5 subpopulations with unique u.
• 5 subpopulations with shared (equal) u.
• 5 subpopulations but with u shared in some regions: SJF+SJI
shared, PS+EBays shared, HC unique.
• 1 panmictic population.
• 3 subpopulations, 1=SJF,SJI, 2=PS,EBays, 3=HC, with unique u
• 2 subpopulations, 1=SJF,SJI,PS,EBays, 2=HC, with unique u
a. Fit each model using MARSS().
b. Prepare a table of each model with a column for the AICc values.
And a column for ∆AICc (AICc minus the lowest AICc in the
group). What is the most supported model?
7. Do diagnostics on the model residuals for the 3 subpopulation model
from question 4. Use the following code to get your model residuals.
7.11. PROBLEMS 215
This will put NAs in the model residuals where there is missing data.
Then do the tests on each row of resids.
resids <- residuals(fit)$model.residuals
resids[is.na(dat)] <- NA
a. Plot the model residuals.

b. Plot the ACF of the model residuals. Use acf(...,
na.action=na.pass).
c. Plot the histogram of the model residuals.
d. Fit an ARIMA() model to your model residuals using
forecast::auto.arima(). Are the best fit models what
you want? Note, we cannot use the Augmented Dickey-Fuller or
KPSS tests when there are missing values in our residuals time
series.
Chapter 8
MARSS models with covariates
Rmd for this chapter can be downloaded here
Data and packages
For the chapter examples, we will use the green and bluegreen algae in the Lake
Washington plankton data set and the covariates in that dataset. This is a 32-
year time series (1962-1994) of monthly plankton counts (cells per mL) from
Lake Washington, Washington, USA with the covariates total phosphorous
and pH. lakeWAplanktonTrans is a transformed version of the raw data used
for teaching purposes. Zeros have been replaced with NAs (missing). The
logged (natural log) raw plankton counts have been standardized to a mean
of zero and variance of 1 (so logged and then z-scored). Temperature, TP and
pH were also z-scored but not logged (so z-score of the untransformed values
for these covariates). The single missing temperature value was replaced with
-1 and the single missing TP value was replaced with -0.3.
We will use the 10 years of data from 1965-1974 (Figure 8.1), a decade with
particularly high green and bluegreen algae levels.
data(lakeWAplankton, package = "MARSS")
# lakeWA
fulldat = lakeWAplanktonTrans
217
218 CHAPTER 8. MARSS WITH COVARIATES
years = fulldat[, "Year"] >= 1965 & fulldat[, "Year"] < 1975
dat = t(fulldat[years, c("Greens", "Bluegreens")])
covariates = t(fulldat[years, c("Temp", "TP")])
Packages:
library(MARSS)
library(ggplot2)
8.1 Overview
A multivariate autoregressive state-space (MARSS) model with covariate
effects in both the process and observation components is written as:
xt = Bt xt−1 + ut + Ct ct + wt , where wt ∼ MVN(0, Qt )

(8.1)
yt = Zt xt + at + Dt dt + vt , where vt ∼ MVN(0, Rt )
where ct is the p × 1 vector of covariates (e.g., temperature, rainfall) which

affect the states and dt is a q × 1 vector of covariates (potentially the same
as ct ), which affect the observations. Ct is an m × p matrix of coefficients
relating the effects of ct to the m × 1 state vector xt , and Dt is an n × q
matrix of coefficients relating the effects of dt to the n × 1 observation vector
yt .
With the MARSS() function, one can fit this model by passing in model$c
and/or model$d in the model argument as a p×T or q×T matrix, respectively.
The form for Ct and Dt is similarly specified by passing in model$C and/or
model$D. C and D are matrices and are specified as 2-dimensional matrices
as you would other parameter matrices.
8.2 Prepare the plankton data

We will prepare the data by z-scoring. The original data lakeWAplanktonTrans
were already z-scored, but we changed the mean when we subsampled the
years so we need to z-score again.
8.3. OBSERVATION-ERROR ONLY MODEL 219
# z-score the response variables

the.mean = apply(dat, 1, mean, na.rm = TRUE)
the.sigma = sqrt(apply(dat, 1, var, na.rm = TRUE))
dat = (dat - the.mean) * (1/the.sigma)
Next we set up the covariate data, temperature and total phosphorous. We

z-score the covariates to standardize and remove the mean.
the.mean = apply(covariates, 1, mean, na.rm = TRUE)
the.sigma = sqrt(apply(covariates, 1, var, na.rm = TRUE))
covariates = (covariates - the.mean) * (1/the.sigma)
Greens
1
−1
−3
Bluegreens
0 1 2
−2
1
Temp
−1 0
TP
1
−1
1966 1968 1970 1972 1974
Time
Figure 8.1: Time series of Green and Bluegreen algae abundances in Lake
Washington along with the temperature and total phosporous covariates.
8.3 Observation-error only model

We can estimate the effect of the covariates using a process-error only model,
an observation-error only model, or a model with both types of error. An
observation-error only model is a multivariate regression, and we will start
here so you see the relationship of MARSS model to more familiar linear
regression models.
In a standard multivariate linear regression, we only have an observation
model with independent errors (the state process does not appear in the
model):
yt = a + Ddt + vt , where vt ∼ MVN(0, R) (8.2)
The elements in a are the intercepts and those in D are the slopes (effects).
We have dropped the t subscript on a and D because these will be modeled as
time-constant. Writing this out for the two plankton and the two covariates
we get:
" # " # " #" # " #
yg a β βg,tp temp v
= 1 + g,temp + 1 (8.3)
ybg t
a2 βbg,temp βbg,tp tp t−1
v2 t
Let’s fit this model with MARSS. The x part of the model is irrelevant so we
want to fix the parameters in that part of the model. We won’t set B = 0 or
Z = 0 since that might cause numerical issues for the Kalman filter. Instead
we fix them as identity matrices and fix x0 = 0 so that xt = 0 for all t.
Q <- U <- x0 <- "zero"
B <- Z <- "identity"
d <- covariates
A <- "zero"
D <- "unconstrained"
y <- dat # to show relationship between dat & the equation
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, D = D,
d = d, x0 = x0)
kem <- MARSS(y, model = model.list)
Success! algorithm run for 15 iterations. abstol and log-log tests passed.
MARSS fit is
Algorithm ran 15 (=minit) iterations and convergence was reached.
8.4. PROCESS-ERROR ONLY MODEL 221
AIC: 562.8573 AICc: 563.1351
Estimate
R.diag 0.706
D.(Greens,Temp) 0.367
D.(Bluegreens,Temp) 0.392
D.(Greens,TP) 0.058
D.(Bluegreens,TP) 0.535

We set A="zero" because the data and covariates have been demeaned. Of
course, one can do multiple regression in R using, say, lm(), and that would
be much, much faster. The EM algorithm is over-kill here, but it is shown so
that you see how a standard multivariate linear regression model is written
as a MARSS model in matrix form.
8.4 Process-error only model

Now let’s model the data as an autoregressive process observed without error,
and incorporate the covariates into the process model. Note that this is much
different from typical linear regression models. The x part represents our
model of the data (in this case plankton species). How is this different from
the autoregressive observation errors? Well, we are modeling our data as
autoregressive so data at t − 1 affects the data at t. Population abundances
are inherently autoregressive so this model is a bit closer to the underlying
mechanism generating the data. Here is our new process model for plankton
abundance.
xt = xt−1 + Cct + wt , where wt ∼ MVN(0, Q) (8.4)
We can fit this as follows:
R <- A <- U <- "zero"
B <- Z <- "identity"
Q <- "equalvarcov"
C <- "unconstrained"
model.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
C = C, c = covariates)
kem <- MARSS(dat, model = model.list)
MARSS fit is
AIC: 586.1465 AICc: 586.8225
Estimate
Q.diag 0.7269
Q.offdiag -0.0210
x0.X.Greens -0.5189
x0.X.Bluegreens -0.2431
C.(X.Greens,Temp) -0.0434
C.(X.Bluegreens,Temp) 0.0988
C.(X.Greens,TP) -0.0589
C.(X.Bluegreens,TP) 0.0104

Now, it looks like temperature has a strong negative effect on algae? Also
our log-likelihood dropped a lot. Well, the data do not look at all like a
random walk model (where B = 1), which we can see from the plot of the
data (Figure 8.1). The data are fluctuating about some mean so let’s switch
to a better autoregressive model—a mean-reverting model. To do this, we
will allow the diagonal elements of B to be something other than 1.
8.4. PROCESS-ERROR ONLY MODEL 223
model.list$B <- "diagonal and unequal"

MARSS fit is
AIC: 493.2211 AICc: 494.2638
Estimate
B.(X.Greens,X.Greens) 0.1981
B.(X.Bluegreens,X.Bluegreens) 0.7672
Q.diag 0.4899
Q.offdiag -0.0221
x0.X.Greens -1.2915
C.(X.Greens,Temp) 0.2844
C.(X.Greens,TP) 0.0332

Notice that the log-likelihood goes up quite a bit, which means that the
mean-reverting model fits the data much better.
With this model, we are estimating x0 . If we set model$tinitx=1, we will

get a error message that R diagonals are equal to 0 and we need to fix x0.
Because R = 0, if we set the initial states at t = 1, then they are fully
determined by the data.
x0 <- dat[, 1, drop = FALSE]

model.list$tinitx <- 1
model.list$x0 <- x0
MARSS fit is
AIC: 486.9653 AICc: 487.6414
Estimate
Q.diag 0.4944
Q.offdiag -0.0223

8.5. BOTH PROCESS- AND OBSERVATION-ERROR 225
8.5 Both process- and observation-error
Here is an example where we have both process and observation error but
the covariates only affect the process:
xt = Bxt−1 + Ct ct + wt , where wt ∼ MVN(0, Q)

(8.5)
yt = xt−1 + vt , where vt ∼ MVN(0, R),
x is the true algae abundances and y is the observation of the x’s.

Let’s say we knew that the observation variance on the algae measurements
was about 0.16 and we wanted to include that known value in the model. To
do that, we can simply add R to the model list from the process-error only
model in the last example.
D <- d <- A <- U <- "zero"
Z <- "identity"
B <- "diagonal and unequal"
Q <- "equalvarcov"
c <- covariates
R <- diag(0.16, 2)
x0 <- "unequal"
tinitx <- 1
D = D, d = d, C = C, c = c, x0 = x0, tinitx = tinitx)

MARSS fit is
AIC: 500.7389 AICc: 501.7815
Estimate
Q.diag 0.33923
Q.offdiag -0.00411
x0.X.Greens -0.52614

Note, our estimates of the effect of temperature and total phosphorous are
not that different than what you get from a simple multiple regression (our
first example). This might be because the autoregressive component is small,
meaning the estimated diagonals on the B matrix are small.
Here is an example where we have both process and observation error but
the covariates only affect the observation process:
xt = Bxt−1 + wt , where wt ∼ MVN(0, Q)

(8.6)
yt = xt−1 + Ddt vt , where vt ∼ MVN(0, R),
x is the true algae abundances and y is the observation of the x’s.

C <- c <- A <- U <- "zero"
Z <- "identity"
Q <- "equalvarcov"
D <- "unconstrained"
d <- covariates
R <- diag(0.16, 2)
x0 <- "unequal"
tinitx <- 1
8.6. INCLUDING SEASONAL EFFECTS IN MARSS MODELS 227
D = D, d = d, C = C, c = c, x0 = x0, tinitx = tinitx)


MARSS fit is
AIC: 499.1759 AICc: 500.2185
Estimate
Q.diag 0.314
Q.offdiag -0.030
x0.X.Greens -0.121
D.(Greens,Temp) 0.373
D.(Bluegreens,Temp) 0.276
D.(Greens,TP) 0.042
D.(Bluegreens,TP) 0.115

8.6 Including seasonal effects in MARSS

models
Time-series data are often collected at intervals with some implicit “seasonality.”
For example, quarterly earnings for a business, monthly rainfall totals, or
hourly air temperatures. In those cases, it is often helpful to extract any

recurring seasonal patterns that might otherwise mask some of the other
temporal dynamics we are interested in examining.
Here we show a few approaches for including seasonal effects using the Lake
Washington plankton data, which were collected monthly. The following
examples will use all five phytoplankton species from Lake Washington. First,
let’s set up the data.
years <- fulldat[, "Year"] >= 1965 & fulldat[, "Year"] < 1975
phytos <- c("Diatoms", "Greens", "Bluegreens", "Unicells", "Other.algae")
dat <- t(fulldat[years, phytos])
# z.score data because we changed the mean when we subsampled

the.mean <- apply(dat, 1, mean, na.rm = TRUE)
the.sigma <- sqrt(apply(dat, 1, var, na.rm = TRUE))
dat <- (dat - the.mean) * (1/the.sigma)
# number of time periods/samples
TT <- dim(dat)[2]
8.6.1 Seasonal effects as fixed factors
One common approach for estimating seasonal effects is to treat each one
as a fixed factor, such that the number of parameters equals the number of
“seasons” (e.g., 24 hours per day, 4 quarters per year). The plankton data
are collected monthly, so we will treat each month as a fixed factor. To fit a
model with fixed month effects, we create a 12 × T covariate matrix c with one
row for each month (Jan, Feb, . . . ) and one column for each time point. We
put a 1 in the January row for each column corresponding to a January time
point, a 1 in the February row for each column corresponding to a February
time point, and so on. All other values of c equal 0. The following code will
create such a c matrix.
# number of 'seasons' (e.g., 12 months per year)
period <- 12
# first 'season' (e.g., Jan = 1, July = 7)
per.1st <- 1
# create factors for seasons
c.in <- diag(period)

for (i in 2:(ceiling(TT/period))) {
c.in <- cbind(c.in, diag(period))
}
# trim c.in to correct start & length
c.in <- c.in[, (1:TT) + (per.1st - 1)]
# better row names
rownames(c.in) <- month.abb
Next we need to set up the form of the C matrix which defines any constraints
we want to set on the month effects. C is a 5 × 12 matrix. Five taxon and
12 month effects. If we wanted each taxon to have the same month effect,
i.e. there is a common month effect across all taxon, then we have the same
value in each C column1 :
C <- matrix(month.abb, 5, 12, byrow = TRUE)
C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Notice, that C only has 12 values in it, the 12 common month effects. However,
for this example, we will let each taxon have a different month effect thus
allowing different seasonality for each taxon. For this model, we want each
value in C to be unique:
Now C has 5 × 12 = 60 separate effects.

Then we set up the form for the rest of the model parameters. We make the
following assumptions:
# Each taxon has unique density-dependence
1
‘month.abb‘ is a R constant that gives month abbreviations in text.
# Assume independent process errors

Q <- "diagonal and unequal"
# We have demeaned the data & are fitting a mean-reverting
# model by estimating a diagonal B, thus
U <- "zero"
# Each obs time series is associated with only one process
Z <- "identity"
# The data are demeaned & fluctuate around a mean
A <- "zero"
# We assume observation errors are independent, but they have
# similar variance due to similar collection methods
R <- "diagonal and equal"
# We are not including covariate effects in the obs equation
D <- "zero"
d <- "zero"
Now we can set up the model list for MARSS and fit the model (results are
not shown since they are verbose with 60 different month effects).
C = C, c = c.in, D = D, d = d)
seas.mod.1 <- MARSS(dat, model = model.list, control = list(maxit = 1500))
# Get the estimated seasonal effects rows are taxa, cols are
# seasonal effects
seas.1 <- coef(seas.mod.1, type = "matrix")$C
rownames(seas.1) <- phytos
colnames(seas.1) <- month.abb
The top panel in Figure 8.2 shows the estimated seasonal effects for this
model. Note that if we had set U=“unequal”, we would need to set one of the
columns of C to zero because the model would be under-determined (infinite
number of solutions). If we substracted the mean January abundance off
each time series, we could set the January column in C to 0 and get rid of 5
estimated effects.
8.6.2 Seasonal effects as a polynomial
The fixed factor approach required estimating 60 effects. Another approach

is to model the month effect as a 3rd-order (or higher) polynomial: a + b ×
m + c × m2 + d × m3 where m is the month number. This approach has
less flexibility but requires only 20 estimated parameters (i.e., 4 regression
parameters times 5 taxa). To do so, we create a 4 × T covariate matrix c
with the rows corresponding to 1, m, m2 , and m3 , and the columns again
corresponding to the time points. Here is how to set up this matrix:
# number of 'seasons' (e.g., 12 months per year)
period <- 12
# first 'season' (e.g., Jan = 1, July = 7)
per.1st <- 1
# order of polynomial
poly.order <- 3
# create polynomials of months
month.cov <- matrix(1, 1, period)
for (i in 1:poly.order) {
month.cov = rbind(month.cov, (1:12)î)
}
# our c matrix is month.cov replicated once for each year
c.m.poly <- matrix(month.cov, poly.order + 1, TT + period, byrow = FALSE)
# trim c.in to correct start & length
c.m.poly <- c.m.poly[, (1:TT) + (per.1st - 1)]
# Everything else remains the same as in the previous example

C = C, c = c.m.poly, D = D, d = d)
The effect of month m for taxon i is ai + bi × m + ci × m2 + di × m3 , where

ai , bi , ci and di are in the i-th row of C. We can now calculate the matrix of
seasonal effects as follows, where each row is a taxon and each column is a
month:
C.2 = coef(seas.mod.2, type = "matrix")$C
seas.2 = C.2 %*% month.cov
The middle panel in Figure 8.2 shows the estimated seasonal effects for this
polynomial model.
Note: Setting the covariates up like this means that our covariates are collinear
since m, m2 and m3 are correlated, obviously. A better approach is to use
the poly() function to create an orthogonal polynomial covariate matrix
c.m.poly.o:
month.cov.o <- cbind(1, poly(1:period, poly.order))
c.m.poly.o <- matrix(t(month.cov.o), poly.order + 1, TT + period,
byrow = FALSE)
c.m.poly.o <- c.m.poly.o[, (1:TT) + (per.1st - 1)]
8.6.3 Seasonal effects as a Fourier series
The factor approach required estimating 60 effects, and the 3rd order polyno-
mial model was an improvement at only 20 parameters. A third option is to
use a discrete Fourier series, which is combination of sine and cosine waves;
it would require only 10 parameters. Specifically, the effect of month m on
taxon i is ai × cos(2πm/p) + bi × sin(2πm/p), where p is the period (e.g., 12
months, 4 quarters), and ai and bi are contained in the i-th row of C.
We begin by defining the 2 × T seasonal covariate matrix c as a combination

of 1 cosine and 1 sine wave:
cos.t <- cos(2 * pi * seq(TT)/period)
sin.t <- sin(2 * pi * seq(TT)/period)
c.Four <- rbind(cos.t, sin.t)
Everything else remains the same and we can fit this model as follows:
C = C, c = c.Four, D = D, d = d)
We make our seasonal effect matrix as follows:

C.3 <- coef(seas.mod.3, type = "matrix")$C

# The time series of net seasonal effects
seas.3 <- C.3 %*% c.Four[, 1:period]
The bottom panel in Figure 8.2 shows the estimated seasonal effects for this
seasonal-effects model based on a discrete Fourier series.
Diatoms
Greens
Fixed monthly
0.5
Bluegreens
Unicells
Other.algae
−0.5
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Diatoms
0.4
Greens
Bluegreens
Unicells
Cubic
Other.algae
−0.2
−0.8
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Diatoms
0.4
Greens
Bluegreens
Unicells
Fourier
Other.algae
0.0
−0.4
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Figure 8.2: Estimated monthly effects for the three approaches to estimating
seasonal effects. Top panel: each month modelled as a separate fixed effect for
each taxon (60 parameters); Middle panel: monthly effects modelled as a 3rd
order polynomial (20 parameters); Bottom panel: monthly effects modelled
as a discrete Fourier series (10 parameters).
Rather than rely on our eyes to judge model fits, we should formally assess
which of the 3 approaches offers the most parsimonious fit to the data. Here
is a table of AICc values for the 3 models:
data.frame(Model = c("Fixed", "Cubic", "Fourier"), AICc = round(c(seas.mod.1$AICc,
seas.mod.2$AICc, seas.mod.3$AICc), 1))
Model AICc
1 Fixed 1188.4
2 Cubic 1144.9
3 Fourier 1127.4
The model selection results indicate that the model with monthly seasonal
effects estimated via the discrete Fourier sequence is the best of the 3 models.
Its AICc value is much lower than either the polynomial or fixed-effects
models.
8.7 Model diagnostics
We will examine some basic model diagnostics for these three approaches by
looking at plots of the model residuals and their autocorrelation functions
(ACFs) for all five taxa using the following code:
for (i in 1:3) {
dev.new()
modn <- paste("seas.mod", i, sep = ".")
for (j in 1:5) {
plot.ts(residuals(get(modn))$model.residuals[j, ], ylab = "Residual",
main = phytos[j])
abline(h = 0, lty = "dashed")
acf(residuals(get(modn))$model.residuals[j, ], na.action = na.pass)
}
}

8.8 Homework data and discussion
For these problems, use the following code to load in 1980-1994 phytoplankton
data, covariates, and z-score all the data. Run the code below and use dat
and covars directly in your code.
8.8. HOMEWORK DATA AND DISCUSSION 235
Diatoms
Residual
−0.2
0 20 40 60 80 100 120
Time
Series residuals(get(modn))$model.residuals[j, ]
ACF
−0.2
0 5 10 15 20
Lag
Figure 8.3: Residuals for model with season modelled as a discrete Fourier
series.
library(MARSS)
spp <- c("Cryptomonas", "Diatoms", "Greens", "Unicells", "Other.algae",
"Daphnia")
yrs <- lakeWAplanktonTrans[, "Year"] %in% 1980:1994
dat <- t(lakeWAplanktonTrans[yrs, spp])
# z-score the data
avg <- apply(dat, 1, mean, na.rm = TRUE)
sd <- sqrt(apply(dat, 1, var, na.rm = TRUE))
dat <- (dat - avg)/sd
rownames(dat) = spp
# always check that the mean and variance are 1 after
# z-scoring
apply(dat, 1, mean, na.rm = TRUE) #this should be 0
apply(dat, 1, var, na.rm = TRUE) #this should be 1
For the covariates, you’ll use temperature and TP.

covars <- rbind(Temp = lakeWAplanktonTrans[yrs, "Temp"], TP = lakeWAplanktonTrans[yrs
"TP"])
avg <- apply(covars, 1, mean)

sd <- sqrt(apply(covars, 1, var, na.rm = TRUE))
covars <- (covars - avg)/sd
rownames(covars) <- c("Temp", "TP")
# always check that the mean and variance are 1 after
# z-scoring
apply(covars, 1, mean, na.rm = TRUE) #this should be 0
apply(covars, 1, var, na.rm = TRUE) #this should be 1
Here are some guidelines to help you answer the questions:

• Use a MARSS model that allows for both observation and process error.
• Assume that the observation errors are independent and identically
distributed with known variance of 0.10.
• Assume that the process errors are independent from one another, but
the variances differ by taxon.
• Assume that each group is an observation of its own process. This
means Z="identity".
• Use B="diagonal and unequal". This implies that each of the taxa
are operating under varying degrees of density-dependence, and they
are not allowed to interact.
• All the data have been de-meaned and Z is identity, therefore use
U="zero" and A="zero". Make sure to check that the means of the
data are 0 and the variance is 1.
• Use tinitx=1. It makes B estimation more stable. It goes in your
model list.
• Include a plot of residuals versus time and acf of residuals for each
question. You only need to show these for the top (best) model if the
question involves comparing multiple models.
• Use AICc to compare models.
• Some of the models may not converge, however use for the purpose of
the homework, use the unconverged models. Thus use the output from
MARSS() without any additional arguments. If you want, you can try
using control=list(maxit=1000) to increase the number of iterations.
Or you can try method="BFGS" in your MARSS() call. This will use the
BFGS optimization method, however it may throw an error for these
data.
8.9. PROBLEMS 237
8.9 Problems
Read Section 8.8 for the data and tips on answering the questions and setting
up your models. Note the questions asking about the effects on growth rate
are asking about the C matrix in
xt = Bxt−1 + Cct + wt
The Cct + wt are the process errors and represent the growth rates (growth
above or below what you would expect given xt−1 ). Use your raw data in the
MARSS model. You do not need to difference the data to get at the growth
rates since the process model is modeling that.
1. How does month affect the mean phytoplankton population growth
rates? Show a plot of the estimated mean growth rate versus month
for each taxon using three approaches to estimate the month effect
(factor, polynomial, Fourier series). Estimate seasonal effects without
any covariate (Temp, TP) effects.
2. It is likely that both temperature and total phosphorus (TP) affect
phytoplankton population growth rates. Using MARSS models, estimate
the effect of Temp and TP on growth rates of each taxon. Leave out
the seasonal covariates from question 1, i.e. only use Temp and TP as
covariates. Make a plot of the point estimates of the Temp and TP
effects with the 95% CIs added to the plot. tidy() is an easy way to
get the parameters CIs.
3. Estimate the Temp and TP effects using B="unconstrained".
a. Compare the B matrix for the fit from question 2 and from question
3. Describe the species interactions modeled by the B matrix when
B="unconstrained". How is it different than the B matrix from
question 2? Note, you can retrieve the matrix using coef(fit,
type="matrix")$B.
b. Do the Temp and TP effects change when you use B="unconstrained"?
Make sure to look at the CIs also.
4. Using MARSS models, evaluate which (Temp or TP) is the more
important driver or if both are important. Again, leave out the seasonal
covariates from question 1, i.e. only use Temp and TP as covariates.
Compare two approaches: comparison of effect sizes in a model with

both Temp and TP and model selection using a set of models.
5. Evaluate whether the effect of temperature (Temp) on the taxa manifests
itself via their underlying physiology (by affecting growth rates and
thus abundance) or because physical changes in the water stratification
makes them easier/harder to sample in some months. Leave out the
seasonal covariates from question 1, i.e. only use Temp and TP as the
covariates. For TP, assume it always affects growth rates, never the
observation errors.
6. Is there support for temperature or TP affecting all functional groups’
growth rates the same, or are the effects on one taxon different from
another? Make sure to test all possibilities: the Temp and TP effects
are the same for all taxa, and one covariate effect is the same across
taxa while the other’s effects are unique across taxa.
7. Compare your results for question 2 using linear regression, by using
the lm() function. You’ll need to look at the response of each taxon
separately, i.e. one response variable. You can have a multivariate re-
sponse variable with lm() but the functions will be doing 6 independent
linear regressions. In your lm() model, use only Temp and TP (and
intercept) as covariates. Compare the estimated effects to those from
question 2. How are they different? How is this model different from
the model you fit in question 2?
8. Temp and TP are negatively correlated (cor = -0.66). A common
threshold for collinearity in regression models is 0.7. Temp and TP fall
below that but are close. One approach to collinearity is sequential
regression (Dormann et al., 2013). The first (most influential) covariate
is included ‘as is’ and the second covariate appears as the residuals of
a regression of the second against the first. The covariates are now
orthogonal however the second covariate is conditioned on the first.
If we see an effect of the residuals covariate, it is the effect of TP
additional to the contribution it already made through its relationship
with temperature. Rerun question 2 using sequential regression (see
code below).
Make your Temp and TP covariates orthogonal using sequential regression.
Do your conclusions about the effects of Temperature and TP change?
8.9. PROBLEMS 239
Below is code to construct your orthogonal covariates for sequential regression.

fit <- lm(covars[1, ] ~ covars[2, ])
seqcovs <- rbind(covars[1, ], residuals(fit))
avg <- apply(seqcovs, 1, mean)
sd <- sqrt(apply(seqcovs, 1, var, na.rm = TRUE))
seqcovs <- (seqcovs - avg)/sd
rownames(seqcovs) <- c("Temp", "TPresids")
9. Compare the AICc’s of the 3 seasonal models from question 1 and the
4 Temp/TP models from question 5. What does this tell you about the
Temp and TP only models?
10. We cannot just fit a model with season and Temp plus TP since Temp
and TP are highly seasonal. That will cause problems if we have
something that explain season (a polynomial) and a covariate that
has seasonality. Instead, use sequential regression to fit a model with
seasonality, Temp and TP. Use a 3rd order polynomial with the poly()
function to create orthogonal season covariates and then use sequential
regression (code in problem 8) to create Temp and TP covariates that
are orthogonal to your season covariates. Fit the model and compare a
model with only season to a model with season and Temp plus TP.
11. Another approach to looking at effects of covariates which have season
cycles is to examine if the seasonal anomalies of the independent variable
can be explained by the seasonal anomalies of the dependent variables.
In other words, can an unusually high February abundance (higher
than expected) be explained by an unusually high or low February
temperature? In this approach, you remove season so you do not need
to model it (with factor, polynomial, etc). The stl() function can be
used to decompose a time series using LOESS. We’ll use stl() since it
can handle missing values.
a. Decompose the Diatom time series using stl() and plot.
Use na.action=zoo::na.approx to deal with the NAs. Use
s.window="periodic". Other than that you can use the defaults.
i <- "Diatoms"
dati <- ts(dat[i, ], frequency = 12)
a <- stl(dati, "periodic", na.action = zoo::na.approx)
b. Create dependent variables and covariates that are anomolies by modifying t

i <- "Diatoms"
a <- stl(ts(dat[i, ], frequency = 12), "periodic", na.action = zoo::na.approx)
anom <- a$time.series[, "remainder"] + a$time.series[, "trend"]
c. Notice that you have simply removed the seasonal cycle from the data. Using
anoms <- matrix(NA, dim(dat)[1] + dim(covars)[1], dim(dat)[2])
rownames(anoms) <- c(rownames(dat), rownames(covars))
for (i in 1:dim(dat)[1]) {
a <- stl(ts(dat[i, ], frequency = 12), "periodic", na.action = zoo::na.app
anoms[i, ] <- a$time.series[, "remainder"] + a$time.series[,
"trend"]
}
for (i in 1:dim(covars)[1]) {
a <- stl(ts(covars[i, ], frequency = 12), "periodic", na.action = zoo::na.
anoms[i + dim(dat)[1], ] <- a$time.series[, "remainder"] +
a$time.series[, "trend"]
}
Chapter 9
Dynamic linear models
Dynamic linear models (DLMs) are a type of linear regression model, wherein
the parameters are treated as time-varying rather than static. DLMs are
used commonly in econometrics, but have received less attention in the
ecological literature (c.f. Lamon et al., 1998; Scheuerell and Williams, 2005).
Our treatment of DLMs is rather cursory—we direct the reader to excellent
textbooks by Pole et al. (1994) and Petris et al. (2009) for more in-depth
treatments of DLMs. The former focuses on Bayesian estimation whereas the
latter addresses both likelihood-based and Bayesian estimation methods.
Data
Most of the data used in the chapter are from the MARSS package. Install
the package, if needed, and load:
library(MARSS)
The problem set uses an additional data set on spawners and recruits
(KvichakSockeye) in the atsalibrary package.
241
242 CHAPTER 9. DYNAMIC LINEAR MODELS
9.1 Overview
We begin our description of DLMs with a static regression model, wherein
the ith observation (response variable) is a linear function of an intercept,
predictor variable(s), and a random error term. For example, if we had one
predictor variable (f ), we could write the model as
yi = α + βfi + vi , (9.1)
where the α is the intercept, β is the regression slope, fi is the predictor

variable matched to the ith observation (yi ), and vi ∼ N(0, r). It is important
to note here that there is no implicit ordering of the index i. That is, we
could shuffle any/all of the (yi , fi ) pairs in our dataset with no effect on our
ability to estimate the model parameters.
We can write Equation (9.1) using matrix notation, as
" #
h i α
y i = 1 fi + vi
β
= F>
i θ + vi , (9.2)
" #
h i α
where F>
i = 1 fi and θ = .
β
In a DLM, however, the regression parameters are dynamic in that they
“evolve” over time. For a single observation at time t, we can write
yt = F>
t θ t + vt , (9.3)
where Ft is a column vector of predictor variables (covariates) at time t, θ t is

a column vector of regression parameters at time t and vt ∼ N(0, r). This for-
mulation presents two features that distinguish it from Equation (9.2). First,
the observed data are explicitly time ordered (i.e., y = {y1 , y2 , y3 , . . . , yT }),
which means we expect them to contain implicit information. Second, the
relationship between the observed datum and the predictor variables are
unique at every time t (i.e., θ = {θ 1 , θ 2 , θ 3 , . . . , θ T }).
However, closer examination of Equation (9.3) reveals an apparent complica-
tion for parameter estimation. With only one datum at each time step t, we
could, at best, estimate only one regression parameter, and even then, the 1:1
9.2. DLM IN STATE-SPACE FORM 243
correspondence between data and parameters would preclude any estimation

of parameter uncertainty. To address this shortcoming, we return to the time
ordering of model parameters. Rather than assume the regression parameters
are independent from one time step to another, we instead model them as an
autoregressive process where
θ t = Gt θ t−1 + wt , (9.4)
Gt is the parameter “evolution” matrix, and wt is a vector of process errors,

such that wt ∼ MVN(0, Q). The elements of Gt may be known and fixed a
priori, or unknown and estimated from the data. Although we could allow
Gt to be time-varying, we will typically assume that it is time invariant or
assume Gt is an m × m identity matrix Im .
The idea is that the evolution matrix Gt deterministically maps the parameter
space from one time step to the next, so the parameters at time t are temporally
related to those before and after. However, the process is corrupted by
stochastic error, which amounts to a degradation of information over time. If
the diagonal elements of Q are relatively large, then the parameters can vary
widely from t to t + 1. If Q = 0, then θ 1 = θ 2 = θ T and we are back to the
static model in Equation (9.1).
9.2 DLM in state-space form

A DLM is a state-space model and can be written in MARSS form:
y t = F>
t θ t + et θ t = Gθ t−1 + wt ⇓ yt = Zt xt + vt xt = Bxt−1 + wt (9.5)
Note that DLMs include predictor variables (covariates) in the observation

equation much differently than other forms of MARSS models. In a DLM,
Z is a matrix of predictor variables and xt are the time-evolving regression
parameters.
yt = Zt xt + vt . (9.6)
In many other MARSS models, dt is a time-varying column vector of covariates

and D is the matrix of covariate-effect parameters.
yt = Zt xt + Ddt + vt . (9.7)
9.3 Stochastic level models
The most simple DLM is a stochastic level model, where the level is a random
walk without drift, and this level is observed with error. We will write it first
in using regression notation where the intercept is α and then in MARSS
notation. In the latter, αt = xt .
yt = αt + et αt = αt−1 + wt ⇓ yt = xt + vt xt = xt−1 + wt (9.8)
Using this model, we can model the Nile River level and fit the model using
MARSS().
data(Nile, package = "datasets")
mod_list <- list(B = "identity", U = "zero", Q = matrix("q"),
Z = "identity", A = matrix("a"), R = matrix("r"))
fit <- MARSS(matrix(Nile, nrow = 1), mod_list)

MARSS fit is
AIC: 1283.514 AICc: 1283.935
Estimate
A.a -0.338
R.r 15135.796
Q.q 1381.153
x0.x0 1111.791
9.3. STOCHASTIC LEVEL MODELS 245

1400
1200
Flow of the River Nile
1000
800
600
1880 1900 1920 1940 1960
Year
9.3.1 Stochastic level with drift
We can add a drift term to the level model to allow the level to tend upward
or downward with a deterministic rate η. This is a random walk with bias.
yt = αt + et αt = αt−1 + η + wt ⇓ yt = xt + vt xt = xt−1 + u + wt (9.9)
We can allow that the drift term η evolves over time along with the level. In
this case, η is modeled as a random walk along with α. This model is
yt = αt + et αt = αt−1 + ηt−1 + wα,t ηt = ηt−1 + wη,t (9.10)

Equation (9.10) can be written in matrix form as:
" # " # " #" # " #
h i α α 1 1 α w
yt = 1 0 + vt = + α (9.11)
η t
η t
0 1 η t−1
wη t
Equation (9.11) is a MARSS model.
yt = Zxt + vt xt = Bxt−1 + wt (9.12)
" # " #
1 1 α h i
where B = ,x= and Z = 1 0 .
0 1 η
See Section 6.2 for more discussion of stochastic level models and Section @ref()
to see how to fit this model with the StructTS(sec-uss-the-structts-function)
function in the stats package.
9.4 Stochastic regression model
The stochastic level models in Section 9.3 do not have predictor variables
(covariates). Let’s add one predictor variable ft and write a simple DLM
where the intercept α and slope β are stochastic. We will specify that α
and β evolve according to a simple random walk. Normally x is used for the
predictor variables in a regression model, but we will avoid that since we are
using x for the state equation in a state-space model. This model is
yt = αt + βt ft + vt αt = αt−1 + wα,t βt = βt−1 + wβ,t (9.13)
Written in matrix form, the model is

" # " # " # " #
h i α α α w
yt = 1 ft + vt = + α (9.14)
β t
β t
β t−1
wβ t
Equation (9.14) is a MARSS model:
yt = Zxt + vt xt = xt−1 + wt (9.15)

" #
α h i
where x = and Z = 1 ft .
β
9.5. DLM WITH SEASONAL EFFECT 247
9.5 DLM with seasonal effect

Let’s add a simple fixed quarter effect to the regression model:



 γ1 if qtr =1

γ if qtr =2

2
yt = αt + βt xt + γqtr + et γqtr = (9.16)




γ3 if qtr =3


γ4 if qtr =4
We can write Equation (9.16) in matrix form. In our model for γ, we will set
the variance to 0 so that the γ does not change with time.
       
h i α α α wα
y t = 1 xt  β  +et  β  =  β  +wβ  ⇓ yt = Zt xt +vt xt = xt−1 +wt
1        
γqtr t γqtr t γqtr t−1 0 t

(9.17)
How do we select the right quarterly effect? Let’s separate out the quarterly
effects and add them to x. We could then select the right γ using 0s and 1s
in the Zt matrix. For example, if t is in quarter 1, our model would be
 
αt
 βt 
 
γ1 
i
h 
yt = 1 xt 1 0 0 0  
 (9.18)
γ2 

γ3 
 
γ4
While if t is in quarter 2, the model is
 
αt
 βt 
 
γ1 
i
h 
yt = 1 xt 0 1 0 0 
γ 
 (9.19)
 2
γ3 
 
γ4
This would work, but we would have to have a different Zt matrix and it
might get cumbersome to keep track of the 0s and 1s. If we wanted the γ to
evolve with time, we might need to do this. However, if the γ are fixed, i.e. the
quarterly effect does not change over time, a less cumbersome approach is
possible.
We could instead keep the Zt matrix the same, but reorder the γi within x.
If t is in quarter 1,
 
αt
 βt 
 
γ1 
i
h 
yt = 1 xt 1 0 0 0  
 (9.20)
γ2 

γ3 
 
γ4
While if t is in quarter 2,
 
αt
 βt 
 
γ2 
i
h 
yt = 1 xt 1 0 0 0  
 (9.21)
γ3 

γ4 
 
γ1
We can use a non-diagonal G to to shift the correct quarter effect within x.
 
1 0 0 0 0 0
0 1 0 0 0 0


0 0 0 1 0 0
 
G=
0

 0 0 0 1 0

0 0 0 0 0 1
 
0 0 1 0 0 0
With this G, the γ rotate within x with each time step. If t is in quarter 1,
then t + 1 is in quarter 2, and we want γ2 to be in the 3rd row.
9.6. ANALYSIS OF SALMON SURVIVAL 249
      
α 1 0 0 0 0 0 α wα
β 0 1 0 0 0 0 β  wβ 
      
   
0 0 0 1 0 0
 γ1  +  0 
γ2 
      
   
  =
0 (9.22)
γ 
 3  0 0 0 1 0 γ2 
    0 
 
γ4  0 0 0 0 0 1 γ3   0 
      
γ1 t+1 0 0 1 0 0 0 γ4 t 0 t
At t + 2, we are in quarter 3 and γ3 will be in row 3.
      
α 1 0 0 0 0 0 α wα
β 0 1 0 0 0 0 β  wβ 
      
   
γ3  0 0 0 1 0 0  0 
 γ2 
      
 
  =
0 +  (9.23)
γ 
 4  0 0 0 1 0 γ3 
    0 
 
γ1  0 0 0 0 0 1 γ4   0 
      
γ2 t+2 0 0 1 0 0 0 γ1 t+1 0 t
9.6 Analysis of salmon survival
Let’s see an example of a DLM used to analyze real data from the literature.
Scheuerell and Williams (2005) used a DLM to examine the relationship
between marine survival of Chinook salmon and an index of ocean upwelling
strength along the west coast of the USA. Upwelling brings cool, nutrient-
rich waters from the deep ocean to shallower coastal areas. Scheuerell &
Williams hypothesized that stronger upwelling in April should create better
growing conditions for phytoplankton, which would then translate into more
zooplankton. In turn, juvenile salmon (“smolts”) entering the ocean in May
and June should find better foraging opportunities. Thus, for smolts entering
the ocean in year t,
survivalt = αt + βt ft + vt with vt ∼ N(0, r), (9.24)
and ft is the coastal upwelling index (cubic meters of seawater per second
per 100 m of coastline) for the month of April in year t.
Both the intercept and slope are time varying, so
αt = αt−1 + wα,t with wα,t ∼ N(0, qα ) (9.25)

βt = βt−1 + wβ,t with wβ,t ∼ N(0, qβ ). (9.26)
" # " # " #
α wα q 0
If we define θ t = , Gt = I, wt = , and Q = α , we get
β t wβ t 0 qβ
" #
1
Equation (9.4). If we define yt = survivalt and Ft = , we can write out
ft
the full DLM as a state-space model with the following form:
yt = F>
t θ t +vt with vt ∼ N(0, r)θ t = Gt θ t−1 +wt with wt ∼ MVN(0, Q)θ 0 = π 0 .
(9.27)
Equation (9.27) is equivalent to our standard MARSS model:
yt = Zt xt +a+vt with vt ∼ MVN(0, Rt )xt = Bt xt−1 +u+wt with wt ∼ MVN(0, Qt )x0 = π

(9.28)
>
where xt = θ t , Bt = Gt , yt = yt (i.e., yt is 1 × 1), Zt = Ft , a = u = 0, and
Rt = r (i.e., Rt is 1 × 1).
9.7 Fitting with MARSS()

Now let’s go ahead and analyze the DLM specified in Equations (9.24)–(9.27).
We begin by loading the data set (which is in the MARSS package). The
data set has 3 columns for 1) the year the salmon smolts migrated to the
ocean (year), 2) logit-transformed survival 1 (logit.s), and 3) the coastal
upwelling index for April (CUI.apr). There are 42 years of data (1964–2005).
# load the data
data(SalmonSurvCUI, package = "MARSS")
# get time indices
years <- SalmonSurvCUI[, 1]
# number of years of data
1
Survival in the original context was defined as the proportion of juveniles that survive
to adulthood. Thus, we use the logit function, defined as logit(p) = loge (p/[1 − p]), to map
survival from the open interval (0,1) onto the interval (−∞, ∞), which allows us to meet
our assumption of normally distributed observation errors.
9.7. FITTING WITH MARSS() 251
TT <- length(years)
# get response variable: logit(survival)
dat <- matrix(SalmonSurvCUI[, 2], nrow = 1)
As we have seen in other case studies, standardizing our covariate(s) to have

zero-mean and unit-variance can be helpful in model fitting and interpretation.
In this case, it’s a good idea because the variance of CUI.apr is orders of
magnitude greater than logit.s.
# get predictor variable
CUI <- SalmonSurvCUI[, 3]
## z-score the CUI
CUI.z <- matrix((CUI - mean(CUI))/sqrt(var(CUI)), nrow = 1)
# number of regr params (slope + intercept)
m <- dim(CUI.z)[1] + 1
Plots of logit-transformed survival and the z-scored April upwelling index are
shown in Figure 9.1.
−4.0
Logit(s)
−6.0
1
CUI
−3 −1
1965 1970 1975 1980 1985 1990 1995 2000 2005
Year of ocean entry
Figure 9.1: Time series of logit-transformed marine survival estimates for

Snake River spring/summer Chinook salmon (top) and z-scores of the coastal
upwelling index at 45N 125W (bottom). The x-axis indicates the year that
the salmon smolts entered the ocean.
Next, we need to set up the appropriate matrices and vectors for MARSS. Let’s
begin with those for the process equation because they are straightforward.
# for process eqn
B <- diag(m) ## 2x2; Identity
U <- matrix(0, nrow = m, ncol = 1) ## 2x1; both elements = 0
Q <- matrix(list(0), m, m) ## 2x2; all 0 for now
diag(Q) <- c("q.alpha", "q.beta") ## 2x2; diag = (q1,q2)
Defining the correct form for the observation model is a little more tricky,
however, because of how we model the effect(s) of predictor variables. In a
DLM, we need to use Zt (instead of dt ) as the matrix of predictor variables
that affect yt , and we use xt (instead of Dt ) as the regression parameters.
Therefore, we need to set Zt equal to an n × m × T array, where n is the
number of response variables (= 1; yt is univariate), m is the number of
regression parameters (= intercept + slope = 2), and T is the length of the
time series (= 42).
# for observation eqn
Z <- array(NA, c(1, m, TT)) ## NxMxT; empty for now
Z[1, 1, ] <- rep(1, TT) ## Nx1; 1's for intercept
Z[1, 2, ] <- CUI.z ## Nx1; predictor variable
A <- matrix(0) ## 1x1; scalar = 0
R <- matrix("r") ## 1x1; scalar = r
Lastly, we need to define our lists of initial starting values and model matri-
ces/vectors.
# only need starting values for regr parameters
inits.list <- list(x0 = matrix(c(0, 0), nrow = m))
# list of model matrices & vectors
mod.list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R)
And now we can fit our DLM with MARSS.

# fit univariate DLM
dlm1 <- MARSS(dat, inits = inits.list, model = mod.list)

9.8. FORECASTING 253
MARSS fit is
AIC: 90.07627 AICc: 91.74293
Estimate
R.r 0.15708
Q.q.alpha 0.11264
Q.q.beta 0.00564
x0.X1 -3.34023
x0.X2 -0.05388

Notice that the MARSS output does not list any estimates of the regression
parameters themselves. Why not? Remember that in a DLM the matrix of
states (x) contains the estimates of the regression parameters (θ). Therefore,
we need to look in dlm1$states for the MLEs of the regression parameters,
and in dlm1$states.se for their standard errors.
Time series of the estimated intercept and slope are shown in Figure 9.2. It
appears as though the intercept is much more dynamic than the slope, as
indicated by a much larger estimate of process variance for the former (Q.q1).
In fact, although the effect of April upwelling appears to be increasing over
time, it doesn’t really become important as a predictor variable until about
1990 when the approximate 95% confidence interval for the slope no longer
overlaps zero.
9.8 Forecasting
Forecasting from a DLM involves two steps:

−4.0
αt
−6.0
1.0
−0.2 0.4
βt
1965 1970 1975 1980 1985 1990 1995 2000 2005
Year of ocean entry
Figure 9.2: Time series of estimated mean states (thick lines) for the intercept
(top) and slope (bottom) parameters from the DLM specified by Equations
(9.24)–(9.27). Thin lines denote the mean ± 2 standard deviations.
1. Get an estimate of the regression parameters at time t from data up

to time t − 1. These are also called the one-step ahead forecast (or
prediction) of the regression parameters.
2. Make a prediction of y at time t based on the predictor variables at
time t and the estimate of the regression parameters at time t (step
1). This is also called the one-step ahead forecast (or prediction) of the
observation.
9.8.1 Estimate of the regression parameters
For step 1, we want to compute the distribution of the regression parameters

at time t conditioned on the data up to time t − 1, also known as the one-step
ahead forecasts of the regression parameters. Let’s denote θ t−1 conditioned
on y1:t−1 as θ t−1|t−1 and denote θ t conditioned on y1:t−1 as θ t|t−1 . We will
start by defining the distribution of θ t|t as follows
θ t|t ∼ MVN(π t , Λt ) (9.29)

where π t = E(θ t|t ) and Λt = Var(θ t|t ). Now we can compute the distribution
of θ t conditioned on y1:t−1 using the process equation for θ:
θ t = Gt θ t−1 + wt with wt ∼ MVN(0, Q) (9.30)
The expected value of θ t|t−1 is thus
E(θ t|t−1 ) = Gt E(θ t−1|t−1 ) = Gt π t−1 (9.31)
The variance of θ t|t−1 is
Var(θ t|t−1 ) = Gt Var(θ t−1|t−1 )G> >

t + Q = Gt Λt−1 Gt + Q (9.32)
Thus the distribution of θ t conditioned on y1:t−1 is
E(θ t|t−1 ) ∼ MVN(Gt π t−1 , Gt Λt−1 G>

t + Q) (9.33)
9.8.2 Prediction of the response variable yt
For step 2, we make the prediction of yt given the predictor variables at time
t and the estimate of the regression parameters at time t. This is called
the one-step ahead prediction for the observation at time t. We will denote
the prediction of y as ŷ and we want to compute its distribution (mean and
variance). We do this using the equation for yt but substituting the expected
value of θ t|t−1 for θ t .
ŷt|t−1 = F>
t E(θ t|t−1 ) + et with et ∼ N(0, r) (9.34)
Our prediction of y at t has a normal distribution with mean (expected value)
and variance. The expected value of ŷt|t−1 is
E(ŷt|t−1 ) = F> >

t E(θ t|t−1 ) = Ft (Gt π t−1 ) (9.35)
and the variance of ŷt|t−1 is
Var(ŷt|t−1 ) = F>
t Var(θ t|t−1 )Ft + r (9.36)
= F> >
t (Gt Λt−1 Gt + Q)Ft + r (9.37)
(9.38)
9.8.3 Computing the prediction
The expectations and variance of θ t conditioned on y1:t and y1:t−1 are standard
output from the Kalman filter. Thus to produce the predictions, all we need
to do is run our DLM state-space model through a Kalman filter to get
E(θ t|t−1 ) and Var(θ t|t−1 ) and then use Equation (9.35) to compute the mean
prediction and Equation (9.36) to compute its variance.
The Kalman filter will need Ft , Gt and estimates of Q and r. The latter are
calculated by fitting the DLM to the data y1:t , using for example the MARSS()
function.
Let’s see an example with the salmon survival DLM. We will use the Kalman
filter function in the MARSS package and the DLM fit from MARSS().
9.8.4 Forecasting salmon survival
Scheuerell and Williams (2005) were interested in how well upwelling could be
used to actually forecast expected survival of salmon, so let’s look at how well
our model does in that context. To do so, we need the predictive distribution
for the survival at time t given the upwelling at time t and the predicted
regression parameters at t.
In the salmon survival DLM, the Gt matrix is the identity matrix, thus
the mean and variance of the one-step ahead predictive distribution for the
observation at time t reduces to (from Equations (9.35) and (9.36))
E(ŷt|t−1 ) = F> >
t E(θ t|t−1 )Var(ŷt|t−1 ) = Ft Var(θ t|t−1 )Ft + r̂ (9.39)
where " #
1
Ft =
ft
and ft is the upwelling index at t + 1. r̂ is the estimated observation variance
from our model fit.
9.8.5 Forecasting using MARSS
Working from Equation (9.39), we can compute the expected value of the
forecast at time t and its variance using the Kalman filter. For the expectation,
we need F> >

t E(θ t|t−1 ). Ft is called Zt in MARSS notation. The one-step ahead
forecasts of the regression parameters at time t, the E(θ t|t−1 ), are calculated
as part of the Kalman filter algorithm—they are termed x̃t−1 t in MARSS
notation and stored as xtt1 in the list produced by the MARSSkfss() Kalman
filter function.
Using the Z defined in 9.6, we compute the mean forecast as follows:

# get list of Kalman filter output
kf.out <- MARSSkfss(dlm1)
## forecasts of regr parameters; 2xT matrix
eta <- kf.out$xtt1
## ts of E(forecasts)
fore.mean <- vector()
for (t in 1:TT) {
fore.mean[t] <- Z[, , t] %*% eta[, t, drop = FALSE]
}
For the variance of the forecasts, we need F> t Var(θ t|t−1 )Ft + r̂. As with
the mean, F> t ≡ Z t . The variances of the one-step ahead forecasts of the
regression parameters at time t, Var(θ t|t−1 ), are also calculated as part of
the Kalman filter algorithm—they are stored as Vtt1 in the list produced by
the MARSSkfss() function. Lastly, the observation variance r̂ was estimated
when we fit the DLM to the data using MARSS() and can be extracted from
the dlm1 fit.
Putting this together, we can compute the forecast variance:

# variance of regr parameters; 1x2xT array
Phi <- kf.out$Vtt1
## obs variance; 1x1 matrix
R.est <- coef(dlm1, type = "matrix")$R
## ts of Var(forecasts)
fore.var <- vector()
for (t in 1:TT) {
tZ <- matrix(Z[, , t], m, 1) ## transpose of Z
fore.var[t] <- Z[, , t] %*% Phi[, , t] %*% tZ + R.est
}
Plots of the model mean forecasts with their estimated uncertainty are shown
in Figure 9.3. Nearly all of the observed values fell within the approximate
prediction interval. Notice that we have a forecasted value for the first year of
the time series (1964), which may seem at odds with our notion of forecasting
at time t based on data available only through time t − 1. In this case,
however, MARSS is actually estimating the states at t = 0 (θ 0 ), which allows
us to compute a forecast for the first time point.
−2
−4
Logit(s)
−6
−8
1965 1970 1975 1980 1985 1990 1995 2000 2005
Year of ocean entry
Figure 9.3: Time series of logit-transformed survival data (blue dots) and
model mean forecasts (thick line). Thin lines denote the approximate 95%
prediction intervals.
Although our model forecasts look reasonable in logit-space, it is worthwhile

to examine how well they look when the survival data and forecasts are back-
transformed onto the interval [0,1] (Figure 9.4). In that case, the accuracy
does not seem to be affected, but the precision appears much worse, especially
during the early and late portions of the time series when survival is changing
rapidly.
Notice that we passed the DLM fit to all the data to MARSSkfss(). This
meant that the Kalman filter used estimates of Q and r using all the data
in the xtt1 and Vtt1 calculations. Thus our predictions at time t are not
entirely based on only data up to time t − 1 since the Q and r estimates were
from all the data from 1964 to 2005.
9.9. FORECAST DIAGNOSTICS 259
0.12
Survival
0.06
0.00
1965 1970 1975 1980 1985 1990 1995 2000 2005
Year of ocean entry
Figure 9.4: Time series of survival data (blue dots) and model mean forecasts
(thick line). Thin lines denote the approximate 95% prediction intervals.
9.9 Forecast diagnostics
As with other time series models, evaluation of a DLM should include diag-
nostics. In a forecasting context, we are often interested in the forecast errors,
which are simply the observed data minus the forecasts et = yt − E(yt |y1:t−1 ).
In particular, the following assumptions should hold true for et :
1. et ∼ N(0, σ 2 ); 2. cov(et , et−k ) = 0.
In the literature on state-space models, the set of et are commonly referred to as

“innovations”. MARSS() calculates the innovations as part of the Kalman filter
algorithm—they are stored as Innov in the list produced by the MARSSkfss()
function.
# forecast errors
innov <- kf.out$Innov
Let’s see if our innovations meet the model assumptions. Beginning with (1),
we can use a Q-Q plot to see whether the innovations are normally distributed
with a mean of zero. We’ll use the qqnorm() function to plot the quantiles of
the innovations on the y-axis versus the theoretical quantiles from a Normal
distribution on the x-axis. If the 2 distributions are similar, the points should
fall on the line defined by y = x.
# Q-Q plot of innovations

qqnorm(t(innov), main = "", pch = 16, col = "blue")
# add y=x line for easier interpretation
qqline(t(innov))
1.5
Sample Quantiles
0.5
−0.5
−1.5
−2 −1 0 1 2
Theoretical Quantiles
Figure 9.5: Q-Q plot of the forecast errors (innovations) for the DLM specified
in Equations (9.24)–(9.27).
The Q-Q plot (Figure 9.5) indicates that the innovations appear to be more-
or-less normally distributed (i.e., most points fall on the line). Furthermore,
it looks like the mean of the innovations is about 0, but we should use a more
reliable test than simple visual inspection. We can formally test whether the
mean of the innovations is significantly different from 0 by using a one-sample
t-test. based on a null hypothesis of E(et ) = 0. To do so, we will use the
function t.test() and base our inference on a significance value of α = 0.05.
# p-value for t-test of H0: E(innov) = 0
t.test(t(innov), mu = 0)$p.value
[1] 0.4840901
The p-value >> 0.05 so we cannot reject the null hypothesis that E(et ) = 0.
Moving on to assumption (2), we can use the sample autocorrelation function
(ACF) to examine whether the innovations covary with a time-lagged version
of themselves. Using the acf() function, we can compute and plot the
correlations of et and et−k for various values of k. Assumption (2) will be
met if none of the correlation coefficients exceed the 95% confidence intervals
9.9. FORECAST DIAGNOSTICS 261
√
defined by ± z0.975 / n.
# plot ACF of innovations
acf(t(innov), lag.max = 10)
1.0
0.6
ACF
0.2
−0.2
0 2 4 6 8 10
Lag
Figure 9.6: Autocorrelation plot of the forecast errors (innovations) for the
DLM specified in Equations (9.24)–(9.27). Horizontal blue lines define the
upper and lower 95% confidence intervals.
The ACF plot (Figure 9.6) shows no significant autocorrelation in the innova-
tions at lags 1–10, so it looks like both of our model assumptions have indeed
been met.
9.10 Homework discussion and data
For the homework this week we will use a DLM to examine some of the
time-varying properties of the spawner-recruit relationship for Pacific salmon.
Much work has been done on this topic, particularly by Randall Peterman and
his students and post-docs at Simon Fraser University. To do so, researchers
commonly use a Ricker model because of its relatively simple form, such that
the number of recruits (offspring) born in year t (Rt ) from the number of
spawners (parents) (St ) is
Rt = aSt e−bS+vt . (9.40)
The parameter a determines the maximum reproductive rate in the absence

of any density-dependent effects (the slope of the curve at the origin), b is the
strength of density dependence, and vt ∼ N (0, σ). In practice, the model is
typically log-transformed so as to make it linear with respect to the predictor
variable St , such that
log(Rt ) = log(a) + log(St ) − bSt + vt (9.41)

log(Rt ) − log(St ) = log(a) − bSt + vt (9.42)
log(Rt /St ) = log(a) − bSt + vt . (9.43)
Substituting yt = log(Rt /St ), xt = St , and α = log(a) yields a simple linear

regression model with intercept α and slope b.
Unfortunately, however, residuals from this simple model typically show
high-autocorrelation due to common environmental conditions that affect
overlapping generations. Therefore, to correct for this and allow for an index
of stock productivity that controls for any density-dependent effects, the
model may be re-written as
log(Rt /St ) = αt − bSt + vt , (9.44)

αt = αt−1 + wt , (9.45)
9.10. HOMEWORK DISCUSSION AND DATA 263
and wt ∼ N (0, q). By treating the brood-year specific productivity as a

random walk, we allow it to vary, but in an autocorrelated manner so that
consecutive years are not independent from one another.
More recently, interest has grown in using covariates (e.g., sea-surface tem-
perature) to explain the interannual variability in productivity. In that case,
we can can write the model as
log(Rt /St ) = α + δt Xt − bSt + vt . (9.46)
In this case we are estimating some base-level productivity (α) plus the
time-varying effect of some covariate Xt (δt ).
9.10.1 Spawner-recruit data
The data come from a large public database begun by Ransom Myers many
years ago. If you are interested, you can find lots of time series of spawning-
stock, recruitment, and harvest for a variety of fishes around the globe. Here
is the website:
https://www.ramlegacy.org/
For this exercise, we will use spawner-recruit data for sockeye salmon (On-
corhynchus nerka) from the Kvichak River in SW Alaska that span the years
1952-1989. In addition, we’ll examine the potential effects of the Pacific
Decadal Oscillation (PDO) during the salmon’s first year in the ocean, which
is widely believed to be a “bottleneck” to survival.
These data are in the atsalibrary package on GitHub. If needed, install
using the devtools package.
library(devtools)
Load the data.

data(KvichakSockeye, package = "atsalibrary")
SRdata <- KvichakSockeye
The data are a dataframe with columns for brood year (brood.yr), number
of spawners (Sp), number of recruits (Rec) and PDO at year t − 2 (PDO.t2)
and t − 3 (PDO.t3).
# head of data file
head(SRdata)
brood.yr Sp Rec PDO.t2 PDO.t3

1 1952 5970 17310 -0.61 -0.61
2 1953 320 520 -1.48 -2.66
3 1954 240 750 -2.05 -1.26
4 1955 250 1280 0.01 0.11
5 1956 9443 39036 0.86 0.37
6 1957 2843 4091 -0.25 0.29
9.11. PROBLEMS 265
9.11 Problems
Use the information and data in the previous section to answer the following
questions. Note that if any model is not converging, then you will need to
increase the maxit parameter in the control argument/list that gets passed
to MARSS(). For example, you might try control=list(maxit=2000).
1. Begin by fitting a reduced form of Equation (9.44) that includes only a
time-varying level (αt ) and observation error (vt ). That is,
log(Rt ) = αt + log(St ) + vt
log(Rt /St ) = αt + vt
This model assumes no density-dependent survival in that the number of

recruits is an ascending function of spawners. Plot the ts of αt and note the
AICc for this model. Also plot appropriate model diagnostics.
2. Fit the full model specified by Equation (9.44). For this model, obtain
the time series of αt , which is an estimate of the stock productivity in
the absence of density-dependent effects. How do these estimates of
productivity compare to those from the previous question? Plot the ts
of αt and note the AICc for this model. Also plot appropriate model
diagnostics. (Hint: If you don’t want a parameter to vary with time,
what does that say about its process variance?)
3. Fit the model specified by Equation (9.46) with the summer PDO index
as the covariate (PDO.t2). What is the mean level of productivity? Plot
the ts of δt and note the AICc for this model. Also plot appropriate
model diagnostics.
4. Fit the model specified by Equation (9.46) with the winter PDO index
as the covariate (PDO.t3). What is the mean level of productivity? Plot
the ts of δt and note the AICc for this model. Also plot appropriate
model diagnostics.
5. Based on AICc, which of the models above is the most parsimonious? Is
it well behaved (i.e., are the model assumptions met)? Plot the model
forecasts for the best model. Is this a good forecast model?
Chapter 10
Dynamic Factor Analysis
Here we will use the MARSS package to do Dynamic Factor Analysis (DFA),
which allows us to look for a set of common underlying processes among
a relatively large set of time series (Zuur et al., 2003). There have been a
number of recent applications of DFA to ecological questions surrounding
Pacific salmon (Stachura et al., 2014; Jorgensen et al., 2016; Ohlberger et al.,
2016) and stream temperatures (Lisi et al., 2015). For a more in-depth
treatment of potential applications of MARSS models for DFA, see Chapter
9 in the MARSS User’s Guide.
Data and packages
All the data used in the chapter are in the MARSS package. Install the
package, if needed, and load to run the code in the chapter.
267
268 CHAPTER 10. DYNAMIC FACTOR ANALYSIS
library(MARSS)
10.1 Introduction
DFA is conceptually different than what we have been doing in the previous
applications. Here we are trying to explain temporal variation in a set of n
observed time series using linear combinations of a set of m hidden random
walks, where m << n. A DFA model is a type of MARSS model with the
following structure:
yt = Zxt + a + vt where vt ∼ MVN(0, R)

(10.1)
xt = xt−1 + wt where wt ∼ MVN(0, Q)
This equation should look rather familiar as it is exactly the same form we
used for estimating varying number of processes from a set of observations in
Lesson II. The difference with DFA is that rather than fixing the elements
within Z at 1 or 0 to indicate whether an observation does or does not
correspond to a trend, we will instead estimate them as “loadings” on each of
the states/processes.
10.2 Example of a DFA model

The general idea is that the observations y are modeled as a linear combination
of hidden processes x and factor loadings Z plus some offsets a. Imagine a
case where we had a data set with five observed time series (n = 5) and we
want to fit a model with three hidden processes (m = 3). If we write out our
DFA model in MARSS matrix form, the observation equation would look like
       
y1 z11 z12 z13   a1 v1
y2  z21 z22 z23  x a v
       
  1  2  2
  
  
y3  = z31 z32 z33  x2  + a3  + v3  (10.2)
 .
   
  
y4  z41 z42 z43  x3 t a4  v4 
       
y5 t z51 z52 z53 a5 v5 t
10.3. CONSTRAINING A DFA MODEL 269
and the process model would look like
      
x1 1 0 0 x1 w1
x2  = 0 1 0 x2  + w2  (10.3)
      
x3 t 0 0 1 x3 t−1 w3 t
The observation errors would be
     
v1 0 r11 r12 r13 r14 r15
v2  0 r12 r22 r23 r24 r25 
     

     
v3  ∼ MVN 0 , r13 r23 r33 r34 r35 
 (10.4)
     
v4  0 r14 r24 r34 r44 r45 
     

v5 t 0 r15 r25 r35 r45 r55
And the process errors would be
     
w1 0 q11 q12 q13
w2  ∼ MVN 0 , q12 q22 q23  . (10.5)
     
w3 t 0 q13 q23 q33
10.3 Constraining a DFA model
If a, Z, and Q are not constrained, the DFA model above is unidentifiable.

Nevertheless, we can use the following parameter constraints to make the
model identifiable:
• a is constrained so that the first m values are set to zero;
• in the first m − 1 rows of Z, the z-value in the j-th column and i-th
row is set to zero if j > i; and
• Q is set equal to the identity matrix Im .

Using these constraints, the observation equation for the DFA model above
becomes
       
y1 z11 0 0   0 v1
y2  z21 z22 0  x 0 v2 
       
     1    
  2  + 0 + v3  .
y3  = z31 z32 z33  x     (10.6)
  
y4  z41 z42 z43  x3 t  0 v4 
      
y5 t z51 z52 z53 0 v5 t
and the process equation becomes
      
x1 1 0 0 x1 w1
x2  = 0 1 0 x2  + w2  (10.7)
      
x3 t 0 0 1 x3 t−1 w3 t
The distribution of the observation errors would stay the same, such that
     
v1 0 r11 r12 r13 r14 r15
v2  0 r12 r22 r23 r24 r25 
     

     
v3  ∼ MVN 0 , r13 r23 r33 r34 r35  (10.8)
.

    
v4  0 r14 r24 r34 r44 r45 
     
v5 t 0 r15 r25 r35 r45 r55
but the distribution of the process errors would become
     
w1 0 1 0 0
w
 
 2 ∼ MVN   0 1 0 ,
0 ,
   
(10.9)
w3 t 0 0 0 1
10.4 Different error structures

The example observation equation we used above had what we refer to as an
“unconstrained” variance-covariance matrix R wherein all of the parameters
are unique. In certain applications, however, we may want to change our
assumptions about the forms for R. For example, we might have good reason
to believe that all of the observations have different error variances and
they were independent of one another (e.g., different methods were used for
sampling), in which case
10.5. LAKE WASHINGTON PHYTOPLANKTON DATA 271
 
r1 0 0 0 0
 0 r2 0 0 0 
 
 
 0 0 r3 0 0  .
R= 
 0 0 0 r4 0 
 
0 0 0 0 r5
Alternatively, we might have a situation where all of the observation errors

had the same variance r, but they were not independent from one another.
In that case we would have to include a covariance parameter k, such that
 
r k k k k
k r k k k
 
 
R=
k k r k k.
k k k r k


k k k k r
Any of these options for R (and other custom options as well) are available to
us in a DFA model, just as they were in the MARSS models used in previous
chapters.
10.5 Lake Washington phytoplankton data

For this exercise, we will use the Lake Washington phytoplankton data
contained in the MARSS package. Let’s begin by reading in the monthly
values for all of the data, including metabolism, chemistry, and climate.
## load the data (there are 3 datasets contained here)
## we want lakeWAplanktonTrans, which has been transformed so
## the 0s are replaced with NAs and the data z-scored
all_dat <- lakeWAplanktonTrans
## use only the 10 years from 1980-1989
yr_frst <- 1980
yr_last <- 1989
plank_dat <- all_dat[all_dat[, "Year"] >= yr_frst & all_dat[,
"Year"] <= yr_last, ]
## create vector of phytoplankton group names

phytoplankton <- c("Cryptomonas", "Diatoms", "Greens", "Unicells",
"Other.algae")
## get only the phytoplankton
dat_1980 <- plank_dat[, phytoplankton]
Next, we transpose the data matrix and calculate the number of time series
and their length.
## transpose data so time goes across columns
dat_1980 <- t(dat_1980)
## get number of time series
N_ts <- dim(dat_1980)[1]
## get length of time series
TT <- dim(dat_1980)[2]
It will be easier to estimate the real parameters of interest if we de-mean the

data, so let’s do that.
y_bar <- apply(dat_1980, 1, mean, na.rm = TRUE)
dat <- dat_1980 - y_bar
rownames(dat) <- rownames(dat_1980)
10.5.1 Plots of the data
Here are time series plots of all five phytoplankton functional groups.
spp <- rownames(dat_1980)
clr <- c("brown", "blue", "darkgreen", "darkred", "purple")
cnt <- 1
par(mfrow = c(N_ts, 1), mai = c(0.5, 0.7, 0.1, 0.1), omi = c(0,
0, 0, 0))
for (i in spp) {
plot(dat[i, ], xlab = "", ylab = "Abundance index", bty = "L",
xaxt = "n", pch = 16, col = clr[cnt], type = "b")
axis(1, 12 * (0:dim(dat_1980)[2]) + 1, yr_frst + 0:dim(dat_1980)[2])
title(i)
cnt <- cnt + 1
10.6. FITTING DFA MODELS WITH THE MARSS PACKAGE 273
}
Cryptomonas
Abundance index
1.0
0.0
−1.0
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Diatoms
2
Abundance index
1
0
−1
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Greens
3
Abundance index
2
1
0
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Unicells
1
Abundance index
0
−2
−4
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Other.algae
2
Abundance index
1
0
−1
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Figure 10.1: Demeaned time series of Lake Washington phytoplankton.
10.6 Fitting DFA models with the MARSS

package
The MARSS package is designed to work with the fully specified matrix
form of the multivariate state-space model we wrote out in Sec 3. Thus, we
will need to create a model list with forms for each of the vectors and matrices.
Note that even though some of the model elements are scalars and vectors,
we will need to specify everything as a matrix (or array for time series of
matrices).
Notice that the code below uses some of the MARSS shortcuts
for specifying forms of vectors and matrices. We will also use the
matrix(list(),nrow,ncol) trick we learned previously.
10.6.1 The observation model
Here we will fit the DFA model above where we have R N_ts observed time
series and we want 3 hidden states. Now we need to set up the observation
model for MARSS. Here are the vectors and matrices for our first model where
each nutrient follows its own process. Recall that we will need to set the
elements in the upper R corner of Z to 0. We will assume that the observation
errors have different variances and they are independent of one another.
## 'ZZ' is loadings matrix
Z_vals <- list("z11", 0, 0, "z21", "z22", 0, "z31", "z32", "z33",
"z41", "z42", "z43", "z51", "z52", "z53")
ZZ <- matrix(Z_vals, nrow = N_ts, ncol = 3, byrow = TRUE)
ZZ
[,1] [,2] [,3]

[1,] "z11" 0 0
[2,] "z21" "z22" 0
[3,] "z31" "z32" "z33"
[4,] "z41" "z42" "z43"
[5,] "z51" "z52" "z53"
## 'aa' is the offset/scaling
aa <- "zero"
## 'DD' and 'd' are for covariates
DD <- "zero" # matrix(0,mm,1)
dd <- "zero" # matrix(0,1,wk_last)
## 'RR' is var-cov matrix for obs errors
RR <- "diagonal and unequal"
10.6. FITTING DFA MODELS WITH THE MARSS PACKAGE 275
10.6.2 The process model
We need to specify the explicit form for all of the vectors and matrices in the
full form of the MARSS model we defined in Sec 3.1. Note that we do not
have to specify anything for the states (x) – those are elements that MARSS
will identify and estimate itself based on our definitions of the other vectors
and matrices.
## number of processes
mm <- 3
## 'BB' is identity: 1's along the diagonal & 0's elsewhere
BB <- "identity" # diag(mm)
## 'uu' is a column vector of 0's
uu <- "zero" # matrix(0,mm,1)
## 'CC' and 'cc' are for covariates
CC <- "zero" # matrix(0,mm,1)
cc <- "zero" # matrix(0,1,wk_last)
## 'QQ' is identity
QQ <- "identity" # diag(mm)
10.6.3 Fit the model in MARSS
Now it’s time to fit our first DFA model To do so, we need to create three
lists that we will need to pass to the MARSS() function:
1. A list of specifications for the model’s vectors and matrices;
2. A list of any initial values – MARSS will pick its own otherwise;
3. A list of control parameters for the MARSS() function.
## list with specifications for model vectors/matrices
mod_list <- list(Z = ZZ, A = aa, D = DD, d = dd, R = RR, B = BB,
U = uu, C = CC, c = cc, Q = QQ)
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))
## list with model control parameters
con_list <- list(maxit = 3000, allow.degen = TRUE)
Now we can fit the model.

## fit MARSS
dfa_1 <- MARSS(y = dat, model = mod_list, inits = init_list,
control = con_list)

MARSS fit is
AIC: 1425.959 AICc: 1427.42
Estimate
Z.z11 0.2738
Z.z21 0.4487
Z.z31 0.3170
Z.z41 0.4107
Z.z51 0.2553
Z.z22 0.3608
Z.z32 -0.3690
Z.z42 -0.0990
Z.z52 -0.3793
Z.z33 0.0185
Z.z43 -0.1404
Z.z53 0.1317
R.(Cryptomonas,Cryptomonas) 0.1638
R.(Diatoms,Diatoms) 0.2913
R.(Greens,Greens) 0.8621
R.(Unicells,Unicells) 0.3080
R.(Other.algae,Other.algae) 0.5000
x0.X1 0.2218
x0.X2 1.8155
x0.X3 -4.8097
10.7. INTERPRETING THE MARSS OUTPUT 277

10.7 Interpreting the MARSS output

By now the MARSS() output should look familiar. The first 12 parameter
estimates Z.z## are the loadings of each observed time series on the 3 hidden
states. The next 5 estimates R.(,) are the variances of the observation errors
(vi,t ). The last 3 values, x0.X#, are the estimates of the initial states at t = 0.
Recall that the estimates of the processes themselves (i.e., x) are con-
tained in one of the list elements in our fitted MARSS object. Specifically,
they are in mod_fit$states, and their respective standard errors are
in mod_fit$states.se. For the names of all of the other objects, type
names(dfa_1).
10.8 Rotating trends and loadings

Before proceeding further, we need to address the constraints we placed on
the DFA model in Sec 2.2. In particular, we arbitrarily constrained Z in
such a way to choose only one of these solutions, but fortunately the different
solutions are equivalent, and they can be related to each other by a rotation
matrix H. Let H be any m × m non-singular matrix. The following are then
equivalent DFA models:
yt = Zxt + a + vt xt = xt−1 + wt (10.10)
and
yt = ZH−1 xt + a + vt Hxt = Hxt−1 + Hwt . (10.11)
There are many ways of doing factor rotations, but a common method is the
“varimax”" rotation, which seeks a rotation matrix H that creates the largest
difference between the loadings in Z. For example, imagine that row 3 in our
estimated Z matrix was (0.2, 0.2, 0.2). That would mean that green algae
were a mixture of equal parts of processes 1, 2, and 3. If instead row 3 was
(0.8, 0.1, 0.05), this would make our interpretation of the model fits easier
because we could say that green algae followed the first process most closely.
The varimax rotation would find the H matrix that makes the rows in Z
more like (0.8, 0.1, 0.05) and less like (0.2, 0.2, 0.2).
The varimax rotation is easy to compute because R has a built in function for
this: varimax(). Interestingly, the function returns the inverse of H, which
we need anyway.
## get the estimated ZZ
Z_est <- coef(dfa_1, type = "matrix")$Z
## get the inverse of the rotation matrix
H_inv <- varimax(Z_est)$rotmat
We can now rotate both Z and x.

## rotate factor loadings
Z_rot = Z_est %*% H_inv
## rotate processes
proc_rot = solve(H_inv) %*% dfa_1$states
10.9 Estimated states and loadings
Here are plots of the three hidden processes (left column) and the loadings
for each of phytoplankton groups (right column).
ylbl <- phytoplankton
w_ts <- seq(dim(dat)[2])
layout(matrix(c(1, 2, 3, 4, 5, 6), mm, 2), widths = c(2, 1))
## par(mfcol=c(mm,2), mai=c(0.5,0.5,0.5,0.1), omi=c(0,0,0,0))
par(mai = c(0.5, 0.5, 0.5, 0.1), omi = c(0, 0, 0, 0))
## plot the processes
for (i in 1:mm) {
ylm <- c(-1, 1) * max(abs(proc_rot[i, ]))
## set up plot area
10.9. ESTIMATED STATES AND LOADINGS 279
plot(w_ts, proc_rot[i, ], type = "n", bty = "L", ylim = ylm,

xlab = "", ylab = "", xaxt = "n")
## draw zero-line
abline(h = 0, col = "gray")
## plot trend line
lines(w_ts, proc_rot[i, ], lwd = 2)
lines(w_ts, proc_rot[i, ], lwd = 2)
## add panel labels
mtext(paste("State", i), side = 3, line = 0.5)
}
## plot the loadings
minZ <- 0
ylm <- c(-1, 1) * max(abs(Z_rot))
for (i in 1:mm) {
plot(c(1:N_ts)[abs(Z_rot[, i]) > minZ], as.vector(Z_rot[abs(Z_rot[,
i]) > minZ, i]), type = "h", lwd = 2, xlab = "", ylab = "",
xaxt = "n", ylim = ylm, xlim = c(0.5, N_ts + 0.5), col = clr)
for (j in 1:N_ts) {
if (Z_rot[j, i] > minZ) {
text(j, -0.03, ylbl[j], srt = 90, adj = 1, cex = 1.2,
col = clr[j])
}
if (Z_rot[j, i] < -minZ) {
text(j, 0.03, ylbl[j], srt = 90, adj = 0, cex = 1.2,
col = clr[j])
}
abline(h = 0, lwd = 1.5, col = "gray")
}
mtext(paste("Factor loadings on state", i), side = 3, line = 0.5)
}
It looks like there are strong seasonal cycles in the data, but there is some
indication of a phase difference between some of the groups. We can use
ccf() to investigate further.
par(mai = c(0.9, 0.9, 0.1, 0.1))
ccf(proc_rot[1, ], proc_rot[2, ], lag.max = 12, main = "")
State 1 Factor loadings on state 1
0.6
2
0.4
0.2
1
0.0
0
Cryptomonas
Diatoms
Greens
Unicells
Other.algae
−0.6 −0.4 −0.2
−1
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
0.6
3
Cryptomonas
Other.algae
0.4
2
Unicells
Greens
0.2
1
0.0
0
Diatoms
−0.6 −0.4 −0.2
−1
−2
−3
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990

0.6
Cryptomonas
Other.algae
0.4
5
Diatoms
Unicells
Greens
0.2
0.0
0
−0.6 −0.4 −0.2

−5
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Figure 10.2: Estimated states from the DFA model.

10.10. PLOTTING THE DATA AND MODEL FITS 281
0.2
0.0
ACF
−0.2
−0.4
−10 −5 0 5 10
Lag
Figure 10.3: Cross-correlation plot of the two rotations.
10.10 Plotting the data and model fits
We can plot the fits for our DFA model along with the data. The following
function will return the fitted values ± (1-α)% confidence intervals.
get_DFA_fits <- function(MLEobj, dd = NULL, alpha = 0.05) {
## empty list for results
fits <- list()
## extra stuff for var() calcs
Ey <- MARSS:::MARSShatyt(MLEobj)
## model params
ZZ <- coef(MLEobj, type = "matrix")$Z
## number of obs ts
nn <- dim(Ey$ytT)[1]
## number of time steps
TT <- dim(Ey$ytT)[2]
## get the inverse of the rotation matrix
H_inv <- varimax(ZZ)$rotmat
## check for covars
if (!is.null(dd)) {
DD <- coef(MLEobj, type = "matrix")$D
## model expectation
fits$ex <- ZZ %*% H_inv %*% MLEobj$states + DD %*% dd
} else {
## model expectation
fits$ex <- ZZ %*% H_inv %*% MLEobj$states
}
## Var in model fits
VtT <- MARSSkfss(MLEobj)$VtT
VV <- NULL
for (tt in 1:TT) {
RZVZ <- coef(MLEobj, type = "matrix")$R - ZZ %*% VtT[,
, tt] %*% t(ZZ)
SS <- Ey$yxtT[, , tt] - Ey$ytT[, tt, drop = FALSE] %*%
t(MLEobj$states[, tt, drop = FALSE])
VV <- cbind(VV, diag(RZVZ + SS %*% t(ZZ) + ZZ %*% t(SS)))
}
SE <- sqrt(VV)
## upper & lower (1-alpha)% CI
fits$up <- qnorm(1 - alpha/2) * SE + fits$ex
fits$lo <- qnorm(alpha/2) * SE + fits$ex
return(fits)
}
Here are time series of the five phytoplankton groups (points) with the mean
of the DFA fits (black line) and the 95% confidence intervals (gray lines).
## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_1)
## plot the fits
0, 0, 0))
for (i in 1:N_ts) {
up <- mod_fit$up[i, ]
mn <- mod_fit$ex[i, ]
lo <- mod_fit$lo[i, ]
10.10. PLOTTING THE DATA AND MODEL FITS 283
plot(w_ts, mn, xlab = "", ylab = ylbl[i], xaxt = "n", type = "n",
cex.lab = 1.2, ylim = c(min(lo), max(up)))
points(w_ts, dat[i, ], pch = 16, col = clr[i])
lines(w_ts, up, col = "darkgray")
lines(w_ts, mn, col = "black", lwd = 2)
lines(w_ts, lo, col = "darkgray")
}
1.5
Cryptomonas
0.5
−0.5
−2.0
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
1
Diatoms
0
−1
−3
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
Greens
0
−2
−4
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
1
Unicells
0
−2
−4
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
Other.algae
1
0
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Figure 10.4: Data and fits from the DFA model.

10.11 Covariates in DFA models
It is standard to add covariates to the analysis so that one removes known

important drivers. The DFA with covariates is written:
yt = Zxt + a + Ddt + vt where vt ∼ MVN(0, R)xt = xt−1 + wt where wt ∼ MVN(0, Q)

(10.12)
where the q × 1 vector dt contains the covariate(s) at time t, and the n × q
matrix D contains the effect(s) of the covariate(s) on the observations. Using
form = "dfa" and covariates=<covariate name(s)>, we can easily add
covariates to our DFA, but this means that the covariates are input, not
data, and there can be no missing values (see Chapter 6 in the MARSS User
Guide for how to include covariates with missing values).
10.12 Example from Lake Washington
The Lake Washington dataset has two environmental covariates that we

might expect to have effects on phytoplankton growth, and hence, abundance:
temperature (Temp) and total phosphorous (TP). We need the covariate inputs
to have the same number of time steps as the variate data, and thus we limit
the covariate data to the years 1980-1994 also.
temp <- t(plank_dat[, "Temp", drop = FALSE])
TP <- t(plank_dat[, "TP", drop = FALSE])
We will now fit three different models that each add covariate effects (i.e.,
Temp, TP, Temp and TP) to our existing model above where m = 3 and R is
"diagonal and unequal".
mod_list = list(m = 3, R = "diagonal and unequal")
dfa_temp <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,
control = con_list, covariates = temp)
dfa_TP <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,
control = con_list, covariates = TP)
10.12. EXAMPLE FROM LAKE WASHINGTON 285
dfa_both <- MARSS(dat, model = mod_list, form = "dfa", z.score = FALSE,

control = con_list, covariates = rbind(temp, TP))
Next we can compare whether the addition of the covariates improves the
model fit.
print(cbind(model = c("no covars", "Temp", "TP", "Temp & TP"),
AICc = round(c(dfa_1$AICc, dfa_temp$AICc, dfa_TP$AICc, dfa_both$AICc))),
quote = FALSE)
model AICc
[1,] no covars 1427
[2,] Temp 1356
[3,] TP 1414
[4,] Temp & TP 1362
This suggests that adding temperature or phosphorus to the model, either
alone or in combination with one another, does seem to improve overall model
fit. If we were truly interested in assessing the “best” model structure that
includes covariates, however, we should examine all combinations of 1-4 trends
and different structures for R.
Now let’s try to fit a model with a dummy variable for season, and see how
that does.
cos_t <- cos(2 * pi * seq(TT)/12)
sin_t <- sin(2 * pi * seq(TT)/12)
dd <- rbind(cos_t, sin_t)
dfa_seas <- MARSS(dat_1980, model = mod_list, form = "dfa", z.score = TRUE,
control = con_list, covariates = dd)

MARSS fit is
AIC: 1481.693 AICc: 1484.355
Estimate
Z.11 0.49562
Z.21 0.27206
Z.31 0.03354
Z.41 0.51692
Z.51 0.18981
Z.22 0.05290
Z.32 -0.08042
Z.42 0.06336
Z.52 0.06157
Z.33 0.02383
Z.43 0.19506
Z.53 -0.10800
R.(Cryptomonas,Cryptomonas) 0.51583
R.(Diatoms,Diatoms) 0.53296
R.(Greens,Greens) 0.60329
R.(Unicells,Unicells) 0.19787
R.(Other.algae,Other.algae) 0.52977
D.(Cryptomonas,cos_t) -0.43973
D.(Diatoms,cos_t) -0.44836
D.(Greens,cos_t) -0.66003
D.(Unicells,cos_t) -0.34898
D.(Other.algae,cos_t) -0.42773
D.(Cryptomonas,sin_t) 0.23672
D.(Diatoms,sin_t) 0.72062
D.(Greens,sin_t) -0.46019
D.(Unicells,sin_t) -0.00873
D.(Other.algae,sin_t) -0.64228

dfa_seas$AICc
[1] 1484.355
10.12. EXAMPLE FROM LAKE WASHINGTON 287
The model with a dummy seasonal factor does much better than the covariate
models, but still not as well as the model with only 3 trends. The model fits
for the seasonal effects model are shown below.
## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_seas, dd = dd)
## plot the fits
0, 0, 0))
for (i in 1:N_ts) {
up <- mod_fit$up[i, ]
mn <- mod_fit$ex[i, ]
lo <- mod_fit$lo[i, ]
plot(w_ts, mn, xlab = "", ylab = ylbl[i], xaxt = "n", type = "n",
cex.lab = 1.2, ylim = c(min(lo), max(up)))
points(w_ts, dat[i, ], pch = 16, col = clr[i])
lines(w_ts, up, col = "darkgray")
lines(w_ts, mn, col = "black", lwd = 2)
lines(w_ts, lo, col = "darkgray")
}
1 2 3
Cryptomonas
−1
−3
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
3
2
Diatoms
1
0
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
1
Greens
0
−2
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
2
Unicells
1
0
−2 −1
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
3
2
Other.algae
1
−1 0
−3
1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Figure 10.5: Data and model fits for the DFA with covariates.
10.13. PROBLEMS 289
10.13 Problems
For your homework this week, we will continue to investigate common trends
in the Lake Washington plankton data.
1. Fit other DFA models to the phytoplankton data with varying numbers
of trends from 1-4 (we fit a 3-trend model above). Do not include any
covariates in these models. Using R="diagonal and unequal" for the
observation errors, which of the DFA models has the most support from
the data?
Plot the model states and loadings as in Section 10.9. Describe the
general patterns in the states and the ways the different taxa load onto
those trends.
Also plot the the model fits as in Section 10.10. Do they reasonable?
Are there any particular problems or outliers?
2. How does the best model from Question 1 compare to a DFA model
with the same number of trends, but with R="unconstrained"?
those trends.
3. Fit a DFA model that includes temperature as a covariate and 3 trends
(as in Section 10.12), but withR="unconstrained"? How does this
model compare to the model with R="diagonal and unequal"? How
does it compare to the model in Question 2?
those trends.
Chapter 11
Covariates with Missing Values
Data and packages
This chapter will use a SNOTEL dataset. These are data on snow water
equivalency at locations throughtout the state of Washington. The data are
in the atsalibrary package.
data(snotel, package = "atsalibrary")
The main packages used in this chapter are MARSS and forecast.
library(MARSS)
library(forecast)
library(ggplot2)
library(ggmap)
library(broom)
291
292 CHAPTER 11. COVARIATES WITH NAS
11.1 Covariates with missing values or obser-

vation error
The specific formulation of Equation (8.1) creates restrictions on the assump-
tions regarding the covariate data. You have to assume that your covariate
data has no error, which is probably not true. You cannot have missing values
in your covariate data, again unlikely. You cannot combine instrument time
series; for example, if you have two temperature recorders with different error
rates and biases. Also, what if you have one noisy temperature sensor in the
first part of your time series and then you switch to a much better sensor in
the second half of your time series? All these problems require pre-analysis
massaging of the covariate data, leaving out noisy and gappy covariate data,
and making what can feel like arbitrary choices about which covariate time
series to include.
To circumvent these potential problems and allow more flexibility in how we
incorporate covariate data, one can instead treat the covariates as components
of an auto-regressive process by including them in both the process and
observation models. Beginning with the process equation, we can write
" # " #" # " #
x(v) B(v) C x(v) u(v)
= + (c) + wt ,
x(c) t
0 B(c) x(c) t−1
u
" #! (11.1)
Q(v) 0
wt ∼ MVN 0,
0 Q(c)
The elements with superscript (v) are for the k variate states and those with
superscript (c) are for the q covariate states. The dimension of x(c) is q ×1 and
q is not necessarily equal to p, the number of covariate observation time series
in your dataset. Imagine, for example, that you have two temperature sensors
and you are combining these data. Then you have two covariate observation
time series (p = 2) but only one underlying covariate state time series (q = 1).
The matrix C is dimension k × q, and B(c) and Q(c) are dimension q × q.
The dimension of x(v) is k × 1, and B(v) and Q(v) are dimension k × k. The
dimension of x is always denoted m. If your process model includes only
variates, then k = m, but now your process model includes k variates and q
covariate states so m = k + q.
Next, we can write the observation equation in an analogous manner, such
11.1. COVARIATES WITH MISSING VALUES OR OBSERVATION ERROR293
that " # " #" # " #

y(v) Z(v) D x(v) a(v)
= + (c) + vt ,
y(c) t
0 Z(c) x(c) t
a
" #! (11.2)
R(v) 0
vt ∼ MVN 0,
0 R(c)
The dimension of y(c) is p × 1, where p is the number of covariate observation
time series in your dataset. The dimension of y(v) is l × 1, where l is the
number of variate observation time series in your dataset. The total dimension
of y is l + p. The matrix D is dimension l × q, Z(c) is dimension p × q, and
R(c) are dimension p × p. The dimension of Z(v) is dimension l × k, and R(v)
are dimension l × l.
The D matrix would presumably have a number of all zero rows in it, as
would the C matrix. The covariates that affect the states would often be
different than the covariates that affect the observations. For example, mean
annual temperature might affect population growth rates for many species
while having little or no affect on observability, and turbidity might strongly
affect observability in many types of aquatic surveys but have little affect on
population growth rate.
Our MARSS model with covariates now looks on the surface like a regular
MARSS model:
xt = Bxt−1 + u + wt , where wt ∼ MVN(0, Q)
(11.3)
yt = Zxt + a + vt , where vt ∼ MVN(0, R)
with the xt , yt and parameter matrices redefined as in Equations (11.1) and
(11.2):
" # " # " # " #
x(v) B(v) C u(v) Q(v) 0
x = (c) B= u = (c) Q=
x 0 B(c) u 0 Q(c)
" # " # " # " # (11.4)
y(v) Z(v) D a(v) R(v) 0
y = (c) Z= a = (c) R=
y 0 Z(c) a 0 R(c)
Note Q and R are written as block diagonal matrices, but you could allow
covariances if that made sense. u and a are column vectors here. We can fit
the model (Equation (11.4)) as usual using the MARSS() function.
The log-likelihood that is returned by MARSS will include the log-likelihood
of the covariates under the covariate state model. If you want only the the
log-likelihood of the non-covariate data, you will need to subtract off the
log-likelihood of the covariate model:
(c) (c)
xt = B(c) xt−1 + u(c) + wt , where wt ∼ MVN(0, Q(c) )
(c) (c)
(11.5)
yt = Z(c) xt + a(c) + vt , where vt ∼ MVN(0, R(c) )
An easy way to get this log-likelihood for the covariate data only is use the
augmented model (Equation (11.2) with terms defined as in Equation (11.4)
but pass in missing values for the non-covariate data. The following code
shows how to do this.
y.aug = rbind(data, covariates)
fit.aug = MARSS(y.aug, model = model.aug)
fit.aug is the MLE object that can be passed to MARSSkf(). You need to
make a version of this MLE object with the non-covariate data filled with NAs
so that you can compute the log-likelihood without the covariates. This needs
to be done in the marss element since that is what is used by MARSSkf().
Below is code to do this.
fit.cov = fit.aug
fit.cov$marss$data[1:dim(data)[1], ] = NA
extra.LL = MARSSkf(fit.cov)$logLik
Note that when you fit the augmented model, the estimates of C and B(c) are
affected by the non-covariate data since the model for both the non-covariate
and covariate data are estimated simultaneously and are not independent
(since the covariate states affect the non-covariates states). If you want the
covariate model to be unaffected by the non-covariate data, you can fit the
covariate model separately and use the estimates for B(c) and Q(c) as fixed
values in your augmented model.
11.2 Example: Snotel Data
Let’s see an example using the Washington SNOTEL data. The data we will
use is the snow water equivalent percent of normal. This represents the snow
water equivalent compared to the average value for that site on the same
11.2. EXAMPLE: SNOTEL DATA 295
day. We will look at a subset of sites in the Central Cascades in our snotel
dataset (Figure 11.1).
y <- snotelmeta
# Just use a subset
y = y[which(y$Longitude < -121.4), ]
y = y[which(y$Longitude > -122.5), ]
y = y[which(y$Latitude < 47.5), ]
y = y[which(y$Latitude > 46.5), ]
SnoTel sites
49
48
Longitude
47
46
45
−124 −122 −120 −118
Latitude
Figure 11.1: Subset of SNOTEL sties used in this chapter.
For the first analysis, we are just going to look at February Snow Water
Equivalent (SWE). Our subset of stations is y$Station.Id. There are many
missing years among some of our stations (Figure 11.2).
swe.feb <- snotel
swe.feb <- swe.feb[swe.feb$Station.Id %in% y$Station.Id & swe.feb$Month ==
"Feb", ]
p <- ggplot(swe.feb, aes(x = Date, y = SWE)) + geom_line()
p + facet_wrap(~Station)
Burnt Mountain Cayuse Pass Corral Pass Cougar Mountain
60
40
20
0
Huckleberry Creek Lynn Lake Meadows Pass Morse Lake
60
40
20
0
SWE
Mount Gardner Mowich Olallie Meadows Paradise
60
40
20
0
1980 1990 2000 2010
Rex River Sawmill Ridge Tinkham Creek
60
40
20
0
1980 1990 2000 20101980 1990 2000 20101980 1990 2000 2010
Date
Figure 11.2: Snow water equivalent time series from each SNOTEL station.
11.2.1 Estimate missing Feb SWE using AR(1) with

spatial correlation
Imagine that for our study we need an estimate of SWE for all sites. We will
use the information from the sites with full data to estimate the missing SWE
for other sites. We will use a MARSS model to use all the available data.
x1 b 0 ... 0 x1 w1 y1 x1 a1 v1
              
x   0 b ... 0 x  w  y  x  a  v 
 2  2   2  2  2  2  2
 =  +   =  +  + 

. . . . . . ... ... . . .  . . .  ... . . .  . . .   . . .  . . .
  
x15 t 0 0 ... b x15 t−1 w15 t y15 t x15 t a15 t v15 t

(11.6)
We will use an unconstrained variance-covariance structure for w and assume

that v is identical and independent and very low (SNOTEL instrument
variability). The ai determine the level of the xi .
We need our data to be in rows. We will use reshape2::acast().

dat.feb <- reshape2::acast(swe.feb, Station ~ Year, value.var = "SWE")
We set up the model for MARSS so that it is the same as (11.6). We will fix
the measurement error to be small; we could use 0 but the fitting is more
stable if we use a small variance instead. When estimating B, setting the
initial value to be at t = 1 instead of t = 0 works better.
ns <- length(unique(swe.feb$Station))
B <- "diagonal and equal"
Q <- "unconstrained"
R <- diag(0.01, ns)
U <- "zero"
A <- "unequal"
x0 <- "unequal"
mod.list.ar1 = list(B = B, Q = Q, R = R, U = U, x0 = x0, A = A,
tinitx = 1)
Now we can fit a MARSS model and get estimates of the missing SWEs.
Convergence is slow. We set a equal to the mean of the time series to speed
convergence.
library(MARSS)
m <- apply(dat.feb, 1, mean, na.rm = TRUE)
fit.ar1 <- MARSS(dat.feb, model = mod.list.ar1, control = list(maxit = 5000),
inits = list(A = matrix(m, ns, 1)))
The b estimate is 0.4494841.
Let’s plot the estimated SWEs for the missing years (Figure 11.3). These
estimates use all the information about the correlation with other sites and
uses information about correlation with the prior and subsequent years. We
will use the tidy() function to get the estimates and the 95% prediction
intervals. The prediction interval is for the range of SWE values we might
observe for that site. Notice that for some sites, intervals are low in early
years as these sites are highly correlated with site for which there are data.
In other sites, the uncertainty is high in early years because the sites with
data in those years are not highly correlated. There are no intervals for sites
with data. We have data for those sites, so we are not uncertain about the
observed SWE for those.
fit <- fit.ar1

d <- broom::tidy(fit, type = "ytT", conf.int = TRUE)
d$Year <- d$t + 1980
d$Station <- d$.rownames
p <- ggplot(data = d) + geom_line(aes(Year, estimate)) + geom_point(aes(Year,
y)) + geom_ribbon(aes(x = Year, ymin = pred.low, ymax = pred.high),
linetype = 2, alpha = 0.2, fill = "blue") + facet_wrap(~Station) +
xlab("") + ylab("SWE (demeaned)")
p

80
60
40
20
0

80
60
SWE (demeaned)
40
20
0

80
60
40
20
0
1980 1990 2000 2010
80
60
40
20
0
1980 1990 2000 20101980 1990 2000 20101980 1990 2000 2010
Figure 11.3: Estimated SWEs for the missing sites with prediction intervals.
If we were using these SWE as covariates in a site specific model, we could

then use the estimates as our covariates, however this would not incorporate
uncertainty. Alternatively we could use Equation (11.1) and set the parame-
ters for the covariate process to those estimated for our covariate-only model.
This approach will incorporate the uncertainty in the SWE estimates in the
early years for the sites with no data.
Note, we should do some cross-validation (fitting with data left out) to ensure
that the estimated SWEs are well-matched to actual measurements. It would
probably be best to do ‘leave-three’ out instead of ‘leave-one’ out since the
estimates for time t uses information from t − 1 and t + 1 (if present).
11.2.1.1 Diagnostics
The state residuals have a tendency for negative autocorrelation at lag-1

(Figure 11.4).
fit <- fit.ar1
par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$state.residuals[, 1:30], 1, acf)
0.8
0.5
0.5
0.5
ACF
ACF
ACF
0.2
−0.5
−0.5
−0.4
−0.5
Series newX[, i] Series newX[, i] Series newX[, i] Series newX[, i]

0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag
0.8
0.8
0.8
0.5
ACF
ACF
ACF
−0.4 0.2
0.2
0.2
−0.5
−0.4
−0.4

0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag

0.5
0.5
0.5
0.5
ACF
ACF
ACF
−0.5
−0.5
−0.5
−0.5
Series newX[, i] Series newX[, i] Series newX[, i]

0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag

0.8
0.5
0.5
ACF
ACF
0.2
−0.5
−0.5
−0.4
0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Figure 11.4: State residuals for the AR(1) model. Many stations for autocor-
relation at lag-1.
11.2.2 Estimate missing Feb SWE using only correla-

tion
Another approach is to treat the February data as temporally uncorrelated.

The two longest time series (Paradise and Olallie Meadows) show minimal
autocorrelation so we might decide to just use the correlation across stations
for our estimates. In this case, the state of the missing SWE values at time t
is the expected value conditioned on all the stations with data at time t given
the estimated variance-covariance matrix Q.
We could set this model up as
y1 a1 v1 σ1 ζ1,2 . . . ζ1,15
       
y  a  v  ζ σ2 . . . ζ2,15 
 2  2  2  2,1
  =  +  , (11.7)

. . . . . . . . . ... ... ... ... 
 
y15 t a15 t v15 t ζ15,1 ζ15,2 . . . σ15
However the EM algorithm used by MARSS() runs into numerical issues.

Instead we will set the model up as follows. Allowing a hidden state observed
with small error makes the estimation more stable.
x1 w1 w1 σ1 ζ1,2 y1. . . ζ1,15

x1 a1 v1 0.01
               
x  w  w  ζ σ2 y . . . ζ2,15 
x  a  v   0
 2  2  2  2,1  2  2  2  2
 =  ,  ∼   =  +  +  ,
 
. . . ... ... ... ... ... ... 
. . .  . . .   . . .  . . . ...
   
x15 t w15 t w15 t ζ15,1 ζ15,2 . . . σ15

y15 t x15 t a15 t v15 t 0
(11.8)
Again a is the mean level in the time series. Note that the expected value of
x is zero if there are no data, so E(x0 ) = 0.
ns <- length(unique(swe.feb$Station))
B <- "zero"
Q <- "unconstrained"
R <- diag(0.01, ns)
U <- "zero"
A <- "unequal"
x0 <- "zero"
mod.list.corr = list(B = B, Q = Q, R = R, U = U, x0 = x0, A = A,
tinitx = 0)
Now we can fit a MARSS model and get estimates of the missing SWEs.
Convergence is slow. We set a equal to the mean of the time series to speed
convergence.

fit.corr <- MARSS(dat.feb, model = mod.list.corr, control = list(maxit = 5000),
The estimated SWEs for the missing years uses the information about the
correlation with other sites only.
fit <- fit.corr
d <- broom::tidy(fit, type = "ytT", conf.int = TRUE)
d$Year <- d$t + 1980
d$Station <- d$.rownames
p <- ggplot(data = d) + geom_line(aes(Year, estimate)) + geom_point(aes(Year,
y)) + geom_ribbon(aes(x = Year, ymin = pred.low, ymax = pred.high),
linetype = 2, alpha = 0.2, fill = "blue") + facet_wrap(~Station) +
xlab("") + ylab("SWE (demeaned)")
p

75
50
25
0

75
50
SWE (demeaned)
25
0

75
50
25
0
1980 1990 2000 2010

75
50
25
0
1980 1990 2000 20101980 1990 2000 20101980 1990 2000 2010
Figure 11.5: Estimated SWEs from the expected value of the states x̂ condi-
tioned on all the data for the model with only correlation across stations at
time t.
11.2.2.1 Diagnostics
The state and model residuals have no tendency towards negative autocorre-
lation now that we removed the autoregressive component from the process
(x) model.
fit <- fit.corr
par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$state.residuals, 1, acf, na.action = na.pass)
mtext("State Residuals ACF", outer = TRUE, side = 3)
1.0
1.0
1.0
0.8
0.4
0.4
0.4
ACF
ACF
ACF
−0.4 0.2
−0.2
−0.2
−0.2

0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
1.0
1.0
1.0
Lag Lag Lag Lag

0.5
0.4
0.4
0.4
ACF
ACF
ACF
−0.2
−0.2
−0.2
−0.5

0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
Lag Lag Lag Lag

0.8
0.8
0.8
0.8
ACF
ACF
ACF
−0.4 0.2
−0.4 0.2
−0.4 0.2
0.2
−0.4

0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
1.0
Lag Lag Lag Lag

0.8
0.5
0.4
ACF
ACF
−0.4 0.2
−0.2
−0.5
0 5 10 15 0 5 10 15 0 5 10 15
fit <- fit.corr

par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$model.residuals[, 1:30], 1, acf, na.action = na.pass)
mtext("Model Residuals ACF", outer = TRUE, side = 3)
0.8
0.8
0.8
0.8
ACF
ACF
ACF
−0.4 0.2
0.2
0.2
0.2
−0.4
−0.4
−0.4
0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag
0.8
0.8
0.8
0.8
ACF
ACF
ACF
−0.4 0.2
0.2
0.2
0.2
−0.4
−0.4
−0.4
0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag

0.8
0.8
0.8
0.8
ACF
ACF
ACF
0.2
0.2
0.2
0.2
−0.4
−0.4
−0.4
−0.4
0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
Lag Lag Lag Lag

0.8
0.8
0.8
ACF
ACF
−0.4 0.2
−0.4 0.2
0.2
−0.4
0 2 4 6 8 12 0 2 4 6 8 12 0 2 4 6 8 12
11.2.3 Estimate missing Feb SWE using DFA
Another approach we might take is to model SWE using Dynamic Factor

Analysis. Our model might take the following form with two factors, modeled
as AR(1) processes. a is the mean level of the time series.
y1 z1,1 0 " # a1 v1
       
" # " #" # " #
x1 b1 0 x1 w1 y 
 2
z
 2,1 z2,2  x1
 a  v 
 2  2
= +   = + + 
x2 0 b2 x2 w2 . . . . . .  x2  . . .  . . .

t t−1 t t
y15 t z3,1 z3,2 a15 v15 t
The model is set up as follows:

ns <- dim(dat.feb)[1]
B <- matrix(list(0), 2, 2)
B[1, 1] <- "b1"
B[2, 2] <- "b2"
Q <- diag(1, 2)
R <- "diagonal and unequal"
U <- "zero"
x0 <- "zero"
Z <- matrix(list(0), ns, 2)
Z[1:(ns * 2)] <- c(paste0("z1", 1:ns), paste0("z2", 1:ns))
Z[1, 2] <- 0
A <- "unequal"
mod.list.dfa = list(B = B, Z = Z, Q = Q, R = R, U = U, A = A,
x0 = x0)
Now we can fit a MARSS model and get estimates of the missing SWEs. We
pass in the initial value for a as the mean level so it fits easier.
library(MARSS)
fit.dfa <- MARSS(dat.feb, model = mod.list.dfa, control = list(maxit = 1000),

75
50
25
0

75
50
SWE (demeaned)
25
0

75
50
25
0
1980 1990 2000 2010
75
50
25
0
1980 1990 2000 20101980 1990 2000 20101980 1990 2000 2010
11.2.4 Diagnostics
The state residuals are uncorrelated.

fit <- fit.dfa
par(mfrow = c(1, 2), mar = c(2, 2, 1, 1))
apply(residuals(fit)$state.residuals[, 1:30, drop = FALSE], 1,

acf)
1.0
1.0
0.8
0.8
0.6
0.6
0.4
0.4
ACF
0.2
0.2
0.0
0.0
−0.2
−0.2
−0.4
−0.4
0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14
As are the model residuals:

par(mfrow = c(4, 4), mar = c(2, 2, 1, 1))
apply(residuals(fit)$model.residual, 1, function(x) {
acf(x, na.action = na.pass)
})
1.0
1.0
1.0
0.5
0.4
0.4
0.4
ACF
ACF
ACF
−0.2
−0.2
−0.2
−0.5
Series x Series x Series x Series x
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
1.0
Lag Lag Lag Lag
0.8
0.8
0.8
0.4
ACF
ACF
ACF
−0.4 0.2
−0.4 0.2
−0.4 0.2
−0.2
Series x Series x Series x Series x
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
1.0
1.0
1.0
1.0
Lag Lag Lag Lag
0.4
0.4
0.4
0.4
ACF
ACF
ACF
−0.2
−0.2
−0.2
−0.2
Series x Series x Series x
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
1.0
Lag Lag Lag Lag
0.5
0.5
0.4
ACF
ACF
−0.2
−0.5
−0.5
0 5 10 15 0 5 10 15 0 5 10 15
11.2.5 Plot the fitted or mean Feb SWE using DFA
The plots showed the estimate of the missing Feb SWE values, which is the
expected value of y conditioned on all the data. For the non-missing SWE,
this expected value is just the observation. Many times we want the model
fit for the covariate. If the measurements have observation error, the fitted
value is the estimate without this observation error.
We can use tidy() again but this time we want type="fitted.ytT". We

will not show the prediction intervals which would be for new data. We will
just show the confidence intervals on the fitted estimate. It is hard to see but
there are intervals for all years now.
11.3. MODELING SEASONAL SWE 307

75
50
25
0

75
50
SWE (demeaned)
25
0

75
50
25
0
1980 1990 2000 2010
75
50
25
0
1980 1990 2000 20101980 1990 2000 20101980 1990 2000 2010
11.3 Modeling Seasonal SWE
When we look at all months, we see that SWE is highly seasonal. Note
October and November are missing for all years.
swe.yr <- snotel
swe.yr <- swe.yr[swe.yr$Station.Id %in% y$Station.Id, ]
swe.yr$Station <- droplevels(swe.yr$Station)

100
75
50
25
0
100
75
50
25
0
SWE

100
75
50
25
0
2011 2012 2013
100
75
50
25
0
2011 2012 2013 2011 2012 2013 2011 2012 2013
Date
Set up the data matrix of monthly SNOTEL data:

dat.yr <- snotel
dat.yr <- dat.yr[dat.yr$Station.Id %in% y$Station.Id, ]
dat.yr$Station <- droplevels(dat.yr$Station)
dat.yr$Month <- factor(dat.yr$Month, level = month.abb)
dat.yr <- reshape2::acast(dat.yr, Station ~ Year + Month, value.var = "SWE")
We will model the seasonal differences using a periodic model. The covariates
are
period <- 12
TT <- dim(dat.yr)[2]
cos.t <- cos(2 * pi * seq(TT)/period)
sin.t <- sin(2 * pi * seq(TT)/period)
c.seas <- rbind(cos.t, sin.t)
11.3.1 Modeling season across sites
We will create a state for the seasonal cycle and each station will have a
scaled effect of that seasonal cycle. The observations will have the seasonal
effect plus a mean and residuals (observation - season - mean) will be allowed
to correlate across stations.

ns <- dim(dat.yr)[1]
B <- "zero"
Q <- matrix(1)
R <- "unconstrained"
U <- "zero"
x0 <- "zero"
Z <- matrix(paste0("z", 1:ns), ns, 1)
A <- "unequal"
mod.list.dfa = list(B = B, Z = Z, Q = Q, R = R, U = U, A = A,
x0 = x0)
C <- matrix(c("c1", "c2"), 1, 2)
c <- c.seas
mod.list.seas <- list(B = B, U = U, Q = Q, A = A, R = R, Z = Z,
C = C, c = c, x0 = x0, tinitx = 0)
Now we can fit the model:

m <- apply(dat.yr, 1, mean, na.rm = TRUE)
fit.seas <- MARSS(dat.yr, model = mod.list.seas, control = list(maxit = 500),
Figure @ref{fig:mssmiss-seas} shows the seasonal estimate plus prediction

intervals for each station. This is zi xi + ai . The prediction interval shows our
estimate of the range of the data we would see around the seasonal estimate.

100
50
0
−50
SWE seasonal component Huckleberry Creek Lynn Lake Meadows Pass Morse Lake
100
50
0
−50
100
50
0
−50
19821984198619881990
100
50
0
−50
19821984198619881990 19821984198619881990 19821984198619881990
The estimated mean SWE at each station is E(yi |y1:T ). This is the estimate
conditioned on all the data and includes the seasonal component plus the
information from the data from other stations. Because we estimated a R
matrix with covariance, stations with data at time t help inform the value of
stations without data at time t. Only years up to 1990 are shown, but the
model is fit to all years. The stations with no data before 1990 are being
estimated based on the information in the later years when they do have
data. We did not constrain the SWE to be positive, so negative estimates are
possible and occurs in the months in which we have no SWE data (because
there is no snow).

75
50
25
0

75
50
SWE (demeaned)
25
0

75
50
25
0
1982 1984 1986 1988 1990
75
50
25
0
1982 1984 1986 1988 1990 1982 1984 1986 1988 1990 1982 1984 1986 1988 1990
Chapter 12
JAGS for Bayesian time series

analysis
In this lab, we will work through using Bayesian methods to estimate pa-
rameters in time series models. There are a variety of software tools to do
time series analysis using Bayesian methods. R lists a number of packages
available on the R Cran TimeSeries task view.
Software to implement more complicated models is also available, and many

of you are probably familiar with these options (AD Model Builder and
Template Model Builder, WinBUGS, OpenBUGS, JAGS, Stan, to name a
few). In this chapter, we will show you how to write state-space models in
JAGS and fit these models.
After updating to the latest version of R, install JAGS for your operating
platform using the instructions here. Click on JAGS, then the most recent
folder, then the platform of your machine. You will also need the coda, rjags
and R2jags packages.
library(coda)
library(rjags)
library(R2jags)
313
314 CHAPTER 12. JAGS
12.1 The airquality dataset

For data for this lab, we will include a dataset on air quality in New York.
We will load the data and create a couple new variables for future use. For
the majority of our models, we are going to treat wind speed as the response
variable for our time series models.
data(airquality, package = "datasets")
Wind = airquality$Wind # wind speed
Temp = airquality$Temp # air temperature
N = dim(airquality)[1] # number of data points
12.2 Linear regression with no covariates

We will start with the simplest time series model possible: linear regression
with only an intercept, so that the predicted values of all observations are the
same. There are several ways we can write this equation. First, the predicted
values can be written as E[yt ] = µ. Assuming that the residuals are normally
distributed, the model linking our predictions to observed data are written as
yt = µ + et , et ∼ N(0, σ 2 ) (12.1)
An equivalent way to think about this model is that instead of the residuals
as normally distributed with mean zero, we can think of the data y as being
normally distributed with a mean of the intercept, and the same residual
standard deviation:
y ∼ N(E[yt ], σ 2 ) (12.2)
Remember that in linear regression models, the residual error is interpreted
as independent and identically distributed observation error.
To run the JAGS model, we will need to start by writing the model in JAGS
notation. For our linear regression model, one way to construct the model is
# 1. LINEAR REGRESSION with no covariates no covariates, so
# intercept only. The parameters are mean 'mu' and
# precision/variance parameter 'tau.obs'
12.2. LINEAR REGRESSION WITH NO COVARIATES 315
model.loc = "lm_intercept.txt" # name of the txt file

jagsscript = cat("
model {
mu ~ dnorm(0, 0.01); # mean = 0, sd = 1/sqrt(0.01)
tau.obs ~ dgamma(0.001,0.001); # This is inverse gamma
sd.obs <- 1/sqrt(tau.obs); # sd is treated as derived parameter
for(i in 1:N) {
Y[i] ~ dnorm(mu, tau.obs);
}
}
",
file = model.loc)
A couple things to notice: JAGS is not vectorized so we need to use for loops
(instead of matrix multiplication) and the dnorm notation means that we
assume that value (on the left) is normally distributed around a particular
mean with a particular precision (1 over the square root of the variance).
The model can briefly be summarized as follows: there are 2 parameters in the
model (the mean and variance of the observation error). JAGS is a bit funny in
that instead of giving a normal distribution the standard deviation or variance,
you pass in the precision (1/variance), so our prior on µ is pretty vague. The
precision receives a gamma prior, which is equivalent to the variance receiving
an inverse gamma prior (fairly common for standard Bayesian regression
models). We will treat the standard deviation as derived (if we know the
variance or precision, which we are estimating, we automatically know the
standard deviation). Finally, we write a model for the data yt (Y[i]). Again
we use the dnorm distribution to say that the data are normally distributed
(equivalent to our likelihood).
The function from the R2jags package that we actually use to run the model
is jags(). There is a parallel version of the function called jags.parallel()
which is useful for larger, more complex models. The details of both can be
found with ?jags or ?jags.parallel.
To actually run the model, we need to create several new objects, representing
(1) a list of data that we will pass to JAGS, (2) a vector of parameters that
we want to monitor in JAGS and have returned back to R, and (3) the name
of our text file that contains the JAGS model we wrote above. With those
three things, we can call the jags() function.
jags.data = list(Y = Wind, N = N) # named list of inputs
jags.params = c("sd.obs", "mu") # parameters to be monitored
mod_lm_intercept = jags(jags.data, parameters.to.save = jags.params,
model.file = model.loc, n.chains = 3, n.burnin = 5000, n.thin = 1,
n.iter = 10000, DIC = TRUE)
Notice that the jags() function contains a number of other important argu-
ments. In general, larger is better for all arguments: we want to run multiple
MCMC chains (maybe 3 or more), and have a burn-in of at least 5000. The
total number of samples after the burn-in period is n.iter-n.burnin, which in
this case is 5000 samples. Because we are doing this with 3 MCMC chains,
and the thinning rate equals 1 (meaning we are saving every sample), we will
retain a total of 1500 posterior samples for each parameter.
The saved object storing our model diagnostics can be accessed directly, and
includes some useful summary output.
mod_lm_intercept
Inference for Bugs model at "lm_intercept.txt", fit using jags,

3 chains, each with 10000 iterations (first 5000 discarded)
n.sims = 15000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
mu 9.951 0.289 9.384 9.758 9.950 10.147 10.516 1.001 15000
sd.obs 3.541 0.205 3.169 3.399 3.530 3.672 3.973 1.001 3500
deviance 820.566 2.022 818.594 819.122 819.925 821.383 826.087 1.001 15000
For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)

pD = 2.0 and DIC = 822.6
DIC is an estimate of expected predictive error (lower deviance is better).
The last 2 columns in the summary contain Rhat (which we want to be close
to 1.0), and neff (the effective sample size of each set of posterior draws). To
examine the output more closely, we can pull all of the results directly into R,
attach.jags(mod_lm_intercept)
mu
Attaching the R2jags object allows us to work with the named parameters
directly in R. For example, we could make a histogram of the posterior
distributions of the parameters mu and sd.obs with the following code,
# Now we can make plots of posterior values
hist(mu, 40, col = "grey", xlab = "Mean", main = "")
hist(sd.obs, 40, col = "grey", xlab = expression(sigma[obs]),
main = "")
Finally, we can run some useful diagnostics from the coda package on this
model output. We have written a small function to make the creation of
mcmc lists (an argument required for many of the diagnostics). The function
createMcmcList = function(jagsmodel) {
McmcArray = as.array(jagsmodel$BUGSoutput$sims.array)
McmcList = vector("list", length = dim(McmcArray)[2])
for (i in 1:length(McmcList)) McmcList[[i]] = as.mcmc(McmcArray[,
i, ])
McmcList = mcmc.list(McmcList)
return(McmcList)
}
Creating the MCMC list preserves the random samples generated from each
chain and allows you to extract the samples for a given parameter (such as µ)
from any chain you want. To extract µ from the first chain, for example, you
could use the following code. Because createMcmcList() returns a list of
mcmc objects, we can summarize and plot these directly. Figure 12.2 shows
the plot from plot(myList[[1]]).
myList = createMcmcList(mod_lm_intercept)
summary(myList[[1]])
0.8
Frequency
0.4
0.0
0 1 2 3 4 5
Mean
1500
Frequency
500
0
3.0 3.5 4.0 4.5
σobs
Figure 12.1: Plot of the posteriors for the linear regression model.
Iterations = 1:5000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 5000
1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:
Mean SD Naive SE Time-series SE

deviance 820.560 2.0045 0.028347 0.028347
mu 9.955 0.2900 0.004101 0.004360
sd.obs 3.548 0.2032 0.002874 0.002874
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%

deviance 818.594 819.126 819.946 821.371 826.010
mu 9.392 9.758 9.955 10.147 10.518
sd.obs 3.181 3.404 3.538 3.678 3.987
plot(myList[[1]])
Trace of deviance Density of deviance

820
0.0
0 1000 2000 3000 4000 5000 820 825 830 835
Iterations N = 5000 Bandwidth = 0.3233
Trace of mu Density of mu
1.4
9.0
0.0
0 1000 2000 3000 4000 5000 9.0 9.5 10.0 10.5 11.0
Trace of sd.obs Density of sd.obs

2.0
3.0 4.5
0.0
0 1000 2000 3000 4000 5000 3.0 3.5 4.0 4.5

Trace of deviance Density of deviance

830
0.2
820
0.0
0 1000 2000 3000 4000 5000 820 825 830 835
Trace of mu Density of mu
10.0 11.0
1.2
0.6
9.0
0.0
0 1000 2000 3000 4000 5000 9.0 9.5 10.0 10.5 11.0
Trace of sd.obs Density of sd.obs

2.0
4.0
1.0
3.0
0.0
0 1000 2000 3000 4000 5000 3.0 3.5 4.0 4.5
Figure 12.2: Plot of an object output from creatMcmcList.

12.3. REGRESSION WITH AUTOCORRELATED ERRORS 321
For more quantitative diagnostics of MCMC convergence, we can rely on the

coda package in R. There are several useful statistics available, including
the Gelman-Rubin diagnostic (for one or several chains), autocorrelation
diagnostics (similar to the ACF you calculated above), the Geweke diagnostic,
and Heidelberger-Welch test of stationarity.
# Run the majority of the diagnostics that CODA() offers
library(coda)
gelmanDiags = gelman.diag(createMcmcList(mod_lm_intercept), multivariate = F)
autocorDiags = autocorr.diag(createMcmcList(mod_lm_intercept))
gewekeDiags = geweke.diag(createMcmcList(mod_lm_intercept))
heidelDiags = heidel.diag(createMcmcList(mod_lm_intercept))
12.3 Regression with autocorrelated errors
In our first model, the errors were independent in time. We are going to
modify this to model autocorrelated errors. Autocorrelated errors are widely
used in ecology and other fields – for a greater discussion, see Morris and
Doak (2002) Quantitative Conservation Biology. To make the deviations
autocorrelated, we start by defining the deviation in the first time step,
e1 = Y1 − u. The expectation of yt in each time step is then written as
E[yt ] = µ + φ ∗ et−1 (12.3)
In addition to affecting the expectation, the correlation parameter φ also

affects the variance of the errors, so that

σ 2 = ψ 2 1 − φ2 (12.4)
Like in our first model, we assume that the data follow a normal likelihood
(or equivalently that the residuals are normally distributed), yt = E[yt ] + et ,
or yt ∼ N(E[yt ], σ 2 ). Thus, it is possible to express the subsequent deviations
as et = yt − E[yt ], or equivalently as et = yt − µ − φ × et−1 . The JAGS script
for this model is:
# 2. LINEAR REGRESSION WITH AUTOCORRELATED ERRORS no

# covariates, so intercept only.
model.loc = ("lmcor_intercept.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
tau.obs ~ dgamma(0.001,0.001);
sd.obs <- 1/sqrt(tau.obs);
phi ~ dunif(-1,1);
tau.cor <- tau.obs / (1-phi*phi); # Var = sigma2 * (1-rho^2)
epsilon[1] <- Y[1] - mu;

predY[1] <- mu; # initial value
for(i in 2:N) {
predY[i] <- mu + phi * epsilon[i-1];
Y[i] ~ dnorm(predY[i], tau.cor);
epsilon[i] <- (Y[i] - mu) - phi*epsilon[i-1];
}
}
",
file = model.loc)
Notice several subtle changes from the simpler first model: (1) we are esti-
mating the autocorrelation parameter φ, which is assigned a Uniform(-1, 1)
prior, (2) we model the residual variance as a function of the autocorrelation,
and (3) we allow the autocorrelation to affect the predicted values predY.
One other change we can make is to add predY to the list of parameters we
want returned to R.
jags.data = list(Y = Wind, N = N)
jags.params = c("sd.obs", "predY", "mu", "phi")
mod_lmcor_intercept = jags(jags.data, parameters.to.save = jags.params,
For some models, we may be interested in examining the posterior fits to

12.4. RANDOM WALK TIME SERIES MODEL 323
data. You can make this plot yourself, but we have also put together a simple
function whose arguments are one of our fitted models and the raw data. The
function is:
plotModelOutput = function(jagsmodel, Y) {
# attach the model
attach.jags(jagsmodel)
x = seq(1, length(Y))
summaryPredictions = cbind(apply(predY, 2, quantile, 0.025),
apply(predY, 2, mean), apply(predY, 2, quantile, 0.975))
plot(Y, col = "white", ylim = c(min(c(Y, summaryPredictions)),
max(c(Y, summaryPredictions))), xlab = "", ylab = "95% CIs of predictions and
main = paste("JAGS results:", jagsmodel$model.file))
polygon(c(x, rev(x)), c(summaryPredictions[, 1], rev(summaryPredictions[,
3])), col = "grey70", border = NA)
lines(summaryPredictions[, 2])
points(Y)
}
We can use the function to plot the predicted posterior mean with 95% CIs,
as well as the raw data. For example, try
plotModelOutput(mod_lmcor_intercept, Wind)
mu
12.4 Random walk time series model
All of the previous three models can be interpreted as observation error

models. Switching gears, we can alternatively model error in the state of
nature, creating process error models. A simple process error model that
many of you may have seen before is the random walk model. In this model,
the assumption is that the true state of nature (or latent states) are measured
perfectly. Thus, all uncertainty is originating from process variation (for
ecological problems, this is often interpreted as environmental variation). For
JAGS results: lmcor_intercept.txt

20
95% CIs of predictions and data
15
10
5
0 50 100 150
Figure 12.3: Predicted posterior mean with 95% CIs

12.4. RANDOM WALK TIME SERIES MODEL 325
this simple model, we will assume that our process of interest (in this case,
daily wind speed) exhibits no daily trend, but behaves as a random walk.
E[yt ] = yt−1 + et−1 (12.5)
And the et ∼ N(0, σ 2 ). Remember back to the autocorrelated model (or

MA(1) models) that we assumed that the errors et followed a random walk.
In contrast, the AR(1) model assumes that the errors are independent, but
that the state of nature follows a random walk. The JAGS random walk
model and R script to run it is below:
# 3. AR(1) MODEL WITH NO ESTIMATED AR COEFFICIENT = RANDOM
# WALK no covariates. The model is y[t] ~ Normal(y[n-1],
# sigma) for we will call the precision tau.pro Note too that
# we have to define predY[1]
model.loc = ("rw_intercept.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
tau.pro ~ dgamma(0.001,0.001);
sd.pro <- 1/sqrt(tau.pro);
predY[1] <- mu; # initial value

for(i in 2:N) {
predY[i] <- Y[i-1];
Y[i] ~ dnorm(predY[i], tau.pro);
}
}
",
file = model.loc)

jags.params = c("sd.pro", "predY", "mu")
mod_rw_intercept = jags(jags.data, parameters.to.save = jags.params,
12.5 Autoregressive AR(1) time series mod-

els
A variation of the random walk model described previously is the autoregres-
sive time series model of order 1, AR(1). This model introduces a coefficient,
which we will call φ. The parameter φ controls the degree to which the
random walk reverts to the mean—when φ = 1, the model is identical to the
random walk, but at smaller values, the model will revert back to the mean
(which in this case is zero). Also, φ can take on negative values, which we
will discuss more in future lectures. The math to describe the AR(1) time
series model is:
E[yt ] = φ ∗ yt−1 + et−1 (12.6)
The JAGS random walk model and R script to run the AR(1) model is below:
# 4. AR(1) MODEL WITH AND ESTIMATED AR COEFFICIENT We're
# introducting a new AR coefficient 'phi', so the model is
# y[t] ~ N(mu + phi*y[n-1], sigma^2)
model.loc = ("ar1_intercept.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
sd.pro <- 1/sqrt(tau.pro);
phi ~ dnorm(0, 1);
predY[1] <- Y[1];

for(i in 2:N) {
predY[i] <- mu + phi * Y[i-1];
Y[i] ~ dnorm(predY[i], tau.pro);
}
}
",
file = model.loc)
12.6. UNIVARIATE STATE SPACE MODEL 327

jags.params = c("sd.pro", "predY", "mu", "phi")
mod_ar1_intercept = jags(jags.data, parameters.to.save = jags.params,
12.6 Univariate state space model

At this point, we have fit models with observation or process error, but we
have not tried to estimate both simultaneously. We will do so here, and
introduce some new notation to describe the process model and observation
model. We use the notation xt to denote the latent state or state of nature
(which is unobserved) at time t and yt to denote the observed data. For
introductory purposes, we will make the process model autoregressive (similar
to our AR(1) model),
xt = φ ∗ xt−1 + et−1 ; et−1 ∼ N(0, q) (12.7)
For the process model, there are a number of ways to parameterize the first
state (x1 ), and we will talk about this more in the class. For the sake of this
model, we will place a vague weakly informative prior on x1 : x1 ∼ N(0, 0.01).
Second, we need to construct an observation model linking the estimate
unseen states of nature xt to the data yt . For simplicitly, we will assume that
the observation errors are indepdendent and identically distributed, with no
observation component. Mathematically, this model is
yt ∼ N(xt , r) (12.8)
In the two above models, q is the process variance and r is the observation
error variance. The JAGS code will use the standard deviation (square root)
of these. The code to produce and fit this model is below:
# 5. MAKE THE SS MODEL a univariate random walk no
# covariates.
model.loc = ("ss_model.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
sd.q <- 1/sqrt(tau.pro);
sd.r <- 1/sqrt(tau.obs);
phi ~ dnorm(0,1);
X[1] <- mu;

predY[1] <- X[1];
Y[1] ~ dnorm(X[1], tau.obs);
for(i in 2:N) {
predX[i] <- phi*X[i-1];
X[i] ~ dnorm(predX[i],tau.pro); # Process variation
predY[i] <- X[i];
Y[i] ~ dnorm(X[i], tau.obs); # Observation variation
}
}
",
file = model.loc)

jags.params = c("sd.q", "sd.r", "predY", "mu")
mod_ss = jags(jags.data, parameters.to.save = jags.params, model.file = model.
DIC = TRUE)
12.6.1 Including covariates
Returning to the first example of regression with the intercept only, we will
introduce Temp as the covariate explaining our response variable Wind. Note
that to include the covariate, we (1) modify the JAGS script to include a new
coefficient—in this case beta, (2) update the predictive equation to include
12.7. FORECASTING WITH JAGS MODELS 329
the effects of the new covariate, and (3) we include the new covariate in our
named data list.
# 6. Include some covariates in a linear regression Use
# temperature as a predictor of wind
model.loc = ("lm.txt")
jagsscript = cat("
model {
mu ~ dnorm(0, 0.01);
beta ~ dnorm(0,0.01);
sd.obs <- 1/sqrt(tau.obs);
for(i in 1:N) {
predY[i] <- mu + C[i]*beta;
Y[i] ~ dnorm(predY[i], tau.obs);
}
}
",
file = model.loc)
jags.data = list(Y = Wind, N = N, C = Temp)

jags.params = c("sd.obs", "predY", "mu", "beta")
mod_lm = jags(jags.data, parameters.to.save = jags.params, model.file = model.loc,
DIC = TRUE)
12.7 Forecasting with JAGS models
There are a number of different approaches to using Bayesian time series

models to perform forecasting. One approach might be to fit a model, and
use those posterior distributions to forecast as a secondary step (say within
R). A more streamlined approach is to do this within the JAGS code itself.
We can take advantage of the fact that JAGS allows you to include NAs in
the response variable (but never in the predictors). Let’s use the same Wind
dataset, and the univariate state-space model described above to forecast

three time steps into the future. We can do this by including 3 more NAs in
the dataset, and incrementing the variable N by 3.
jags.data = list(Y = c(Wind, NA, NA, NA), N = (N + 3))
jags.params = c("sd.q", "sd.r", "predY", "mu")
model.loc = ("ss_model.txt")
mod_ss_forecast = jags(jags.data, parameters.to.save = jags.params,
We can inspect the fitted model object, and see that predY contains the 3
new predictions for the forecasts from this model.
12.8. PROBLEMS 331
12.8 Problems
1. Fit the intercept only model from section 12.2. Set the burn-in to 3,
and when the model completes, plot the time series of the parameter
mu for the first MCMC chain.
a. Based on your visual inspection, has the MCMC chain convered?
b. What is the ACF of the first MCMC chain?
2. Increase the MCMC burn-in for the model in question 1 to a value that
you think is reasonable. After the model has converged, calculate the
Gelman-Rubin diagnostic for the fitted model object.
3. Compare the results of the plotModelOutput() function for the inter-
cept only model from section 12.2. You will to add “predY” to your
JAGS model and to the list of parameters to monitor, and re-run the
model.
4. Modify the random walk model without drift from section 12.4 to a
random walk model with drift. The equation for this model is
E[yt ] = yt−1 + µ + et−1
where µ is interpreted as the average daily trend in wind speed. What

might be a reasonable prior on µ?
5. Plot the posterior distribution of φ for the AR(1) model in section 12.5.
Can this parameter be well estimated for this dataset?
6. Plot the posteriors for the process and observation variances (not stan-
dard deviation) for the univariate state-space model in section 12.6.
Which is larger for this dataset?
7. Add the effect of temperature to the AR(1) model in section 12.5. Plot
the posterior for beta and compare to the posterior for beta from the
model in section 12.6.1.
8. Plot the fitted values from the model in section 12.7, including the
forecasts, with the 95% credible intervals for each data point.
9. The following is a dataset from the Upper Skagit River (Puget Sound,
1952-2005) on salmon spawners and recruits:
Spawners = c(2662, 1806, 1707, 1339, 1686, 2220, 3121, 5028,

9263, 4567, 1850, 3353, 2836, 3961, 4624, 3262, 3898, 3039,
5966, 5931, 7346, 4911, 3116, 3185, 5590, 2485, 2987, 3829,
4921, 2348, 1932, 3151, 2306, 1686, 4584, 2635, 2339, 1454,
3705, 1510, 1331, 942, 884, 666, 1521, 409, 2388, 1043, 3262,
2606, 4866, 1161, 3070, 3320)
Recruits = c(12741, 15618, 23675, 37710, 62260, 32725, 8659,
28101, 17054, 29885, 33047, 20059, 35192, 11006, 48154, 35829,
46231, 32405, 20782, 21340, 58392, 21553, 27528, 28246, 35163,
15419, 16276, 32946, 11075, 16909, 22359, 8022, 16445, 2912,
17642, 2929, 7554, 3047, 3488, 577, 4511, 1478, 3283, 1633,
8536, 7019, 3947, 2789, 4606, 3545, 4421, 1289, 6416, 3647)
logRS = log(Recruits/Spawners)
a. Fit the following Ricker model to these data using the following
linear form of this model with normally distributed errors:
log(Rt /St ) = a + b × St + et , where et ∼ N(0, σ 2 )

You will recognize that this form is exactly the same as linear
regression, with independent errors (very similar to the intercept
only model of Wind we fit in section 12.2).
b. Within the constraints of the Ricker model, think about other ways
you might want to treat the errors. The basic model described
above has independent errors that are not correlated in time.
Approaches to analyzing this dataset might involve
• modeling the errors as independent (as described above)
• modeling the errors as autocorrelated
• fitting a state-space model, with independent or correlated
process errors
Fit each of these models, and compare their performance (either using
their predictive ability, or forecasting ability).
Chapter 13
Stan for Bayesian time series

analysis
For this lab, we will use Stan for fitting models. These examples are primarily
drawn from the Stan manual and previous code from this class.
A script with all the R code in the chapter can be downloaded here.
Data and packages
You will need the atsar package we have written for fitting state-space time
series models with Stan. This is hosted on Github safs-timeseries. Install
using the devtools package.
library(devtools)
devtools::install_github("nwfsc-timeseries/atsar")
In addition, you will need the rstan, datasets, parallel and loo packages.
After installing, if needed, load the packages:
library(atsar)
library(rstan)
library(loo)
Once you have Stan and rstan installed, optimize Stan on your machine:
333
334 CHAPTER 13. STAN
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
For this lab, we will use a data set on airquality in New York from the
datasets package. Load the data and create a couple new variables for future
use.
data(airquality, package = "datasets")
Wind <- airquality$Wind # wind speed
Temp <- airquality$Temp # air temperature
13.1 Linear regression

We’ll start with the simplest time series model possible: linear regression with
only an intercept, so that the predicted values of all observations are the same.
There are several ways we can write this equation. First, the predicted values
can be written as E[Yt ] = βx, where x = 1. Assuming that the residuals are
normally distributed, the model linking our predictions to observed data is
written as
yt = βx + et , et ∼ N (0, σ), x = 1
An equivalent way to think about this model is that instead of the residuals
as normally distributed with mean zero, we can think of the data yt as being
drawn from a normal distribution with a mean of the intercept, and the same
residual standard deviation:
Yt ∼ N (E[Yt ], σ)
Remember that in linear regression models, the residual error is interpreted

as independent and identically distributed observation error.
To run this model using our package, we’ll need to specify the response and
predictor variables. The covariate matrix with an intercept only is a matrix
of 1s. To double check, you could always look at
x <- model.matrix(lm(Temp ~ 1))
Fitting the model using our function is done with this code,
13.1. LINEAR REGRESSION 335
lm_intercept <- atsar::fit_stan(y = as.numeric(Temp), x = rep(1,

length(Temp)), model_name = "regression")
Coarse summaries of stanfit objects can be examined by typing one of the

following
lm_intercept
# this is huge
summary(lm_intercept)
But to get more detailed output for each parameter, you have to use the
extract() function,
pars <- rstan::extract(lm_intercept)
names(pars)
[1] "beta" "sigma" "pred" "log_lik" "lp__"
extract() will return the draws from the posterior for your parameters and
any derived variables specified in your stan code. In this case, our model is
yt = β × 1 + et , et ∼ N (0, σ)
so our estimated parameters are β and σ. Our stan code computed the derived
variables: predicted yt which is ŷt = β × 1 and the log-likelihood. lp__ is
the log posterior which is automatically returned.
We can then make basic plots or summaries of each of these parameters,

hist(pars$beta, 40, col = "grey", xlab = "Intercept", main = "")
120
Frequency
80
60
40
20
0
2 4 6 8 10 12 14 16
Intercept
quantile(pars$beta, c(0.025, 0.5, 0.975))
2.5% 50% 97.5%

4.781097 8.871113 12.970220
One of the other useful things we can do is look at the predicted values of our
model (ŷt = β × 1) and overlay the data. The predicted values are pars$pred.
plot(apply(pars$pred, 2, mean), main = "Predicted values", lwd = 2,
ylab = "Wind", ylim = c(min(pars$pred), max(pars$pred)),
type = "l")
lines(apply(pars$pred, 2, quantile, 0.025))
lines(apply(pars$pred, 2, quantile, 0.975))
points(Wind, col = "red")
13.1.1 Burn-in and thinning
To illustrate the effects of the burn-in/warmup period and thinning, we can

re-run the above model, but for just 1 MCMC chain (the default is 3).
lm_intercept <- atsar::fit_stan(y = Temp, x = rep(1, length(Temp)),
model_name = "regression", mcmc_list = list(n_mcmc = 1000,
13.2. LINEAR REGRESSION WITH CORRELATED ERRORS 337
Predicted values
14
12
10
Wind
8
6
4
2
0 50 100 150
Index
Figure 13.1: Data and predicted values for the linear regression model.
n_burn = 1, n_chain = 1, n_thin = 1))
Here is a plot of the time series of beta with one chain and no burn-in. Based
on visual inspection, when does the chain converge?
pars <- rstan::extract(lm_intercept)
plot(pars$beta)
13.2 Linear regression with correlated errors
In our first model, the errors were independent in time. We’re going to modify
this to model autocorrelated errors. Autocorrelated errors are widely used in
ecology and other fields – for a greater discussion, see Morris and Doak (2002)
Quantitative Conservation Biology. To make the errors autocorrelated, we
start by defining the error in the first time step, e1 = y1 − β. The expectation
of Yt in each time step is then written as
E[Yt ] = β + φet−1
15
10
pars$beta
5
0
0 100 200 300 400 500
Index
Figure 13.2: A time series of our posterior draws using one chain and no
burn-in.
In addition to affecting the expectation, the correlation parameter φ also

affects the variance of the errors, so that

σ 2 = ψ 2 1 − φ2
Like in our first model, we assume that the data follows a normal likelihood
(or equivalently that the residuals are normally distributed), yt = E[Yt ] + et ,
or Yt ∼ N (E[Yt ], σ). Thus, it is possible to express the subsequent deviations
as et = yt − E[Yt ], or equivalently as et = yt − β − φet−1 .
We can fit this regression with autocorrelated errors by changing the model
name to ‘regression_cor’
lm_intercept_cor <- atsar::fit_stan(y = Temp, x = rep(1, length(Temp)),
model_name = "regression_cor", mcmc_list = list(n_mcmc = 1000,
n_burn = 1, n_chain = 1, n_thin = 1))
13.3. RANDOM WALK MODEL 339
13.3 Random walk model
All of the previous three models can be interpreted as observation error

models. Switching gears, we can alternatively model error in the state of
nature, creating process error models. A simple process error model that
many of you may have seen before is the random walk model. In this model,
the assumption is that the true state of nature (or latent states) are measured
perfectly. Thus, all uncertainty is originating from process variation (for
ecological problems, this is often interpreted as environmental variation). For
this simple model, we’ll assume that our process of interest (in this case, daily
wind speed) exhibits no daily trend, but behaves as a random walk.
yt = yt−1 + et
And the et ∼ N (0, σ). Remember back to the autocorrelated model (or
MA(1) models) that we assumed that the errors et followed a random walk.
In contrast, this model assumes that the errors are independent, but that the
state of nature follows a random walk. Note also that this model as written
doesn’t include a drift term (this can be turned on / off using the est_drift
argument).
We can fit the random walk model using argument model_name = 'rw'
passed to the fit_stan() function.
rw <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "rw")
13.4 Autoregressive models
A variation of the random walk model described previously is the autoregres-

sive time series model of order 1, AR(1). This model is essentially the same
as the random walk model but it introduces an estimated coefficient, which
we will call φ. The parameter φ controls the degree to which the random
walk reverts to the mean – when φ = 1, the model is identical to the random
walk, but at smaller values, the model will revert back to the mean (which
in this case is zero). Also, φ can take on negative values, which we’ll discuss
more in future lectures. The math to describe the AR(1) model is:
yt = φyt−1 + et
.
The fit_stan() function can fit higher order AR models, but for now we
just want to fit an AR(1) model and make a histogram of phi.
ar1 <- atsar::fit_stan(y = Temp, x = matrix(1, nrow = length(Temp),
ncol = 1), model_name = "ar", est_drift = FALSE, P = 1)
13.5 Univariate state-space models

At this point, we’ve fit models with observation or process error, but we
haven’t tried to estimate both simultaneously. We will do so here, and
introduce some new notation to describe the process model and observation
model. We use the notation xt to denote the latent state or state of nature
(which is unobserved) at time t and yt to denote the observed data. For
introductory purposes, we’ll make the process model autoregressive (similar
to our AR(1) model),
xt = φxt−1 + et , et ∼ N (0, q)
For the process model, there are a number of ways to parameterize the first
‘state’, and we’ll talk about this more in the class, but for the sake of this model,
we’ll place a vague weakly informative prior on x1 , x1 ∼ N (0, 0.01).Second,
we need to construct an observation model linking the estimate unseen states
of nature xt to the data yt . For simplicitly, we’ll assume that the observation
errors are indepdendent and identically distributed, with no observation
component. Mathematically, this model is
Yt ∼ N (xt , r)
In the two above models, we’ll refer to q as the standard deviation of the
process variance and r as the standard deviation of the observation error
variance
13.6. DYNAMIC FACTOR ANALYSIS 341
We can fit the state-space AR(1) and random walk models using the
fit_stan() function:
ss_ar <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "ss_ar")
ss_rw <- atsar::fit_stan(y = Temp, est_drift = FALSE, model_name = "ss_rw")
13.6 Dynamic factor analysis
First load the plankton dataset from the MARSS package.

library(MARSS)
# we want lakeWAplanktonTrans, which has been transformed so
# the 0s are replaced with NAs and the data z-scored
dat <- lakeWAplanktonTrans
# use only the 10 years from 1980-1989
plankdat <- dat[dat[, "Year"] >= 1980 & dat[, "Year"] < 1990,
]
# create vector of phytoplankton group names
phytoplankton <- c("Cryptomonas", "Diatoms", "Greens", "Unicells",
"Other.algae")
# get only the phytoplankton
dat.spp.1980 <- t(plankdat[, phytoplankton])
# z-score the data since we subsetted time
dat.spp.1980 <- dat.spp.1980 - apply(dat.spp.1980, 1, mean, na.rm = TRUE)
dat.spp.1980 <- dat.spp.1980/sqrt(apply(dat.spp.1980, 1, var,
na.rm = TRUE))
# check our z-score
apply(dat.spp.1980, 1, mean, na.rm = TRUE)
Cryptomonas Diatoms Greens Unicells Other.algae

4.951913e-17 -1.337183e-17 3.737694e-18 -5.276451e-18 4.365269e-18
apply(dat.spp.1980, 1, var, na.rm = TRUE)
Cryptomonas Diatoms Greens Unicells Other.algae

1 1 1 1 1
Plot the data.

# make into ts since easier to plot
dat.ts <- ts(t(dat.spp.1980), frequency = 12, start = c(1980,
1))
par(mfrow = c(3, 2), mar = c(2, 2, 2, 2))
for (i in 1:5) plot(dat.ts[, i], type = "b", main = colnames(dat.ts)[i],
col = "blue", pch = 16)
Cryptomonas Diatoms
0 1 2 3
2
1
dat.ts[, i]
0
−2
−2
1980 1982 1984 1986 1988 1990 1980 1982 1984 1986 1988 1990
Greens
Time
Unicells
Time
0 1 2 3
0
dat.ts[, i]
−2
−2
−4
1980 1982 1984 1986 1988 1990 1980 1982 1984 1986 1988 1990
Other.algae
Time Time
2
1
0
−2
1980 1982 1984 1986 1988 1990
Figure 13.3: Phytoplankton data.
Run a 3 trend model on these data.

mod_3 <- atsar::fit_dfa(y = dat.spp.1980, num_trends = 3)
Rotate the estimated trends and look at what it produces.

rot <- atsar::rotate_trends(mod_3)
names(rot)
[1] "Z_rot" "trends" "Z_rot_mean" "trends_mean" "trends_lower"

[6] "trends_upper"
Plot the estimate of the trends.
13.6. DYNAMIC FACTOR ANALYSIS 343
matplot(t(rot$trends_mean), type = "l", lwd = 2, ylab = "mean trend")

4
mean trend
2
0
−2
0 20 40 60 80 100 120
Figure 13.4: Trends.
13.6.1 Using leave one out cross-validation to select

models
We will fit multiple DFA with different numbers of trends and use leave one
out (LOO) cross-validation to choose the best model.
mod_1 = atsar::fit_dfa(y = dat.spp.1980, num_trends = 1)
Warning: The largest R-hat is 1.07, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and
http://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior va
http://mc-stan.org/misc/warnings.html#tail-ess
Warning: The largest R-hat is 1.07, indicating chains have not mixed.
http://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior me
Warning: There were 15 transitions after warmup that exceeded the maximum tree
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior me
We will compute the Leave One Out Information Criterion (LOOIC) using
the loo package. Like AIC, lower is better.
loo::loo(loo::extract_log_lik(mod_1))$looic
[1] 1634.44
Table of the LOOIC values:

13.7. UNCERTAINTY INTERVALS ON STATES 345
looics = c(loo::loo(loo::extract_log_lik(mod_1))$looic, loo::loo(loo::extract_log_lik

loo::loo(loo::extract_log_lik(mod_3))$looic, loo::loo(loo::extract_log_lik(mod_4)
loo::loo(loo::extract_log_lik(mod_5))$looic)
looic.table <- data.frame(trends = 1:5, LOOIC = looics)
looic.table
trends LOOIC
1 1 1634.440
2 2 1561.259
3 3 1480.432
4 4 1415.185
5 5 1402.356
13.7 Uncertainty intervals on states
We will look at the effect of missing data on the uncertainty intervals on

estimates states using a DFA on the harbor seal dataset.
# the first column is year
matplot(harborSealWA[, 1], harborSealWA[, -1], type = "l", ylab = "Log abundance",
xlab = "")
8.5
8.0
Log abundance
7.5
7.0
6.5
6.0
1980 1985 1990 1995
Assume they are all observing a single trend.

seal.mod <- atsar::fit_dfa(y = t(harborSealWA[, -1]), num_trends = 1)
pars <- rstan::extract(seal.mod)
pred_mean <- c(apply(pars$x, c(2, 3), mean))

pred_lo <- c(apply(pars$x, c(2, 3), quantile, 0.025))
pred_hi <- c(apply(pars$x, c(2, 3), quantile, 0.975))
plot(pred_mean, type = "l", lwd = 3, ylim = range(c(pred_mean,

pred_lo, pred_hi)), main = "Trend")
lines(pred_lo)
lines(pred_hi)
13.7. UNCERTAINTY INTERVALS ON STATES 347
Trend
4
2
0
pred_mean
−2
−4
−6
−8
5 10 15 20
Index
Figure 13.5: Estimated states and 95 percent credible intervals.

13.8 Problems
1. By adapting the code in Section 13.1, fit a regression model that includes
the intercept and a slope, modeling the effect of Wind. What is the
mean wind effect you estimate?
2. Using the results from the linear regression model fit with no burn-in
(Section 13.1.1), calculate the ACF of the beta time series using acf().
Would thinning more be appropriate? How much?
3. Using the fit of the random walk model to the temperature data (Section
13.3), plot the predicted values (states) and 95% CIs.
4. To see the effect of this increased flexibility in estimating the autocorre-
lation, make a plot of the predictions from the AR(1) model (Section
13.4 and the RW model (13.3).
5. Fit the univariate state-space model (Section 13.5) with and without
the autoregressive parameter φ and compare the estimated process and
observation error variances. Recall that AR(1) without the φ parameter
is a random walk.
Bibliography
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G.,
Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller,
T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore,
A. K., Zurell, D., and Lautenbach, S. (2013). Collinearity: a review of
methods to deal with it and a simulation study evaluating their performance.
Ecography, 36:027–046.
Holmes, E. E., Ward, E. J., and Scheuerell, M. D. (2014). Analysis of multi-

variate time-series using the marss package. Technical report, Northwest
Fisheries Science Center, Seattle, WA.
Jorgensen, J. C., Ward, E. J., Scheuerell, M. D., and Zabel, R. W. (2016).

Assessing spatial covariance among time series of abundance. Ecology and
Evolution, 6:2472–2485.
Lamon, E. I., Carpenter, S., and Stow, C. (1998). Forecasting pcb concen-
trations in lake michigan salmonids: a dynamic linear model approach.
Ecological Applications, 8:659–668.
Lisi, P. J., Schindler, D. E., Cline, T. J., Scheuerell, M. D., and Walsh, P. B.
(2015). Watershed geomorphology and snowmelt control stream thermal
sensitivity to air temperature. Geophysical Research Letters, 42(9):3380–
3388.
Ohlberger, J., Scheuerell, M. D., and Schindler, D. E. (2016). Population

coherence and environmental impacts across spatial scales: a case study of
Chinook salmon. Ecosphere, 7:e01333.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models
with R. Use R! Springer, London.
349
350 BIBLIOGRAPHY
Pole, A., West, M., and Harrison, J. (1994). Applied Bayesian forecasting and
time series analysis. Chapman and Hall, New York.
Scheuerell, M. D. and Williams, J. G. (2005). Forecasting climate induced
changes in the survival of snake river spring/summer chinook salmon
(oncorhynchus tshawytscha). Fisheries Oceanography, 14(6):448–457.
Stachura, M. M., Mantua, N. J., and Scheuerell, M. D. (2014). Oceanographic
influences on patterns in North Pacific salmon abundance. Canadian
Journal of Fisheries and Aquatic Sciences, 71(2):226–235.
Zuur, A. F., Fryer, R. J., Jolliffe, I. T., Dekker, R., and Beukema, J. J. (2003).
Estimating common trends in multivariate time series using dynamic factor
analysis. Environmetrics, 14(7):665–685.

Applied Time Series Analysis

Uploaded by

Copyright:

Available Formats

Applied Time Series Analysis

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Applied Time Series Analysis

Uploaded by

Copyright:

Available Formats

What are the different matrix forms used to represent linear regression models?

What are the different matrix forms used to represent linear regression models?

How does one estimate the parameters in matrix form 1?

How does one estimate the parameters in matrix form 1?

Applied Time Series Analysis for Fisheries and

E. E. Holmes, M. D. Scheuerell, and E. J. Ward

1 Basic matrix math in R 11

2 Linear regression in matrix form 27

3 Introduction to time series 53

3.7 Decomposition on log-transformed data . . . . . . . . . . . . . 65

4 Basic time series functions in R 69

5 Box-Jenkins method 117

6 Univariate state-space models 157

6.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

7 MARSS models 187

8 MARSS models with covariates 217

9 Dynamic linear models 241

10 Dynamic Factor Analysis 267

11 Covariates with Missing Values 291

12 JAGS for Bayesian time series analysis 313

13 Stan for Bayesian time series analysis 333

13.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

This is material that was developed as part of a course we teach at the

Holmes, E. E., M. D. Scheuerell, and E. J. Ward. Applied time series

Basic matrix math in R

1.1 Creating matrices in R

Create a 3 × 4 matrix, meaning 3 row and 4 columns, that is all 1s:

[,1] [,2] [,3] [,4]

[,1] [,2] [,3] [,4]

[,1] [,2] [,3] [,4]

[,1] [,2] [,3] [,4]

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3] [,4]

1.2 Matrix multiplication, addition and

A=matrix(1:6, 2, 3) #2 rows, 3 columns

[,1] [,2] [,3]

Error in B %*% B : non-conformable arguments

[,1] [,2] [,3]

[,1] [,2] [,3]

Error in A + B : non-conformable arrays

Error in A %*% A : non-conformable arguments

1.3 Subsetting a matrix

[,1] [,2] [,3]

[,1] [,2] [,3]

Error in A %*% B : non-conformable arguments

1.4 Replacing elements in a matrix

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3]

1.5 Diagonal matrices and identity matrices

A diagonal matrix is one that is square, meaning number of rows equals

[,1] [,2] [,3]

[,1] [,2] [,3]

[,1] [,2] [,3] [,4]

[,1] [,2] [,3]