Monte Carlo Simulation (ARIMA Time Series Models)
1/7/2010
Summary
This procedure generates random samples from ARIMA time series models. The general
form of an ARIMA model is most easily expressed in terms of the backwards operator B,
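which operates on the time index of a data value such that $B^j Y_t = Y_{t-j}$. Using this operator,
the model takes the form
\[
\left(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p\right)
\left(1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}\right)
(1 - B)^d (1 - B^s)^D Z_t
= \left(1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q\right)
\left(1 - \Theta_1 B^s - \Theta_2 B^{2s} - \dots - \Theta_Q B^{Qs}\right) a_t \tag{1}
\]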
where
\[ Z_t = Y_t - \mu \tag{2} \]
and $a_t$ is a random error or shock to the system at time t, usually assumed to be random
observations from a normal distribution with mean 0 and standard deviation $\sigma$. For a
stationary series, $\mu$ represents the process mean. Otherwise, it is related to the slope of the
forecast function. $\mu$ is sometimes assumed to equal 0.
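While the general model looks formidable, the most commonly used models are
relatively simple special cases. Written for $\mu = 0$ (otherwise $Y_t$ is replaced by
$Z_t = Y_t - \mu$ in the stationary models), these include:
AR(1), autoregressive of order 1:
\[ Y_t = \phi_1 Y_{t-1} + a_t \tag{3} \]
MA(1), moving average of order 1:
\[ Y_t = a_t - \theta_1 a_{t-1} \tag{5} \]
MA(2), moving average of order 2:
\[ Y_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} \tag{6} \]
ARMA(1,1), mixed model with two first-order terms:
\[ Y_t = \phi_1 Y_{t-1} + a_t - \theta_1 a_{t-1} \tag{7} \]
ARIMA(0,1,1), first-order moving average applied to the first differences:
\[ Y_t = Y_{t-1} + a_t - \theta_1 a_{t-1} \tag{8} \]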
It can be shown that this model is equivalent to the Simple Exponential Smoothing model.
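A short derivation may make this equivalence concrete. It is a sketch only: $F_t$ is a symbol introduced here (not in the original text) for the one-step-ahead forecast of $Y_t$ made at time $t-1$, so that the shock equals the forecast error, $a_t = Y_t - F_t$. Forecasting model (8) one step ahead, with the future shock replaced by its mean of 0, gives:

```latex
\begin{aligned}
F_{t+1} &= Y_t - \theta_1 a_t \\
        &= Y_t - \theta_1 \left(Y_t - F_t\right) \\
        &= (1 - \theta_1)\, Y_t + \theta_1 F_t
\end{aligned}
```

This is the simple exponential smoothing recursion with smoothing constant $1 - \theta_1$.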
Many economic time series with a seasonal component can be well represented by this
model.
Data Input
The data input dialog box is used to specify the model from which the desired time series
should be generated. For example, the dialog box below requests data from a
ARIMA$(2,0,0)\times(0,0,1)_{12}$ model:
Random seed: the seed for the random number generator. The initial default value is
set based on the time of day. If you use the same seed more than once, you will get
the same results.
Nonseasonal factors: the order of the nonseasonal AR factor (p), the order of
nonseasonal differencing (d), and the order of the nonseasonal MA factor (q). The
values of the AR and MA parameters are entered in the corresponding edit fields.
Seasonal factors: the order of the seasonal AR factor (P), the order of seasonal
differencing (D), the order of the seasonal MA factor (Q), and the length of
seasonality (s). The values of the AR and MA parameters are entered in the
corresponding edit fields.
When the OK button is pressed, a random time series is generated from the specified
model. To initialize the series, all values of $Y_t$ for t < 1 are set equal to the mean, while all
values of $a_t$ for t < 1 are set equal to 0. A total of 2n observations are generated, but only
the last n are retained, which reduces the influence of these start-up values on the retained series.
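The generation procedure described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the STATGRAPHICS implementation: the function names `lag_polynomial` and `simulate_arima` and all parameter names are made up for this sketch, the shocks are drawn with NumPy's default generator, and the recursion is obtained by multiplying out the factors of equation (1).

```python
import numpy as np


def lag_polynomial(coeffs, step=1):
    """Coefficients of 1 - c1*B^step - c2*B^(2*step) - ..., in increasing powers of B."""
    poly = np.zeros(len(coeffs) * step + 1)
    poly[0] = 1.0
    for j, c in enumerate(coeffs, start=1):
        poly[j * step] = -c
    return poly


def simulate_arima(n, mean=0.0, sigma=1.0, ar=(), d=0, ma=(),
                   sar=(), D=0, sma=(), s=1, seed=None):
    """Generate n observations from the model of equation (1) (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Multiply out the left-hand side: AR factors and differencing operators.
    lhs = np.convolve(lag_polynomial(ar), lag_polynomial(sar, s))
    for _ in range(d):
        lhs = np.convolve(lhs, [1.0, -1.0])               # (1 - B)
    for _ in range(D):
        lhs = np.convolve(lhs, lag_polynomial([1.0], s))  # (1 - B^s)

    # Multiply out the right-hand side: MA factors.
    rhs = np.convolve(lag_polynomial(ma), lag_polynomial(sma, s))

    total = 2 * n                      # generate 2n values, keep the last n
    pad = max(len(lhs), len(rhs))      # room for the start-up values (t < 1)
    z = np.zeros(pad + total)          # Z_t = Y_t - mean, so Y_t = mean for t < 1
    a = np.zeros(pad + total)          # a_t = 0 for t < 1
    a[pad:] = rng.normal(0.0, sigma, size=total)

    for t in range(pad, pad + total):
        ar_part = sum(lhs[j] * z[t - j] for j in range(1, len(lhs)))
        ma_part = sum(rhs[j] * a[t - j] for j in range(len(rhs)))
        z[t] = ma_part - ar_part

    return mean + z[-n:]
```

For instance, `simulate_arima(100, mean=10.0, sigma=1.0, ar=(1.1, -0.3), sma=(0.5,), s=12, seed=42)` would request a series of the kind shown in the dialog example. The seasonal MA value 0.5, the sample size, and the seed are placeholders, since the text does not state them. Calling the function twice with the same seed returns the same series.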
Analysis Summary
The Analysis Summary displays the requested model:
Mean: 10.0
Sigma: 1.0
Nonseasonal Factors
                  Order   Parameters
AutoRegressive    p=2     1.1,-0.3
Differencing      d=0
Moving Average    q=0
If you wish to generate the same time series again, record the seed of the random number
generator and use it the next time you generate the series. Otherwise, each time a series is
generated, it will be different.
[Figure: plot of the generated time series; vertical axis: Observation, horizontal axis: Time index]
Autocorrelations
An important tool in modeling time series data is the autocorrelation function. The
autocorrelation at lag k measures the strength of the correlation between observations k
time periods apart. The sample lag k autocorrelation is calculated from
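\[
r_k = \frac{\sum_{t=1}^{n-k} \left(y_t - \bar{y}\right)\left(y_{t+k} - \bar{y}\right)}
           {\sum_{t=1}^{n} \left(y_t - \bar{y}\right)^2} \tag{11}
\]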
The Autocorrelations pane displays the sample autocorrelations together with large lag
standard errors and probability limits:
Autocorrelations
Lower 95.0% Upper 95.0%
Lag Autocorrelation Stnd. Error Prob. Limit Prob. Limit
1 0.768758 0.1 -0.195997 0.195997
2 0.430287 0.147715 -0.289517 0.289517
3 0.105649 0.159758 -0.313121 0.313121
4 -0.0422804 0.160455 -0.314487 0.314487
5 -0.0184315 0.160567 -0.314706 0.314706
6 0.0922117 0.160588 -0.314747 0.314747
7 0.19766 0.161117 -0.315783 0.315783
8 0.158793 0.163523 -0.320501 0.320501
9 0.0316962 0.165058 -0.323509 0.323509
10 -0.175929 0.165119 -0.323628 0.323628
11 -0.31828 0.166983 -0.327281 0.327281
12 -0.370336 0.172943 -0.338963 0.338963
13 -0.183741 0.1807 -0.354166 0.354166
14 0.0663615 0.182558 -0.357809 0.357809
15 0.247179 0.1828 -0.358281 0.358281
16 0.302024 0.186112 -0.364773 0.364773
17 0.205659 0.19095 -0.374256 0.374256
18 0.101909 0.193153 -0.378573 0.378573
19 0.0277422 0.193689 -0.379625 0.379625
20 0.0775123 0.193729 -0.379703 0.379703
21 0.183506 0.194039 -0.38031 0.38031
22 0.282405 0.195767 -0.383697 0.383697
23 0.280609 0.199799 -0.3916 0.3916
24 0.179063 0.203702 -0.399249 0.399249
The standard error for $r_k$ is calculated on the assumption that the autocorrelations have
“died out” by lag k and are equal to 0 at all lags greater than or equal to k. It is calculated
from:
\[
se[r_k] = \sqrt{\frac{1}{n}\left(1 + 2\sum_{i=1}^{k-1} r_i^2\right)} \tag{12}
\]
This standard error is then used to calculate $100(1-\alpha)\%$ probability limits around zero,
using a critical value of the standard normal distribution:
\[ 0 \pm z_{\alpha/2}\, se[r_k] \tag{13} \]
If $\alpha$ = 0.05, any sample autocorrelations that fall outside these limits are statistically
significantly different from 0 at the 5% significance level. The StatAdvisor highlights any
such autocorrelations in red.
For the sample data, note that there are significant values for the first 2 lags and also at
lag 12.
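The calculations in equations (11) through (13) can be sketched in Python as below. The function names are illustrative only, and the standard errors use the sum of squared autocorrelations at lags below k, which reproduces the values in the table above (0.1 at lag 1 and 0.147715 at lag 2 for n = 100).

```python
import numpy as np
from statistics import NormalDist


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag, as in equation (11)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom
                     for k in range(1, max_lag + 1)])


def acf_standard_errors(r, n):
    """Large-lag standard errors, as in equation (12)."""
    r = np.asarray(r, dtype=float)
    # Sum of r_i^2 over lags i < k (zero for k = 1).
    prior = np.concatenate(([0.0], np.cumsum(r[:-1] ** 2)))
    return np.sqrt((1.0 + 2.0 * prior) / n)


def probability_limits(se, alpha=0.05):
    """Lower and upper limits 0 -/+ z_{alpha/2} * se, as in equation (13)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return -z * se, z * se
```

A lag would be flagged as significant when the absolute value of its autocorrelation exceeds the upper limit, for example `np.abs(r) > upper` with `lower, upper = probability_limits(acf_standard_errors(r, n))`.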
Autocorrelation Function
The Autocorrelation Function plot displays the sample autocorrelations and probability
limits:
[Figure: Autocorrelation Function plot with probability limits; vertical axis: Estimate]
Bars extending beyond the upper or lower limit correspond to statistically significant
autocorrelations.
Partial Autocorrelations
Another important tool in modeling time series data is the partial autocorrelation
function. The partial autocorrelations are used to help identify the proper order of
autoregressive model to use to describe an observed time series. The sample lag k partial
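autocorrelation $\hat{\phi}_{kk}$ is calculated from the sample autocorrelations using:
\[
\hat{\phi}_{kk} =
\begin{cases}
r_1 & \text{for } k = 1 \\[6pt]
\dfrac{r_k - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_{k-j}}
      {1 - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_j} & \text{for } k > 1
\end{cases}
\tag{14}
\]
where
\[
\hat{\phi}_{kj} = \hat{\phi}_{k-1,j} - \hat{\phi}_{kk}\, \hat{\phi}_{k-1,k-j},
\qquad j = 1, 2, \dots, k-1 \tag{15}
\]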
The Partial Autocorrelations pane displays the sample partial autocorrelations together
with large lag standard errors and probability limits:
Partial Autocorrelations
Partial Lower 95.0% Upper 95.0%
Lag Autocorrelation Stnd. Error Prob. Limit Prob. Limit
1 0.768758 0.1 -0.195997 0.195997
2 -0.392904 0.1 -0.195997 0.195997
3 -0.1534 0.1 -0.195997 0.195997
4 0.199171 0.1 -0.195997 0.195997
5 0.136975 0.1 -0.195997 0.195997
6 0.0326462 0.1 -0.195997 0.195997
7 0.038875 0.1 -0.195997 0.195997
8 -0.217107 0.1 -0.195997 0.195997
9 -0.036418 0.1 -0.195997 0.195997
10 -0.193988 0.1 -0.195997 0.195997
11 -0.0270899 0.1 -0.195997 0.195997
12 -0.0823734 0.1 -0.195997 0.195997
13 0.419143 0.1 -0.195997 0.195997
14 0.00270772 0.1 -0.195997 0.195997
15 -0.0287618 0.1 -0.195997 0.195997
16 0.0996899 0.1 -0.195997 0.195997
17 -0.00248173 0.1 -0.195997 0.195997
18 0.101966 0.1 -0.195997 0.195997
19 0.0462495 0.1 -0.195997 0.195997
20 -0.00310214 0.1 -0.195997 0.195997
21 0.0590796 0.1 -0.195997 0.195997
22 -0.0531857 0.1 -0.195997 0.195997
23 -0.100044 0.1 -0.195997 0.195997
24 -0.0560427 0.1 -0.195997 0.195997
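The standard error of each partial autocorrelation is calculated from:
\[ se[\hat{\phi}_{kk}] = \sqrt{\frac{1}{n}} \tag{16} \]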
This standard error is then used to calculate $100(1-\alpha)\%$ probability limits around zero,
using a critical value of the standard normal distribution:
\[ 0 \pm z_{\alpha/2}\, se[\hat{\phi}_{kk}] \tag{17} \]
If $\alpha$ = 0.05, any sample partial autocorrelations that fall outside these limits are
statistically significantly different from 0 at the 5% significance level. The StatAdvisor
highlights any such partial autocorrelations in red.
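As a rough illustration, the partial autocorrelations can be computed from the sample autocorrelations with the standard Durbin-Levinson recursion. The sketch below assumes that recursion is what the procedure uses; the function name `sample_pacf` is made up here. Applied to the first three autocorrelations in the earlier table, it reproduces the first three partial autocorrelations listed above.

```python
import numpy as np


def sample_pacf(r):
    """Partial autocorrelations phi_kk from autocorrelations r_1 .. r_K."""
    r = np.asarray(r, dtype=float)
    pacf = np.zeros(r.size)
    prev = np.zeros(0)                 # holds phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, r.size + 1):
        if k == 1:
            phi_kk = r[0]
        else:
            # r[k-2::-1] is r_{k-1}, ..., r_1, i.e. r_{k-j} for j = 1 .. k-1
            num = r[k - 1] - np.dot(prev, r[k - 2::-1])
            den = 1.0 - np.dot(prev, r[:k - 1])
            phi_kk = num / den
        # Update the lower-order coefficients:
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        current = np.empty(k)
        current[:k - 1] = prev - phi_kk * prev[::-1]
        current[k - 1] = phi_kk
        prev = current
        pacf[k - 1] = phi_kk
    return pacf
```

With n = 100 observations the standard error is $\sqrt{1/100} = 0.1$ at every lag, so the 95% limits are about ±0.196, as in the table.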
[Figure: plot of the sample partial autocorrelations and probability limits; vertical axis: Estimate]
Bars extending beyond the upper or lower limit correspond to statistically significant
partial autocorrelations.
Periodogram
The autocorrelations and partial autocorrelations describe the behavior of the data in the
time domain, i.e., by estimating statistics based on the amount of time between
observations. It is also useful to examine the data in the frequency domain, by
considering how much variability exists at different frequencies. It has been shown that
any discrete time series can be represented as the sum of a set of sines and cosines at a set
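of frequencies called the Fourier frequencies. A typical component has the form
\[ a_i \cos(2\pi f_i t) + b_i \sin(2\pi f_i t) \tag{18} \]
where the i-th Fourier frequency is
\[ f_i = \frac{i}{n} \tag{19} \]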
The periodogram calculates the power in the data at each Fourier frequency by
calculating:
\[ I(f_i) = \frac{n}{2}\left(a_i^2 + b_i^2\right) \tag{20} \]
which is scaled so that the sum of the periodogram ordinates across all of the Fourier
frequencies except for i = 0 yields the sum of squared deviations of the time series about
its mean.
Periodogram Table
i    Frequency   Period     Ordinate   Cumulative Sum   Integrated Periodogram
0    0.0                    0.0        0.0              0.0
1    0.01        100.0      74.6069    74.6069          0.168192
2    0.02        50.0       7.39836    82.0052          0.18487
3    0.03        33.3333    5.53374    87.539           0.197345
4    0.04        25.0       9.42166    96.9606          0.218585
5    0.05        20.0       128.435    225.396          0.508127
6    0.06        16.6667    6.60897    232.005          0.523026
7    0.07        14.2857    1.63018    233.635          0.526701
8    0.08        12.5       1.85305    235.488          0.530878
9    0.09        11.1111    6.17935    241.668          0.544809
10   0.1         10.0       14.8544    256.522          0.578296
11   0.11        9.09091    8.60954    265.132          0.597705
…    …           …          …          …                …
Period: the period associated with the Fourier frequency, given by 1/ fi. This is the
number of observations in a complete cycle at that frequency.
Cumulative Sum: the sum of the periodogram ordinates at all frequencies up to and
including the i-th.
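The columns of the periodogram table can be reproduced approximately with a fast Fourier transform. This is a sketch under assumptions, not the STATGRAPHICS code: the function name `periodogram` is hypothetical, and for an even number of observations the ordinate at frequency 0.5 is halved so that the ordinates still sum to the sum of squared deviations (how STATGRAPHICS treats that frequency is not stated in the text).

```python
import numpy as np


def periodogram(y, remove_mean=True):
    """Return Fourier frequencies, ordinates, cumulative sums and integrated periodogram."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if remove_mean:
        y = y - y.mean()

    # rfft covers the Fourier frequencies i/n for i = 0 .. floor(n/2).
    spectrum = np.fft.rfft(y)
    # (n/2) * (a_i^2 + b_i^2) equals (2/n) * |spectrum_i|^2.
    ordinates = (2.0 / n) * np.abs(spectrum) ** 2
    if n % 2 == 0:
        ordinates[-1] /= 2.0           # assumption: halve the ordinate at frequency 0.5

    freqs = np.arange(ordinates.size) / n
    cumulative = np.cumsum(ordinates)
    integrated = cumulative / cumulative[-1]
    return freqs, ordinates, cumulative, integrated
```

The period column is `1 / freqs` for i > 0. With the mean removed, the last cumulative sum equals `np.sum((y - y.mean()) ** 2)` up to rounding.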
Pane Options
Remove mean: check to subtract the mean from the time series before calculating the
periodogram. If the mean is not removed, the ordinate at i = 0 is likely to be very
large.
Taper: percent of the data at each end of the time series to which a data taper will be
applied before the periodogram is calculated. Following Bloomfield (2000),
STATGRAPHICS uses a cosine taper that downweights observations close to i = 1
and i = n. This is useful for correcting bias if the periodogram ordinates are to be
smoothed in order to create an estimate of the underlying spectral density function.
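A cosine taper of this kind might look like the sketch below. The exact weights STATGRAPHICS uses are not given in the text; this version uses the split cosine bell in a commonly used form, which is an assumption, and the function name `cosine_taper` is made up for illustration.

```python
import numpy as np


def cosine_taper(n, percent):
    """Weights that taper `percent` percent of the data at each end (0 to 50)."""
    if not 0.0 <= percent <= 50.0:
        raise ValueError("percent must be between 0 and 50")
    m = int(np.floor(n * percent / 100.0))   # number of tapered points per end
    w = np.ones(n)
    if m > 0:
        # Weights rise smoothly from near 0 at the ends to near 1.
        ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(1, m + 1) - 0.5) / m))
        w[:m] = ramp
        w[n - m:] = ramp[::-1]
    return w
```

The tapered series would then be `w * (y - y.mean())`, which is passed to the periodogram calculation in place of the original data.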
Periodogram Plot
The Periodogram Plot displays the periodogram ordinates:
[Figure: Periodogram; vertical axis: Estimate, horizontal axis: frequency]
Pane Options
Remove mean: check to subtract the mean from the time series before calculating the
periodogram.
Taper: percent of the data at each end of the time series to which a data taper will be
applied before the periodogram is calculated.
Integrated Periodogram
The Integrated Periodogram displays the cumulative sums of the periodogram ordinates,
divided by the sum of the ordinates over all of the Fourier frequencies:
[Figure: Integrated Periodogram; vertical axis: Estimate, horizontal axis: frequency]
A diagonal line is included on the plot, together with 95% and 99% Kolmogorov-Smirnov
bounds. If the time series is purely random, the integrated periodogram should
fall within those bounds 95% and 99% of the time, respectively.
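One common way to construct such bounds is sketched below. It rests on assumptions not confirmed by the text: the diagonal is taken to run from 0 at frequency 0 to 1 at frequency 0.5, the half-width is the approximate Kolmogorov-Smirnov critical value (1.36 for 95%, 1.63 for 99%) divided by the square root of q, and q is (n - 2)/2 for even n or (n - 1)/2 for odd n. The function names are hypothetical.

```python
import math


def ks_halfwidth(n, level=0.95):
    """Approximate half-width of the band around the diagonal."""
    critical = {0.95: 1.36, 0.99: 1.63}[level]
    q = (n - 1) // 2                   # (n-2)/2 for even n, (n-1)/2 for odd n
    return critical / math.sqrt(q)


def outside_band(freqs, integrated, n, level=0.95):
    """Frequencies at which the integrated periodogram leaves the band."""
    h = ks_halfwidth(n, level)
    # Under pure randomness the expected value at frequency f is 2 * f.
    return [f for f, c in zip(freqs, integrated) if abs(c - 2.0 * f) > h]
```

Under these assumptions, the sample value 0.508127 at frequency 0.05 in the periodogram table lies well above the diagonal value 0.1 and outside the 95% band for n = 100, as would be expected for a series generated from an autoregressive model rather than pure noise.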
Save Results
The generated data may be saved to a datasheet by pressing the Save Results button on
the analysis toolbar.