1. INTRODUCTION
1.1. Definition
A time series is a sequence of observations arranged according to the time of their
outcome. Many sets of data appear as time series: a monthly sequence of the quantity of goods
shipped from a factory, a weekly series of the number of road accidents, daily stock prices and
weekly interest rates reported in the newspapers' business sections, and the hourly wind speeds,
daily maximum and minimum temperatures and annual rainfall recorded by meteorologists. An intrinsic feature
of a time series is that, typically, adjacent observations are dependent. The nature of this
dependence among observations of a time series is of considerable practical interest. Time Series
analysis is concerned with techniques for the analysis of this dependence. Time Series Analysis
is the analysis of data organized across units of time.
Time series data provide useful information about the physical, biological, social or economic
systems.
Economic and financial time series: Many time series are routinely recorded in economics and
finance. Examples include share prices on successive days, export totals in successive months,
average incomes in successive months, and company profits in successive years and so on.
The average monthly price of a certain crop in a town, measured in successive fiscal years from 1993 to
2002, is given in Figure 1.1. This series is of particular interest to economic
historians and is available in many places. The time plot shows some apparent cyclic and
trend behavior.
Figure 1.1: Plot of Price
Physical time series: Many types of time series occur in the physical sciences, particularly in
meteorology, marine science and geophysics. Examples are rainfall on successive days, and air
temperature measured in successive hours, days or months. Figure 1.2 shows the average weekly
maximum temperature of a country measured over five months in 10 successive years. The time
plot clearly shows an outlier at observation 141, which needs adjustment using outlier-adjustment
techniques.
Figure 1.2: The average maximum temperature in successive weeks over 5 months per 10 years.
Marketing time series: The analysis of time series arising in marketing is an important problem
in commerce. As an example, Figure 1.3 shows the sales of an engineering product by a certain
company in successive months over a 7-year period, as originally analyzed by Chatfield and
Prothero(1973). It is often important to forecast future sales so as to plan production. It may also
be of interest to examine the relationship between sales and other time series such as advertising
expenditure.
Figure 1.3: Sales of an industrial heater in successive months from Jan. 1965 to Nov. 1971.
Demographic time series: Various time series occur in the study of population change.
Examples include the population of the country measured annually, and monthly birth totals in
the country etc. Demographers want to predict changes in population for as long as ten or twenty
years in the future.
Process control data: In process control, the problem is to detect changes in the performance of
a manufacturing process by measuring a variable, which shows the quality of the process. These
measurements can be plotted against time as in Figure 1.4. When the measurements stray too far
from some target value, appropriate corrective action should be taken to control the process.
Special techniques have been developed for this type of time series problem, and the reader is
referred to a book on statistical quality control (e.g. Montgomery, 1996).
Terminology
Univariate time series are those where only one variable is measured over time, whereas
multiple (multivariate) time series are those where more than one variable is measured simultaneously.
Continuous time series: A time series is said to be continuous when observations are made
continuously in time (at every instant of time). The term „continuous‟ is used for series of this
type even when the measured variable can only take a discrete set of values.
Discrete time series: A time series is said to be discrete when observations are taken only at
specific times, usually equally spaced. The term „discrete‟ is used for series of this type even
when the measured variable is a continuous variable.
Discrete time series can arise in several ways. Given a continuous time series, we could read off
(or digitize) the values at equal intervals of time to give a discrete time series, sometimes called a
sampled series. The sampling interval between successive readings must be carefully chosen so
as to lose little information. A different type of discrete series arises when a variable does not
have an instantaneous value but we can aggregate (or accumulate) the values over equal
intervals of time. Examples of this type are monthly exports and daily rainfalls. Finally, some
time series are inherently discrete, an example being the dividend paid by a company to
shareholders in successive years.
Deterministic Time Series: Much statistical theory is concerned with random samples of
independent observations. The special feature of time-series analysis is the fact that successive
observations are usually not independent and that the analysis must take into account the time
order of the observations. When successive observations are dependent, future values may be
predicted from past observations. If a time series can be predicted exactly, it is said to be
deterministic. However, most time series are stochastic in that the future is only partly
determined by past values, so that exact predictions are impossible and must be replaced by the
idea that future values have a probability distribution, which is conditioned by a knowledge of
past values.
A time plot is a plot of the observations against time. It is the most important step in any time series
analysis. This graph should show up important features of the series such as trend, seasonality,
outliers and changes in structure. The plot is vital, both to describe the data and to help in
formulating a sensible model.
Stationary series: A series whose overall behavior remains the same over time. It fluctuates
around a constant mean. A time series „looks‟ stationary if the time plot of the series appears
„similar‟ at different points along the time axis.
There are several possible objectives in analyzing a time series. These objectives may be
classified as description, explanation, prediction and control.
a. Description
When presented with a time series, the first step in the analysis is usually to plot the data and to
obtain simple descriptive measures of the main properties of the series. If a time series contains
trend, seasonality or some other systematic component and correlations between successive
observations, the usual summary statistics can be seriously misleading and should not be
calculated. Moreover, even when a series does not contain any systematic components, the
summary statistics do not have their usual properties.
b. Explanation
When observations are taken on two or more variables, it may be possible to use the variation in
one time series to explain the variation in another series or ascertaining the leading, lagging and
feedback relationships among several series. A univariate model for a given variable is based
only on past values of that variable, while a multivariate model for a given variable may be
based, not only on past values of that variable, but also on present and past values of other
(predictor) variables. In the latter case, the variation in one series may help to explain the
variation in another series. For example, it is of interest to see how sea level is affected by
temperature and pressure, and to see how sales are affected by price and economic conditions.
c. Prediction
Given an observed time series, one may want to predict the future values of the series. This is an
important task in sales forecasting, and in the analysis of economic and industrial time series.
Many writers use the terms „prediction‟ and „forecasting‟ interchangeably, but some authors do
not. For example, Brown (1963) uses „prediction‟ to describe subjective methods and
„forecasting‟ to describe objective methods.
d. Control
Time series are sometimes collected or analyzed so as to improve control over some physical or
economic system. For example, when a time series is generated that measures the 'quality' of a
manufacturing process, the aim of the analysis may be to keep the process operating at a 'high'
level and to design an optimal control scheme. Prediction is closely related to control problems in
many situations. For example, if one can predict that a manufacturing process is going to move
off target, then appropriate corrective action can be taken.
Trend Component: A trend is evolutionary movement, either upward or downward, in the value
of the variable. This type of component is present when a series exhibits steady upward growth
or a downward decline, at least over several successive time periods, when allowance has been
made for the other components. This may be loosely defined as „long-term change in the mean
level‟. A difficulty with this definition is deciding what is meant by „long term‟.
For example, climatic variables sometimes exhibit cyclic variation over a very long time period
such as 50 years. If one just had 20 years of data, this long-term oscillation may look like a trend,
but if several hundred years of data were available, then the long-term cyclic variation would be
visible. Nevertheless in the short term it may still be more meaningful to think of such a long-
term oscillation as a trend. Thus in speaking of a „trend‟, we must take into account the number
of observations available and make a subjective assessment of what is meant by the phrase „long
term‟. See figure1.1 above.
Cyclic Component: Apart from seasonal effects, some time series exhibit variation at a fixed
period due to some other physical cause. This includes regular cyclic variation at periods other
than one year. In addition some time series exhibit oscillations, which do not have a fixed period
but which are predictable to some extent. For example, economic data are sometimes thought to
be affected by business cycles with a period varying from about 3 or 4 years to more than 10
years, depending on the variable measured. However, the existence of such business cycles is the
subject of some controversy, and there is increasing evidence that any such cycles are not
symmetric.
Irregular Component: The phrase „irregular fluctuations‟ is often used to describe any variation
that is „left over‟ when other components of the series (trend, seasonal and cyclical) have been
accounted for. As such, they may or may not be random.
A time series model describes the process that generates the time series data using mathematical
and/or statistical expressions. In a simple model, the original data at any time point (denoted by
Yt) may be expressed as a function f of the components: the seasonality, the trend, cyclical, and
the irregularity
That is, Yt = f(Tt, St, Ct, It). There are usually two forms for the function f:
An additive model:
Yt = Tt+ St+ Ct +It
And a multiplicative model:
Yt = Tt*St *Ct*It, where
Yt = observation for period t,
Tt = trend component for period t,
St = seasonal component for period t,
Ct = cyclical component for period t
It = irregular component for period t
The classical approaches to the decomposition of a time series into patterns are the additive and the
multiplicative model (in fact, there are many different decomposition algorithms). The additive
model is appropriate if the magnitude (amplitude) of the seasonal variation does not vary with
the level of the series, while the multiplicative version is more appropriate if the amplitude of the
seasonal fluctuations increases or decreases with the average level of the time series.
Analysts generally like to think they have „good‟ data, meaning that the data have been carefully
collected with no outliers or missing values. In reality, this does not always happen, so that an
important part of the initial examination of the data is to assess the quality of the data and
consider modifying them, if necessary.
The process of checking through data is often called cleaning the data, or data editing. It is an
essential precursor to attempts at modeling data. Data cleaning could include modifying
outliers, identifying and correcting obvious errors and filling in (or imputing) any missing
observations. This can sometimes be done using fairly crude devices, such as down weighting
outliers to the next most extreme value or replacing missing values with an appropriate mean
value. Data cleaning often arises naturally during a simple preliminary descriptive analysis. In
particular, in time-series analysis, the construction of a time plot for each variable is the most
important tool for revealing any oddities such as outliers and discontinuities.
Missing Value
When a series does not have too many missing observations, it may be possible to perform some
missing data analysis, estimation and replacement.
When observations are missing at random, it may be desirable to estimate, or impute, the missing
values so as to have a complete time series. A crude missing data replacement method is to plug
in the mean for the overall series. Another algorithm is to take the mean of the adjacent
observations. Missing value replacement in exponential smoothing often applies one-step-ahead
forecasting from the previous observations.
Caution!! Nonetheless, if there are too many observations missing, the series may simply be
unusable.
Outliers
Outliers, or aberrant observations, are often clearly visible in the time plot of the data. If they are
obviously errors, then they need to be adjusted or removed. Instead of adjusting or removing
outliers, an alternative approach is to use robust methods, which automatically down weight
extreme observations. Running median smoothers (also called Odd-span moving medians)
are effective data smoothers when time series data may be contaminated with unusual values.
The moving median of span-3 is a very popular and effective data smoother, where mt[3] =
median(Yt-1, Yt, Yt+1). For example, consider the sequence of observations: 15, 18, 13, 12, 16,
14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 19 and 25. Here 200 is an unusual value that
deserves special attention and should probably not be analyzed along with the rest of the dataset. Applying
the moving median of span 3, 200 is replaced by 19, and the smoothed data become:
15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 19, 19, 14, 21, 24, 19, 19 and 25.
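As a rough illustration (not part of the original text), the following Python sketch computes the span-3 running medians for this series; the function and variable names are made up for the example.

def moving_median3(series):
    # span-3 running median: m_t[3] = median(Y[t-1], Y[t], Y[t+1]);
    # the two endpoints have no complete window and are left unchanged here
    out = series[:]
    for t in range(1, len(series) - 1):
        out[t] = sorted(series[t - 1:t + 2])[1]   # middle value of the three
    return out

y = [15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 19, 25]
m = moving_median3(y)
print(m[11])   # 19 -- the running median at the position of the outlier 200

Replacing the flagged value 200 by its running median 19 gives the smoothed series quoted above.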
2. TEST OF RANDOMNESS
2.1. Introduction
A time series, in which the observations fluctuate around a constant mean, have a constant
variance and are statistically independent, is called a random time series. In other words, the time
series does not exhibit any pattern:
One can examine whether a time series is random or not visually, by checking whether the time
series plot shows any trend, or by looking at the correlogram of the series, or
statistically, by testing whether the observed series could have been generated by a random
stochastic process, for example with tests based on turning points, the difference sign test, the
phase length test and the rank test.
Turning Point Test
This test is based on counting the number of turning points, meaning the number of times
observation Yt such that Yt> Yt-1 and also Yt>Yt+1. A converse definition applies to local
minimum. If the series really is random, one can work out the expected number of turning points
and compare it with the observed value. The following is the procedure that test randomness of a
series by turning points.
Count the number of peaks or troughs in the time series plot. A peak is a value greater than its
two neighbors. Similarly, a trough is a value less than its two neighbors. The two (peak and
trough) together are known as Turning Points.
Consider the time series [Y1, Y2, Y3, ..., YN], the initial value Y1 cannot define a turning point
since we do not know Y0. Similarly, the final value YN cannot define a turning point since we do
not know YN+1. Three consecutive values (Yi, Yi+1, Yi+2) are required to define a turning point. In
a random series, these three values can occur in any of six possible orders and in any four of
them, there would be a turning point. The probability of finding a turning point in any set of
three values is .
Therefore, the number of turning points p in the series is given by p= ∑ and then the
If the observed number of Turning Points is more than the expected value, , then they
could not have arisen by chance alone. In other words, the series is not random. In order to test
whether the difference between the observed and expected number of turning points is
statistically significant, we have to calculate the variance of p. From combinatorial algebra as,
var(p) = . We can test an observed value against the expected value by the series is
random (null hypothesis) against the series is not random (alternative hypothesis) from the test
statistic, P, based on the decision rule of reject the null hypothesis if pcal is not in the interval of
infinity, consequently, Z = and then the decision rule becomes reject the null
√
Example: Consider the following two series and test the randomness of each by the turning point test.
Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 1 102 112 113 100 90 88 85 86 91 92 99 105
Data 2 102 112 88 95 75 103 98 106 98 82 87 92
In order to apply the turning point test of randomness, first plot each series and count the turning points.
For Data 1, p = 2, E(p) = 2(12 − 2)/3 = 6.67, var(p) = (16(12) − 29)/90 = 1.81, and the interval is
6.67 ± 2√1.81 = (3.98, 9.36); since 2 is not in the interval, we conclude that the series is not random.
For Data 2, p = 8, and the interval is again (3.98, 9.36); since 8 is in the interval, we conclude that
the series is random.
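A small Python sketch of the turning point calculation (illustrative names only; it reproduces the interval E(p) ± 2√var(p) used above):

from math import sqrt

def turning_point_test(y):
    n = len(y)
    # count local maxima (peaks) and local minima (troughs)
    p = sum(1 for t in range(1, n - 1)
            if (y[t] > y[t - 1] and y[t] > y[t + 1])
            or (y[t] < y[t - 1] and y[t] < y[t + 1]))
    ep = 2 * (n - 2) / 3             # expected number of turning points
    vp = (16 * n - 29) / 90          # variance of the number of turning points
    return p, (ep - 2 * sqrt(vp), ep + 2 * sqrt(vp))

data1 = [102, 112, 113, 100, 90, 88, 85, 86, 91, 92, 99, 105]
data2 = [102, 112, 88, 95, 75, 103, 98, 106, 98, 82, 87, 92]
print(turning_point_test(data1))   # p = 2, interval about (3.98, 9.36): not random
print(turning_point_test(data2))   # p = 8, inside the interval: random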
Difference Sign Test
This test consists of counting the number of positive first differences of the series, that is to say,
the number of points where the series increases (we shall ignore points where there is neither an
increase nor a decrease). With a series of N terms we have N − 1 differences. Define Di = 1 if
Yt − Yt−1 > 0 and Di = 0 otherwise, and let W = ΣDi be the number of positive differences. For a
random series, W is approximately normally distributed with mean E(W) = (N − 1)/2 and variance
var(W) = (N + 1)/12. The distribution of Z = (W − E(W))/√var(W) tends to the standard normal as N
tends to infinity, and hence the hypothesis to be tested is: the series is random (H0) against the
series is not random (H1). We decide to reject H0 if |Zcal| > Zα/2 at the α significance level.
Example 1: Consider the following series and test the randomness of it by applying difference
sign test. [Use 5% SL]
Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 3 35 46 51 46 48 51 46 42 41 43 61 55
Solution: In order to test the randomness of the series, first find the difference of the series and
obtain the number of increasing points.
Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 3 35 46 51 46 48 51 46 42 41 43 61 55
difference - 11 5 -5 2 3 -5 -4 -1 2 18 -6
Di - 1 1 0 1 1 0 0 0 1 1 0
Then W = ΣDi = 6, E(W) = (12 − 1)/2 = 5.5, var(W) = (12 + 1)/12 = 1.083,
Zcal = (6 − 5.5)/(1.083)^1/2 = 0.48 and Zα/2 = 1.96. Hence, Zcal is less than Zcrit, so we retain H0
at the 5% significance level and conclude that the series is statistically random.
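The difference sign calculation can be checked with a short Python sketch (illustrative names):

from math import sqrt

def difference_sign_test(y):
    n = len(y)
    w = sum(1 for t in range(1, n) if y[t] - y[t - 1] > 0)   # positive first differences
    ew, vw = (n - 1) / 2, (n + 1) / 12
    return w, ew, vw, (w - ew) / sqrt(vw)

data3 = [35, 46, 51, 46, 48, 51, 46, 42, 41, 43, 61, 55]
print(difference_sign_test(data3))   # w = 6, E(W) = 5.5, var(W) = 1.083, Z about 0.48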
Phase Length Test
A phase is an interval between two turning points (peak and trough, or trough and peak). To
define a phase of length d from the series we require d + 3 points; i.e., a phase of length 1
requires 4 points, a phase of length 2 requires 5 points, and so on.
Consider the d + 3 values arranged in increasing order of magnitude. The probability of a
phase of length d, either rising or falling, is 2(d² + 3d + 1)/(d + 3)!. Now in a series of length n
there are n − d − 2 possible phases of length d, and the expected number of phases of length d is
E(d) = 2(n − d − 2)(d² + 3d + 1)/(d + 3)!.
The phase length test compares the observed number of phases with the expected number through a
chi-square statistic with a slight modification on the decision rule: the statistic
χ² = (6/7) Σ (Od − Ed)²/Ed is compared with the chi-square critical value with 2 degrees of freedom.
Step 1: classify the observed and expected counts of phase length into three categories, d = 1,
d = 2 and d ≥ 3.
Example: consider data 2 above and test the randomness of it by using phase length test [Use 5%
sig.level].
Solution:
d       Observed count      E(d)
1             6             3.75
2             1             1.467
≥3            0             0.3694
χ² = (6/7)[(6 − 3.75)²/3.75 + (1 − 1.467)²/1.467 + (0 − 0.3694)²/0.3694] = (6/7)(1.868) = 1.601.
Decision: since χ²cal < χ²0.05(2) (i.e., 1.601 < 5.991), we retain H0 at the 5% significance level and
conclude that the series is random.
Rank Test
This test is useful for detecting a trend pattern in a series. From the series, count the number of
cases in which each observation is greater than the previous observations, i.e., Yt > Yt−1, Yt−2, …,
Y1. Let each count be Mt and let M = ΣMt; then calculate Kendall's correlation coefficient,
r = 4M/[N(N − 1)] − 1, with −1 ≤ r ≤ 1.
Hypothesis: the series is random (H0) versus the series is not random (H1). Use r as the test
statistic: under H0, r has mean zero and variance var(r) = 2(2N + 5)/[9N(N − 1)], and H0 is
rejected when |r| exceeds √var(r).
Example: consider the following series and test the randomness of the series by using rank test.
Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 4 10 9 11 10 12 13 12 13 14 12 15 12
Solution:
Time, t 1 2 3 4 5 6 7 8 9 10 11 12 Total
Mt 0 0 2 1 4 5 4 6 8 4 10 4 48
Therefore, r = 4M/[N(N − 1)] − 1 = 4(48)/(12 × 11) − 1 = 192/132 − 1 = 0.4545 ≈ 0.45 and
√var(r) = √(2(2(12) + 5)/[9(12)(11)]) = √0.0488 = 0.22.
Therefore, since |r| > √var(r) (i.e., 0.45 > 0.22) we reject H0 and conclude that the series is
not random.
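The rank test computation for Data 4 can be reproduced with the following Python sketch (illustrative names):

from math import sqrt

def rank_test(y):
    n = len(y)
    # Mt = number of earlier observations smaller than Yt; M = sum of the Mt
    m = sum(sum(1 for j in range(t) if y[t] > y[j]) for t in range(n))
    r = 4 * m / (n * (n - 1)) - 1                   # Kendall's coefficient
    var_r = 2 * (2 * n + 5) / (9 * n * (n - 1))     # variance of r under randomness
    return m, r, sqrt(var_r)

data4 = [10, 9, 11, 10, 12, 13, 12, 13, 14, 12, 15, 12]
print(rank_test(data4))   # M = 48, r about 0.45, sqrt(var(r)) about 0.22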
Exercises
1. In a certain time series there are 56 values with 35 turning points. Then, test the
randomness of the series using turning points method. (Use ).
2. Consider Q1 and let the series has 34 phases with a phase-length of 1, 2 and 3 are 23, 7
and 4 respectively. So, what is your decision about the randomness of the series if you
apply phase-length test? (Use )
3. Test the randomness of the series by using difference sign test to distinguish the
randomness of the series having 73 observations and w= 35. (Use )
4. Test the randomness of the series by using rank test to distinguish the randomness of the
series having 73 observations and the sum of positive differences are 39.
A common estimate of the constant mean μ is the sample mean or average, defined as μ̂ = Ȳ = (1/N) Σ Yt.
In order to estimate the trend pattern by the free-hand method, a smooth curve describing the long-term
movement is drawn through the time plot by eye. The semi-average method, by contrast, estimates the
trend pattern by dividing the series into equal parts and computing the average of each part.
Example: Estimate the trend pattern by the semi-average method, dividing the series into 4 equal parts.
T 1 2 3 4 5 6 7 8 9 10 11 12
Yt 9 8 9 12 9 12 11 7 13 9 11 12
Semi-averages: T̂t = 8.67 (t = 1–3), 11 (t = 4–6), 10.33 (t = 7–9), 10.67 (t = 10–12)
[Time plot of Yt, the semi-average trend estimate T̂t and the error, for t = 1, ..., 12.]
Suppose that all observations from the origin of time through the current period, say Y1, Y2,
Y3, …, YN, are available. The least squares criterion is to choose μ so as to minimize the sum of
squared errors, SSE = Σ(Yt − μ)², which gives the sample mean μ̂ = Ȳ as the estimate.
Example: estimate the trend for the following time series using the least squares method.
T 1 2 3 4 5 6 7 8 9 10 11 12
Data1 10 9 11 10 12 13 12 13 14 12 15 12
Data2 14 15 10 14 17 12 15 11 12 18
Estimated values for the trend component are 11.92 and 13.8 for Data1and Data2 respectively.
[Time plots of Data1 and Data2 together with the constant-mean trend estimates T̂t.]
In the ordinary least squares method, the arithmetic mean gives all past observations of the
series equal weight, namely 1/N. Since the value of the unknown parameter can change
slowly with time, it is reasonable to give more weight to recent observations, because observations
nearby in time are likely to be close in value. Here there are two moving average methods to
estimate the trend component.
Method 1: Simple Moving Average (Odd Order, k) - it is denoted by kMA and is calculated for all t
except those at the very beginning and end of the series. At each period, the oldest observation is
discarded and the newest is included in the set.
Method 2: Centered Moving Average (Even Order, k) - when we compute the average of the first
k periods we could place the average in the middle of the interval of k periods. This works
well for odd k, but not so well for even k: where would we place the first
moving average when k is even? Technically, the moving average would fall at t = 2.5, 3.5, ….
To avoid this problem we smooth the moving averages by taking a moving average of the moving
averages, called the Centered Moving Average (CMA) and denoted by 2*kMA. This method weighs the
observations being averaged by assigning weight 1/(2k) to the first and the last observations and 1/k to
the middle observations; that is, greater weight is given to the middle observations and less weight to the
two extreme ones, where k is an even number.
Example: Consider the following series and estimate the trend pattern by taking 3, 5 and 4
period moving average methods (that is k =3, 5 and 4).
t 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
Solution: for instance, the first 13 in the 3rd column (3MA) is 1/3 (14+15+10), 13.625 in the 5th
column (2*4MA) is 1/8 (14+2*15+2*10+2*14+17).
t            1     2      3      4      5      6      7      8      9     10
Yt          14    15     10     14     17     12     15     11     12     18
T̂t: 3MA      -    13     13   13.67  14.33  14.67  12.67  12.67  13.67     -
T̂t: 5MA      -     -     14    13.6   13.6   13.8   13.4   13.6      -     -
T̂t: 2*4MA    -     -   13.63  13.63  13.88  14.13  13.13  13.25      -     -
[Time plot of Yt with the 3MA, 5MA and 2*4MA trend estimates.]
Therefore, the estimated trend T̂t is either the 3MA, the 5MA or the 2*4MA, and the best one is the
estimate that has the minimum mean square error (MSE) among the three.
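A Python sketch of the two moving average estimators used in the example (function and variable names are illustrative):

def simple_ma(y, k):
    # k-period simple moving average (k odd), centred on t; None where undefined
    h = k // 2
    return [None] * h + [sum(y[t - h:t + h + 1]) / k
                         for t in range(h, len(y) - h)] + [None] * h

def centered_ma(y, k):
    # 2*kMA for even k: a 2-period average of two adjacent k-period averages
    first = [sum(y[t:t + k]) / k for t in range(len(y) - k + 1)]
    cma = [(first[i] + first[i + 1]) / 2 for i in range(len(first) - 1)]
    h = k // 2
    return [None] * h + cma + [None] * (len(y) - h - len(cma))

yt = [14, 15, 10, 14, 17, 12, 15, 11, 12, 18]
print(simple_ma(yt, 3))     # 13, 13, 13.67, 14.33, ... as in the 3MA row above
print(centered_ma(yt, 4))   # 13.63, 13.63, 13.88, 14.13, 13.13, 13.25 (2*4MA row)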
This is a very popular scheme for producing a smoothed time series. Whereas in single moving
averages the past observations are weighted equally, exponential smoothing assigns
exponentially decreasing weights as the observations get older. In other words, recent
observations are given relatively more weight than the older observations. In the
case of moving averages, the weights assigned to the observations are the same and are equal to
(1/k) except the beginning and end values (1/2k) in the centered moving average method. The
effect of recent observations is expected to decline exponentially over time. The further back
along the historical time path one travels, the less influence each observation has on trend
estimation. To represent this geometric decline in influence, an exponential weighting scheme is
applied in a procedure referred to as simple (single) exponential smoothing (Gardner, 1987). In
exponential smoothing there are one or more smoothing parameters to be determined (or
estimated), and these choices determine the weights assigned to the observations.
In simple exponential smoothing, the new estimate is the old estimate corrected by a fraction α of the
random error (Yt − T̂t−1) generated in the present time period. That is,
T̂t = T̂t−1 + α(Yt − T̂t−1), or equivalently T̂t = αYt + (1 − α)T̂t−1,
where α (0 < α < 1) is the smoothing constant and T̂t is the new smoothed value, used as the
estimate for the next time period.
Why is it called "exponential"? Let us expand the simple exponential smoothing equation by
first substituting for T̂t−1 in the simple exponential equation to obtain:
T̂t = αYt + (1 − α)[αYt−1 + (1 − α)T̂t−2] = αYt + α(1 − α)Yt−1 + (1 − α)²T̂t−2.
By substituting for T̂t−2, and so forth, until we reach T̂0, we will ultimately get:
T̂t = α Σj=0..t−1 (1 − α)^j Yt−j + (1 − α)^t T̂0.
For example, the expanded equation for t = 4 is:
T̂4 = αY4 + α(1 − α)Y3 + α(1 − α)²Y2 + α(1 − α)³Y1 + (1 − α)⁴T̂0,
where T̂0 is the starting value of the exponential smoothing and plays an important role in computing all
the subsequent exponentially weighted smoothed averages. Setting T̂0 = Y1 is one method of
initialization; other possibilities are:
A simple average of the most recent k observations if historical data are available,
Some subjective prediction must be made if there are no reliable past-data available.
Example: Estimate trend component by simple exponential smoothing method for the series
given below. [use = 0.1 and ̂ 0 =Y1]
T 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
Solution: from T̂t = αYt + (1 − α)T̂t−1 we have
T̂1 = 0.1·Y1 + (1 − 0.1)·T̂0 = 0.1(14) + 0.9(14) = 14, T̂2 = 0.1(15) + 0.9(14) = 14.1, and so on.
t 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
̂t 14 14.1 13.7 13.7 14 13.8 14 13.7 13.5 14
[Time plot of Yt, the smoothed estimates T̂t and the errors.]
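The smoothed values in the table can be reproduced with this short Python sketch (illustrative names):

def simple_exp_smoothing(y, alpha, t0):
    # T_t = alpha*Y_t + (1 - alpha)*T_{t-1}, starting from the initial value t0
    t_hat, level = [], t0
    for value in y:
        level = alpha * value + (1 - alpha) * level
        t_hat.append(round(level, 2))
    return t_hat

yt = [14, 15, 10, 14, 17, 12, 15, 11, 12, 18]
print(simple_exp_smoothing(yt, alpha=0.1, t0=14))
# [14.0, 14.1, 13.69, 13.72, 14.05, 13.84, 13.96, 13.66, 13.5, 13.95]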
For t sufficiently large that (1 − α)^t T̂0 is close to zero, the weights α(1 − α)^j decrease
geometrically and their sum is unity, so the exponential smoothing process gives an unbiased
estimate of μ; that is, E(T̂t) = μ:
E(T̂t) = E[α Σj=0..t−1 (1 − α)^j Yt−j + (1 − α)^t T̂0]
      ≈ α Σj=0..∞ (1 − α)^j E(Yt−j)
      = αμ Σj=0..∞ (1 − α)^j = αμ (1/α) = μ.
What is the "best" value for α? The problem faced here is how to find an appropriate
value for α. In general the value of α should be between 0.1 and 0.9 (see Chatfield, 2002). A
smaller smoothing constant gives relatively more weight to the observations in the more distant
past, while a larger smoothing constant, within these bounds, gives more weight to the most recent
observation and less weight to the most distant observations.
In practice, its value is found by trial and error. This means trying different values of α in the
given interval and then computing the mean square error, MSE = (1/n) Σ (Yt − T̂t)², for those
different values. The α value that yields the smallest MSE is an appropriate value for α.
Example: Consider the following data set consisting of 12 observations taken over time and
estimate the trend component at time t by simple exponential smoothing, assuming T̂0 = 71 and
α = 0.1 and 0.5. Which value of α is more appropriate? Why?
t 1 2 3 4 5 6 7 8 9 10 11 12
Yt 71 70 69 68 64 65 72 78 75 75 75 70
Solution:
t                 1     2      3      4      5      6      7      8      9     10     11     12    MSE
Yt               71    70     69     68     64     65     72     78     75     75     75     70
T̂t (α = 0.1)     71  70.9  70.71  70.44   69.8  69.32  69.58  70.43  70.88  71.29  71.67   71.5
T̂t (α = 0.5)     71  70.5  69.75  68.88  66.44  65.72  68.86  73.43  74.21  74.61   74.8   72.4
Error (α = 0.1)   0  -0.9  -1.71  -2.44   -5.8  -4.32   2.42   7.57   4.12   3.71   3.33   -1.5   14.1
Error (α = 0.5)   0  -0.5  -0.75  -0.88  -2.44  -0.72   3.14   4.57   0.79   0.39    0.2   -2.4   3.78
Estimation of the trend using α = 0.5 is better than using α = 0.1, because the mean square error for
α = 0.5 is smaller than that for α = 0.1.
Figure 3.6: Trend estimation by Simple Exponential Smoothing Method and comparing MSE.
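A Python sketch of the trial-and-error search for α described above (the grid of candidate α values is an assumption made for the illustration):

def ses(y, alpha, t0):
    level, fitted = t0, []
    for value in y:
        level = alpha * value + (1 - alpha) * level
        fitted.append(level)
    return fitted

def mse(y, fitted):
    return sum((a - b) ** 2 for a, b in zip(y, fitted)) / len(y)

yt = [71, 70, 69, 68, 64, 65, 72, 78, 75, 75, 75, 70]
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):            # candidate smoothing constants
    print(alpha, round(mse(yt, ses(yt, alpha, t0=71)), 2))
# alpha = 0.5 gives a much smaller MSE than alpha = 0.1 (3.78 versus 14.1)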
3.2. Linear Trend Estimation
A time series that exhibits a trend is a non-stationary time series. Modeling and forecasting of
such a time series is greatly simplified if we can eliminate the trend. One way to do this is to fit a
regression model describing the trend component to the data and then subtracting it out of the
original observations, leaving a set of residuals that are free of trend. The trend models that are
usually considered are the linear trend, in which the mean of Yt is expected to change linearly
with time as in E(Yt) = β0 + β1t.
Assume that there are T periods of data, say Y1, Y2, Y3, …, YT, and let the estimates of β0
and β1 be β̂0 and β̂1 respectively. Then T̂t = β̂0 + β̂1t is the fitted model, and the
difference between the data and the fitted model is the residual et = Yt − T̂t. To estimate β̂0
and β̂1 by the method of least squares, we choose β̂0 and β̂1 so that the error sum of squares
is as small as possible. That is,
SSE = Σ et² = Σ (Yt − T̂t)² = Σ (Yt − β̂0 − β̂1t)² is minimum.
Setting the partial derivative with respect to β̂0 equal to zero:
∂SSE/∂β̂0 = −2 Σ (Yt − β̂0 − β̂1t) = 0
⇒ T β̂0 + β̂1 Σ t = Σ Yt
⇒ β̂0 = (1/T) Σ Yt − β̂1 (Σ t)/T, and since Σ t = T(T + 1)/2,
⇒ β̂0 = (1/T) Σ Yt − β̂1 (T + 1)/2 -------------------------------------------------------(3)
Setting the partial derivative with respect to β̂1 equal to zero:
∂SSE/∂β̂1 = −2 Σ t(Yt − β̂0 − β̂1t) = 0
⇒ Σ tYt − β̂0 Σ t − β̂1 Σ t² = 0
⇒ β̂0 Σ t + β̂1 Σ t² = Σ tYt, and since Σ t² = T(T + 1)(2T + 1)/6,
⇒ β̂0 T(T + 1)/2 + β̂1 T(T + 1)(2T + 1)/6 = Σ tYt -------------------------------------(4)
Substituting (3) into (4) and solving for β̂1 gives
⇒ β̂1 = 12 Σ tYt/[T(T² − 1)] − 6 Σ Yt/[T(T − 1)] ---------------------------------------(5)
and substituting (5) back into (3) gives
⇒ β̂0 = 2(2T + 1) Σ Yt/[T(T − 1)] − 6 Σ tYt/[T(T − 1)] --------------------------------(6)
The magnitude of β̂1 indicates the trend (or average rate of change) and its sign indicates the
direction of the trend.
Example: Assume linearity and estimate the trend pattern from the following series by least
square method.
Month Jan Feb Mar Apr May Jun Jul Aug Sep
Price 3 6 2 10 7 9 14 12 18
Solution:
Month Jan Feb Mar Apr May Jun Jul Aug Sep Total
Price 3 6 2 10 7 9 14 12 18 81
tYt 3 12 6 40 35 54 98 96 162 506
t²        1     4     9    16    25    36    49    64    81    285
β̂1 = 12 Σ tYt/[T(T² − 1)] − 6 Σ Yt/[T(T − 1)] = 12(506)/720 − 6(81)/72 = 1.68 and
β̂0 = 2(2T + 1) Σ Yt/[T(T − 1)] − 6 Σ tYt/[T(T − 1)] = [2(19)(81) − 6(506)]/72 = 0.58, so T̂t = 0.58 + 1.68t.
Month Jan Feb Mar Apr May Jun Jul Aug Sep
Price 3 6 2 10 7 9 14 12 18
̂t 2.26 3.94 5.62 7.30 8.98 10.66 12.34 14.02 15.70
error 0.74 2.06 -3.62 2.70 -1.98 -1.66 1.66 -2.02 2.30
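A Python sketch of the least squares trend calculation for the price series, using the closed forms (5) and (6); the variable names are illustrative:

prices = [3, 6, 2, 10, 7, 9, 14, 12, 18]
T = len(prices)
t = list(range(1, T + 1))
sum_y = sum(prices)
sum_ty = sum(ti * yi for ti, yi in zip(t, prices))

b1 = 12 * sum_ty / (T * (T**2 - 1)) - 6 * sum_y / (T * (T - 1))             # equation (5)
b0 = 2 * (2 * T + 1) * sum_y / (T * (T - 1)) - 6 * sum_ty / (T * (T - 1))   # equation (6)

b0r, b1r = round(b0, 2), round(b1, 2)              # 0.58 and 1.68, as in the text
print(b0r, b1r)
print([round(b0r + b1r * ti, 2) for ti in t])      # fitted values 2.26, 3.94, ..., 15.7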
Unfortunately, neither the mean of all the data nor the moving average of the most recent
values is able to cope with a significant trend. There is a variation on the moving average
procedure that often does a better job of handling trend, called the double moving average for
a linear trend process. It calculates a second moving average from the original moving average,
using the same value of k. As soon as both the single and the double moving averages are
available, the trend in a time series that has a linear relationship with t can be estimated as
follows: let Mt denote the single k-period moving average and consider a double moving average
computed from the single moving averages, say M′t = (Mt + Mt−1 + … + Mt−k+1)/k; the trend
estimate at time t is then, analogously to Brown's method below, T̂t = 2Mt − M′t.
Example: consider the price series above and estimate the trend by linear moving average
procedure with k =3.
Figure 3.7: Trend estimation by Double moving average Method from linear model assumption.
As we previously observed, single smoothing does not excel in following the data when there is
a trend. This situation can be improved by introducing a second smoothing equation applied to the
output of the first. Let St denote the single (first-order) smoothed statistic,
St = αYt + (1 − α)St−1 = α Σj=0..t−1 (1 − α)^j Yt−j + (1 − α)^t S0.
Applying the same procedure a second time gives the double (second-order) smoothed statistic
S′t = αSt + (1 − α)S′t−1 = α Σj=0..t−1 (1 − α)^j St−j + (1 − α)^t S′0.
Then the trend estimate at the end of period t is given by T̂t = 2St − S′t. This procedure may
be referred to as Brown's one-parameter linear exponential smoothing.
The initial values S0 and S′0 are obtained from estimates of the two coefficients β̂0 and β̂1,
which may be developed through a simple linear regression analysis of historical data. Given
initial estimates β̂0 and β̂1 from the ordinary least squares fit T̂t = β̂0 + β̂1t of the series,
then S0 = β̂0 − [(1 − α)/α] β̂1 and S′0 = β̂0 − 2[(1 − α)/α] β̂1.
Example: apply Brown's method with α = 0.2 to estimate the linear trend for the price data
given above.
Solution: From the ordinary least squares method we obtain T̂t = 0.58 + 1.68t, so that
S0 = 0.58 − 4(1.68) = −6.14 and S′0 = 0.58 − 8(1.68) = −12.86. Then
S1 = 0.2(3) + 0.8(−6.14) = −4.31, S′1 = 0.2(−4.31) + 0.8(−12.86) = −11.15,
S2 = 0.2(6) + 0.8(−4.31) = −2.25, and so on.
Finally, T̂t = 2St − S′t.
Month    Price      St        S′t       T̂t
Jan 3 -4.31 -11.15 2.53
Feb 6 -2.25 -9.37 4.87
Mar 2 -1.40 -7.78 4.98
Apr 10 0.88 -6.04 7.81
May 7 2.10 -4.42 8.62
Jun 9 3.48 -2.84 9.80
Jul 14 5.59 -1.15 12.32
Aug 12 6.87 0.45 13.29
Sep 18 9.10 2.18 16.01
Figure 3.8: Trend estimation by Double ES Method from linear model assumption.
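A Python sketch of Brown's one-parameter linear exponential smoothing as applied above (illustrative names):

prices = [3, 6, 2, 10, 7, 9, 14, 12, 18]
alpha, b0, b1 = 0.2, 0.58, 1.68           # OLS estimates used for initialization

s = b0 - (1 - alpha) / alpha * b1         # S_0  = -6.14
s2 = b0 - 2 * (1 - alpha) / alpha * b1    # S'_0 = -12.86

trend = []
for y in prices:
    s = alpha * y + (1 - alpha) * s       # single smoothed statistic S_t
    s2 = alpha * s + (1 - alpha) * s2     # double smoothed statistic S'_t
    trend.append(round(2 * s - s2, 2))    # trend estimate T_t = 2*S_t - S'_t

print(trend)   # [2.53, 4.87, 4.98, 7.81, 8.62, 9.8, 12.32, 13.29, 16.01]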
Properties of Exponential Smoothing
E(St) = E[α Σj=0..t−1 (1 − α)^j Yt−j + (1 − α)^t S0] ≈ α Σj=0..∞ (1 − α)^j E(Yt−j).
For the linear model E(Yt−j) = β0 + β1(t − j), so
E(St) = α Σj (1 − α)^j [β0 + β1t − β1 j]
      = (β0 + β1t) α Σj (1 − α)^j − β1 α Σj j(1 − α)^j
      = β0 + β1t − β1 (1 − α)/α,
since α Σj≥0 (1 − α)^j = 1 and α Σj≥0 j(1 − α)^j = (1 − α)/α. This shows that for a linear model, the
first-order (simple) exponentially smoothed statistic will tend to lag behind the true value by an amount
equal to β1(1 − α)/α.
Applying the same procedure to the double exponential smoothing and using similar arguments, one
can get E(S′t) = E(St) − β1(1 − α)/α = β0 + β1t − 2β1(1 − α)/α.
Finally, the double exponentially smoothed statistic will tend to lag behind the true value by an
amount equal to 2β1(1 − α)/α under a linear model assumption.
Exercises
1. Consider the following time series data in which monthly sales of shampoo product in a
certain super market and estimate the Trend pattern based on constant mean model
assumption using 3, 5 and 4 period moving average. Which moving average is best to
estimate Trend?
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
price 266 145.9 183.1 119.3 180.3 168.5 231.8 224.5 192.8 122.9 136.5 185.9
2. A certain company‟s credit outstanding has been increasing at a relatively constant rate
(in millions) over time as we have seen from the following series. Therefore, estimate the
Trend component based on linear model assumption using: (I) Least square (II) 3 period
linear moving average (III) Simple Exponential Smoothing (Use ) method.
Which method is best to estimate Trend?
Year 1 2 3 4 5 6 7 8 9 10 11 12 13
Credit 133 155 165 171 194 231 274 312 313 343 333 360 373
In some cases, a linear trend is inadequate to capture the trend of a time series. A natural
generalization of the linear trend model is the polynomial trend model.
Note that the linear trend model is a special case of the polynomial trend model (p=1)
For economic time series we almost never require p > 2. That is, if the linear trend model is not
adequate, the quadratic trend model will usually work:
Quadratic model: Tt = β0 + β1t + β2t²
Our assumption at this point is that our time series, yt, can be modeled as yt = Tt(β) + εt
whereTt is the quadratic trend model , β denotes the parameters of the quadratic trend model,
and εt denotes the other factors (i.e., the seasonal and cyclical components) that determine yt.
We don‟t observe the β‟s and so we will need to estimate them in order to forecast the trend (and,
eventually, y).
The natural approach to estimating the quadratic trend model is the least squares approach:
choose the β's to minimize
Σt=1..T [yt − Tt(β)]² = Σt=1..T [yt − (β0 + β1t + β2t²)]².
It turns out that under the assumptions of the unobserved components model, the OLS estimator
of the linear and quadratic trend models is unbiased, consistent, and asymptotically efficient.
Further, standard regression procedures can be applied to test hypotheses about the β's and
construct interval estimates.
Another alternative to the linear trend model is the log-linear trend model, which is also called
the exponential trend model: Tt = exp(β0 + β1t), i.e., ln Tt = β0 + β1t.
Example: compute the seasonal indices for the following quarterly series (in Birr), observed from
1998 to 2001 and plotted below.
[Bar chart of the 16 quarterly observations, 1998–2001, in Birr; the values range from about 70 to 122.]
Solution:
Step 1: compute the yearly average for each year (e.g., 86 is the average of the four quarterly values of 1998).
Year      Average
1998        86
1999        87
2000        89
2001        96
Steps 2-4: express each quarterly observation as a percentage of its yearly average (e.g.,
82.6 = (71/86) × 100 for quarter I of 1998), arrange these percentages by quarter and average them over
the years.
Quarter          I        II       III       IV
Year
1998           82.6     103.5     123.2     90.7
1999           81.6     103.4     124.1     90.8
2000           82.0     102.2     124.7     91.0
2001           72.2     101.0     127.1     92.7
Mean = S.I.    79.6     102.5     124.8     91.3
e.g., 79.6 = (82.6 + 81.6 + 82.0 + 72.2)/4.
Note that the yearly mean value is 100, and hence the mean of each quarter equals the
seasonal index for that quarter. Therefore, the seasonal indices for quarters I, II, III and IV are
79.6, 102.5, 124.8 and 91.3 respectively. These show that the effect of the season in quarters I
and IV is to decrease the series by 20.4% and 8.7% from the grand mean, while the effect of the season in
quarters II and III is to increase it by 2.5% and 24.8% from the grand mean, respectively.
4.2.2 Link Relative Method
This method expresses each figure as a relative of the immediately preceding value (week,
month, quarter). The steps involved under this are given below.
Step 1: Express each of the seasonal values as a percentage of the previous value. These are called
link relatives and are computed as LRt = (Yt / Yt−1) × 100.
Step 2: Sort the link relatives by season and obtain the average link relative for each
season, LR̄i = Σ LRt / s, where s is the number of observations in each season.
Step 3: Convert these averages into a series of chain relatives by setting the value of the first
season (first week, January or quarter I) to 100. The chain relative of any season is obtained
by multiplying its average link relative by the chain relative of the preceding season and dividing by 100;
that is, CRi = (LR̄i × CRi−1)/100, i = 1, 2, 3, …, L, where L is the length of the season (for instance,
L = 4 if the series is quarterly).
Therefore, the seasonal indices for quarters I, II, III and IV are 82.4, 103.2, 124.4 and 90.0
respectively. These show that the effect of the season in quarters I and IV is to decrease the series by
17.6% and 10.0% from the mean, while the effect of the season in quarters II and III is to increase it
by 3.2% and 24.4% from the mean, respectively.
Step 4: Now the mean of the seasonal averages is not 100; rather it is 160.03
(= ¼ × {201.32 + 146.55 + 129.48 + 162.74}). Therefore the corresponding correction factor is
100/160.03 = 0.625. Each seasonal average is multiplied by the correction factor 0.625 to obtain the
adjusted seasonal indices shown in the table below; for example, 101.70 = 162.74 × 0.625.
Quarter I II III IV Mean
Adjusted Seasonal Average 125.8 91.6 80.9 101.7 100.0
Therefore, the seasonal indices for quarter I, II, III and IV are 125.8, 91.6, 80.9 and 101.7
respectively. This table clearly shows an annual seasonal pattern of below average in the
beginning periods followed by an interim time interval of above average and ending each year
with another below average. These uninterrupted highs and lows in the seasonal index set
represent a very strong seasonal effect in the data. Thus the effect of the season in quarter II and
III are decreased by 8.42% and 19.09% from the overall mean while in quarter I and IV are
increased by 25.83% and 1.7%, respectively.
Step 1: estimate the trend values T̂t for each quarter (here by least squares):
Quarter
Year          I       II      III      IV
1998 83.5 84.3 85.1 85.9
1999 86.7 87.5 88.3 89.1
2000 89.9 90.7 91.5 92.3
2001 93.1 93.9 94.7 95.5
Step 2: dividing the actual values by the corresponding trend estimates and expressing the result
in percentages ((Yt / T̂t) × 100), we obtain the following results.
Quarter
Year I II III IV
1998 85 105.6 124.6 90.8
1999 81.9 102.8 122.3 88.6
2000 81.2 100.3 121.3 87.8
2001 81.6 103.3 128.8 93.2
Therefore, the seasonal indices for quarters I, II, III and IV are 82.5, 103.1, 124.2 and 90.2
respectively. Thus the effect of the season in quarters I and IV is to decrease the series by 17.5% and 9.8%
from the overall mean (the expected mean, which is 100), while the effect in quarters II and III is to
increase it by 3.1% and 24.3% from the expected mean (= 100), respectively.
4.3 Estimation of Seasonal Component for the Additive Model
A time series model with an assumption of patterns in the additive is given by: Yt = Tt+ St+Ct+It.
If this model appears to be appropriate, the seasonal pattern is given in absolute terms. The
methods discussed above can be adopted easily to obtain the seasonal component for the additive
model.
For instance, if the moving average method is to be used one may follow the following steps.
Step1: compute a 12-month or 4-quarter centered moving average
Step2: subtract the moving average from the actual data.
Step3: construct a table containing these differences by season (months or quarter) and find the
seasonal (monthly or quarterly) averages and then the grand average.
Step4: Adjust them if they do not total zero (or the grand mean is not zero) by the addition or
subtraction of a correction factor.
Note that the correction factor in additive model is the grand average with sign reversed and
adjusted seasonal index is average for season plus correction factor.
Example: find the seasonal index in an additive model for the following data.
Quarter I II III IV
Year
1985 416 477 462 466
1986 446 471 487 482
1987 449 483 490 484
1988 476 507 516 510
Solution: step1and 2- calculation of 4-quarter centered moving average and differences for the
actual and centered moving averages.
Year Quarter Yt 2*4MA Yt – 2*4MA
1985 1 416
2 477
3 462 459.0 3.0
4 466 462.0 4.0
1986 1 446 464.4 -18.4
2 471 469.5 1.5
3 487 471.9 15.1
4 482 473.8 8.3
1987 1 449 475.6 -26.6
2 483 476.3 6.8
3 490 479.9 10.1
4 484 486.3 -2.3
1988 1 476 492.5 -16.5
2 507 499.0 8.0
3 516
4 510
Step3: calculation of seasonal averages for the differences and the grand mean.
Quarter         I        II       III      IV
Year
1985            -         -       3.0      4.0
1986         -18.4       1.5     15.1      8.3
1987         -26.6       6.8     10.1     -2.3
1988         -16.5       8.0       -        -
Mean         -20.5       5.4      9.4      3.3      Grand mean = -0.6
Step 4: The grand mean is not zero; therefore, we should adjust the mean of each quarter in
order to obtain the adjusted seasonal indices. The correction factor is the grand mean with the
sign reversed, i.e., −(−0.6) = 0.6, so the adjusted seasonal indices are: quarter I: −20.5 + 0.6 = −19.9,
quarter II: 6.0, quarter III: 10.0 and quarter IV: 3.9, which sum to approximately zero.
The seasonal component can also be estimated by regression with dummy variables. Consider again the
quarterly series, with dummy variables Q1, Q2, Q3 and Q4 indicating the quarter:
Year   Quarter   Yt    Q1   Q2   Q3   Q4
1998 1 71 1 0 0 0
2 89 0 1 0 0
3 106 0 0 1 0
4 78 0 0 0 1
….
2001 1 76 1 0 0 0
2 97 0 1 0 0
3 122 0 0 1 0
4 89 0 0 0 1
One can estimate the trend component by least square method as ̂ t = 80.75 + 1.03t with R2 =
0.099 from the series.
Assume the actual series is of the form Yt = β0 + β1t + γ1Q1 + γ2Q2 + γ3Q3 + εt; then one
can find the fitted model as Ŷt = 81.75 − 0.8t − 6.6Q1 + γ̂2Q2 + γ̂3Q3 with R² = 0.979, where the
coefficients of Q2 and Q3 are estimated in the same way.
Note that the parameter estimates of the Q‟s will give the seasonal effect for each of the three
quarters. The seasonal effect for the fourth quarter is given by the constant intercepts. Alternative
schemes can be used to allocate the dummy variables. For example, instead of excluding the
fourth quarter dummy variable, the above application could have excluded the first quarter
dummy variable. Another way of proceeding is to include dummy variables for all four quarters.
If this method is used then the intercept must be dropped from the regression equation to avoid
the dummy variable trap.
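As a sketch of the dummy-variable approach (assuming numpy is available; the series values shown are placeholders, since the full 16 quarterly observations are not reproduced here):

import numpy as np

y = np.array([71, 89, 106, 78] * 4, dtype=float)   # hypothetical quarterly values
n = len(y)
t = np.arange(1, n + 1)
q = np.tile(np.eye(4), (n // 4, 1))                # Q1..Q4 indicator columns

# Drop the Q4 column to avoid the dummy variable trap; keep the intercept.
X = np.column_stack([np.ones(n), t, q[:, 0], q[:, 1], q[:, 2]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # intercept, trend coefficient, and Q1, Q2, Q3 effects relative to Q4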
Winters' exponential smoothing method extends exponential smoothing to a series with both a linear
trend and a seasonal pattern of length s, using smoothing constants α, β and γ:
Additive Model:
L̂t = α(Yt − Ŝt−s) + (1 − α)(L̂t−1 + b̂t−1)
b̂t = β(L̂t − L̂t−1) + (1 − β) b̂t−1
Ŝt = γ(Yt − L̂t) + (1 − γ) Ŝt−s
Ŷt+m = L̂t + m b̂t + Ŝt−s+m
Multiplicative Model:
L̂t = α(Yt / Ŝt−s) + (1 − α)(L̂t−1 + b̂t−1)
b̂t = β(L̂t − L̂t−1) + (1 − β) b̂t−1
Ŝt = γ(Yt / L̂t) + (1 − γ) Ŝt−s
Ŷt+m = (L̂t + m b̂t) Ŝt−s+m, where
L̂t is the smoothed level, b̂t the smoothed trend, Ŝt the smoothed seasonal component and Ŷt+m the
forecast m periods ahead.
Winters' method requires initial values of L̂, b̂ and Ŝ for all seasons, together with the
smoothing constants α, β and γ. If there is historical data, it can be used to provide some
or all of the initial estimates.
4.6 Deseasonalization of Data
The non-stationary pattern in time series data needs to be removed from the series before
proceeding with model building. One way of removing non-stationarity is through the method of
differencing. The differenced series is defined as ∇Yt = Yt − Yt−1. Taking first differences is a
very useful tool for removing non-stationarity, but sometimes the differenced data will not appear
stationary and it may be necessary to difference the data a second time. The series of second
order differences is defined as
∇²Yt = ∇Yt − ∇Yt−1 = (Yt − Yt−1) − (Yt−1 − Yt−2) = Yt − 2Yt−1 + Yt−2. In
practice, it is almost never necessary to go beyond second order differences.
With seasonal data that are not stationary, it is appropriate to take seasonal differences. A
seasonal difference is the difference between an observation and the corresponding observation
from the previous year: ∇sYt = Yt − Yt−s, where s is the length of the season.
When both seasonal and first differences are applied, it does not make any difference which is
done first. It is recommended to do the seasonal differencing first since sometimes the resulting
series will be stationary and hence no need for a further first difference. When differencing is
used, it is important that the differences be interpretable.
The other method is estimating the seasonal component and removes the seasonal component
(de-seasonalized) from the series. The deseasonalized time-series data will have only trend (T)
cyclical (C) and irregular (I) components and is expressed as: Multiplicative model:
(Y/S) × 100 = [(T·S·C·I)/S] × 100 = (T·C·I) × 100; Additive model: Y − S = (T + S + C + I) − S = T + C + I.
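A Python sketch of first, second and seasonal differencing (the series values are placeholders used only for illustration):

def diff(y, lag=1):
    # first (lag = 1) or seasonal (lag = s) difference: Y_t - Y_{t-lag}
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

y = [71, 89, 106, 78, 71, 91, 108, 79]   # hypothetical quarterly series
first = diff(y)                          # removes a linear trend
second = diff(first)                     # second-order difference, if needed
seasonal = diff(y, lag=4)                # removes a quarterly seasonal pattern (s = 4)
print(first, seasonal)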
Exercises
1. The following table provides monthly sales ($1000) at a certain college bookstore. The
sales show a seasonal pattern, with the greatest number when the college is in session and
decrease during the summer months. Therefore, estimate the seasonal index by the
following methods. (I) Simple Moving Average (II) Link Relatives (III) Ratio-to-
Moving Average methods. Assume the model is multiplicative!
M Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Y
1 196 188 192 164 140 120 112 140 160 168 192 200
2 200 188 192 164 140 122 132 144 176 168 196 194
3 196 212 202 180 150 140 156 144 164 186 200 230
4 242 240 196 220 200 192 176 184 204 228 250 260
2. Consider the following series and compute the seasonal index by using ratio-to-
Trend method for the assumption of multiplicative model.
Quarter I II III IV
Year
1996 75 60 54 59
1997 86 65 63 80
1998 90 72 66 85
1999 100 78 72 93
3. Consider the following series recorded Quarterly for five years and calculate the
Seasonal Index based on: (I) Simple Average Method if the model is additive (II)
Link Relative Method if the model is additive.
Q
Y I II III IV
1950 30 40 36 34
1951 34 52 50 44
1952 40 58 54 48
1953 54 76 68 62
1954 80 92 86 82
4.7.1. Introduction
Cyclical components usually vary greatly from one another with respect to duration and
amplitude. In practice, the cyclical component is irregular in behavior and is so intermixed
with the irregular movement that it is impossible to separate the two completely. In the
decomposition of a time series, therefore, the cyclical and irregular fluctuations are usually left
together after the other components (trend and seasonal) have been removed. However, the
measurement of the cyclical variation involves the following steps.
Exercise: Estimate the Irregular components of the following series based on the
assumption that components are in the Multiplicative Model.
Quarter
Year I II III IV
1984 2881 3249 3180 3505
1985 3020 3449 3472 3715
1986 3184 3576 3657 3941
1987 3319 3850 3883 4159
Autoregressive models: In an AR model, the current value of the process is expressed as a finite linear
aggregate of previous values of the process and a shock εt; the constant term μ is a parameter that
determines the "level" of the process.
Moving Average models: Suppose that {εt} is a purely random process with mean zero and
variance σε². Then a process {Yt} is said to be a moving average process of order q, MA(q), if
Yt = μ + εt − θ1εt−1 − θ2εt−2 − … − θqεt−q.
Autoregressive Moving Average (ARMA) processes are processes that are formed as a
combination of autoregressive and moving average processes. An ARMA process of order (p, q),
ARMA(p, q), has the form: Yt = μ + φ1Yt−1 + φ2Yt−2 + … + φpYt−p + εt − θ1εt−1 − θ2εt−2 − … − θqεt−q.
Autocorrelation:
The autocorrelation coefficient measures the relationship, or correlation, between a set of
observations and a lagged set of observations in a time series. The autocorrelation
coefficient at lag k, denoted by ρk, is given by
ρk = Cov(Yt, Yt+k) / √[Var(Yt)·Var(Yt+k)] = γk / γ0.
Given the time series (Y1, Y2, Y3, …, Yn), the autocorrelation between Yt and Yt+k measures
the correlation between the pairs (Y1, Y1+k), (Y2, Y2+k), (Y3, Y3+k), …, (Yn−k, Yn). The
sample autocorrelation coefficient at lag k (denoted by rk), an estimate of ρk, is computed by
rk = Σt=1..n−k (Yt − Ȳ)(Yt+k − Ȳ) / Σt=1..n (Yt − Ȳ)²,
where Yt = the data from the stationary time series, Yt+k = the data k time periods ahead, and
Ȳ = the mean of the stationary time series.
A graph displaying the sample autocorrelation coefficient, rk, versus the lag k is called the
sample autocorrelation function (ACF) or a correlogram. This graph is useful both in
determining whether or not a series is stationary and in identifying a tentative ARIMA model.
The plot of the partial autocorrelation coefficient (φkk) against the lag k gives the partial
autocorrelation function (PACF), and the behavior of the partial autocorrelation coefficients
for the stationary time series, along with the corresponding ACF, is used to identify a
tentative ARIMA model.
Note: φ00 = 1, φ11 = ρ1.
Computing φkk:
Let Pk denote the k × k autocorrelation matrix of the stationary series, i.e., the matrix whose (i, j)
element is ρ|i−j| (so its diagonal elements are 1). Then
φkk = |Pk*| / |Pk|,
where Pk* is Pk with its last column replaced by (ρ1, ρ2, …, ρk)′ and |·| denotes the determinant.
For example, at lag 2,
φ̂22 = det[1, r1; r1, r2] / det[1, r1; r1, 1] = (r2 − r1²) / (1 − r1²).
Example: Consider the following series and compute r1, r2 and φ̂22.
Yt    47   64   23   71   38   64   55   41   59   48
Here Ȳ = 51 and Σ(Yt − Ȳ)² = 1896, so
r1 = Σt=1..9 (Yt − Ȳ)(Yt+1 − Ȳ) / Σ(Yt − Ȳ)² = −1497/1896 = −0.79,
r2 = Σt=1..8 (Yt − Ȳ)(Yt+2 − Ȳ) / Σ(Yt − Ȳ)² = 876/1896 = 0.462, and
φ̂22 = (r2 − r1²) / (1 − r1²) = (0.462 − 0.624)/(1 − 0.624) = −0.431.
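The calculation can be verified with the following Python sketch (illustrative names):

y = [47, 64, 23, 71, 38, 64, 55, 41, 59, 48]
n = len(y)
ybar = sum(y) / n
denom = sum((v - ybar) ** 2 for v in y)               # 1896

def r(k):
    # sample autocorrelation at lag k
    num = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(n - k))
    return num / denom

r1, r2 = r(1), r(2)
phi22 = (r2 - r1 ** 2) / (1 - r1 ** 2)                # sample PACF at lag 2
print(round(r1, 3), round(r2, 3), round(phi22, 3))
# -0.79 0.462 -0.429 (the -0.431 above comes from using the rounded r1 and r2)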
The general model introduced by Box and Jenkins (1976) includes autoregressive as well as
moving average parameters, and explicitly includes differencing in the formulation of the model.
Specifically, the three types of parameters in the model are: the autoregressive parameters (p),
the number of differencing passes (d), and moving average parameters (q). In the notation
introduced by Box and Jenkins, models are summarized as ARIMA (p, d, q); so, for example, a
model described as (0, 1, 2) means that it contains 0 (zero) autoregressive (p) parameters and 2
moving average (q) parameters which were computed for the series after it was differenced once.
The Box-Jenkins approach uses an iterative model-building strategy that consists of selecting an
initial model, estimating the model coefficients, and analyzing the residuals. If necessary, the
initial model is modified and the process is repeated until the residuals indicate no further
modification is necessary. At this point, the fitted model can be used for forecasting. The basis of
Box-Jenkins approach to modeling time series consists of three phases:
Model selection/Identification
Parameter estimation
Model checking/Diagnostics
Model identification:
As a rule of thumb, Box-Jenkins requires at least 50 equally-spaced periods of data. The data
must also be edited to deal with extreme or missing values or other distortions through the use of
functions as log or inverse to achieve stabilization and differencing to avoid obvious patterns
such as trend and seasonality. The input series for ARIMA needs to be stationary, that is, it
should have a constant mean and variance through time.
Therefore, usually the series first needs to be differenced until it is stationary (this also often
requires log transforming the data to stabilize the variance). The number of times the series needs
to be differenced to achieve stationarity is reflected in the d parameter. Seasonal patterns require
respective seasonal differencing.
Once stationarity has been addressed, we need to decide how many autoregressive (p) and
moving average (q) parameters are necessary to yield an effective but still parsimonious model of
the process (parsimonious means that it has the fewest parameters and greatest number of
degrees of freedom among all models that fit the data). The major tools for doing this are the
autocorrelation function (ACF) and the partial autocorrelation function (PACF). The sample
autocorrelation plots and the sample partial autocorrelation plots are compared to the theoretical
behavior of these plots when the order is known.
Parameter estimation:
Once a tentative model has been identified, the estimates for the constant and the coefficients of
the equation must be obtained; that is, we find the values of the model coefficients
(μ, φ1, φ2, …, φp, θ1, θ2, …, θq).
There are several different methods for estimating the parameters. All of them should produce
very similar estimates, but may be more or less efficient for any given model. In general, during
the parameter estimation phase a function minimization algorithm is used (the so-called quasi-
Newton method; refer to the description of the Nonlinear Estimation method) to maximize the
likelihood (probability) of the observed series, given the parameter values. In practice, this
requires the calculation of the (conditional) sums of squares (SS) of the residuals, given the
respective parameters.
Model checking/Diagnostics:
Before forecasting with the final equation, it is necessary to perform various diagnostic tests in
order to validate the goodness of fit of the model. If the model is not a good fit, the tests can also
point the way to a better model.
A good way to check the adequacy of an overall Box-Jenkins model is to analyze the residuals
(et = Yt − Ŷt). If the residuals are truly random, the autocorrelations and partial autocorrelations
calculated using the residuals should be statistically equal to zero.
If they are not, this is an indication that we have not fitted the correct model to the data. When
this is the case, the residual ACF and PACF will contain information about which alternate
models to consider.
6. MODEL IDENTIFICATION AND ESTIMATION
6.1. Introduction
The Box-Jenkins approach consists of extracting the predictable movements (pattern) from the
observed data through a series of iterations. One first tries to identify a possible model from a
general class of linear models. The chosen model is then checked against the historical data to
see if it accurately describes the underlying process that generates the series. If the specified
model is not satisfactory, the process is repeated by using another model designed to improve the
original one. The process is repeated until a satisfactory model is found. This procedure is
carried out on stationary data (the trend has been removed).
Box-Jenkins models can only describe or represent stationary series or series that have been
made stationary by differencing. The models fall into one of the three following categories:
Autoregressive (AR), moving average (MA) and mixed process (ARMA). If differencing is
applied together with AR and MA, they are referred to as Autoregressive Integrated Moving
Average (ARIMA), with the „I‟ indicating "integrated" and referencing the differencing
procedure.
6.2. Autoregressive models
In autoregressive models, the current value of the process Yt is a linear function of past stationary
observations Yt−1, Yt−2, Yt−3, … and the current shock εt, where {εt} denotes a purely random
process with zero mean and variance σε². The AR(p) model is
Yt = μ + φ1Yt−1 + φ2Yt−2 + … + φpYt−p + εt, where Yt−1,
Yt−2, …, Yt−p = past stationary observations; μ, φ1, φ2, …, φp = the parameters (constant and
coefficients); and εt = the random error for the present time period (whose expected value is 0).
The number of past stationary observations used in an autoregressive model is known as the
order, p. So, if we use two past observations in a model, we say that it is an autoregressive (AR)
model of order 2, or AR(2).
Let Zt = Yt − μ be the deviation of the values from μ; then the process can be rewritten as
Zt = φ1Zt−1 + φ2Zt−2 + … + φpZt−p + εt.
Notice that the above model can be written in terms of Zt by using the backward shift operator B,
such that BZt = Zt−1, B²Zt = Zt−2, …, B^pZt = Zt−p. The AR(p) model may be written in the form:
Zt = φ1BZt + φ2B²Zt + … + φpB^pZt + εt
εt = (1 − φ1B − φ2B² − … − φpB^p)Zt
εt = φ(B)Zt, where
φ(B) = 1 − φ1B − φ2B² − … − φpB^p
is the autoregressive operator of order p.
The necessary requirement for stationarity is that the autoregressive operator,
φ(B) = 1 − φ1B − φ2B² − … − φpB^p, considered as a polynomial in B of degree p, must have
all roots of φ(B) = 0 greater than one in absolute value; that is, all roots must lie outside the unit
circle.
Special cases for Autoregressive Model/Process
i) AR(1) process, p=1
The simplest example of an AR process is the first-order case given by
Yt = φ1Yt-1 + εt, or Zt = φ1Zt-1 + εt, so that Zt - φ1BZt = εt. Hence, εt = (1 - φ1B)Zt.
For AR(1) to be stationary, the root of 1 - φ1B = 0 must lie outside the unit circle. The time series literature typically says that an AR(1) process is stationary provided that |φ1| < 1. It is more accurate to say that there is a unique stationary solution of AR(1), which is causal, provided that |φ1| < 1.
Autocorrelation function of AR(1): multiplying Zt = φ1Zt-1 + εt by Zt-k and taking expectations gives
E(Zt-kZt) = E(φ1Zt-kZt-1) + E(Zt-kεt).
When k > 0, Zt-k can only involve the shocks up to time t - k, which are uncorrelated with εt, so E(Zt-kεt) = 0 and
γk = Cov(Zt-k, Zt) = φ1Cov(Zt-k, Zt-1) = φ1γk-1, k > 0.
Dividing by γ0,
ρk = φ1ρk-1, k > 0,
so that ρk = φ1ρk-1 = φ1(φ1ρk-2) = φ1²ρk-2 = ... and, with ρ0 = 1,
ρk = φ1^k, k ≥ 0.
The autocorrelation function decays exponentially to zero when φ1 is positive, but decays exponentially to zero while oscillating in sign when φ1 is negative. This property of the autocorrelation function of the AR(1) process is described by saying that the ACF tails off; that is, the ACF of an AR(1) process tails off exponentially.
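To illustrate this tailing-off behaviour, the following sketch (added here, not part of the original notes) simulates an AR(1) process with φ1 = 0.5 and compares the sample ACF with the theoretical values ρk = (0.5)^k; it assumes Python with numpy and statsmodels is available.

# Minimal sketch: simulate an AR(1) process and compare its sample ACF
# with the theoretical ACF rho_k = phi1**k.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
phi1, n = 0.5, 500
eps = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi1 * z[t - 1] + eps[t]          # Z_t = phi1*Z_{t-1} + eps_t

sample_acf = acf(z, nlags=6)                  # r_0, r_1, ..., r_6
theoretical = phi1 ** np.arange(7)            # rho_k = phi1**k
for k, (r, rho) in enumerate(zip(sample_acf, theoretical)):
    print(f"lag {k}: sample r_k = {r:6.3f}   theoretical rho_k = {rho:6.3f}")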
Variance of the AR(1) process: multiplying Zt = φ1Zt-1 + εt by Zt gives
Zt² = φ1ZtZt-1 + Ztεt.
Taking expectations,
E(Zt²) = φ1E(ZtZt-1) + E(Ztεt);
since the only part of Zt that is correlated with εt is εt itself, E(Ztεt) = σε². Hence
γ0 = φ1γ1 + σε², and since γ1 = φ1γ0,
γ0 = σε² / (1 - φ1²).
Example: Consider the AR(1) model Zt = 0.5Zt-1 + εt. Here φ1 = 0.5, so ρk = (0.5)^k, k ≥ 0.
[Figure: ACF of the AR(1) process with φ1 = 0.5 plotted against lags k = 1, 2, ..., 6; the autocorrelations decay exponentially from +1 towards zero.]
ii) AR(2) process, p=2
The second-order autoregressive process is Zt = φ1Zt-1 + φ2Zt-2 + εt. It is stationary if the roots of 1 - φ1B - φ2B² = 0 lie outside the unit circle, which requires
φ1 + φ2 < 1
φ2 - φ1 < 1
|φ2| < 1.
[Figure: the admissible (stationarity) region for (φ1, φ2), with -2 < φ1 < 2 and -1 < φ2 < 1.]
Multiplying by Zt-1 and Zt-2 and taking expectations gives the Yule-Walker equations
ρ1 = φ1 + φ2ρ1
ρ2 = φ1ρ1 + φ2,
so that ρ1 = φ1/(1 - φ2) and ρ2 = φ2 + φ1²/(1 - φ2).
When the roots of 1 - φ1B - φ2B² = 0 are real, the autocorrelation function consists of a mixture of damped exponentials. This occurs when φ1² + 4φ2 ≥ 0. If the roots are complex (φ1² + 4φ2 < 0), a second-order autoregressive process displays pseudo-periodic behavior and the autocorrelation function shows a damped sine wave.
Variance of the AR(2) process:
γ0 = σε² / (1 - φ1ρ1 - φ2ρ2).
Example: Consider the AR(2) process given by Zt = Zt-1 - 0.5Zt-2 + εt. Is this process stationary? Find the autocorrelation function of this process.
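A sketch of how this example can be checked numerically (added here, assuming Python with numpy is available): the stationarity conditions follow from the roots of 1 - φ1B - φ2B² = 0, and the ACF is generated from the recursion ρk = φ1ρk-1 + φ2ρk-2.

# Minimal sketch: check stationarity of Z_t = Z_{t-1} - 0.5 Z_{t-2} + eps_t
# and compute its ACF from the Yule-Walker recursion.
import numpy as np

phi1, phi2 = 1.0, -0.5

# Roots of 1 - phi1*B - phi2*B^2 = 0 (coefficients listed highest power first).
roots = np.roots([-phi2, -phi1, 1.0])
print("roots:", roots, "| all outside unit circle:", bool(np.all(np.abs(roots) > 1)))

# phi1**2 + 4*phi2 = -1 < 0, so the roots are complex and the ACF is a damped sine wave.
rho = [1.0, phi1 / (1 - phi2)]                 # rho_0 and rho_1
for k in range(2, 11):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
print([round(r, 3) for r in rho])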
In general, the variance of an AR(p) process is
γ0 = σε² / (1 - φ1ρ1 - φ2ρ2 - ... - φpρp).
Example: Consider the fourth-order autoregressive process Yt = φYt-4 + εt, 0 < φ < 1, where εt is white noise with zero mean and variance σε². Find the variance and the autocorrelation function of Yt.
Moving average (MA) models
In a moving average model, the current value of the process is a linear function of the current and previous shocks:
Yt = μ + εt - θ1εt-1 - θ2εt-2 - ... - θqεt-q,
where Yt = the current stationary observation; εt = the white-noise (random) error, which is unknown and whose expected value is zero; εt-1, εt-2, ..., εt-q = previous errors; and μ, θ1, θ2, ..., θq = the constant and coefficients of the model. The number of previous errors used is known as the order q, and the model is written MA(q).
With Zt = Yt - μ, the model may be written as
Zt = εt - θ1Bεt - θ2B²εt - ... - θqB^qεt
Zt = (1 - θ1B - θ2B² - ... - θqB^q)εt
Zt = θ(B)εt,
where θ(B) = 1 - θ1B - θ2B² - ... - θqB^q is a moving average operator of order q.
Consider the MA(q) process corrected for the mean: Zt = εt - θ1εt-1 - θ2εt-2 - ... - θqεt-q.
The auto-covariance at lag k is γk = E(ZtZt-k):
γ0 = σε²(1 + θ1² + θ2² + ... + θq²)
γk = σε²(-θk + θ1θk+1 + θ2θk+2 + ... + θq-kθq), k = 1, 2, ..., q
γk = 0, k > q.
The autocorrelation function of a MA(q) process is zero beyond the order q of the process. In
other words, the autocorrelation function of a moving average, MA (q), process has a cut off
after lag q.
Special cases of MA(q)
i) MA(1): Zt = εt - θ1εt-1, with variance γ0 = σε²(1 + θ1²) and autocorrelations ρ1 = -θ1/(1 + θ1²), ρk = 0 for k > 1.
ii) MA(2): Zt = εt - θ1εt-1 - θ2εt-2, with variance γ0 = σε²(1 + θ1² + θ2²) and autocorrelations
ρ1 = (-θ1 + θ1θ2)/(1 + θ1² + θ2²), ρ2 = -θ2/(1 + θ1² + θ2²), ρk = 0 for k > 2.
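As an added illustration of the cut-off property (not part of the original notes), the sketch below evaluates the theoretical ACF of a MA(2) process; it assumes statsmodels' ArmaProcess helper, whose lag polynomials are written with a leading 1, so the θ's enter with a minus sign.

# Minimal sketch: verify that the ACF of a MA(2) process cuts off after lag 2.
# Z_t = eps_t - 0.7 eps_{t-1} - 0.2 eps_{t-2} corresponds to ma = [1, -0.7, -0.2].
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

theta1, theta2 = 0.7, 0.2
proc = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, -theta1, -theta2]))
print(np.round(proc.acf(lags=6), 3))   # rho_1, rho_2 nonzero; rho_k = 0 for k > 2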
Invertibility: consider the MA(1) process Zt = εt - θ1εt-1 = (1 - θ1B)εt. Solving for the shocks,
εt = (1 - θ1B)^(-1) Zt        (*)
Provided |θ1| < 1, the operator (1 - θ1B)^(-1) can be expanded as a power series, so that
εt = Σ (i ≥ 0) θ1^i B^i Zt        (**)
Consider (**):
εt = (1 + θ1B + θ1²B² + θ1³B³ + ...)Zt
εt = Zt + θ1Zt-1 + θ1²Zt-2 + θ1³Zt-3 + ...
Zt = εt - θ1Zt-1 - θ1²Zt-2 - θ1³Zt-3 - ...
This is an infinite Autoregressive process (AR(∞)).
The condition |θ1| < 1 is called the invertibility condition for a MA(1) process.
Non-uniqueness of MA models: consider the following two MA(1) models.
Model A: Zt = εt - θ1εt-1
Model B: Zt = εt - (1/θ1)εt-1
The autocorrelation functions:
For model A: ρ1 = -θ1/(1 + θ1²) and ρk = 0 for k > 1.
For model B: ρ1 = -(1/θ1)/(1 + 1/θ1²) = -θ1/(1 + θ1²) and ρk = 0 for k > 1.
Although the models are different, they have the same autocorrelation functions.
This shows that a MA process cannot be uniquely determined from the ACF.
If we express models A and B by putting εt in terms of Zt, Zt-1, Zt-2, ..., we find by successive substitution that
Model A: εt = Zt + θ1Zt-1 + θ1²Zt-2 + ..., that is,
Zt = εt - θ1Zt-1 - θ1²Zt-2 - ...
Model B: εt = Zt + (1/θ1)Zt-1 + (1/θ1²)Zt-2 + ..., that is,
Zt = εt - (1/θ1)Zt-1 - (1/θ1²)Zt-2 - ...
If |θ1| < 1, the series for model A converges whereas that for model B does not. Therefore, if |θ1| < 1, model A is said to be invertible while model B is not.
More generally, a MA(2) process is invertible if the roots of the equation 1 - θ1B - θ2B² = 0 lie outside the unit circle, that is,
θ2 + θ1 < 1
θ2 - θ1 < 1
-1 < θ2 < 1.
Mixed autoregressive moving average (ARMA) models
A mixed autoregressive moving average model of order (p, q), written ARMA(p, q), combines AR and MA terms:
Yt = μ + φ1Yt-1 + ... + φpYt-p + εt - θ1εt-1 - ... - θqεt-q,
where Yt = the current stationary observation; Yt-1, Yt-2, ... = the past observations; εt-1, εt-2, ... = errors for the stationary time series; εt = the present error (whose expected value is set equal to 0); μ, φ1, φ2, ..., θ1, θ2, ... = the constant and the parameters of the model; and p and q denote the order of the model. The importance of ARMA processes is that many real data sets may be approximated in a more parsimonious way (meaning fewer parameters are needed) by a mixed ARMA model rather than by a pure AR or pure MA process.
Notice that the above model is rewritten as follows using the backward shift operator:
(1 - φ1B - φ2B² - ... - φpB^p)Zt = (1 - θ1B - θ2B² - ... - θqB^q)εt,
where Zt is the stochastic process being modeled, {εt} is a white noise process (i.e., a sequence of uncorrelated random variables having zero mean and constant variance), B is the backward shift operator, defined by BZt = Zt-1, the φ's and θ's are the (unknown) parameters of the model, to be estimated from a realization of the process (i.e., a sample of successive Z's), and we assume that Zt represents a deviation from a mean value.
Again we may write the model (1 - φ1B - φ2B² - ... - φpB^p)Zt = (1 - θ1B - θ2B² - ... - θqB^q)εt as
φ(B)Zt = θ(B)εt,
where φ(B) and θ(B) are polynomials of degree p and q, respectively, in B.
There are some cases of the model φ(B)Zt = θ(B)εt that are of special interest. If θ(B) = 1, we have φ(B)Zt = εt, which is a regression model of the most recent Zt on previous Zt's, and the model is called an autoregressive process of order p. If, on the other hand, φ(B) = 1, then the model becomes Zt = θ(B)εt and the process is called a moving average process of order q. The general model φ(B)Zt = θ(B)εt is called a mixed autoregressive moving average process.
The stationarity of an ARMA process φ(B)Zt = θ(B)εt is related to the AR component of the model: φ(B)Zt = θ(B)εt defines a stationary process provided that the roots of the polynomial φ(B) = 0 lie outside the unit circle. Similarly, the invertibility of an ARMA(p, q) process is related to the MA component: the roots of the polynomial θ(B) = 0 must lie outside the unit circle if the process is to be invertible.
Special case of an ARMA(p, q): the ARMA(1,1) process
Zt - φ1BZt = εt - θ1Bεt, that is, Zt = φ1Zt-1 + εt - θ1εt-1.
Note that ARMA(1,1) is stationary for -1 < φ1 < 1 and invertible for -1 < θ1 < 1.
Multiplying the model by Zt and by Zt-1 and taking expectations gives two equations in γ0 and γ1; solving them, the auto-covariance function of the ARMA(1,1) process is
γ0 = (1 + θ1² - 2φ1θ1)σε² / (1 - φ1²)
γ1 = (1 - φ1θ1)(φ1 - θ1)σε² / (1 - φ1²)
γk = φ1γk-1, k ≥ 2.
The autocorrelation function of the ARMA(1,1) process is
ρ1 = (1 - φ1θ1)(φ1 - θ1) / (1 + θ1² - 2φ1θ1)
ρk = φ1ρk-1, k ≥ 2.
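The following added sketch evaluates these ARMA(1,1) autocorrelations for illustrative values φ1 = 0.8 and θ1 = 0.4 and, assuming statsmodels' ArmaProcess is available, checks them against the theoretical ACF it produces (its MA polynomial is written as 1 + (coefficient)·B, hence the minus sign below).

# Minimal sketch: ARMA(1,1) autocorrelations from the formulas above, checked
# against the theoretical ACF computed by statsmodels.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi1, theta1 = 0.8, 0.4

rho1 = (1 - phi1 * theta1) * (phi1 - theta1) / (1 + theta1**2 - 2 * phi1 * theta1)
rho = [1.0, rho1]
for k in range(2, 7):
    rho.append(phi1 * rho[k - 1])             # rho_k = phi1 * rho_{k-1}, k >= 2

proc = ArmaProcess(ar=[1.0, -phi1], ma=[1.0, -theta1])
print(np.round(rho, 3))
print(np.round(proc.acf(lags=7), 3))          # should agree with the recursion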
Autoregressive integrated moving average (ARIMA) models
In particular, if there are d unit roots, the generalized autoregressive operator Φ(B) can be written as Φ(B) = φ(B)(1 - B)^d. Thus, a model that can represent homogeneous non-stationary behavior is of the form
Φ(B)Zt = θ(B)εt
φ(B)(1 - B)^d Zt = θ(B)εt,
where Φ(B) is a non-stationary autoregressive operator such that d of the roots of Φ(B) = 0 are unity and the remainder lie outside the unit circle, and φ(B) is a stationary autoregressive operator.
Since ∇ = 1 - B is the differencing operator, the model φ(B)(1 - B)^d Zt = θ(B)εt can be written as
φ(B)∇^d Zt = θ(B)εt.
We call the process φ(B)∇^d Zt = θ(B)εt an autoregressive integrated moving average (ARIMA) process. If the autoregressive operator φ(B) is of order p, the dth difference is taken, and the moving average operator θ(B) is of order q, we say that we have an ARIMA model of order (p, d, q), or simply an ARIMA(p, d, q) process. In practice, d is usually 0, 1, or at most 2.
Exercise: Classify each of the following models as an ARIMA(p, d, q) process (i.e., find p, d, q):
(a) (1 - B)(1 - 0.2B)Yt = (1 - 0.5B)εt
Model identification using the ACF and PACF
The autocorrelation function of an autoregressive process of order p tails off, while its partial autocorrelation function has a cut-off after lag p. Conversely, the autocorrelation function of a moving average process of order q has a cut-off after lag q, while its partial autocorrelation function tails off. If both the autocorrelation and partial autocorrelation functions tail off, a mixed process is suggested.
Furthermore, the autocorrelation function for a mixed process, containing a pth-order autoregressive component and a qth-order moving average component, is a mixture of exponentials and damped sine waves after the first q - p lags. Conversely, the partial autocorrelation function for a mixed process is dominated by a mixture of exponentials and damped sine waves after the first p - q lags.
Example: If the true (correct) model is AR(1), then φ11 should be different from zero while φkk = 0 for k ≥ 2. Similarly, if AR(2) is the correct model, φ11 and φ22 must be different from zero while φkk = 0 for k ≥ 3. Hence, we say that the PACF cuts off after lag p for an AR(p) process, so we can use the PACF to identify the order of an AR process.
Similarly, the ACF can be used to identify the order of a MA process.
Process    ACF                                          PACF
AR(p)      Dies out (tails off)                         Cuts off after the order p of the process
MA(q)      Cuts off after the order q of the process    Dies out (tails off)
N.B.: In this context, "dies out" means "tends to zero gradually" and "cuts off" means "disappears, or is zero".
ARIMA(1,0,0), 0 < φ1 < 1: ACF shows exponential decline, with the first two or more lags significant; PACF shows a single significant positive peak at lag 1.
ARIMA(1,0,0), -1 < φ1 < 0: ACF shows an alternating exponential decline with a negative peak at ACF(1); PACF shows a single significant negative peak at lag 1.
ARIMA(0,0,1), 0 < θ1 < 1: ACF shows a single significant negative peak at lag 1; PACF shows an exponential decline of negative values, with the first two or more lags significant.
ARIMA(0,0,1), -1 < θ1 < 0: ACF shows a single significant positive peak at lag 1; PACF shows an alternating exponential decline with a positive peak at PACF(1).
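In practice these patterns are read off the sample ACF and PACF. The added sketch below (assuming Python with numpy and statsmodels) computes both for a simulated AR(1) series, for which the ACF should tail off and the PACF should cut off after lag 1.

# Minimal sketch: inspect the sample ACF and PACF of a series to tentatively identify p and q.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
eps = rng.normal(size=400)
y = np.zeros(400)
for t in range(1, 400):                      # simulate an AR(1) with phi1 = 0.7
    y[t] = 0.7 * y[t - 1] + eps[t]

print("ACF :", np.round(acf(y, nlags=5), 2))    # expected to tail off
print("PACF:", np.round(pacf(y, nlags=5), 2))   # expected to cut off after lag 1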
Estimated autocorrelations can have rather large variances and can be highly autocorrelated with
each other. For this reason, detailed adherence to the theoretical autocorrelation function cannot
be expected in the estimated function. Since we do not know the theoretical autocorrelations, and since the estimates that we compute will differ somewhat from their theoretical counterparts, it is important to have some indication of how far an estimated value may differ from the
corresponding theoretical value. In particular, we need some means for judging whether the
autocorrelations and partial autocorrelations are effectively zero after some specific lag q or p,
respectively.
We use the standard errors of the sample autocorrelation and partial autocorrelation functions to identify non-zero values. Recall that the sample autocorrelation rk is an estimate of ρk; on the hypothesis that the process is moving average of order q, the standard error of rk for k > q is approximately
S.E.(rk) ≈ √[(1 + 2(r1² + r2² + ... + rq²)) / N].
On the hypothesis that the process is autoregressive of order p, the standard error for the estimated partial autocorrelations φ̂kk of order k > p is approximately S.E.(φ̂kk) ≈ 1/√N.
As a general rule, we would assume rk or φ̂kk to be zero if the absolute value of its estimate is less than twice its standard error; that is, if |rk| < 2 S.E.(rk) or |φ̂kk| < 2 S.E.(φ̂kk).
One may use the interval (-2 S.E.(rk), 2 S.E.(rk)) to test H0: ρk = 0 versus H1: ρk ≠ 0, rejecting H0 if the value of rk lies outside the interval.
Similarly, we can use the interval (-2 S.E.(φ̂kk), 2 S.E.(φ̂kk)) to test H0: φkk = 0 versus H1: φkk ≠ 0, rejecting H0 if the value of φ̂kk lies outside the interval.
Sometimes one may choose more than one tentative model. Then, the diagnostic procedure will help determine the best model.
Example: Consider the following ACF and PACF for some time series data with 120
observations.
k           1       2       3       4       5       6       7
rk          0.709   0.523   0.367   0.281   0.208   0.096   0.132
S.E.(rk)    0.091   0.129   0.146   0.153   0.153   0.153   0.153
φ̂kk         0.709   0.041   -0.037  0.045   -0.007  -0.123  0.204
S.E.(φ̂kk)   0.091   0.091   0.091   0.091   0.091   0.091   0.091
Test for the PACF: H0: φ11 = 0 versus H1: φ11 ≠ 0. Reject H0 if the value of φ̂11 lies outside the interval (-2×S.E.(φ̂11), 2×S.E.(φ̂11)) = (-2×0.091, 2×0.091) = (-0.182, 0.182). Since φ̂11 = 0.709 lies outside the interval (-0.182, 0.182), H0 is rejected.
H0: φ22 = 0 versus H1: φ22 ≠ 0. Reject H0 if the value of φ̂22 lies outside the interval (-2×S.E.(φ̂22), 2×S.E.(φ̂22)) = (-2×0.091, 2×0.091) = (-0.182, 0.182). Since φ̂22 = 0.041 lies inside the interval (-0.182, 0.182), H0 is not rejected.
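The standard errors quoted in this example can be closely reproduced from Bartlett's approximation; the added sketch below (plain Python with numpy) computes them for N = 120 and flags the autocorrelations that exceed twice their standard error.

# Minimal sketch: S.E.(r_k) ~ sqrt((1 + 2*sum_{j<k} r_j^2) / N), applied to the
# sample autocorrelations from the example, and a 2-standard-error significance check.
import numpy as np

N = 120
r = np.array([0.709, 0.523, 0.367, 0.281, 0.208, 0.096, 0.132])

se = [np.sqrt(1 / N)]                          # S.E.(r_1) = 1/sqrt(N)
for k in range(2, len(r) + 1):
    se.append(np.sqrt((1 + 2 * np.sum(r[:k - 1] ** 2)) / N))
se = np.array(se)

print(np.round(se, 3))                          # ~0.091, 0.129, 0.146, 0.153, ...
print("significant:", np.abs(r) > 2 * se)       # lags where H0: rho_k = 0 is rejected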
Example 2: Consider the following ACF and PACF for a time series with N = 106.
k           1       2       3       4       5       6       7
rk          -0.448  0.004   0.052   -0.058  0.094   -0.021  0.012
S.E.(rk)    0.097   0.115   0.115   0.115   0.115   0.115   0.115
φ̂kk         -0.448  -0.423  -0.308  -0.340  -0.224  -0.151  0.009
S.E.(φ̂kk)   0.097   0.097   0.097   0.097   0.097   0.097   0.097
Test for the ACF: H0: ρ1 = 0 versus H1: ρ1 ≠ 0. Reject H0 if the value of r1 lies outside the interval (-2×S.E.(r1), 2×S.E.(r1)) = (-2×0.097, 2×0.097) = (-0.194, 0.194). Since r1 = -0.448 lies outside the interval (-0.194, 0.194), H0 is rejected.
H0: ρ2 = 0 versus H1: ρ2 ≠ 0. Reject H0 if the value of r2 lies outside the interval (-2×S.E.(r2), 2×S.E.(r2)) = (-2×0.115, 2×0.115) = (-0.23, 0.23). Since r2 = 0.004 lies inside the interval (-0.23, 0.23), H0 is not rejected.
Another approach to model selection is the use of information criteria such as Akaike
Information Criteria (AIC) or Bayesian Information Criteria (BIC). In the implementation of this
approach, a range of potential ARMA models is estimated by maximum likelihood methods. In
the information criteria approach, models that yield a minimum value for the criterion are to be
preferred, and the AIC or BIC values are compared among various models as the basis for
selection of the model.
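A sketch of this information-criterion approach (added here; it assumes statsmodels' ARIMA class and uses a simulated stand-in series, so the chosen order is illustrative only) fits a small grid of ARMA(p, q) models and reports the order with minimum AIC.

# Minimal sketch: choose (p, q) by fitting candidate ARMA models with maximum
# likelihood and comparing AIC/BIC.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.zeros(300)
eps = rng.normal(size=300)
for t in range(1, 300):                      # an AR(1) series used only as a stand-in
    y[t] = 0.6 * y[t - 1] + eps[t]

results = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)

best = min(results, key=lambda k: results[k][0])   # minimum-AIC model
print("AIC-preferred (p, q):", best)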
If the time series is not stationary, the sample autocorrelation function will die down extremely slowly. If this type of behavior is exhibited, the usual approach is to compute the sample autocorrelation and partial autocorrelation functions for the first (regular) difference of the series. If these functions behave according to the theoretical patterns, one difference is sufficient. If not, we must try successively higher orders of differencing until stationary behavior is achieved.
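The added sketch below illustrates this check on a simulated trending series: the ACF of the original series dies down very slowly, while the ACF of the first difference does not (numpy and statsmodels assumed available).

# Minimal sketch: compare the sample ACF of a non-stationary series with the ACF
# of its first difference.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=300)) + 0.1 * np.arange(300)     # random walk plus trend

print("ACF of y      :", np.round(acf(y, nlags=5), 2))          # decays very slowly
print("ACF of diff(y):", np.round(acf(np.diff(y), nlags=5), 2)) # drops to near zero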
In practice, many time series contain a seasonal (periodic) component. The fundamental fact about seasonal time series with period s is that observations which are s intervals apart are similar.
Box and Jenkins have generalized the ARIMA model to deal with seasonality, and define a
general multiplicative seasonal ARIMA model (abbreviated SARIMA) as
φp(B)ΦP(B^s)Wt = θq(B)ΘQ(B^s)εt        (*)
where B denotes the backward shift operator and φp, ΦP, θq, ΘQ are polynomials of order p, P, q, Q, respectively:
φp(B) = 1 - φ1B - φ2B² - ... - φpB^p
ΦP(B^s) = 1 - Φ1B^s - Φ2B^2s - ... - ΦPB^Ps
θq(B) = 1 - θ1B - θ2B² - ... - θqB^q
ΘQ(B^s) = 1 - Θ1B^s - Θ2B^2s - ... - ΘQB^Qs.
Here εt denotes a purely random process, and
Wt = ∇^d ∇s^D Yt        (**), where ∇^d = (1 - B)^d and ∇s^D = (1 - B^s)^D.
The variables {Wt} are formed from the original series {Yt} not only by simple differencing (to remove trend) but also by seasonal differencing, ∇s, to remove the seasonality.
Example: If d = 1, D = 1, s = 12, then Wt = ∇∇12Yt = ∇12Yt - ∇12Yt-1 = (Yt - Yt-12) - (Yt-1 - Yt-13).
The model in (*) and (**) is said to be a SARIMA model of order (p, d, q)×(P, D, Q)s. The values of d and D do not usually need to exceed one.
Example: Consider a SARIMA model of order (1, 0, 0)×(0, 1, 1)12
(p = 1, d = 0, q = 0, P = 0, D = 1, Q = 1):
(1 - φ1B)Wt = (1 - Θ1B^12)εt,
with φp(B) = 1 - φ1B, ΦP(B^s) = 1, θq(B) = 1, ΘQ(B^s) = 1 - Θ1B^s, and
Wt = ∇12Yt = Yt - Yt-12.
Then we find
(1 - φ1B)∇12Yt = εt - Θ1B^12 εt
∇12Yt - φ1B∇12Yt = εt - Θ1εt-12
Yt - Yt-12 - φ1B(Yt - Yt-12) = εt - Θ1εt-12
Yt = Yt-12 + φ1Yt-1 - φ1Yt-13 + εt - Θ1εt-12.
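A sketch of how such a model can be specified and fitted in software (added here; it assumes statsmodels' SARIMAX class and uses a toy monthly series, so the fitted numbers are illustrative only):

# Minimal sketch: specify and fit the SARIMA(1,0,0)(0,1,1)_12 model above.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(4)
months = np.arange(240)
y = 10 * np.sin(2 * np.pi * months / 12) + rng.normal(size=240)  # toy seasonal series

model = SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)
print(res.params)          # estimates of phi_1, Theta_1 and the noise variance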
Preliminary (moment) estimates of the parameters can be obtained in the following manner: μ̂ = Ȳ, where Ȳ is the mean of the stationary time series. For an AR(2) process, solving the Yule-Walker equations with ρ1 and ρ2 replaced by r1 and r2 gives
φ̂1 = r1(1 - r2)/(1 - r1²) and φ̂2 = (r2 - r1²)/(1 - r1²).
Example: Given r1 = 0.81 and r2 = 0.43, the estimates of the parameters are
φ̂1 = 0.81(1 - 0.43)/(1 - 0.81²) ≈ 1.34 and
φ̂2 = (0.43 - 0.81²)/(1 - 0.81²) ≈ -0.66.
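The arithmetic can be verified with a few lines of Python (added sketch):

# Minimal sketch: Yule-Walker moment estimates for an AR(2) from the sample
# autocorrelations quoted in the example.
r1, r2 = 0.81, 0.43
phi1_hat = r1 * (1 - r2) / (1 - r1**2)
phi2_hat = (r2 - r1**2) / (1 - r1**2)
print(round(phi1_hat, 2), round(phi2_hat, 2))   # approximately 1.34 and -0.66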
For an ARMA(1,1) process, the moment estimates φ̂1 and θ̂1 are obtained by solving
r1 = (1 - φ̂1θ̂1)(φ̂1 - θ̂1) / (1 + θ̂1² - 2φ̂1θ̂1),
r2 = φ̂1 r1.
An estimate of the white-noise variance σε² can then be obtained from the sample variance of the series, σ̂Z². For a MA(q) process,
σ̂ε² = σ̂Z² / (1 + θ̂1² + θ̂2² + ... + θ̂q²);
e.g., for MA(1), σ̂ε² = σ̂Z²/(1 + θ̂1²) and for MA(2), σ̂ε² = σ̂Z²/(1 + θ̂1² + θ̂2²).
For ARMA(1,1) it takes the form
σ̂ε² = σ̂Z²(1 - φ̂1²) / (1 + θ̂1² - 2φ̂1θ̂1).
6.9. Diagnostic checking
When a model has been fitted to time series data, it is advisable to check that the model really does provide an adequate description of the data and, if necessary, to suggest potential improvements. This is done through different approaches (usually by looking at the residuals).
Compute ε̂t = Yt - Ŷt for all t, and then construct the autocorrelation function of the residuals. Let rk(ε̂) be the ACF of the residuals. If the model is appropriate, the estimated autocorrelations rk(ε̂) of the residuals should be uncorrelated and distributed approximately normally about zero with variance n⁻¹, and hence with standard error n^(-1/2).
If the model is appropriate, then the residual sample autocorrelation function should have no structure to identify. Values which lie outside the interval ±2/√n are significantly different from zero (at the 5% significance level) and give evidence that the wrong model has been fitted.
Portmanteau Lack-of-Fit Test
Rather than considering the rk(ε̂) terms individually, an indication is often needed of whether, say, the first 20 autocorrelations of the residuals, taken together, indicate inadequacy of the model. Suppose we have the first M autocorrelations of the residuals, rk(ε̂), k = 1, 2, ..., M, from any fitted ARIMA(p, d, q) process. If the fitted model is appropriate, the Box-Pierce statistic
Q = (N - d) Σ (k = 1 to M) rk²(ε̂)
is approximately distributed as chi-square with M - p - q degrees of freedom. In practice, the modified Ljung-Box statistic
Q* = n(n + 2) Σ (k = 1 to M) rk²(ε̂)/(n - k), with n = N - d,
has better small-sample properties and is referred to the same χ²(M - p - q) distribution.
Example: The following table shows the first 25 autocorrelations of the residuals, rk(ε̂), from an IMA(0, 2, 2) process, ∇²Zt = (1 - 0.13B - 0.12B²)εt, which was fitted to a series with N = 226 observations. Check the adequacy of the model using the Ljung-Box statistic Q*.
k       1      2      3      4      5      6      7      8      9      10     11     12     13
rk(ε̂)   0.02   0.032  -0.125 -0.078 -0.011 -0.033 0.022  -0.056 -0.13  0.093  -0.129 0.063  -0.084
k       14     15     16     17     18     19     20     21     22     23     24     25
rk(ε̂)   0.022  -0.006 -0.05  0.153  -0.092 -0.005 -0.015 0.007  0.132  0.012  -0.012 -0.127
Here n = N - d = 226 - 2 = 224, so
Q* = (224)(226)[(0.02)²/223 + (0.032)²/222 + ... + (-0.127)²/199] = 36.2,
with M - p - q = 25 - 0 - 2 = 23 degrees of freedom. From the chi-square table, χ²0.05(23) = 35.2 and χ²0.10(23) = 32.0.
Since Q* = 36.2 > χ²0.05(23) = 35.2, there is some doubt as to the adequacy of this model.
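The added sketch below reproduces this calculation from the tabulated residual autocorrelations; it assumes numpy and scipy are available for the chi-square critical value.

# Minimal sketch: Ljung-Box Q* from the residual autocorrelations above
# (N = 226, d = 2, M = 25, p = 0, q = 2).
import numpy as np
from scipy.stats import chi2

rk = np.array([0.02, 0.032, -0.125, -0.078, -0.011, -0.033, 0.022, -0.056, -0.13,
               0.093, -0.129, 0.063, -0.084, 0.022, -0.006, -0.05, 0.153, -0.092,
               -0.005, -0.015, 0.007, 0.132, 0.012, -0.012, -0.127])
n = 226 - 2                                        # effective sample size N - d
k = np.arange(1, len(rk) + 1)
Q_star = n * (n + 2) * np.sum(rk**2 / (n - k))
df = len(rk) - 0 - 2                               # M - p - q
print(round(Q_star, 1), round(chi2.ppf(0.95, df), 1))   # ~36.2 vs critical value ~35.2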
Over-fitting: This is another technique used for diagnostic checking. Having identified what is believed to be a correct model, we actually fit a more elaborate one (a more elaborate model contains additional parameters) and use likelihood ratio or t-tests to check that the additional parameters are not significant.
7. Forecasting
The use at time t of available observations from a time series to forecast its value at some future time t + k can provide a basis for:
economic and business planning
production planning
inventory and production control
control and optimization of industrial processes
Box-Jenkins methodology
Here we will see the forecasting procedure based on ARIMA models which is usually known as
the Box-Jenkins Approach. For both seasonal and non-seasonal data, the adequacy of the fitted
model should be checked by what Box and Jenkins call „diagnostic checking‟. Here, we consider
only nonseasonal time series. When a satisfactory model is found, forecasts may readily be
computed.
Forecasts are usually needed over a period known as the lead time, which varies with each
problem. For example, in a sales forecasting problem, sales Yt, in the current month t and the
sales Yt-1, Yt-2, Yt-3, … in previous months might be used to forecast sales for lead times k=1, 2,
3, …, 12 months ahead.
Denote by Ŷt(k) the forecast made at origin t of Yt+k at some future time t + k, that is, at lead time k. The function Ŷt(k), k = 1, 2, ..., which provides the forecasts at origin t for all future lead times, will be called the forecast function at origin t. Our objective is to obtain a forecast function such that the mean square of the deviations Yt+k - Ŷt(k) between the actual and forecasted values is as small as possible for each lead time k.
We shall be concerned with forecasting a value Yt+k, k≥1, when we are currently standing at
time t. This forecast is said to be made at origin t for lead time k.
Now suppose, standing at origin t, that we are to make a forecast Ŷt(k) of Yt+k which is to be a linear function of current and previous observations Yt, Yt-1, Yt-2, .... Then it will also be a linear function of current and previous shocks εt, εt-1, εt-2, ....
The error of the forecast Ŷt(k) at lead time k is et(k) = Yt+k - Ŷt(k). The standard criterion to use in obtaining the best forecast is the mean squared error, for which the expected value of the squared forecast error, E[Yt+k - Ŷt(k)]², is minimized.
Let us denote by Et[Yt+k] the conditional expectation E[Yt+k | Yt, Yt-1, ...] of Yt+k given knowledge of all Y's up to time t. We will assume that the εt are a sequence of independent random variables. Then the conditional expectation of εt+j given knowledge of all Y's up to time t, denoted Et[εt+j], is zero; that is, E[εt+j | Yt, Yt-1, ...] = 0 for j > 0.
It can be shown that Ŷt(k) = Et[Yt+k]. Thus the minimum mean square error forecast at origin t, for lead time k, is the conditional expectation of Yt+k at time t. When Ŷt(k) is regarded as a function of k for fixed t, it will be called the forecast function for origin t. We note that a minimum requirement on the random shocks εt in the ARIMA model, in order for the conditional expectation Et[Yt+k], which always equals the minimum mean square error forecast, to coincide with the minimum mean square error linear forecast, is that Et[εt+j] = 0 for j > 0.
The one-step-ahead forecast error is et(1) = Yt+1 - Ŷt(1) = εt+1. It follows that, for a minimum mean square error forecast, the one-step-ahead forecast errors must be uncorrelated. Although the optimal forecast errors at lead time 1 are uncorrelated, the forecast errors for longer lead times will in general be correlated.
To calculate the conditional expectations that occur in the forecast functions, we note that if j is a non-negative integer,
Et[Yt-j] = Yt-j,                         j = 0, 1, 2, ...
Et[Yt+j] = Ŷt(j),                        j = 1, 2, ...
Et[εt-j] = εt-j = Yt-j - Ŷt-j-1(1),      j = 0, 1, 2, ...
Et[εt+j] = 0,                            j = 1, 2, ...
Therefore, to obtain the forecast Ŷt(k), one writes down the model for Yt+k and treats the terms according to the following rules:
1. The Yt-j (j = 0, 1, 2, ...), which have already happened at origin t, are left unchanged.
2. The Yt+j (j = 1, 2, ...), which have not yet happened, are replaced by their forecasts Ŷt(j) at origin t.
3. The εt-j (j = 0, 1, 2, ...), which have happened at origin t, are available from εt-j = Yt-j - Ŷt-j-1(1).
4. The εt+j (j = 1, 2, ...), which have not happened, are replaced by zeros.
In the expressions εt = Yt - Ŷt-1(1) and εt-1 = Yt-1 - Ŷt-2(1), the forecasting process may be started off initially by setting the unknown ε's equal to their unconditional expected value of zero.
Examples
i) MA(1) process: Yt = μ + εt - θ1εt-1, with |θ1| < 1. Then Yt+k = μ + εt+k - θ1εt+k-1. Applying the rules above, Ŷt(1) = μ - θ1εt, and for k ≥ 2, Ŷt(k) = μ. That is, the unconditional mean is the optimal forecast of Yt+k for k = 2, 3, ....
ii) MA(2) process: Yt = μ + εt - θ1εt-1 - θ2εt-2, with θ1 + θ2 < 1, θ2 - θ1 < 1 and |θ2| < 1. The model for Yt+k is Yt+k = μ + εt+k - θ1εt+k-1 - θ2εt+k-2. Hence Ŷt(1) = μ - θ1εt - θ2εt-1, Ŷt(2) = μ - θ2εt, and for k ≥ 3, Ŷt(k) = μ; i.e., the unconditional mean is the optimal forecast of Yt+k for k = 3, 4, ....
Similarly, it is possible to show that, after q forecast steps, the optimal forecasts of invertible
MA(q) processes, q > 1 are equal to the unconditional mean of the process and that the variance
of the forecast error is equal to the variance of the underlying process.
Example: The time series model fitted to some historical data is the MA(2) process Yt = 20 + εt + 0.45εt-1 - 0.35εt-2. If the first four observations are 17.5, 21.36, 18.24 and 16.91, respectively, find the forecasts for periods 5, 6, 7, ... from origin t = 4.
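A sketch of the calculation (added here; the start-up values ε0 = ε-1 = 0 follow the rule stated above, and the forecasts printed are whatever that recursion yields):

# Minimal sketch: forecasts for the fitted MA(2) model
# Y_t = 20 + e_t + 0.45 e_{t-1} - 0.35 e_{t-2}, with unknown starting shocks set to zero.
mu, t1, t2 = 20.0, 0.45, -0.35           # model: Y_t = mu + e_t + t1*e_{t-1} + t2*e_{t-2}
y = [17.5, 21.36, 18.24, 16.91]

eps = [0.0, 0.0]                          # e_{-1} = e_0 = 0 (start-up values)
for obs in y:                             # recover e_1, ..., e_4 from the observations
    eps.append(obs - mu - t1 * eps[-1] - t2 * eps[-2])

e4, e3 = eps[-1], eps[-2]
f1 = mu + t1 * e4 + t2 * e3               # Y_hat_4(1): future shocks replaced by zero
f2 = mu + t2 * e4                         # Y_hat_4(2)
f3 = mu                                   # Y_hat_4(k) = mu for k >= 3
print(round(f1, 2), round(f2, 2), f3)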
ARMA(1,1) process: for k > 1, the forecast function satisfies Ŷt(k) = α + φ1Ŷt(k - 1), where α is the constant term of the model.
IMA(0,1,1) process: ∇Yt = (1 - θ1B)εt, that is, Yt = Yt-1 + εt - θ1εt-1. For k > 1, we have Ŷt(k) = Ŷt(k - 1) as the eventual forecast function for the IMA(0,1,1) process.