Time Series 1


1. INTRODUCTION

1.1. Definition

A time series is a sequence of observations that are arranged according to the time of their
outcome. Many sets of data appear as time series: a monthly sequence of the quantity of goods
shipped from a factory, a weekly series of the number of road accidents, daily stock prices and
weekly interest rates in the newspapers' business sections, and hourly wind speeds, daily
maximum and minimum temperatures and annual rainfall in meteorological records. An intrinsic
feature of a time series is that, typically, adjacent observations are dependent. The nature of this
dependence among observations of a time series is of considerable practical interest, and time
series analysis is concerned with techniques for analyzing this dependence. Time series analysis
is the analysis of data organized across units of time.

Time series data provide useful information about the physical, biological, social or economic
systems.

Economic and financial time series: Many time series are routinely recorded in economics and
finance. Examples include share prices on successive days, export totals in successive months,
average incomes in successive months, and company profits in successive years and so on.

The average monthly price of a certain crop in a town, measured in successive fiscal years from
1993 to 2002, is given in Figure 1.1. This series is of particular interest to economic historians
and is available in many places. The time plot shows some apparent cyclic and trend behavior.
Figure 1.1: Plot of Price

Physical time series: Many types of time series occur in the physical sciences, particularly in
meteorology, marine science and geophysics. Examples are rainfall on successive days, and air
temperature measured in successive hours, days or months. Figure 1.2 shows the average weekly
maximum temperature of a country measured over 10 successive years in five months. The time
plot shows a clear outlier at observation 141, which needs adjustment using outlier-adjustment
techniques.

Figure 1.2: The average maximum temperature in successive weeks over 5 months per 10 years.

Marketing time series: The analysis of time series arising in marketing is an important problem
in commerce. As an example, Figure 1.3 shows the sales of an engineering product by a certain
company in successive months over a 7-year period, as originally analyzed by Chatfield and
Prothero (1973). It is often important to forecast future sales so as to plan production. It may also
be of interest to examine the relationship between sales and other time series such as advertising
expenditure.

Figure 1.3: Sales of an industrial heater in successive months from Jan. 1965 to Nov. 1971.

Demographic time series: Various time series occur in the study of population change.
Examples include the population of the country measured annually, and monthly birth totals in
the country etc. Demographers want to predict changes in population for as long as ten or twenty
years in the future.

Process control data: In process control, the problem is to detect changes in the performance of
a manufacturing process by measuring a variable, which shows the quality of the process. These
measurements can be plotted against time as in Figure 1.4. When the measurements stray too far
from some target value, appropriate corrective action should be taken to control the process.
Special techniques have been developed for this type of time series problem, and the reader is
referred to a book on statistical quality control (e.g. Montgomery, 1996).

Figure 1.4: A process control chart.


Binary processes: A special type of time series arises when observations can take one of only
two values, usually denoted by 0 and 1 (see Figure 1.5). Time series of this type, called binary
processes, occur in many situations, including the study of communication theory. For example,
the position of a switch, either 'on' or 'off', could be recorded as one or zero, respectively.

Figure 1.5: A realization of a binary process.

Terminology

Univariate time series are those where only one variable is measured over time, whereas
multiple time series are those where more than one variable is measured simultaneously.

Continuous time series: A time series is said to be continuous when observations are made
continuously in time (at every instant of time). The term 'continuous' is used for series of this
type even when the measured variable can only take a discrete set of values.

Discrete time series: A time series is said to be discrete when observations are taken only at
specific times, usually equally spaced. The term 'discrete' is used for series of this type even
when the measured variable is a continuous variable.

Discrete time series can arise in several ways. Given a continuous time series, we could read off
(or digitize) the values at equal intervals of time to give a discrete time series, sometimes called a
sampled series. The sampling interval between successive readings must be carefully chosen so
as to lose little information. A different type of discrete series arises when a variable does not
have an instantaneous value but we can aggregate (or accumulate) the values over equal
intervals of time. Examples of this type are monthly exports and daily rainfalls. Finally, some
time series are inherently discrete, an example being the dividend paid by a company to
shareholders in successive years.

Deterministic Time Series: Much statistical theory is concerned with random samples of
independent observations. The special feature of time-series analysis is the fact that successive
observations are usually not independent and that the analysis must take into account the time
order of the observations. When successive observations are dependent, future values may be
predicted from past observations. If a time series can be predicted exactly, it is said to be
deterministic. However, most time series are stochastic in that the future is only partly
determined by past values, so that exact predictions are impossible and must be replaced by the
idea that future values have a probability distribution, which is conditioned by a knowledge of
past values.

Time plot: A plot of the observations against time. It is the most important step in any time series
analysis. This graph should show up important features of the series such as trend, seasonality,
outliers and changes in structure. The plot is vital, both to describe the data and to help in
formulating a sensible model.

Stationary series: A series whose overall behavior remains the same over time; it fluctuates
around a constant mean. A time series 'looks' stationary if the time plot of the series appears
'similar' at different points along the time axis.

1.2. Objectives of Time Series

There are several possible objectives in analyzing a time series. These objectives may be
classified as description, explanation, prediction and control.

a. Description
When presented with a time series, the first step in the analysis is usually to plot the data and to
obtain simple descriptive measures of the main properties of the series. If a time series contains
trend, seasonality or some other systematic component and correlations between successive
observations, the usual summary statistics can be seriously misleading and should not be
calculated. Moreover, even when a series does not contain any systematic components, the
summary statistics do not have their usual properties.

b. Explanation
When observations are taken on two or more variables, it may be possible to use the variation in
one time series to explain the variation in another series or ascertaining the leading, lagging and
feedback relationships among several series. A univariate model for a given variable is based
only on past values of that variable, while a multivariate model for a given variable may be
based, not only on past values of that variable, but also on present and past values of other
(predictor) variables. In the latter case, the variation in one series may help to explain the
variation in another series. For example, it is of interest to see how sea level is affected by
temperature and pressure, and to see how sales are affected by price and economic conditions.

c. Prediction
Given an observed time series, one may want to predict the future values of the series. This is an
important task in sales forecasting, and in the analysis of economic and industrial time series.
Many writers use the terms 'prediction' and 'forecasting' interchangeably, but some authors do
not. For example, Brown (1963) uses 'prediction' to describe subjective methods and
'forecasting' to describe objective methods.

d. Control
Time series are sometimes collected or analyzed so as to improve control over some physical or
economic system. For example, when a time series is generated that measures the 'quality' of a
manufacturing process, the aim of the analysis may be to keep the process operating at a 'high'
level and to design an optimal control scheme. Prediction is closely related to control problems in
many situations. For example, if one can predict that a manufacturing process is going to move
off target, then appropriate corrective action can be taken.

1.3. Significance of time series analysis

- It helps in understanding past behavior in the data.


- It helps to understand “what changes have taken place in the past”
- It helps in planning future operation
- It helps in evaluating current accomplishment
- It facilitates comparison

1.4. Components of Time Series


Traditional methods of time series analysis are mainly concerned with decomposing the variation
in a series into trend, seasonal variation, other cyclic changes, and irregular fluctuations. This
approach is not always the best but is particularly valuable when the variation is dominated by
trend and/or seasonality. However, it is worth noting that the decomposition is generally not
unique unless certain assumptions are made. Thus some sort of modeling, either explicit or
implicit, may be involved in these descriptive techniques, and this blurs the borderline between
descriptive and inferential techniques.

Trend Component: A trend is an evolutionary movement, either upward or downward, in the
value of the variable. This type of component is present when a series exhibits steady upward
growth or a downward decline, at least over several successive time periods, when allowance has
been made for the other components. It may be loosely defined as 'long-term change in the mean
level'. A difficulty with this definition is deciding what is meant by 'long term'.

For example, climatic variables sometimes exhibit cyclic variation over a very long time period
such as 50 years. If one had just 20 years of data, this long-term oscillation may look like a trend,
but if several hundred years of data were available, then the long-term cyclic variation would be
visible. Nevertheless, in the short term it may still be more meaningful to think of such a long-
term oscillation as a trend. Thus, in speaking of a 'trend', we must take into account the number
of observations available and make a subjective assessment of what is meant by the phrase 'long
term'. See Figure 1.1 above.

Seasonal Component: This is also known as periodicity. This type of component is generally
annual in period and arises for many series, whether measured weekly, monthly or quarterly,
when similar patterns of behavior are observed at particular times of the year. It describes any
regular fluctuation with a period of less than one year. For example, the costs of various types
of fruits and vegetables and average daily rainfall all show marked seasonal variation. See
Figure 1.6 below.
Figure 1.6: Series with a seasonal component

Cyclic Component: Apart from seasonal effects, some time series exhibit variation at a fixed
period due to some other physical cause. This includes regular cyclic variation at periods other
than one year. In addition some time series exhibit oscillations, which do not have a fixed period
but which are predictable to some extent. For example, economic data are sometimes thought to
be affected by business cycles with a period varying from about 3 or 4 years to more than 10
years, depending on the variable measured. However, the existence of such business cycles is the
subject of some controversy, and there is increasing evidence that any such cycles are not
symmetric.

Irregular Component: The phrase 'irregular fluctuations' is often used to describe any variation
that is 'left over' when the other components of the series (trend, seasonal and cyclical) have been
accounted for. As such, they may or may not be random.

1.5. Models of Time Series

A time series model describes the process that generates the time series data using mathematical
and/or statistical expressions. In a simple model, the original data at any time point (denoted by
Yt) may be expressed as a function f of the components: the trend, the seasonality, the cyclical
component and the irregularity. That is, Yt = f(Tt, St, Ct, It). There are usually two forms for the
function f:

An additive model:
Yt = Tt+ St+ Ct +It
And a multiplicative model:
Yt = Tt*St *Ct*It, where
Yt = observation for period t,
Tt = trend component for period t,
St = seasonal component for period t,
Ct = cyclical component for period t
It = irregular component for period t
Classical approaches to decomposing a time series into these patterns are additive and
multiplicative (in practice, there are many different decomposition algorithms). The additive
model is appropriate if the magnitude (amplitude) of the seasonal variation does not vary with
the level of the series, while the multiplicative version is more appropriate if the amplitude of the
seasonal fluctuations increases or decreases with the average level of the time series.

1.6. Editing Time Series Data

Analysts generally like to think they have 'good' data, meaning that the data have been carefully
collected with no outliers or missing values. In reality, this does not always happen, so that an
important part of the initial examination of the data is to assess the quality of the data and
consider modifying them, if necessary.

The process of checking through data is often called cleaning the data, or data editing. It is an
essential precursor to attempts at modeling data. Data cleaning could include modifying
outliers, identifying and correcting obvious errors and filling in (or imputing) any missing
observations. This can sometimes be done using fairly crude devices, such as down weighting
outliers to the next most extreme value or replacing missing values with an appropriate mean
value. Data cleaning often arises naturally during a simple preliminary descriptive analysis. In
particular, in time-series analysis, the construction of a time plot for each variable is the most
important tool for revealing any oddities such as outliers and discontinuities.
Missing Value
When a series does not have too many missing observations, it may be possible to perform some
missing data analysis, estimation and replacement.
When observations are missing at random, it may be desirable to estimate, or impute, the missing
values so as to have a complete time series. A crude missing data replacement method is to plug
in the mean for the overall series. Another algorithm is to take the mean of the adjacent
observations. Missing value replacement in exponential smoothing often applies one-step-ahead
forecasting from the previous observations.
Caution!! Nonetheless, if there are too many observations missing, the series may simply be
unusable.

Outliers
Outliers, or aberrant observations, are often clearly visible in the time plot of the data. If they are
obviously errors, then they need to be adjusted or removed. Instead of adjusting or removing
outliers, an alternative approach is to use robust methods, which automatically down weight
extreme observations. Running median smoothers (also called Odd-span moving medians)
are effective data smoothers when time series data may be contaminated with unusual values.
The moving median of span-3 is a very popular and effective data smoother, where mt[3] =
median(Yt-1, Yt, Yt+1). For example, consider the sequence of observations: 15, 18, 13, 12, 16,
14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 19 and 25. Here 200 is an unusual value that
deserves special attention and should likely not be analyzed along with the rest of the data set.
Applying the moving median of span 3 at that point, median(18, 200, 19) = 19, so 200 is replaced
by 19 and the smoothed data become: 15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 19, 19, 14, 21, 24, 19, 19 and 25.
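As a concrete illustration of the span-3 moving median, the following Python sketch replaces a point by its span-3 median only when it deviates strongly from that median, which reproduces the edit of 200 to 19 in the example above. The helper names running_median3 and replace_outliers, and the threshold of 10, are illustrative choices, not from the text.

```python
from statistics import median

def running_median3(y):
    """Span-3 running median m_t = median(y[t-1], y[t], y[t+1]); endpoints kept as-is."""
    m = list(y)
    for t in range(1, len(y) - 1):
        m[t] = median(y[t - 1:t + 2])
    return m

def replace_outliers(y, threshold=10):
    """Replace a point by its span-3 median only when it deviates strongly from it."""
    m = running_median3(y)
    return [mt if abs(yt - mt) > threshold else yt for yt, mt in zip(y, m)]

series = [15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 19, 25]
print(replace_outliers(series))   # 200 is replaced by median(18, 200, 19) = 19
```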
2. TEST OF RANDOMNESS

2.1. Introduction

A time series in which the observations fluctuate around a constant mean, have a constant
variance and are statistically independent is called a random time series. In other words, the time
series does not exhibit any pattern:

- The observations do not trend upwards or downwards,
- The variance does not increase or decrease over time, and
- The observations do not tend to be larger in some periods than in other periods.
A random model can be written as: Yt = μ + et, where μ is a constant (the overall mean of Yt) and
et is the residual (or error) term, which is assumed to have zero mean and constant variance, with
the et's statistically independent.

One can examine whether a time series is random either visually, by checking whether the time
plot of the series shows any pattern or by looking at the correlogram of the series, or statistically,
by testing whether the observed series could have been generated by a random stochastic process,
for example with tests based on turning points, the difference sign test, the phase length test and
the rank test.

2.2. Statistical Tests

2.2.1 Turning Points Test

This test is based on counting the number of turning points, meaning the number of times there
is a local maximum or minimum in the series. A local maximum is defined to be any observation
Yt such that Yt > Yt-1 and also Yt > Yt+1. A converse definition applies to a local minimum. If the
series really is random, one can work out the expected number of turning points and compare it
with the observed value. The following is the procedure for testing randomness of a series by
turning points.

Count the number of peaks or troughs in the time series plot. A peak is a value greater than its
two neighbors. Similarly, a trough is a value less than its two neighbors. The two (peak and
trough) together are known as Turning Points.

Consider the time series [Y1, Y2, Y3, ..., YN]. The initial value Y1 cannot define a turning point
since we do not know Y0; similarly, the final value YN cannot define a turning point since we do
not know YN+1. Three consecutive values (Yi, Yi+1, Yi+2) are required to define a turning point. In
a random series, these three values can occur in any of six possible orders, and in four of them
there would be a turning point. Hence the probability of finding a turning point in any set of three
values is 4/6 = 2/3.

Now, define a counting variable Ci, where Ci = 1 if Yi+1 is a turning point (i.e., Yi < Yi+1 > Yi+2 or
Yi > Yi+1 < Yi+2) and Ci = 0 otherwise, for i = 1, 2, ..., N-2.

Therefore, the number of turning points p in the series is given by p = ∑Ci, and the expected
number of turning points in N consecutive values is E(p) = ∑E(Ci) = 2(N - 2)/3.

If the observed number of turning points is far from the expected value 2(N - 2)/3, it is unlikely
to have arisen by chance alone; in other words, the series is not random. In order to test whether
the difference between the observed and expected number of turning points is statistically
significant, we have to calculate the variance of p. From combinatorial algebra, var(p) =
(16N - 29)/90. We test the null hypothesis that the series is random against the alternative
hypothesis that the series is not random, with the decision rule: reject the null hypothesis if the
observed p does not lie in the interval E(p) ± 2√var(p). Note that the distribution of p tends to
normality as N tends to infinity; consequently, Z = (p - E(p))/√var(p), and the decision rule
becomes: reject the null hypothesis if |Zcal| > Z(α/2) at significance level α.


Example: Consider the following series at time t (in year) and test randomness of the series
using turning points.

Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 1 102 112 113 100 90 88 85 86 91 92 99 105
Data 2 102 112 88 95 75 103 98 106 98 82 87 92
To apply the turning point test of randomness to these series, first plot each series.

Solutions for data1:

Figure 2.1: Counting turning points

Then from the plot p = 2, and the interval E(p) ± 2√var(p) = 6.67 ± 2(1.35) = (3.98, 9.36). Since
2 is not in the interval, we conclude that the series is not random.

Solutions for data2:


Figure 2.2: Counting turning points

Then from the plot p = 8, and the interval E(p) ± 2√var(p) = 6.67 ± 2(1.35) = (3.98, 9.36). Since
8 is in the interval, we conclude that the series is random.
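The turning points test is simple to automate. The following Python sketch (the function name turning_point_test is illustrative, not from the text) applies the formulas above to Data 1 and Data 2.

```python
import math

def turning_point_test(y, z_crit=1.96):
    """Turning points test of randomness: compare observed p with E(p) = 2(N-2)/3,
    var(p) = (16N-29)/90, using the normal approximation."""
    n = len(y)
    p = sum(1 for i in range(1, n - 1)
            if (y[i] > y[i - 1] and y[i] > y[i + 1]) or
               (y[i] < y[i - 1] and y[i] < y[i + 1]))
    e_p = 2 * (n - 2) / 3
    var_p = (16 * n - 29) / 90
    z = (p - e_p) / math.sqrt(var_p)
    return p, round(e_p, 2), round(var_p, 2), round(z, 2), abs(z) <= z_crit  # True -> random

data1 = [102, 112, 113, 100, 90, 88, 85, 86, 91, 92, 99, 105]
data2 = [102, 112, 88, 95, 75, 103, 98, 106, 98, 82, 87, 92]
print(turning_point_test(data1))   # p = 2, well below E(p) = 6.67 -> not random
print(turning_point_test(data2))   # p = 8, inside E(p) +/- 2*sd   -> random
```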

2.2.2 Difference Sign test

This test consists of counting the number of positive first differences of the series, that is to say,
the number of points where the series increases (we shall ignore points where there is neither an
increase nor a decrease). With a series of N terms we have N - 1 first differences. Let us define a
variable

Di = 1 if Yi+1 - Yi > 0 and Di = 0 otherwise, for i = 1, 2, 3, ..., N-1. Then the number of points of
increase, say W, is given as W = ∑Di, and its distribution is approximately normal with mean
(N - 1)/2 and variance (N + 1)/12 as N becomes large. This is because E(W) = ∑E(Di) = ∑(1/2) =
(N - 1)/2 and Var(W) = (N + 1)/12. The distribution of W tends to normality as N tends to
infinity; consequently, Z = (W - E(W))/√Var(W), and the hypothesis to be tested is: the series is
random (H0) against the series is not random (H1). We decide to reject H0 if |Zcal| > Z(α/2) at
significance level α.

Example 1: Consider the following series and test the randomness of it by applying difference
sign test. [Use 5% SL]

Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 3 35 46 51 46 48 51 46 42 41 43 61 55
Solution: In order to test the randomness of the series, first find the difference of the series and
obtain the number of increasing points.

Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 3 35 46 51 46 48 51 46 42 41 43 61 55
difference - 11 5 -5 2 3 -5 -4 -1 2 18 -6
Di - 1 1 0 1 1 0 0 0 1 1 0
Then W = ∑Di = 6, E(W) = (12 - 1)/2 = 5.5, Var(W) = (12 + 1)/12 = 1.083, Zcal = (6 - 5.5)/√1.083
= 0.48 and Z(0.025) = 1.96. Hence, |Zcal| is less than Zcrit, so we retain H0 at the 5% significance
level and conclude that the series is statistically random.
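A corresponding sketch for the difference sign test is given below (the function name difference_sign_test is illustrative); it reproduces W = 6 and Z ≈ 0.48 for Data 3.

```python
import math

def difference_sign_test(y, z_crit=1.96):
    """Difference sign test: W = number of positive first differences,
    E(W) = (N-1)/2, Var(W) = (N+1)/12, normal approximation."""
    n = len(y)
    w = sum(1 for a, b in zip(y, y[1:]) if b - a > 0)
    e_w = (n - 1) / 2
    var_w = (n + 1) / 12
    z = (w - e_w) / math.sqrt(var_w)
    return w, e_w, round(var_w, 3), round(z, 2), abs(z) <= z_crit  # True -> retain H0 (random)

data3 = [35, 46, 51, 46, 48, 51, 46, 42, 41, 43, 61, 55]
print(difference_sign_test(data3))   # W = 6, Z ~ 0.48 < 1.96 -> random
```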

2.2.3 Phase length Test

A phase is an interval between two turning points (peak and trough, or trough and peak). To
define a phase of length d from the series we require d + 3 points; i.e., a phase of length 1
requires 4 points, a phase of length 2 requires 5 points, and so on.

Consider d + 3 consecutive values. The probability that they form a phase of length d, either
rising or falling, is 2(d² + 3d + 1)/(d + 3)!. Now in a series of length N there are N - d - 2 possible
phases of length d, and the expected number of phases of length d is: E(d) = 2(N - d - 2)(d² + 3d + 1)/(d + 3)!.
The phase length test compares the observed number with the expected number through a
chi-square statistic, with a slight modification of the decision rule.

Step 1: classify the observed and expected counts of phase lengths into three categories: d = 1,
d = 2 and d ≥ 3.

Step 2: calculate χ² = ∑(Od - Ed)²/Ed over the three categories, where Od and Ed are the observed
and expected counts.

Step 3: if χ² ≤ 6.3, compare (6/7)χ² with the chi-square critical value with 2 degrees of freedom;
otherwise compare χ² with the critical value with 2.5 degrees of freedom.

Hypothesis: Series is random (H0) versus series is not random (H1)

Decision: reject H0 if cχ² > χ²(α, df), where c = 1 or 6/7 and df = 2.5 or 2 accordingly.

Example: consider data 2 above and test the randomness of it by using phase length test [Use 5%
sig.level].

Solution:

Figure 2.3: Identifying phase length

d     Observed count     E(d) = 2(N - d - 2)(d² + 3d + 1)/(d + 3)!
1     6                  3.75
2     1                  1.467
≥3    0                  0.3694

χ² = ∑(Od - Ed)²/Ed = (6 - 3.75)²/3.75 + (1 - 1.467)²/1.467 + (0 - 0.3694)²/0.3694 = 1.868 < 6.3,

so we use χ²(0.05, 2) = 5.991 and (6/7)χ² = (6/7)(1.868) = 1.601.

Decision: since (6/7)χ² < χ²(0.05, 2) (i.e., 1.601 < 5.991) we retain H0 at the 5% significance level
and conclude that the series is random.
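The phase length test can be coded as follows. This is a sketch only (the function names are illustrative), and, as in the worked table above, the expected count for the d ≥ 3 category is approximated by the d = 3 formula.

```python
import math

def turning_point_positions(y):
    """Indices (0-based) of local maxima and minima."""
    return [i for i in range(1, len(y) - 1)
            if (y[i] > y[i - 1] and y[i] > y[i + 1]) or
               (y[i] < y[i - 1] and y[i] < y[i + 1])]

def expected_phases(n, d):
    """Expected number of phases of length d in a random series of length n:
    E(d) = 2(n - d - 2)(d^2 + 3d + 1)/(d + 3)!  (as used in the text)."""
    return 2 * (n - d - 2) * (d * d + 3 * d + 1) / math.factorial(d + 3)

def phase_length_test(y):
    n = len(y)
    tp = turning_point_positions(y)
    lengths = [b - a for a, b in zip(tp, tp[1:])]                 # phase lengths
    observed = [sum(1 for d in lengths if d == 1),
                sum(1 for d in lengths if d == 2),
                sum(1 for d in lengths if d >= 3)]
    expected = [expected_phases(n, 1), expected_phases(n, 2),
                expected_phases(n, 3)]                            # d >= 3 approximated at d = 3
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Wallis-Moore modification: for chi2 <= 6.3 compare (6/7)*chi2 with chi-square, 2 df
    stat = chi2 if chi2 > 6.3 else 6 * chi2 / 7
    return observed, [round(e, 4) for e in expected], round(chi2, 3), round(stat, 3)

data2 = [102, 112, 88, 95, 75, 103, 98, 106, 98, 82, 87, 92]
print(phase_length_test(data2))   # observed [6, 1, 0], chi2 ~ 1.87, (6/7)chi2 ~ 1.60 < 5.991
```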

2.2.4 Rank Test

Given a series that may contain a trend pattern, count the number of cases in which each
observation is greater than the previous observations, i.e., the number of pairs with Yt > Ys for
s < t. Let each count be Mt and let M = ∑Mt; then calculate Kendall's correlation coefficient,

r = 4M/(N(N - 1)) - 1, with -1 ≤ r ≤ 1.

Hypothesis: series is random (H0) versus series is not random (H1). Use r as a test statistic,
assuming that its distribution is approximately normal with mean 0 and variance
2(2N + 5)/(9N(N - 1)).

Decision: reject H0 if |r| > Z(α/2)√(2(2N + 5)/(9N(N - 1))); otherwise retain H0.

Example: consider the following series and test the randomness of the series by using rank test.

Time, t 1 2 3 4 5 6 7 8 9 10 11 12
Data 4 10 9 11 10 12 13 12 13 14 12 15 12
Solution:

Time, t 1 2 3 4 5 6 7 8 9 10 11 12 Total
Mt 0 0 2 1 4 5 4 6 8 4 10 4 48
Therefore, r = 4M/(N(N - 1)) - 1 = 4(48)/(12 × 11) - 1 = 192/132 - 1 = 0.4545 ≈ 0.45, and
√(2(2N + 5)/(9N(N - 1))) = √(58/1188) = 0.22, so the critical value is 1.96 × 0.22 = 0.43.

Therefore, since |r| = 0.45 > 0.43, we reject H0 and conclude that the series is not random.
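A short sketch of the rank test follows (the function name rank_test is illustrative); it reproduces M = 48 and r ≈ 0.45 for Data 4.

```python
import math

def rank_test(y, z_crit=1.96):
    """Kendall-type rank test: M = number of pairs (s, t), s < t, with y[t] > y[s];
    r = 4M/(N(N-1)) - 1, Var(r) = 2(2N+5)/(9N(N-1)), normal approximation."""
    n = len(y)
    m = sum(1 for t in range(n) for s in range(t) if y[t] > y[s])
    r = 4 * m / (n * (n - 1)) - 1
    sd = math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return m, round(r, 3), round(sd, 3), abs(r) <= z_crit * sd   # True -> retain H0 (random)

data4 = [10, 9, 11, 10, 12, 13, 12, 13, 14, 12, 15, 12]
print(rank_test(data4))   # M = 48, r ~ 0.45, reject H0 -> trend present
```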

Exercises

1. In a certain time series there are 56 values with 35 turning points. Then, test the
randomness of the series using turning points method. (Use ).
2. Consider Q1 and let the series has 34 phases with a phase-length of 1, 2 and 3 are 23, 7
and 4 respectively. So, what is your decision about the randomness of the series if you
apply phase-length test? (Use )
3. Test the randomness of the series by using difference sign test to distinguish the
randomness of the series having 73 observations and w= 35. (Use )
4. Test the randomness of the series by using rank test to distinguish the randomness of the
series having 73 observations and the sum of positive differences are 39.

3. ESTIMATION OF TREND COMPONENT

3.1. Constant mean model and its estimation


Consider a simple situation where a time series is generated by a constant mean plus a random
error function. The model may then be written as: Yt = μ + et, where μ is a constant mean and
E(et) = 0 for all t. We wish to estimate μ from our observed time series Y1, Y2, Y3, ..., YN. The most
common estimate of μ is the sample mean or average defined as: μ̂ = Ȳ = (1/N)∑Yt. Under these
minimal assumptions, we see that E(Ȳ) = μ; therefore Ȳ is an unbiased estimate of μ.


i) Free Hand Method (High and Low Midpoint)

In order to estimate the trend pattern by the free hand method, the following steps are performed.

- Plot the series against time.
- Connect the high points of each cycle with straight lines and do the same for the low points.
- Determine by interpolation the value on the lines at time t.
- Compute the average of the high and low lines.
- Connect these averages; this approximates the trend of the series.

Figure 3.1: Trend estimation by free hand Method

ii) Semi-Average Methods

This method estimates the trend pattern by dividing the series into equal parts and computing the
average of each part.

Example: Estimate the trend pattern by the semi-average method by dividing the following series
into 4 equal parts.
T                    1    2    3    4     5    6     7     8    9     10   11   12
Yt                   9    8    9    12    9    12    11    7    13    9    11   12
T̂t (semi-average)        8.67            11              10.33           10.67


Figure 3.2: Trend estimation by Semi-average Method


iii) Least Square Method

Suppose that all observations from the origin of time through the current period, say Y1, Y2,
Y3, ..., YN, are available. The least squares criterion is to choose μ̂ so as to minimize the sum of
squared errors (SSE); i.e., SSE = ∑(Yt - μ̂)². Setting dSSE/dμ̂ = 0, one obtains μ̂ = (1/N)∑Yt = Ȳ.

Therefore, μ̂ = Ȳ is the estimate of the trend pattern in the series.

Example: estimate the trend for the following time series using the least square methods.

T 1 2 3 4 5 6 7 8 9 10 11 12
Data1 10 9 11 10 12 13 12 13 14 12 15 12
Data2 14 15 10 14 17 12 15 11 12 18
Estimated values for the trend component are 11.92 and 13.8 for Data1 and Data2 respectively.

Figure 3.3: Trend estimation by Least Square Method

iv) Moving Average Method

In the ordinary least squares method above, the arithmetic mean gives all past observations of the
series equal weight, namely 1/N. Since the value of the unknown parameter can change slowly
with time, it is reasonable to give more weight to observations nearby in time, which are likely
to be close in value. Here there are two moving average methods to estimate the trend component.

Method 1: Simple Moving Average (odd order, k). It is denoted by kMA and is calculated for all t
except those at the very beginning and end of the series. At each period, the oldest observation is
discarded and the newest is included.

Formula: T̂t = Mt = kMA = (Yt-m + ... + Yt + ... + Yt+m)/k, where m = (k - 1)/2

and Yt is the midpoint in the range of k.

Method 2: Centered Moving Average (even order, k). When we compute the average of the first
k periods, we could place the average in the middle of the interval of k periods. This works well
with odd periods, but not so well for even periods: where would we place the first moving
average when k is even? Technically, the moving average would fall at t = 2.5, 3.5, .... To avoid
this problem we smooth the moving averages by taking a moving average of the moving
averages, called the Centered Moving Average (CMA) and denoted by 2×kMA. This method
weights the values being averaged by assigning 1/(2k) to the first and last values and 1/k to the
middle values; that is, greater weight is given to the middle values of the window and less weight
to the two extreme values.

Formula: 2×kMA = [(1/2)Yt-m + Yt-m+1 + ... + Yt+m-1 + (1/2)Yt+m]/k, where m = k/2 and k is an
even number.

Example: Consider the following series and estimate the trend pattern by taking 3, 5 and 4
period moving average methods (that is k =3, 5 and 4).

t 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
Solution: for instance, the first 13 in the 3MA row is (1/3)(14 + 15 + 10), and 13.63 in the
2×4MA row is (1/8)(14 + 2×15 + 2×10 + 2×14 + 17).

t        1    2     3      4      5      6      7      8      9      10
Yt       14   15    10     14     17     12     15     11     12     18
3MA      -    13    13     13.67  14.33  14.67  12.67  12.67  13.67  -
5MA      -    -     14     13.6   13.6   13.8   13.4   13.6   -      -
2×4MA    -    -     13.63  13.63  13.88  14.13  13.13  13.25  -      -

Figure 3.4: Trend estimation by Moving Average Method

Therefore, the estimated trend T̂t is either 3MA, 5MA or 2×4MA, and the best one is the estimate
that has the minimum mean square error (MSE) among the three.
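The two moving-average formulas above are straightforward to code. The following Python sketch (the function names simple_moving_average and centered_moving_average are illustrative) reproduces the 3MA, 5MA and 2×4MA rows of the table.

```python
def simple_moving_average(y, k):
    """k-term simple moving average (k odd); defined for m <= t <= N-1-m, m = (k-1)//2."""
    m = (k - 1) // 2
    return {t: sum(y[t - m:t + m + 1]) / k for t in range(m, len(y) - m)}

def centered_moving_average(y, k):
    """2 x k centered moving average (k even): weights 1/(2k) on the two end values
    and 1/k on the k-1 middle values."""
    m = k // 2
    out = {}
    for t in range(m, len(y) - m):
        window = y[t - m:t + m + 1]
        out[t] = (window[0] / 2 + sum(window[1:-1]) + window[-1] / 2) / k
    return out

y = [14, 15, 10, 14, 17, 12, 15, 11, 12, 18]
print(simple_moving_average(y, 3))     # e.g. value at t=1 (2nd obs) is (14+15+10)/3 = 13
print(simple_moving_average(y, 5))
print(centered_moving_average(y, 4))   # e.g. value at t=2 is (14 + 2*15 + 2*10 + 2*14 + 17)/8
```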

v) Exponential Smoothing Method

This is a very popular scheme for producing a smoothed time series. Whereas in single moving
averages the past observations are weighted equally, exponential smoothing assigns
exponentially decreasing weights as the observations get older. In other words, recent
observations are given relatively more weight than the older observations. In the case of moving
averages, the weights assigned to the observations are the same and equal to 1/k, except for the
beginning and end values (weight 1/(2k)) in the centered moving average method. In exponential
smoothing, the effect of recent observations is expected to decline exponentially over time: the
further back along the historical time path one travels, the less influence each observation has on
the trend estimate. To represent this geometric decline in influence, an exponential weighting
scheme is applied in a procedure referred to as simple (single) exponential smoothing (Gardiner,
1987). In exponential smoothing, however, there is a smoothing parameter to be determined (or
estimated), and this choice determines the weights assigned to the observations.
In exponential smoothing, a new estimate is the combination of the previous estimate and the
estimating error, et = Yt - Ŷt-1, generated in the present time period. That is,

Ŷt = Ŷt-1 + α·et = Ŷt-1 + α(Yt - Ŷt-1) = αYt + (1 - α)Ŷt-1, where

Yt = the actual data in the present time period,

Ŷt-1 = the estimated (smoothed) value from the previous time period,

et = Yt - Ŷt-1 = the estimating error for the present time period,

α = a weight or smoothing constant (0 < α < 1), and

Ŷt = the new estimated (smoothed) value for the present time period.

Therefore, Ŷt = αYt + (1 - α)Ŷt-1 is known as Simple Exponential Smoothing.

Why is it called “exponential”? Let us expand the simple exponential smoothing equation by
first substituting for Ŷt-1 to obtain:

Ŷt = αYt + (1 - α)[αYt-1 + (1 - α)Ŷt-2] = αYt + α(1 - α)Yt-1 + (1 - α)²Ŷt-2.

By substituting for Ŷt-2, and so forth, until we reach Ŷ0, we ultimately get:

Ŷt = α ∑ (1 - α)^j Yt-j + (1 - α)^t Ŷ0, where the sum runs over j = 0, 1, ..., t-1.

For example, the expanded equation for t = 4 gives the smoothed value:

Ŷ4 = αY4 + α(1 - α)Y3 + α(1 - α)²Y2 + α(1 - α)³Y1 + (1 - α)⁴Ŷ0,

where Ŷ0 is the starting value of the exponential smoothing and plays an important role in
computing all the subsequent exponentially weighted smoothed averages. Setting Ŷ0 = Y1 is one
method of initialization.

Another way is to set it to the target of the process, which may be:

- A simple average of the most recent k observations, if historical data are available,
- Some subjective prediction, if there are no reliable past data available.
Example: Estimate the trend component by the simple exponential smoothing method for the
series given below. [Use α = 0.1 and Ŷ0 = Y1.]

T 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
Solution: from Ŷt = αYt + (1 - α)Ŷt-1 we obtain:

Ŷ1 = 0.1·Y1 + (1 - 0.1)·Ŷ0 = 0.1·14 + 0.9·14 = 14

Ŷ2 = 0.1·Y2 + (1 - 0.1)·Ŷ1 = 0.1·15 + 0.9·14 = 14.1

Ŷ3 = 0.1·Y3 + (1 - 0.1)·Ŷ2 = 0.1·10 + 0.9·14.1 = 13.7, and so on.

t 1 2 3 4 5 6 7 8 9 10
Yt 14 15 10 14 17 12 15 11 12 18
̂t 14 14.1 13.7 13.7 14 13.8 14 13.7 13.5 14

Figure 3.5: Trend estimation by Simple Exponential Smoothing Method
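A minimal Python sketch of simple exponential smoothing follows (the function name simple_exponential_smoothing is illustrative); with α = 0.1 and Ŷ0 = Y1 it reproduces the values in the table above.

```python
def simple_exponential_smoothing(y, alpha, s0=None):
    """S_t = alpha*y_t + (1 - alpha)*S_{t-1}; S_0 defaults to y_0 (the text's initialization)."""
    s = y[0] if s0 is None else s0
    smoothed = []
    for yt in y:
        s = alpha * yt + (1 - alpha) * s
        smoothed.append(round(s, 2))
    return smoothed

y = [14, 15, 10, 14, 17, 12, 15, 11, 12, 18]
print(simple_exponential_smoothing(y, 0.1))   # 14.0, 14.1, 13.69, 13.72, ... as in the table
```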


Properties of Simple Exponential Smoothing

For t sufficiently large that (1 - α)^t Ŷ0 is close to zero, the weights α(1 - α)^j decrease
geometrically and their sum is approximately unity; then the exponential smoothing process gives
an unbiased estimate of μ, that is, E(Ŷt) = μ.

Proof: Consider the constant mean model, i.e., Yt = μ + et; then

Yt-j = μ + et-j and E(Yt-j) = μ + E(et-j) = μ + 0 = μ.

This is because E(Ŷt) = E(α ∑ (1 - α)^j Yt-j + (1 - α)^t Ŷ0)

= α ∑ (1 - α)^j E(Yt-j) + (1 - α)^t Ŷ0

≈ α ∑ (1 - α)^j μ   (for large t)

= αμ · 1/(1 - (1 - α)) = αμ/α = μ.

Therefore, E(Ŷt) = μ; hence simple exponential smoothing is an unbiased estimator of the
constant mean.

What is the “best” value for α? The problem faced here is: how do we find an appropriate
value for α? In general the value of α should lie between 0.1 and 0.9 (see Chatfield, 2002). A
smaller smoothing constant gives relatively more weight to the observations in the more distant
past, while a larger smoothing constant, within these bounds, gives more weight to the most
recent observation and less weight to the most distant observations.

In practice, its value is found by trial and error. This means trying different values of α in the
given interval and computing the mean square error, MSE = (1/N)∑(Yt - Ŷt)², for each value.
The value of α that yields the smallest MSE is an appropriate value for α.

Example: Consider the following data set consisting of 12 observations taken over time and
estimate the trend component at time t by simple exponential smoothing, assuming Ŷ0 = 71 and
α = 0.1 or 0.5. Which value of α is more appropriate? Why?

t 1 2 3 4 5 6 7 8 9 10 11 12
Yt 71 70 69 68 64 65 72 78 75 75 75 70
Solution:

t                 1    2     3      4      5      6      7      8      9      10     11     12     MSE
Yt                71   70    69     68     64     65     72     78     75     75     75     70
Ŷt (α = 0.1)      71   70.9  70.71  70.44  69.8   69.32  69.58  70.43  70.88  71.29  71.67  71.5
Ŷt (α = 0.5)      71   70.5  69.75  68.88  66.44  65.72  68.86  73.43  74.21  74.61  74.8   72.4
Error (α = 0.1)   0    -0.9  -1.71  -2.44  -5.8   -4.32  2.42   7.57   4.12   3.71   3.33   -1.5   14.1
Error (α = 0.5)   0    -0.5  -0.75  -0.88  -2.44  -0.72  3.14   4.57   0.79   0.39   0.2    -2.4   3.78

Estimation of the trend using α = 0.5 is better than that using α = 0.1, because the mean square
error for α = 0.5 is smaller than that for α = 0.1.


Figure 3.6: Trend estimation by Simple Exponential Smoothing Method and comparing MSE.
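In practice the search over α can be automated. The following sketch (the function name mse_for_alpha is illustrative) computes the MSE of the smoothed values for several candidate values of α on the series above.

```python
def mse_for_alpha(y, alpha, s0):
    """Mean square error of the exponentially smoothed values against the data."""
    s, sq_err = s0, 0.0
    for yt in y:
        s = alpha * yt + (1 - alpha) * s
        sq_err += (yt - s) ** 2
    return sq_err / len(y)

y = [71, 70, 69, 68, 64, 65, 72, 78, 75, 75, 75, 70]
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(alpha, round(mse_for_alpha(y, alpha, s0=71), 2))
# alpha = 0.5 gives a smaller MSE than alpha = 0.1 (about 3.8 versus 14.1)
```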
3.2. Linear Trend Estimation

A time series that exhibits a trend is a non-stationary time series. Modeling and forecasting of
such a time series is greatly simplified if we can eliminate the trend. One way to do this is to fit a
regression model describing the trend component to the data and then subtracting it out of the
original observations, leaving a set of residuals that are free of trend. The trend models that are
usually considered are the linear trend, in which the mean of Yt is expected to change linearly
with time as in E(Yt) = β0 + β1t.

i) Least Square Estimation Method

Assume that there are T periods of data, say Y1, Y2, Y3, ..., YT. Let the estimates of β0 and β1 be
β̂0 and β̂1 respectively. Thus, T̂t = β̂0 + β̂1t is known as the fitted model, and the difference
between the data and the fitted model is the residual et = Yt - T̂t. To estimate β̂0 and β̂1 by the
method of least squares, we must choose β̂0 and β̂1 so that the error sum of squares is as small as
possible. That is:

SSE = ∑et² = ∑(Yt - T̂t)² = ∑(Yt - β̂0 - β̂1t)² is minimum.

It is necessary that β̂0 and β̂1 satisfy the conditions:

∂SSE/∂β̂0 = 0 ----------------------(1) and ∂SSE/∂β̂1 = 0 ------------------------------------(2)

First let's consider equation (1):

∂SSE/∂β̂0 = -2∑(Yt - β̂0 - β̂1t) = 0

➱ Tβ̂0 + β̂1∑t = ∑Yt

➱ β̂0 = (1/T)∑Yt - β̂1(1/T)∑t; we know that ∑t = T(T + 1)/2,

➱ β̂0 = Ȳ - β̂1(T + 1)/2 -------------------------------------------------------(3)

Now consider equation (2):

∂SSE/∂β̂1 = -2∑t(Yt - β̂0 - β̂1t) = 0

➱ ∑tYt - β̂0∑t - β̂1∑t² = 0

➱ β̂0∑t + β̂1∑t² = ∑tYt -------------------------------------(4)

Substitute equation (3) into equation (4):

[Ȳ - β̂1(T + 1)/2]∑t + β̂1∑t² = ∑tYt

➱ β̂1[∑t² - (∑t)²/T] = ∑tYt - (∑t)(∑Yt)/T

➱ β̂1 = [∑tYt - (∑t)(∑Yt)/T] / [∑t² - (∑t)²/T] ---------------------------------------(5)

Thus, consider equation (3) and solve for β̂0:

β̂0 = (1/T)∑Yt - β̂1(1/T)∑t

➱ β̂0 = Ȳ - β̂1 t̄ --------------------------------(6)

Therefore, T̂t = β̂0 + β̂1t is the estimate of the linear trend by the least squares method.

The magnitude of ̂ indicates the trend (or average rate of change) and its sign indicates the
direction of the trend.

Example: Assume linearity and estimate the trend pattern from the following series by least
square method.

Month Jan Feb Mar Apr May Jun Jul Aug Sep
Price 3 6 2 10 7 9 14 12 18
Solution:

Month   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Total
Price   3    6    2    10   7    9    14   12   18   81
tYt     3    12   6    40   35   54   98   96   162  506
t²      1    4    9    16   25   36   49   64   81   285

Therefore, β̂1 = [∑tYt - (∑t)(∑Yt)/T] / [∑t² - (∑t)²/T] = (506 - 45×81/9)/(285 - 45²/9) = 101/60
= 1.68 and

β̂0 = Ȳ - β̂1 t̄ = 81/9 - 1.68×5 = 0.58.

The fitted model is: T̂t = β̂0 + β̂1t = 0.58 + 1.68t.

Now estimate the error term as:

Month Jan Feb Mar Apr May Jun Jul Aug Sep
Price 3 6 2 10 7 9 14 12 18
̂t 2.26 3.94 5.62 7.30 8.98 10.66 12.34 14.02 15.70
error 0.74 2.06 -3.62 2.70 -1.98 -1.66 1.66 -2.02 2.30

MSE = SSE/9 = 4.89
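The least squares computation can be checked with a short program. The following Python sketch (the function name linear_trend is illustrative) reproduces β̂0 ≈ 0.58, β̂1 ≈ 1.68 and MSE ≈ 4.89 for the price series.

```python
def linear_trend(y):
    """Least squares fit of Y_t = b0 + b1*t with t = 1, 2, ..., T."""
    T = len(y)
    t = list(range(1, T + 1))
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    b1 = (sum_ty - sum_t * sum_y / T) / (sum_t2 - sum_t ** 2 / T)
    b0 = sum_y / T - b1 * sum_t / T
    fitted = [b0 + b1 * ti for ti in t]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / T
    return round(b0, 2), round(b1, 2), [round(f, 2) for f in fitted], round(mse, 2)

price = [3, 6, 2, 10, 7, 9, 14, 12, 18]
print(linear_trend(price))   # b0 ~ 0.58, b1 ~ 1.68, MSE ~ 4.89
```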

ii) Moving Average Method

Unfortunately, neither the mean of all the data nor the moving average of the most recent values
is able to cope with a significant trend. There exists a variation on the moving average procedure
that often does a better job of handling trend. It is called the double moving average for a linear
trend process. It calculates a second moving average from the original moving average, using the
same value of k. As soon as both the single and double moving averages are available, the trend
estimate can be computed. Estimation of the trend in a time series that has a linear relationship
with t therefore proceeds as follows.

Recall that the single moving average is Mt = (Yt-m + ... + Yt + ... + Yt+m)/k, and consider a
double moving average computed from the single moving average, say

Mt[2] = (Mt-m + ... + Mt + ... + Mt+m)/k, where m = (k - 1)/2 and k is the same in both cases;

then the trend estimate under the linear model assumption is obtained as: T̂t = 2Mt - Mt[2].

Example: consider the price series above and estimate the trend by the linear (double) moving
average procedure with k = 3.

Solution:

Month   Price   3MA     3MA[2]   T̂t
Jan     3       -       -        -
Feb     6       3.67    -        -
Mar     2       6.00    5.33     6.67
Apr     10      6.33    7.00     5.67
May     7       8.67    8.33     9.00
Jun     9       10.00   10.11    9.89
Jul     14      11.67   12.11    11.22
Aug     12      14.67   -        -
Sep     18      -       -        -


Figure 3.7: Trend estimation by Double moving average Method from linear model assumption.
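A minimal Python sketch of the double moving average trend follows (the function names moving_average and double_moving_average_trend are illustrative); with k = 3 it reproduces the T̂t column of the table for March through July.

```python
def moving_average(values, k):
    """Centered k-term moving average (k odd) returned as a dict {index: average}."""
    m = (k - 1) // 2
    return {t: sum(values[t - m:t + m + 1]) / k
            for t in range(m, len(values) - m)}

def double_moving_average_trend(y, k):
    """Linear-trend estimate T_t = 2*M_t - M_t^[2], where M^[2] is the k-MA of the k-MA."""
    m1 = moving_average(y, k)
    first = list(m1.values())
    offset = (k - 1) // 2
    m2 = moving_average(first, k)
    return {t + offset: 2 * first[t] - m2[t] for t in m2}   # keys index the original series

price = [3, 6, 2, 10, 7, 9, 14, 12, 18]
print({t: round(v, 2) for t, v in double_moving_average_trend(price, 3).items()})
# {2: 6.67, 3: 5.67, 4: 9.0, 5: 9.89, 6: 11.22}  (Mar ... Jul in the example)
```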

iii) Exponential Smoothing Method

As we previously observed, single smoothing does not follow the data well when there is a trend.
This situation can be improved by the introduction of a second smoothing equation.

In simple exponential smoothing we have:

St = αYt + (1 - α)St-1 = α ∑ (1 - α)^j Yt-j + (1 - α)^t S0.

Letting (1 - α) = λ for convenience, St = α ∑ λ^j Yt-j + λ^t S0.

Applying the same procedure to the smoothed values themselves (second-order exponential
smoothing) and using similar arguments, one gets:

St[2] = αSt + (1 - α)St-1[2].

Then the trend estimate at the end of period t is given by T̂t = 2St - St[2]. This procedure is
referred to as Brown's one-parameter linear exponential smoothing.

The initial values of S0 and S0[2] are obtained from estimates of the two coefficients β̂0 and β̂1,
which may be developed through a simple linear regression analysis of historical data. Given
initial estimates β̂0 and β̂1 from the ordinary least squares method (T̂t = β̂0 + β̂1t fitted to the
series), the initial values are

S0 = β̂0 - [(1 - α)/α]β̂1 and S0[2] = β̂0 - 2[(1 - α)/α]β̂1.

Example: apply Brown's method with α = 0.2 to estimate the linear trend for the price data
given above.

Solution: From the ordinary least squares method, we obtained T̂t = 0.58 + 1.68t.

This means the initial values use β̂0 = 0.58 and β̂1 = 1.68, and then

S0 = β̂0 - [(1 - α)/α]β̂1 = 0.58 - (0.8/0.2)×1.68 = -6.14.

S1 = 0.2×3 + 0.8×(-6.14) = -4.31

S2 = 0.2×6 + 0.8×(-4.31) = -2.25

S3 = 0.2×2 + 0.8×(-2.25) = -1.40, and so on.

S0[2] = β̂0 - 2[(1 - α)/α]β̂1 = 0.58 - 2×(0.8/0.2)×1.68 = -12.86.

S1[2] = 0.2×(-4.31) + 0.8×(-12.86) = -11.15

S2[2] = 0.2×(-2.25) + 0.8×(-11.15) = -9.37

S3[2] = 0.2×(-1.40) + 0.8×(-9.37) = -7.78, and so on.

Finally, T̂t = 2St - St[2]:

T̂1 = 2×(-4.31) - (-11.15) = 2.53

T̂2 = 2×(-2.25) - (-9.37) = 4.87

T̂3 = 2×(-1.40) - (-7.78) = 4.98, and so on.

Month   Price   St      St[2]    T̂t
Jan     3       -4.31   -11.15   2.53
Feb     6       -2.25   -9.37    4.87
Mar     2       -1.40   -7.78    4.98
Apr     10      0.88    -6.04    7.81
May     7       2.10    -4.42    8.62
Jun     9       3.48    -2.84    9.80
Jul     14      5.59    -1.15    12.32
Aug     12      6.87    0.45     13.29
Sep     18      9.10    2.18     16.01


Figure 3.8: Trend estimation by Double ES Method from linear model assumption.
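Brown's recursion is easy to program. The following Python sketch (the function name brown_linear_smoothing is illustrative) reproduces the T̂t column of the table from the initial values derived above.

```python
def brown_linear_smoothing(y, alpha, b0, b1):
    """Brown's one-parameter linear exponential smoothing.
    Initial values: S0 = b0 - ((1-alpha)/alpha)*b1, S0_2 = b0 - 2*((1-alpha)/alpha)*b1,
    where b0, b1 come from a preliminary least squares fit."""
    ratio = (1 - alpha) / alpha
    s1 = b0 - ratio * b1          # single smoothed statistic
    s2 = b0 - 2 * ratio * b1      # double smoothed statistic
    trend = []
    for yt in y:
        s1 = alpha * yt + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
        trend.append(round(2 * s1 - s2, 2))
    return trend

price = [3, 6, 2, 10, 7, 9, 14, 12, 18]
print(brown_linear_smoothing(price, alpha=0.2, b0=0.58, b1=1.68))
# [2.53, 4.87, 4.98, 7.81, 8.62, 9.8, 12.32, 13.29, 16.01]  as in the table
```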
Properties of Exponential Smoothing

In simple exponential smoothing, we have St = α ∑ (1 - α)^j Yt-j + (1 - α)^t S0.

Now taking expected values:

E(St) = E(α ∑ (1 - α)^j Yt-j + (1 - α)^t S0) = α ∑ (1 - α)^j E(Yt-j) + (1 - α)^t S0.

As t becomes large, (1 - α)^t tends to 0, and under the linear trend model E(Yt-j) = β0 + β1(t - j), so

E(St) = α ∑ (1 - α)^j (β0 + β1(t - j))

= β0 α ∑ (1 - α)^j + β1 t α ∑ (1 - α)^j - β1 α ∑ j(1 - α)^j

= β0 + β1 t - β1 α ∑ j(1 - α)^j

= β0 + β1 t - β1 (1 - α)/α,

since α ∑ j(1 - α)^j = (1 - α)/α. This shows that for a linear model, the first-order (simple)
exponentially smoothed statistic will tend to lag behind the true value by an amount equal to
[(1 - α)/α]β1.

Applying the same procedure to double exponential smoothing and using similar arguments, one
gets E(St[2]) = E(St) - [(1 - α)/α]β1 = β0 + β1 t - 2[(1 - α)/α]β1.

Finally, the double exponentially smoothed statistic will tend to lag behind the true value by an
amount equal to 2[(1 - α)/α]β1 under the linear model assumption.

Exercises
1. Consider the following time series data on the monthly sales of a shampoo product in a
certain supermarket and estimate the trend pattern based on the constant mean model
assumption using 3, 5 and 4 period moving averages. Which moving average is best for
estimating the trend?
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
price 266 145.9 183.1 119.3 180.3 168.5 231.8 224.5 192.8 122.9 136.5 185.9

2. A certain company's credit outstanding has been increasing at a relatively constant rate
(in millions) over time, as seen from the following series. Estimate the trend component
based on the linear model assumption using: (I) Least squares, (II) 3 period linear (double)
moving average, (III) Simple exponential smoothing (use a suitable value of α).
Which method is best for estimating the trend?

Year 1 2 3 4 5 6 7 8 9 10 11 12 13

Credit 133 155 165 171 194 231 274 312 313 343 333 360 373

3.3 Non-linear trend and its estimation

Polynomial trend model

In some cases, a linear trend is inadequate to capture the trend of a time series. A natural
generalization of the linear trend model is the polynomial trend model.

Tt = β0 + β1t + β2t² + … + βpt^p,

where p is a positive integer.

Note that the linear trend model is a special case of the polynomial trend model (p=1)

For economic time series we almost never require p > 2. That is, if the linear trend model is not
adequate, the quadratic trend model will usually work:

Quadratic model: Tt = β0 + β1t + β2t²

Estimating Quadratic Trend Model

Our assumption at this point is that our time series, yt, can be modeled as yt = Tt(β) + εt, where
Tt is the quadratic trend model, β denotes the parameters of the quadratic trend model, and εt
denotes the other factors (i.e., the seasonal and cyclical components) that determine yt.

We don't observe the β's and so we will need to estimate them in order to forecast the trend (and,
eventually, y).

The natural approach to estimating the quadratic trend model is the least squares approach:
choose the β's to minimize

∑ [yt - Tt(β)]², summed over t = 1, ..., T.

In the case of the quadratic trend this is a straightforward application of OLS: choose β0, β1, β2
to minimize

∑ [yt - (β0 + β1t + β2t²)]².

That is, run a regression of yt on a constant, t, and t².

It turns out that under the assumptions of the unobserved components model, the OLS estimator
of the linear and quadratic trend models is unbiased, consistent, and asymptotically efficient.
Further, standard regression procedures can be applied to test hypotheses about the β's and
construct interval estimates.
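As a small illustration of this regression, the following Python sketch fits the quadratic trend model by OLS using numpy.polyfit; the data vector is made up for illustration and is not from the text.

```python
import numpy as np

def quadratic_trend(y):
    """OLS fit of T_t = b0 + b1*t + b2*t^2 via a polynomial regression on t = 1..T."""
    t = np.arange(1, len(y) + 1)
    b2, b1, b0 = np.polyfit(t, y, deg=2)     # highest power first
    fitted = b0 + b1 * t + b2 * t ** 2
    return (b0, b1, b2), fitted

# Illustrative data with curvature (not from the text)
y = np.array([5.2, 6.1, 7.9, 10.3, 13.2, 16.8, 21.1, 25.9, 31.4, 37.6])
coefs, trend = quadratic_trend(y)
print(np.round(coefs, 3))
print(np.round(trend, 2))
```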

The Log Linear Trend Model

Another alternative to the linear trend model is the log linear trend model, which is also called
the exponential trend model:

Tt = β0exp(β1t) or, taking natural logs on both sides,

log(Tt) = log(β0) + β1t.

Note that the log of the trend component is linear.

4. ESTIMATION OF SEASONAL COMPONENT


4.1 Introduction
The seasonal component refers to the variation in the time series that occurs within one year.
These movements are more or less consistent from year to year in terms of placement in time,
direction and magnitude. Seasonal fluctuations can occur for many reasons, including natural
conditions like climatic variations; price of cereals is low in harvesting season and high in
sowing season. The term seasonal effect is often used to describe the effects of these factors. It
is detected by measuring the quantity of interest for small time intervals, such as days, weeks,
months or quarters.
A decision maker or analyst can make one of the following assumptions when treating the
seasonal component: Additive (constant from period to period but showing variation within each
period) and Multiplicative (the size of the seasonal effect appears to increase with the mean, with
slight variation from period to period).
The analysis of a time series that exhibits seasonal variation depends on whether one wants to
(1) estimate the seasonal effect, if seasonality is of direct interest, and/or (2) remove it from the
data, if seasonality is not of direct interest. For series showing little trend, it is usually adequate
to estimate the seasonal effect for a particular period (e.g., a quarter) as the average, over the
years, of that period's observation minus the corresponding yearly average in the additive case,
or of that period's observation divided by the yearly average in the multiplicative case.
The following are the main reasons for studying seasonal variation. First, after establishing the
seasonal pattern, methods can be implemented to eliminate it from the time series in order to
study the effect of other components such as cyclical and irregular variations; this elimination of
the seasonal effect is referred to as deseasonalizing or seasonal adjustment of the data. Second,
past seasonal patterns can be projected into the future: knowledge of the seasonal variations is a must for the prediction of
the future trends. Therefore, the process of estimating and removing the seasonal effects from a
time series is called Seasonal Adjustment. Seasonal variation is measured in terms of an index,
called a seasonal index. It is an average that can be used to compare an actual observation
relative to what it would be if there were no seasonal variations. An index value is attached to
each period of the time series within a year. The months (or quarters) we will refer to as periods
and an annual seasonal pattern has a cycle that is 12 periods long, if the periods are months, or 4
periods long if the periods are quarters. This implies that if monthly data are considered there are
12 separate seasonal indices, one for each month and 4 separate seasonal indices, one for each
quarter.
To measure seasonal effects, we calculate a series of seasonal indices. These quantitatively
measure how far above or below a given period stands in comparison with the expected level
(the expected values are represented by a seasonal index of 100%, or 1). The strength of the
seasonal effect on the original data can be seen in each period's seasonal index, which lies above
or below the expected level (a seasonal index of 100%). Practical and widely used methods to compute
these indices are:
- The Simple Averages Method
- Link Relatives Method
- Ratio-to-Moving Average Method
- Ratio-to-Trend Method
No one method is superior to the others, but if the trend component is important, use the ratio-to-
trend method, since the trend component is given better treatment there than in the other
methods; if the cyclical component is important, use the ratio-to-moving-average method, since
the cyclical component is given better treatment there than in the other methods.

4.2 Estimation of Seasonal Pattern for Multiplicative Model


To determine the seasonal pattern in equation: Yt = Tt * St * Ct * It, we must estimate how the
data in the time series vary from month to month, from quarter to quarter… throughout a typical
year. The procedures for calculation of the seasonal indices listed above are summarized below.
4.2.1 The Simple Average Method
This method averages the series values for each period of the cycle and compares them with an
overall average. Under this method a typical figure for each season (day, week, month or quarter)
is obtained.
The steps in its computations are the following.
Step1: Compute the simple average for each year (the data may involve weekly, monthly or
quarterly).
Step2: Divide each of the season values by the corresponding yearly averages and express the
result as a percentage.
Step3: Sort-out these percentages by seasons.
Step4: Find the average percentage of each season and adjust these if their average is not 100.
Example: Calculate the seasonal indices using the simple average method for the following
series, which shows the amount of money (in millions) spent on passenger travel in a certain country.
Quarter I II III IV
Year
1998 71 89 106 78
1999 71 90 108 79
2000 73 91 111 81
2001 76 97 122 89

[Time plot of the quarterly travel expense series]

Solution: Step 1: compute the yearly averages; e.g., 86 = (71 + 89 + 106 + 78)/4.

Year    Average
1998    86
1999    87
2000    89
2001    96

Steps 2-4: express each quarterly value as a percentage of its yearly average; e.g.,
82.6 = (71/86)×100.

Year\Quarter   I      II      III     IV
1998           82.6   103.5   123.2   90.7
1999           81.6   103.4   124.1   90.8
2000           82.0   102.2   124.7   91.0
2001           79.2   101.0   127.1   92.7
Mean = S.I.    81.4   102.5   124.8   91.3

E.g.: 81.4 = (82.6 + 81.6 + 82.0 + 79.2)/4

Note that the mean of the quarterly means is 100, and hence the mean for each quarter is the
same as the seasonal index for that quarter. Therefore, the seasonal indices for quarters I, II, III
and IV are 81.4, 102.5, 124.8 and 91.3 respectively. These show that the effect of the season in
quarters I and IV is a decrease of 18.6% and 8.7% from the grand mean, while the effect of the
season in quarters II and III is an increase of 2.5% and 24.8% from the grand mean, respectively.
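The computation can be written compactly. The following Python sketch (the function name simple_average_seasonal_indices is illustrative) carries out steps 1-4 of the simple average method for the travel-expense table; small differences from the worked figures are due only to rounding.

```python
def simple_average_seasonal_indices(table):
    """Seasonal indices by the simple average method.
    `table` is a list of yearly rows, each with one value per season (e.g. 4 quarters)."""
    n_seasons = len(table[0])
    pct = []
    for row in table:
        year_avg = sum(row) / n_seasons
        pct.append([100 * v / year_avg for v in row])           # step 2
    means = [sum(col) / len(col) for col in zip(*pct)]           # step 3
    grand = sum(means) / n_seasons
    return [round(100 * m / grand, 1) for m in means]            # step 4 adjustment

travel = [[71, 89, 106, 78],
          [71, 90, 108, 79],
          [73, 91, 111, 81],
          [76, 97, 122, 89]]
print(simple_average_seasonal_indices(travel))   # approximately [81.3, 102.6, 124.8, 91.3]
```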
4.2.2 Link Relative Method
This method expresses each figure as a relative of the immediately preceding value (week,
month, quarter). The steps involved are given below.

Step 1: Express each of the seasonal values as a percentage of the previous value. These are
called link relatives and are computed as LRt = (Yt / Yt-1) × 100.

Step 2: Sort out the link relatives by season and obtain the average of the link relatives for each
season, i.e., the sum of the link relatives of a season divided by s, where s is the number of
observations available for that season.

Step 3: Convert these averages into a series of chain relatives by setting the value of the first
season (first week, January or quarter I) to 100. Here the chain relative of any season is obtained
by multiplying the average link relative of that season by the chain relative of the preceding
season and dividing by 100.

That is, CRi = (average LRi × CRi-1)/100, for i = 2, 3, ..., L, with CR1 = 100, where L is the length
of the seasonal cycle. For instance, L = 4 if the series is quarterly.

Step 4: Adjust the chain relatives for the trend effect by subtracting the relevant correction
factor, and then obtain the adjusted seasonal indices.
Example: obtain the seasonal indices for the travel expense data using link relative method.
Solution: step1- calculation of link relatives
Year Quarter Yt (Yt/Yt-1)*100=LR
1 71
2 89 125.4
3 106 119.1
1998 4 78 73.6
1 71 91.0
2 90 126.8
3 108 120.0
1999 4 79 73.1
1 73 92.4
2 91 124.7
3 111 122.0
2000 4 81 73.0
1 76 93.8
2 97 127.6
3 122 125.8
2001 4 89 73.0
Step2- calculations of link relative averages for each quarter
Quarter I II III IV
Year
1998 125.4 119.1 73.6
1999 91 126.8 120 73.1
2000 92.4 124.7 122 73
2001 93.8 127.6 125.8 73
LR Average 92.1 126.1 121.7 73.2
Step3- calculation of chain relatives for each quarter; e.g., 153.5 = (121.7*126.1)/100, and the
new chain relative for quarter I is 103.4 = (92.1*112.3)/100.
Quarter       I     II      III     IV      New CR for I
LR Average    92.1  126.1   121.7   73.2
CR            100   126.1   153.5   112.3   103.4

Step4- calculation of adjusted chain relatives and adjusted seasonal indices


Note that the correction factor is 103.4-100 = 3.4 and the difference is positive, so, we
subtract one-fourth of 3.4, two-fourth of 3.4 and three-fourth of 3.4 from the second, the
third and the fourth quarter of chain relative in order to obtain the mean of adjusted chain
relatives. Here the mean value of adjusted chain relative is 121.4 and hence adjust it to get
adjusted seasonal indices of the series.
125.32= 126.1-(1/4*3.4)
Quarter I II III IV Mean
Adjusted CR 100 125.32 151.05 109.28 121.4
Adjusted S.I. 82.4 103.2 124.4 90.0 100.0

Therefore, the seasonal indices for quarters I, II, III and IV are 82.4, 103.2, 124.4 and 90.0
respectively. These show that the effect of the season in quarters I and IV is decreased by
17.6% and 10.0% from the mean, while the effect of the season in quarters II and III is increased
by 3.2% and 24.4% from the mean, respectively.

4.2.3 Ratio-to-Moving Average Method


Now let us try to understand the measurement of seasonal variation by using the Ratio-to-Moving
Average Method. This technique provides an index to measure the degree of the Seasonal
Variation in a time series. The index is based on a mean of 100, with the degree of seasonality
measured by variations away from the base. This method is also called the percentage moving
average method. This is because the original data values in the time-series are expressed as
percentages of moving averages. The steps of the calculation for seasonal indices are given
below.
Step1: Find the centered (12 monthly or 4 quarterly) moving averages of the original data values
in the time-series.
Step2: Express each original data value of the time-series as a percentage of the corresponding
centered moving average values obtained in step1.In other words, we get
(Original data values)/(Trend values) *100 = (T*C*S*I)/(T*C)*100 = (S*I) *100. This implies
that the ratio–to-moving average represents the seasonal and irregular components.
Step3: Arrange these percentages according to season (months or quarter) of given periods and
then find the grand averages of all season (months or quarters) of the given periods.
Step4: If the average of these indices is not 100, then multiply by a correction factor = 100/
(grand average of monthly or quarterly indices). Otherwise, seasonal averages (averages
of 12 months or 4 quarters) will be considered as seasonal indices.
Example: Let us calculate the seasonal index by the ratio-to-moving average method from the
following data:
Quarter
Year I II III IV
1996 75 60 54 59
1997 86 65 63 80
1998 90 72 66 85
1999 100 78 72 93
Solution: Calculate the centered 4-quarter moving averages and the ratios of the actual data to
the moving averages. The calculations are shown in the table below.
Step1: Calculation of moving average ratios

Year Quarter Yt 2x4MA Ratio-to-MA (%)


1996 1 75
2 60
3 54 41.75 129.34
4 59 40.50 145.68
1997 1 86 40.88 210.40
2 65 44.13 147.31
3 63 48.00 131.25
4 80 46.25 172.97
1998 1 90 46.38 194.07
2 72 49.13 146.56
3 66 51.63 127.85
4 85 50.13 169.58
1999 1 100 50.13 199.50
2 78 53.50 145.79
3 72
4 93

Step 2 and 3: Calculation of seasonal index


Quarter
Year I II III IV
1996 129.34 145.68
1997 210.4 147.31 131.25 172.97
1998 194.07 146.56 127.85 169.58
1999 199.5 145.79
Seasonal Average 201.32 146.55 129.48 162.74

Step4: Now the mean of seasonal averages is not 100 rather it is 160.03(=
¼*{201.32+146.55+129.48+162.74}). Therefore the corresponding correction factor would be
100/160.03 = 0.625. Each seasonal average is multiplied by the correction factor 0.625 to get the
adjusted seasonal indices as shown in the table below. 101.70 = 162.74*0.625
Quarter I II III IV Mean
Adjusted Seasonal Average 125.8 91.6 80.9 101.7 100.0
Therefore, the seasonal indices for quarter I, II, III and IV are 125.8, 91.6, 80.9 and 101.7
respectively. The table shows an annual seasonal pattern that is above average at the beginning of
each year, below average in the middle quarters, and slightly above average again at the end of
the year. These pronounced highs and lows in the seasonal index set represent a strong seasonal
effect in the data. Thus the effect of the season in quarters II and III is decreased by 8.42% and
19.09% from the overall mean, while in quarters I and IV it is increased by 25.83% and 1.7%,
respectively.

4.2.4 The Ratio-to-Trend Method


It is also known as the percentage-to-trend method. The following are the steps to compute seasonal
indices by the ratio-to-trend method.
Step1: Obtain trend values by the method of least squares (in the example below the trend is fitted
to the yearly averages rather than to the individual quarterly observations).
Step2: Divide each of the original data values by the corresponding trend value and express the
results as percentages.
Step3: Obtain the average for each month or quarter over all the years.
Step4: Adjust these averages if they do not average to 100%.
Example: Use ratio-to-trend method and estimate the seasonal index for the travel expense data.
Quarter
Year I II III IV
1998 71 89 106 78
1999 71 90 108 79
2000 73 91 111 81
2001 76 97 122 89

Solution: step1- estimate trend by least square method.


For determining the seasonal variation by the ratio-to-trend method, we first calculate the annual
trend from the average quarterly expense of each year; working with yearly averages eliminates the
seasonal effect from the trend values. The trend is then estimated by least squares, using coded
time values x for the yearly averages.
Year Average (YA) X yx X2
1998 86 -3 -258 9
1999 87 -1 -87 1
2000 89 1 89 1
2001 96 3 288 9
Total 358 0 32 20
The fitted trend line is Ŷ = â + b̂x, where â = ΣYA/n = 358/4 = 89.5 and b̂ = Σxy/Σx² = 32/20 = 1.6.
Therefore, Ŷ = 89.5 + 1.6x.

The fitted line shows that the trend increases by 1.6 every half year (successive years are two
x-units apart), or by ½(1.6) = 0.8 every quarter. The given quarterly figures are assumed to
correspond to the middle of each quarter (x = -3 corresponds to July 1, 1998; x = -1 to July 1,
1999, and so on). When x = 0, which corresponds to January 1, 2000, the estimated trend is
89.5 (= 89.5 + 1.6×0). But we need the trend value for the middle of quarter I of 2000, half a
quarter later; thus the trend value of quarter I of 2000 is 89.5 + ½(0.8) = 89.9.
The trend value for the 2nd, the 3rd and 4th of 2000 and the quarters of 2001 are obtained by
successive addition of 0.8 to 89.9 and the trend values for the quarters of 1998 and 1999 are
obtained by successive subtraction of 0.8.

Quarter
Year I II III IV
1998 83.5 84.3 85.1 85.9
1999 86.7 87.5 88.3 89.1
2000 89.9 90.7 91.5 92.3
2001 93.1 93.9 94.7 95.5
Step2: Dividing the actual values by the corresponding trend estimates and expressing the results
as percentages, (Yt/Ŷt)×100, we obtain the following results.

Quarter
Year I II III IV
1998 85 105.6 124.6 90.8
1999 81.9 102.8 122.3 88.6
2000 81.2 100.3 121.3 87.8
2001 81.6 103.3 128.8 93.2

Step3: calculation of averages for each quarter.


For example, 82.4 = (85+81.9+81.2+81.6)/4
Now seasonal indices for various quarters are obtained by expressing the average percentages of
the quarters as the percentages of the overall average=99.9, i.e., the adjusted seasonal index for
quarter I is (82.4/99.9)*100= 82.5 and for quarter II is (103/99.9)*100=103.1 and so on.

Quarter I II III IV Mean


Average 82.4 103 124.2 90.1 99.9
Adjusted S.I. 82.5 103.1 124.3 90.2 100.0

Therefore, the seasonal indices for quarter I, II, III and IV are 82.5, 103.1, 124.3 and 90.2
respectively. Thus the effect of the season in quarters I and IV is decreased by 17.5% and 9.8%
from the expected mean (=100), while in quarters II and III it is increased by 3.1% and 24.3%
from the expected mean, respectively.
4.3 Estimation of Seasonal Component for the Additive Model
A time series model in which the components combine additively is given by: Yt = Tt + St + Ct + It.
If this model appears to be appropriate, the seasonal pattern is expressed in absolute terms (in the
units of the series).
methods discussed above can be adopted easily to obtain the seasonal component for the additive
model.
For instance, if the moving average method is to be used one may follow the following steps.
Step1: compute a 12-month or 4-quarter centered moving average
Step2: subtract the moving average from the actual data.
Step3: construct a table containing these differences by season (months or quarter) and find the
seasonal (monthly or quarterly) averages and then the grand average.
Step4: Adjust them if they do not total zero (or the grand mean is not zero) by the addition or
subtraction of a correction factor.
Note that the correction factor in additive model is the grand average with sign reversed and
adjusted seasonal index is average for season plus correction factor.
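A minimal pandas sketch of these four steps (our own illustration, not part of the original notes),
assuming the quarterly data of the example that follows; the printed indices come out close to
-19.9, 6.0, 10.0 and 3.9.

    import pandas as pd

    # Quarterly data from the example below (1985-1988)
    y = pd.Series([416, 477, 462, 466, 446, 471, 487, 482,
                   449, 483, 490, 484, 476, 507, 516, 510])

    # Step 1: 4-quarter centred moving average
    cma = y.rolling(4).mean().rolling(2).mean().shift(-2)

    # Step 2: differences between the data and the moving average
    diff = y - cma

    # Step 3: average the differences by quarter and take the grand mean
    quarter = pd.Series([1, 2, 3, 4] * 4)
    seasonal_mean = diff.groupby(quarter).mean()
    grand_mean = seasonal_mean.mean()

    # Step 4: subtract the grand mean so the adjusted indices sum to zero
    adjusted_si = seasonal_mean - grand_mean
    print(adjusted_si.round(1))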
Example: find the seasonal index in an additive model for the following data.
Quarter I II III IV
Year
1985 416 477 462 466
1986 446 471 487 482
1987 449 483 490 484
1988 476 507 516 510
Solution: step1and 2- calculation of 4-quarter centered moving average and differences for the
actual and centered moving averages.
Year Quarter Yt 2*4MA Yt – 2*4MA
1985 1 416
2 477
3 462 459.0 3.0
4 466 462.0 4.0
1986 1 446 464.4 -18.4
2 471 469.5 1.5
3 487 471.9 15.1
4 482 473.8 8.3
1987 1 449 475.6 -26.6
2 483 476.3 6.8
3 490 479.9 10.1
4 484 486.3 -2.3
1988 1 476 492.5 -16.5
2 507 499.0 8.0
3 516
4 510
Step3: calculation of seasonal averages for the differences and the grand mean.

Quarter     I       II      III     IV
Year
1985                        3.0     4.0
1986       -18.4    1.5    15.1     8.3
1987       -26.6    6.8    10.1    -2.3
1988       -16.5    8.0
Mean       -20.5    5.4     9.4     3.3     Grand mean = -0.6
Step4: The grand mean is not zero. Therefore, we should adjust the mean of each quarter in
order to obtain the adjusted seasonal indices. Our correction factor is the grand mean with
reversed sign, i.e., -(-0.6)=0.6. So, the adjusted seasonal indices are given below in the table.

Quarter I II III IV Grand mean


Mean -20.5 5.4 9.4 3.3 -0.6
Adjusted S.I -19.9 6.0 10.0 3.9 0.0

4.4 Uses of Dummy Variables


Seasonally adjusted time series are obtained by removing the seasonal component from the data.
A statistician may implement a seasonal adjustment procedure using the simple averages method,
the link relatives method, the ratio-to-moving-average method or the ratio-to-trend method and
report the deseasonalized time series. Another method for removing the seasonal factor is by
the use of dummy variables. The seasonal dummy variables can be created with the number of
periods in the seasonal cycle (4 for quarterly data and 12 for monthly data). For example, with
quarterly data, seasonal dummy variables may be defined as follows.
Q1 = 1, if quarter 1 (Jan., Feb., Mar.), 0 otherwise
Q2 = 1, if quarter 2 (Apr., May, June), 0 otherwise
Q3 = 1, if quarter 3 (July, Aug., Sep.), 0 otherwise
Q4 = 1, if quarter 4 (Oct., Nov., Dec.), 0 otherwise
Then the data has the form:
Q1 Q2 Q3 Q4
---------------------
Year 1 Quarter 1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
Year 2 Quarter 1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
etc.
Example: consider the travel expense data above.

Year Quarter Yt Q1 Q2 Q3 Q4
1998 1 71 1 0 0 0
2 89 0 1 0 0
3 106 0 0 1 0
4 78 0 0 0 1
….
2001 1 76 1 0 0 0
2 97 0 1 0 0
3 122 0 0 1 0
4 89 0 0 0 1
One can estimate the trend component by the least squares method as Ŷt = 80.75 + 1.03t with R² =
0.099 from the series.

Coefficients: output from SPSS


Model Unstandardized Coefficients Standardized t-value Sig.
Coefficients
B Std. Error Beta
Constant 80.750 8.034 10.051 .000
Time 1.029 .831 .314 1.239 .236
Assuming the actual series is of the form Yt = β0 + β1Q1 + β2Q2 + β3Q3 + εt, then one
gets

Ŷt = 81.75 - 9Q1 + 10Q2 + 30Q3 with R² = 0.922.

Coefficients: output from SPSS


Model Unstandardized Coefficients Stand.Coeffici. t-value Sig.
B Std. Error Beta
Constant 81.750 2.428 33.668 .000
Q1 -9.000 3.434 -.258 -2.621 .022
Q2 10.000 3.434 .287 2.912 .013
Q3 30.000 3.434 .861 8.736 .000

Assuming the actual series is of the form Yt = β0 + β1t + β2Q1 + β3Q2 + β4Q3 + εt, one finds
the fitted model Ŷt = 73.75 + 0.8t - 6.6Q1 + 11.6Q2 + 30.8Q3 with R² = 0.979.

Coefficients: output from SPSS


Model Unstandardized Coefficients Stand. Coeffici. t-value Sig.
B Std. Error Beta
(Constant) 73.750 2.000 36.880 .000
Time .800 .149 .244 5.367 .000
Q1 -6.600 1.938 -.189 -3.406 .006
Q2 11.600 1.909 .333 6.077 .000
Q3 30.800 1.891 .883 16.286 .000

Note that the parameter estimates of the Q's give the seasonal effect of each of the first three
quarters relative to the fourth quarter, which is the reference category absorbed into the constant
(intercept). Alternative schemes can be used to allocate the dummy variables. For example, instead
of excluding the fourth-quarter dummy variable, the above application could have excluded the
first-quarter dummy variable. Another way of proceeding is to include dummy variables for all
four quarters; if this is done, the intercept must be dropped from the regression equation to avoid
the dummy variable trap.
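For illustration, the sketch below fits the same dummy-variable regression with statsmodels OLS
(our own sketch; the variable names t and Q1-Q3 are ours, and quarter IV is taken as the reference
category). With the travel expense data it should reproduce the coefficients of the last SPSS table
above very closely.

    import numpy as np
    import statsmodels.api as sm

    # Travel expense data, 1998-2001, quarters I-IV
    y = np.array([71, 89, 106, 78, 71, 90, 108, 79,
                  73, 91, 111, 81, 76, 97, 122, 89], dtype=float)
    t = np.arange(1, 17)                      # time index
    q = np.tile([1, 2, 3, 4], 4)              # quarter of each observation

    # Dummies for quarters I-III; quarter IV is the reference category
    Q1 = (q == 1).astype(float)
    Q2 = (q == 2).astype(float)
    Q3 = (q == 3).astype(float)

    X = sm.add_constant(np.column_stack([t, Q1, Q2, Q3]))
    fit = sm.OLS(y, X).fit()
    print(fit.params.round(2))      # constant, time, Q1, Q2, Q3
    print(round(fit.rsquared, 3))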

4.5 Smoothing Methods for Seasonal Series


Recall that if the data are stationary (or can be represented by the constant mean model),
then one can use the least squares method, the simple moving average method or simple
exponential smoothing to estimate the level of the series. On the other hand, if the data
exhibit a linear trend in time, then one can use the least squares method, the linear (double)
moving average method or linear exponential smoothing to estimate the trend of the series. In both
cases the assumption is that there is no seasonality, as with annual data. But if there is seasonality
in the series, the above methods cannot effectively handle the estimation or modeling of the series.
In such cases we can use Winters' method, which smoothes the series by Holt-Winters exponential
smoothing. We can use this procedure when both trend and seasonality are present, with these
two components being either additive or multiplicative. Winters' Method calculates dynamic
estimates for three components: level, trend, and seasonal.
Winters' (Holt-Winters) exponential smoothing employs a level component (overall smoothing; in
MINITAB it is called the level component), a trend component, and a seasonal component at each
period. It uses three weights, or smoothing parameters, to update the components at each period.
Winters' method requires initial values for the level, trend and seasonal components as well as the
smoothing parameters. These initial values are obtained from the data at hand.
For example, initial values for the level and trend components are obtained from a linear regression
on time, and initial values for the seasonal component are obtained from a dummy-variable
regression (or the ratio-to-trend method) using the historical data. The Winters' method smoothing
equations are:
Additive Model:
ℓt = α(Yt - St-L) + (1 - α)(ℓt-1 + bt-1)
bt = β(ℓt - ℓt-1) + (1 - β)bt-1
St = γ(Yt - ℓt) + (1 - γ)St-L
Ŷt = ℓt-1 + bt-1 + St-L

Multiplicative Model:

ℓt = α(Yt/St-L) + (1 - α)(ℓt-1 + bt-1)
bt = β(ℓt - ℓt-1) + (1 - β)bt-1
St = γ(Yt/ℓt) + (1 - γ)St-L
Ŷt = (ℓt-1 + bt-1)St-L, where

ℓt is the level component (overall smoothing) at time t
α is the weight for the level component
bt is the trend at time t
β is the weight for the trend
St is the seasonal component at time t
γ is the weight for the seasonal component
L is the seasonal period (length)
Yt is the data value at time t
Ŷt is the fitted value, or one-period-ahead forecast, at time t

Winters' method requires initial values for the level ℓ0, the trend b0 and the seasonal components
S1, …, SL, as well as the smoothing constants α, β and γ. If there are historical data, they can be
used to provide some or all of these initial estimates.
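A minimal sketch of Winters' method using the ExponentialSmoothing class in statsmodels (an
assumption of ours; the notes use MINITAB terminology). Here the quarterly travel expense data
are reused, with additive trend and seasonality of length 4; the smoothing weights and initial
level, trend and seasonal values are estimated from the data rather than supplied by hand. Use
seasonal='mul' for the multiplicative form.

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    y = pd.Series([71, 89, 106, 78, 71, 90, 108, 79,
                   73, 91, 111, 81, 76, 97, 122, 89])

    # Holt-Winters (Winters') smoothing: additive trend and additive seasonality
    hw = ExponentialSmoothing(y, trend='add', seasonal='add',
                              seasonal_periods=4).fit()
    print(hw.params)        # estimated smoothing weights and initial components
    print(hw.forecast(4))   # forecasts for the next four quarters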
4.6 Deseasonalization of Data
The non-stationary pattern in time series data needs to be removed from the series before
proceeding with model building. One way of removing non-stationarity is through the method of
differencing. The differenced series is defined as: ∇Yt = Yt - Yt-1. Taking the first difference is a
very useful tool for removing non-stationarity, but sometimes the differenced data will not appear
stationary and it may be necessary to difference the data a second time. The second-order
difference is defined as: ∇²Yt = ∇Yt - ∇Yt-1 = (Yt - Yt-1) - (Yt-1 - Yt-2) = Yt - 2Yt-1 + Yt-2. In
practice, it is almost never necessary to go beyond second-order differences.
With seasonal data which are not stationary, it is appropriate to take seasonal differences. A
seasonal difference is the difference between an observation and the corresponding observation
from the previous year: ∇sYt = Yt - Yt-s, where s is the length of the season.
When both seasonal and first differences are applied, it does not make any difference which is
done first. It is recommended to do the seasonal differencing first since sometimes the resulting
series will be stationary and hence no need for a further first difference. When differencing is
used, it is important that the differences be interpretable.
The other approach is to estimate the seasonal component and remove it from the series
(deseasonalize the series). The deseasonalized time series contains only the trend (T),
cyclical (C) and irregular (I) components and is expressed, for the multiplicative model, as
(Y/S)×100 = (T·S·C·I)/S × 100 = T·C·I (S being a percentage index), and for the additive model as
Y - S = (T+S+C+I) - S = T+C+I.
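The following sketch (ours, using pandas) shows regular and seasonal differencing, and an additive
deseasonalization using the seasonal indices estimated in section 4.7 for the quarterly series of
that section.

    import pandas as pd

    # Quarterly series used in the decomposition example of section 4.7 (1992-1996)
    y = pd.Series([30, 135, 96, 188, 51, 156, 115, 209, 70, 175, 136, 228,
                   98, 196, 175, 249, 111, 215, 176, 270])

    d1 = y.diff()             # first (regular) difference:  Yt - Yt-1
    d4 = y.diff(4)            # seasonal difference for quarterly data: Yt - Yt-4
    d4_1 = y.diff(4).diff()   # seasonal difference followed by a first difference

    # Deseasonalising with an estimated additive seasonal pattern S (section 4.7)
    S = pd.Series([-74.3, 23.7, -16.2, 66.8])
    deseasonalised = y - pd.Series(list(S) * 5)   # additive model: Y - S = T + C + I

    print(d4_1.dropna().head())
    print(deseasonalised.head())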
Exercises
1. The following table provides monthly sales ($1000) at a certain college bookstore. The
sales show a seasonal pattern, with the greatest number when the college is in session and
decrease during the summer months. Therefore, estimate the seasonal index by the
following methods. (I) Simple Moving Average (II) Link Relatives (III) Ratio-to-
Moving Average methods. Assume the model is multiplicative!
Year\Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 196 188 192 164 140 120 112 140 160 168 192 200
2 200 188 192 164 140 122 132 144 176 168 196 194
3 196 212 202 180 150 140 156 144 164 186 200 230
4 242 240 196 220 200 192 176 184 204 228 250 260
2. Consider the following series and compute the seasonal index by using ratio-to-
Trend method for the assumption of multiplicative model.
Quarter I II III IV
Year
1996 75 60 54 59
1997 86 65 63 80
1998 90 72 66 85
1999 100 78 72 93
3. Consider the following series recorded Quarterly for five years and calculate the
Seasonal Index based on: (I) Simple Average Method if the model is additive (II)
Link Relative Method if the model is additive.

Q
Y I II III IV
1950 30 40 36 34
1951 34 52 50 44
1952 40 58 54 48
1953 54 76 68 62
1954 80 92 86 82

4.7 ESTIMATION OF CYCLICAL AND IRREGULAR COMPONENTS

4.7.1. Introduction
Cyclical components usually vary greatly from one another with respect to duration and
amplitude. In practice, the cyclical component is irregular in behaviour and is so intermingled
with the irregular movements that it is usually impossible to separate the two. In decomposing a
time series, therefore, the cyclical and irregular fluctuations are left together after the other
components (trend and seasonal) have been removed. The measurement of these remaining
variations involves the following steps.

4.7.2 Basic Steps in Estimation of Irregular Component


1. The first step is to estimate the trend. Two different approaches could be used for this.
 One approach is to estimate the trend with a smoothing procedure such as moving
averages. (See chapter three of this course for more on that.) With this approach no
equation is used to describe trend.
 The second approach is to model the trend with a regression equation. It is done after the
seasonal component removed (deseasonalized the series). Here we should assume the
irregular component is normally distributed with mean equal to zero and constant
variance.
2. The second step is to “de-trend” the series. For an additive decomposition, this is done by
subtracting the trend estimates from the series. For a multiplicative decomposition, this is done
by dividing the series by the trend values.
3. Next, seasonal factors are estimated using the de-trended series. For monthly data, this entails
estimating an effect for each month of the year. For quarterly data, this entails estimating an
effect for each quarter. The simplest method for estimating these effects is to average the de-
trended values for a specific season. For instance, to get the seasonal effect for quarter I, we
average the de-trended values of all first quarters in the series, and so on. (The average used may
be the mean, median or mode.) The seasonal effects are usually adjusted so that they average to 0
for an additive decomposition or to 1 for a multiplicative decomposition.
4. The final step is to determine the random (irregular) component.
For the additive model, Irregular=Series–Trend–Seasonal and
For the multiplicative model, Irregular=Series/ (Trend*Seasonal).
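A minimal pandas/NumPy sketch of these four steps for the additive model (ours, not part of the
notes), using the quarterly data of the example below; the fitted trend comes out roughly as
99.3 + 5.2t, as in the table that follows.

    import numpy as np
    import pandas as pd

    # Quarterly data of the example below (1992-1996)
    y = pd.Series([30, 135, 96, 188, 51, 156, 115, 209, 70, 175, 136, 228,
                   98, 196, 175, 249, 111, 215, 176, 270], dtype=float)
    t = pd.Series(np.arange(1, 21))
    quarter = pd.Series([1, 2, 3, 4] * 5)

    # Seasonal component from centred moving averages (additive model)
    cma = y.rolling(4).mean().rolling(2).mean().shift(-2)
    diff = y - cma
    s_mean = diff.groupby(quarter).mean()
    s_adj = s_mean - s_mean.mean()          # adjusted so the indices sum to zero
    S = quarter.map(s_adj)                  # seasonal value for every observation

    # Trend fitted by least squares to the deseasonalised series
    ds = y - S
    b, a = np.polyfit(t, ds, 1)             # slope, intercept
    T = a + b * t

    # Irregular component for the additive model
    I = y - T - S
    print(I.round(2))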
Example: Consider the following series and estimate the irregular component for additive model
using the above procedures.
Quarter
Year I II III IV
1992 30 135 96 188
1993 51 156 115 209
1994 70 175 136 228
1995 98 196 175 249
1996 111 215 176 270
Solution: the overall steps for the estimation of the irregular component are shown in the following
table.
             (C1) (C2)  (C3)    (C4)        (C5)   (C6)        (C7)    (C8)
Year Quarter  t    Yt   2*4MA   Yt-2*4MA    Ŝt     DSt=Yt-Ŝt   T̂t      Ît
1992 1 1 30 -74.3 104.3 104.52 -0.22
2 2 135 23.7 111.3 109.72 1.58
3 3 96 114.875 -18.875 -16.2 112.2 114.93 -2.73
4 4 188 120.125 67.875 66.8 121.2 120.13 1.07
1993 1 5 51 125.125 -74.125 -74.3 125.3 125.33 -0.03
2 6 156 130.125 25.875 23.7 132.3 130.54 1.76
3 7 115 135.125 -20.125 -16.2 131.2 135.74 -4.54
4 8 209 139.875 69.125 66.8 142.2 140.94 1.26
1994 1 9 70 144.875 -74.875 -74.3 144.3 146.15 -1.85
2 10 175 149.875 25.125 23.7 151.3 151.35 -0.05
3 11 136 155.75 -19.75 -16.2 152.2 156.55 -4.35
4 12 228 161.875 66.125 66.8 161.2 161.75 -0.55
1995 1 13 98 169.375 -71.375 -74.3 172.3 166.96 5.34
2 14 196 176.875 19.125 23.7 172.3 172.16 0.14
3 15 175 181.125 -6.125 -16.2 191.2 177.36 13.84
4 16 249 185.125 63.875 66.8 182.2 182.57 -0.37
1996 1 17 111 187.625 -76.625 -74.3 185.3 187.77 -2.47
2 18 215 190.375 24.625 23.7 191.3 192.97 -1.67
3 19 176 -16.2 192.2 198.18 -5.98
4 20 270 66.8 203.2 203.38 -0.18
Note: Column (5) is obtained by adjusting the mean of each quarter as follows.
Quarter                          I        II      III      IV      Mean
Seasonal mean = Adjusted S.I.   -74.3     23.7    -16.2    66.8    0.0
For instance, -74.3 = ¼ ×[(-74.125) + (-74.875) + (-71.375) + (-76.625)], and so on.
Column (7) is obtained by regressing column (6) on column (1), i.e. T̂t = 99.318 + 5.203t,
and column (8) is obtained by subtracting columns (5) and (7) from column (2).

Exercise: Estimate the Irregular components of the following series based on the
assumption that components are in the Multiplicative Model.

Quarter
Year I II III IV
1984 2881 3249 3180 3505
1985 3020 3449 3472 3715
1986 3184 3576 3657 3941
1987 3319 3850 3883 4159

5. INTRODUCTION TO BOX-JENKINS MODELS


5.1.Introduction
Probability models for time series are collectively known as stochastic processes. Stochastic
processes are families of random variables indexed by time. In practice, many time series are
clearly non-stationary, and so stationary models cannot be applied directly. In Box-Jenkins
modeling, the general approach is to difference an observed time series until it appears to come
from a stationary process. The models developed in the Box-Jenkins framework serve not only to
explain the underlying process generating the series but also as a basis for forecasting.
Therefore, the Box-Jenkins approach is one of the most widely used methodologies for the
analysis of time series data that involves identifying an approximate ARIMA process, fitting it to
the data and then using the fitted model for forecasting. It is popular because of its generality; it
can handle any series, stationary or not, with or without seasonal elements, and it has well-
documented computer programs.

5.2. The Concept of Stationarity


Broadly speaking a time series is said to be stationary if there is no systematic change in mean
(no trend), if there is no systematic change in variance and if strictly periodic variations have
been removed. A stationary process has a constant mean which defines the level about which it
fluctuates, and has a constant variance which measures its spread about this time level. Most of
the probability theory of time series is concerned with stationary time series, and for this reason
time series analysis often requires one to turn a non-stationary time series into a stationary one so
as to use this theory.

5.3. ARIMA Models


Autoregressive models: Suppose that {εt} is a purely random process with mean zero and
variance σε². Then a process {Yt} is said to be an autoregressive process of order p, AR(p), if

Yt = μ + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt.

In an AR model, the current value of the process is expressed as a finite linear aggregate of
previous values of the process and a shock εt; μ is a parameter that determines the "level" of the
process.

Moving Average models: Suppose that {εt} is a purely random process with mean zero and
variance σε². Then a process {Yt} is said to be a moving average process of order q, MA(q), if
Yt = μ + εt - θ1εt-1 - θ2εt-2 - … - θqεt-q.

Autoregressive Moving Average (ARMA) processes are processes that are formed as a
combination of autoregressive and moving average processes. An ARMA process of order (p, q),
ARMA(p, q), has the form: Yt = μ + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt - θ1εt-1 - θ2εt-2 - … - θqεt-q.

Autoregressive Integrated Moving Average (ARIMA): This is a general model capable of
representing a wide range of non-stationary time series. Such a model (or process) represents a
non-stationary series that can be reduced to a stationary series by some degree of differencing,
and it has p autoregressive and q moving average terms.

A non-seasonal ARIMA model is classified as an ARIMA (p, d, q) model where:


- p is the number of autoregressive terms,
- d is the number of non-seasonal differencing passes, and
- q is the number of moving average terms (lagged errors in the equation).
According to the prescription of Box and Jenkins, a few steps have to be taken when building an
ARIMA model. The first step is to determine whether the series is stationary; to achieve this, one
takes as many differences of the original series as are needed to reduce it to stationarity.
Differencing is a procedure that converts a time series of length n into another time series of
length n - d.
Operators
The backward shift operator, B, is defined such that
BYt = Yt-1. In general, this implies that
BjYt = Yt-j; for example, B²Yt = B(BYt) = BYt-1 = Yt-2.
The backward difference operator, ∇, may be defined as ∇Yt = Yt - Yt-1.
Yt - Yt-1 = Yt - BYt = (1 - B)Yt, and therefore ∇ = 1 - B.
∇²Yt = (1 - B)²Yt = (1 - 2B + B²)Yt.

5.4.Methodological tools for model identification


Auto-covariance: Suppose two random variables X and Y have means E(X) = μX and E(Y) = μY
respectively. Then the covariance of X and Y is Cov(X, Y) = E[(X - μX)(Y - μY)].

With reference to time series, the covariance between Yt and another observation Yt+k is called the
auto-covariance at lag k, denoted by γk, and is given by

γk = Cov(Yt, Yt+k) = E[(Yt - E(Yt))(Yt+k - E(Yt+k))].

Autocorrelation:
The autocorrelation coefficient measures the relationship, or correlation, between a set of
observations and a lagged set of observations in a time series. The autocorrelation coefficient at
lag k, denoted by ρk, is given by

ρk = Cov(Yt, Yt+k)/√[Var(Yt)Var(Yt+k)] = γk/γ0.

Given the time series (Y1, Y2, Y3, …, Yn), the autocorrelation between Yt and Yt+k measures
the correlation between the pairs (Y1, Y1+k), (Y2, Y2+k), (Y3, Y3+k), …, (Yn-k, Yn). The
sample autocorrelation coefficient at lag k (denoted by rk), an estimate of ρk, is computed by

rk = Σ_{t=1}^{n-k} (Yt - Ȳ)(Yt+k - Ȳ) / Σ_{t=1}^{n} (Yt - Ȳ)²,

where Yt is the data from the stationary time series, Yt+k is the data k time periods ahead, and
Ȳ is the mean of the stationary time series.

A graph displaying the sample autocorrelation coefficient, rk, versus the lag k is called the
sample autocorrelation function (ACF) or a correlogram. This graph is useful both in
determining whether or not a series is stationary and in identifying a tentative ARIMA model.

Properties of the Autocorrelation Coefficient

 -1 ≤ rk ≤ 1
 The ACF is an even function of the lag k (i.e., rk = r-k). This follows from the result
cov(Yt, Yt+k) = cov(Yt-k, Yt).
 rk is unit-less, and for a random time series rk is approximately normally distributed with
mean zero and variance 1/n, where n is the number of observations.
Partial autocorrelation coefficients
A partial correlation coefficient measures the relationship between two variables when the effect
of other variables has been removed or held constant. Similarly, the partial autocorrelation
coefficient measures the relationship between the stationary time-series variables Yt and Yt+k
when the effects of the intervening observations Yt+1, Yt+2, …, Yt+k-1 have been removed.
The partial autocorrelation coefficient at lag k is denoted by φkk.

The plot of the partial autocorrelation coefficient φkk against the lag k gives the Partial
Autocorrelation Function (PACF), and the behaviour of the partial autocorrelation coefficients
for the stationary time series, along with the corresponding ACF, is used to identify a
tentative ARIMA model.
Note: φ00 = 1, φ11 = ρ1.
Computing φkk

If the autocorrelation matrix of a stationary time series up to lag k is given by

Pk = [ 1     ρ1    ρ2   …  ρk-1
       ρ1    1     ρ1   …  ρk-2
       …
       ρk-1  ρk-2  …       1   ], where ρj measures the correlation between Yt and Yt+j, then

φkk = |Pk*| / |Pk|,

where Pk* is Pk with its last column replaced by (ρ1, ρ2, …, ρk)′, and Pk and Pk* are k by k
square matrices. Note that, by definition, φ11 = ρ1, and the sample partial autocorrelation
coefficient is denoted by φ̂kk.
For example, the sample partial autocorrelation for k = 2 is

φ̂22 = | 1   r1 |  /  | 1   r1 |  =  (r2 - r1²)/(1 - r1²).
       | r1  r2 |     | r1  1  |

Example: Compute r1, r2 and ̂ 22for the following time series.


t 1 2 3 4 5 6 7 8 9 10

Yt 47 64 23 71 38 64 55 41 59 48

Solution: sum =510, mean = 51


T 1 2 3 4 5 6 7 8 9 10

Yt 47 64 23 71 38 64 55 41 59 48

Yt - ̅ -4 13 -28 20 -13 13 4 -10 8 -3


(Yt - ̅ )( Yt+1 - ̅ ) -52 -364 -560 -260 -169 52 -40 -80 -24

(Yt - ̅ )( Yt+2 - ̅ ) 112 260 364 260 -52 -130 32 30

r1 = Σ_{t=1}^{9} (Yt - Ȳ)(Yt+1 - Ȳ) / Σ_{t=1}^{10} (Yt - Ȳ)² = -1497/1896 = -0.790

r2 = Σ_{t=1}^{8} (Yt - Ȳ)(Yt+2 - Ȳ) / Σ_{t=1}^{10} (Yt - Ȳ)² = 876/1896 = 0.462, and

φ̂22 = (r2 - r1²)/(1 - r1²) = (0.462 - 0.624)/(1 - 0.624) = -0.431.
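These hand calculations can be checked with a few lines of NumPy (our own sketch); the small
difference in φ̂22 arises because the notes round r1 and r2 before substituting.

    import numpy as np

    y = np.array([47, 64, 23, 71, 38, 64, 55, 41, 59, 48], dtype=float)
    d = y - y.mean()                          # deviations from the mean (51)

    denom = np.sum(d ** 2)                    # 1896
    r1 = np.sum(d[:-1] * d[1:]) / denom       # -0.790
    r2 = np.sum(d[:-2] * d[2:]) / denom       #  0.462
    phi22 = (r2 - r1 ** 2) / (1 - r1 ** 2)    # about -0.43
    print(round(r1, 3), round(r2, 3), round(phi22, 3))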

5.5.Stages of Box-Jenkins methodology

The general model introduced by Box and Jenkins (1976) includes autoregressive as well as
moving average parameters, and explicitly includes differencing in the formulation of the model.
Specifically, the three types of parameters in the model are: the autoregressive parameters (p),
the number of differencing passes (d), and moving average parameters (q). In the notation
introduced by Box and Jenkins, models are summarized as ARIMA (p, d, q); so, for example, a
model described as (0, 1, 2) means that it contains 0 (zero) autoregressive (p) parameters and 2
moving average (q) parameters which were computed for the series after it was differenced once.

The Box-Jenkins approach uses an iterative model-building strategy that consists of selecting an
initial model, estimating the model coefficients, and analyzing the residuals. If necessary, the
initial model is modified and the process is repeated until the residuals indicate no further
modification is necessary. At this point, the fitted model can be used for forecasting. The basis of
Box-Jenkins approach to modeling time series consists of three phases:
 Model selection/Identification
 Parameter estimation
 Model checking/Diagnostics

Model identification:
As a rule of thumb, Box-Jenkins requires at least 50 equally-spaced periods of data. The data
must also be edited to deal with extreme or missing values or other distortions through the use of
functions as log or inverse to achieve stabilization and differencing to avoid obvious patterns
such as trend and seasonality. The input series for ARIMA needs to be stationary, that is, it
should have a constant mean and variance through time.
Therefore, usually the series first needs to be differenced until it is stationary (this also often
requires log transforming the data to stabilize the variance). The number of times the series needs
to be differenced to achieve stationarity is reflected in the d parameter. Seasonal patterns require
respective seasonal differencing.
Once stationarity have been addressed, we need to decide how many autoregressive (p) and
moving average (q) parameters are necessary to yield an effectivebut still parsimonious model of
the process (parsimonious means that it has the fewest parameters and greatest number of
degrees of freedom among all models that fit the data).The major tools for doing this are
autocorrelation function (ACF), and partial autocorrelation function (PACF).The sample
autocorrelation plots and the sample partial autocorrelation plots are compared to the theoretical
behavior of these plots when the order is known.

Parameter estimation:
Once a tentative model has been identified, the estimates for the constant and the coefficients of
the equation must be obtained. That is, we find values for the model coefficients
(like μ, φ1, φ2, …, φp, θ1, θ2, …, θq).
There are several different methods for estimating the parameters. All of them should produce
very similar estimates, but may be more or less efficient for any given model. In general, during
the parameter estimation phase a function minimization algorithm is used (the so-called quasi-
Newton method; refer to the description of the Nonlinear Estimation method) to maximize the
likelihood (probability) of the observed series, given the parameter values. In practice, this
requires the calculation of the (conditional) sums of squares (SS) of the residuals, given the
respective parameters.

Model checking/Diagnostics:
Before forecasting with the final equation, it is necessary to perform various diagnostic tests in
order to validate the goodness of fit of the model. If the model is not a good fit, the tests can also
point the way to a better model.
A good way to check the adequacy of an overall Box-Jenkins model is to analyze the residuals
( Yt  Yˆt ). If the residuals are truly random, the autocorrelations and partial autocorrelations
calculated using the residuals should be statistically equal to zero.
If they are not, this is an indication that we have not fitted the correct model to the data. When
this is the case, the residual ACF and PACF will contain information about which alternate
models to consider.
6. MODEL IDENTIFICATION AND ESTIMATION
6.1. Introduction

The Box-Jenkins approach consists of extracting the predictable movements (pattern) from the
observed data through a series of iterations. One first tries to identify a possible model from a
general class of linear models. The chosen model is then checked against the historical data to
see if it accurately describes the underlying process that generates the series. If the specified
model is not satisfactory, the process is repeated by using another model designed to improve the
original one. The process is repeated until a satisfactory model is found. This procedure is
carried out on stationary data (the trend has been removed).
Box-Jenkins models can only describe or represent stationary series or series that have been
made stationary by differencing. The models fall into one of the three following categories:
Autoregressive (AR), moving average (MA) and mixed process (ARMA). If differencing is
applied together with AR and MA, they are referred to as Autoregressive Integrated Moving
Average (ARIMA), with the „I‟ indicating "integrated" and referencing the differencing
procedure.
6.2. Autoregressive models

In autoregressive models, the current value of the process Yt is a linear function of past stationary
observations Yt-1, Yt-2, Yt-3, … and the current shock εt, where {εt} denotes a purely random
process with zero mean and variance σε².

The autoregressive model:


Yt = μ + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt, where Yt = the present stationary observation; Yt-1,
Yt-2, …, Yt-p = past stationary observations; μ, φ1, φ2, …, φp = the parameters (constant and
coefficients); and εt = the random error for the present time period (with expected value equal to 0).
The number of past stationary observations used in an autoregressive model is known as the
order, p. So, if we use two past observations in a model, we say that it is an autoregressive (AR)
model of order 2, or AR(2).
Let Zt = Yt - μ be the deviation of the values from μ; then the process can be rewritten as:
Zt = φ1Zt-1 + φ2Zt-2 + … + φpZt-p + εt.
Notice that the above model can be written in terms of Zt by using the backward shift operator B,
such that BZt = Zt-1, B²Zt = Zt-2, …, BᵖZt = Zt-p. The AR(p) model may be written in the form:
Zt = φ1BZt + φ2B²Zt + … + φpBᵖZt + εt
Zt = (φ1B + φ2B² + … + φpBᵖ)Zt + εt. This implies that
εt = (1 - φ1B - φ2B² - … - φpBᵖ)Zt
εt = φ(B)Zt, where
φ(B) = 1 - φ1B - φ2B² - … - φpBᵖ
is an autoregressive operator of order p.
The necessary requirement for stationarity is that the autoregressive operator,
φ(B) = 1 - φ1B - φ2B² - … - φpBᵖ, considered as a polynomial in B of degree p, must have
all roots of φ(B) = 0 greater than one in absolute value; that is, all roots must lie outside the unit
circle.
Special cases of the Autoregressive Model/Process
i) AR(1) process, p = 1
The simplest example of an AR process is the first-order case given by
Yt = μ + φ1Yt-1 + εt, or Zt = φ1Zt-1 + εt = φ1BZt + εt. Hence, εt = (1 - φ1B)Zt.

For AR(1) to be stationary the root of 1 - φ1B = 0 must lie outside the unit circle. The time
series literature typically says that an AR(1) process is stationary provided that |φ1| < 1. It is
more accurate to say that there is a unique stationary solution of AR(1) which is causal, provided
that |φ1| < 1.

Auto-covariance and Autocorrelation function for AR(1)


Zt = φ1Zt-1 + εt, where Zt = Yt - μ.
Multiplying both sides by Zt-k,
Zt-kZt = φ1Zt-kZt-1 + Zt-kεt.
Taking expectations, we have
E(Zt-kZt) = φ1E(Zt-kZt-1) + E(Zt-kεt).
When k > 0, Zt-k can only involve the shocks up to time t-k, which are uncorrelated with εt,
so E(Zt-kεt) = 0 and
Cov(Zt-k, Zt) = φ1Cov(Zt-k, Zt-1)
γk = φ1γk-1.

The autocorrelation at lag k is given by

ρk = γk/γ0 = φ1γk-1/γ0
ρk = φ1ρk-1, k > 0
ρk = φ1(φ1ρk-2) = φ1²ρk-2.

With ρ0 = 1,

ρk = φ1ᵏ, k ≥ 0.

The autocorrelation function decays exponentially to zero when φ1 is positive, but decays
exponentially to zero and oscillates in sign when φ1 is negative. This property of the
autocorrelation function of an AR(1) process is stated as "the ACF tails off"; that is, the ACF of
an AR(1) process tails off exponentially.

The variance of the AR(1) model: Zt = φ1Zt-1 + εt.

Multiplying both sides by Zt,

ZtZt = φ1ZtZt-1 + Ztεt.

Taking expectations,
E(Zt²) = φ1E(ZtZt-1) + E(Ztεt). Since the only part of Zt that is correlated with εt is the
most recent shock εt, E(Ztεt) = E(εt²) = σε².

γ0 = σ²Z = φ1γ1 + σε², where σ²Z is the variance of the AR(1) process.

Hence, σ²Z = σε²/(1 - φ1²).

Example: Consider the AR(1) model: Zt = 0.5Zt-1 + εt. Here φ1 = 0.5.

ρ1 = φ1 = 0.5, ρ2 = φ1² = 0.5² = 0.25, …, ρk = φ1ᵏ.

The autocorrelations decline exponentially.

[Figure: the ACF of the AR(1) process with φ1 = 0.5, decaying exponentially over lags k = 1, …, 6.]
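A short simulation (our own sketch, not part of the notes) makes this "tails off" behaviour visible:
the sample ACF of a simulated AR(1) series with φ1 = 0.5 stays close to the theoretical value 0.5ᵏ.

    import numpy as np

    rng = np.random.default_rng(0)
    phi = 0.5
    n = 5000

    # Simulate an AR(1) process Z_t = 0.5 Z_{t-1} + e_t
    z = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(1, n):
        z[t] = phi * z[t - 1] + e[t]

    # Sample ACF at lags 1..6 versus the theoretical value phi**k
    zd = z - z.mean()
    denom = np.sum(zd ** 2)
    for k in range(1, 7):
        rk = np.sum(zd[:-k] * zd[k:]) / denom
        print(k, round(rk, 3), round(phi ** k, 3))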

ii) AR(2) process, p = 2

The second-order autoregressive process, AR(2), may be written as: Zt = φ1Zt-1 + φ2Zt-2 + εt, or
εt = (1 - φ1B - φ2B²)Zt. For an AR(2) to be stationary, we must require that the roots of
1 - φ1B - φ2B² = 0 lie outside the unit circle. This implies that the parameters φ1 and φ2 must
lie in a triangular region. This stationarity condition is equivalent to
φ1 + φ2 < 1
φ2 - φ1 < 1
|φ2| < 1.
[Figure: the triangular stationarity region in the (φ1, φ2) plane, with φ1 running from -2 to 2 and φ2 from -1 to 1.]

Auto-covariance and Autocorrelation for AR(2)


Zt = φ1Zt-1 + φ2Zt-2 + εt.
Multiplying the AR(2) process by Zt-k, k ≥ 0, and taking expectations leads to:
E[ZtZt-k] = φ1E[Zt-kZt-1] + φ2E[Zt-kZt-2] + E[εtZt-k], and this gives the following equations.
k = 0: γ0 = φ1γ1 + φ2γ2 + σε²
k = 1: γ1 = φ1γ0 + φ2γ1
k = 2: γ2 = φ1γ1 + φ2γ0, and, more generally, the following difference equation holds for the
auto-covariance: γk = φ1γk-1 + φ2γk-2, k > 0.
The autocorrelations for AR(2) can be calculated accordingly. If we divide the auto-covariance γk
by the variance γ0, we get the linear homogeneous second-order difference equation for the
autocorrelation function: ρk = φ1ρk-1 + φ2ρk-2, k > 0.
If we write this for k = 1 and k = 2, we obtain the Yule-Walker equations

ρ1 = φ1 + φ2ρ1
ρ2 = φ1ρ1 + φ2.
When the roots of 1 - φ1B - φ2B² = 0 are real, the autocorrelation function consists of a mixture
of damped exponentials. This occurs when φ1² + 4φ2 ≥ 0. If the roots are complex (φ1² + 4φ2 < 0), a
second-order autoregressive process displays pseudo-periodic behaviour and the autocorrelation
function shows a damped sine wave.
Variance of the AR(2) process:

σ²Z = σε²/(1 - φ1ρ1 - φ2ρ2).

Example: Consider the AR(2) process given by Zt = Zt-1 - 0.5Zt-2 + εt. Is this process stationary?
Find the autocorrelation function of this process.

Auto-covariance and Autocorrelation for AR(p)


AR(p) model: Zt = φ1Zt-1 + φ2Zt-2 + … + φpZt-p + εt.

Multiplying both sides by Zt-k and taking expectations,

γk = E[Zt-k(φ1Zt-1 + φ2Zt-2 + φ3Zt-3 + … + φpZt-p + εt)]
γk = φ1γk-1 + φ2γk-2 + … + φpγk-p, for k > 0,

since E[Zt-kεt] = σε² for k = 0 and E[Zt-kεt] = 0 for k > 0.

The autocorrelation function of the AR(p) process satisfies:

ρ1 = φ1 + φ2ρ1 + φ3ρ2 + … + φpρp-1
ρ2 = φ1ρ1 + φ2 + φ3ρ1 + … + φpρp-2
.
.
.
ρk = φ1ρk-1 + φ2ρk-2 + … + φpρk-p.
The variance of the AR(p) process is given by

σ²Z = σε²/(1 - φ1ρ1 - φ2ρ2 - … - φpρp).

Example: Consider the fourth-order autoregressive process Yt = φYt-4 + εt, 0 < φ < 1, where εt is
white noise with zero mean and variance σε². Find the variance and autocorrelation function of Yt.

6.3. Moving average models


The present stationary observation Yt of a moving average process is a finite linear function
of current and previous errors or white noises. A moving average model of order q,
MA(q), has the form: Yt = μ + εt - θ1εt-1 - θ2εt-2 - … - θqεt-q, where Yt = the present
stationary observation; εt = the white-noise (random) error, which is unknown and whose
expected value is zero; εt-1, εt-2, …, εt-q = previous errors; and μ, θ1, θ2, …, θq = the constant
and the moving average coefficients.

Since there are a finite number of weights θ in the MA(q), any MA(q) will be stationary
regardless of the values chosen for the weights. However, it is customary to impose a condition
on the parameter values of an MA model, called the invertibility condition, in order to ensure that
there is a unique MA model for a given autocorrelation function.
The MA(q) process corrected for the mean is Zt = εt - θ1εt-1 - θ2εt-2 - … - θqεt-q.
Notice that the above model, MA(q), can be written in terms of εt by using the backward shift
operator B, such that Bεt = εt-1, B²εt = εt-2, …, Bqεt = εt-q; then the MA(q) model may be
written as Zt = εt - θ1Bεt - θ2B²εt - … - θqBqεt

Zt = (1 - θ1B - θ2B² - … - θqBq)εt

Zt = θ(B)εt, where

θ(B) = 1 - θ1B - θ2B² - … - θqBq
is a moving average operator of order q.

Mean of the MA(q) process

The mean of the MA(q) process is E(Yt) = μ, since E(εt) = 0 for any t.

Variance of the MA(q) process

The variance of the process is γ0 = Var(Yt) = σε² Σ_{i=0}^{q} θi², where θ0 = 1, i.e.
γ0 = σε²(1 + θ1² + … + θq²).

Consider the MA(q) process corrected for the mean: Zt = εt - θ1εt-1 - θ2εt-2 - … - θqεt-q.
The auto-covariance at lag k is γk = E(ZtZt-k)

γk = σε²(-θk + θ1θk+1 + θ2θk+2 + … + θq-kθq), for k = 1, 2, …, q
γk = 0, for k > q.

Thus, the autocorrelation function is

ρk = (-θk + θ1θk+1 + … + θq-kθq)/(1 + θ1² + … + θq²), k = 1, 2, …, q
ρk = 0, k > q.
The autocorrelation function of an MA(q) process is zero beyond the order q of the process. In
other words, the autocorrelation function of a moving average, MA(q), process has a cut-off
after lag q.
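This cut-off property can be checked by simulation. The sketch below (ours) generates an MA(2)
series with assumed parameters θ1 = 0.6 and θ2 = 0.3 and prints its sample ACF, which is clearly
non-zero only at lags 1 and 2.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000
    theta1, theta2 = 0.6, 0.3

    # Simulate an MA(2) process Z_t = e_t - 0.6 e_{t-1} - 0.3 e_{t-2}
    e = rng.normal(size=n + 2)
    z = e[2:] - theta1 * e[1:-1] - theta2 * e[:-2]

    zd = z - z.mean()
    denom = np.sum(zd ** 2)
    for k in range(1, 7):
        rk = np.sum(zd[:-k] * zd[k:]) / denom
        print(k, round(rk, 3))   # lags 1 and 2 clearly non-zero, lags 3+ near zero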
Special cases of MA(q)

i) MA(1) has the form: Yt = μ + εt - θ1εt-1.

Properties:
E(Yt) = μ, Var(Yt) = σε²(1 + θ1²).
The autocorrelation function is
ρk = -θ1/(1 + θ1²), for k = 1
ρk = 0, for k > 1.

ii) MA(2) has the form: Yt = μ + εt - θ1εt-1 - θ2εt-2.

E(Yt) = μ, Var(Yt) = σε²(1 + θ1² + θ2²).
The autocorrelation function:
ρ1 = (-θ1 + θ1θ2)/(1 + θ1² + θ2²) = -θ1(1 - θ2)/(1 + θ1² + θ2²)
ρ2 = -θ2/(1 + θ1² + θ2²)
ρk = 0, k > 2. The ACF of an MA(2) cuts off after lag 2.


Invertibility conditions for Moving Average Processes:

Consider the MA(1) process: Zt = εt - θ1εt-1

Zt = εt - θ1Bεt
Zt = (1 - θ1B)εt.

Solving for εt, we get

εt = Zt/(1 - θ1B) = (1 - θ1B)⁻¹Zt     (*)

If |θ1| < 1, (*) may be written as

εt = (Σ_{i=0}^{∞} θ1ⁱBⁱ)Zt     (**)

Consider (**):

εt = (Σ_{i=0}^{∞} θ1ⁱBⁱ)Zt = (1 + θ1B + θ1²B² + θ1³B³ + …)Zt
εt = Zt + θ1Zt-1 + θ1²Zt-2 + θ1³Zt-3 + …
Zt = εt - θ1Zt-1 - θ1²Zt-2 - θ1³Zt-3 - …
This is an infinite autoregressive process, AR(∞).
The condition |θ1| < 1 is called the invertibility condition for an MA(1) process.

Consider the following two MA(1) processes:

Model A: Zt = εt - θ1εt-1
Model B: Zt = εt - (1/θ1)εt-1.

The autocorrelation functions:
For Model A: ρ1 = -θ1/(1 + θ1²), and ρk = 0 for k > 1.
For Model B: ρ1 = -(1/θ1)/(1 + 1/θ1²) = -θ1/(1 + θ1²), and ρk = 0 for k > 1.

Although the models are different, they have the same autocorrelation function.

This shows that an MA process cannot be uniquely determined from the ACF.

If we express models A and B by putting εt in terms of Zt, Zt-1, Zt-2, …, we find by successive
substitution that

Model A: εt = Zt + θ1Zt-1 + θ1²Zt-2 + …

Zt = εt - θ1Zt-1 - θ1²Zt-2 - …

Model B: εt = Zt + (1/θ1)Zt-1 + (1/θ1²)Zt-2 + …

Zt = εt - (1/θ1)Zt-1 - (1/θ1²)Zt-2 - …

If |θ1| < 1, the series for A converges whereas that for B does not. Therefore, if |θ1| < 1, model A is
said to be invertible whereas B is not. (An MA(1) process is stationary for all values of θ1.)

Imposing an invertibility condition ensures that there is a unique MA process for a given ACF.
In general, for any MA(q) to be invertible to an AR(∞) process we must require that the roots of
the polynomial 1 - θ1B - θ2B² - … - θqBq = 0 lie outside the unit circle.

Invertibility conditions for the second-order Moving Average, MA(2), process

The second-order moving average process is defined by Zt = εt - θ1εt-1 - θ2εt-2 and is stationary
for all values of θ1 and θ2. However, it is invertible only if the roots of the characteristic
equation 1 - θ1B - θ2B² = 0
lie outside the unit circle, that is,
θ2 + θ1 < 1
θ2 - θ1 < 1
-1 < θ2 < 1.

6.4.Autoregressive Moving Average Model, ARMA(p,q)


To achieve greater flexibility in fitting of actual time series, it is sometimes advantageous to
include both Autoregressive and Moving Average terms in the model. That is, Yt is a linear
function of past stationary observations and present and past white noises and written as:
Yt = μ + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt - θ1εt-1 - θ2εt-2 - … - θqεt-q, where Yt = the present
stationary observation; Yt-1, Yt-2, … = the past observations; εt-1, εt-2, … = past errors of the
stationary time series; εt = the present error (whose expected value is 0); μ, φ1, φ2, …, θ1, θ2, …
= the constant and the parameters of the model; and p and q denote the orders of the model. The
importance of ARMA processes is that many real data sets may be approximated in a more
parsimonious way (meaning fewer parameters are needed) by a mixed ARMA model than by a
pure AR or pure MA process.
Notice that the above model is rewritten as follows using the backward shift operator:

(1 - φ1B - φ2B² - … - φpBᵖ)Zt = (1 - θ1B - θ2B² - … - θqBq)εt,

where Zt is the stochastic process being modeled, {εt} is a white noise process (i.e., a sequence of
uncorrelated random variables having zero mean and constant variance), B is the backward shift
operator, defined by BZt = Zt-1, and the φ's and θ's are the (unknown) parameters of the model, to
be estimated from a realization of the process (i.e., a sample of successive Z's); we assume that Zt
represents a deviation from a mean value.
Again we may write the model (1 - φ1B - φ2B² - … - φpBᵖ)Zt = (1 - θ1B - θ2B² - … - θqBq)εt

as φ(B)Zt = θ(B)εt, where φ(B) and θ(B) are polynomials of degree p and q respectively in B.

There are some cases of the model φ(B)Zt = θ(B)εt that are of special interest. If θ(B) = 1, we may
write the model as φ(B)Zt = εt, or Zt = φ1Zt-1 + … + φpZt-p + εt. The model is hence simply a
regression of the most recent Zt on previous Zt's, and the model is called an autoregressive
process of order p. If, on the other hand, φ(B) = 1, then the model becomes

Zt = θ(B)εt, or Zt = εt - θ1εt-1 - … - θqεt-q. Thus Zt is a moving average of the "error terms", and
the process is called a moving average process of order q. The general model φ(B)Zt = θ(B)εt is
called a mixed autoregressive moving average process.
The stationarity of an ARMA process, φ(B)Zt = θ(B)εt, is related to the AR component in the
model: φ(B)Zt = θ(B)εt defines a stationary process provided that the roots of the polynomial
φ(B) = 0 lie outside the unit circle. Similarly, the invertibility of an ARMA(p, q) process is related
to the MA component: the roots of the polynomial θ(B) = 0 must lie outside the unit circle if the
process is to be invertible.
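In practice these conditions can be checked numerically by finding the roots of φ(B) and θ(B). The
sketch below (ours) does this with numpy.roots for an assumed ARMA(2,1) whose AR part is the
AR(2) example given earlier (φ1 = 1, φ2 = -0.5) and whose MA part has θ1 = 0.5.

    import numpy as np

    # phi(B) = 1 - B + 0.5 B^2  and  theta(B) = 1 - 0.5 B, coefficients in powers of B
    ar_poly = [1.0, -1.0, 0.5]
    ma_poly = [1.0, -0.5]

    # np.roots expects the highest power first, so reverse the coefficient lists
    ar_roots = np.roots(ar_poly[::-1])
    ma_roots = np.roots(ma_poly[::-1])

    print(np.abs(ar_roots))   # all moduli > 1  -> stationary
    print(np.abs(ma_roots))   # all moduli > 1  -> invertible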
Special Case of an ARMA(p,q)

ARMA(1, 1) process has the form: Zt = φ1Zt-1 + εt - θ1εt-1

Zt = φ1Zt-1 + εt - θ1εt-1

Zt = φ1BZt + εt - θ1Bεt

(1 - φ1B)Zt = (1 - θ1B)εt.

Note that ARMA(1,1) is stationary for -1 < φ1 < 1 and invertible for -1 < θ1 < 1.

For ARMA(1,1), it can be shown that

E(ZtZt) = γ0 = φ1γ1 + σε²(1 - θ1(φ1 - θ1))     (1)

Auto-covariance at lag 1: γ1 = φ1γ0 - θ1σε²     (2)

Auto-covariance at lag k: γk = φ1γk-1, k ≥ 2.

Hence, solving equations (1) and (2) for γ0 and γ1, the auto-covariances of the ARMA(1,1) process
are

γ0 = σε²(1 + θ1² - 2φ1θ1)/(1 - φ1²)

γ1 = σε²(1 - φ1θ1)(φ1 - θ1)/(1 - φ1²)

γk = φ1γk-1, k ≥ 2.

The autocorrelation function of the ARMA(1,1) process is

ρ1 = (1 - φ1θ1)(φ1 - θ1)/(1 + θ1² - 2φ1θ1)

ρk = φ1ρk-1, k ≥ 2.

6.5.Autoregressive integrated moving average models


An ARMA process (φ(B)Zt = θ(B)εt) is stationary if the roots of φ(B) = 0 lie outside the
unit circle, and exhibits explosive non-stationary behaviour if the roots lie inside the unit
circle. Consider a generalized autoregressive operator ϕ(B) in which one or more of the zeros of
the polynomial ϕ(B) (i.e. one or more of the roots of the equation ϕ(B) = 0) lie on the unit circle.

In particular, if there are d unit roots, the operator ϕ(B) can be written as ϕ(B) = φ(B)(1 - B)ᵈ.
Thus, a model that can represent homogeneous non-stationary behaviour is of the form:
ϕ(B)Zt = θ(B)εt
φ(B)(1 - B)ᵈZt = θ(B)εt,
where ϕ(B) is a non-stationary autoregressive operator such that d of the roots of ϕ(B) = 0
are unity and the remainder lie outside the unit circle, and φ(B) is a stationary autoregressive
operator.

Since 1 - B = ∇, the differencing operator, the model φ(B)(1 - B)ᵈZt = θ(B)εt can be written
as
φ(B)∇ᵈZt = θ(B)εt.
We call the process φ(B)∇ᵈZt = θ(B)εt an autoregressive integrated moving average
(ARIMA) process. If the autoregressive operator φ(B) is of order p, the d-th difference is
taken, and the moving average operator θ(B) is of order q, we say that we have an ARIMA
model of order (p, d, q), or simply an ARIMA(p, d, q) process. In practice, d is usually 0, 1, or at
most 2.

Special cases of the ARIMA model

i) ARIMA(0, 1, 1) process: integrated moving average (IMA) process of order (0, 1, 1)
∇Zt = εt - θ1εt-1
    = (1 - θ1B)εt,
corresponding to p = 0, d = 1, q = 1, φ(B) = 1, θ(B) = 1 - θ1B.
ii) ARIMA(0, 2, 2) process:
∇²Zt = εt - θ1εt-1 - θ2εt-2
     = (1 - θ1B - θ2B²)εt,
corresponding to p = 0, d = 2, q = 2, φ(B) = 1, θ(B) = 1 - θ1B - θ2B².
iii) ARIMA(1, 1, 1) process:
∇Zt = φ1∇Zt-1 + εt - θ1εt-1
(1 - φ1B)∇Zt = (1 - θ1B)εt,
corresponding to p = 1, d = 1, q = 1, φ(B) = 1 - φ1B, θ(B) = 1 - θ1B.

Exercise: Classify each of the following models as an ARIMA(p, d, q) process (i.e. find p, d, q).
(a) (1 - B)(1 - 0.2B)Yt = (1 - 0.5B)εt

(b) (1 - B)(1 - 0.2B)Yt = εt

(c) (1 - B)Yt = (1 - 0.5B)εt

(d) Zt = (1 - 0.7B)εt

(e) (1 - 0.8B)Zt = εt

6.6. Use of the autocorrelation and partial autocorrelation functions in model identification

The autocorrelation function of an autoregressive process of order p tails off, its partial
autocorrelation function has a cut off after lag p. Conversely, the autocorrelation of a moving
average process of order q has a cut off after lag q, while its partial autocorrelation function tails
off. If both the autocorrelation and partial autocorrelation tails off, a mixed process is suggested.
Furthermore, the autocorrelation function for a mixed process, containing a pth–order
autoregressive component and a qth-order moving average component, is a mixture of
exponential and damped sine waves after the first q-p lags. Conversely, the partial
autocorrelation function for a mixed process is dominated by a mixture of exponentials and
damped sine waves after the first p-q lags.

Example: If the true (correct) model is AR(1), then φ11 can be different from zero while
φkk = 0 for k ≥ 2. Similarly, if AR(2) is the correct model, φ11 and φ22 can be different from zero
while φkk = 0 for k ≥ 3. Hence, we say that the PACF cuts off after lag p for AR(p), so we can use
the PACF to identify the order of an AR process.
Similarly, the ACF can be used to identify the order of a MA process.

Model ACF PACF

AR(p) Die out (Tails off) Cut off after the order p of the process

MA(q) Cut off after the order q of the process Die out (Tails off)

ARMA(p,q) Die out (tails off) Die out (Tails off)

N.B: In this context “Die out” means “tend to zero gradually” and “cutoff” means “disappear or
is zero”.

Process (Model)              ACF                                           PACF

ARIMA(0,0,0)                 No significant lags                           No significant lags

ARIMA(1,0,0), 0 < φ1 < 1     Exponential decline, with the first two or    Single significant positive peak at lag 1
                             more lags significant

ARIMA(1,0,0), -1 < φ1 < 0    Alternating exponential decline with a        Single significant negative peak at lag 1
                             negative peak at ACF(1)

ARIMA(0,0,1), 0 < θ1 < 1     Single significant negative peak at lag 1     Exponential decline of negative values, with
                                                                           the first two or more lags significant

ARIMA(0,0,1), -1 < θ1 < 0    Single significant positive peak at lag 1     Alternating exponential decline with a
                                                                           positive peak at PACF(1)

Estimated autocorrelations can have rather large variances and can be highly autocorrelated with
each other. For this reason, detailed adherence to the theoretical autocorrelation function cannot
be expected in the estimated function. Since we do not know the theoretical correlations, and
since the estimates that we compute will differ somewhat from their theoretical counterparts, it is
important to have some indication of how far an estimated value may differ from the
corresponding theoretical value. In particular, we need some means of judging whether the
autocorrelations and partial autocorrelations are effectively zero after some specific lag q or p,
respectively.
We use the standard errors of the sample autocorrelation and partial autocorrelation functions to
identify non-zero values. Recall that the sample autocorrelation rk is an estimate of ρk and the
sample partial autocorrelation φ̂kk is an estimate of φkk.


For larger lags, on the hypothesis that the process is a moving average of order q, the standard
error of the estimated autocorrelation rk is

S.E.(rk) = √[(1 + 2Σ_{i=1}^{k-1} ri²)/n], k > q.

For moderate n, the distribution of an estimated autocorrelation coefficient, whose theoretical
value is zero, is approximately normal.

On the hypothesis that the process is autoregressive of order p, the standard error for estimated
partial autocorrelations of order p+1 and higher is S.E.(φ̂kk) = √(1/n), k > p;

φ̂kk is approximately normally distributed with E(φ̂kk) = 0 and Var(φ̂kk) = 1/n.

As a general rule, we would take rk or φ̂kk to be zero if the absolute value of its estimate is
less than twice its standard error, that is, |rk| < 2 S.E.(rk) or |φ̂kk| < 2 S.E.(φ̂kk).
One may use the interval (-2 S.E.(rk), 2 S.E.(rk)) to test H0: ρk = 0 versus H1: ρk ≠ 0, and reject H0
if the value of rk lies outside the interval.
Similarly, we can use the interval (-2 S.E.(φ̂kk), 2 S.E.(φ̂kk)) to test H0: φkk = 0 versus H1: φkk ≠ 0,
and reject H0 if the value of φ̂kk lies outside the interval.

Sometimes one may choose more than one tentative models. Then, the diagnostic procedure will
help determine the best model.
Example: Consider the following ACF and PACF for some time series data with 120
observations.
k            1       2       3       4       5       6       7
rk           0.709   0.523   0.367   0.281   0.208   0.096   0.132
S.E.(rk)     0.091   0.129   0.146   0.153   0.153   0.153   0.153
φ̂kk          0.709   0.041  -0.037   0.045  -0.007  -0.123   0.204
S.E.(φ̂kk)    0.091   0.091   0.091   0.091   0.091   0.091   0.091
Test for the PACF: H0: φ11 = 0 versus H1: φ11 ≠ 0. Reject H0 if the value of φ̂11 lies outside the
interval
(-2×S.E.(φ̂11), 2×S.E.(φ̂11)) = (-2×0.091, 2×0.091) = (-0.182, 0.182). Since φ̂11 = 0.709 lies
outside the interval (-0.182, 0.182), H0 is rejected.
H0: φ22 = 0 versus H1: φ22 ≠ 0. Reject H0 if the value of φ̂22 lies outside the interval
(-2×S.E.(φ̂22), 2×S.E.(φ̂22)) = (-2×0.091, 2×0.091) = (-0.182, 0.182). Since φ̂22 = 0.041 lies in the
interval (-0.182, 0.182), H0 is not rejected.
Example 2: Consider the following ACF and PACF for a time series with N = 106.
k            1       2       3       4       5       6       7
rk          -0.448   0.004   0.052  -0.058   0.094  -0.021   0.012
S.E.(rk)     0.097   0.115   0.115   0.115   0.115   0.115   0.115
φ̂kk         -0.448  -0.423  -0.308  -0.340  -0.224  -0.151   0.009
S.E.(φ̂kk)    0.097   0.097   0.097   0.097   0.097   0.097   0.097

Test for the ACF: H0: ρ1 = 0 versus H1: ρ1 ≠ 0. Reject H0 if the value of r1 lies outside the interval
(-2×S.E.(r1), 2×S.E.(r1)) = (-2×0.097, 2×0.097) = (-0.194, 0.194). Since r1 = -0.448 lies outside
the interval (-0.194, 0.194), H0 is rejected.
H0: ρ2 = 0 versus H1: ρ2 ≠ 0. Reject H0 if the value of r2 lies outside the interval
(-2×S.E.(r2), 2×S.E.(r2)) = (-2×0.115, 2×0.115) = (-0.23, 0.23). Since r2 = 0.004 lies in the
interval
(-0.23, 0.23), H0 is not rejected.

Another approach to model selection is the use of information criteria such as Akaike
Information Criteria (AIC) or Bayesian Information Criteria (BIC). In the implementation of this
approach, a range of potential ARMA models is estimated by maximum likelihood methods. In
the information criteria approach, models that yield a minimum value for the criterion are to be
preferred, and the AIC or BIC values are compared among various models as the basis for
selection of the model.
If the time series is not stationary, the sample autocorrelation function will die down extremely
slowly. If this type of behavior is exhibited, the usual approach is to compute the same
autocorrelation and partial autocorrelation functions for the first difference (regular) of the series.
If these functions behave according to the theoretical patterns, one difference is necessary. If not
we must try successively higher order of differencing until stationarity behavior is achieved.
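A minimal identification sketch with statsmodels (acf and pacf); this is our own illustration, the
series is simulated from an AR(1) purely so that the code runs, and the ±2/√n value is the rough
significance bound described above.

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    rng = np.random.default_rng(2)
    e = rng.normal(size=300)
    y = np.zeros(300)
    for t in range(1, 300):
        y[t] = 0.7 * y[t - 1] + e[t]     # simulated stationary series

    # Inspect the sample ACF and PACF (difference the series first if it is non-stationary)
    r = acf(y, nlags=10)
    phi = pacf(y, nlags=10)
    bound = 2 / np.sqrt(len(y))          # approximate 2-standard-error bound
    print(np.round(r[1:], 2))
    print(np.round(phi[1:], 2))
    print('bound:', round(bound, 2))     # values outside +/- bound are treated as non-zero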

6.7.Seasonal ARIMA (SARIMA) models

In practice, many time series contain seasonal periodic component. The fundamental fact about
seasonal time series with period S is that observations which are S intervals apart are similar.
Box and Jenkins have generalized the ARIMA model to deal with seasonality, and define a
general multiplicative seasonal ARIMA (abbreviated SARIMA) model as
φp(B)ΦP(Bˢ)Wt = θq(B)ΘQ(Bˢ)εt     (*)
where B denotes the backward shift operator, and φp, ΦP, θq, ΘQ are polynomials of order p, P, q,
Q, respectively:
φp(B) = 1 - φ1B - φ2B² - … - φpBᵖ,
ΦP(Bˢ) = 1 - Φ1Bˢ - Φ2B²ˢ - … - ΦPBᴾˢ,
θq(B) = 1 - θ1B - θ2B² - … - θqBq,
ΘQ(Bˢ) = 1 - Θ1Bˢ - Θ2B²ˢ - … - ΘQBQˢ;
εt denotes a purely random process,
and Wt = ∇ᵈ∇sᴰYt     (**), where ∇ᵈ = (1 - B)ᵈ and ∇sᴰ = (1 - Bˢ)ᴰ.
The variables {Wt} are formed from the original series {Yt} not only by simple differencing (to
remove trend) but also by seasonal differencing, ∇s, to remove the seasonality.
Example: If d = 1, D = 1, s = 12, then Wt = ∇∇12Yt = ∇12Yt - ∇12Yt-1 = (Yt - Yt-12) - (Yt-1 - Yt-13).
The model in (*) and (**) is said to be a SARIMA model of order (p, d, q) × (P, D, Q)s. The values
of d and D do not usually need to exceed one.
Example: Consider a SARIMA model of order (1, 0, 0) × (0, 1, 1)12
(p = 1, d = 0, q = 0, P = 0, D = 1, Q = 1):
(1 - φ1B)Wt = (1 - Θ1B¹²)εt,
with φp(B) = 1 - φ1B, ΦP(Bˢ) = 1, θq(B) = 1, ΘQ(Bˢ) = 1 - Θ1Bˢ, and
Wt = ∇12Yt = Yt - Yt-12.

Then we find
(1 - φ1B)∇12Yt = εt - Θ1B¹²εt
∇12Yt - φ1B∇12Yt = εt - Θ1εt-12
Yt - Yt-12 - φ1B(Yt - Yt-12) = εt - Θ1εt-12
Yt - Yt-12 - φ1(Yt-1 - Yt-13) = εt - Θ1εt-12

Yt = Yt-12 + φ1(Yt-1 - Yt-13) + εt - Θ1εt-12.
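A sketch of fitting such a seasonal model with the ARIMA class of statsmodels (our illustration;
the monthly series below is simulated only so that the code runs, and for real data one would pass
the observed series instead).

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(3)
    t = np.arange(120)
    y = pd.Series(10 + 0.1 * t + 5 * np.sin(2 * np.pi * t / 12)
                  + rng.normal(scale=1.0, size=120))   # trend + annual seasonality + noise

    # SARIMA of order (1,0,0)x(0,1,1)_12, the model discussed above
    model = ARIMA(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 12))
    res = model.fit()
    print(res.params.round(3))   # estimated phi_1, Theta_1 and the error variance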

6.8 Estimation of parameters

Estimating the Model Parameters


Once a tentative model has been identified, the estimates for the constant and the coefficients of
the equation must be obtained.
This is accomplished by using a computer program. Such programs employ least squares algorithms,
combining search routines and successive approximations, to obtain final least squares point
estimates of the parameters. The final estimates are those that minimize the sum of squared errors,
to the point where no other estimates can be found that yield a smaller sum of squared errors; this
is known as convergence. Because this is an iterative process, starting values for each of the
estimates must be supplied, and these preliminary estimates must satisfy the stationarity and
invertibility conditions. An initial estimate of the mean can be found as μ̂ = Ȳ, where Ȳ is the
mean of the stationary time series.

Initial estimates for AR Process parameters


In an AR(1) process we know that ρ1 = φ1. If r1 is given, then φ̂1 = r1;
|φ1| < 1 implies that −1 < r1 < 1 are the admissible values of r1.
In an AR(2) process
we know from the Yule-Walker equations that:

φ̂1 = r1(1 − r2) / (1 − r1²)   and   φ̂2 = (r2 − r1²) / (1 − r1²)

Example: Given r1 = 0.81 and r2 = 0.43, we estimate the parameters as

φ̂1 = 0.81(1 − 0.43) / (1 − 0.81²) = 0.4617/0.3439 ≈ 1.34   and

φ̂2 = (0.43 − 0.81²) / (1 − 0.81²) = −0.2261/0.3439 ≈ −0.66

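The Yule-Walker starting values above are easy to verify numerically; the short sketch below recomputes them for r1 = 0.81 and r2 = 0.43.

# Sketch: moment (Yule-Walker) starting values for an AR(2) process.
r1, r2 = 0.81, 0.43

phi1_hat = r1 * (1 - r2) / (1 - r1**2)     # r1(1 - r2) / (1 - r1^2)
phi2_hat = (r2 - r1**2) / (1 - r1**2)      # (r2 - r1^2) / (1 - r1^2)

print(round(phi1_hat, 2), round(phi2_hat, 2))   # approximately 1.34 and -0.66
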
Initial estimates for MA Process parameters


There is no closed-form solution for an MA process. The usual approach is to apply an iterative
search procedure directly to the residual sum of squares function. Such an algorithm requires
initial values for the parameters, and these preliminary estimates are found from the relationships
between the model parameters and the autocorrelations.

For an MA(1) process, we have

ρ1 = −θ1 / (1 + θ1²),   ρk = 0 for k > 1.

Given r1, we set r1 = −θ̂1 / (1 + θ̂1²).

The possible solutions are

θ̂1 = [−1 + √(1 − 4r1²)] / (2r1)   and   θ̂1 = [−1 − √(1 − 4r1²)] / (2r1).

However, we take the solution which satisfies the invertibility condition |θ̂1| < 1.

The admissible region for r1 is −1/2 ≤ r1 ≤ 1/2.

For an MA(2) process, we have

ρ1 = −θ1(1 − θ2) / (1 + θ1² + θ2²),   ρ2 = −θ2 / (1 + θ1² + θ2²).

Given r1 and r2, one can solve the equations

r1 = −θ̂1(1 − θ̂2) / (1 + θ̂1² + θ̂2²)   and   r2 = −θ̂2 / (1 + θ̂1² + θ̂2²)

for θ̂1 and θ̂2.

For an ARMA(1,1) process, we have

ρ1 = (1 − φ1θ1)(φ1 − θ1) / (1 + θ1² − 2φ1θ1),   ρ2 = φ1ρ1.

Given r1 and r2, the equations

r1 = (1 − φ̂1θ̂1)(φ̂1 − θ̂1) / (1 + θ̂1² − 2φ̂1θ̂1)   and   r2 = φ̂1 r1

are solved for φ̂1 and θ̂1.
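For the MA(1) relation above, the preliminary estimate amounts to solving the quadratic r1·θ² + θ + r1 = 0 and keeping the invertible root; a minimal sketch (with an illustrative value of r1) follows.

# Sketch: preliminary MA(1) estimate from r1, keeping the root with |theta| < 1.
import math

r1 = -0.4                                   # admissible only when |r1| <= 1/2
disc = 1 - 4 * r1**2
roots = [(-1 + math.sqrt(disc)) / (2 * r1),
         (-1 - math.sqrt(disc)) / (2 * r1)]
theta1_hat = next(t for t in roots if abs(t) < 1)
print("candidate roots:", roots, "-> theta1_hat =", theta1_hat)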

Initial estimates for the residual variance (σε²)

This can be obtained by substituting the preliminary parameter estimates into the expression for
the variance, where σ̂Y² denotes the sample variance of the (stationary) series.

For AR(p): σ̂ε² = σ̂Y²(1 − φ̂1r1 − φ̂2r2 − ... − φ̂prp)

e.g. for AR(1): σ̂ε² = σ̂Y²(1 − φ̂1r1)   and for AR(2): σ̂ε² = σ̂Y²(1 − φ̂1r1 − φ̂2r2)

For MA(q): σ̂ε² = σ̂Y² / (1 + θ̂1² + θ̂2² + ... + θ̂q²)

e.g. for MA(1): σ̂ε² = σ̂Y² / (1 + θ̂1²)   and for MA(2): σ̂ε² = σ̂Y² / (1 + θ̂1² + θ̂2²)

For ARMA(1,1) it takes the form σ̂ε² = σ̂Y²(1 − φ̂1²) / (1 + θ̂1² − 2φ̂1θ̂1)
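A small sketch of these substitutions; all numbers below are illustrative rather than taken from the text.

# Sketch: initial residual-variance estimates from preliminary parameter estimates.
var_y = 4.0              # sample variance of the stationary series (illustrative)
phi1, r1 = 0.6, 0.6      # AR(1): preliminary estimate and lag-1 sample autocorrelation
theta1 = 0.5             # MA(1): preliminary estimate

var_eps_ar1 = var_y * (1 - phi1 * r1)     # AR(1) formula
var_eps_ma1 = var_y / (1 + theta1**2)     # MA(1) formula
print(var_eps_ar1, var_eps_ma1)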

6.9.Diagnostic checking

When a model has been fitted to a time series data set, it is advisable to check that the model really
does provide an adequate description of the data and, if necessary, to suggest potential improvements.
This is done through different approaches, usually by examining the residuals.

Autocorrelation Check (Studying residuals)


Examine the autocorrelation function of the residuals.

Compute ε̂t = Yt − Ŷt for all t, and then construct the autocorrelation function of the residuals.

Let rk(ε̂) be the ACF of the residuals. If the model is appropriate, the estimated autocorrelations
rk(ε̂) of the residuals would be uncorrelated and distributed approximately normally about zero with
variance n⁻¹, and hence with standard error n^(-1/2).
If the model is appropriate, the residual sample autocorrelation function should have no structure
to identify. Values which lie outside the interval ±2/√n are significantly different from zero
(at the 5% significance level) and give evidence that the wrong model has been fitted.
Portmanteau Lack-of-Fit Test
Rather than considering the rk(ε̂) terms individually, an indication is often needed of whether,
say, the first 20 autocorrelations of the residuals, taken together, indicate inadequacy of the
model. Suppose we have the first M autocorrelations of the residuals, rk(ε̂), k = 1, 2, ..., M, from
any fitted ARIMA(p, d, q) process. If the fitted model is appropriate, the statistic

Q = (N − d) Σ rk²(ε̂),   the sum running over k = 1, 2, ..., M,

is approximately distributed as χ²(M − p − q), where N is the number of observations before
differencing. If the model is inappropriate (inadequate), Q will tend to exceed the upper critical
value χ²α(M − p − q).

A modified form of the Q statistic is the Ljung-Box statistic, Q*, given by

Q* = (N − d)(N − d + 2) Σ rk²(ε̂)/(N − d − k),   the sum again running over k = 1, 2, ..., M,

which is also approximately distributed as χ²(M − p − q). If the model is inappropriate
(inadequate), Q* will tend to exceed χ²α(M − p − q).

Example: The following table shows the first 25 autocorrelations of the residuals, rk(ε̂), from an
IMA(0, 2, 2) model, ∇²Zt = (1 − 0.13B − 0.12B²)εt, which was fitted to a series with N = 226
observations. Check the adequacy of the model using the Ljung-Box statistic Q*.

k       1      2      3      4      5      6      7      8      9      10     11     12     13
rk(ε̂)  0.02   0.032 -0.125 -0.078 -0.011 -0.033  0.022 -0.056 -0.13   0.093 -0.129  0.063 -0.084

k       14     15     16     17     18     19     20     21     22     23     24     25
rk(ε̂)  0.022 -0.006 -0.05   0.153 -0.092 -0.005 -0.015  0.007  0.132  0.012 -0.012 -0.127

Q* = (224)(226)[0.02²/223 + 0.032²/222 + ... + (−0.127)²/199] ≈ 36.2

M − p − q = 25 − 0 − 2 = 23
χ²0.05(23) = 35.2 and χ²0.1(23) = 32.0

Since Q* > χ²0.05(23) = 35.2, there is some doubt as to the adequacy of this model.
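The Q* calculation above is easy to reproduce; the sketch below keys in the 25 residual autocorrelations and compares Q* with the chi-square critical value.

# Sketch: Ljung-Box statistic for the IMA(0, 2, 2) example (N = 226, d = 2, M = 25).
from scipy.stats import chi2

rk = [0.02, 0.032, -0.125, -0.078, -0.011, -0.033, 0.022, -0.056, -0.13, 0.093,
      -0.129, 0.063, -0.084, 0.022, -0.006, -0.05, 0.153, -0.092, -0.005, -0.015,
      0.007, 0.132, 0.012, -0.012, -0.127]

N, d, p, q = 226, 2, 0, 2
n = N - d
Q_star = n * (n + 2) * sum(r**2 / (n - k) for k, r in enumerate(rk, start=1))
crit = chi2.ppf(0.95, df=len(rk) - p - q)     # upper 5% point of chi-square(23)
print(f"Q* = {Q_star:.1f}, critical value = {crit:.1f}")   # roughly 36.2 vs 35.2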

Over-fitting: This is another technique used for diagnostic checking. Having identified what is
believed to be a correct model, we actually fit a more elaborate one (i.e. a model containing
additional parameters) and use likelihood ratio or t-tests to check that the additional parameters
are not significant.

7. Forecasting
The use at time t of available observations from a time series to forecast its value at some
future time t + k can provide a basis for:
- economic and business planning
- production planning
- inventory and production control
- control and optimization of industrial processes
Box-Jenkins methodology
Here we will see the forecasting procedure based on ARIMA models, which is usually known as
the Box-Jenkins approach. For both seasonal and non-seasonal data, the adequacy of the fitted
model should be checked by what Box and Jenkins call 'diagnostic checking'. Here we consider
only non-seasonal time series. When a satisfactory model is found, forecasts may readily be
computed.

Forecasts are usually needed over a period known as the lead time, which varies with each
problem. For example, in a sales forecasting problem, sales Yt, in the current month t and the
sales Yt-1, Yt-2, Yt-3, … in previous months might be used to forecast sales for lead times k=1, 2,
3, …, 12 months ahead.

Denote by Ŷt(k) the forecast made at origin t of the value Yt+k at some future time t + k, that is,
at lead time k. The function Ŷt(k), k = 1, 2, ..., which provides the forecasts at origin t for all
future lead times, will be called the forecast function at origin t. Our objective is to obtain a
forecast function such that the mean square of the deviations Yt+k − Ŷt(k) between the actual and
forecasted values is as small as possible for each lead time k.

Minimum mean square error forecasts

Recall the general ARIMA model φ*(B)Yt = θ(B)εt, where φ*(B) = φ(B)∇^d.

We shall be concerned with forecasting a value Yt+k, k≥1, when we are currently standing at
time t. This forecast is said to be made at origin t for lead time k.

Now suppose, standing at origin t, that we are to make a forecast Ŷt(k) of Yt+k which is to be a
linear function of the current and previous observations Yt, Yt-1, Yt-2, .... Then it will also be a
linear function of the current and previous shocks εt, εt-1, εt-2, ....

The error of the forecast Ŷt(k) at lead time k, et(k), is given by et(k) = Yt+k − Ŷt(k). The standard
criterion to use in obtaining the best forecast is the mean squared error, for which the expected
value of the squared forecast errors, E[Yt+k − Ŷt(k)]², is minimized.

Let us denote by Et[Yt+k] the conditional expectation E[Yt+k | Yt, Yt-1, ...] of Yt+k given knowledge
of all Y's up to time t. We will assume that the εt are a sequence of independent random variables.
Then the conditional expectation of εt+j, j > 0, given knowledge of all Y's up to time t, denoted by
Et[εt+j], is zero; that is, E[εt+j | Yt, Yt-1, ...] = 0 for j > 0.

It can be shown that Ŷt(k) = Et[Yt+k]. Thus the minimum mean square error forecast at origin t,
for lead time k, is the conditional expectation of Yt+k at time t. When Ŷt(k) is regarded as a
function of k for fixed t, it will be called the forecast function for origin t. We note that a
minimum requirement on the random shocks εt in the ARIMA model, in order for the conditional
expectation Et[Yt+k], which always equals the minimum mean square error forecast, to coincide with
the minimum mean square error linear forecast, is that Et[εt+j] = 0 for j > 0.

The one-step-ahead forecast error is et(1) = Yt+1 − Ŷt(1) = εt+1. It follows that, for a minimum mean
square error forecast, the one-step-ahead forecast errors must be uncorrelated. Although the
optimal forecast errors at lead time 1 are uncorrelated, the forecast errors for longer lead times
will in general be correlated.

To calculate the conditional expectations that occur in the forecast functions, we note that if j is a
non-negative integer,

Et[Yt-j] = Yt-j,                          j = 0, 1, 2, . . .
Et[Yt+j] = Ŷt(j),                         j = 1, 2, . . .
Et[εt-j] = εt-j = Yt-j − Ŷt-j-1(1),       j = 0, 1, 2, . . .
Et[εt+j] = 0,                             j = 1, 2, . . .

Therefore, to obtain the forecast Ŷt(k), one writes down the model for Yt+k and treats the terms
according to the following rules:

1. The Yt-j (j = 0, 1, 2, . . .), which have already happened at origin t, are left unchanged.
2. The Yt+j (j = 1, 2, . . .), which have not yet happened, are replaced by their forecasts
Ŷt(j) at origin t.
3. The εt-j (j = 0, 1, 2, . . .), which have happened at origin t, are available from
εt-j = Yt-j − Ŷt-j-1(1).
4. The εt+j (j = 1, 2, . . .), which have not happened, are replaced by zeros.

In the expressions εt = Yt − Ŷt-1(1), εt-1 = Yt-1 − Ŷt-2(1), ..., the forecasting process may be
started off initially by setting unknown ε's equal to their unconditional expected value
of zero.

Example: Consider a time series represented by the model

(1 − 0.8B)(1 − B)Yt = εt

Find the forecast function for origin t.
Solution: (1 − 0.8B)(1 − B)Yt = εt
(1 − 1.8B + 0.8B²)Yt = εt
Yt − 1.8Yt-1 + 0.8Yt-2 = εt
That is, Yt = 1.8Yt-1 − 0.8Yt-2 + εt
The model for Yt+k is Yt+k = 1.8Yt+k-1 − 0.8Yt+k-2 + εt+k
The forecasts at origin t are given by

Ŷt(1) = 1.8Yt − 0.8Yt-1,                   k = 1
Ŷt(2) = 1.8Ŷt(1) − 0.8Yt,                  k = 2
Ŷt(3) = 1.8Ŷt(2) − 0.8Ŷt(1),               k = 3
In general, Ŷt(k) = 1.8Ŷt(k − 1) − 0.8Ŷt(k − 2),   k = 3, 4, . . .

In starting the forecasting process, assume that εt-j = 0 for t − j ≤ 0.
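The recursion above is straightforward to compute once the two most recent observations are available; a sketch with illustrative values for Yt and Yt-1:

# Sketch: forecast recursion Y_t(k) = 1.8*Y_t(k-1) - 0.8*Y_t(k-2) for the example model.
y_t, y_tm1 = 105.0, 100.0         # hypothetical values of Y_t and Y_{t-1}

prev2, prev1 = y_tm1, y_t
forecasts = []
for k in range(1, 6):
    f = 1.8 * prev1 - 0.8 * prev2
    forecasts.append(round(f, 2))
    prev2, prev1 = prev1, f

print(forecasts)                  # Y_t(1), Y_t(2), ..., Y_t(5)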

Forecasts with AR(1) Process

For this process, it holds that Yt = φ0 + φ1Yt-1 + εt, with |φ1| < 1.

The model for Yt+k is: Yt+k = φ0 + φ1Yt+k-1 + εt+k

The forecasts at origin t for lead time k are given by

For k = 1, Ŷt(1) = φ0 + φ1Yt

For k = 2, Ŷt(2) = φ0 + φ1Ŷt(1)

In general, Ŷt(k) = φ0 + φ1Ŷt(k − 1) for k ≥ 2.

Forecasts with AR(2) Process

For this process, it holds that Yt = φ0 + φ1Yt-1 + φ2Yt-2 + εt, with the stationarity conditions
φ1 + φ2 < 1, φ2 − φ1 < 1 and |φ2| < 1.

The model for Yt+k is Yt+k = φ0 + φ1Yt+k-1 + φ2Yt+k-2 + εt+k

The forecast functions at origin t are

Ŷt(1) = φ0 + φ1Yt + φ2Yt-1,         k = 1

Ŷt(2) = φ0 + φ1Ŷt(1) + φ2Yt,        k = 2

Ŷt(3) = φ0 + φ1Ŷt(2) + φ2Ŷt(1),     k = 3

In general, Ŷt(k) = φ0 + φ1Ŷt(k − 1) + φ2Ŷt(k − 2), k ≥ 3.

Examples

1. Suppose a time series is represented by the model Yt = 25 + 0.34Yt-1 + εt, and that at
time t = 100 the observation is Y100 = 28. Then
a. Determine forecasts for periods 101, 102, 103, etc.
b. Suppose Y101 = 32; revise your forecasts for periods 102, 103, 104, ..., using period
101 as the new origin of time t. (A numerical sketch for this exercise is given after the list.)
2. The following time series model has been fitted to some historical series as an AR(2)
process:
Yt = 21 + 0.27Yt-1 + 0.41Yt-2 + εt. At time t = 104, Y104 = 18; determine forecasts for
periods 105, 106, 107, ...
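A minimal sketch for exercise 1, applying the AR(1) forecast recursion Ŷt(k) = 25 + 0.34·Ŷt(k − 1) from each stated origin:

# Sketch for exercise 1: AR(1) forecasts Y_t(k) = 25 + 0.34 * Y_t(k-1).
def ar1_forecasts(last_obs, n_ahead, c=25.0, phi=0.34):
    out, prev = [], last_obs
    for _ in range(n_ahead):
        prev = c + phi * prev        # one more step of the recursion
        out.append(round(prev, 3))
    return out

print("from origin t = 100 (Y_100 = 28):", ar1_forecasts(28, 4))   # periods 101, 102, ...
print("from origin t = 101 (Y_101 = 32):", ar1_forecasts(32, 4))   # revised: periods 102, ...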

Forecasts with MA(1) Process

For this process, it holds that Yt = μ + εt − θ1εt-1, with |θ1| < 1. Then Yt+k = μ + εt+k − θ1εt+k-1.

The forecasts at origin t are given by

For k = 1, Ŷt(1) = μ − θ1εt

For k = 2, Ŷt(2) = μ

For k ≥ 2, Ŷt(k) = μ. That is, the unconditional mean is the optimal forecast of Yt+k, k = 2, 3, ....

Forecasts with MA(2) Process

For this process, it holds that Yt = μ + εt − θ1εt-1 − θ2εt-2, with the invertibility conditions
θ1 + θ2 < 1, θ2 − θ1 < 1 and |θ2| < 1. The model for Yt+k is Yt+k = μ + εt+k − θ1εt+k-1 − θ2εt+k-2.

The forecasts at origin t for lead time k are

For k = 1, Ŷt(1) = μ − θ1εt − θ2εt-1

For k = 2, Ŷt(2) = μ − θ2εt

For k ≥ 3, Ŷt(k) = μ, i.e. the unconditional mean is the optimal forecast of Yt+k, k = 3, 4, ....

Similarly, it is possible to show that, after q forecast steps, the optimal forecasts of invertible
MA(q) processes, q > 1, are equal to the unconditional mean of the process, and that the variance
of the forecast error is then equal to the variance of the underlying process.

Example: The following time series model has been fitted to some historical data as an MA(2)
process: Yt = 20 + εt + 0.45εt-1 − 0.35εt-2. If the first four observations are 17.5, 21.36, 18.24
and 16.91, respectively, find the forecasts for periods 5, 6, 7, ... from origin 4 (a sketch
follows below).
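A sketch of the calculation: recover the residuals recursively (taking the pre-sample errors as zero) and then apply the MA(2) forecast formulas from origin t = 4.

# Sketch for the MA(2) example: Y_t = 20 + e_t + 0.45*e_{t-1} - 0.35*e_{t-2}.
mu = 20.0
y = [17.5, 21.36, 18.24, 16.91]          # Y_1, ..., Y_4

eps = [0.0, 0.0]                         # starting values e_{-1} = e_0 = 0
for obs in y:
    e_t = obs - mu - 0.45 * eps[-1] + 0.35 * eps[-2]
    eps.append(e_t)

f1 = mu + 0.45 * eps[-1] - 0.35 * eps[-2]   # Y_4(1)
f2 = mu - 0.35 * eps[-1]                     # Y_4(2)
print(round(f1, 2), round(f2, 2), mu)        # Y_4(k) = 20 for k >= 3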

Forecasts with ARMA(1,1) Processes

Model: Yt = α + φ1Yt-1 + εt − θ1εt-1

The model for Yt+k is Yt+k = α + φ1Yt+k-1 + εt+k − θ1εt+k-1

The forecasts at origin t are

Ŷt(1) = α + φ1Yt − θ1εt,      for lead time k = 1

Ŷt(2) = α + φ1Ŷt(1),          for lead time k = 2

Thus, for k > 1, the forecast function for the ARMA(1,1) process is Ŷt(k) = α + φ1Ŷt(k − 1).

Forecast with an IMA(0,1,1) process

∇Yt = (1 − θ1B)εt

Yt = Yt-1 + εt − θ1εt-1

The model for Yt+k is Yt+k = Yt+k-1 + εt+k − θ1εt+k-1

The forecasts at origin t for lead time k are:

For k = 1, Ŷt(1) = Yt − θ1εt

For k = 2, Ŷt(2) = Ŷt(1)

For k > 1, we have Ŷt(k) = Ŷt(k − 1) as the eventual forecast function for the IMA(0,1,1) process.
