Time Series Forecasting
Project 6
Content
Sr. No  Particulars
3       Observation
Problem Statement
For this assignment, we are asked to explore the gas dataset (Australian monthly gas
production) from the forecast package in R.
The package contains methods and tools for displaying and analyzing univariate
time series forecasts, including exponential smoothing via state space models and
automatic ARIMA modeling. We are requested to do the following:
Solution 1: READ the data and Plot
We begin with a basic analysis of the dataset to understand the data provided. From
the images below we observe the following:
It has 476 observations. {FIG: 1(a), FIG: 1(B) and FIG: 1(B.i)}
It starts in January 1956 and ends in August 1995. {FIG: 1(B), FIG: 1(B.i) and FIG: 1(C)}
The dataset is a monthly series (frequency 12). {FIG: 1(B) and FIG: 1(B.i)}
The cycle of the dataset shows that there are no missing values. {FIG: 1(B) and FIG: 1(D)}
The data type is a time series, as can be seen from its class. {FIG: 1(B)}
The dataset is plotted and graphically explained in FIG: 1(E.i) and FIG: 1(E.ii).
Gas production is flat from 1956 to 1970 and moves upward from 1970 onwards.
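The checks listed above can be reproduced with a few base R calls. This is a minimal sketch, assuming the forecast package is installed; the object name gas_ts is only an illustrative choice and is not taken from the report.

```r
library(forecast)  # ships the gas dataset

# Copy the dataset into an illustrative object name (gas_ts is an assumption)
gas_ts <- forecast::gas

class(gas_ts)             # class of the object
length(gas_ts)            # number of observations
start(gas_ts)             # first (year, month)
end(gas_ts)               # last (year, month)
frequency(gas_ts)         # observations per year
sum(is.na(gas_ts))        # count of missing values
head(cycle(gas_ts), 12)   # month position of the first 12 observations

# Line plot of the full series
plot(gas_ts, xlab = "Year", ylab = "Gas production",
     main = "Australian monthly gas production")
```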
FIG: 1(a)
FIG: 1(B)
FIG: 1(B.i)
FIG: 1(C)
FIG: 1(D)
FIG: 1(E.i)
FIG: 1(E.ii)
Solution 2: OBSERVATION
From FIG: 1(E.ii) above we can see that gas production is initially stagnant, from 1956 to
1969. From 1970 onwards it increases gradually and settles into an upward trend. From 1980
onwards, gas production varies noticeably from month to month. The graph also indicates
seasonality and trend components.
For further analysis the dataset has been grouped into quarterly and annual series, see FIG: 2(a),
where seasonality and trend can be studied better.
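One way to build such quarterly and annual groupings is with aggregate() on the monthly series. This is a sketch only: the report does not say which function or summary statistic was used, so the choice of FUN = sum and the object names are assumptions.

```r
library(forecast)  # ships the gas dataset

gas_ts <- forecast::gas  # illustrative object name

# Collapse the monthly series to 4 and 1 observations per year.
# FUN = sum (total production per period) is an assumption.
gas_quarterly <- aggregate(gas_ts, nfrequency = 4, FUN = sum)
gas_annual    <- aggregate(gas_ts, nfrequency = 1, FUN = sum)

# Plot the three granularities one above the other
op <- par(mfrow = c(3, 1))
plot(gas_ts,        main = "Monthly",   ylab = "Gas production")
plot(gas_quarterly, main = "Quarterly", ylab = "Gas production")
plot(gas_annual,    main = "Annual",    ylab = "Gas production")
par(op)
```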
The graphical representation {see FIG: 2(b.i), FIG: 2(b.ii) & FIG: 2(b.iii)} of dataset shows indication of:
FIG: 2(a)
FIG: 2(b.i)
FIG: 2(b.ii)
FIG: 2(b.iii)
The above graph shows that seasonality was not prevalent during the initial period. It is visible
between May and October, with July showing peak production.
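A month-by-month view of this kind can be sketched by grouping the observations by calendar month. The code below is illustrative only; it uses the full series, and the object names are assumptions.

```r
library(forecast)  # ships the gas dataset

gas_ts <- forecast::gas  # illustrative object name

# Calendar month (1..12) of every observation
month_idx <- as.integer(cycle(gas_ts))

# Average production per calendar month
monthly_mean <- tapply(as.numeric(gas_ts), month_idx, mean)
names(monthly_mean) <- month.abb
round(monthly_mean)

# Distribution of production per calendar month
boxplot(as.numeric(gas_ts) ~ month_idx, names = month.abb,
        xlab = "Month", ylab = "Gas production")
```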
Decomposition is applied to a time series to separate its components by splitting it into
multiple time series.
Decomposing the series exposes the following components:
Seasonality: patterns that repeat with a fixed period of time.
Trend: the underlying long-term movement of the metric.
Random: the residual of the time series after the seasonal and trend components are removed.
It is also referred to as noise, irregular or remainder.
For an effective decomposition we need to choose the right model by checking whether the time
series is additive or multiplicative.
An additive model is useful when the seasonal variation is relatively constant, while a
multiplicative model is useful when the seasonal variation increases over time.
From the decomposition images below {FIG: 2(C.i) & FIG: 2(C.ii)} we can see that a strong
seasonal pattern is present in the time series and that there is an upward trend. The
series is assumed to be additive.
With the time series decomposed into its components (trend, seasonality and random
variation) we also observe a semiannual seasonality with an upward trend in gas production.
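An additive decomposition can be sketched with decompose() from base R. Whether the report used decompose() or another routine such as stl() is not stated, so the function choice and the object names here are assumptions.

```r
library(forecast)  # ships the gas dataset

gas_ts <- forecast::gas  # illustrative object name

# Classical additive decomposition: observed = trend + seasonal + random
dec <- decompose(gas_ts, type = "additive")
plot(dec)

# The three components add back up to the observed series
# (the moving-average trend is NA at both ends, so compare only where defined)
recon   <- dec$trend + dec$seasonal + dec$random
defined <- !is.na(recon)
all.equal(as.numeric(recon[defined]), as.numeric(gas_ts[defined]))
```

Passing type = "multiplicative" instead would fit the model in which the components are multiplied rather than added.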
FIG: 2(C.i)
FIG: 2(C.ii)
De-seasonalization
We de-seasonalize the time series to see whether the general trend of gas production is upward.
To forecast production for the next month we need to consider both seasonality and trend.
Since the series is additive, the other two components (trend and random) are added together
to obtain the de-seasonalized series. The original data and the de-seasonalized data are
plotted together in FIG: 2(D.ii) to study the trend.
FIG: 2(D.ii) shows the original series in red and the de-seasonalized gas production in
blue. It shows an increasing trend in gas production.
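For an additive series, adding trend and random is the same as subtracting the seasonal component from the observed values. A minimal sketch, assuming a decompose() fit and illustrative object names:

```r
library(forecast)  # ships the gas dataset

gas_ts <- forecast::gas  # illustrative object name
dec <- decompose(gas_ts, type = "additive")

# De-seasonalized series: observed minus the seasonal component
gas_deseason <- gas_ts - dec$seasonal

# Original in red, de-seasonalized in blue
plot(gas_ts, col = "red", ylab = "Gas production",
     main = "Original vs de-seasonalized gas production")
lines(gas_deseason, col = "blue")
legend("topleft", legend = c("Original", "De-seasonalized"),
       col = c("red", "blue"), lty = 1)
```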
FIG: 2(D.i)
FIG: 2(D.ii)
For further analysis we have divided the time series into train and test datasets. Since the
series shows an increasing production trend from 1970, we use data from January 1970
onwards for the analysis.
The dataset is divided into test and train sets, keeping in mind that the split should capture
one whole cycle. Refer to FIG: 3(a).
FIG: 3(a)
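A split of this kind can be sketched with window(). The names GasTrain and GasTest appear in the report's output, but the exact split date is only shown in the figure; the boundary below (last 24 months held out, matching the 24-step forecasts used later) is an assumption.

```r
library(forecast)  # ships the gas dataset

gas_ts <- forecast::gas  # illustrative object name

# ASSUMED boundary: train on Jan 1970 - Aug 1993, test on Sep 1993 - Aug 1995
GasTrain <- window(gas_ts, start = c(1970, 1), end = c(1993, 8))
GasTest  <- window(gas_ts, start = c(1993, 9))

length(GasTrain)  # months in the training set
length(GasTest)   # months in the test set
```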
We conduct a Dickey-Fuller test on the time series to check whether it is stationary or
non-stationary. The null hypothesis of the test is that the series is non-stationary.
If the p-value is > 0.05, we fail to reject the null hypothesis and conclude that the time
series is non-stationary, i.e. it has a time-dependent structure.
If the p-value is <= 0.05, we reject the null hypothesis and conclude that the time series is
stationary, i.e. it does not have a time-dependent structure.
From FIG: 3(b) below we find that the p-value is 0.99, which is greater than the 0.05
significance level. The null hypothesis is therefore retained, and we conclude that the time
series is non-stationary.
FIG: 3(b)
To fit an ARIMA model, the time series must be stationary. The time series we have is
non-stationary, so we need to transform it and check, by plotting, whether the transformed
series is stationary. To do this we create a new dataset of differences, apply the
Dickey-Fuller test to it, and plot it to check whether it is stationary or non-stationary.
From the images below {FIG: 3(C.i) & FIG: 3(C.ii)} we can see that the p-value is 0.01,
which is smaller than the 0.05 significance level, so the null hypothesis can be rejected. The
differenced time series is stationary, and the level of the series appears to be constant over
time. The plot of the differenced series in FIG: 3(C.ii) also shows that it is stationary.
FIG: 3(C.i)
FIG: 3(C.ii)
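The two stationarity checks can be sketched with adf.test() from the tseries package. The report does not state which package, which series window or which transformation was used, so the training window, the first difference and the object names below are all assumptions.

```r
library(forecast)  # ships the gas dataset
library(tseries)   # adf.test()

# ASSUMED training window (the report's split date is only shown in a figure)
GasTrain <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))

# Test the level series; adf.test() warns when the p-value falls outside
# its tabulated range, so the warning is silenced here
adf_level <- suppressWarnings(adf.test(GasTrain))

# First difference (ASSUMED transformation), then test again
GasDiff  <- diff(GasTrain)
adf_diff <- suppressWarnings(adf.test(GasDiff))

c(level = adf_level$p.value, differenced = adf_diff$p.value)

# Visual check of the differenced series
plot(GasDiff, ylab = "Differenced gas production")
```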
Solution 4: MODEL Building and Forecasting
We examine the ACF and PACF of the data to check stationarity and autocorrelation.
The ACF function computes an estimate of the autocorrelation function. ACF plots help in
determining the order of the MA (q) model.
The PACF function computes the partial autocorrelation of a time series. PACF plots help in
determining the order of the AR (p) model.
From the ACF plot in FIG: 3(D.ii) we can see that the autocorrelation is significant over the
first 10 lags, except for the 4th and 10th lags.
The PACF plot in FIG: 3(D.iii) shows that all lags are significant.
FIG: 3(D.i)
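ACF and PACF plots of this kind can be produced with acf() and pacf() from base R. The sketch below runs them on a first-differenced training series; which series the report actually plotted is not stated, so the series, the window and lag.max = 48 are assumptions.

```r
library(forecast)  # ships the gas dataset

# ASSUMED training window and transformation
GasTrain <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))
GasDiff  <- diff(GasTrain)

# Estimate the correlations without plotting, then plot them
acf_vals  <- stats::acf(GasDiff,  lag.max = 48, plot = FALSE)
pacf_vals <- stats::pacf(GasDiff, lag.max = 48, plot = FALSE)

plot(acf_vals,  main = "ACF of differenced series")   # guides the MA order q
plot(pacf_vals, main = "PACF of differenced series")  # guides the AR order p
```

Bars that extend beyond the dashed bounds in each plot are the lags treated as significant.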
We now build a manual ARIMA model with seasonal effects. The non-seasonal order (p, d, q)
and the seasonal order (P, D, Q) each define the following 3 parameters:
Number of autoregressive terms.
Number of differences needed to make the series stationary.
Number of moving average terms.
We forecast the manual ARIMA model 12 periods ahead; see FIG: 4(a.i).
FIG: 4(a)
FIG: 4(a.i)
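A seasonal ARIMA model with hand-picked orders can be fitted with Arima() from the forecast package. The orders of the report's manual model (manu.arima) are only shown in the figure, so the orders below are placeholders borrowed from the (1,1,1)(0,1,1)[12] specification that appears later in the report; the training window and the name manual_fit are also assumptions.

```r
library(forecast)

# ASSUMED training window (the report's split date is only shown in a figure)
GasTrain <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))

# PLACEHOLDER orders: order = c(p, d, q), seasonal = c(P, D, Q)
manual_fit <- Arima(GasTrain, order = c(1, 1, 1), seasonal = c(0, 1, 1))
summary(manual_fit)

# Forecast 12 periods ahead and plot the result
manual_fc <- forecast(manual_fit, h = 12)
plot(manual_fc)
```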
Even when an ARIMA model appears reasonable for a series, it is important to check that the
residuals are independent before using the model for forecasting. The Box-Ljung test is applied
to check whether the residuals of the fitted model are independent, i.e. free of autocorrelation.
H0: Residuals are independent (no autocorrelation).
H1: Residuals are not independent (autocorrelation is present).
From FIG: 4(a.iii), the Box-Ljung test gives a p-value of 0.3341, which is greater than the
0.05 significance level. We therefore do not reject H0 and conclude that the residuals are
independent.
FIG: 4(a.ii)
FIG: 4(a.iii)
Box-Ljung test
data: manu.arima$residuals
FIG: 4(a.iv)
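The residual check can be sketched with Box.test() from base R. The model orders, the training window, the lag and the fitdf value below are assumptions; the report shows only the test output, not the call that produced it.

```r
library(forecast)

# ASSUMED training window and PLACEHOLDER model orders
GasTrain   <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))
manual_fit <- Arima(GasTrain, order = c(1, 1, 1), seasonal = c(0, 1, 1))

# Ljung-Box test on the residuals.
# lag = 24 is an assumed choice (two seasonal cycles);
# fitdf = 3 removes one degree of freedom per estimated coefficient.
lb <- Box.test(residuals(manual_fit), lag = 24, type = "Ljung-Box", fitdf = 3)
lb

# A p-value above 0.05 means H0 (no residual autocorrelation) is not rejected
lb$p.value > 0.05
```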
Let us now let the auto ARIMA model decide the parameters and check the model performance
on the train dataset.
FIG: 5(b.) shows that the auto ARIMA model gives (p, d, q) as (1, 1, 1) and the seasonal
order (P, D, Q) as (0, 1, 1).
FIG: 5(a.)
FIG: 5(b.)
Series: GasTrain
ARIMA(1,1,1)(0,1,1)[12]
Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581
FIG: 5(c.)
The above graph shows the forecast gas production as the blue line, together with the 80% and,
in grey, the 95% prediction intervals. A higher confidence level results in a wider interval,
and the interval also widens the farther ahead we forecast.
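The automatic fit and the widening intervals can be sketched as follows. The object name auto.fit and the 24-step horizon come from the report's output; the training window is an assumption, so the orders selected here may differ from those reported.

```r
library(forecast)

# ASSUMED training window (the report's split date is only shown in a figure)
GasTrain <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))

# Let auto.arima() search for the orders
auto.fit <- auto.arima(GasTrain)
auto.fit

# Forecast 24 months ahead with 80% and 95% intervals
auto_fc <- forecast(auto.fit, h = 24, level = c(80, 95))
plot(auto_fc)

# Interval widths per horizon: column 1 is the 80% level, column 2 the 95% level
width80 <- as.numeric(auto_fc$upper[, 1] - auto_fc$lower[, 1])
width95 <- as.numeric(auto_fc$upper[, 2] - auto_fc$lower[, 2])
cbind(horizon = 1:24, width80, width95)
```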
FIG: 5(d.)
Box-Ljung test
data: auto.fit$residuals
FIG: 5(e.)
FIG: 5(f.)
The above graph shows an evident difference between the actual and forecast values for the
test period.
From the tables below we conclude that, based on AIC, the auto ARIMA model is better
than the manual ARIMA model.
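A comparison of this kind relies on accuracy() and AIC(). The sketch below shows the calls for a single model; the train/test boundary and the model orders are assumptions, so the numbers it prints will not necessarily match the report's tables.

```r
library(forecast)

# ASSUMED split: last 24 months held out as the test set
GasTrain <- window(forecast::gas, start = c(1970, 1), end = c(1993, 8))
GasTest  <- window(forecast::gas, start = c(1993, 9))

# PLACEHOLDER orders, taken from the specification the report prints for auto ARIMA
fit <- Arima(GasTrain, order = c(1, 1, 1), seasonal = c(0, 1, 1))

# Error measures on the training set and on the held-out test set
acc <- accuracy(forecast(fit, h = 24), GasTest)
round(acc, 3)

# Information criterion used to compare candidate models (lower is better)
AIC(fit)
```

Running the same two calls for each candidate model gives the figures needed for a side-by-side comparison.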
FIG: 5(g.)
FIG: 5(h.)
> accuracy(forecast(manu.arima,24),GasTest)
                   ME      RMSE       MAE        MPE      MAPE      MASE        ACF1 Theil's U
Training set   26.546  494.3995  290.9252  0.3328271  3.489569 0.2958573 -0.01897289        NA
Test set     4300.173 5171.2027 4300.1733 12.5623292 12.562329 4.3730737  0.70894949  1.539188
FIG: 5(i.)
FIG: 5(j.)
> accuracy(forecast(auto.fit,24),GasTest)
                   ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 27.33581 494.6562 290.8485 0.3410585 3.494452 0.2957792 -0.01884895