Time Series Forecasting - Project 6
Australian Monthly Gas Production
Presented By: Sanan Sahadevan Olachery
Submission Date: May 3rd 2020
Content
Sr. No  Particulars
1  Problem Statement
2  Read the Data and Plot
3  Observation
4  Checking Stationarity in the Time Series, Decomposition & De-seasonalization of Data
5  Model Building and Forecasting
6  Accuracy, Observation & Conclusion of the Models Created
Problem Statement
For this assignment, we are asked to explore the gas (Australian monthly gas
production) dataset from the forecast package in R.
The package contains methods and tools for displaying and analyzing univariate
time series forecasts, including exponential smoothing via state space models and
automatic ARIMA modeling. We are asked to do the following:
1. Read the data as a time series object in R. Plot the data.
2. What do you observe? Which components of the time series are present in
this dataset? What is the periodicity of the dataset?
3. Is the time series stationary? Inspect it visually and conduct an ADF
test. Write down the null and alternative hypotheses for the stationarity test.
De-seasonalise the series if seasonality is present.
4. Develop an initial forecast for the next 20 periods. Check it using
various metrics. After finalizing the model, develop a final forecast for the next
12 time periods. Use both manual ARIMA and auto.arima.
5. Report the accuracy of the model.
Solution 1: READ the data and Plot
We begin with a basic analysis of the dataset to understand the data provided. From
the images below we observe the following:
• It has 476 observations. {fig1(a), fig1(B) and fig1(B.i)}
• It starts in January 1956 and ends in August 1995. {fig1(B), fig1(B.i) and fig1(C)}
• The dataset is a monthly series (frequency 12). {fig1(B) and fig1(B.i)}
• The cycle of the dataset shows that there are no missing values. {fig1(B) and fig1(D)}
• The data type is a time series, as can be seen from its class. {fig1(B)}
• The dataset is plotted in fig1(E.i) and fig1(E.ii).
• Gas production is flat from 1956 to 1970 and moves upward from 1970 onwards.
FIG: 1(a)
FIG: 1(B)
FIG: 1(B.i)
FIG: 1(C)
FIG: 1(D)
FIG: 1(E.i)
FIG: 1(E.ii)
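The checks listed above can be sketched in R as follows. This is a minimal sketch, assuming the `gas` dataset shipped with the forecast package; the variable name `gas_ts` is introduced here for illustration and is not taken from the report.

```r
library(forecast)            # provides the `gas` dataset

gas_ts <- gas                # already stored as a monthly time series object

class(gas_ts)                # data type of the object
length(gas_ts)               # the report states 476 observations
start(gas_ts)                # first period (year, month)
end(gas_ts)                  # last period (year, month)
frequency(gas_ts)            # observations per year
cycle(gas_ts)                # month index of every observation
sum(is.na(gas_ts))           # count of missing values

# Line plot of the full series
plot(gas_ts, xlab = "Year", ylab = "Gas production",
     main = "Australian monthly gas production")
```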
Solution 2: OBSERVATION
From FIG: 1(E.ii) above we can see that gas production is initially stagnant
from 1956 to 1969. From 1970 onwards production rises gradually and then settles into a
clear upward trend. From 1980 the month-to-month variation in production becomes more
pronounced. The graph also indicates both a seasonality and a trend component.
For further analysis the dataset has been aggregated into quarterly and annual groups {see FIG: 2(a)},
where seasonality and trend can be studied more easily.
The graphical representation {see FIG: 2(b.i), FIG: 2(b.ii) & FIG: 2(b.iii)} of the dataset indicates:
• Seasonality in the quarterly plot.
• Trend in the annual plot.
• Visible seasonality in the monthly plot, with higher gas production from May to October and a
peak in July across all years.
FIG: 2(a)
FIG: 2(b.i)
FIG: 2(b.ii)
FIG: 2(b.iii)
The above graph shows that seasonality was not prevalent in the initial period. Where it is
present, it is visible from May to October, with peak production in July.
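The quarterly and annual grouping described above could be produced along these lines. This is only a sketch: the report does not say which aggregation function was used, so summing the months is an assumption, and the names `gas_quarterly` and `gas_annual` are illustrative.

```r
library(forecast)   # provides the `gas` dataset

# Aggregate the monthly series to lower frequencies.
# Assumption: months are summed; incomplete trailing periods are dropped.
gas_quarterly <- aggregate(gas, nfrequency = 4, FUN = sum)
gas_annual    <- aggregate(gas, nfrequency = 1, FUN = sum)

plot(gas_quarterly, main = "Quarterly gas production")
plot(gas_annual, main = "Annual gas production")

# One sub-series per calendar month, useful for spotting the seasonal peak
monthplot(gas, main = "Gas production by month")
```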
Solution 3: Checking Stationarity in the Time Series, Decomposition & De-seasonalization of Data
Decomposition
Decomposition is applied to time series data to separate it into its components, each of which
is itself a time series.
Decomposing the series yields the following components:
• Seasonality: patterns that repeat with a fixed period of time.
• Trend: the underlying long-term direction of the metric.
• Random: the residual of the time series after the seasonality and trend components are
removed. It is also referred to as noise, irregular or remainder.
For an effective decomposition we need to select the right model, so we examine the time series
to choose between an additive and a multiplicative model.
An additive model is useful when the seasonal variation is relatively constant over time, while a
multiplicative model is useful when the seasonal variation increases over time.
From the decomposition images below {FIG: 2(C.i) & FIG: 2(C.ii)} we can see that a
strong seasonal pattern is present in the time series together with an upward trend. For this
analysis the series is assumed to be additive.
With the time series decomposed into its components (trend, seasonality and random
variation) we also observe a semiannual seasonality with an upward trend in gas production.
FIG: 2(C.i)
FIG: 2(C.ii)
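An additive decomposition of this kind can be sketched with the base R `decompose()` function. The report does not state which function produced its figures, so the choice of `decompose()` and the name `gas_decomp` are assumptions made for illustration.

```r
library(forecast)   # provides the `gas` dataset

# Classical additive decomposition: observed = trend + seasonal + random
gas_decomp <- decompose(gas, type = "additive")
plot(gas_decomp)

# Rebuild the series from its parts to confirm the additive identity.
# The moving-average trend is undefined (NA) at both ends of the series.
rebuilt <- gas_decomp$trend + gas_decomp$seasonal + gas_decomp$random
defined <- !is.na(gas_decomp$trend)
all.equal(as.numeric(rebuilt)[defined], as.numeric(gas)[defined])
```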
De-seasonalization
We de-seasonalize the time series to see whether the general trend of gas production is upward.
To forecast production for the next month, however, we need to consider both seasonality and trend.
Since the series is treated as additive, the other two components (trend and random) are
added together to obtain the de-seasonalized series. The original and de-seasonalized data are
plotted in FIG: 2(D.ii).
In FIG: 2(D.ii) the original series is shown in red and the de-seasonalized gas production in
blue. It shows an increasing trend in gas production.
FIG: 2(D.i)
FIG: 2(D.ii)
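Under the additive assumption, de-seasonalizing amounts to subtracting the seasonal component, which leaves trend plus random. A minimal sketch, again assuming `decompose()` was the decomposition used; `gas_deseason` is an illustrative name.

```r
library(forecast)   # provides the `gas` dataset

gas_decomp <- decompose(gas, type = "additive")

# Removing the seasonal component is equivalent to trend + random
gas_deseason <- gas - gas_decomp$seasonal

# Original series in red, de-seasonalized series in blue
ts.plot(gas, gas_deseason,
        gpars = list(col = c("red", "blue"), ylab = "Gas production"))
legend("topleft", legend = c("Original", "De-seasonalized"),
       col = c("red", "blue"), lty = 1)
```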
For further analysis we divide the time series into TRAIN and TEST
datasets. Since the series shows an increasing production trend from 1970, we
use the data from January 1970 onwards.
The dataset is divided into test and train sets so that a whole seasonal cycle is captured.
Refer to FIG: 3(a).
FIG: 3(a)
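A split of this kind can be sketched with `window()`. The exact split date is not given in the text, so holding out the last 24 months is an assumption, chosen to match the 24-step horizon in the `accuracy(forecast(..., 24), GasTest)` calls later in the report.

```r
library(forecast)   # provides the `gas` dataset

# Keep only the period with a clear upward trend
gas_1970 <- window(gas, start = c(1970, 1))

# Assumed split: last 24 months (Sep 1993 to Aug 1995) held out for testing
GasTrain <- window(gas_1970, end = c(1993, 8))
GasTest  <- window(gas_1970, start = c(1993, 9))

length(GasTrain)
length(GasTest)
```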
We conduct an Augmented Dickey-Fuller (ADF) test on the time series to check whether it is
stationary or non-stationary. The hypotheses are:
H0 (null): the time series is non-stationary and has a time-dependent structure.
H1 (alternative): the time series is stationary and does not have a time-dependent structure.
If the p-value is > 0.05 we retain the null hypothesis and conclude that the time series is
non-stationary.
If the p-value is <= 0.05 we reject the null hypothesis and conclude that the time series is
stationary.
From FIG: 3(b) below we find that the p-value is 0.99, which is greater than the 0.05
significance level. The null hypothesis is therefore retained, and we conclude that the time
series is non-stationary.
FIG: 3(b)
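The test can be run with `adf.test()` from the tseries package. This is a sketch only: the training window below is an assumed split, so the p-value it prints need not match the 0.99 reported in the figure.

```r
library(forecast)   # provides the `gas` dataset
library(tseries)    # provides adf.test()

# Assumed training window (the split date is not stated in the report)
GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))

# H0: the series has a unit root (non-stationary); H1: stationary
adf_level <- adf.test(GasTrain, alternative = "stationary")
adf_level$p.value

# Decision rule at the 5% significance level
if (adf_level$p.value > 0.05) {
  message("Fail to reject H0: treat the series as non-stationary")
} else {
  message("Reject H0: treat the series as stationary")
}
```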
To fit an ARIMA model, the time series should be stationary.
Since our time series is non-stationary, we need to transform it
and check whether the transformed series is stationary. To do this we
create a new dataset by differencing the series, apply the Dickey-Fuller test to it and plot it.
From the images below {FIG: 3(C.i) & FIG: 3(C.ii)} we can see that the p-value is 0.01,
which is smaller than the 0.05 significance level, so the null hypothesis can be rejected. The
differenced time series is stationary and its level appears to be constant over
time. The plot of the differenced series in FIG: 3(C.ii) also shows that it is stationary.
FIG: 3(C.i)
FIG: 3(C.ii)
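The differencing step can be sketched as below. A first difference at lag 1 is assumed, since the report only refers to the "difference" of the series; `GasDiff` is an illustrative name and the training window is an assumed split.

```r
library(forecast)   # provides the `gas` dataset
library(tseries)    # provides adf.test()

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split

# First difference: y[t] - y[t-1], which removes the changing level
GasDiff <- diff(GasTrain, differences = 1)

# The level of a stationary series should look roughly constant over time
plot(GasDiff, main = "First difference of gas production")
abline(h = 0, lty = 2)

# Re-run the ADF test on the differenced series
adf_diff <- adf.test(GasDiff, alternative = "stationary")
adf_diff$p.value
```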
Solution 4: MODEL Building and Forecasting
We examine the ACF and PACF of the data to check for stationarity and autocorrelation.
The ACF function computes an estimate of the autocorrelation function. ACF plots
help in determining the order of the MA(q) model.
The PACF function computes the partial autocorrelation of a time series. PACF
plots help in determining the order of the AR(p) model.
From the ACF plot in FIG: 3(D.ii) we can see that the autocorrelation is significant at most of
the first 10 lags, the exceptions being the 4th and 10th lags.
The PACF plot in FIG: 3(D.iii) shows that all lags are significant.
FIG: 3(D.i)
FIG: 3(D.ii) ACF - PLOT
FIG: 3(D.iii) PACF - PLOT
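The two plots, and the lags that cross the significance bounds, can be obtained as sketched here. The report does not say which series the plots were drawn from, so using the first-differenced training data and a maximum of 36 lags are both assumptions.

```r
library(forecast)   # provides the `gas` dataset

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split
GasDiff  <- diff(GasTrain)                                     # assumed input

acf_res  <- acf(GasDiff, lag.max = 36, main = "ACF")    # guides the MA order q
pacf_res <- pacf(GasDiff, lag.max = 36, main = "PACF")  # guides the AR order p

# Approximate 95% bounds drawn on both plots
bound <- qnorm(0.975) / sqrt(length(GasDiff))

# Lags outside the bounds (the ACF also stores lag 0, dropped here)
sig_acf_lags  <- which(abs(acf_res$acf[-1]) > bound)
sig_pacf_lags <- which(abs(pacf_res$acf) > bound)
sig_acf_lags
sig_pacf_lags
```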
We now build a manual ARIMA model with seasonal effects. Each of the non-seasonal (p, d, q) and
seasonal (P, D, Q) parts is defined by the following 3 parameters:
• Number of autoregressive terms.
• Number of differences needed to make the series stationary.
• Number of moving average terms.
The AIC of the manual model is 4217.55 {FIG: 4(a)}.
We forecast 12 periods ahead with the manual ARIMA model {see FIG: 4(a.i)}.
FIG: 4(a)
FIG: 4(a.i)
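A manual seasonal ARIMA fit and a 12-period forecast can be sketched with `Arima()` from the forecast package. The order used for the manual model is not stated in the text, so the order below is purely hypothetical and the resulting AIC will not necessarily match 4217.55; the training window is likewise an assumed split.

```r
library(forecast)

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split

# Hypothetical orders for illustration only:
# non-seasonal (p, d, q) = (0, 1, 1), seasonal (P, D, Q) = (0, 1, 1), period 12
manu.arima <- Arima(GasTrain,
                    order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

AIC(manu.arima)   # information criterion used to compare models

# Forecast 12 periods ahead and plot it
manu.fc <- forecast(manu.arima, h = 12)
plot(manu.fc)
```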
Even when an ARIMA model is assumed to be reasonable for a series, it is important to check
whether the residuals are independent before using the model for forecasting. The Box-Ljung test
is applied to check whether the residuals of the fitted model are independent, that is, free of
autocorrelation.
H0: Residuals are independent (no autocorrelation).
H1: Residuals are not independent (autocorrelation is present).
From FIG: 4(a.iii) for the Box-Ljung test, the p-value is 0.3341, which is greater than the
significance level of 0.05. We therefore do not reject H0 and conclude that the residuals are
independent.
FIG: 4(a.ii)
FIG: 4(a.iii)
Box-Ljung test
data: manu.arima$residuals
X-squared = 360.79, df = 350, p-value = 0.3341
FIG: 4(a.iv)
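A residual check of this kind can be sketched with the base R `Box.test()` function. The model below uses the ARIMA(1,1,1)(0,1,1)[12] order that appears in this report's auto ARIMA output, fitted on an assumed training window; the lag of 24 and the `fitdf` adjustment are assumptions and differ from the df = 350 shown in the output above.

```r
library(forecast)

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split

fit <- Arima(GasTrain, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Ljung-Box test on the residuals.
# fitdf = 3 subtracts the three estimated coefficients (ar1, ma1, sma1)
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 3)
lb

# A p-value above 0.05 means no evidence of autocorrelation in the residuals
lb$p.value > 0.05
```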
Next, we let the auto ARIMA model decide the parameters and check the model performance on the
train dataset.
FIG: 5(b.) shows that the auto ARIMA model selects the non-seasonal order (p, d, q) = (1, 1, 1)
and the seasonal order (P, D, Q) = (0, 1, 1).
FIG: 5(a.)
FIG: 5(b.)
Series: GasTrain
ARIMA(1,1,1)(0,1,1)[12]

Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581

sigma^2 estimated as 259078:  log likelihood=-2103.9
AIC=4215.79   AICc=4215.94   BIC=4230.26
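Automatic order selection of this kind is done with `auto.arima()`. A minimal sketch follows; because the training window here is an assumed split, the selected order and coefficients may differ from the output shown above.

```r
library(forecast)

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split

# Search over seasonal ARIMA models and keep the one with the lowest AICc
auto.fit <- auto.arima(GasTrain, seasonal = TRUE)

summary(auto.fit)      # coefficients, sigma^2, log likelihood, AIC/AICc/BIC
arimaorder(auto.fit)   # selected p, d, q, P, D, Q and seasonal period
```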
FIG: 5(c.)
The above graph shows the forecast gas production as the blue line, together with the 80% and
95% (grey) prediction intervals. A higher confidence level results in a wider
interval, and the interval also widens the farther ahead we forecast.
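Both properties of the intervals can be checked numerically. The sketch below fits the ARIMA(1,1,1)(0,1,1)[12] order reported above on an assumed training window and compares interval widths; the names `width80` and `width95` are illustrative.

```r
library(forecast)

GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))  # assumed split

fit <- Arima(GasTrain, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# 24-step forecast with 80% and 95% prediction intervals
fc <- forecast(fit, h = 24, level = c(80, 95))
plot(fc)

# Interval width at each horizon
width80 <- fc$upper[, "80%"] - fc$lower[, "80%"]
width95 <- fc$upper[, "95%"] - fc$lower[, "95%"]

all(width95 > width80)       # the higher level is wider at every horizon
width95[24] > width95[1]     # the interval grows with the forecast horizon
```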
FIG: 5(d.)
Box-Ljung test
data: auto.fit$residuals
X-squared = 358.18, df = 350, p-value = 0.37
FIG: 5(e.)
FIG: 5(f.)
The above graph shows an evident difference between the actual and forecast values for the test
period.
Solution 5: Accuracy, Observation and Conclusion of the Models Created
Let's find the accuracy of the forecasts from both the manual and auto ARIMA models. From
Fig {5g-5j} below we conclude that on most of the accuracy measures the manual ARIMA model is very
close to the auto ARIMA model.
From the table below we conclude that, based on AIC, the auto ARIMA model is slightly better
than the manual ARIMA model, while the manual model has marginally lower RMSE and MAPE on both
the train and test sets.

SR. NO  Particulars    MANUAL - ARIMA  AUTO - ARIMA
1       RMSE (TRAIN)   494.3995        494.6562
2       MAPE (TRAIN)   3.489569        3.494452
3       RMSE (TEST)    5171.2027       5184.8321
4       MAPE (TEST)    12.562329       12.586844
5       AIC            4217.55         4215.79
FIG: 5(g.)
FIG: 5(h.)
> accuracy(forecast(manu.arima,24),GasTest)
                   ME      RMSE       MAE        MPE      MAPE      MASE        ACF1 Theil's U
Training set   26.546  494.3995  290.9252  0.3328271  3.489569 0.2958573 -0.01897289        NA
Test set     4300.173 5171.2027 4300.1733 12.5623292 12.562329 4.3730737  0.70894949  1.539188
FIG: 5(i.)
FIG: 5(j.)
> accuracy(forecast(auto.fit,24),GasTest)
                     ME      RMSE       MAE        MPE      MAPE      MASE        ACF1 Theil's U
Training set   27.33581  494.6562  290.8485  0.3410585  3.494452 0.2957792 -0.01884895        NA
Test set     4309.76776 5184.8321 4309.7678 12.5868439 12.586844 4.3828308  0.70824437  1.542299
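The RMSE and MAPE rows reported by `accuracy()` can be reproduced by hand, which makes clear what each metric measures. The sketch below assumes a 24-month hold-out and uses the ARIMA(1,1,1)(0,1,1)[12] order from the output above, so its numbers may differ from the tables in this report.

```r
library(forecast)

# Assumed split: last 24 months held out as the test set
GasTrain <- window(gas, start = c(1970, 1), end = c(1993, 8))
GasTest  <- window(gas, start = c(1993, 9))

fit <- Arima(GasTrain, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fc  <- forecast(fit, h = 24)

# Train and test accuracy in one table
acc <- accuracy(fc, GasTest)
acc[, c("RMSE", "MAPE")]

# The same test-set metrics computed directly from the forecast errors
err       <- GasTest - fc$mean
rmse_test <- sqrt(mean(err^2))               # root mean squared error
mape_test <- mean(abs(err / GasTest)) * 100  # mean absolute percentage error
c(RMSE = rmse_test, MAPE = mape_test)
```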