100% found this document useful (6 votes)
1K views24 pages

SANDYA VB-Business Report TSF

The document analyzes time series data on wine sales from 1980-1995 and performs forecasting. It splits the data into training and test sets, builds exponential smoothing and other models on the training data, and evaluates the models' performance on the test set using RMSE. Stationarity of the data is also checked.

Uploaded by

Sandya Vb
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (6 votes)
1K views24 pages

SANDYA VB-Business Report TSF

The document analyzes time series data on wine sales from 1980-1995 and performs forecasting. It splits the data into training and test sets, builds exponential smoothing and other models on the training data, and evaluates the models' performance on the test set using RMSE. Stationarity of the data is also checked.

Uploaded by

Sandya Vb
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

BUSINESS ANALYSIS

REPORT
TIME SERIES FORECASTING

JUNE 20, 2021

SANDYA V B
CONTENTS
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build various exponential smoothing models on the training data and evaluate the model
using RMSE on the test data.
Other models such as regression, naïve forecast models, simple average models etc. should
also be built on the training data and check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures
that the company should be taking for future sales.
PROBLEM:
For this particular assignment, the data of different types of wine sales in the 20th century is to be
analysed. Both of these data are from the same company but of different wines. As an analyst in the ABC
Estate Wines, you are tasked to analyse and forecast Wine Sales in the 20th century.

Data set for the Problem: Sparkling.csv and Rose.csv


1. Read the data as an appropriate Time Series data and plot the data.

➢ The two datasets: Rose and Sparkling are imported using the read command. And convert to time series
data using
date_range function:
date = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')date
df['Time_Stamp'] = pd.DataFrame(date,columns=['Month'])
df.head()

o/p:

ROSE WINE YEAR WISE SALES SPARKLING WINE YEAR WISE


SALES

• We observe that there is no much trend in the


• From the above plot we observe that there is a
above plot.
decreasing trend in the initial years and stabilizes
• The seasonality seems to have a pattern on
over the years.
yearly basis.
• We also see that the seasonality in the data trend
and pattern seems to repeat.

2. Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

ROSE WINE EDA SPARKLING WINE EDA

• The shape of the data is (187,1). • The shape of the data is (187,1).
• There are 2 null values present in the data, which • There are no null values present.
was interpolated using linear method. • Describing the data:
• Describing the data:
Measures count mean std min 25% 50% 75% max Measures count mean std min 25% 50% 75% max
Rose 185 90.3 39.1 28 63 86 112 267 Rose 187 2402.41 1295.11 1070 1605 1874 2549 7242
• From the above plot we see that the box plots • From the above plot, we see that the box plots
indicates a downward trend do not indicate any trend.
• We also see that there are few outliers present in • We also observe that the sale of Sparkling wine
the sales plot. has outliers for almost all the years except
1955.

• From the above plot, we observe that there is


• From the above plot, we see that December an increase in the sale.
month has the highest sales of wine. • We also see that the sale for the month
• There are also outliers present in June, July, December has the highest value.
August and September months.

• We observe that the line plot of year/month


• We observe that the line plot of year/month wise wise sales shows that the December month has
sales shows that the December month has the the highest sale and August, January and
highest sale and May, January and February show February show lower sale values.
lower sale values.

• The time series month plot is to understand the


• The time series month plot is to understand the spread of Rose wine sale across different years
spread of Rose wine sale across different years and within different months across years.
and within different months across years.
• To resample or aggregate the Time Series from
• To resample or aggregate the Time Series from an
an annual perspective and sum up the
annual perspective and sum up the observations.
observations.

• To resample or aggregate the Time Series from an • To resample or aggregate the Time Series from
annual perspective and take the mean of the an annual perspective and take the mean of the
observations of the year. observations of the year.

• If we take the resampling period to be 10 years or • If we take the resampling period to be 10 years
a decade, we see that the seasonality present has or a decade, we see that the seasonality present
been smoothed over and it is only giving an has been smoothed over and it is only giving
estimate of the trend. an estimate of the trend.

• This particular graph tells us what percentage


• This particular graph tells us what percentage of
of data points refer to what number of Sales.
data points refer to what number of Sales.
• The above two graphs tell us the Average Sales • The above two graphs tell us the Average Sales
and the Percentage change of Sales with respect to and the Percentage change of Sales with
the time. respect to the time.

• We see that the residuals are located around 0


• We see that the residuals are located around 0
from the plot of the residuals in the
from the plot of the residuals in the
decomposition.
decomposition.

• For the multiplicative series, we see that a lot of • For the multiplicative series, we see that a lot
residuals are located around 1. of residuals are located around 1.

3. Split the data into training and test. The test data should start in 1991.

ROSE WINE TRAIN & TEST DATA SPARKLING WINE TRAIN & TEST
DATA
• The train data of Rose wine has been splitted • The train data of Rose wine has been splitted
upto the year 1990 and has 132 data points. upto the year 1990 and has 132 data points.
• The test data has been splitted from the year • The test data has been splitted from the year
1991 and has 55 data points. 1991 and has 55 data points.
• From our train-test data split we will be • From our train-test data split we will be
predicting the future sales in comparison with predicting the future sales in comparison with
the past years’ sale. the past years’ sale.
• Training data: • Training data:
First few rows of Training Data First few rows of Training Data
Rose Sparkling
Time_Stamp Time_Stamp
1980-01-31 112.0 1980-01-31 1686
1980-02-29 118.0 1980-02-29 1591
1980-03-31 129.0 1980-03-31 2304
1980-04-30 99.0 1980-04-30 1712
1980-05-31 116.0 1980-05-31 1471

Last few rows of Training Data Last few rows of Training Data
Rose Sparkling
Time_Stamp Time_Stamp
1990-08-31 70.0 1990-08-31 1605
1990-09-30 83.0 1990-09-30 2424
1990-10-31 65.0 1990-10-31 3116
1990-11-30 110.0 1990-11-30 4286
1990-12-31 132.0 1990-12-31 6047

• Test data: • Test data:


First few rows of Test Data First few rows of Test Data
Rose Sparkling
Time_Stamp Time_Stamp
1991-01-31 54.0 1991-01-31 1902
1991-02-28 55.0 1991-02-28 2049
1991-03-31 66.0 1991-03-31 1874
1991-04-30 65.0 1991-04-30 1279
1991-05-31 60.0 1991-05-31 1432

Last few rows of Test Data Last few rows of Test Data
Rose Sparkling
Time_Stamp Time_Stamp
1995-03-31 45.0 1995-03-31 1897
1995-04-30 52.0 1995-04-30 1862
1995-05-31 28.0 1995-05-31 1670
1995-06-30 40.0 1995-06-30 1688
1995-07-31 62.0 1995-07-31 2031

4. Build various exponential smoothing models on the training data and evaluate
the model using RMSE on the test data. Other models such as regression, naïve
forecast models and simple average models. should also be built on the training
data and check the performance on the test data using RMSE.

ROSE WINE SPARKLING WINE


➢ MODEL 1: LINEAR REGRESSION ➢ MODEL 1: LINEAR REGRESSION
• For Linear Regression, we will regress the • For Linear Regression, we will regress the
‘Sales’ variable against the order of the ‘Sales’ variable against the order of the
occurrence. occurrence.
• Then we generate the numerical time instance • Then we generate the numerical time instance
order for both train and test set. order for both train and test set.
• We will add these values in the training and • We will add these values in the training and
test set. test set.
• Hence the train and test set are thus modified • Hence the train and test set are thus modified
to perform Linear regression. to perform Linear regression.

• First few rows of Training Data • First few rows of Training Data
Rose time Sparkling time
Time_Stamp Time_Stamp
1980-01-31 112.0 1 1980-01-31 1686 1
1980-02-29 118.0 2 1980-02-29 1591 2
1980-03-31 129.0 3 1980-03-31 2304 3
1980-04-30 99.0 4 1980-04-30 1712 4
1980-05-31 116.0 5 1980-05-31 1471 5

Last few rows of Training Data Last few rows of Training Data
Rose time Sparkling time
Time_Stamp Time_Stamp
1990-08-31 70.0 128 1990-08-31 1605 128
1990-09-30 83.0 129 1990-09-30 2424 129
1990-10-31 65.0 130 1990-10-31 3116 130
1990-11-30 110.0 131 1990-11-30 4286 131
1990-12-31 132.0 132 1990-12-31 6047 132

First few rows of Test Data First few rows of Test Data
Rose time Sparkling time
Time_Stamp Time_Stamp
1991-01-31 54.0 43 1991-01-31 1902 43
1991-02-28 55.0 44 1991-02-28 2049 44
1991-03-31 66.0 45 1991-03-31 1874 45
1991-04-30 65.0 46 1991-04-30 1279 46
1991-05-31 60.0 47 1991-05-31 1432 47

Last few rows of Test Data Last few rows of Test Data
Rose time Sparkling time
Time_Stamp Time_Stamp
1995-03-31 45.0 93 1995-03-31 1897 93
1995-04-30 52.0 94 1995-04-30 1862 94
1995-05-31 28.0 95 1995-05-31 1670 95
1995-06-30 40.0 96 1995-06-30 1688 96
1995-07-31 62.0 97 1995-07-31 2031 97

• TEST RMSE SCORE = 51.433312 • TEST RMSE SCORE = 1275.867052


➢ MODEL 2: NAÏVE MODEL ➢ MODEL 2: NAÏVE MODEL

• For the Naïve Model, we observe that the • For the Naïve Model, we observe that the
green line in the plot below shows a straight green line in the plot below shows a straight
line. line.
• Which predicts that the sale for tomorrow is • Which predicts that the sale for tomorrow is
the same as today. the same as today.
• And the prediction for day after tomorrow is • And the prediction for day after tomorrow is
tomorrow. tomorrow.
• Hence it applies to all the future years. • Hence it applies to all the future years.
• TEST RMSE SCORE = 79.718773 • TEST RMSE SCORE = 3864.279352

➢ MODEL 3: SIMPLE AVERAGE MODEL ➢ MODEL 3: SIMPLE AVERAGE MODEL

• In Simple Average method, we will forecast • In Simple Average method, we will forecast
the data using the average of the training the data using the average of the training
values. values.
• From the plot below, we observe that the • From the plot below, we observe that the green
green line is straight and shows the Simple line is straight and shows the Simple Average
Average forecasting. forecasting.
• TEST RMSE SCORE = 53.460570 • TEST RMSE SCORE = 1275.081804
➢ MODEL 4: MOVING AVERAGE MODEL ➢ MODEL 4: MOVING AVERAGE MODEL
• In Moving Average Model, we compute • In Moving Average Model, we compute
moving averages for 2, 4, 6 and 9 point moving averages for 2, 4, 6 and 9 point
intervals. intervals.
• Then the best interval is determined by the • Then the best interval is determined by the
maximum accuracy. maximum accuracy.
• From the below table we see that 2 point • From the below table we see that 2 point
trailing moving average has the least score. trailing moving average has the least score.

Moving average Test RMSE Moving average Test RMSE


2point trailing moving average 11.529278 2point trailing moving average 813.400684
4point trailing moving average 14.451403 4point trailing moving average 1156.589694
6point trailing moving average 14.566327 6point trailing moving average 1283.927428
9point trailing moving average 14.727630 9point trailing moving average 1346.278315

➢ MODEL 5: SIMPLE EXPONENTIAL ➢ MODEL 5: SIMPLE EXPONENTIAL


SMOOTHING MODEL SMOOTHING MODEL

• Simple Exponential Smoothing model is • Simple Exponential Smoothing model is


evaluated for alpha = 0.098. evaluated for alpha = 0.0.

• TEST RMSE SCORE = 36.796244 • TEST RMSE SCORE = 1275.081766


❖ SIMPLE EXPONENTIAL SMOOTHING ❖ SIMPLE EXPONENTIAL SMOOTHING
TUNING MODEL TUNING MODEL

• In Simple Exponential Smoothing Model, we • In Simple Exponential Smoothing Model, we


will run a loop with different alpha values to will run a loop with different alpha values to
understand which particular value is best. understand which particular value is best.
• Alpha value ranges from 0.3 to 0.9 • Alpha value ranges from 0.3 to 0.9.
• TEST RMSE SCORE = 47.504821 • TEST RMSE SCORE = 1935.507132

Alpha Values Train RSME Test RSME Alpha Values Train RSME Test RSME
0.3 32.470164 47.504821 0.3 1359.511747 1935.507132
0.4 33.035130 53.767406 0.4 1352.588879 2311.919615
0.5 33.682839 59.641786 0.5 1344.004369 2666.351413
0.6 34.441171 64.971288 0.6 1338.805381 2979.204388
0.7 35.323261 69.698162 0.7 1338.844308 3249.944092
0.8 36.334596 73.773992
0.9 37.482782 77.139276

➢ MODEL 6: DOUBLE EXPONENTIAL ➢ MODEL 6: DOUBLE EXPONENTIAL


SMOOTHING MODEL
SMOOTHING MODEL

• In Double Exponential Smoothing, we have


• In Double Exponential Smoothing, we have
two parameters alpha and beta.
two parameters alpha and beta.
• The values range from 0.3 to 0.1 and we get at
• The values range from 0.3 to 0.1 and we get at
least 5 RMSE scores.
least 5 RMSE scores.
• TEST RMSE SCORE = 18259.110704
• TEST RMSE SCORE = 98.653317

Alpha Beta Train Test


Alpha Beta Train Test
Values Values RMSE RMSE
Values Values RMSE RMSE
0.3 0.1 33.611269 98.653317
0.3 0.3 1592.292788 18259.110704
0.4 0.1 34.255060 128.978579
0.4 0.3 1569.338606
23878.496940
0.5 0.1 34.957515 155.358815
0.3 0.4 1682.573828 26069.841401
0.3 0.2 34.645117 177.140327
0.5 0.3 1530.575845 27095.532414

0.6 0.1 35.781643 178.004967 0.6 0.3 1506.449870 29070.722592


➢ MODEL 7: TRIPLE EXPONENTIAL
➢ MODEL 7: TRIPLE EXPONENTIAL SMOOTHING MODEL
SMOOTHING MODEL
• In Triple Exponential Smoothing we have three
• In Triple Exponential Smoothing we have
parameters: Alpha, Beta and Gamma.
three parameters: Alpha, Beta and Gamma.
• Smoothing level value represents Alpha.
• Smoothing level value represents Alpha.
• Smoothing trend value represents Beta.
• Smoothing trend value represents Beta.
• Smoothing seasonality value represents
• Smoothing seasonality value represents
Gamma.
Gamma.
• TEST RMSE SCORE = 383.155684
• TEST RMSE SCORE = 17.369489

❖ TRIPLE EXPONENTIAL SMOOTHING


❖ TRIPLE EXPONENTIAL SMOOTHING
TUNING MODEL
TUNING MODEL
• In Triple Exponential Smoothing Model, we
• In Triple Exponential Smoothing Model, we
will run a loop with different alpha, beta and
will run a loop with different alpha, beta and
gamma values to understand which particular
gamma values to understand which particular
set of value is best.
set of value is best.
• TEST RMSE SCORE = 10.945435
• TEST RMSE SCORE = 10.945435

Alpha Beta Gamma Train Test


Alpha Beta Gamma Train Test
value Value Value RMSE RMSE
value Value Value RMSE RMSE
0.3 0.3 0.3 404.513320 392.786198
0.3 0.4 0.3 28.111886 10.945435
0.3 0.4 0.3 424.828055 410.854547
0.3 0.3 0.4 27.399095 11.201633
0.4 0.3 0.4 435.553595 421.409170
0.4 0.3 0.8 32.601491 12.615607
0.7 0.8 0.3 700.317756 518.188752
0.3 0.5 0.3 29.087520 14.414604
0.5 0.3 0.6 32.144773 16.720720 0.5 0.3 0.5 498.239915 542.175497
5. Check for the stationarity of the data on which the model is being built on
using appropriate statistical tests and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate steps to
make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.

ROSE WINE SPARKLING WINE

Results of Dickey-Fuller Test: Results of Dickey-Fuller Test:


Test Statistic -1.876699 Test Statistic -1.360497
p-value 0.343101 p-value 0.601061
#Lags Used 13.000000 #Lags Used 11.000000
Number of Observations Used 173.000000 Number of Observations Used 175.000000
Critical Value (1%) -3.468726 Critical Value (1%) -3.468280
Critical Value (5%) -2.878396 Critical Value (5%) -2.878202
Critical Value (10%) -2.575756 Critical Value (10%) -2.575653

• To check the stationarity of Rose data, we if the • To check the stationarity of Sparkling data, we
alpha value is less than 0.05 if the alpha value is less than 0.05
• From the above result we see that the alpha = 0.34 • From the above result we see that the alpha =
which is higher than 0.05 0.60 which is higher than 0.05
• Hence, we fail to reject the null hypothesis • Hence, we fail to reject the null hypothesis

Results of Dickey-Fuller Test: Results of Dickey-Fuller Test:


Test Statistic -8.044392e+00 Test Statistic -45.050301
p-value 1.810895e-12 p-value 0.000000
#Lags Used 1.200000e+01 #Lags Used 10.000000
Number of Observations Used 1.730000e+02 Number of Observations Used 175.000000
Critical Value (1%) -3.468726e+00 Critical Value (1%) -3.468280
Critical Value (5%) -2.878396e+00 Critical Value (5%) -2.878202
Critical Value (10%) -2.575756e+00 Critical Value (10%) -2.575653
• Therefore, we apply a difference of 1 and check • Therefore, we apply a difference of 1 and
for Stationarity. check for Stationarity.
• Now we from the above result the alpha value is • Now we from the above result the alpha value
less than 0.05. is less than 0.05.
• Hence, the null hypothesis is rejected and the data • Hence, the null hypothesis is rejected and the
is Stationary. data is Stationary.

6. Build an automated version of the ARIMA/SARIMA model in which the


parameters are selected using the lowest Akaike Information Criteria (AIC) on
the training data and evaluate this model on the test data using RMSE.

ROSE WINE SPARKLING WINE


➢ AUTOMATED ARIMA ➢ AUTOMATED ARIMA
• We check for the stationarity of the data at alpha • We check for the stationarity of the data at
= 0.05. alpha = 0.05.

Results of Dickey-Fuller Test: Results of Dickey-Fuller Test:


Test Statistic -2.164250 Test Statistic -1.208926
p-value 0.219476 p-value 0.669744
#Lags Used 13.000000 #Lags Used 12.000000
Number of Observations Used 118.000000 Number of Observations Used 119.000000
Critical Value (1%) -3.487022 Critical Value (1%) -3.486535
Critical Value (5%) -2.886363 Critical Value (5%) -2.886151
Critical Value (10%) -2.580009 Critical Value (10%) -2.579896

• From the above result we see that the alpha = • From the above result we see that the alpha =
0.21 which is higher than 0.05 0.66 which is higher than 0.05
• Hence, we take a difference of 1 to make the data • Hence, we take a difference of 1 to make the
stationary. data stationary.
• From the below result we see that the value of • From the below result we see that the value of
alpha is less than 0.05. alpha is less than 0.05.

Results of Dickey-Fuller Test:


Test Statistic -6.592372e+00 Results of Dickey-Fuller Test:
p-value 7.061944e-09 Test Statistic -8.005007e+00
#Lags Used 1.200000e+01 p-value 2.280104e-12
Number of Observations Used 1.180000e+02 #Lags Used 1.100000e+01
Critical Value (1%) -3.487022e+00 Number of Observations Used 1.190000e+02
Critical Value (5%) -2.886363e+00 Critical Value (1%) -3.486535e+00
Critical Value (10%) -2.580009e+00 Critical Value (5%) -2.886151e+00
Critical Value (10%) -2.579896e+00

• To build the automated ARIMA model we • To build the automated ARIMA model we
arrange AIC value from lowest to highest. arrange AIC value from lowest to highest.
• And then proceed to build the ARIMA model • And then proceed to build the ARIMA model
with the lowest Akaike Information Criteria with the lowest Akaike Information Criteria
(AIC) value. (AIC) value.

param AIC
param AIC
(2,1,2) 2210.616692
(0,1,2) 1276.835377
(2,1,1) 2232.360490
(1,1,2) 1277.359224
(0,1,2) 2232.783098
(1,1,1) 1277.775754
(1,1,2) 2233.597647
(2,1,1) 1279.045689
(1,1,1) 2235.013945
(2,1,2) 1279.298694

• TEST RMSE SCORE = 1374.9769475


• TEST RMSE SCORE = 15.6189123
➢ AUTOMATED SARIMA ➢ AUTOMATED SARIMA

• We see that in the ACF plot there is seasonality at • We see that in the ACF plot there is seasonality
the interval of 6 and 12. at the interval of 6 and 12.
• Therefore, we run the automated SARIMA model • Therefore, we run the automated SARIMA
for both the intervals. model for both the intervals.
• The sorted AIC values from lowest to highest. • The sorted AIC values from lowest to highest.
• TEST RMSE SCORE for interval 6= 26.13355444 • TEST RMSE SCORE for interval 6 =
626.880153

param seasonal AIC param seasonal AIC


(1,1,2) (2,0,2,6) 1041.655817 (1,1,2) (2,0,2,6) 1727.678697
(0,1,2) (2,0,2,6) 1043.600261 (0,1,2) (2,0,2,6) 1727.887986
(2,1,2) (2,0,2,6) 1045.286900 (0,1,1) (2,0,2,6) 1741.703671
(2,1,1) (2,0,2,6) 1051.673461 (2,1,1) (2,0,2,6) 1744.040750
(1,1,1) (2,0,2,6) 1052.778469 (2,1,2) (2,0,0,6) 1758.961073
• TEST RMSE SCORE for interval 12= 26.929368 • TEST RMSE SCORE for interval 12= 528.4527

param seasonal AIC param seasonal AIC


(0,1,2) (2,0,2,12) 887.937509 (1,1,2) (1,0,2,12) 1555.584254
(2,1,2) (2,0,2,12) 890.668848 (1,1,2) (2,0,2,12) 1556.080259
(2,1,1) (2,0,0,12) 896.518161 (0,1,2) (2,0,2,12) 1557.121563
(2,1,2) (2,0,0,12) 897.346498 (0,1,2) (1,0,2,12) 1557.160319
(2,1,1) (2,0,1,12) 897.639957 (2,1,2) (1,0,2,12) 1557.439140

Inference for both 6 and 12 iteration :


Inference for both 6 and 12 iteration :
• The Standardized Residual do not display any
• The Standardized Residual do not display any
obvious seasonality.
obvious seasonality.
• Histogram plus estimated density shows the
• Histogram plus estimated density shows the
KDE of the residuals is in normal distribution,
KDE of the residuals is in normal distribution,
therefore the model is normally distributed.
therefore the model is normally distributed.
• Normal Q-Q plot tells about the ordered
• Normal Q-Q plot tells about the ordered
distribution of residuals following the linear
distribution of residuals following the linear
trend taken normal distribution with N(0,1).
trend taken normal distribution with N(0,1).
• Correlogram time series residuals have low
• Correlogram time series residuals have low
correlation with lagged version.
correlation with lagged version.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF
on the training data and evaluate this model on the test data using RMSE.

ROSE WINE SPARKLING WINE

➢ MANUAL ARIMA ➢ MANUAL ARIMA

• Manual ARIMA model is built based on ACF • Manual ARIMA model is built based on ACF
plot and PACF plot. plot and PACF plot.
• Hence, we choose AR parameter value as p and • Hence, we choose AR parameter value as p and
moving average parameter value to be q. moving average parameter value to be q.
• TEST RMSE SCORE = 15.73425 • TEST RMSE SCORE = 1461.6785026
➢ MANUAL SARIMA ➢ MANUAL SARIMA

• Manual ARIMA model is built based on ACF • Manual ARIMA model is built based on ACF
plot and PACF plot. plot and PACF plot.
• Hence, we choose AR parameter value as p, • Hence, we choose AR parameter value as p,
moving average parameter value to be q and moving average parameter value to be q and
d(difference) value to be 1. d(difference) value to be 1.
• We then derive the seasonal parameters based • We then derive the seasonal parameters based
on the seasonal cut-off. on the seasonal cut-off.
• TEST RMSE SCORE = 20.96410 • TEST RMSE SCORE = 558.438329
• The Standardized Residual do not display any
• The Standardized Residual do not display any obvious seasonality.
obvious seasonality. • Histogram plus estimated density shows the
• Histogram plus estimated density shows the KDE of the residuals is in normal distribution,
KDE of the residuals is in normal distribution, therefore the model is normally distributed.
therefore the model is normally distributed. • Normal Q-Q plot tells about the ordered
• Normal Q-Q plot tells about the ordered distribution of residuals following the linear
distribution of residuals following the linear trend taken normal distribution with N(0,1).
trend taken normal distribution with N(0,1).
• Correlogram time series residuals have low
• Correlogram time series residuals have low
correlation with lagged version.
correlation with lagged version.
8. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.

ROSE WINE SPARKLE WINE

TEST
MODEL – ROSE RMS TEST
MODEL – SPARKLING
E RMSE

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.15568
2pointTrailingMovingAverage 11.529278
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.78619
4pointTrailingMovingAverage 14.451403
SARIMA(1,1,2)(1,0,2,12) 528.45273
6pointTrailingMovingAverage 14.566327
SARIMA(0,1,0)(1,1,3,6) 558.43832
9pointTrailingMovingAverage 14.727630
SARIMA(1,1,2)(2,0,2,6) 626.88015
ARIMA(0,1,2) 15.618912
2pointTrailingMovingAverage 813.40068
ARIMA(1,1,1) 15.734259
4pointTrailingMovingAverage 1156.5896
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothin
17.369489
g Alpha=0.0,SimpleExponentialSmoothing 1275.0817
SARIMA(1,1,2)(2,0,2,6) 20.964110
SimpleAverageModel 1275.0818
SARIMA(1,1,2)(2,0,2,6) 26.133554
RegressionOnTime 1275.8670
SARIMA(0,1,2)(2,0,2,12) 26.929368
6pointTrailingMovingAverage 1283.9274
Alpha=0.098,SimpleExponentialSmoothing 36.796244 9pointTrailingMovingAverage 1346.2783
Alpha=0.3,SimpleExponentialSmoothing 47.504821 ARIMA(0,1,2) 1374.9769
RegressionOnTime 51.433312 ARIMA(1,1,1) 1461.6785
SimpleAverageModel 53.460570 Alpha=0.3,SimpleExponentialSmoothing 1935.5071
NaiveModel 79.718773 NaiveModel 3864.2793
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317 Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110

• From the above table we see that the lowest • From the above table we see that the lowest score
score is 10.945435. is 383.15568.
• Obtained from triple exponential smoothing • Obtained from triple exponential smoothing
model. model.
• Which was executed on different alpha, beta • Which was executed on auto/manual fit
and gamma values ranging from 0.3 to 1.0. parameters alpha, beta and gamma values.
• Parameters having lowest score alpha = 0.3, • Whose smoothing level(alpha) = 0.154,
beta = 0.4 and gamma = 0.3. smoothing trend(beta) = 1.307 and smoothing
seasonality(gamma) = 0.371.
9. Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate
confidence intervals/bands.

ROSE WINE SPARKLING WINE

• From the previous answer we observe that • From the previous answer we observe that
Triple Exponential Smoothing has the least Triple Exponential Smoothing has the least
RMSE score. RMSE score.
• It falls under most optimum model compared to • It falls under most optimum model compared to
other models. other models.
• The model is calculated with the parameters • The model is calculated with the parameters
having alpha =0.3, beta = 0.3, gamma =0.3. having alpha =0.154, beta = 1.307, gamma
• The upper and lower bands are calculated with =0.371.
95% accuracy. • The upper and lower bands are calculated with
• The final TEST RMSE SCORE = 24.2665. 95% accuracy.
• The final TEST RMSE SCORE = 353.9124
10.Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
➢ Time series analysis involves understanding various aspects about the inherent nature of the series so that
you are better informed to create meaningful and accurate forecasts
➢ Any time series may be split into the following components: Base Level + Trend + Seasonality + Error.

ROSE WINE SPARKLING WINE

• Rose sales shows decrease in trend compared • Sparkling sales shows stabilized values.
to the previous years. • December month shows the highest sales.
• December month shows the highest sales. • The models are built and are chosen based on
• The models are built and are chosen based on the least RMSE score.
the least RMSE score. • The sales of Sparkling wine is seasonal and
• The sales of Rose wine is seasonal and also also had trend. Therefore, the company cannot
had trend. Therefore, the company cannot have have the same stock throughout the year.
the same stock throughout the year. • The company should use prediction results to
• The company should use prediction results to plan about future stock.
plan about future stock.

END

You might also like