Rose Sparkling Wine
Rose Sparkling Wine
Rose Sparkling Wine
analysed. Both of these data are from the same company but of different wines. As an analyst in the
ABC Estate Wines, you are tasked to analyse and forecast Wine Sales in the 20th century.
1.Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
Data set has 187 records and there are no null values in data set Sparkling
Decomposition of Sparkling-wine sales into Trend, Seasonal and Residual
3. Split the data into training and test. The test data should start in 1991.
After splitting train and test data set , train data set has 132 records and test has 55 records
4. Build various exponential smoothing models on the training data and evaluate the model using RMSE
on the test data. Other models such as regression,naïve forecast models, simple average models etc.
should also be built on the training data and check the performance on the test data using RMSE.
Please do try to build as many models as possible and as many iterations of models as possible with different
parameters.
Test RMSE
RegressionOnTime 15.280000
NaiveModel 79.741326
SimpleAverageModel 53.483727
2pointTrailingMovingAverage 11.529811
4pointTrailingMovingAverage 14.457115
6pointTrailingMovingAverage 14.571789
9pointTrailingMovingAverage 14.731914
Alpha=0.995,SimpleExponentialSmoothing 36.819844
Alpha=0.995,Beta=0.995:DoubleExponentialSmoothing 15.276679
Alpha=0.99,Beta=0.0001,Gamma=0.005:DoubleExponentialSmoothing 20.962011
Alpha=0.02,SimpleExponentialSmoothing 36.459396
Test RMSE
RegressionOnTime 1389.140000
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
Test RMSE
9pointTrailingMovingAverage 1346.278315
Alpha=0.995,SimpleExponentialSmoothing 1316.034674
Alpha=0.995,Beta=0.995:DoubleExponentialSmoothing 2007.238526
Alpha=0.99,Beta=0.0001,Gamma=0.005:DoubleExponentialSmoothing 469.591976
Alpha=0.02,SimpleExponentialSmoothing 1279.495201
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical
tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary,
take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.
To check whether the series is stationary, we use the Augmented Dickey Fuller (ADF)test whose null and
alternate hypothesis can be simplified to
Null Hypothesis H0: Time Series is non-stationary
Alternate Hypothesis Ha: Time Series is stationary
Rose wine sales
After next level of levels of differencing p-value <0.05 therefore series is stationary.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using
the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data
using RMSE.
ARIMA Model
AIC values in descending order
param AIC
17 (3, 1, 3) 1273.194108
4 (1, 1, 2) 1277.359223
3 (1, 1, 1) 1277.775747
param AIC
9 (2, 1, 1) 1279.045689
10 (2, 1, 2) 1279.298694
5 (1, 1, 3) 1279.312635
15 (3, 1, 1) 1279.605966
16 (3, 1, 2) 1280.969245
11 (2, 1, 3) 1281.196226
1 (1, 0, 2) 1292.053210
7 (2, 0, 2) 1292.248055
2 (1, 0, 3) 1292.929011
6 (2, 0, 1) 1292.937195
14 (3, 0, 3) 1293.042709
8 (2, 0, 3) 1294.247938
0 (1, 0, 1) 1294.510585
12 (3, 0, 1) 1333.933193
13 (3, 0, 2) 1355.403813
SARIMA model
AIC values in descending order (lowest AIC 10 records)
SARIMAX Results
===================================================================================
Dep. Variable: y No. Observations: 132
Model: ARIMAX(0, 1, 2)x(2, 1, 2, 12) Log Likelihood -380.485
Date: Sat, 06 Mar 2021 AIC 774.969
Time: 13:41:44 BIC 792.622
Sample: 0 HQIC 782.094
- 132
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.9524 0.184 -5.166 0.000 -1.314 -0.591
ma.L2 -0.0764 0.126 -0.605 0.545 -0.324 0.171
ar.S.L12 0.0480 0.177 0.271 0.786 -0.299 0.395
ar.S.L24 -0.0419 0.028 -1.513 0.130 -0.096 0.012
ma.S.L12 -0.7526 0.301 -2.503 0.012 -1.342 -0.163
ma.S.L24 -0.0721 0.204 -0.354 0.723 -0.471 0.327
sigma2 187.8679 45.274 4.150 0.000 99.132 276.604
===================================================================================
Ljung-Box (L1) (Q): 0.06 Jarque-Bera (JB): 4.86
Prob(Q): 0.81 Prob(JB): 0.09
Heteroskedasticity (H): 0.91 Skew: 0.41
Prob(H) (two-sided): 0.79 Kurtosis: 3.77
===================================================================================
ARIMA Model
AIC values in descending order
param AIC
8 (2, 1, 2) 2210.616954
7 (2, 1, 1) 2232.360490
param AIC
2 (0, 1, 2) 2232.783098
5 (1, 1, 2) 2233.597647
4 (1, 1, 1) 2235.013945
6 (2, 1, 0) 2262.035601
1 (0, 1, 1) 2264.906439
3 (1, 1, 0) 2268.528061
0 (0, 1, 0) 2269.582796
SARIMAX Results
==================================================================================
Dep. Variable: y No. Observations: 132
Model: SARIMAX(0, 1, 2)x(0, 1, 2, 12) Log Likelihood -686.242
Date: Sun, 28 Feb 2021 AIC 1382.484
Time: 18:06:38 BIC 1395.093
Sample: 0 HQIC 1387.573
- 132
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.7223 0.107 -6.752 0.000 -0.932 -0.513
ma.L2 -0.2292 0.137 -1.671 0.095 -0.498 0.040
ma.S.L12 -0.4113 0.087 -4.743 0.000 -0.581 -0.241
ma.S.L24 -0.0419 0.138 -0.304 0.761 -0.312 0.228
sigma2 1.736e+05 2.06e+04 8.425 0.000 1.33e+05 2.14e+05
===================================================================================
Ljung-Box (L1) (Q): 0.02 Jarque-Bera (JB): 27.42
Prob(Q): 0.88 Prob(JB): 0.00
Heteroskedasticity (H): 0.84 Skew: 0.80
Prob(H) (two-sided): 0.62 Kurtosis: 5.15
===================================================================================
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.
By taking these parameters (4,1,2) and (4,1,2,12) SARIMA results are as under.
SARIMAX Results
====================================================================================
Dep. Variable: y No. Observations: 132
Model: SARIMAX(4, 1, 2)x(4, 1, 2, 12) Log Likelihood -277.661
Date: Sun, 07 Mar 2021 AIC 581.322
Time: 14:22:21 BIC 609.983
Sample: 0 HQIC 592.663
- 132
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -0.9742 0.199 -4.899 0.000 -1.364 -0.584
ar.L2 -0.1122 0.285 -0.394 0.694 -0.670 0.446
ar.L3 -0.1044 0.277 -0.377 0.706 -0.647 0.438
ar.L4 -0.1285 0.162 -0.794 0.427 -0.446 0.189
ma.L1 0.1605 328.137 0.000 1.000 -642.976 643.297
ma.L2 -0.8395 275.462 -0.003 0.998 -540.734 539.055
ar.S.L12 -0.1441 0.364 -0.396 0.692 -0.858 0.569
ar.S.L24 -0.3597 0.227 -1.587 0.113 -0.804 0.085
ar.S.L36 -0.2153 0.106 -2.039 0.041 -0.422 -0.008
ar.S.L48 -0.1195 0.093 -1.281 0.200 -0.302 0.063
ma.S.L12 -0.5159 0.343 -1.503 0.133 -1.189 0.157
ma.S.L24 0.2086 0.373 0.559 0.576 -0.523 0.940
sigma2 215.3512 7.07e+04 0.003 0.998 -1.38e+05 1.39e+05
===================================================================================
Ljung-Box (L1) (Q): 0.03 Jarque-Bera (JB): 2.41
Prob(Q): 0.86 Prob(JB): 0.30
Heteroskedasticity (H): 0.49 Skew: 0.32
Prob(H) (two-sided): 0.10 Kurtosis: 3.68
===================================================================================
By taking these parameters (3,1,2) and (3,1,2,12) SARIMA results are as under.
SARIMAX Results
==================================================================================
Dep. Variable: y No. Observations: 132
Model: SARIMAX(3, 1, 2)x(3, 1, 2, 12) Log Likelihood -598.630
Date: Sun, 07 Mar 2021 AIC 1219.260
Time: 15:01:53 BIC 1245.462
Sample: 0 HQIC 1229.765
- 132
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 -0.7556 0.151 -5.013 0.000 -1.051 -0.460
ar.L2 0.1169 0.185 0.633 0.527 -0.245 0.479
ar.L3 -0.0520 0.143 -0.365 0.715 -0.332 0.228
ma.L1 0.0330 0.191 0.173 0.863 -0.341 0.407
ma.L2 -0.9670 0.156 -6.197 0.000 -1.273 -0.661
ar.S.L12 -0.7538 0.496 -1.520 0.128 -1.725 0.218
ar.S.L24 -0.6371 0.351 -1.818 0.069 -1.324 0.050
ar.S.L36 -0.2469 0.151 -1.641 0.101 -0.542 0.048
ma.S.L12 0.3719 0.491 0.758 0.448 -0.590 1.334
ma.S.L24 0.3466 0.365 0.949 0.343 -0.370 1.063
sigma2 1.79e+05 1.67e-06 1.07e+11 0.000 1.79e+05 1.79e+05
===================================================================================
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 13.16
Prob(Q): 0.93 Prob(JB): 0.00
Heteroskedasticity (H): 0.66 Skew: 0.62
Prob(H) (two-sided): 0.29 Kurtosis: 4.55
===================================================================================
RMSE from SARIMA model is 329.53
8. Build a table (create a data frame) with all the models built along with their corresponding parameters
and the respective RMSE values on the test data.
RegressionOnTime 15.280000
NaiveModel 79.741326
SimpleAverageModel 53.483727
2pointTrailingMovingAverage 11.529811
4pointTrailingMovingAverage 14.457115
6pointTrailingMovingAverage 14.571789
9pointTrailingMovingAverage 14.731914
Alpha=0.995,SimpleExponentialSmoothing 36.819844
Alpha=0.995,Beta=0.995:DoubleExponentialSmoothing 15.276679
Alpha=0.99,Beta=0.0001,Gamma=0.005:DoubleExponentialSmoothing 20.962011
Alpha=0.02,SimpleExponentialSmoothing 36.459396
ARIMA(3, 1, 3) 15.99
ARIMA(4, 1, 2) 33.97
RegressionOnTime 1389.140000
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
Test RMSE
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.995,SimpleExponentialSmoothing 1316.034674
Alpha=0.995,Beta=0.995:DoubleExponentialSmoothing 2007.238526
Alpha=0.99,Beta=0.0001,Gamma=0.005:DoubleExponentialSmoothing 469.591976
Alpha=0.02,SimpleExponentialSmoothing 1279.495201
ARIMA(2, 1, 2) 1375.03
ARIMA(3, 1, 2) 1375.10
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and
predict 12 months into the future with appropriate confidence intervals/bands.
Since Rose data set has clear component of seasonality SARIM. Therefore, SARIMA model with parameters
(0,1,2)x(2,1,2,12) is selected for forecasting time line series and model details are as under.
SARIMAX Results
===================================================================================
Dep. Variable: Rose No. Observations: 187
Model: SARIMAX(0, 1, 2)x(2, 1, 2, 12) Log Likelihood -588.604
Date: Sat, 06 Mar 2021 AIC 1191.208
Time: 13:41:48 BIC 1212.142
Sample: 01-01-1980 HQIC 1199.714
- 07-01-1995
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.8254 0.080 -10.334 0.000 -0.982 -0.669
ma.L2 -0.0807 0.086 -0.934 0.350 -0.250 0.089
ar.S.L12 0.0635 0.160 0.398 0.691 -0.249 0.376
ar.S.L24 -0.0340 0.019 -1.790 0.074 -0.071 0.003
ma.S.L12 -0.6953 0.207 -3.360 0.001 -1.101 -0.290
ma.S.L24 -0.0547 0.150 -0.365 0.715 -0.348 0.239
sigma2 166.0900 17.899 9.279 0.000 131.008 201.172
===================================================================================
Ljung-Box (L1) (Q): 0.07 Jarque-Bera (JB): 8.28
Prob(Q): 0.79 Prob(JB): 0.02
Heteroskedasticity (H): 0.51 Skew: 0.33
Prob(H) (two-sided): 0.02 Kurtosis: 3.95
Since Sparkling sales data has component of seasonality. Therefore, SARIMA model with para meters (0,1,2) (0, 1, 2, 12)
is proposed to used for forecast for next 12 months using full data. Details of model are as under.
SARIMAX Results
===================================================================================
Dep. Variable: Sparkling No. Observations: 187
Model: SARIMAX(0, 1, 2)x(0, 1, 2, 12) Log Likelihood -1087.003
Date: Sun, 28 Feb 2021 AIC 2184.006
Time: 18:26:36 BIC 2198.958
Sample: 01-01-1980 HQIC 2190.081
- 07-01-1995
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.9094 0.104 -8.713 0.000 -1.114 -0.705
ma.L2 -0.1316 0.087 -1.507 0.132 -0.303 0.040
ma.S.L12 -0.5456 0.065 -8.393 0.000 -0.673 -0.418
ma.S.L24 -0.0202 0.084 -0.241 0.810 -0.185 0.145
sigma2 1.419e+05 1.32e+04 10.755 0.000 1.16e+05 1.68e+05
===================================================================================
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 49.28
Prob(Q): 0.91 Prob(JB): 0.00
Heteroskedasticity (H): 0.79 Skew: 0.74
Prob(H) (two-sided): 0.42 Kurtosis: 5.41
===================================================================================
10. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.
Trend in sales of Rose is continuously decreasing over the period. Detailed study may be required to see whether
decreasing trend is due to change in customer preference or due to substitution. Seasonality of sales is observed, and
higher sales is maintained in the end of the year. Some promotion schemes and improvement / quality enhancers in the
product can be examined so as to attract new young generation customers.