SANDYA VB-Business Report TSF

The document analyzes time series data on wine sales from 1980-1995 and performs forecasting. It splits the data into training and test sets, builds exponential smoothing and other models on the training data, and evaluates the models' performance on the test set using RMSE. Stationarity of the data is also checked.

Uploaded by

Sandya Vb


BUSINESS ANALYSIS

REPORT
TIME SERIES FORECASTING

JUNE 20, 2021

SANDYA V B
CONTENTS
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build various exponential smoothing models on the training data and evaluate the model
using RMSE on the test data.
Other models such as regression, naïve forecast models, simple average models etc. should
also be built on the training data and check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures
that the company should be taking for future sales.
PROBLEM:
For this particular assignment, the data of different types of wine sales in the 20th century is to be
analysed. Both of these data are from the same company but of different wines. As an analyst in the ABC
Estate Wines, you are tasked to analyse and forecast Wine Sales in the 20th century.

Data set for the Problem: Sparkling.csv and Rose.csv

1. Read the data as an appropriate Time Series data and plot the data.

➢ The two datasets, Rose and Sparkling, are imported using the pandas read_csv command and converted to
time series data using the date_range function:
import pandas as pd
df = pd.read_csv('Rose.csv')
date = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')  # 187 month-end stamps ('ME' in newer pandas)
df['Time_Stamp'] = date
df.head()

o/p:

ROSE WINE YEAR WISE SALES

• The plot shows a decreasing trend in the initial years, after which sales stabilize.
• The seasonal pattern in the data appears to repeat from year to year.

SPARKLING WINE YEAR WISE SALES

• The plot does not show much of a trend.
• The seasonality appears to follow a yearly pattern.

2. Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

ROSE WINE EDA SPARKLING WINE EDA

ROSE WINE

• The shape of the data is (187, 1).
• There are 2 null values in the data, which were filled using linear interpolation.
• Describing the data:

Measures   count  mean     std      min   25%   50%   75%   max
Rose       185    90.3     39.1     28    63    86    112   267

• The box plots indicate a downward trend.
• A few outliers are present in the sales plot.

SPARKLING WINE

• The shape of the data is (187, 1).
• There are no null values in the data.
• Describing the data:

Measures   count  mean     std      min   25%   50%   75%   max
Sparkling  187    2402.41  1295.11  1070  1605  1874  2549  7242

• The box plots do not indicate any trend.
• Sparkling wine sales have outliers in almost every year except 1995.
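The linear interpolation step mentioned for the Rose data can be sketched as follows. This is a minimal illustration on a made-up six-point series, not the report's actual data; it assumes the gaps were filled with pandas' `interpolate(method="linear")`, which the report does not state explicitly.

```python
import pandas as pd

# Hypothetical toy series with two gaps, standing in for the Rose column.
sales = pd.Series([112.0, None, 129.0, 99.0, None, 116.0])

# Linear interpolation fills each gap from its two neighbouring values.
filled = sales.interpolate(method="linear")

print(filled.tolist())
print(filled.describe())  # count, mean, std, min, quartiles, max
```

Each missing value is replaced by the midpoint of its neighbours here because the gaps are one step wide; longer gaps would be filled along a straight line between the surrounding observations.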

ROSE WINE

• The plot shows that December has the highest sales of wine.
• Outliers are present in June, July, August and September.
• The line plot of year/month-wise sales shows that December has the highest sales, while May, January and February show lower sales.

SPARKLING WINE

• The plot shows an increase in sales.
• December has the highest sales.
• The line plot of year/month-wise sales shows that December has the highest sales, while August, January and February show lower sales.

For both Rose and Sparkling wine:

• The time series month plot shows the spread of sales across different years and within each month across years.
• The time series is resampled (aggregated) to an annual frequency by summing the observations of each year.
• The time series is also resampled to an annual frequency by taking the mean of the observations of each year.
• If the resampling period is 10 years (a decade), the seasonality is smoothed over and only an estimate of the trend remains.
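The annual resampling described above can be sketched with pandas as follows. The series is a hypothetical two-year placeholder (values 1 to 24), and the use of `resample` with a year-end offset is an assumption about how the aggregation was done.

```python
import pandas as pd

# Hypothetical monthly series: 24 month-end stamps with values 1..24.
idx = pd.date_range(start="1980-01-31", periods=24, freq=pd.offsets.MonthEnd())
sales = pd.Series(range(1, 25), index=idx, dtype="float64")

# Aggregate to one value per calendar year: total and average.
yearly_sum = sales.resample(pd.offsets.YearEnd()).sum()
yearly_mean = sales.resample(pd.offsets.YearEnd()).mean()

print(yearly_sum)
print(yearly_mean)
```

Using a wider window averages out the within-year pattern, which is why a decade-level aggregate leaves only a rough picture of the trend.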

For both Rose and Sparkling wine:

• This graph shows what percentage of data points corresponds to what level of sales.
• The two graphs show the average sales and the percentage change in sales over time.

For both Rose and Sparkling wine:

• In the additive decomposition, the plot of the residuals shows that they are located around 0.
• In the multiplicative decomposition, most of the residuals are located around 1.

3. Split the data into training and test. The test data should start in 1991.

ROSE WINE AND SPARKLING WINE TRAIN & TEST DATA

• For each wine, the training data runs up to the end of 1990 and has 132 data points.
• The test data starts in 1991 and has 55 data points.
• With this split, the models are fitted on the earlier years and their forecasts are compared with the actual sales from 1991 onwards.
• Training data:

First few rows of Training Data
Time_Stamp   Rose    Sparkling
1980-01-31   112.0   1686
1980-02-29   118.0   1591
1980-03-31   129.0   2304
1980-04-30   99.0    1712
1980-05-31   116.0   1471

Last few rows of Training Data
Time_Stamp   Rose    Sparkling
1990-08-31   70.0    1605
1990-09-30   83.0    2424
1990-10-31   65.0    3116
1990-11-30   110.0   4286
1990-12-31   132.0   6047

• Test data:

First few rows of Test Data
Time_Stamp   Rose    Sparkling
1991-01-31   54.0    1902
1991-02-28   55.0    2049
1991-03-31   66.0    1874
1991-04-30   65.0    1279
1991-05-31   60.0    1432

Last few rows of Test Data
Time_Stamp   Rose    Sparkling
1995-03-31   45.0    1897
1995-04-30   52.0    1862
1995-05-31   28.0    1670
1995-06-30   40.0    1688
1995-07-31   62.0    2031
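A date-based split of this kind can be sketched as follows. The `Sales` column and its placeholder values are hypothetical; only the index (187 month-end stamps from January 1980) and the 1991 cut-off come from the report.

```python
import pandas as pd

# 187 month-end stamps from 1980-01-31 to 1995-07-31, as in the report.
idx = pd.date_range(start="1980-01-31", periods=187, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"Sales": range(187)}, index=idx)  # placeholder values

# Everything before 1991 is training data; 1991 onwards is test data.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

print(len(train), len(test))
print(train.index[-1], test.index[0])
```

Splitting on the calendar year rather than on a row count keeps the cut-off tied to the requirement that the test data start in 1991.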

4. Build various exponential smoothing models on the training data and evaluate
the model using RMSE on the test data. Other models such as regression, naïve
forecast models and simple average models should also be built on the training
data and check the performance on the test data using RMSE.

ROSE WINE SPARKLING WINE

➢ MODEL 1: LINEAR REGRESSION

• For Linear Regression, we regress the ‘Sales’ variable against the order of occurrence.
• We generate the numerical time instance order for both the train and test sets.
• These values are added to the training and test sets.
• The train and test sets are thus modified to perform Linear Regression.

First few rows of Training Data
Time_Stamp   Rose    Sparkling  time
1980-01-31   112.0   1686       1
1980-02-29   118.0   1591       2
1980-03-31   129.0   2304       3
1980-04-30   99.0    1712       4
1980-05-31   116.0   1471       5

Last few rows of Training Data
Time_Stamp   Rose    Sparkling  time
1990-08-31   70.0    1605       128
1990-09-30   83.0    2424       129
1990-10-31   65.0    3116       130
1990-11-30   110.0   4286       131
1990-12-31   132.0   6047       132

First few rows of Test Data
Time_Stamp   Rose    Sparkling  time
1991-01-31   54.0    1902       43
1991-02-28   55.0    2049       44
1991-03-31   66.0    1874       45
1991-04-30   65.0    1279       46
1991-05-31   60.0    1432       47

Last few rows of Test Data
Time_Stamp   Rose    Sparkling  time
1995-03-31   45.0    1897       93
1995-04-30   52.0    1862       94
1995-05-31   28.0    1670       95
1995-06-30   40.0    1688       96
1995-07-31   62.0    2031       97

• TEST RMSE SCORE (Rose) = 51.433312
• TEST RMSE SCORE (Sparkling) = 1275.867052
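Regression on a time index and the RMSE evaluation can be sketched in plain Python as follows. The numbers are made up for illustration, and the helper names `fit_line` and `rmse` are hypothetical. The report's table numbers the test rows 43 to 97; this sketch instead continues the index from the end of the training set, which is an assumption about the intended setup.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical sales values, not taken from the report.
train_y = [10.0, 12.0, 14.0, 16.0]
test_y = [18.0, 21.0]

# Time index: 1..4 for training, continuing with 5..6 for the test rows.
train_t = list(range(1, len(train_y) + 1))
test_t = list(range(len(train_y) + 1, len(train_y) + len(test_y) + 1))

intercept, slope = fit_line(train_t, train_y)
predicted = [intercept + slope * t for t in test_t]
score = rmse(test_y, predicted)
print(intercept, slope, score)
```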

➢ MODEL 2: NAÏVE MODEL

• The naïve model predicts that the sale for tomorrow is the same as today, and that the sale for the day after tomorrow is the same as tomorrow.
• Applied to all future periods, every forecast equals the last observed value, which is why the green line in the plot is a straight line.
• TEST RMSE SCORE (Rose) = 79.718773
• TEST RMSE SCORE (Sparkling) = 3864.279352

➢ MODEL 3: SIMPLE AVERAGE MODEL

• In the Simple Average method, we forecast the data using the average of the training values.
• In the plot, the green line is straight and shows the Simple Average forecast.
• TEST RMSE SCORE (Rose) = 53.460570
• TEST RMSE SCORE (Sparkling) = 1275.081804
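The naïve and simple average baselines can be sketched together as follows. The values are the last five Rose training rows and the first five Rose test rows from the tables above, so the resulting scores cover only a small slice of the data and will not match the reported RMSE values.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Last five Rose training rows and first five Rose test rows from the report.
train = [70.0, 83.0, 65.0, 110.0, 132.0]
test = [54.0, 55.0, 66.0, 65.0, 60.0]

# Naive forecast: repeat the last training observation for every test period.
naive_forecast = [train[-1]] * len(test)

# Simple average forecast: repeat the mean of the training observations.
train_mean = sum(train) / len(train)
average_forecast = [train_mean] * len(test)

naive_rmse = rmse(test, naive_forecast)
average_rmse = rmse(test, average_forecast)
print(naive_rmse, average_rmse)
```

Both forecasts are flat lines; they differ only in the level at which the line is drawn.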
➢ MODEL 4: MOVING AVERAGE MODEL

• In the Moving Average Model, we compute trailing moving averages over 2, 4, 6 and 9 point intervals.
• The best interval is the one with the lowest test RMSE.
• From the table below, the 2 point trailing moving average has the lowest score for both wines.

Moving average                   Test RMSE (Rose)   Test RMSE (Sparkling)
2point trailing moving average   11.529278          813.400684
4point trailing moving average   14.451403          1156.589694
6point trailing moving average   14.566327          1283.927428
9point trailing moving average   14.727630          1346.278315
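A trailing moving average can be sketched in plain Python as follows. The helper name is hypothetical, the input is the first five Rose training rows, and leaving the first positions empty (no full window yet) is an assumption that mirrors how rolling means are usually computed.

```python
def trailing_moving_average(values, window):
    """Mean of the current point and the window - 1 points before it.

    Positions that do not yet have a full window are returned as None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i - window + 1:i + 1]) / window)
    return out

# First five Rose training rows from the report.
sales = [112.0, 118.0, 129.0, 99.0, 116.0]

ma2 = trailing_moving_average(sales, 2)
ma4 = trailing_moving_average(sales, 4)
print(ma2)
print(ma4)
```

A shorter window follows the data more closely, which is consistent with the 2 point average having the lowest RMSE in the table.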

➢ MODEL 5: SIMPLE EXPONENTIAL SMOOTHING MODEL

• For Rose, the Simple Exponential Smoothing model is evaluated for alpha = 0.098.
• For Sparkling, the Simple Exponential Smoothing model is evaluated for alpha = 0.0.
• TEST RMSE SCORE (Rose) = 36.796244
• TEST RMSE SCORE (Sparkling) = 1275.081766

❖ SIMPLE EXPONENTIAL SMOOTHING TUNING MODEL

• In the Simple Exponential Smoothing Model, we run a loop over different alpha values to find which value is best.
• The alpha values range from 0.3 to 0.9.
• TEST RMSE SCORE (Rose) = 47.504821
• TEST RMSE SCORE (Sparkling) = 1935.507132

Rose
Alpha Values   Train RMSE    Test RMSE
0.3            32.470164     47.504821
0.4            33.035130     53.767406
0.5            33.682839     59.641786
0.6            34.441171     64.971288
0.7            35.323261     69.698162
0.8            36.334596     73.773992
0.9            37.482782     77.139276

Sparkling
Alpha Values   Train RMSE    Test RMSE
0.3            1359.511747   1935.507132
0.4            1352.588879   2311.919615
0.5            1344.004369   2666.351413
0.6            1338.805381   2979.204388
0.7            1338.844308   3249.944092
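The alpha loop can be sketched in plain Python as follows. The data is made up, the helper names are hypothetical, and starting the level at the first observation is an assumption; library implementations may initialise or estimate it differently.

```python
import math

def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing; the forecast is flat at the final level."""
    level = train[0]  # assumption: the level starts at the first observation
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical data, not taken from the report.
train = [10.0, 20.0, 30.0]
test = [30.0, 30.0]

# Alpha grid 0.3, 0.4, ..., 0.9, matching the range used in the report.
alphas = [round(0.3 + 0.1 * i, 1) for i in range(7)]
scores = {a: rmse(test, ses_forecast(train, a, len(test))) for a in alphas}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

On this toy series the largest alpha wins because the latest value is the best predictor; in the report's tables the lowest test RMSE is at alpha = 0.3 instead.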

➢ MODEL 6: DOUBLE EXPONENTIAL SMOOTHING MODEL

• In Double Exponential Smoothing, we have two parameters, alpha and beta.
• The values range from 0.3 to 0.1, and the five lowest RMSE scores are listed below.
• TEST RMSE SCORE (Rose) = 98.653317
• TEST RMSE SCORE (Sparkling) = 18259.110704

Rose
Alpha Values   Beta Values   Train RMSE    Test RMSE
0.3            0.1           33.611269     98.653317
0.4            0.1           34.255060     128.978579
0.5            0.1           34.957515     155.358815
0.3            0.2           34.645117     177.140327
0.6            0.1           35.781643     178.004967

Sparkling
Alpha Values   Beta Values   Train RMSE    Test RMSE
0.3            0.3           1592.292788   18259.110704
0.4            0.3           1569.338606   23878.496940
0.3            0.4           1682.573828   26069.841401
0.5            0.3           1530.575845   27095.532414
0.6            0.3           1506.449870   29070.722592
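Double exponential smoothing (Holt's linear trend method) can be sketched in plain Python as follows. The helper name and the data are hypothetical, and the starting level and trend (first observation and first difference) are an assumed initialisation.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = train[0]             # assumption: start at the first observation
    trend = train[1] - train[0]  # assumption: start with the first difference
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + (h + 1) * trend for h in range(horizon)]

# Hypothetical series with a constant increase of 2 per step.
train = [2.0, 4.0, 6.0, 8.0]
forecast = holt_forecast(train, alpha=0.3, beta=0.1, horizon=3)
print(forecast)
```

Because the forecast is a straight line, any trend estimated at the end of the training data is carried across the whole test period, which can inflate the test RMSE when the real series levels off.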

➢ MODEL 7: TRIPLE EXPONENTIAL SMOOTHING MODEL

• In Triple Exponential Smoothing we have three parameters: Alpha, Beta and Gamma.
• Smoothing level value represents Alpha.
• Smoothing trend value represents Beta.
• Smoothing seasonality value represents Gamma.
• TEST RMSE SCORE (Rose) = 17.369489
• TEST RMSE SCORE (Sparkling) = 383.155684

❖ TRIPLE EXPONENTIAL SMOOTHING TUNING MODEL

• In the Triple Exponential Smoothing Model, we run a loop over different alpha, beta and gamma values to find which set of values is best.
• TEST RMSE SCORE (Rose) = 10.945435
• TEST RMSE SCORE (Sparkling) = 392.786198

Rose
Alpha value   Beta Value   Gamma Value   Train RMSE    Test RMSE
0.3           0.4          0.3           28.111886     10.945435
0.3           0.3          0.4           27.399095     11.201633
0.4           0.3          0.8           32.601491     12.615607
0.3           0.5          0.3           29.087520     14.414604
0.5           0.3          0.6           32.144773     16.720720

Sparkling
Alpha value   Beta Value   Gamma Value   Train RMSE    Test RMSE
0.3           0.3          0.3           404.513320    392.786198
0.3           0.4          0.3           424.828055    410.854547
0.4           0.3          0.4           435.553595    421.409170
0.7           0.8          0.3           700.317756    518.188752
0.5           0.3          0.5           498.239915    542.175497
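The roles of alpha, beta and gamma can be sketched with a plain-Python additive Holt-Winters recursion. This is an illustration only: the report does not say whether its seasonal component was additive or multiplicative, the helper name is hypothetical, and the initial level, trend and seasonal values below are simple assumed starting points (the function needs at least two full seasons of data).

```python
def holt_winters_additive(train, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: smooth level (alpha), trend (beta), seasonality (gamma)."""
    # Assumed starting values taken from the first two seasons.
    level = sum(train[:period]) / period
    trend = (sum(train[period:2 * period]) - sum(train[:period])) / period ** 2
    season = [train[i] - level for i in range(period)]

    for t in range(period, len(train)):
        y = train[t]
        s = season[t % period]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y - level) + (1 - gamma) * s

    n = len(train)
    return [level + (h + 1) * trend + season[(n + h) % period]
            for h in range(horizon)]

# Hypothetical series: a repeating pattern of period 3 with no trend.
train = [1.0, 2.0, 3.0] * 3
forecast = holt_winters_additive(train, period=3, alpha=0.3, beta=0.4,
                                 gamma=0.3, horizon=4)
print(forecast)
```

For the monthly wine data the period would be 12; the grid search described above simply repeats such a fit for each (alpha, beta, gamma) combination and keeps the one with the lowest test RMSE.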
5. Check for the stationarity of the data on which the model is being built on
using appropriate statistical tests and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate steps to
make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.

ROSE WINE SPARKLING WINE

• The Dickey-Fuller test is used with the following hypotheses:
  H0: the series has a unit root, i.e. it is non-stationary.
  H1: the series is stationary.
• The null hypothesis is rejected if the p-value is less than alpha = 0.05.

Results of Dickey-Fuller Test:

Measure                       Rose        Sparkling
Test Statistic                -1.876699   -1.360497
p-value                       0.343101    0.601061
#Lags Used                    13          11
Number of Observations Used   173         175
Critical Value (1%)           -3.468726   -3.468280
Critical Value (5%)           -2.878396   -2.878202
Critical Value (10%)          -2.575756   -2.575653

• For Rose, the p-value is 0.34, which is higher than 0.05.
• For Sparkling, the p-value is 0.60, which is higher than 0.05.
• Hence, for both series we fail to reject the null hypothesis, and the data is treated as non-stationary.

• Therefore, we apply a difference of order 1 to each series and check for stationarity again.

Results of Dickey-Fuller Test (after differencing):

Measure                       Rose           Sparkling
Test Statistic                -8.044392      -45.050301
p-value                       1.810895e-12   0.000000
#Lags Used                    12             10
Number of Observations Used   173            175
Critical Value (1%)           -3.468726      -3.468280
Critical Value (5%)           -2.878396      -2.878202
Critical Value (10%)          -2.575756      -2.575653

• From the above result, the p-value is now less than 0.05 for both series.
• Hence, the null hypothesis is rejected and the differenced data is stationary.
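First-order differencing can be sketched in plain Python as follows. The helper name and the toy series are hypothetical; in practice the differenced series would then be passed to a unit-root test such as `adfuller` from statsmodels, which is not run here.

```python
def difference(values, lag=1):
    """Return the lag-differenced series: values[i] - values[i - lag]."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]

# Hypothetical series with a steady upward trend (non-stationary mean).
trending = [3.0 * t + 5.0 for t in range(10)]

# After one difference the trend is gone and only the constant step remains.
diffed = difference(trending)
print(diffed)
```

Differencing shortens the series by one observation, so a 187-point series becomes a 186-point series before the test is applied.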

6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria (AIC) on
the training data and evaluate this model on the test data using RMSE.

ROSE WINE SPARKLING WINE

➢ AUTOMATED ARIMA

• We check the stationarity of the training data at alpha = 0.05.

Results of Dickey-Fuller Test:

Measure                       Rose        Sparkling
Test Statistic                -2.164250   -1.208926
p-value                       0.219476    0.669744
#Lags Used                    13          12
Number of Observations Used   118         119
Critical Value (1%)           -3.487022   -3.486535
Critical Value (5%)           -2.886363   -2.886151
Critical Value (10%)          -2.580009   -2.579896

• From the above result, the p-value is 0.22 for Rose and 0.67 for Sparkling, both higher than 0.05.
• Hence, we take a difference of order 1 to make the data stationary.
• From the result below, the p-value is less than 0.05 for both series.

Results of Dickey-Fuller Test (after differencing):

Measure                       Rose           Sparkling
Test Statistic                -6.592372      -8.005007
p-value                       7.061944e-09   2.280104e-12
#Lags Used                    12             11
Number of Observations Used   118            119
Critical Value (1%)           -3.487022      -3.486535
Critical Value (5%)           -2.886363      -2.886151
Critical Value (10%)          -2.580009      -2.579896

• To build the automated ARIMA model, we sort the AIC values from lowest to highest.
• We then build the ARIMA model with the lowest Akaike Information Criteria (AIC) value.

Rose
param     AIC
(0,1,2)   1276.835377
(1,1,2)   1277.359224
(1,1,1)   1277.775754
(2,1,1)   1279.045689
(2,1,2)   1279.298694

Sparkling
param     AIC
(2,1,2)   2210.616692
(2,1,1)   2232.360490
(0,1,2)   2232.783098
(1,1,2)   2233.597647
(1,1,1)   2235.013945

• TEST RMSE SCORE (Rose) = 15.6189123
• TEST RMSE SCORE (Sparkling) = 1374.9769475
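The AIC-based selection can be sketched as follows. The AIC values are copied from the Rose table in the report; the candidate grid (p and q from 0 to 2, d fixed at 1) is an assumption, and no model is fitted here. In a real run each AIC would come from fitting an ARIMA model of that order on the training data.

```python
import itertools

# Assumed candidate grid: p and q in {0, 1, 2}, with d fixed at 1.
candidates = [(p, 1, q) for p, q in itertools.product(range(3), repeat=2)]

# AIC values for the five best Rose orders, copied from the report.
aic_by_order = {
    (0, 1, 2): 1276.835377,
    (1, 1, 2): 1277.359224,
    (1, 1, 1): 1277.775754,
    (2, 1, 1): 1279.045689,
    (2, 1, 2): 1279.298694,
}

# Sort from lowest to highest AIC and keep the first entry.
ranked = sorted(aic_by_order.items(), key=lambda item: item[1])
best_order, best_aic = ranked[0]
print(best_order, best_aic)
```

A lower AIC indicates a better trade-off between fit and number of parameters on the training data; it does not by itself guarantee the lowest RMSE on the test data.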
➢ AUTOMATED SARIMA

• In the ACF plot there is seasonality at intervals of 6 and 12 for both wines.
• Therefore, we run the automated SARIMA model for both intervals.
• The AIC values are sorted from lowest to highest.

Interval 6:
• TEST RMSE SCORE (Rose) = 26.13355444
• TEST RMSE SCORE (Sparkling) = 626.880153

Rose
param     seasonal     AIC
(1,1,2)   (2,0,2,6)    1041.655817
(0,1,2)   (2,0,2,6)    1043.600261
(2,1,2)   (2,0,2,6)    1045.286900
(2,1,1)   (2,0,2,6)    1051.673461
(1,1,1)   (2,0,2,6)    1052.778469

Sparkling
param     seasonal     AIC
(1,1,2)   (2,0,2,6)    1727.678697
(0,1,2)   (2,0,2,6)    1727.887986
(0,1,1)   (2,0,2,6)    1741.703671
(2,1,1)   (2,0,2,6)    1744.040750
(2,1,2)   (2,0,0,6)    1758.961073

Interval 12:
• TEST RMSE SCORE (Rose) = 26.929368
• TEST RMSE SCORE (Sparkling) = 528.4527

Rose
param     seasonal     AIC
(0,1,2)   (2,0,2,12)   887.937509
(2,1,2)   (2,0,2,12)   890.668848
(2,1,1)   (2,0,0,12)   896.518161
(2,1,2)   (2,0,0,12)   897.346498
(2,1,1)   (2,0,1,12)   897.639957

Sparkling
param     seasonal     AIC
(1,1,2)   (1,0,2,12)   1555.584254
(1,1,2)   (2,0,2,12)   1556.080259
(0,1,2)   (2,0,2,12)   1557.121563
(0,1,2)   (1,0,2,12)   1557.160319
(2,1,2)   (1,0,2,12)   1557.439140

Inference for both the interval 6 and interval 12 models (Rose and Sparkling):
• The standardized residuals do not display any obvious seasonality.
• The histogram with estimated density shows that the KDE of the residuals is close to a normal distribution, so the residuals are approximately normally distributed.
• In the normal Q-Q plot, the ordered residuals follow the linear trend of samples taken from a standard normal distribution N(0,1).
• In the correlogram, the residuals have low correlation with their lagged versions.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF
on the training data and evaluate this model on the test data using RMSE.

ROSE WINE SPARKLING WINE

➢ MANUAL ARIMA

• The manual ARIMA model is built based on the ACF and PACF plots.
• The AR parameter p is chosen from the cut-off of the PACF plot and the moving average parameter q from the cut-off of the ACF plot.
• TEST RMSE SCORE (Rose) = 15.73425
• TEST RMSE SCORE (Sparkling) = 1461.6785026

➢ MANUAL SARIMA

• The manual SARIMA model is built based on the ACF and PACF plots.
• We choose the AR parameter p, the moving average parameter q and the difference d = 1.
• We then derive the seasonal parameters based on the seasonal cut-off.
• TEST RMSE SCORE (Rose) = 20.96410
• TEST RMSE SCORE (Sparkling) = 558.438329
For both Rose and Sparkling wine:
• The standardized residuals do not display any obvious seasonality.
• The histogram with estimated density shows that the KDE of the residuals is close to a normal distribution, so the residuals are approximately normally distributed.
• In the normal Q-Q plot, the ordered residuals follow the linear trend of samples taken from a standard normal distribution N(0,1).
• In the correlogram, the residuals have low correlation with their lagged versions.
8. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.

ROSE WINE

MODEL - ROSE                                                      TEST RMSE
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing           10.945435
2pointTrailingMovingAverage                                       11.529278
4pointTrailingMovingAverage                                       14.451403
6pointTrailingMovingAverage                                       14.566327
9pointTrailingMovingAverage                                       14.727630
ARIMA(0,1,2)                                                      15.618912
ARIMA(1,1,1)                                                      15.734259
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing       17.369489
SARIMA(1,1,2)(2,0,2,6)                                            20.964110
SARIMA(1,1,2)(2,0,2,6)                                            26.133554
SARIMA(0,1,2)(2,0,2,12)                                           26.929368
Alpha=0.098,SimpleExponentialSmoothing                            36.796244
Alpha=0.3,SimpleExponentialSmoothing                              47.504821
RegressionOnTime                                                  51.433312
SimpleAverageModel                                                53.460570
NaiveModel                                                        79.718773
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing                     98.653317

SPARKLING WINE

MODEL - SPARKLING                                                 TEST RMSE
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing     383.15568
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing           392.78619
SARIMA(1,1,2)(1,0,2,12)                                           528.45273
SARIMA(0,1,0)(1,1,3,6)                                            558.43832
SARIMA(1,1,2)(2,0,2,6)                                            626.88015
2pointTrailingMovingAverage                                       813.40068
4pointTrailingMovingAverage                                       1156.5896
Alpha=0.0,SimpleExponentialSmoothing                              1275.0817
SimpleAverageModel                                                1275.0818
RegressionOnTime                                                  1275.8670
6pointTrailingMovingAverage                                       1283.9274
9pointTrailingMovingAverage                                       1346.2783
ARIMA(0,1,2)                                                      1374.9769
ARIMA(1,1,1)                                                      1461.6785
Alpha=0.3,SimpleExponentialSmoothing                              1935.5071
NaiveModel                                                        3864.2793
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing                     18259.110

ROSE WINE

• From the above table, the lowest score is 10.945435.
• It is obtained from the triple exponential smoothing model.
• The model was run on different alpha, beta and gamma values ranging from 0.3 to 1.0.
• The parameters with the lowest score are alpha = 0.3, beta = 0.4 and gamma = 0.3.

SPARKLING WINE

• From the above table, the lowest score is 383.15568.
• It is obtained from the triple exponential smoothing model.
• The model was run on auto/manual fit values of the alpha, beta and gamma parameters.
• Its smoothing level (alpha) = 0.154, smoothing trend (beta) = 1.307 and smoothing seasonality (gamma) = 0.371.
9. Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate
confidence intervals/bands.

ROSE WINE SPARKLING WINE

• From the previous answer, Triple Exponential Smoothing has the least RMSE score for both wines.
• It is therefore the most optimum model compared to the other models.
• Rose: the model is built with the parameters alpha = 0.3, beta = 0.3, gamma = 0.3.
• Sparkling: the model is built with the parameters alpha = 0.154, beta = 1.307, gamma = 0.371.
• The upper and lower bands are calculated at the 95% confidence level.
• The final TEST RMSE SCORE (Rose) = 24.2665.
• The final TEST RMSE SCORE (Sparkling) = 353.9124.
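One common way to build 95% bands around a forecast can be sketched as follows. The report does not state how its bands were computed, so this is an assumption: the band is the forecast plus or minus 1.96 times the standard deviation of the fitted residuals, which presumes roughly normal residuals with constant spread. The helper name and all numbers are hypothetical.

```python
import math

def prediction_band(forecast, residuals, z=1.96):
    """Return (lower, upper) bands at forecast -/+ z * std of the residuals."""
    n = len(residuals)
    mean_r = sum(residuals) / n
    # Population standard deviation of the in-sample residuals.
    sd = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / n)
    lower = [f - z * sd for f in forecast]
    upper = [f + z * sd for f in forecast]
    return lower, upper

# Hypothetical forecast and residuals, not taken from the report.
forecast = [10.0, 20.0]
residuals = [-1.0, 1.0, -1.0, 1.0]

lower, upper = prediction_band(forecast, residuals)
print(lower, upper)
```

The multiplier 1.96 corresponds to a two-sided 95% interval under a normal distribution; a band built this way has the same width at every horizon.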
10. Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
➢ Time series analysis involves understanding various aspects of the inherent nature of the series so that
you are better informed to create meaningful and accurate forecasts.
➢ Any time series may be split into the following components: Base Level + Trend + Seasonality + Error.

ROSE WINE SPARKLING WINE

ROSE WINE

• Rose sales show a decreasing trend compared to the previous years.
• December shows the highest sales.
• The models are built and chosen based on the least RMSE score.
• The sales of Rose wine are seasonal and also show a trend. Therefore, the company cannot hold the same stock throughout the year.
• The company should use the prediction results to plan future stock.

SPARKLING WINE

• Sparkling sales show stabilized values.
• December shows the highest sales.
• The models are built and chosen based on the least RMSE score.
• The sales of Sparkling wine are seasonal and also show a trend. Therefore, the company cannot hold the same stock throughout the year.
• The company should use the prediction results to plan future stock.

END

No ratings yet
An Analysis of Outlier Detection Through Clustering Method
6 pages
Spurious Relationship
No ratings yet
Spurious Relationship
5 pages
Tableau - Project: Sandya VB
88% (8)
Tableau - Project: Sandya VB
19 pages
Abhay Singh
No ratings yet
Abhay Singh
2 pages
FRA Milestone1 - Maminulislam
100% (4)
FRA Milestone1 - Maminulislam
23 pages
Standard Deviation B Pharma
No ratings yet
Standard Deviation B Pharma
8 pages
EXTENDED PROJECT-Shoe - Sales
100% (6)
EXTENDED PROJECT-Shoe - Sales
28 pages
ITS62604 Tutorial 6 (Answer)
No ratings yet
ITS62604 Tutorial 6 (Answer)
2 pages
2007 How To Write A Systematic Review
No ratings yet
2007 How To Write A Systematic Review
7 pages
Predictive Model: Submitted by
100% (3)
Predictive Model: Submitted by
27 pages
Workplace Policies and Procedures
No ratings yet
Workplace Policies and Procedures
3 pages
Time Series Project
50% (4)
Time Series Project
2 pages
ML Ts Proj
100% (9)
ML Ts Proj
58 pages
FRA Business Report
100% (1)
FRA Business Report
21 pages
Vaibhav Kumar MRA Project Milestone 2
No ratings yet
Vaibhav Kumar MRA Project Milestone 2
18 pages
PM - ExtendedProject - Business Report
100% (4)
PM - ExtendedProject - Business Report
35 pages
Data Visualization Project Shreya
100% (2)
Data Visualization Project Shreya
27 pages
Linear - Regression - Assignment: Problem Statement
100% (3)
Linear - Regression - Assignment: Problem Statement
24 pages
SQL Prject
No ratings yet
SQL Prject
8 pages
Data Visualization in Tableau - Car Insurance Claim Project
50% (2)
Data Visualization in Tableau - Car Insurance Claim Project
51 pages
Project SQL
No ratings yet
Project SQL
2 pages
DVT Alternate Project
50% (2)
DVT Alternate Project
1 page
Business Report Machine Learning-1
100% (7)
Business Report Machine Learning-1
60 pages
Vaibhav Kumar MRA Project Milestone 1
100% (3)
Vaibhav Kumar MRA Project Milestone 1
29 pages
PROJECT - Time Series Forecasting by Akshay Kharote PDF
100% (2)
PROJECT - Time Series Forecasting by Akshay Kharote PDF
85 pages
Shivani Pandey TSF
100% (1)
Shivani Pandey TSF
32 pages
Advanced Statistics: Business Report Ranvijay Sharma
No ratings yet
Advanced Statistics: Business Report Ranvijay Sharma
16 pages
FRA Assignment
100% (1)
FRA Assignment
31 pages
Project Avinash Ray DVT Car Insurance
No ratings yet
Project Avinash Ray DVT Car Insurance
4 pages
TSF - Graded Quiz 4 - Great Lakes Institute
No ratings yet
TSF - Graded Quiz 4 - Great Lakes Institute
5 pages
Financial Risk Analysis Project Report Financial Risk Analysis Project Report
100% (2)
Financial Risk Analysis Project Report Financial Risk Analysis Project Report
29 pages
Capstone Proect Notes 2
100% (2)
Capstone Proect Notes 2
16 pages
Lifi
100% (1)
Lifi
16 pages
Mra Project: Prepared By: Deepak Batabyal Date:-09 Feb 2020
100% (2)
Mra Project: Prepared By: Deepak Batabyal Date:-09 Feb 2020
32 pages
DVT Group Assignment PDF
100% (1)
DVT Group Assignment PDF
14 pages
MRA Project Milestone 2
100% (2)
MRA Project Milestone 2
31 pages
PHD Thesis Proposal PDF Version
No ratings yet
PHD Thesis Proposal PDF Version
2 pages
MRA Project Milestone 1 PDF
No ratings yet
MRA Project Milestone 1 PDF
1 page
Predective Modellig Project
100% (1)
Predective Modellig Project
18 pages
Predictive Modelling Project 1 PDF
50% (2)
Predictive Modelling Project 1 PDF
38 pages
FRA Project Business Report
100% (2)
FRA Project Business Report
27 pages
Facebook Comment Volume Prediction
100% (1)
Facebook Comment Volume Prediction
12 pages
Week 7 Project Report 1 and 2
No ratings yet
Week 7 Project Report 1 and 2
10 pages
Executive Sumary - Rajarshi Das (Data Visualization Using Tableau Project)
100% (1)
Executive Sumary - Rajarshi Das (Data Visualization Using Tableau Project)
11 pages
Business Report Problem 2
No ratings yet
Business Report Problem 2
10 pages
Project 7 - DVT - Manoj
No ratings yet
Project 7 - DVT - Manoj
1 page
Mra Project
No ratings yet
Mra Project
12 pages
Capstone Project
100% (1)
Capstone Project
7 pages
Quality Circle
No ratings yet
Quality Circle
1 page
FRA Report
100% (1)
FRA Report
30 pages
MRA Project ML 1: Abhishek Kapoor Dsba Aug A20
100% (1)
MRA Project ML 1: Abhishek Kapoor Dsba Aug A20
47 pages
Harshini Week 8 Doc PDF
No ratings yet
Harshini Week 8 Doc PDF
10 pages
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
No ratings yet
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
18 pages
Boston Condo Sale Story
0% (1)
Boston Condo Sale Story
11 pages
Arnab Chowdhury As1
No ratings yet
Arnab Chowdhury As1
12 pages
Milestone 1
No ratings yet
Milestone 1
2 pages