Case Study of Sales Forecasting
2 Data
The main time series comprises the number of registrations of new automobiles in
Germany for every time period. Hence, the market sales are represented by the number of registrations of new automobiles, provided by the Federal Motor Transport
Authority.
The automobile market in Germany grew extraordinarily as a result of the reunification of the two German states in 1990. This can only be treated as a massive shock event, which caused all data prior to 1992 to be discarded. Therefore, we use the yearly,
monthly and quarterly registrations of the years 1992 to 2007. The sales figures of
these data are shown in Figures 1-3 and the seasonal pattern of these time series is
clearly recognizable in the last two figures.
Our choice of the exogenous parameters fits the reference model for the automobile
market given by Lewandowski [2]. In this model the following properties are considered:
a) Variables of the global (national) economy
b) Specific variables of the automobile market
c) Variables of the consumer behavior w.r.t. the changing economic cycle
d) Variables that characterize the influences of credit restrictions or other fiscal measures concerning the demand behavior in the automobile industry.
Based on this model, the following ten market-influencing factors, shown in Table 1, are chosen as exogenous parameters [6].
Table 1 shows that not all of the exogenous parameters used are published on a monthly, quarterly, and yearly basis. In cases in which the necessary values are not given directly, the following values are taken:
Yearly data analysis: The averages of the Unemployment and Interest Rate of each
year are used.
Quarterly data analysis: The average of the Unemployment and Interest Rate of
each quarter is used. For the parameters Consumer Price Index and Petrol Charge the
values of the first months of each quarter are taken.
Monthly data analysis: In the case of the quarterly published parameters, a linear interpolation between the values of two sequential quarters is used, as sketched below.
3 Methodology
3.1 Time Series
Time Series Model
In this contribution, an additive model is applied to mimic the time series. It decomposes every value $x_t$ into a trend component $m_t$, a seasonal component $s_t$, a calendar component $p_t$, and an error component $e_t$:

$$x_t = m_t + s_t + p_t + e_t.$$

Seasonal Component
For the estimation of the seasonal component there are many standard methods like
exponential smoothing [7], the ASA-II method [8], the Census X-11 method [9], or
the method of Box and Jenkins [10]. In this contribution, the Phase Average method
[11] is used because it is quite easy to interpret. To get accurate results with this
method, the time series must have a constant seasonal pattern over time and it has to
be trendless. A constant seasonal pattern is given in our time series. To guarantee the
trend freedom, a trend component is estimated univariately and subtracted before the
seasonal component is estimated. This univariate trend estimation is done by using a method which is close to the moving average method [12]. Because of the small given data set, the mean $m$ of a period is computed with a formula that deviates slightly from the standard method.
Although a univariate trend estimation would be easier to explain, this route is not
followed in this contribution because the assumption that the registrations of new
automobiles in Germany are not influenced by any other parameter is not justified.
Hence, the most important component, the trend, is estimated multivariately. The combination of a Multiple Linear Regression for linear and a Support Vector Machine for non-linear trend estimation has proven to provide suitable results in other industrial projects [16], [17]. However, this choice might be altered in future publications.
Calendar Component
The calendar component considers the number of working days within a single period. For the estimation of the calendar component $p_t$, the number of working days is set in relation to the total number of days of the period $t$; from this ratio the absolute values $p_t$ are derived.
Error Component
The error component is estimated with an Autoregressive Moving Average process of order two [8]. A condition for using this method is the stationarity of the error component. This condition is tested by the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS test) [18]. In the case of non-stationarity, the error component is set to zero.
3.2 Data Pre-processing
Time lag
In reality, external influencing factors do not always have a direct effect on a time
series, but rather this influence is delayed. The method used to assign the time lag is
based on a correlation analysis.
Time lag estimation
If the value $y_t$ of a time series $Y$ has its influence on the time series $X$ in $t+s$, the time lag of the time series $Y$ is given by the value $s$. Then the correlation between the main time series $X$ with its values $x_1, \dots, x_T$ and all of the $k$ secondary time series $Y^i$, $i = 1, \dots, k$, with their values $y^i_1, \dots, y^i_T$, is computed. Afterwards each secondary time series is shifted by one time unit, i.e. the value $y^i_t$ becomes the value $y^i_{t+1}$, and the correlation between the time series $x_2, \dots, x_T$ and $y^i_1, \dots, y^i_{T-1}$ is computed. This shifting is repeated up to a pre-defined limit. The number of shifts at which the correlation between the main and a secondary time series is highest is the value of the time lag of this secondary time series.
Smoothing the exogenous parameters by using the time lag
It is assumed that every value $y_t$ of an exogenous parameter $Y$ is influenced by its past values. The time lag indicates how many past data points influence the current value. This results in the following method: let $s$ be the time lag of $Y = y_1, \dots, y_T$; then the current value $y_t$ is replaced by the weighted sum

$$\tilde{y}_t = \begin{cases} y_t, & t = 1, \dots, s, \\ \sum_{j=1}^{s} \lambda\,(1-\lambda)^{j-1}\, y_{t-j}, & t = s+1, \dots, T, \end{cases}$$

where $\lambda \in (0,1)$ is the weighting factor.
Normalisation
To achieve comparability between factors which are not weighted in the same way,
these factors have to be normalized to a similar range. With that step numerical errors
can be tremendously reduced. As normalization method, the z-Transformation is
applied. It shifts the mean value to zero and the standard deviation to one: let $v_t$ be any factor at a particular time $t$, $t \le T$; then the z-Transformation is calculated by

$$v_{t,\mathrm{normalized}} = \frac{v_t - \bar{v}}{\sigma(v)},$$

where $\bar{v}$ is the mean and $\sigma(v)$ is the standard deviation of the factor $v$.
Dimension Reduction
The Wrapper approach in combination with the two multivariate regression methods - the Multiple Linear Regression and the Support Vector Machine - is chosen for dimension reduction. Compared with other methods, this method provides more explicable results even for small data sets. Additionally, forecasts with the Principal Component Analysis (PCA) are calculated as a reference model for our results. The PCA results are not easily explicable, as the PCA-transformed parameters cannot be traced back to the original ones. Therefore, the results have not been considered for the final solution.
4 Evaluation Workflow
The data, i.e. the main time series and the exogenous parameters, are divided into
training and test data. The methods introduced in Chapter 3 are used to generate a
model on the training data, which is evaluated by applying it to the test data. The
complete evaluation workflow is shown in Figure 4.
Step 1: Data Integration: The bundling of all input information to one data source is
the first step in the workflow. Thereby, the yearly, quarterly or monthly data ranges
from 1992 to 2007. The initial data is assumed to have the following form:
Main Time Series: $x_t = m_t + s_t + p_t + e_t$, $t = 1, \dots, T$
Secondary Time Series: $y^i_t$, $t = 1, \dots, T$ and $i = 1, \dots, k$
Step 2: Data Pre-processing: Before the actual analysis, an internal data pre-processing is performed, wherein special effects contaminating the main time series are eliminated. For example, the increase of the German sales tax in 2007 from 16% to 19% led to an expert-estimated sales increase of approximately 100,000 automobiles in 2006. Hence, this number was subtracted in 2006 and added in 2007. Furthermore, the exogenous parameters were normalized by the z-Transformation.

Fig. 4. Evaluation Workflow: First, the data is collected and bundled. After a data pre-processing, it is split into a training and a test set. The model is built on the training set and the training error is calculated. Then the model is applied to the test data. Thereby, the new registrations for the test time period are predicted and compared with the real values, and based on this the test error is calculated.

The normalized data are passed on to an external data pre-processing procedure.
The method used is the Wrapper approach with an exhaustive search. Since we use a T-fold cross-validation (leave-one-out) to select the best feature set, it should be noted that we implicitly assume independence of the parameters. As regression method for the feature evaluation, a Linear Regression is applied in the case of linear trend estimation and a Support Vector Machine in the case of non-linear trend estimation.
The elimination of the special effects in monthly data is not applicable because the
monthly influences can be disregarded.
Step 3: Seasonal Component: The estimation of the seasonal component is done by the Phase Average method. In this contribution, the seasonal component is estimated before the trend component. The reason is that the Support Vector Machine would otherwise absorb parts of the seasonal pattern into the trend model.

The estimated values for the training period are given by the sum of the values of the trend, seasonal, and calendar component. In the case of a stationary error component, the values estimated by the ARMA model are added. The mean ratio between the absolute training errors and the original values gives the Mean Absolute Percentage Error (MAPE).
Let $x_i$, $i = 1, \dots, T$, be the original time series after the elimination of special effects and $z_i$, $i = 1, \dots, T$, the estimated values. Then the error functions considered are represented by the following formulas:

Mean Absolute Error

$$E_{\mathrm{MAE}} = \frac{1}{T} \sum_{i=1}^{T} \lvert x_i - z_i \rvert$$

Mean Absolute Percentage Error

$$E_{\mathrm{MAPE}} = \frac{1}{T} \sum_{i=1}^{T} \frac{\lvert x_i - z_i \rvert}{x_i}$$
Step 8: Forecast: The predictions for the test time period are obtained by summing up the corresponding seasonal component, the trend component (based on the exogenous parameters of the new time period and the respective multivariate regression method), and the calendar component. In the case of a stationary error component, the values predicted by the ARMA process are added, too.
Step 9: Test error: The differences between the predictions and the original values of the test set lead to the test errors. Their computation conforms exactly to the computation of the training errors.
5 Results

The results for the yearly, monthly, and quarterly data which generate the smallest errors using the PCA-transformed exogenous parameters are shown in Table 2. A training period of 14 years is used. In all cases the SVM gives much better results than the MLR [22]. To optimize the parameters of the SVM, the Grid Search algorithm is applied.
5.1 Yearly Model

In contrast, the results for the non-linear models shown in Table 3 are of better quality than those of the PCA analysis. That originates from the saturation effect generated by the regression in conjunction with the Support Vector Machine. It leads to the fact that data points far off can still be reasonably predicted. Another advantage (and consequence) of this approach is the fact that a parameter reduction does not severely lower the quality of the predictions down to a threshold value of five parameters. Models with such a low number of parameters offer the chance to easily explain the predictions, which is appealing to us.
A general problem, however, is the very limited amount of information, which leads to a prediction of, again, limited use (only annual predictions, no details for short-term planning). Therefore, an obvious next step is to test the model with the best statistics available, i.e. with monthly data.
5.2 Monthly Model
In this case, the Feature Selection resulted in the following: for the linear trend model, only the parameters Model Policy and Latent Replacement Demand were relevant, while in the non-linear model, new car registrations were significantly influenced by the Gross Domestic Product, Disposable Personal Income, Interest Rate, Model Policy, Latent Replacement Demand, Private Consumption, and Industrial Investment Demand, i.e. a superset of the parameters of the linear case.
The results given in Table 4 are again first compared to the PCA analysis, cf.
Table 2. As for the yearly data, the non-linear models are superior to both the results for the PCA analysis and for the linear model. The most accurate predictions are achieved with the non-linear model using all parameters. However, the deviations are deemed too high and are therefore unacceptable for accurate predictions in practice.
One reason for this originates from the fact that most parameters are not collected
and given monthly, but need to be estimated from their quarterly values. Additionally, the time lag of the parameters can only be roughly estimated and is assumed to
be a constant value for reasons of feasibility.
5.3 Quarterly Model

The results for the training and test errors for the models based on quarterly data are given in Table 5. Again, the linear model is inferior compared to the PCA (cf. Table 2) and compared to the non-linear model. The difference between training and test errors for the linear model with all parameters is still severe. Furthermore, the total error of the linear model with reduced parameters might look small. However, a closer look reveals that this originates only from error cancellation [22]. Altogether, this indicates that the training set is again too small to successfully apply this model for practical use.
The results for the non-linear model, in turn, are very satisfying. They are better than the results from the PCA analysis and provide the best absolute test errors of all investigated models. This also indicates that all parameters contribute meaningfully to this kind of macro-economic problem. In a previous work, we have shown that a reduction to the six most relevant parameters would more than double the test error [22].
5.4 Summary
It can be clearly stated that the SVM provides superior predictions (smaller test errors) compared to the MLR. This illustrates that the mutual influence of the parameters is essential to achieve accurate forecasts. In order to identify the overall best models, the errors of the training and test sets are accumulated to annual values. The results for the best models based on yearly, monthly, and quarterly data are visualized in Figure 5.
Fig. 5. Graphical illustration of the absolute errors of the best models for a 15-year training period, cumulated to years: on yearly and monthly data the non-linear model with reduced parameters, on quarterly data the non-linear model with all parameters.
During the training period, the best model for the monthly data is significantly worse than both other models. The same holds for the test period. Here, the best quarterly model is significantly superior to the best yearly model, with roughly half of the test error. It can be observed that the quarterly model not only delivers a smaller test error, but at the same time provides a higher information content than the yearly model. Both models generate very low errors during the training period, showing again that the set of parameters is well adapted to our problem. The only drawback of the best quarterly model is the fact that all exogenous parameters are necessary, making the model less explicable.
6 Discussion and Conclusion
Based on the results of Chapter 5, the three questions mentioned at the beginning of
this contribution can now be answered.
1. Is it possible to create a model which is easy to interpret and which at the same time provides reliable forecasts?
To answer this question, a differentiated view must be taken. Considering only the additive model as a whole, the answer is yes, because the additive model has given better results than the Principal Component Analysis.

Looking at the different methods used in our model, answering the question becomes more difficult. Simple and easily explicable univariate estimations are used for the seasonal, calendar, and error components, but a more complex multivariate method is used for the largest and most important component, the trend. Thereby, the results given by the more easily explicable Multiple Linear Regression are less favorable than the results given by the less explicable Support Vector Machine. But in general, the chosen model is relatively simple and gives satisfying results in consideration of the quality of the forecast.
2. Which exogenous parameters influence the sales market of the German automobile industry?
Here, it has to be differentiated between the yearly, monthly, and quarterly data. In the yearly model, only a few exogenous parameters are needed to get satisfying results, but it is not possible to generalize these results because of the very small data set. In the monthly model, fewer exogenous parameters can also be used. However, most of the exogenous parameters are not published monthly, so that the exact values of these parameters are not given, leading to inadequate results. In the quarterly model, where the highest number of exogenous parameters is explicitly given, a reduction of the exogenous parameters is not possible in our tested model without decreasing the quality of the results.
3. Which collection of data points, yearly, monthly, or quarterly data, is the most suitable one?
Yearly, monthly, and quarterly data were regarded. The problems of the yearly model are the very small data set and the small information content of the forecast. The problems presented by the monthly model include training and test errors which are much higher than in the yearly and quarterly models. A probable cause for the weakness of the monthly model is the inexact nature of the monthly data, since most of the exogenous parameters are not collected monthly. The problems of the yearly model as well as the problems of the monthly model can be solved by using the quarterly model. Therefore, the quarterly model is the superior method, even though no reduction of exogenous parameters is possible in this model.
To conclude, it should be pointed out that forecasts are always plagued by uncertainty. There can be occurrences (special effects) in the future which are not predictable or whose effects cannot be assessed. The current financial crisis, which led to lower sales in the year 2008, is an example of such an occurrence. Because of this, forecasts can only be considered an auxiliary means for corporate management and have to be interpreted with care [23].