SAP PA Automated TimeSeriesTutorial
The data used in this guide is publicly available so that the reader
can follow hands-on and practice by carrying out the same
forecasts.
December 2015
Andreas Forster
Predictive Presales Expert
SAP Switzerland
andreas.forster@sap.com
www.sap.com
TABLE OF CONTENTS
INTRODUCTION
HANDS-ON IMPLEMENTATION
Background
Pre-Requisites
Initial Forecast
CONCEPTS BEHIND THE SCENE
EXTENDED FORECAST WITH ADDITIONAL PREDICTORS
HINTS AND TIPS / MORE INFORMATION
DATA DESCRIPTION
Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Time Series Forecasting
INTRODUCTION
You may know, or guess, that the “Automated Analytics” in SAP Predictive Analytics is all about automating
the process of creating predictive models. This tutorial gives some hands-on introduction and practice with
time series forecasting, which is part of the “Automated Analytics” functionality. We start with a simple
example and build on it with a more complex scenario.
The time series we are using consists of the daily numbers from the London bicycle hire scheme. We use historic
rental numbers to forecast future rental numbers. Think of it as a demand forecast.
In this tutorial we will be looking at only one time series, the total number of bikes rented per day. SAP
Predictive Analytics can also automatically forecast multiple time series, e.g. rentals by location. That concept
is described in another tutorial1, which you may want to read after having gone through this document.
“Thank you”s go to Ben Lee-Rodgers for sharing detailed recordings from his weather station in London and
Antoine Carme for his expertise on time series forecasting.
1Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Multiple Time Series
https://scn.sap.com/docs/DOC-68223
HANDS-ON IMPLEMENTATION
Background
The city of London, United Kingdom, provides a bicycle hire scheme. There are over 700 locations spread
around town where bikes can be rented out and returned. More than 10,000 bicycles are available. The
Greater London Authority is sharing daily statistics on the number of bikes rented out. We will use this data,
ranging from January 2011 to September 2015, to forecast future rental numbers. Then in a second step we
enrich this data with additional information, such as weather data, to produce a more accurate forecast.
Please see the separate chapter “DATA DESCRIPTION” for more information on the data.
Pre-Requisites
You need to have an installation of SAP Predictive Analytics, which includes the time series forecasting used
in this tutorial. This guide has been written with SAP Predictive Analytics 2.3. Evaluation copies are currently
available on the SAP Community Network.2
The data used in this guide is available as download on the SAP Community Network (SCN).3
Initial Forecast
Start by opening up SAP Predictive Analytics.
2SCN, http://scn.sap.com/community/predictive-analytics
3Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Time Series Analysis,
http://scn.sap.com/docs/DOC-69324
Go into “Create a Time Series Analysis”. First you need to specify the data source. In our example we work
with a flat file.
Ensure the “Data Type” drop-down is set to “Text Files”. Then click the first “Browse” button on the right-hand
side and select the folder you saved the files into. Finally, click the second “Browse” button and point the “Data
Set” to the file LondonBikeHire.csv.
In the “Data Description” window click “Analyze” so that SAP Predictive Analytics analyses the file’s data
structure.
You see the columns “Day” and “Hires” from the file. A third column “KxIndex” has been added by the tool for
internal processing.
It is crucial that the Storage type for the “Day” variable has been identified as “date”. This is the case for our
dataset, so all is fine. Should you want to try out other datasets and the variable has not been identified as
“date”, then see the chapter “HINTS AND TIPS / MORE INFORMATION” to specify your data’s date format.
To see the historic data click the “View Data” icon. The first 100 rows are displayed. Each row shows a day
with the number of rentals.
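If you would like to inspect the file outside the tool as well, the following is a minimal sketch using pandas. It assumes a comma-separated file with the columns “Day” and “Hires” and an ISO date format; the actual separator and date format of LondonBikeHire.csv may differ. The three sample rows are made up for illustration only.

```python
import io

import pandas as pd

# Made-up sample rows standing in for LondonBikeHire.csv (illustrative values only).
sample_csv = io.StringIO(
    "Day,Hires\n"
    "2015-09-18,30000\n"
    "2015-09-19,25000\n"
    "2015-09-20,24000\n"
)

# For the real file, pass its path instead of sample_csv.
# The separator and date format are assumptions and may need adjusting.
hires = pd.read_csv(sample_csv, parse_dates=["Day"])

print(hires.dtypes)      # "Day" should appear as a datetime column
print(hires.head(100))   # comparable to the first 100 rows shown by "View Data"
```

The check on the column types mirrors the requirement above that “Day” is recognised as a date.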
Back in the “Data Description” window you have to change the “Order” of the “Day” column to 1. The data
has to be ordered by date in ascending order (so the most recent dates are at the bottom). This flag indicates
that the data has indeed been sorted appropriately.
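Should you prepare your own dataset, you can verify the sort order before loading it. This is a small sketch with pandas and made-up dates; the variable names are purely illustrative.

```python
import pandas as pd

# Made-up dates, oldest first, so the most recent date is at the bottom.
days = pd.Series(pd.to_datetime(["2015-09-18", "2015-09-19", "2015-09-20"]))
print(days.is_monotonic_increasing)

# The same dates in the wrong order fail the check.
shuffled = days.iloc[[2, 0, 1]]
print(shuffled.is_monotonic_increasing)

# Sorting restores the order the forecast expects.
restored = shuffled.sort_values().reset_index(drop=True)
print(restored.is_monotonic_increasing)
```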
Click “Next”.
No changes should be needed in the “Selecting Variables” screen. The “Day” variable has been
automatically entered as “Time” indicator and the “Hires” variable has been selected automatically as
“Target”.
You can see on the bottom left that the last training date is September 20 2015. This is the last date in our
dataset.
Click “Next”.
Set the “Number of Forecasts” to 10, so that you forecast until the end of September 2015.
Click “Generate” and SAP Predictive Analytics analyses the time series and forecasts the desired 10 days.
Scroll down and you see the “Horizon-wide MAPE” of 0.197. MAPE is a common term in time series
forecasting and stands for Mean Absolute Percentage Error. The MAPE is calculated as follows:
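A common definition of MAPE, assumed here for illustration and not necessarily identical in every detail to the tool's horizon-wide calculation, is the mean over all dates of |actual - forecast| / |actual|. The sketch below implements this textbook definition with made-up numbers.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error (textbook definition, returned as a fraction)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("actuals and forecasts must be non-empty and of equal length")
    # Average the absolute error relative to each actual value.
    return sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)


# Made-up values: errors of 10%, 10% and 0%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mape(actual, forecast))
```

Note that this definition is undefined when an actual value is zero, which is not an issue for daily rental totals.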
Obviously we want to reduce this error as much as possible. We will address this in the next chapter
“Extended Forecast with Additional Predictors”.
Also note the “Model Components”. The model found a linear trend in the data; we will see this trend later
in the chart as well. Similarly, the model found two cycles in the data. These are patterns that repeat over time. The
cycles “dayOfYear” and “dayOfWeek” indicate that both yearly and weekly cycles were found. We will also
see these later on in more detail.
The green line shows the actual values as provided by the city of London. The blue line shows the forecast
produced by SAP Predictive Analytics. Overall there is a strong yearly pattern. Not surprisingly, rental
numbers are much higher in summer than in winter. The red line rising from left to right indicates an upward
trend: rental numbers are clearly increasing over time.
Values marked with a red rectangle indicate dates on which the forecast was significantly different from the
actual value. Accordingly, these outliers increase the model’s MAPE. With a richer dataset, i.e. additional
predictor columns that describe the weather for instance, we can hope to better capture the data’s pattern. We
will see this in the next chapter.
To get a closer look at the forecast you can zoom into the data by drawing a rectangle with the mouse over
the area of interest. The following screenshot shows the most recent data with the forecasted values. Just
change the “Display Time Unit” to “Week(s)”. You can clearly see the weekly pattern that was identified
earlier. Rental numbers are highest during the middle of the working week and lowest on the weekend.
The area shaded in blue on the right-hand side around the forecasts of future values shows the confidence
interval of the prediction (twice the standard deviation on either side). Simply put, the narrower this range,
the more confident we are in the forecast.
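To make the idea of such a band concrete, here is a minimal sketch. It assumes the standard deviation is estimated from the errors of past forecasts, which is an assumption for illustration; the text does not state how the tool estimates it. All numbers are made up.

```python
import statistics

# Made-up historic actuals and the values a model fitted for the same dates.
actuals = [100.0, 120.0, 90.0, 110.0]
fitted = [104.0, 115.0, 94.0, 108.0]

# Past errors and their sample standard deviation (assumed estimation method).
residuals = [a - f for a, f in zip(actuals, fitted)]
sigma = statistics.stdev(residuals)

# Band of twice the standard deviation on either side of a made-up forecast.
forecast = 105.0
lower, upper = forecast - 2 * sigma, forecast + 2 * sigma
print(lower, forecast, upper)
```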
To save the forecast, click “Previous”, then “Save/Export” and “Apply Model”.
You can keep most of the default settings. Just specify the file name to write the forecasts into:
LondonBikeHireForecast_1.csv.
By clicking on the familiar “View Output” icon you see a preview of the forecasts. The most important column
is “kts_1”, which contains the day’s forecast. The remaining columns describe various details of the model
and forecast, which we do not need to worry about now.
The dataset was rather basic, though, in that it consisted only of the date and that day’s rental count. In the next
chapter we improve the forecast by enriching the dataset with additional variables. Any information about the
individual day that can influence the rental numbers can be helpful, such as temperature or an indicator for
bank holidays.
CONCEPTS BEHIND THE SCENE
The model might include some or all of the above elements. Any delta that is not explained by the model is
called a residual. The aim is obviously to explain as much as possible of the signal. So the smaller the
remaining residual the better the model.
The elements trend, periodic and fluctuations are further explained below.
Trend
A trend describes the long-term evolution of the data. Altogether, 7 different trend models, both deterministic
and stochastic (using probability distributions), are estimated.
Periodic
Next, periodic components are investigated. These represent either cycles or seasons.
- Cycles describe fixed periods, e.g. a week or a year. Cycles are also evaluated for extra-predictable
variables of type ordinal or continuous (not for nominal).
- Seasonal functions describe calendar events, such as “day of month”, “week of month”, “month of
year”, “day of week”, ….
When investigating these periods, the previously calculated trends are also taken into account. Subtracting
an individual trend from the signal results in a time series that does not have a long-term evolution anymore.
Hence periodic elements become apparent.
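The calendar-based seasonal functions listed above can be pictured as simple columns derived from the date. The following sketch derives four of them with pandas; the column names and the “week of month” formula are illustrative assumptions, not the tool's internal definitions.

```python
import pandas as pd

days = pd.Series(pd.to_datetime(["2015-09-01", "2015-09-20", "2015-09-30"]))

features = pd.DataFrame({
    "day_of_week": days.dt.dayofweek,            # Monday = 0 ... Sunday = 6
    "day_of_month": days.dt.day,
    "week_of_month": (days.dt.day - 1) // 7 + 1, # assumed definition: days 1-7 are week 1
    "month_of_year": days.dt.month,
})
print(features)
```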
Fluctuation
Deducting trend and periodic elements from the signal might still leave a certain pattern in the data. Such
fluctuations are caught with autoregressive elements.
Residual
Deducting trend, periodic elements and fluctuations from the signal leaves the remaining inexplicable element
called the residual.
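The decomposition described above can be sketched on synthetic data. This is a deliberately simplified illustration, assuming a single straight-line trend and a weekly cycle estimated from weekday averages; it omits the autoregressive fluctuation step and is not the algorithm used by SAP Predictive Analytics, which evaluates several candidate models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 140                        # 20 full weeks of synthetic data
t = np.arange(n_days)
weekly_shape = np.array([5.0, 8.0, 10.0, 8.0, 5.0, -15.0, -21.0])
signal = 100.0 + 0.5 * t + weekly_shape[t % 7] + rng.normal(0.0, 1.0, n_days)

# 1) Trend: least-squares straight line through the signal.
slope, intercept = np.polyfit(t, signal, 1)
trend = intercept + slope * t

# 2) Periodic: average of the detrended signal for each position in the week.
detrended = signal - trend
cycle_means = np.array([detrended[t % 7 == d].mean() for d in range(7)])
cycle = cycle_means[t % 7]

# 3) Residual: whatever trend and cycle do not explain.
residual = signal - trend - cycle

print("estimated slope:", slope)
print("spread before removing the cycle:", detrended.std())
print("spread of the residual:", residual.std())
```

Removing the trend first is what makes the weekly pattern visible in the weekday averages, and the residual spread shrinks once the cycle is subtracted.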
Once the final model has been selected, it is applied on the historic data to calculate its accuracy, which is
measured as Mean Absolute Percentage Error (MAPE).
A MAPE of 0.12 for instance indicates that the mean absolute percentage error is 12%. So on average, 88%
of the signal was explained by the model.
EXTENDED FORECAST WITH ADDITIONAL PREDICTORS
It is very important that the values of these additional variables are present in the dataset for the dates we want
to forecast. We will see this in a few clicks.
Most steps for forecasting with the richer dataset are identical to those for the simpler dataset. Go back to
the main screen of SAP Predictive Analytics and, in the Modeler section, click “Create a Time Series
Analysis”.
Click “Next”. Then click on “Analyze”. You see all columns of the richer dataset. It is good practice to get in
the habit of checking that the time variable has been identified with storage “date”. Also remember to set the
“Order” for the “Day” variable to 1.
Close this window and click “Next”. You may see the following warning.
This message means that you need to fine-tune the “Data Description” that was analyzed automatically. On
the warning window, click the “Log” tab and scroll to the top of the log.
We will do exactly this. Close that window and click “Previous” to get back to the “Data Description”. Find the
“hmean” column (Index 40) and change the storage from “integer” to “number”.
If you like you can save this modified description as a file for later reuse. When done, click “Next” to continue.
This time no warning should appear. The modification of the data description was successful.
You should just need to change the “Target” variable. Remove the existing variable by clicking on the icon to
the left of it. Then select the “Hires” variable on the left and select it as “Target”.
At the bottom left you see a new option that is only available when you have additional predictor variables.
Click on “Select Date…” and you see the last record used for training the model. This is the last row that has
a value in the target variable “Hires”, 20th September 2015.
You also see that the dataset contains additional rows for future dates beyond the last training date. This is very
important when using additional predictor variables. Each date you want to forecast must be added to the
dataset with values entered for these predictor variables.
You also see that the “Maximum Forecast” is 10. This means you can forecast 10 days into the future. With
our dataset 10 is the maximum, as we have 10 future dates in the dataset, from 21st September to 30th
September 2015. Set the “Number of Forecasts” to 10.
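The relationship between the last training date and the maximum number of forecasts can be checked on your own data with a few lines of pandas. This is a sketch on a made-up four-row extract with a single predictor; it assumes that empty “Hires” cells mark the future dates and simply counts the future rows whose predictors are filled.

```python
import io

import pandas as pd

# Made-up extract: two historic rows and two future rows with an empty target.
sample_csv = io.StringIO(
    "Day,Hires,tmean\n"
    "2015-09-19,25000,14.0\n"
    "2015-09-20,24000,15.0\n"
    "2015-09-21,,13.5\n"
    "2015-09-22,,12.0\n"
)
data = pd.read_csv(sample_csv, parse_dates=["Day"])

# Last row that still has a value in the target variable.
last_training_date = data.loc[data["Hires"].notna(), "Day"].max()

# Future rows count towards the forecast horizon only if all predictors are filled.
future = data[data["Day"] > last_training_date]
predictors = [c for c in data.columns if c not in ("Day", "Hires")]
max_forecast = int(future[predictors].notna().all(axis=1).sum())

print(last_training_date.date(), max_forecast)
```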
The model generation will take longer because more complex models are taken into account. When
complete, scroll down to see the MAPE.
The additional predictor variables have pushed the MAPE down from 0.197 to 0.13. So the model is
considerably more accurate than before.
Interestingly, no cycles are used. The new variables describe the data’s pattern better than the weekly or yearly
cycles that were used in the earlier model!
Now look at the forecast by clicking “Next” and “View Forecasts”. Zoom into the most recent data.
You can compare this display with the earlier forecast. This forecast using the additional predictors clearly
describes the data even better. Overall the forecast is very close to the actual values. Fewer outliers than
before remain. It turns out that the two outliers with larger rental numbers are dates on which the London
Underground was on strike.
Click “Previous” to get back to the main screen. Now we want to understand how the additional predictors
impact the model. Click into “Regressions: Contributions by Variables”.
The predictors that were selected for the model are displayed in descending importance. The most important
variable “Holiday” is separating working days (Monday to Friday) from non-working days (Saturday, Sunday,
bank holiday). The most important weather variable is “tmean”, the mean temperature.
In order to understand how these variables relate to the rental numbers click on “Previous” and go into
“Statistical Reports”.
Here you find very detailed information on the data and the model. Go into the “Cross Statistics for
Continuous Target(s)” and set the “Variable” to “WorkingDay”.
This shows for instance the large difference between mean rental numbers on a working day (24,796.8) and
a non-working day (19,190.3).
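The same kind of cross statistic can be reproduced with a group-by. The sketch below uses made-up rental numbers and assumes “WorkingDay” is coded as 1 for working days and 0 otherwise; the coding in the actual file may differ.

```python
import pandas as pd

# Made-up rows; WorkingDay = 1 for working days, 0 otherwise (assumed coding).
data = pd.DataFrame({
    "WorkingDay": [1, 1, 1, 0, 0],
    "Hires": [26000, 24000, 25000, 20000, 18000],
})

# Mean of the target per category, analogous to the "Target Mean" in the report.
target_mean = data.groupby("WorkingDay")["Hires"].mean()
print(target_mean)
```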
Similarly you can look at different variables, e.g. “tmean”, the mean temperature. SAP Predictive Analytics has
split the temperature into 20 ranges. Such ranges help to produce more robust models. By comparing the
“Target Mean” of these ranges you see that on the warmest days twice as many bikes are rented as on the
coldest days.
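To get a feel for this kind of binning, the following sketch splits a synthetic temperature column into 20 ranges and compares the target mean per range. It assumes equal-frequency (quantile) ranges, which is only one possible scheme; how the tool chooses its ranges is not described here. The data and the linear relationship are made up.

```python
import numpy as np
import pandas as pd

# Synthetic data: 200 days with rentals rising linearly with temperature.
temps = np.linspace(0.0, 25.0, 200)
data = pd.DataFrame({"tmean": temps, "Hires": 10000.0 + 600.0 * temps})

# Split the temperature into 20 equal-frequency ranges (assumed binning scheme).
data["tmean_range"] = pd.qcut(data["tmean"], 20)

# Mean of the target within each temperature range.
target_mean = data.groupby("tmean_range", observed=True)["Hires"].mean()
print(target_mean)
```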
Feel free to look at further details on this screen. To help understand the information you can click into the
“Help” menu, which automatically displays an explanation of the screen that is currently open.
Save the forecasted values as before. Go into the “Using the Model” screen.
You have completed a comprehensive time series forecast! With that background you can now experiment
with your own data. Just see the next chapter for some further hints and tips.
You can also try to enhance the bike rental forecast with additional columns. Some ideas to improve the
forecast are:
- Derive new variables from the given datasets. Maybe a day’s change in temperature has an impact
(tmax – tmin).
- Combine multiple columns through composite variables. Maybe the temperature, for
instance, has a different impact on working days. This tutorial briefly touches on composite
variables.4
- Try to find completely new columns that have an impact.
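The first two ideas can be prepared before loading the file into the tool. This is a sketch with made-up values; the new column names “trange” and “tmean_workingday” are hypothetical, and multiplying two columns is just one simple way to combine them, not the tool's composite-variable feature.

```python
import pandas as pd

# Made-up rows using column names mentioned in this tutorial.
data = pd.DataFrame({
    "tmax": [18.0, 22.5],
    "tmin": [9.0, 14.5],
    "tmean": [13.0, 18.0],
    "WorkingDay": [1, 0],
})

# Daily temperature range (hypothetical column name).
data["trange"] = data["tmax"] - data["tmin"]

# Simple interaction: mean temperature on working days, 0 otherwise (hypothetical column).
data["tmean_workingday"] = data["tmean"] * data["WorkingDay"]

print(data)
```

Remember that any derived column also needs values for the future dates you want to forecast.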
Please let me know if you manage to push the MAPE below 0.12!
Andreas Forster
Predictive Presales Expert
SAP Switzerland
andreas.forster@sap.com
HINTS AND TIPS / MORE INFORMATION
When using additional predictor variables, you must have the dates you want to forecast in the training
dataset with the corresponding values of the various predictors. Only the target variable must be
empty for these future dates.
For further information see the help file “Time Series Scenarios” on http://help.sap.com/pa
5Hands-On Tutorial SAP Predictive Analytics, Automated Mode: Multiple Time Series
https://scn.sap.com/docs/DOC-68223
DATA DESCRIPTION
The historic rental numbers are shared by “Transport for London”6 under an “Open Government Licence”.7
Ben Lee-Rodgers, who is operating a private weather station in London, kindly contributed the weather
statistics.8
Date-related variables (e.g. “WorkingDay”) were produced with a Custom R Component in SAP Predictive
Analytics, Expert Mode.9
LondonBikeHire.csv
Column Description
LondonBikeHire_Extended.csv
Column Description
3 SundayMonthInd Indicates if the date is a Sunday with the weekday’s occurrence count in the month so far. 0 otherwise.
4 MondayMonthInd Indicates if the date is a Monday with the weekday’s occurrence count in the month so far. 0 otherwise.
5 TuesdayMonthInd Indicates if the date is a Tuesday with the weekday’s occurrence count in the month so far. 0 otherwise.
7 ThursdayMonthInd Indicates if the date is a Thursday with the weekday’s occurrence count in the month so far. 0 otherwise.
8 FridayMonthInd Indicates if the date is a Friday with the weekday’s occurrence count in the month so far. 0 otherwise.
9 SaturdayMonthInd Indicates if the date is a Saturday with the weekday’s occurrence count in the month so far. 0 otherwise.
29 MonthWorkingDayInd Indicates if working day with the work day’s occurrence count in the month so far. 0 otherwise.
30 ReverseMonthWorkingDayInd Indicates if working day by counting down the work day’s occurrence count in the month. 0 otherwise.
31 Last5WDinMonthInd Indicates the month’s last 5 working days by counting them up from 1 to 5. 0 otherwise.
33 Last4WDinMonthInd Indicates the month’s last 4 working days by counting them up from 1 to 4. 0 otherwise.
48 rain Rainfall.