Embedded R Code in SQL Script Running On Sap Hana
Embedded R Code in SQL Script Running On Sap Hana
Embedded R Code in SQL Script Running On Sap Hana
Author(s):
Thorsten Fg
Topic:
SAP HANA
Published:
December 2012
Overview
This tutorial describes how to embed a simple R code in SQL Script running on SAP HANA. The tutorial is divided into
two parts. In the first part, you will do some simple predictive analyses with R. In the second part, you will transfer the
first R native code from the tutorial CONNECT R WITH HANA TO SAVE & READ SOME DATA in embedded R Script
running directly on SAP HANA. By the end of this tutorial, you will have developed two simple R codes embedded in SQL
Script running on SAP HANA.
Prerequisites
o You have access to an SAP HANA system.
o You have SAP HANA Studio installed on your machine.
o You have SAP HANA Clients installed on your machine.
o You have an R Server available.
o You have configured the IP address the of R server in the SAP HANA configuration.
Topics
In this tutorial you will learn how to:
1. Change the R server IP in the SAP HANA configuration
2. Embed some R code in SQL Script
The estimated completion time for this tutorial is 40 minutes.
Using a simple R example, you will learn the basic steps about how to embed R code in SQL Script and run it on SAP
HANA.
After that, open the Internet browser on your Linux VM and visit the following Web site to get your current IP address:
http://myip.dnsdynamic.org/
Keep the IP in mind and minimize the VM. -> Dont close it!!!
After that, start SAP HANA Studio. At this point you, need system rights CATALOG READ and INIFILE ADMIN or you must
be logged in as SYSTEM user.
Double-click on the System displayed in the SAP HANA STUDIO navigator pane. After that, the administration editor will
open. Switch to the configuration tab and choose Indexserver.ini -> Calcengine -> Cer_reserve_addresses. Here,
enter your IP separated from the port 30120 with a colon.
Open a SQL Editor and copy the following code to fill the new table with data:
Please note: Change <YOUR SCHEMA> to your schema name!
SQL code
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110101',4195);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110201',4245);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110301',4971);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110401',4469);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110501',4257);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110601',4973);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110701',4470);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110801',4981);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110901',4530);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111001',4167);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111101',4059);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111201',1483);
This table is supposed to hold the tickets sales for a given company during each month of the year 2011. What we want
to do here is to determine or predict how our sales will be in 2012. Were going to use R to do this.
Create the following script and call it Predict_Tickets. This script will have two Stored Procedures, called
Prediction_Model and Get_Tickets.
Predict_Tickets
CREATE TYPE T_PREDICTION_TICKETS AS TABLE (PERIOD VARCHAR(8), TICKETS INTEGER);
DROP PROCEDURE Prediction_Model;
DROP PROCEDURE Get_Tickets;
CREATE PROCEDURE Prediction_Model(IN tickets_year TICKETS_BY_YEAR,OUT result
T_PREDICTION_TICKETS)
LANGUAGE RLANG AS
BEGIN
period=as.integer(tickets_year$PERIOD)
tickets=as.integer(tickets_year$TICKETS)
var_year=as.integer(substr(period[1],1,4))
Generate the new date and put them into a dataframe
var_year=var_year+1
object.
new_period=gsub("^\\d{4}",var_year,period)
next_year=data.frame(year=new_period)
prt.lm<-lm(tickets ~ period)
Using lm to fit the linear model. After that call predict
as a generic function for predictions from the results of
pred=round(predict(prt.lm,next_year,interval="none"))
various model fitting and round the end result. Finally
result<-data.frame(PERIOD=new_period,TICKETS=pred)
give the result back.
END;
CREATE PROCEDURE Get_Tickets()
LANGUAGE SQLSCRIPT AS
BEGIN
Tickets = SELECT * FROM TICKETS_BY_YEAR;
CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);
INSERT INTO "TICKETS_BY_YEAR" SELECT * FROM :T_PREDICTION_TICKETS;
END;
CALL Get_Tickets();
SELECT * FROM "TICKETS_BY_YEAR";
As you can see, in our first Stored Procedure called Prediction_Model, were using RLANG as the script
language...meaning that were going to embed R code that will go from our SAP HANA to our R Server and back with
the modified information.
Prediction_Model call the stored procedure Get_Tickets, which does a select from the table TICKETS_BY_YEAR and then
calls Prediction_Model to finally insert the data back into SAP HANA.
At the end of our script, we call Get_Tickets and do a select to TICKETS_BY_YEAR to verify that our script worked.
After that, we must modify our R code from tutorial one embed in SQL Script.
R code embedded in SQL Script
DROP TYPE T_SALESS;
DROP TYPE T_SALESA;
CREATE TYPE T_SALESS AS TABLE (ARTIKEL INTEGER, REGION VARCHAR(20), MENGE INTEGER);
CREATE TYPE T_SALESA AS TABLE (REGION VARCHAR(20), MENGE INTEGER);
References
Part of this tutorial is based on the following source: http://scn.sap.com/community/developercenter/hana/blog/2012/05/21/when-sap-hana-met-r--first-kiss