Embedded R Code in SQL Script Running On Sap Hana

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

| HOW-TO PAPER

[ SAP HANA Tutorial ]


mayato How-To Papers in the Area of Business Intelligence
Title:

Embedded R code in SQL script running on SAP HANA

Author(s):

Thorsten Fg

Topic:

SAP HANA

Published:

December 2012

Overview
This tutorial describes how to embed a simple R code in SQL Script running on SAP HANA. The tutorial is divided into
two parts. In the first part, you will do some simple predictive analyses with R. In the second part, you will transfer the
first R native code from the tutorial CONNECT R WITH HANA TO SAVE & READ SOME DATA in embedded R Script
running directly on SAP HANA. By the end of this tutorial, you will have developed two simple R codes embedded in SQL
Script running on SAP HANA.
Prerequisites
o You have access to an SAP HANA system.
o You have SAP HANA Studio installed on your machine.
o You have SAP HANA Clients installed on your machine.
o You have an R Server available.
o You have configured the IP address the of R server in the SAP HANA configuration.

Topics
In this tutorial you will learn how to:
1. Change the R server IP in the SAP HANA configuration
2. Embed some R code in SQL Script
The estimated completion time for this tutorial is 40 minutes.
Using a simple R example, you will learn the basic steps about how to embed R code in SQL Script and run it on SAP
HANA.

Starting R server and changing the IP in the SAP HANA configuration


You will find the steps to set-up a new R server here: http://help.sap.com/hana/hana_dev_r_emb_en.pdf.
You will need the free software VMware player
(https://my.vmware.com/web/vmware/info/slug/desktop_end_user_computing/vmware_player/4_0)
to start the image.
Please note: There are plans to setup an internal R server in mayatos infrastructure in the future. The static IP will then
be listed here and you no longer need the VM!
If you are successfully logged in on the running Linux VM, the R server daemon starts automatically. Please check this
with the terminal command <ps fx> to show all running processes.

If the daemon isnt running, execute the start script manually.

After that, open the Internet browser on your Linux VM and visit the following Web site to get your current IP address:
http://myip.dnsdynamic.org/

Keep the IP in mind and minimize the VM. -> Dont close it!!!

After that, start SAP HANA Studio. At this point you, need system rights CATALOG READ and INIFILE ADMIN or you must
be logged in as SYSTEM user.
Double-click on the System displayed in the SAP HANA STUDIO navigator pane. After that, the administration editor will
open. Switch to the configuration tab and choose Indexserver.ini -> Calcengine -> Cer_reserve_addresses. Here,
enter your IP separated from the port 30120 with a colon.

<YOUR R SERVER IP>:30120

Embedding some R in SQL script part one


On SAP HANA, create a table in your schema and call it TICKETS_BY_YEAR with the following structure:

Open a SQL Editor and copy the following code to fill the new table with data:
Please note: Change <YOUR SCHEMA> to your schema name!

SQL code
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110101',4195);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110201',4245);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110301',4971);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110401',4469);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110501',4257);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110601',4973);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110701',4470);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110801',4981);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20110901',4530);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111001',4167);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111101',4059);
insert into "<YOUR SCHEMA>"."TICKETS_BY_YEAR" values('20111201',1483);

This table is supposed to hold the tickets sales for a given company during each month of the year 2011. What we want
to do here is to determine or predict how our sales will be in 2012. Were going to use R to do this.
Create the following script and call it Predict_Tickets. This script will have two Stored Procedures, called
Prediction_Model and Get_Tickets.
Predict_Tickets
CREATE TYPE T_PREDICTION_TICKETS AS TABLE (PERIOD VARCHAR(8), TICKETS INTEGER);
DROP PROCEDURE Prediction_Model;
DROP PROCEDURE Get_Tickets;
CREATE PROCEDURE Prediction_Model(IN tickets_year TICKETS_BY_YEAR,OUT result
T_PREDICTION_TICKETS)
LANGUAGE RLANG AS
BEGIN
period=as.integer(tickets_year$PERIOD)
tickets=as.integer(tickets_year$TICKETS)
var_year=as.integer(substr(period[1],1,4))
Generate the new date and put them into a dataframe
var_year=var_year+1
object.
new_period=gsub("^\\d{4}",var_year,period)
next_year=data.frame(year=new_period)
prt.lm<-lm(tickets ~ period)
Using lm to fit the linear model. After that call predict
as a generic function for predictions from the results of
pred=round(predict(prt.lm,next_year,interval="none"))
various model fitting and round the end result. Finally
result<-data.frame(PERIOD=new_period,TICKETS=pred)
give the result back.
END;
CREATE PROCEDURE Get_Tickets()

LANGUAGE SQLSCRIPT AS
BEGIN
Tickets = SELECT * FROM TICKETS_BY_YEAR;
CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);
INSERT INTO "TICKETS_BY_YEAR" SELECT * FROM :T_PREDICTION_TICKETS;
END;
CALL Get_Tickets();
SELECT * FROM "TICKETS_BY_YEAR";
As you can see, in our first Stored Procedure called Prediction_Model, were using RLANG as the script
language...meaning that were going to embed R code that will go from our SAP HANA to our R Server and back with
the modified information.
Prediction_Model call the stored procedure Get_Tickets, which does a select from the table TICKETS_BY_YEAR and then
calls Prediction_Model to finally insert the data back into SAP HANA.

At the end of our script, we call Get_Tickets and do a select to TICKETS_BY_YEAR to verify that our script worked.

Embedding some R code in SQL script part two


In this section we will use the example of the tutorial CONNECT R WITH HANA TO SAVE & READ SOME DATA with
native R code.
First, execute the following SQL statement to create a new table and fill it with data from a CSV file. Two other tables
without data will also be created for the tasks later.
Note: Change <YOUR SCHEMA> to your schema name!
SQL code
DROP TABLE "TUTORIALS"."SALES";

DROP TABLE "<YOUR SCHEMA>"."SORTSALES";


DROP TABLE "<YOUR SCHEMA>"."AGGREGATESALES";
CREATE COLUMN TABLE "TUTORIALS"."SALES"(
"ARTIKEL" INTEGER NOT NULL,
"REGION" VARCHAR(20) NOT NULL,
"MENGE" INTEGER NOT NULL,
"PREIS" DECIMAL NOT NULL);
IMPORT FROM '/sap/usr/HDB/home/tmp/test.ctl';
CREATE COLUMN TABLE "<YOUR SCHEMA>"."SORTSALES" LIKE "TUTORIALS"."SALES" WITH NO DATA;
ALTER TABLE "<YOUR SCHEMA>"."SORTSALES" DROP (PREIS);
CREATE COLUMN TABLE "<YOUR SCHEMA>"."AGGREGATESALES"(
"REGION" VARCHAR(20) NOT NULL,
"MENGE" INTEGER NOT NULL);
Note: If you want to import data from a CSV file into an SAP HANA table, you need control file *.ctl. Both files, CSV and
CTL must be stored on SAP HANA server. In our case, both files are stored in /sap/usr/HDB/home/tmp/ by admin on the
SAP HANA server. You only have to execute the code above.
CTL code (from the file stored on SAP HANA server -> not needed here!)
IMPORT DATA
INTO TABLE TUTORIALS."SALES"
FROM 'test.csv'
record delimited by '\n'
fields delimited by ';'
error log 'test_lookup_tables.err'
Take a look at the data preview of the new table to verify that everything works fine. Please remember that you will also
find the table in the TUTORIALS schema!

After that, we must modify our R code from tutorial one embed in SQL Script.
R code embedded in SQL Script
DROP TYPE T_SALESS;
DROP TYPE T_SALESA;
CREATE TYPE T_SALESS AS TABLE (ARTIKEL INTEGER, REGION VARCHAR(20), MENGE INTEGER);
CREATE TYPE T_SALESA AS TABLE (REGION VARCHAR(20), MENGE INTEGER);

DROP PROCEDURE Sort_Model;


DROP PROCEDURE Aggregate_Model;
DROP PROCEDURE Get_Sales;
CREATE PROCEDURE Sort_Model(IN sales "<YOUR SCHEMA>"."SORTSALES",OUT result T_SALESS)
LANGUAGE RLANG AS
BEGIN
result <- sales[order(sales[,2],sales[,1], decreasing = FALSE),]
END;
CREATE PROCEDURE Aggregate_Model(IN sales "<YOUR SCHEMA>"."SORTSALES",OUT result
T_SALESA)
LANGUAGE RLANG AS
BEGIN
result <- aggregate(MENGE ~ REGION, data=sales, FUN=sum)
END;
CREATE PROCEDURE Get_Sales()
LANGUAGE SQLSCRIPT AS
BEGIN
saless = SELECT ARTIKEL, REGION, MENGE FROM "TUTORIALS"."SALES";
CALL Sort_Model(:saless,T_SALESS);
INSERT INTO "<YOUR SCHEMA>"."SORTSALES" SELECT * FROM :T_SALESS;
salesa = SELECT * FROM "<YOUR SCHEMA>G"."SORTSALES";
CALL Aggregate_Model(:salesa,T_SALESA);
INSERT INTO "<YOUR SCHEMA>"."AGGREGATESALES" SELECT * FROM :T_SALESA;
END;
CALL Get_Sales();
SELECT * FROM "<YOUR SCHEMA>"."SORTSALES";
SELECT * FROM "<YOUR SCHEMA>"."AGGREGATESALES";
Note: Change <YOUR SCHEMA> with your schema name!
Execute this SQL Script from your schema!
Finally, take a look at the results.

References
Part of this tutorial is based on the following source: http://scn.sap.com/community/developercenter/hana/blog/2012/05/21/when-sap-hana-met-r--first-kiss

You might also like