SAP Data Integration
Using Azure Data Factory
Update: Jun 28, 2020
[Diagram: SAP, on-premises, cloud, and SaaS data flow through Ingest → Prepare, Transform & Enrich → Serve → Visualize stages, with Store underneath and Data Pipeline Orchestration & Monitoring across the whole flow]
Typical SAP data integration scenarios:
• Ongoing batch ETL from SAP to data lake
• Historical migration from SAP to Azure
Azure Data Factory
A fully managed data integration service for cloud-scale analytics in Azure
• Connected & Integrated: rich connectivity; built-in transformation; flexible orchestration; full integration with Azure Data services
• Scalable & Cost-Effective: serverless scalability without infra management; pay for use
• Secure & Compliant: certified compliance; enterprise-grade security; MSI and AKV support
• Productive: drag & drop UI; single-pane-of-glass monitoring; CI/CD model; code-free data transformation
SAP data ingestion
A single tool enables data ingestion from SAP as well as various other sources, and data transformation via built-in Data Flow or integration with Databricks, HDInsight, Azure Machine Learning, etc.
Supported connectors by category:
• Azure: Blob Storage, Cosmos DB – SQL API, Cosmos DB – MongoDB API, ADLS Gen1, ADLS Gen2, Data Explorer, Database for MariaDB, Database for MySQL, Database for PostgreSQL, File Storage, SQL Database, SQL Managed Instance, Synapse Analytics, Search Index, Table Storage
• Database & DW: Amazon Redshift, DB2, Drill, Google BigQuery, Greenplum, HBase, Hive, Impala, Informix, MariaDB, Microsoft Access, MySQL, Netezza, Oracle, Phoenix, PostgreSQL, Presto, SAP BW Open Hub, SAP BW MDX, SAP HANA, SAP Table, Snowflake, Spark, SQL Server, Sybase, Teradata, Vertica
• File Storage: Amazon S3, File System, FTP, Google Cloud Storage, HDFS, SFTP
• File Formats: Avro, Binary, Common Data Model, Delimited Text, Excel, JSON, ORC, Parquet
• NoSQL: Cassandra, Couchbase, MongoDB
• Services & Apps: Amazon MWS, CDS for Apps, Concur, Dynamics 365, Dynamics AX, Dynamics CRM, Google AdWords, HubSpot, Jira, Magento, Marketo, Office 365, Oracle Eloqua, Oracle Responsys, Oracle Service Cloud, PayPal, QuickBooks, Salesforce, SF Service Cloud, SF Marketing Cloud, SAP C4C, SAP ECC, ServiceNow, SharePoint List, Shopify, Square, Web Table, Xero, Zoho
• Generic: HTTP, OData, ODBC, REST
SAP Data Integration Overview
SAP HANA Connector
SAP Table Connector
SAP BW Open Hub Connector
SAP ECC Connector
SAP BW MDX Connector
More about Azure Data Factory Copy Activity
Resources
“I want to extract data from SAP HANA database” →
ADF connector: SAP HANA
(Connector deep-dive)
“I want to extract data from SAP BW” →
Suggested decision direction
ADF connector options: SAP Table | SAP BW Open Hub | SAP BW via MDX

SAP Table:
• Objects to extract: Table (Transparent, Pooled, Cluster Table) and View
• SAP side configuration: N/A
• Performance: fast, with built-in parallel loading based on configurable partitioning
• Suitable workload: large volume

SAP BW Open Hub:
• Objects to extract: DSO, InfoCube, MultiProvider, DataSource, etc.
• SAP side configuration: SAP Open Hub Destination
• Performance: fast, with built-in parallel loading based on OHD-specific schema
• Suitable workload: well-thought-through workload, large volume

SAP BW via MDX:
• Objects to extract: InfoCubes, QueryCubes
• SAP side configuration: N/A
• Performance: slower
• Suitable workload: exploratory workload, small volume

(Connector deep-dive for each option)
“I want to extract data from SAP ECC, S/4 HANA, or other SAP applications” →
Suggested decision direction
ADF connector options: SAP Table | SAP ECC

SAP Table:
• Objects to extract: Table (Transparent, Pooled, Cluster Table) and View
• SAP side configuration: N/A
• Performance: fast, with built-in parallel loading
• Suitable workload: large volume

SAP ECC:
• Objects to extract: OData entities exposed via SAP Gateway (BAPI, ODP)
• SAP side configuration: SAP Gateway
• Performance: slower
• Suitable workload: small volume

(Connector deep-dive for each option)
SAP HANA Connector
• Supported versions: all SAP HANA versions, on-prem or in the cloud
• Supported SAP objects: HANA Information Models (Analytic/Calculation views); Row & Column Tables
• Supported authentications: Basic (username & password); Windows (Single Sign-On via Kerberos constrained delegation)
• Mechanism and prerequisites: built on top of SAP’s HANA ODBC driver; pulls data via custom query; runs on the Self-hosted Integration Runtime
• Performance & scalability: built-in parallel loading option based on configurable data partitioning (NEW); handles TB-level data with hundreds of millions to a billion rows per run; observed throughput from several to several dozen MB/s (varies with customer data and environment)
[Diagram: SAP HANA → HANA ODBC driver → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
For each copy activity run, ADF issues the specified query to the source to retrieve the data.
Out-of-box optimization for SAP HANA:
• Built-in parallel copy by partitions to boost performance for large table ingestion; a single copy activity execution can e.g. set Parallel Copy = 4.
• Options of HANA physical table partition and dynamic range partition.
[Diagram: source data split into contiguous PartitionCol ranges (e.g. 10001–30000, 30001–50000, 50001–70000, 70001–…), each range copied in parallel]
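The dynamic range option can be pictured as splitting the partition column’s value range into contiguous chunks and issuing one source query per chunk. A minimal Python sketch; the helper name and range math are illustrative, not ADF’s internal implementation:

```python
# Illustrative sketch of dynamic range partitioning: split [lower, upper]
# into N contiguous ranges, one source query per parallel copy.
def partition_ranges(lower, upper, partitions):
    """Return (start, end) pairs covering [lower, upper] inclusive."""
    step = (upper - lower + 1) // partitions
    ranges = []
    start = lower
    for i in range(partitions):
        # Last partition absorbs any remainder.
        end = upper if i == partitions - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# e.g. Parallel Copy = 4 over PartitionCol values 10001..90000
for start, end in partition_ranges(10001, 90000, 4):
    query = (f"SELECT * FROM MyTable "
             f"WHERE PartitionCol BETWEEN {start} AND {end}")
```

Each generated query reads an independent slice, so the four copies can run concurrently without overlapping rows.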
Delta extraction with a tumbling window on a LastModifiedDate column:

SELECT * FROM MyTable
WHERE LastModifiedDate >= @{formatDateTime(pipeline().parameters.windowStartTime, 'yyyy/MM/dd')}
AND LastModifiedDate < @{formatDateTime(pipeline().parameters.windowEndTime, 'yyyy/MM/dd')}

• Execution start time 2019/03/19 00:00:00 (window end time) → delta extraction of rows last modified between 2019/03/18 and 2019/03/19.
• Execution start time 2019/03/20 00:00:00 (window end time) → delta extraction of rows last modified between 2019/03/19 and 2019/03/20.
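The same window arithmetic can be sketched in Python; the function name is illustrative, and the date format mirrors the 'yyyy/MM/dd' format used in the ADF expression:

```python
from datetime import datetime, timedelta

# Illustrative sketch: each tumbling-window run extracts rows whose
# LastModifiedDate falls inside [window_start, window_end).
def delta_query(window_start: datetime, window_end: datetime) -> str:
    fmt = "%Y/%m/%d"  # same shape as 'yyyy/MM/dd' in the ADF expression
    return ("SELECT * FROM MyTable "
            f"WHERE LastModifiedDate >= '{window_start.strftime(fmt)}' "
            f"AND LastModifiedDate < '{window_end.strftime(fmt)}'")

# Run starting 2019/03/19 00:00:00 covers rows modified on 2019/03/18.
end = datetime(2019, 3, 19)
q = delta_query(end - timedelta(days=1), end)
```

Because each window’s lower bound is inclusive and upper bound exclusive, consecutive runs cover the timeline without gaps or double-copied rows.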
SAP Table Connector
• Supported versions: SAP ECC or other applications in Business Suite version 7.01 and above, on-prem or in the cloud; S/4 HANA
• Supported SAP objects: SAP Transparent Table, Pooled Table, Cluster Table, and View
• Supported server type: connect to Application Server or Message Server
• Supported authentications: Basic (username & password); SNC (Secure Network Communications)
• Mechanism and prerequisites: built on top of SAP .NET Connector 3.0; pulls data via NetWeaver RFC with field selection & row filter; runs on the Self-hosted Integration Runtime
• Performance & scalability: built-in parallel loading option based on configurable data partitioning; handles TB-level data with tens of millions to a billion rows per run; observed throughput from several to 20+ MB/s (varies with customer data and environment)
Capabilities:
✓ Field/column selection
✓ Row filter using SAP query operators
✓ Use default /SAPDS/RFC_READ_TABLE2 or custom RFC module to retrieve data
✓ Built-in partitioning + parallel load
[Diagram: SAP table → SAP .NET Connector → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
[Diagram: table partitioned on PartitionCol (e.g. 201809, 201810, 201811, 201812), read by a single copy activity execution with e.g. Parallel Copy = 4]
Tips:
• Enable partitioning when ingesting large datasets, e.g. tens of millions of rows.
• To speed up, choose the proper partition column and partition count, and adjust the number of parallel copies.
Learn more
Pattern I: “my data has a timestamp column, e.g. calendar date”
Solution: tumbling window trigger + dynamic query with system variables via the SAP Table option (filter).
Pattern II: “my data has an incremental column, e.g. ID / last copied date”
Solution: external control table/file + high watermark.
Get started via solution template:
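The high-watermark pattern stores the highest copied value of the incremental column in an external control table or file, and each run copies only rows above it. A minimal sketch, with an in-memory dict standing in for the control store (all names illustrative):

```python
# Illustrative high-watermark sketch: the control store holds the highest
# value of the incremental column copied so far.
control = {"watermark": 0}  # stand-in for an external control table/file

def incremental_copy(rows, key="id"):
    """Copy rows with key above the stored watermark, then advance it."""
    wm = control["watermark"]
    new_rows = [r for r in rows if r[key] > wm]
    if new_rows:
        control["watermark"] = max(r[key] for r in new_rows)
    return new_rows

source = [{"id": 1}, {"id": 2}, {"id": 3}]
first = incremental_copy(source)    # copies ids 1..3, watermark -> 3
source.append({"id": 4})
second = incremental_copy(source)   # copies only id 4
```

Persisting the watermark only after the copy succeeds is what makes the pattern safely re-runnable: a failed run leaves the watermark untouched, so the next run retries the same slice.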
SAP BW Open Hub Connector
• Supported versions: SAP BW version 7.01 and above, on-prem or in the cloud*
• Supported SAP objects: Open Hub Destination (OHD) local table; the underlying objects can be DSO, InfoCube, MultiProvider, DataSource, etc.
• Supported server type: connect to Application Server or Message Server (NEW)
• Supported authentications: Basic (username & password)
• Mechanism and prerequisites: built on top of SAP .NET Connector 3.0; pulls data via NetWeaver RFC; runs on the ADF Self-hosted Integration Runtime; SAP-side config: create an SAP OHD in SAP BW to expose the data
• Performance & scalability: built-in parallel loading option based on OHD-specific schema; handles TB-level data with tens of millions to a billion rows per run; observed throughput from several to 20+ MB/s (varies with customer data and environment)
Capabilities:
✓ Base request ID for incremental copy, to filter out already copied data
✓ Exclude last request, to avoid partial data
✓ Built-in parallel copy to boost performance, based on OHD’s specific schema
• What is OHD: [Diagram: data flows from Data Sources via PSA extraction (InfoPackage) and Transform (DTP) through Data Store Objects (DSO, with Activate) into Cubes (E-fact/F-fact); DTPs then load the supported objects (InfoObject Master Data, DSOs, Cubes) into OHDs]
[Diagram: SAP BW Open Hub Destination table → SAP .NET Connector → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
In the OHD table, each SAP BW DTP execution writes its rows under a unique Request ID (each row also carries a Package ID and Record ID). A single copy activity execution (e.g. Parallel Copy = 4) reads the requests in parallel.
• Base request ID: a run that has already copied through request ID 100 skips earlier requests and picks up requests 200, 300, etc.
• Exclude last request ID: applicable if a DTP and the copy may run at the same time; the newest request may still be partially written, so it is skipped.
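Both options amount to filters over the request IDs in the OHD table. A sketch of that filtering (function and field names are illustrative, not the connector’s actual properties):

```python
# Illustrative sketch of the two Open Hub copy options:
# - base_request_id: skip requests already copied in earlier runs
# - exclude_last_request: drop the newest request, which a concurrently
#   running DTP may still be writing
def select_requests(rows, base_request_id=0, exclude_last_request=True):
    rows = [r for r in rows if r["request_id"] > base_request_id]
    if exclude_last_request and rows:
        last = max(r["request_id"] for r in rows)
        rows = [r for r in rows if r["request_id"] != last]
    return rows

ohd = [{"request_id": 100}, {"request_id": 200}, {"request_id": 300}]
# Already copied up to request 100; request 300 may still be loading.
picked = select_requests(ohd, base_request_id=100)  # -> only request 200
```

On the next run the base request ID advances to 200, so request 300 (by then complete) is picked up without re-copying earlier data.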
SAP ECC Connector
• Supported versions: SAP ECC version 7.0 and above
• Supported SAP objects: any entities exposed by SAP OData services, e.g. BAPI, ODP (Data Extractors / DataSource), etc.
• Supported authentications: Basic (username & password)
• Mechanism and prerequisites: through OData + SAP Gateway; runs on the Self-hosted Integration Runtime if SAP is in a private network; SAP-side config: set up SAP Gateway, activate the OData service, and expose entities
[Diagram: SAP Gateway (OData) → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
• If your ECC is publicly accessible, you can use the managed Azure Integration Runtime instead of the Self-hosted Integration Runtime.
• Tip: limit each run to under 1 million rows.
(in general, same as HANA in earlier slides)
Pattern I: “my data has a timestamp column, e.g. last modified time”
Solution: tumbling window trigger + dynamic query with system variables via an OData query.
Pattern II: “my data has an incremental column, e.g. ID”
Solution: external control table/file + high watermark.
Pattern III: “my data is small in size, e.g. dimension data”
Solution: full copy and overwrite.
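For Pattern I, the tumbling-window filter becomes an OData $filter on the exposed entity. A sketch that builds such a query URL, using OData V2 datetime literals as served by SAP Gateway; the service path and entity set name are hypothetical:

```python
from datetime import datetime

# Illustrative sketch: build an OData query for a tumbling-window delta
# load. The service path and entity set name are hypothetical examples.
def odata_delta_url(base, entity, start: datetime, end: datetime) -> str:
    fmt = "%Y-%m-%dT%H:%M:%S"
    flt = (f"LastModifiedTime ge datetime'{start.strftime(fmt)}' "
           f"and LastModifiedTime lt datetime'{end.strftime(fmt)}'")
    return f"{base}/{entity}?$filter={flt}"

url = odata_delta_url(
    "https://sapgw.example.com/sap/opu/odata/sap/ZSALES_SRV",  # hypothetical
    "OrderSet",
    datetime(2019, 3, 18), datetime(2019, 3, 19))
```

The same half-open window logic as in the SAP HANA and SAP Table patterns applies; only the filter syntax changes from SQL to OData.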
SAP BW MDX Connector
• Supported versions: SAP BW version 7.x, on-prem or in the cloud (e.g. on Azure)
• Supported server type: connect to Application Server
• Supported SAP objects: InfoCubes and QueryCubes (including BEx queries)
• Supported authentications: Basic (username & password)
• Mechanism and prerequisites: built on top of the SAP NetWeaver library; pulls data via RFC; runs on the Self-hosted Integration Runtime
[Diagram: SAP BW → SAP NetWeaver library → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
Scaling out:
• Flexible control flow & scheduling to scale out: a pipeline can run multiple copy activities, with concurrency and partitions.
• Azure IR (for cloud data stores): elastic managed infrastructure to handle data at scale, with configurable DIUs per run.
• Self-hosted IR (for on-prem data stores): customer-managed infrastructure with scaling options (node power, concurrency).
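The scale-out idea, one pipeline fanning copy work over partitions with bounded concurrency, can be sketched generically in Python (both functions are stand-ins, not ADF APIs):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of scale-out: fan one copy job out over partitions
# with bounded concurrency, the way a pipeline runs multiple copy
# activities (or one partitioned copy) in parallel.
def copy_partition(partition):
    """Stand-in for a copy activity run over one partition."""
    return f"copied {partition}"

def run_pipeline(partitions, concurrency=4):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(copy_partition, partitions))

results = run_pipeline(["201809", "201810", "201811", "201812"])
```

The concurrency cap plays the role of the parallel-copy setting: it bounds how many partition reads hit the source at once, regardless of how many partitions exist.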
Self-hosted IR deployed on premises
[Diagram: the Self-hosted IR runs on premises next to the data stores, inside the corporate firewall boundary; it connects out to Data Factory and writes to Azure Storage, Azure SQL DB, Azure SQL DW, HDInsight, and Databricks, which can be protected by VNet ACLs]
Self-hosted IR deployed on Azure VM
[Diagram: the Self-hosted IR runs on an Azure VM inside an Azure Virtual Network; it reaches on-prem data stores behind the corporate firewall boundary over ExpressRoute (private peering) and writes to Azure Storage, Azure SQL DB, and Azure SQL DW via VNet service endpoints, plus HDInsight and Databricks]
Self-hosted IR deployed on Azure VM
[Diagram variant: same deployment without ExpressRoute; the data stores reside inside the Azure Virtual Network alongside the Self-hosted IR]
Copy Data Tool
Solution Template
ADF Copy Activity Overview https://docs.microsoft.com/azure/data-factory/copy-activity-overview
SAP HANA Connector https://docs.microsoft.com/azure/data-factory/connector-sap-hana
SAP Table Connector https://docs.microsoft.com/azure/data-factory/connector-sap-table
SAP BW Open Hub Connector https://docs.microsoft.com/azure/data-factory/connector-sap-business-warehouse-open-hub
SAP BW MDX Connector https://docs.microsoft.com/azure/data-factory/connector-sap-business-warehouse
SAP ECC Connector https://docs.microsoft.com/azure/data-factory/connector-sap-ecc
SAP C4C Connector https://docs.microsoft.com/azure/data-factory/connector-sap-cloud-for-customer
• Analytics and Integration for SAP Global Instance running on-premises with ADF
• Customer case studies:
  • Reckitt Benckiser (RB): https://customers.microsoft.com/story/reckitt-benckiser-consumer-goods-power-bi
  • Newell: https://customers.microsoft.com/story/newell-brands-consumer-goods-azure