
PUBLIC

SAP HANA Platform 2.0 SPS 07


Document Version: 1.0 – 2023-04-04

SAP HANA Performance Guide for Developers


© 2023 SAP SE or an SAP affiliate company. All rights reserved.



Content

1 SAP HANA Performance Guide for Developers
2 Disclaimer
3 Schema Design
3.1 Choosing the Appropriate Table Type
3.2 Creating Indexes
   Primary Key Indexes
   Secondary Indexes
   Multi-Column Index Types
   Costs Associated with Indexes
   When to Create Indexes
3.3 Partitioning Tables
3.4 Query Processing Examples
3.5 Delta Tables and Main Tables
3.6 Denormalization
3.7 Additional Recommendations
4 Query Execution Engine Overview
5 SQL Query Performance
5.1 SQL Process
   SQL Processing Components
5.2 SAP HANA SQL Optimizer
   Rule-Based Optimization
   Cost-Based Optimization
   Decisions Not Subject to the SQL Optimizer
   Query Optimization Steps: Overview
5.3 Analysis Tools
   SQL Plan Cache
   Explain Plan
   Plan Visualizer
   SQL Trace
   SQL Optimization Step Debug Trace
   SQL Optimization Time Debug Trace
   Views and Tables
5.4 Case Studies
   Simple Examples
   Performance Fluctuation of an SDA Query with UNION ALL
   Composite OOM due to Memory Consumed over 40 Gigabytes by a Single Query
   Performance Degradation of a View after an Upgrade Caused by Calculation View Unfolding
5.5 SQL Tuning Guidelines
   Avoiding Implicit Type Casting in Queries
   Avoiding Inefficient Predicates in Joins
   Avoiding Inefficient Predicates in EXISTS/IN
   Avoiding Set Operations
   Improving Performance for Multiple Column Joins
   Using Hints to Alter a Query Plan
6 SQLScript Performance
6.1 SQLScript Optimizer
6.2 SQLScript Performance Analysis - First Step
   SQL Trace
   SQLScript Plan Profiler
   M_ACTIVE_PROCEDURES Monitoring View
   Explain Plan for Call
   Plan Visualizer for SQLScript
6.3 SQLScript Performance Tuning Tips - Advanced Step
   SQLScript Inlining Feature
   Query Parameterization - BIND_AS_PARAMETER and BIND_AS_VALUE
   Using Hints with SQLScript Procedures
6.4 SQLScript Performance Guidelines
   Reduce the Complexity of SQL Statements
   Reduce Dependencies Between Statements
   Blocking Statement Inlining with the NO_INLINE Hint
   Avoid Using Cursors
   Working with Tables and Table Variables
   Skipping Expensive Queries
   Using Dynamic SQL Carefully
   Simplifying Application Coding with Parallel Operators
   Replacing Row-Based Calculations with Set-Based Calculations
   Avoiding Busy Waiting
   Calling Procedures
   Miscellaneous
7 Optimization Features in Calculation Views
7.1 Calculation View Instantiation
7.2 Setting Join Cardinality
   Join Cardinality
   Examples
7.3 Optimizing Join Columns
   Optimize Join Columns Option
   Prerequisites for Pruning Join Columns
   Example
7.4 Using Dynamic Joins
   Dynamic Join Example
   Workaround for Queries Without Requested Join Attributes
7.5 Union Node Pruning
   Pruning Configuration Table
   Example with a Pruning Configuration Table
   Example with a Constant Mapping
   Example with a Constant Mapping and an Indirect Filter
   Example Tables
7.6 Influencing the Degree of Parallelization
   Example: Create the Table
   Example: Create the Model
   Example: Apply Parallelization Based on Table Partitions
   Example: Apply Parallelization Based on Distinct Entries in a Column
   Verifying the Degree of Parallelization
   Constraints
7.7 Influencing Calculation View Behavior with Hints
7.8 Using "Execute in SQL Engine" in Calculation Views
   Impact of the "Execute in SQL Engine" Option
   Checking Whether a Query is Unfolded
   Influencing Whether a Query is Unfolded
   Understand Why a Query Is Not Unfolded
7.9 Push Down Filters in Rank Nodes
7.10 Creating Snapshots of Queries on Calculation Views
7.11 Static Cache
   Prerequisites for Using the Static Cache
   Enable the Static Cache
   Apply a Filter
   Instruct Queries to Use the Static Cache
   Combine Data from Cached and Non-Cached Objects
   Use Analytic Privileges
   Force the Static Cache
   Static Cache Monitoring
7.12 Reduce Retrieval Times for Value Helps
7.13 Analytical Model Health Check
7.14 Condensed Performance Suggestions
7.15 Avoid Long Build Times for Calculation Views
   Check Whether Long Build Times are Caused by Technical Hierarchies
   Avoid Creating Technical Hierarchies (Optional)
8 Multi-Dimensional Services (MDS) Query Performance
8.1 Analyzing Multi-Dimensional Services (MDS) Queries
8.2 MDS Cube Cache
   Cube Cache Behavior
   Cube Cache Configuration
   Manual Cache Flushing and Invalidation
   Examples
9 Important Disclaimer for Features in SAP HANA
1 SAP HANA Performance Guide for Developers

The SAP HANA Performance Guide for Developers provides an overview of the key features and characteristics
of the SAP HANA platform from a performance perspective. While many of the optimization concepts and
design principles that are common for almost all relational database systems also apply to SAP HANA, there
are some SAP HANA-specific aspects that are highlighted in this guide.

The guide is structured as follows:

• Schema Design
This section discusses physical schema design, which includes the optimal choice between a columnar
and row store table format, index creation, as well as further aspects such as horizontal table partitioning,
denormalization, and others.
• Query Execution Engine Overview
This section provides a brief overview of the query processing engines used in SAP HANA.
• SQL Query Performance
This section focuses on SQL query performance and the techniques that can be applied to improve
execution time. It also describes how the analysis tools can be used to investigate performance issues
and illustrates some key points through a selection of case studies. It concludes by giving some specific
advice about writing and tuning SQL queries, including general considerations such as programming in a
database-friendly and more specifically column store-friendly way.
• SQLScript Performance
This section focuses on specific issues related to the performance of SQLScript stored procedures. It
describes how the analysis tools can be used to investigate performance issues and shows how tuning
methods can be applied to improve execution time. It also discusses recommended programming patterns
that yield optimal performance as well as anti-patterns that should be avoided.
• Optimization Features in Calculation Views
This section explains how some of the key optimization features available for calculation views can be used
to improve query performance.
• Multi-Dimensional Services (MDS) Query Performance
This section describes how the MDS cube cache is used to cache the results of query execution. It also
explains how to investigate performance issues with multidimensional queries.

Related Information

SAP HANA SQL Reference Guide for SAP HANA Platform


SAP HANA SQLScript Reference
SAP HANA Modeling Guide for SAP Web IDE for SAP HANA

2 Disclaimer

This guide presents generalized performance guidelines and best practices, derived from the results of internal
testing under varying conditions. Because performance is affected by many factors, it cannot be guaranteed
that these guidelines will improve performance in each case. We recommend that you regard these guidelines
as a starting point only.

As an example, consider the following. You have a data model that consists of a star schema. The general
recommendation would be to model it using a calculation view because this allows the SAP HANA database to
exploit the star schema when computing joins, which could improve performance. However, there might also
be reasons why this would not be advisable. For example:

• The number of rows in some of the dimension tables is much bigger than you would normally expect, or
the fact table is much smaller.
• The data distribution in some of the join columns appears to be heavily skewed in ways you would not
expect.
• There are more complex join conditions between the dimension and fact tables than you would expect.
• A query against such tables does not necessarily always involve all tables in the star schema. It could be
that only a subset of those tables is touched by the query, or there could also be joins with other tables
outside the star schema.

Performance tuning is essentially always a trade-off between different aspects, for example, CPU versus memory, or raw speed versus maintainability and developer effectiveness. It therefore cannot be said that one approach is always better.

Note also that tuning optimizations in many cases create a maintenance problem going forward. What you may
need to do today to get better performance may not be valid in a few years' time when further optimizations
have been made inside the SAP HANA database. In fact, those tuning optimizations might even prevent you
from benefiting from the full SAP HANA performance. In other words, you need to monitor whether specific
optimizations are still paying off. This can involve a huge amount of effort, which is one of the reasons why
performance optimization is very expensive.

 Note

In general, the SAP HANA default settings should be sufficient in almost any application scenario. Any
modifications to the predefined system parameters should only be done after receiving explicit instruction
from SAP Support.

3 Schema Design

The performance of query processing in the SAP HANA database depends heavily on the way in which the data
is physically stored inside the database engine.

There are several key characteristics that influence the query runtime, including the choice of the table type
(row or column storage), the availability of indexes, as well as (horizontal) data partitioning, and the internal
data layout (delta or main). The sections below explain how these choices complement each other, and also
which combinations are most beneficial. The techniques described can also be used to selectively tune the
database for a particular workload, or to investigate the behavior and influencing factors when diagnosing the
runtime of a given query. You might also want to evaluate the option of denormalization (see the dedicated
topic on this subject), or consider making slight modifications to the data stored by your application.

Related Information

Choosing the Appropriate Table Type [page 8]


Creating Indexes [page 9]
Partitioning Tables [page 15]
Query Processing Examples [page 16]
Delta Tables and Main Tables [page 18]
Denormalization [page 19]

3.1 Choosing the Appropriate Table Type


The default table type of the SAP HANA database system is columnar storage.

Columnar storage is particularly beneficial for analytical (OLAP) workloads since it provides superior
aggregation and scan performance on individual attributes, as well as highly sophisticated data compression
capabilities that allow the main memory footprint of a table to be significantly reduced. At the same time,
the SAP HANA column store is also capable of sustaining high throughput and good response times for
transactional (OLTP) workloads. As a result, columnar storage should be the preferred choice for most
scenarios.

However, there are some cases where row-oriented data storage might give a performance advantage. In
particular, it might be beneficial to choose the row table type when the following apply:

• The table consists of a very small data set (up to a few thousand records), so the lack of data compression
in the row store can be tolerated.
• The table is subject to a high-volume transactional update workload, for example, performing frequent
updates on a limited set of records.
• The table is accessed in a way where the entire record is selected (select *), it is accessed based on a
highly selective criterion (for example, a key or surrogate key), and it is accessed extremely frequently.

If a table does not fulfil at least one of the criteria given above, it should not be considered for row-based
storage. In general, row-based storage should be considered primarily for extreme OLTP scenarios requiring
query response times within the microsecond range.

Also note that cross-engine joins that include row and column store tables cannot be handled with the
same efficiency as in row-to-row and column-to-column joins. This should also be taken into account when
considering whether to change the storage type, as join performance can be significantly affected.

A large row store, whether it consists of many row store tables or a few very large ones, also increases the database restart time, because the whole row store is currently loaded during a restart. An optimization exists for a planned stop and start of the database, but it does not apply in the event of a crash or machine failure.

The main criteria for column-based storage and row-based storage are summarized below:

Column store:

• Tables with many rows (more than a couple of hundred thousand), because of better compression in the column store
• Tables with a full-text index
• Tables used in an analytical context and containing business-relevant data

Row store:

• Very large OLTP load on the table (high rate of single updates, inserts, or deletes)
• Note that select * and select single are not sufficient criteria for putting a table into row-based storage.

 Note

If neither ROW nor COLUMN is specified in the CREATE TABLE statement, the table that is created is a
column table. However, because this was not the default behavior in SAP HANA 2.0 SPS 02 and earlier, it is
generally recommended to always use either the COLUMN or ROW keyword to ensure that your code works
for all versions of SAP HANA.
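To make the intended storage type explicit, the keyword can be specified directly in the CREATE TABLE statement. The following is a minimal sketch; the table and column names are purely illustrative:

CREATE COLUMN TABLE sales_orders (id INTEGER PRIMARY KEY, amount DECIMAL(15,2));
CREATE ROW TABLE session_state (session_id INTEGER PRIMARY KEY, last_access TIMESTAMP);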

3.2 Creating Indexes

The availability of index structures can have a significant impact on the processing time of a particular table,
both for column-based and row-based data storage.

For example, when doing a primary-key lookup operation (where exactly one or no record is returned), the
availability of an index may reduce the query processing time from a complete table scan (in the worst case)
over all n records of a table, to a logarithmic processing time in the number of distinct values k (log k). This
could easily reduce query runtimes from several seconds to a few milliseconds for large tables.

The sections below discuss the different access paths and index options for the column store. The focus is on
the column store because columnar storage is used by default and is the preferred option for most scenarios.
However, similar design decisions exist for row-store tables.

Related Information

Primary Key Indexes [page 10]

Secondary Indexes [page 11]
Multi-Column Index Types [page 12]
Costs Associated with Indexes [page 13]
When to Create Indexes [page 14]

3.2.1 Primary Key Indexes

Most tables in the SAP environment have a primary key, providing a unique identifier for each individual row
in a table. The key typically consists of several attributes. The SAP HANA column store automatically creates
several indexes for each primary key.

For each individual key attribute of a primary key, an implicit single-column index (inverted index) is created as
an extension to the corresponding column. Inverted indexes are light-weight data structures that map column
dictionary value IDs to the corresponding row IDs. The actual data in the column is stored as an array of value
IDs, also called an index vector.

The inverted index thus provides a direct mapping of dictionary value IDs to table row IDs. The column dictionary contains all existing column values in sorted order, but it does not provide any information about which rows of the table contain the individual values. The mapping between the dictionary value IDs and the related table row IDs is only available through the inverted index. Without the index, the whole column would have to be scanned to find a specific value.

If more than one attribute is part of the key, a concatenated index is also created for all the involved attributes.
The concatenated column values (index type INVERTED VALUE) or the hash values of the indexed columns
(index type INVERTED HASH) are stored in an internal index key column (also called a concat attribute), which
is added to the table. Note that this does not apply to the index type INVERTED INDIVIDUAL.

For example:

• A table with a primary key on (MANDT, KNR, RNR) will have separate inverted indexes on the column
MANDT, on the column KNR, and on the column RNR.
Depending on the index type, the primary key will also have a resulting concatenated index for the three
attributes.
• A table with a surrogate primary key SUR (that is, an artificial identifier such as a GUID or monotonically
increasing counter) will have only one individual index on the attribute SUR. It will not have a concatenated
index, since there is only one key attribute.
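As a sketch of the first example above, the following statements create a table whose composite primary key produces inverted indexes on MANDT, KNR, and RNR plus a concatenated index, and a variant that requests the INVERTED HASH index type instead of the default. The column definitions are assumptions made for illustration:

CREATE COLUMN TABLE foo (mandt NVARCHAR(3), knr NVARCHAR(10), rnr NVARCHAR(6), PRIMARY KEY (mandt, knr, rnr));
CREATE COLUMN TABLE foo_hash (mandt NVARCHAR(3), knr NVARCHAR(10), rnr NVARCHAR(6), PRIMARY KEY INVERTED HASH (mandt, knr, rnr));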

Related Information

Multi-Column Index Types [page 12]

3.2.2 Secondary Indexes

Apart from a primary key, which provides a unique identifier for each row and is supported by one or multiple
corresponding indexes as outlined above, an arbitrary number of secondary indexes can be created. Both
unique and non-unique secondary indexes are supported.

Internally, secondary indexes translate into two different variants, depending on the number of columns that
are involved:

• Indexes on individual columns


When creating an index on an individual column, the column store creates an inverted list (inverted index)
that maps the dictionary value IDs to the corresponding entries in the index vector. Internally, two index
structures are created, one for the delta table and one for the main table.
When this index is created for the row store, only one individual B+ tree index is created.
• Indexes on multiple columns (concatenated indexes)
A multi-column index can be helpful if a specific combination of attributes is queried frequently, or to
speed up join processing where multiple attributes are involved. Note that when a concatenated index is
created, no individual indexes are created for the constituent attributes (this is only done for the primary
key, where individual indexes are also created for each of these attributes).
The column store supports the inverted value index, inverted hash index, and inverted individual index for
multi-column indexes.
When a concatenated index is created for the row store, only one individual B+ tree index is created.
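The following statements sketch both variants for a column store table; the index names and the choice of the BELNR and POSNR columns are illustrative:

-- Single-column index: creates an inverted index on the column
CREATE INDEX idx_foo_belnr ON foo (belnr);
-- Concatenated index: adds an internal index key column (default type INVERTED VALUE)
CREATE INDEX idx_foo_belnr_posnr ON foo (belnr, posnr);
-- Unique concatenated index with an explicit index type
CREATE UNIQUE INVERTED HASH INDEX idx_foo_uhash ON foo (belnr, posnr);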

Related Information

Multi-Column Index Types [page 12]


CREATE INDEX Statement (Data Definition)

3.2.3 Multi-Column Index Types

The column store supports the inverted value index, inverted hash index, and inverted individual index for
multi-column indexes. The inverted value index is the default index type for multi-column keys.

Inverted Value Indexes

An inverted value index consists of the concatenation string of all values of the attributes for each individual
row. Internally, an inverted value index consists of three major components: A dictionary that contains the
concatenation of all attribute values, an index vector, and an inverted list (inverted index) that maps the
dictionary value IDs to the corresponding records.

For each composite key or index, a new internal index key column (also called a concat attribute) is added to
the table. This column, which is typically hidden and persisted, is handled like any other database column.

In addition, for each primary key and constituent of the primary key, a separate inverted index is created
automatically. Note that this does not occur for secondary keys.

Concatenated indexes should be used with care because their main memory footprint tends to be significant,
given the fact that an additional dictionary needs to be created.

Inverted Hash Indexes

If an index consists of many columns with long values, storing the concatenated keys can lead to significant
memory consumption. The inverted hash index helps reduce memory consumption and generally results in a
significantly smaller memory footprint (30% or more). For indexes of this type, the dictionary of the internal
index key column stores hash values of the indexed columns.

Inverted hash indexes can be used for primary indexes and unique indexes that consist of at least two columns.
They are not recommended for non-unique secondary indexes because they can cause performance issues.

Note that the memory footprint after a join is always at least the same as that for inverted value indexes.

Inverted Individual Indexes

Unlike an inverted value index or inverted hash index, an inverted individual index does not require a dedicated
internal index key column, because a light-weight inverted index structure is created instead for each individual
index column. Due to the absence of this column, the memory footprint can be significantly reduced.

Inverted individual indexes can be used for multi-column primary indexes and unique indexes. Good candidates
for inverted individual indexes are those where all the following conditions are met:

• They are large multi-column indexes that are required for uniqueness or primary key purposes.
• There is a selective index column that is typically used in a WHERE clause. Based on column statistics,
this column is processed first during a uniqueness check and query processing to obtain a small candidate

result set. In the absence of a selective column, more column values need to be compared, resulting in a
larger query processing overhead.
• They are not accessed frequently during query processing of a critical workload (unless a slight
performance overhead is acceptable).

Bulk-loading scenarios can benefit from the inverted individual index type because the delta merge of the
internal index key columns is not needed and I/O is reduced. However, queries that could use a concatenated
index, such as primary key selects, are up to 20% slower with inverted individual indexes compared to inverted
value indexes. For special cases like large in-list queries, the impact may be even higher. Join queries could also
have an overhead of about 10%.

For DML insert, update, and delete operations, there are performance gains because less data is written to the
delta and redo log. However, depending on the characteristics of the DML change operations, the uniqueness
check and WHERE clause processing may be more complex and therefore slower.

Note that the inverted individual index type is not needed for non-unique multi-column indexes, because
inverted indexes can simply be created on the individual columns. This results in the same internal index
structures.
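For example, instead of a non-unique multi-column index on two columns, two single-column indexes can simply be created, which results in the same internal inverted index structures (the index, table, and column names are illustrative):

CREATE INDEX idx_foo_knr ON foo (knr);
CREATE INDEX idx_foo_rnr ON foo (rnr);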

Related Information

CREATE INDEX Statement (Data Definition)

3.2.4 Costs Associated with Indexes

Indexes entail certain costs. Memory consumption and incremental maintenance are two major cost factors
that need to be considered.

Memory Consumption

To store the mapping information of value IDs to records, each index uses an inverted list that needs to be kept
in main memory. This list typically requires an amount of memory that is of the same order of magnitude as
the index vector of the corresponding attribute. When creating a concatenated index (for more information, see
above), there is even more overhead because an additional dictionary containing the concatenated values of all
participating columns needs to be created as well. It is difficult to estimate the corresponding overhead, but it
is usually notably higher than the summed-up size of the dictionaries of the participating columns. Therefore,
concatenated indexes should be created with care.

The memory needed for inverted individual indexes includes the memory for inverted indexes on all indexed
columns. This memory usage can be queried as a sum of M_CS_COLUMNS.MEMORY_SIZE_INDEX for all
indexed columns.
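For example, the sum could be retrieved along the following lines; the schema, table, and column names are assumptions:

SELECT SUM(memory_size_index)
FROM m_cs_columns
WHERE schema_name = 'MYSCHEMA'
AND table_name = 'FOO'
AND column_name IN ('MANDT', 'KNR', 'RNR');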

The main advantage of the inverted individual index is its low memory footprint. The memory needed for
inverted individual indexes is much smaller than the memory used for the internal index key column that is
required for inverted value and inverted hash indexes. In addition, the data and log I/O overhead, table load

time, CPU needed for the delta merge, and the overhead of DML update operations are also reduced because
an internal column does not need to be maintained. Similarly, the DDL to create an inverted individual index
is also much faster than for the other index types because the concatenated string does not need to be
generated.

Incremental Maintenance

Whenever a DML operation is performed on the base table, the corresponding index structures need to be
updated as well (for example, by inserting or deleting entries). These additional maintenance costs add to the
costs on the base relation and depending on the number of indexes created, the number of attributes in the
base table, and the number of attributes in the individual indexes, might even dominate the actual update time.
Again, this requires that care is taken when creating additional indexes.

3.2.5 When to Create Indexes

Indexes are particularly useful when a query contains highly selective predicates that help to reduce the
intermediate results quickly.

The classic example for this is a primary key-based select that returns a single row or no row. With an index,
this query can be answered in logarithmic time, while without an index, the entire table needs to be scanned in
the worst case to find the single row that matches the attribute, that is, in linear time.

However, besides the overhead for memory consumption and the maintenance costs associated with indexes,
some queries do not really benefit from them. For example, an unselective predicate such as the client
(German: Mandant) does not filter the dataset much in most systems, since they typically have a very limited
set of clients and one client contains virtually all the entries. On the other hand, in cases of data skew, it could
be beneficial to have an index on such a column, for example, when frequently searching for MANDT='000' in a
system that has most data in MANDT='200'.

Consider the following recommendations when manually creating indexes:

• Avoid non-unique indexes: Columns in a column table are inherently index-like and therefore do not usually benefit from additional indexes. In some scenarios (for example, multiple-column joins or unique constraints), indexes can further improve performance. Start without any indexes and then add them if needed.

• Create as few indexes as possible: Every index imposes an overhead in terms of space and performance, so you should create as few indexes as possible.

• Ensure that the indexes are as small as possible: Specify as few columns as possible in an index so that the space overhead is minimized.

• Prefer single-column indexes in the column store: Single-column indexes in the column store have a much lower space overhead because they are just light-weight data structures created on top of the column structure. Therefore, you should use single-column indexes whenever possible. Due to the in-memory approach in SAP HANA environments, it is generally sufficient to define an index on only the most selective column. (In other relational databases, optimal performance can often only be achieved by using a multi-column index.)

For information about when to combine indexes with partitioning, see Query Processing Examples.

Related Information

Query Processing Examples [page 16]

3.3 Partitioning Tables

For column tables, partitioning can be used to horizontally divide a table into different, physical parts that can
be distributed to the different nodes in a distributed SAP HANA database landscape.

Partitioning criteria include range, hash, and round-robin partitioning. From a query processing point of view,
partitioning can be used to restrict the amount of data that needs to be analyzed by ruling out irrelevant parts
in a first step (partition pruning). For example, let's assume a table is partitioned based on a range predicate
operating on a YEAR attribute. When a query with a predicate on YEAR now needs to be processed (for
example, select count(1) from table where year=2013), the system can restrict the aggregation to
only the rows in the individual partition for year 2013 instead of taking all available partitions into account.

While partition pruning can dramatically improve processing times, it can only be applied if the query
predicates match the partitioning criteria. For example, partitioning a table by YEAR as above is not
advantageous if the query does not use YEAR as a predicate, for example, select count(1) from table
where MONTH=4. In the latter case, partitioning may even be harmful since several physical storage containers
need to be accessed to answer the query, instead of just a single one as in the unpartitioned case.

Therefore, to use partitioning to speed up query processing, the partitioning criteria need to be chosen in a
way that supports the most frequent and expensive queries that are processed by the system. For information
about when to combine partitioning with indexes, see Query Processing Examples.
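As a sketch of a pruning-friendly design for the YEAR example above, the following statement range-partitions an illustrative table by YEAR, so that a query with a matching predicate on YEAR only needs to touch a single partition:

CREATE COLUMN TABLE sales (year INTEGER, month INTEGER, amount DECIMAL(15,2))
PARTITION BY RANGE (year)
(PARTITION 2013 <= VALUES < 2014,
PARTITION 2014 <= VALUES < 2015,
PARTITION OTHERS);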

Costs of Partitioning

Internally, the SAP HANA database treats partitions as physical data containers, similar to tables. In particular,
this means that each partition has its own private delta and main table parts, as well as dictionaries that are

separate from those of the other partitions. As a result and depending on the actual value distribution and
partitioning criteria, the main memory consumption of a table might increase or decrease when it is changed
from a non-partitioned to a partitioned table. While this does not initially appear very intuitive, the root cause
for this lies in the dictionary compression that is applied.

For example:

• Increased memory consumption due to partitioning


A table has two attributes, MONTH and YEAR, and contains data for all 12 months and two distinct years
(2013 and 2014). When the table is partitioned by YEAR, the dictionary for the MONTH attribute needs to be
held in memory twice (both for 2013 and 2014), therefore increasing memory consumption.
• Decreased memory consumption due to partitioning
A table has two attributes, GENDER and FIRSTNAME, and stores data about German customers. When the
table is partitioned by GENDER, it is divided into two groups (female and male). In Germany, there is a
limited set of first names for both females and males. As a result, the FIRSTNAME dictionaries are implicitly partitioned as well into two almost disjoint groups, each containing roughly n/2 distinct values, compared to the unpartitioned table with n distinct values. Halving the number of distinct dictionary values saves about one bit per index vector entry (log_2 (n/2) = log_2 (n) - 1). As there is virtually no redundancy between the two dictionaries, memory consumption can be reduced by partitioning.

3.4 Query Processing Examples

The examples below show how the different access paths and optimization techniques described above can
significantly influence query processing.

Exploiting Indexes

This example shows how a query with multiple predicates can potentially benefit from the different indexes that
are available. The query used in the example is shown below, where the table FOO has a primary key for MANDT,
BELNR, and POSNR:

SELECT * FROM FOO WHERE MANDT='999' AND BELNR='xx2342'

No Indexes: Attribute Scans


A straightforward plan would be to scan both the attributes MANDT and BELNR to find all matching rows and
then materialize the result set for those rows where both criteria have been fulfilled. Since the column store
uses dictionary compression, the system first needs to look up the corresponding value IDs from the dictionary
during predicate evaluation (MANDT='999' and BELNR='xx2342'). It does this with a binary search operation
on the sorted dictionary for the main table, which takes log k steps, where k is the number of distinct values in the main table. For the delta table, there are auxiliary structures that allow the value IDs to be retrieved with the
same degree of complexity from the unsorted delta dictionary. After that, the scan operation can be performed
to compare the value IDs. The scan operations are run sequentially so that if the first scan already reduces
the result set significantly, further scanning can be avoided and the values of the individual rows looked up
instead. This is also one of the reasons why query execution tries to start with the evaluation of the most

selective predicate first (for example, it is more likely that BELNR will be evaluated before MANDT, depending on
the selectivity estimations).

Conceptually, the runtime for these scans is 2*n, where n is the number of values in the table. However, the
actual runtime depends on the number of distinct values in the corresponding column. For attributes with very
few distinct values (for example, MANDT), it might be sufficient to use a small number of bits to encode the
dictionary values (for example, 2 bits). Since the SAP HANA database scan operators use SIMD instructions
during processing, multiple-value comparisons can be done at the same time, depending on the number of bits
required for representing an entry. Therefore, a scan of n records with 2 bits per value is notably faster than a
scan of n records with 6 bits (an almost linear speedup).

In the last step of query processing, the result set needs to be materialized. Therefore, for each cell (that is,
each attribute in each row), the actual value needs to be retrieved from the dictionary in constant time.

Single-Column Indexes
To improve the query processing time, the system can use the single-column indexes that are created for
each column of the key. Instead of doing the column scan operations for MANDT and BELNR, the indexes can
be used to retrieve all matching records for the given predicates, reducing the evaluation costs from a scan
to a constant-time lookup operation for the column store. The other costs (combining the two result sets,
dictionary lookup, and result materialization) remain the same.

Concatenated Indexes
When a concatenated index is available, it is preferable to use it for query processing. Instead of having to do
two individual index-backed search operations on MANDT and BELNR and combine the results afterwards (AND),
the query can be answered by a single index-access operation if a concatenated index on (MANDT, BELNR)
is available. In this particular example this is not the case, because the primary key also contains the POSNR
predicate and therefore cannot be used directly. However, in this special case, the concatenated index of the
primary key can still be exploited. Since the query uses predicates that form a prefix of the primary key, the
search can be regarded internally as semantically equivalent to SELECT * FROM FOO WHERE MANDT='999'
and BELNR='xx2342' and POSNR like '%'. Since the SAP HANA database engine internally applies a
similar rewrite (with a wildcard as the suffix of the concatenated attributes), the concatenated index can still be
used to accelerate the query.

When this example is actually executed in the system, the concatenated index is exploited as described above.

Indexes Versus Partitioning

Both indexes and partitioning can be used to accelerate query processing by avoiding expensive scans. While
partitioning and partition pruning reduce the amount of data to be scanned, the creation of indexes provides
additional, alternate access paths at the cost of higher memory consumption and maintenance.

Partitioning
If partition pruning can be applied, this can have the following benefits:

• Scan operations can be limited to a subset of the data, thereby reducing the costs of the scan.
• Partitioning a table into smaller chunks might enable the system to represent large query results in a more
efficient manner. For example, a result set of hundreds of thousands of records might not be represented
as a bit vector for a huge table with billions of records, but this might be feasible if the table is partitioned

into smaller chunks. Consequently, result set comparisons (AND/OR of several predicates) and handling
might be more efficient in the partitioned case.

Note that these benefits heavily depend on having matching query predicates. For example, partitioning a table
by YEAR is not beneficial for a query that does not include YEAR as a predicate. In this case, query processing
will actually be more expensive.

Indexes
Indexes can speed up predicate evaluation. The more selective a predicate is, the higher the gain.

Combining Partitioning and Indexes


For partitioning, the greatest potential for improvement is when the column is not very selective. For indexing,
it is when the column is selective. Combining these techniques on different columns can be very powerful.
However, it is not beneficial to use them on the same column for the sole purpose of speeding up query
processing.

3.5 Delta Tables and Main Tables

Each column store table consists of two distinct parts, the main table and the delta table. While the main table
is read only, heavily compressed, and read optimized, the delta table is responsible for reflecting changes made
by DML operations such as INSERT, UPDATE, and DELETE. Depending on a cost-based decision, the system
automatically merges the changes of the delta table into the main table (also known as delta merge) to improve
query processing times and reduce memory consumption since the main table is much more compact.

The existence and size of the delta table might have a significant impact on query processing times:

• When the delta table is not empty, the system needs to evaluate the predicates of a query on both the delta
and main tables, and combine the results logically afterwards.
• When a delta table is quite large, query processing times may be negatively affected since the delta table is
not as read optimized as the main table.

Merging the delta table into the main table therefore not only reduces main memory consumption but can also improve query processing times. However, a delta merge also has an associated cost, which is mostly linear in the size of the main table. A delta merge should therefore only be performed after weighing the improvement in memory consumption and query processing times against this cost. For automatic merges, this trade-off is already built into the cost function.
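As a minimal sketch, the current delta size of a table can be checked in the M_CS_TABLES monitoring view, and a merge can be triggered manually if necessary (the table name is illustrative; in most cases this should be left to the automatic merge):

SELECT raw_record_count_in_delta, memory_size_in_delta
FROM m_cs_tables
WHERE table_name = 'FOO';

MERGE DELTA OF "FOO";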

Data Compression

After merging the contents of the delta table into the main table during the delta merge process, the system
might optionally run an additional data compression step to reduce the main memory footprint of the main
table part. This process is also known as optimize compression. Internally, the SAP HANA database system
contains multiple compression methods (run length encoding, sparse coding, default value, and so on).
While the most efficient compression mechanisms are automatically chosen by the system, the compression
mechanism that is applied might also affect the query processing times. It is normally not necessary to

manually alter the compression methods, or even uncompress the table, but this can be done when there is a
problem with data compression. Generally, however, you should contact SAP Support when there is a problem
with data compression.
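The compression method chosen for each column can be inspected in the M_CS_COLUMNS monitoring view, for example as follows (the table name is illustrative):

SELECT column_name, compression_type
FROM m_cs_columns
WHERE table_name = 'FOO';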

3.6 Denormalization

Denormalization can be applied as an additional tuning mechanism to improve performance. The idea of
denormalization is to combine data that was previously kept in different tables into a single combined table
to avoid the overhead of join processing. In most cases, this introduces some redundancy into the underlying
dataset (for example, by repeating customer addresses in multiple orders instead of storing them in a separate
master data table), but it potentially speeds up query processing.

In terms of relational theory, denormalization is a violation of good database design practices since it
deliberately causes violations of normal forms, thereby increasing the risk of anomalies, redundancy, potential
data inconsistencies, and even data loss. Before considering this measure, we strongly recommend becoming
familiar with relational theory and normal forms. Denormalization should be considered as a last resort in
performance optimization. Any schema design should therefore start with a reasonably high normal form (3rd
normal form, BCNF, or even 4th normal form). If it is then impossible to achieve your performance goals, these
forms can be carefully relaxed.

Benefits

Depending on the query workload and data model, join processing might be a significant cost factor in an
application. Particularly if the data that needs to be retrieved by a query is distributed across multiple tables,
join processing might easily become predominant. By changing the underlying database schema and merging
the records of two or more tables (thereby adding all necessary attributes to a single large, combined table),
the costs of join processing can be avoided, which therefore improves performance. The actual gains that can
be achieved depend heavily on the query complexity and datasets. Even for simple joins, such as two tables
reflecting the classical SAP header/line item schema (for example, STXH and STXL, or BKPF and BSEG), there
might be a notable performance boost. In one measurement, an example query that aggregated 1000 records after a join between BKPF and BSEG ran up to a factor of 4 faster in a denormalized model.

Risks

The typical risks of denormalization revolve around accumulating redundant data, for example redundantly
keeping a customer address as part of a line item instead of storing it in a separate master data table. Special
care has to be taken, for example, to ensure that update operations touch all redundant copies of that data,
otherwise there might be inconsistencies (for example, different addresses for the same customer) or even
data loss (all orders of a customer are deleted and therefore the address is lost because it is not kept in a
separate table).

The obvious performance drawbacks with added redundancy are as follows:

• Increased memory consumption by keeping redundant data multiple times in a table, for example the
customer address k times in a denormalized model. This is also relevant for table maintenance operations
(LOAD from disk and DELTA MERGE) and the I/O footprint of a table (savepoints, merges, table loading,
and storage requirements).
• Additional update costs (needing to insert the customer address redundantly for each order that is added
to the system)
• Potentially, additional lookup costs (needing to query the customer address from another row of the table
to insert it redundantly with a new order)

There are also less obvious cases where the performance of a system or query can suffer due to
denormalization. For example, consider a setup with two tables in a header and line item relationship. The
header table includes a ZIP code that is used to filter the data, before it is joined with the line item table and
certain values are aggregated for the corresponding line items (for example, price). No indexes are available.
The header table has 1 million entries and each header has a large number of line items (100). If both tables are
now merged through denormalization, the resulting table has the same number of entries as the old line item
table (therefore 100 million).

To now process the filter predicate on the ZIP code, the entire table must be scanned because there is no index
available. This means that 100 times more data needs to be scanned than before. Depending on the selectivity
of the ZIP code, this can easily result in a more expensive plan than when the header table (1 million records)
was simply scanned and the join with the line item table processed afterwards with a reduced set of records.
Obviously, this problem can be mitigated by creating additional indexes. However, this in turn can introduce
additional issues.
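
A minimal sketch of that mitigation, using hypothetical object names:

-- Without an index, this filter scans all 100 million denormalized rows
SELECT SUM(price)
  FROM sales_denorm
 WHERE zip_code = '69190';

-- A secondary index on the filter column can avoid the full scan, at the
-- cost of the index drawbacks discussed earlier in this guide
CREATE INDEX idx_sales_denorm_zip ON sales_denorm (zip_code);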

Note that while the simplified scenario above sounds trivial, similar effects have been observed with BW
In-Memory Optimised (IMO) InfoCube structures (which basically denormalize a snowflake schema into a star
schema).

Column Store Specifics

The dictionary compression used by the SAP HANA database column store helps to reduce the overhead
of storing redundant data. When the same value is stored multiple times (for example, the street name in a
redundant address), the corresponding literal is stored only once in the underlying dictionary. Therefore, the
added overhead is not the size of the literal but just the size of the corresponding entry in the index vector (this
requires k bits, where k is ceil(log_2 x) for x entries in the dictionary). Therefore, the penalty for storing
redundant data is typically much lower than when denormalization is applied in a row store, where the data is
uncompressed.
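
As an illustrative calculation: a dictionary with x = 1,000,000 distinct street names requires k =
ceil(log_2 1,000,000) = 20 bits per row in the index vector. Repeating such values across 100 million
denormalized rows therefore adds roughly 100,000,000 × 20 bits, about 250 MB, rather than the several
gigabytes that 100 million uncompressed string copies would occupy.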

When to Denormalize

It is important that you consult an expert first. Denormalization should be applied carefully and only when there
is a clear benefit for the query workload in terms of response times and throughput. Any denormalization
efforts should therefore be driven by performance analysis, which also takes into account the update
workload on the denormalized tables, as well as resource consumption (main memory overhead, additional
I/O footprint, additional CPU costs, also for background operations like delta merge, optimize compression,

table loading and unloading, savepoints, and backups). As a rule of thumb, the more redundancy there is, the
higher the benefits need to be for the query workload to justify denormalization.

3.7 Additional Recommendations

These points should also be considered in your schema design.

Type Columns Precisely

Data type conversions are expensive and should be avoided. Do not store numerical, date, or timestamp values
in string columns that need to be converted in every query.
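
A small sketch with hypothetical tables:

-- Avoid: a date stored as a string must be converted in every query
CREATE COLUMN TABLE orders_bad (order_date VARCHAR(8));
SELECT COUNT(*) FROM orders_bad
 WHERE TO_DATE(order_date, 'YYYYMMDD') >= '2023-01-01';

-- Prefer: a precisely typed column can be compared natively
CREATE COLUMN TABLE orders_good (order_date DATE);
SELECT COUNT(*) FROM orders_good
 WHERE order_date >= '2023-01-01';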

Choose Efficient String Data Types

Use the BINARY(<n>) or NVARCHAR(<n>) data type instead of BLOB or NCLOB when possible (that is, when
a maximum <n> of 5000 is large enough for any data that you want to store in the column). The SAP
HANA database can more efficiently process BINARY(<n>) or NVARCHAR(<n>) values compared to BLOB or
NCLOB values.
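
For example, with a hypothetical table:

CREATE COLUMN TABLE documents (
    doc_id  INTEGER,
    summary NVARCHAR(5000)  -- preferable to NCLOB when 5000 characters suffice
);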

No Materialized Aggregates

SAP HANA is generally very fast when aggregating large amounts of data on the fly. Therefore, aggregates
don’t usually have to be persisted or cached.

Reevaluate this if you see performance issues.
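
For example, rather than maintaining a persisted totals table that must be kept in sync with every insert, a
plain aggregate query over a hypothetical sales table is usually sufficient:

-- Computed on the fly instead of being read from a materialized aggregate
SELECT customer_id, SUM(amount) AS total_amount
  FROM sales
 GROUP BY customer_id;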

Related Information

Binary Data Types
Character String Data Types
Large Object (LOB) Data Types

4 Query Execution Engine Overview

The SAP HANA query execution engines are responsible for different types of processing. During query
execution, different engines are invoked depending on the types of objects referenced in the query and the
types of processing therefore required.

An overview of the SAP HANA query execution engines is shown below:

Queries that are not supported by the HEX engine or where HEX execution is not considered beneficial are
automatically routed to engines in the old stack. For detailed information, see SAP Note 3326981 (Why HEX
Plans fallback to old engines such as Join, OLAP, ESX).

Engine Description

Calculation engine Calculation views, including star joins, are processed by the calculation engine.
To do so, the calculation engine may call any of the other engines directly or
indirectly.

MDS engine SAP HANA multi-dimensional services (MDS) is used to process multidimensional
queries including aggregation, transformation, and calculation.

The queries are translated into an SAP HANA calculation engine execution plan or
SQL, which is executed by the SAP HANA core engines.

MDS is integrated with the SAP HANA Enterprise Performance Management (EPM)
platform and is used by reporting and planning applications.

The query languages currently supported use the Information Access (InA)
model. The InA model simplifies the definition of queries with rich or even
complex semantics. Data can be read from all kinds of SAP HANA views, EPM plan
data containers, and so on. The InA model also includes spatial (GIS) and search
features.

HEX Engine

The SAP HANA Execution Engine (HEX) is a query execution engine that will replace other SAP HANA engines
such as the join engine and OLAP engine in the long term, therefore eliminating materializations between
different engines, since all functionality is combined in HEX. The HEX engine connects the SQL layer with the
column store by creating an appropriate SQL plan during the prepare phase.

Data in the HEX engine is streamed, so you can get the first result rows quickly without having to wait until all
the result rows are processed.

Operators in the HEX engine are pipelined, which improves parallelization and can reduce memory
consumption.

Join, OLAP, and ESX Engines

Engine Description

ESX engine The SAP HANA Extended SQL Executor (ESX, legacy) is a frontend execution
engine that retrieves database requests at session level and delegates them to
lower-level engines like the join engine and calculation engine. Communication
with other engines is simplified by using ITABs (internal tables) as the common
format.

Join engine The join engine is used to run plain SQL. Column tables are processed in the join
engine.

OLAP engine The OLAP engine is primarily used to process aggregate operations. Calculated
measures (unlike calculated columns) are processed in the OLAP engine.

5 SQL Query Performance

Ensure your SQL statements are efficient and improve existing SQL statements and their performance.

Efficient SQL statements run in a shorter time. This is mostly due to a shorter execution time, but it can also be
the result of a shorter compilation time or more efficient use of the plan cache.

This section of the guide focuses on the techniques for improving execution time. When execution time is
improved, a more efficient plan is automatically stored in the cache.

Related Information

SQL Process [page 25]
SAP HANA SQL Optimizer [page 29]
Analysis Tools [page 47]
Case Studies [page 85]
SQL Tuning Guidelines [page 118]

5.1 SQL Process

Although database performance depends on many different operations, the SELECT statement is decisive for
the SQL optimizer and optimization generally.

When more than one SELECT statement is executed simultaneously, query processing occurs for each
statement and gives separate results, unless the individual statements are intended to produce a single result
through a procedure or a calculation view. These separate but collectively executed statements also produce
separate SQL plan cache entries.

 Note

The performance issues addressed here involve SELECT statements. However, they also apply to other
data manipulation language (DML) statements. For example, the performance of an UPDATE statement is
based on that of a SELECT statement because UPDATE is an operation that updates a selected entry.

The SAP HANA SQL process starts with a SELECT statement, which is then looked up in the cache, parsed,
checked, optimized, and made into a final execution plan. An overview is shown below:

The set of sequences to parse, check, optimize, and generate a plan is called query compilation. It is
sometimes also referred to as “query preparation”. Strictly speaking, however, query preparation includes
query compilation because a query always needs to be prepared, even if there is nothing to compile. Also,
query compilation can sometimes be skipped if the plan already exists. In this case, the stored cache can be
used instead, which is one of the steps of query preparation.

When a SELECT statement is executed, the cache is first checked to see if it contains the same query string.
If not, the query string is translated into engine-specific instructions with the same meaning (parsing) and
checked for syntactical and semantical errors (checking). The result of this is a tree, sometimes a DAG
(Directed Acyclic Graph) due to a shared subplan, which undergoes several optimization steps including logical
rewriting and cost-based enumerations. These optimizations generate an executable object, which is stored in
the SQL plan cache for later use.

Related Information

SQL Processing Components [page 26]

5.1.1 SQL Processing Components


The main components involved in processing an SQL query are the session, the SQL frontend, the SQL
optimizer, and the execution plan.

An overview is shown below:

Session

The first layer that a query passes through when it is executed is the session. A session is sometimes referred
to as "eapi" or "Eapi", implying that it works like an API.

The session layer is important because it is the first layer in SQL processing. When you work with the SQL plan
cache or several of the traces, you might need to consider the session layer. Also, when you work with different
clients, different sessions with different session properties are created. New transactions and working threads
are then created based on these.

SQL Frontend

The SQL frontend is where the SQL statement is parsed and checked.

When a statement is parsed, it is translated into a form that can be understood by the compiler. The syntax
and semantics of the executed statement are then checked. An error is triggered if the syntax of the statement
is incorrect, which also causes the execution of the statement to fail. A semantic check checks the catalog
to verify whether the objects called by the SQL statement are present in the specified schema. Missing user
privileges can be an issue at this point. If the SQL user does not have the correct permission to access the
object, the object is not found. When these processes have completed, a query optimizer object is created.

Query Optimizer

The query optimizer object, often referred to as a QO tree, is initially a very basic object that has simply
undergone a language translation. The critical task of the query optimizer is to optimize the tree so that it runs
faster, while at the same time ensuring that its data integrity is upheld.

The optimizer first applies a set of rules designed to improve performance. These are proven rules that simplify
the logical algorithms without affecting the result. They are called "logical rewriting rules" because by applying
the rules, the tree is rewritten. The set of rewriting rules is large and may be expanded if needed.

After the logical rewriting, the next step is cost-based enumeration, in which alternative plans are enumerated
with estimated costs. Here, a lower cost indicates better expected performance: the cost is a calculated
measure of the processing time of an operator when it is executed by the plan. A limit is applied to the number of cost
enumerations, with the aim of reducing compilation time. Unlimited compilation time might help the optimizer
find the best plan with the minimal execution time but at the cost of the time required for its preparation.

While doing the cost enumeration, the optimizer creates and compares the alternatives from two perspectives,
a logical and a physical one. Logical enumerators provide different tree shapes and orders. They are concerned
with where each operator, like FILTER and JOIN, should be positioned and in what order. Physical enumerators,
on the other hand, determine the algorithms of the operators. For example, the physical enumerators of the
JOIN operator include Hash Join, Nested Loop Join, and Index Join.

Executors

Once the optimization phase has completed, an execution plan can be created through a process called "code
generation" and sent to the different execution engines. The SAP HANA execution engines consist of two
different types, the row engine and the column engine.

The row engine is a basic processing engine that is commonly used in many databases, not only in SAP HANA.
The column engine is an SAP HANA-specific engine that handles data in a column-wise manner. Determining
which engine is faster is difficult because it always depends on many factors, such as SQL complexity, engine
features, and data size.

SQL Plan Cache

The first and foremost purpose of the SQL plan cache is to minimize the compilation time of a query. The
SQL plan cache is where query plans are stored once they pass the session layer, unless instructed otherwise.
Underlying the SQL plan cache is the monitoring view M_SQL_PLAN_CACHE.

The M_SQL_PLAN_CACHE monitoring view is a large table with primary keys that include user, session,
schema, statement, and more. A plan can be reused only when its plan cache key is matched. The plan cache
key is a composite key consisting of the following columns:

• USER_NAME
• SESSION_USER_NAME
• SCHEMA_NAME
• STATEMENT_STRING
• SESSION_PROPERTIES
• ABAP_VARCHAR_MODE
• IS_INTERNAL

To search for a specific cache entry or to ensure a query has a cache hit, you need to make sure you enter the
correct key values. To keep the SQL plan cache size to a minimum and to prevent it from becoming outdated,
the entries are invalidated or even evicted under specific circumstances.

The SQL plan cache uses eager eviction to automatically evict plans that are not so useful. When the eager
eviction mode is set to auto, SAP HANA determines the best configurations for applying eager eviction based
on usage frequency or a usefulness score (see the example query after this list):

• Time-based eviction: Cached plans that are not used for more than a certain time (default: 1.5 days)
are automatically evicted. If a plan is evicted due to time-based eviction, the LAST_EVICTION_REASON
column of the M_SQL_PLAN_STATISTICS system view has the value EXPIRED.
• Score-based eviction: The plan cache periodically assigns a score to each plan to capture the relative
usefulness of each cached plan. Based on this number, the plan cache evicts a certain percentage of
low-scoring plans. If a plan is evicted due to score-based eviction, the LAST_EVICTION_REASON column of
the M_SQL_PLAN_STATISTICS system view has the value LOW_SCORE.
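
The following sketch shows how the eviction reason can be checked, assuming the STATEMENT_STRING column
is exposed alongside LAST_EVICTION_REASON in the monitoring view:

SELECT "STATEMENT_STRING", "LAST_EVICTION_REASON"
FROM "M_SQL_PLAN_STATISTICS"
WHERE "LAST_EVICTION_REASON" IN ('EXPIRED', 'LOW_SCORE');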

For more information, see Plan Cache in the SAP HANA Administration Guide.

Therefore, good housekeeping of the SQL plan cache involves striking a balance between cache storage size
and frequent cache hits. A big cache storage allows almost all queries to take advantage of the plan cache,
resulting in faster query preparation but with the added risk of an ineffective or unintended plan and very high

memory consumption. Primary keys, the invalidation and eviction mechanism, and storage size settings are
there for a good reason.

Related Information

Plan Cache

5.2 SAP HANA SQL Optimizer

The two main tasks of the SQL optimizer are rule-based and cost-based optimization. The rule-based
optimization phase precedes cost-based optimization.

Rule-based optimization involves rewriting the entire tree by modifying or adding operators or information
that is needed. Every decision the optimizer makes must adhere to predefined rules that are algorithmically
proven to enhance performance. Cost-based optimization, which consists of logical and physical enumeration,
involves a size and cost estimation of each subtree within the tree. The optimizer then chooses the least costly
plan based on its calculations.

Note that rule-based optimization is a step-by-step rewriting approach applied to a single tree whereas cost-
based optimization chooses the best tree from many alternative trees.

The sections below describe each of the optimization steps. Rules and enumerators that are frequently
chosen are explained through examples. Note that the examples show only a subset of the many rules and
enumerators that exist.

Related Information

Rule-Based Optimization [page 29]
Cost-Based Optimization [page 34]
Decisions Not Subject to the SQL Optimizer [page 45]
Query Optimization Steps: Overview [page 46]

5.2.1 Rule-Based Optimization

The execution times of some query designs can be reduced through simple changes to the algorithms, like
replacing operators or converting one operator to another, irrespective of how much data the sources contain
and how complex they are.

These mathematically proven rules are predefined inside the SAP HANA SQL optimizer and provide the basis
for the rule-based optimization process. This process is very efficient and does not require data size estimation
or comparison of execution cost.

The frequently used rules are described in this section.

Related Information

Unfold Calculation View [page 30]
Select Pull-Up, Column Removal, Select Pushdown [page 31]
Simplify: Remove Group By [page 32]
Simplify: Remove Join [page 32]
Heuristic Rules [page 33]
How the Rules are Applied [page 33]

5.2.1.1 Unfold Calculation View

SQL is relational whereas calculation models are non-relational. To enable a holistic approach and integration
with the SQL optimizer, calculation models are translated into a relational form wherever possible. This is called
calculation view unfolding.

This process is shown below:

To understand calculation view unfolding, a good knowledge of calculation views is needed. A calculation
view is an SAP HANA-specific view object. Unlike a standard SQL view, which can be read with SQL, it
consists of functions in the SAP HANA calculation engine language, which are commonly referred to as "CE
functions". Due to this language difference, the SQL optimizer, which only interprets SQL, cannot interpret a
CE object unless it is coded otherwise. Calculation view unfolding, in this context, is a mechanism used to pass
"interpretable" SQL to the SQL optimizer by literally unfolding the compactly wrapped CE functions.

Calculation view unfolding is always applied whenever there are one or more calculation views. This is because
it is more cost efficient to run the complete optimization process in one integrated optimizer, the SQL optimizer
in this context, than to leave the CE objects to be handled by the calculation engine optimizer.

The calculation engine is used, however, in cases where the calculation view unfolding rule is blocked. Unfolding
blockers range from ambiguous to straightforward. If a calculation view is not unfolded, you can find out more
about the blocker by analyzing the traces.

Related Information

SQL Trace [page 69]
SQL Optimization Step Debug Trace [page 72]

5.2.1.2 Select Pull-Up, Column Removal, Select Pushdown

The main purpose of compilation is to minimize execution time, which depends heavily on the level of
complexity. The major cause of increased complexity is redundant column projection.

As part of query simplification, the optimizer pulls all projection columns up to the top of the tree, applies
simplification measures, which include removing unnecessary columns, and then pushes down the filters as far
as possible, mostly to the table layer, as shown below:

This set of pull-up, remove, and pushdown actions can be repeated several times during one query compilation.
Also, the simplification in between the pull-up and pushdown actions may include other measures like adding
more filters, removing operators, or even adding more operators to make optimization more efficient. The
query plan might temporarily be either shorter or longer while the later optimization steps are being compiled.
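
A minimal illustration with a hypothetical table:

-- col_b is projected by the inner view but never used above it
SELECT col_a
FROM (SELECT col_a, col_b FROM t WHERE col_c = 1) AS v;

-- After column removal and filter pushdown, the plan is equivalent to:
SELECT col_a FROM t WHERE col_c = 1;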

Related Information

SQL Optimization Step Debug Trace [page 72]

5.2.1.3 Simplify: Remove Group By

A tree might have multiple aggregation operators one after the other. This might be because you intended it or
because the plan was shaped like this during logical rewriting.

For example, after the first and second columns are aggregated, another aggregation is done for the third
column, and lastly the fourth and fifth columns are also aggregated.

While a query like this does not seem to make much sense, this type of redundant or repeated aggregation
occurs frequently. It mainly occurs in complex queries and when SQL views or calculation views are used. Often
users don't want to use multiple data sources because they are difficult to maintain. Instead they prefer a single
or compact number of views that can be used in many different analytical reports. Technically, therefore, the
views need to be parameterized with dynamic variables, which means that the query plans can vary according
to the actual variable values. Also, for maintenance purposes, users don't create all views from scratch but
instead reuse existing views as a data source for another view. Therefore, by the time the query is made into a
logical plan through calculation view unfolding, column pushdown, and redundancy removal, the aggregations
might be stacked, as shown in the example below:

To tackle this redundancy, the SQL optimizer removes or simplifies the aggregation operators by turning them
into a single operation, unless there are limiting factors. If the child aggregation does not contain all the
GROUP BY columns that are defined for its parent, their aggregations cannot be merged. Also, if the parent has
aggregation functions other than MIN, MAX, or SUM, which makes the parent aggregation itself rather heavy,
the aggregations cannot be merged.
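
A sketch of a mergeable case, using a hypothetical sales table:

-- The child aggregation contains all GROUP BY columns of its parent, and the
-- parent uses only SUM, so the two levels can be merged
SELECT customer_id, SUM(total) AS total
FROM (SELECT customer_id, order_id, SUM(amount) AS total
      FROM sales
      GROUP BY customer_id, order_id) AS pre
GROUP BY customer_id;

-- Equivalent single aggregation after the rewrite:
SELECT customer_id, SUM(amount) AS total
FROM sales
GROUP BY customer_id;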

5.2.1.4 Simplify: Remove Join


A join can be simplified when there is a logical redundancy.

One example would be a join that only needs the columns from one of the join children. In this case, no columns
are needed from the other child, so there is no need to do the join. Therefore, the join is removed and there
is just a projection. The example below shows how there is eventually just one operator, for example one
projection or one table scan:
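
A minimal SQL sketch of this simplification, using hypothetical tables (the rewrite is only valid if every
order is guaranteed to match exactly one customer row, for example through an enforced key relationship):

-- Only columns of the orders side are referenced by the query
SELECT o.order_id, o.amount
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id;

-- The simplified plan is then just a projection on a single table:
SELECT order_id, amount FROM orders;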

5.2.1.5 Heuristic Rules

The predefined optimization rules include heuristic rules. Heuristic rules are not directly derived from
mathematical or algorithmic calculation but from examination and analysis.

For example, when join children contain so much data that their size exceeds a certain threshold, this is
known to be very inefficient. It is also known that in these cases performance can be improved by changing
the join order, that is, by switching the join with another join in the lower layer. The decisive factor for this
optimization rule is the threshold because it determines whether the rule is valid and whether it should be
applied. One record less can result in a completely different result. Heuristic results, therefore, have advantages
and disadvantages. On the one hand, they can significantly improve performance through simple changes, but
on the other hand, they can result in an undesirable and unoptimized decision.

Heuristic rules are one of the most prominent root causes of performance issues. There are still quite a few
cases that could be solved by simply limiting the optimization level to non-heuristic rules. However, most of
the time, query performance benefits from heuristic rules. Therefore, heuristic rules should not be ignored, but
they must be carefully observed when there is a performance issue.

5.2.1.6 How the Rules are Applied

The algorithms of the optimization rules are predefined as well as the order in which they are applied. The SQL
optimizer goes through the list of rules row by row to check if the rule is applicable. Applicable rules are applied,
others aren't.

The example below shows the application of a series of rules:

As shown above, rule-based optimization is sequential rather than concurrent. This is worth noting because
cost-based optimization (see below) is essentially a concurrent task.

Some rules are applied repeatedly until a specific condition is satisfied, and sometimes one rule triggers
another one, although this is not always a prerequisite for every query. The rules are applied repeatedly to the
same plan tree, modifying the tree in every step, until the list of predefined rules has been completed and the
SQL optimizer can move on to the next step, which is cost-based optimization.

Related Information

Cost-Based Optimization [page 34]

5.2.2 Cost-Based Optimization

Cost-based optimization involves finding the optimal plan by analyzing and comparing execution costs. Costs
are acquired by mathematically evaluating sizes. In this context, size basically indicates the number of rows in
the dataset, but it can also be larger depending on the data type and operator.

A table with a hundred million records is not particularly big for a commercial database. However, if it has
multiple columns with types like DOUBLE or TEXT, and if there is a window function like ROW_NUMBER or RANK
that requires row-wise evaluation on those columns, the plan will no longer be clearly structured and this will
increase the costs. In this example, the optimizer will try to minimize the intermediate result by pushing down
the filters and joins so that the row engine does not have to do the window function evaluation on a massive
amount of data.

In a comparative sense, rule-based optimization can be regarded as a broad macroscopic task and cost-
based optimization as a more delicate microscopic one. Rule-based optimization is relatively predictable and
sometimes even obvious because the rules are predefined and work in a set way. Nevertheless, those rules
cannot address every detail of an individual query. So cost-based optimization, as an atomic assessment
process that tags each operator with its size and cost, adds more accuracy to the performance optimization.

Enumeration Types

There are two types of enumeration, logical enumeration and physical enumeration. Logical enumeration is
used to build the overall shape of the tree. Physical enumeration is used to determine the characteristics of
each of its components.

The goal of the SQL optimizer is efficiency, and since its main aim is to minimize execution time,
it is often prepared to sacrifice available resources such as memory and CPU for this purpose. This in turn can
raise other performance issues, such as the out-of-memory (OOM) condition, which is described in a dedicated
section.

Related Information

Logical Enumeration [page 34]
Physical Enumeration [page 37]
Column Search [page 40]
How Enumeration Works [page 44]

5.2.2.1 Logical Enumeration


For the SAP HANA SQL optimizer, logical enumeration involves comparing the costs of all possible candidates
detected within the limited boundary of the search area. It includes proposing alternatives for each operator,
calculating the estimated costs, and deciding for or against that alternative depending on the costs.

Logical enumeration, therefore, is an enumeration of the format of the tree and a decision on the existence
and order of the operators. When a filter, join, or aggregation is pushed down through another operator, that is

where the logical enumerator has come into play. Usually, whenever something is pushed down through or into
something else, it is due to the logical enumerator. Cost is the key factor that determines the optimized logical
plan.

An example is shown below:

Related Information

_THRU_ [page 35]
PRE…_BEFORE_ [page 36]

5.2.2.1.1 _THRU_

Enumerators that belong to this category include JOIN_THRU_JOIN, JOIN_THRU_AGGR, and
LIMIT_THRU_JOIN. These A_thru_B enumerators literally move the first operator A through the second operator B,
and position A below B in the plan tree. This is particularly effective when A reduces the data more than B does.

A_thru_B:

For example, suppose that there is an aggregation over a join that is very heavy because it is a left outer join
with range join conditions on multiple timestamp columns. The SQL optimizer tries to push the aggregation
through the join so that it can reduce the dataset of one of the join candidates. This is AGGR_THRU_JOIN.
Pushing down operators like limit and filter also work within this context.

JOIN_THRU_JOIN basically switches the order of the joins. How effective this is depends on the data sizes
involved, but it does not negatively impact performance. Unlike rule-based rewriting, cost-based optimization
does not “change” anything until it reaches the optimized point or is limited by the enumeration limit. During
the process of enumeration, it tries out the enumeration rules and uses them to propose better alternatives,
which then are evaluated in terms of costs, allowing the most inexpensive plan to finally be selected.

5.2.2.1.2 PRE…_BEFORE_

The most well-known enumerator of this type is PREAGGR_BEFORE_JOIN. It adds a preaggregation to one of
the join candidates so that the size of the join can be reduced. In these cases, the aggregation over the
join is called "post-aggregation".

PREAGGR_BEFORE_JOIN:

If you want to do a preaggregation on both children of the join, you can use
DOUBLE_PREAGGR_BEFORE_JOIN.
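
A sketch of the rewrite with hypothetical header and item tables:

-- Post-aggregation over the join:
SELECT h.region, SUM(i.amount)
FROM header AS h
JOIN item AS i ON i.hdr_id = h.id
GROUP BY h.region;

-- With PREAGGR_BEFORE_JOIN, the plan behaves as if the items were
-- preaggregated per join key first, shrinking the join input:
SELECT h.region, SUM(i.amount)
FROM header AS h
JOIN (SELECT hdr_id, SUM(amount) AS amount
      FROM item
      GROUP BY hdr_id) AS i ON i.hdr_id = h.id
GROUP BY h.region;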

5.2.2.2 Physical Enumeration

Physical enumeration is similar to logical enumeration. However, while logical enumerators help generate
alternatives with a different logical structure and order of the operators, physical enumerators enumerate the
engine options, operator types, execution locations, and so on.

For example, in the physical enumeration phase, physical enumerators determine whether the column engine
or the row engine should perform a certain operation. They also determine whether a hash join, nested loop
join, index join, or other type of join should be used.

Physical alternatives:

Related Information

CS_ or RS_ [page 37]
Cyclic Join [page 38]
REMOTE_ [page 39]

5.2.2.2.1 CS_ or RS_

CS and RS denote column store and row store. This applies in SAP HANA generally and not just in the SQL
optimizer. Any enumerators that start with CS indicate that column engines should be used rather than row
engines for the designated operation.

For example, CS_JOIN means that a join should be performed with column engines. However, this is still
subject to the principle of cost-based optimization, which ensures that the most inexpensive plan is selected as
the final plan. Therefore, there is no guarantee that the joins in this plan would be performed in column engines
even if you use a hint to try to enforce it. The joins are ultimately only handled by column engines if the cost
evaluation confirms that CS_JOIN is expected to give a better outcome.

For example:

For information about which of the three main column engines (join, OLAP, or calculation engine) is responsible
for an operation, you need to examine traces like the Explain Plan and Plan Visualizer that show the final
execution plan rather than the compilation procedure.

5.2.2.2.2 Cyclic Join

When three or more tables are joined together in a chain or cycle of joins, this is called a cyclic join. For
example, table A is joined with table B, which is joined with table C, and table C is joined with table A.

In SAP HANA, the cyclic inner join is natively supported by the column engines whereas the cyclic outer join is
not. So when a cyclic join has outer joins, the cycle is broken into two parts. The join between table A and table
B is done first and its result is then joined with table C, or something similar. Determining which of the three
joins should be broken is a mathematical problem that the optimizer solves during compilation. The calculation
is based on the estimated size and costs of the joins. This approach does not present any problems.

A cyclic join broken into two parts:

The problem is when the cyclic join is an inner join. Because the column engines natively support it, the SQL
optimizer needs to decide whether to leave the cyclic join intact or whether to break it. Due to uncertain

size and cost estimations and the unpredictable performance of the execution engines, the SQL optimizer
sometimes makes a bad decision, which in turn causes a performance issue. The issue is even more critical
when the data sizes of the join children are large because the materialized intermediate result is likely to be
huge.

Therefore, in cases where there are many complicated join relations, it is worth checking if there is a cyclic join
and if the cycle is broken. If the performance hot spot proves to be the cyclic join, one of the workarounds could
be to break it using the SQL hint NO_CYCLIC_JOIN. This indicates that breaking the cyclic join is preferable to
keeping it.
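
A sketch of applying the hint to the A-B-C join cycle described above (hypothetical tables and join columns):

SELECT COUNT(*)
FROM a
JOIN b ON b.x = a.x
JOIN c ON c.y = b.y AND c.z = a.z
WITH HINT (NO_CYCLIC_JOIN);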

5.2.2.2.3 REMOTE_

This physical enumerator is used and only makes sense when there is a remote server connected to SAP
HANA.

Many companies use SAP HANA as an analytics database but store their source data in another database
connected to the SAP HANA system. Examples of remote connections include Smart Data Access (SDA), Near
Line Storage (NLS), and Dynamic Tiering.

A connection to a remote server:

When a query containing a table from a remote server is compiled, if data statistics are created for the tables
on the remote server using the CREATE STATISTICS statement, this helps the SQL optimizer to make better
decisions for query plans during cost-based enumeration.
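
For example, a sketch with a hypothetical virtual table pointing to the remote source:

-- Histogram statistics on a join or filter column help the optimizer cost
-- remote versus local execution during enumeration
CREATE STATISTICS ON myschema.v_remote_sales (customer_id) TYPE HISTOGRAM;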

As one of the physical enumeration processes, the SQL optimizer compares the costs of doing operations on
the remote server and on SAP HANA. For example, it evaluates what the cost would be to do a join on the
remote server and transmit its result to SAP HANA, and then compares it with the cost of transmitting the
whole data source to SAP HANA and running the join on SAP HANA. The network performance between SAP
HANA and the remote server can make this task more difficult.

In terms of the remote execution, there can be a significant difference between what SAP HANA expects the
remote server would do and what it actually does. For example, according to the SQL optimizer’s plan, the
filtered data is supposed to be loaded from the remote server to SAP HANA. However, due to limited network
bandwidth and the huge volume of data, a performance issue may arise. An out-of-memory (OOM) situation
can also occur if there is not enough memory on the remote server. Therefore, when there is a remote server
involved in a query execution and the execution fails or runs slowly, you need to consider whether it might be
caused by the execution on the remote server. You need to examine the query statement transmitted to the
remote server to see how much time it took to run it and fetch the result.

5.2.2.3 Column Search

Column search is a composite operator that can be executed in a single IMS search call. An IMS search call is a
request from the SQL engine to the column engine, generally with the query execution plan (QE).

Each query plan that is passed to the column engines uses this interface. For an IMS search call to occur, the
query plan must have a columnar plan. A single IMS search call means that the SQL engine does not have
to send multiple execution requests through IMS. Because the column search is composite, by definition, the
request can contain more than one operator even though it is a single interface call.

A column search is created or decided on during cost-based physical enumeration. The decision to run a
column search is called "column search availability". When a column search is "available", it means that one
or more operators will be executed in column engines and that the operation involves a single API call. For
example, a query has two column tables with JOIN, GROUPBY, and LIMIT operators in a bottom-up order.
Because the base tables are column-store tables and the operators are natively supported by the column
engines, this query plan is likely to have at least one column search. If the optimizer decides that all three
operators can be combined into one composite operator, everything is packed or absorbed into one single
column search, resulting in a fully columnar plan.

Single column search:

Related Information

Absorption [page 41]
Pushdown Blocker [page 42]
Singular Versus Stacked Column Search [page 43]

5.2.2.3.1 Absorption

Absorption is a mechanism that extends the column search boundary across the operators. Starting with the
table, the column search moves up along the tree and merges the operators until it meets a blocker that
prevents its expansion.

The procedure by which the operators are merged into the column search is referred to as absorption. This is
shown below:

So that the operators can be absorbed into a column search, they should be types that are natively supported
by at least one of the column engines in SAP HANA.

For example, let's assume the JOIN is a range join that requires row-wise comparison. Because this join is
not natively supported by the column engines, it cannot be absorbed even though the join candidates are
both column-store tables. Because this join has already broken the column search, it is highly likely that the
aggregation and limit operations above it will be executed in row engines.

There are also cases where operators cannot be merged into the column search due to their logical order even
though they are natively supported by the column engines. For example, if the logical plan has an aggregation
that is then joined with another table, this join cannot be absorbed, because this type of composite operator
that handles an aggregation first and then a join is not supported by any of the column engines. Column
engines do support an aggregation over a join or a join over a join but not a join over an aggregation. So in this
case where there is a join over an aggregation, the column search will be divided into two searches, a column
search for the aggregation and possibly another column search containing the join above it.

A column search with a join that cannot be absorbed:

When a column search is split into multiple parts, it is referred to as a stacked column search. A stacked
column search can significantly impact the performance of the entire query.

Note that if the SQL optimizer decides that the join should be executed in a row engine rather than a column
engine, this is not a stacked column search, because there is only one column search and the operators that
are not absorbed are handled in row engines.

Absorption, therefore, can only occur if an operator is natively supported by a column engine and if it is
expected to give a better result. Absorption occurs in a bottom-up manner and in a predefined order until a
blocker is encountered. A valid order is as follows: Table scan – Join – Aggregation – Limit/Top.

5.2.2.3.2 Pushdown Blocker

A pushdown blocker is anything that prevents the absorption of an operator into an existing column search.

From a performance perspective, it is nearly always better to push down heavy operators, as is the case
when they are pushed down into the column search. However, pushdown blockers interfere with the pushdown
process.

Pushdown blockers are usually derived from the materialization operator and the limitations of the execution
engines. Some examples of blockers include the range join, window operators, and logical order. The best way
to find out which pushdown blockers are involved is to analyze the traces, particularly the SQL Optimization
Step trace.

Related Information

SQL Optimization Step Debug Trace [page 72]

5.2.2.3.3 Singular Versus Stacked Column Search

A stacked column search is where there are multiple column searches stacked on top of each other. The Plan
Visualizer provides the most intuitive way to understand the stacked column search, but other traces also
provide similar information.

A stacked column search impacts SQL performance because it entails data materialization. Data
materialization is an operation where the intermediate result is converted into a physical temporary table
with or without metadata information and part of the physical memory is used to temporarily store the data.
Its memory consumption increases as the size of the intermediate data grows. In many out-of-memory (OOM)
cases, it has been found that a large part of the memory is allocated for data materialization during a join. Even
for non-OOM performance issues, materialization of a vast amount of data normally takes a long time.

For example, consider the following:

The more complex a plan is, the more column searches that are stacked. Each column search can be regarded
as a data materialization operator. In the example, 8 base tables are used and each table has no more than 10
rows. However, joined together, there are 100 million intermediate records. This number is not a problem for a
normal database, but it is a problem when so many records need to be physically stored in memory, even if
only temporarily. It becomes even more problematic if the intermediate result contains long texts such as TEXT,
BLOB, CLOB, and so on.

Because you can predict how the SQL optimizer will handle a query, you can prevent a plan from becoming a
stacked column search. The first step is to remove the potential pushdown blockers and add as many filters
as possible to the data sources. Because SAP HANA is evolving, it is difficult to know exactly which pushdown
blockers there are. Nevertheless, there are several examples you can safely use as a preventive measure or
performance tuning method.

Despite these problems, however, you should not simply switch to row engines. Column engines are by nature
faster than row engines when vast amounts of data need to be processed for analytical purposes. Considering
that most of the business data for heavy analytics is stored in column-store tables, fully columnar plans will
outperform row engines with a few exceptions. However, it is important that you try to keep column search
stacks to a minimum. Due to their memory consumption, they can be critical for an in-memory database and
even lead to out-of-memory conditions. Therefore, you need to be aware of data materialization when you use
SQL, particularly where the intermediate results like join outputs are expected to be heavy. When analyzing
SQL, it is recommended that you try to identify problematic materialization and avoid it if possible by using
static filters or a union instead.

5.2.2.4 How Enumeration Works

Logical and physical enumerations are neither separate nor sequential processes. They are simultaneous and
continuous.

After rule-based optimization, the tree has been rewritten according to the predefined rules that are applicable
to it. As the SQL optimizer moves on to the enumeration process, an empty bucket called a "search space" is
created, which will store all the tree's alternatives. Starting with the tables at the bottom of the tree, the SQL
optimizer investigates each node's logical alternative. For example, for a simple base tree consisting of a join
with an aggregation above it, the logical enumerator AGGR_THRU_JOIN can create a logical alternative where
the aggregation is positioned below the join.

Directly after this, the SQL optimizer tags the applicable physical operators to each node. Examples for the
join operation include Hash Join, Nested Loop Join, and Index Join. Due to the physical operators that have
been added, there are now multiple alternative plans. Each alternative is evaluated based on its costs and
only the least expensive ones are retained (the others are pruned away). Having completed the work at the
bottom of the tree, the optimizer moves on to the upper nodes, for example a filter, limit, or possibly another
aggregation or join that is positioned above the join and aggregation (base tree) already described. Then, the
logical alternatives and physical alternatives are created on these nodes, the costs estimated, and pruning
applied. This round of activities is repeated from the bottom node up to the top.

An enumeration limit is in place to ensure that the plan does not become too complex. By default, the
enumeration limit is 2,000. This means that the maximum number of alternatives the optimizer can create
during enumeration is 2,000. This is shown as the "Enumeration Boundary" below:

Although it is generally beneficial to have a limit on the number of enumerated alternatives, it can sometimes
be disadvantageous. This might be the case when the optimizer fails to find the optimal plan simply because
the optimal point is too far away, so the enumeration limit is reached first and compilation completed. In this
case, the result of the compilation is inefficient and slow, which leads to another type of performance issue.

5.2.3 Decisions Not Subject to the SQL Optimizer

When cost-based enumeration ends, the final query execution plan is generated (called "code generation").
This is the plan that will be executed by the execution engines. At this point, the role of the SQL optimizer also
ends.

After code generation, the query plan can still be changed but only by the execution engines. This is called
"engine-specific optimization" because the changes are determined and made by the execution engines.

Engine-specific decisions vary from engine to engine. An example of an engine-specific decision is the parallel
thread decision made by the join engine. During the optimization process, the SQL optimizer does not consider
aspects related to memory management and CPU. Its primary task is to make the plan run faster and to do
so it uses all available resources to a maximum. Whether the query should run on multiple threads is not part
of its decision-making. The join engine is responsible for making this decision. That is why you need to use an
engine-specific hint and not an SQL optimizer hint to enforce parallel execution for the query.

Related Information

HINT Details

5.2.4 Query Optimization Steps: Overview

The key to solving a performance issue is to find the hot spot. This is the point at which the compilation
process no longer runs in an expected manner. To be able to identify the underlying issues, you require a
thorough understanding of the query optimization steps.

You need to know the specific activities that happen during each step of the query optimization process, as well
as the other activities that occur in the previous and subsequent steps, to be able to pinpoint the root cause of
the problem and devise a workaround.

An overview of the entire process is shown below:

5.3 Analysis Tools

This section provides an overview of the tools and tracing options that are available.

Related Information

SQL Plan Cache [page 47]
Explain Plan [page 54]
Plan Visualizer [page 61]
SQL Trace [page 69]
SQL Optimization Step Debug Trace [page 72]
SQL Optimization Time Debug Trace [page 79]
Views and Tables [page 84]

5.3.1 SQL Plan Cache

The SQL plan cache is where the plan cache entries are stored for later use. Each time a query is executed,
the SQL plan cache is checked to see if there is a cached plan for the incoming query. If there is, this plan is
automatically used.

This means that generally you do not need to search for a cache as part of ordinary database administration.
However, when the execution of a query has been slow and you need information about it and the related
statistics, the SQL plan cache is the place to look.

Finding an entry in the SQL plan cache can be difficult because the table contains many primary keys. However,
a query is usually always executed in the same client with the same user. In daily business, it is generally
the same person who logs into a system with the same user account and runs the same report in the same
manner. So if all this applies, it is not very difficult to find the cache. You simply need to set a filter that contains
part of the query that you always use on the STATEMENT_STRING column. You should get a single cache entry
that has been used recently for repeated executions.

The difficulty arises when there are many users that run the same or similar statements from many different
client environments with different variables from different schemas. Here, you first need to know which of
those variants caused the performance issue that you are analyzing so that you do not have to manually go
through multiple lines of search results from M_SQL_PLAN_CACHE to find the entry that you are looking for.

You can search the SQL plan cache using an SQL statement like the following:

SELECT
"STATEMENT_STRING", "IS_VALID", "LAST_INVALIDATION_REASON", "STATEMENT_HASH",
"USER_NAME", "IS_VALID", "PLAN_ID", "EXECUTION_COUNT", "PREPARATION_COUNT",
"LAST_EXECUTION_TIMESTAMP", "LAST_PREPARATION_TIMESTAMP"
FROM "M_SQL_PLAN_CACHE"
WHERE "STATEMENT_STRING" LIKE '%<part of your statement>%'
AND "HOST" = '<host_name>'
AND "USER_NAME" = '<database user name>';

HOST

The host matters particularly when the tables used as data sources are partitioned and distributed across the
hosts. This is because of the way in which the SQL plan cache integrates multiple physical tables that are
host-dependent and reside separately.

In other words, different cache entries are stored when the same query is compiled on host A and on host B.
The plan compiled on host A is stored in host A with value "A" in the HOST column, and the plan compiled on
host B is stored in host B with "B" in the HOST column. Later, when M_SQL_PLAN_CACHE is called from any
of the hosts, both records appear in the result. The location of the compilation is decided on demand by the
statement routing rules when the query is executed.

Since this is the underlying mechanism of the SQL plan cache, you will get double, triple, or even more records
when you search the M_SQL_PLAN_CACHE monitoring view using only the statement string filter. Therefore,
host information is important to find the correct entry.

An example with multiple hosts and distributed tables is shown below:

USER_NAME and SCHEMA_NAME

When the sharing type of an SQL plan is GLOBAL (as opposed to SESSION LOCAL, for example), the plan is
shared concurrently with other connections. Even so, because the user is part of the plan cache key, different
entries are stored in the plan cache for different execution users. This affects the cache hit ratio when the
identical query is repeatedly executed by multiple users.

If the expected cache hits for your query are missing, it is highly likely that you have logged in with a different
user, for example, "TRACE_USER" or "ADMIN_USER" for tracing or administration tasks, whereas the cache
might have been compiled by one of the business users like "SAPABAP1" or "SAPPRD".

To ensure that you have a cache hit, you must log in with the user experiencing the performance issue or
long-running threads. To search for one specific query plan in the SQL plan cache, you should set a user filter as
well as a schema filter.

SESSION_PROPERTIES

Different clients create different cache entries. This is controlled by the SESSION_PROPERTIES column. The
session property contains session context information, including the client interface, client type, and client
number.

This means, for example, that different entries are created when the same query is run from the SAP HANA
SQL console and from another BI analytics client. These entries might also have different users. Therefore,
if you are not certain which application or execution has caused the issue and you search using a single
STATEMENT_STRING, you should expect multiple search results.

STATEMENT_STRING

STATEMENT_STRING is the best known and most popular search criterion in the SQL plan cache because it is
intuitive and simple to use.

STATEMENT_STRING is handy because you can usually recall at least a few column names, or the
table, view, or procedure name used in the FROM clause directly after executing the query. In fact,
"STATEMENT_STRING LIKE '%<part-of-the-statement>%'" is a favorite among query tuning experts. However,
it is worth remembering that a microscopic approach through filtering other columns like user, schema, host,
and client can reduce the search time and make your work more accurate.

IS_VALID

For maintenance purposes, SQL plan cache entries can be invalidated or evicted.

Evicted plans no longer appear in the SQL plan cache. Invalidated plans are shown in the SQL plan cache, but
the IS_VALID column contains "false", indicating that they have become invalid. The reason why they were
invalidated is recorded in the LAST_INVALIDATION_REASON column.

In both cases the entries cannot be reused, so a new plan needs to be created. Therefore, if expected cache hits
are missing, it might be helpful to check whether the relevant plan has been invalidated or evicted.

EXECUTION_COUNT

The execution count is how many times the execute function has been called in SAP HANA for the given query
plan ID.

If the same query has been requested, for example 50 times by another system since it was first compiled, its
execution count should be 50 unless it is a parameterized query.

You can use this column to check whether the query concerned has been used every time it was requested. It is
generally assumed that everyday executions always have cache hits and use existing plans, whereas in fact
new plans might be compiled and created each time. The execution count can shed some light on this matter.

AVG_EXECUTION_TIME and MAX_EXECUTION_TIME

The most important figure for performance issues is execution time. However, it is difficult to get the exact
execution time without running the query, which can result in long-running or out-of-memory (OOM) issues or
even cause the system to crash.

To get a more exact figure from the history, it is recommended to compare the average execution time and
maximum execution time of the existing plan because that indicates whether its performance is consistent.

LAST_EXECUTION_TIMESTAMP

The LAST_EXECUTION_TIMESTAMP column is the simplest to use because the query you want to find is
usually the one you have just run. By simply sorting the entries by this column in descending order, you should
find the entry that you are looking for.
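
For example, the following sketch combines these columns, sorting by the last execution timestamp and
returning the average and maximum execution times so that you can judge whether performance is consistent:

 SELECT TOP 20 "PLAN_ID", "AVG_EXECUTION_TIME", "MAX_EXECUTION_TIME",
 "LAST_EXECUTION_TIMESTAMP"
 FROM "M_SQL_PLAN_CACHE"
 ORDER BY "LAST_EXECUTION_TIMESTAMP" DESC;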

Related Information

SQL Plan Cache Invalidation Reasons [page 50]

5.3.1.1 SQL Plan Cache Invalidation Reasons

For invalidated plans shown in the SQL plan cache, the IS_VALID column contains false, indicating that they
have become invalid. The reason why they were invalidated is recorded in the LAST_INVALIDATION_REASON
column.

Invalidation can be triggered by system changes or by the user. Invalidation reasons include the following:

In the list below, the trigger of the invalidation (SYSTEM or USER) is shown in parentheses after each reason.

• ADDITIONAL RELATED OBJECT FOUND (SYSTEM): Check if related objects were added after OID collection for lock acquisition.
• AUTO RECOMPILE TRIGGERED (SYSTEM): When the user enables the auto recompilation feature and a plan meets its conditions to be recompiled.
• CLEAR CACHE (USER): When the user tries to clear the cache. Plans which are under execution cannot be evicted, so these plans are invalidated instead of evicted.
• CONNECTION IS CLOSED (USER): When the connection is closed. SESSION LOCAL plans, which are valid within a specific session, are evicted. If the plan is under execution, these plans are invalidated instead of evicted.
• EMPTY SHARING USER SET (SYSTEM): For the inter-user shared plan. If the sharing user pool of the target plan is empty, the plan is invalidated. This situation usually happens when the relevant user is dropped.
• ENTRY RECOMPILED (SYSTEM): When the user tries to invalidate the target plan. This is the intended operation by the user.
• EVICT-VICTIM NOT EVICTED (USER): When the user tries to add, remove, enable, or disable the statement hint. The plan cache selects a specific entry as an eviction victim, but it cannot be evicted for some reason. In this case, the plan cache just sets these types of entries as invalidated.
• EXCEPTION WHILE COMPILE (SYSTEM): If an exception occurs during compilation, recompilation is needed.
• FALLBACK COMPILE OPTION (SYSTEM): This invalidation type is used when a query is optimized for HEX, but during execution a fallback to the OLAP or join engine is needed.
• INCONSISTENCY_OF_REPLICATION_STATUS (SYSTEM): Check the following three replication-relevant conditions to find the inconsistency of the replication status. If all three conditions are true, the plan is invalidated:
  • is_route_for_async_replication
  • use_trexnet_execution
  • isRemoteQueryExecution
• INVALIDATED DUE TO DYNAMIC PLAN CONDITIONS (SYSTEM): When the dynamic plan condition is not valid, the plan is recompiled.
• LOCATION TIMESTAMP MISMATCH (SYSTEM): When there is a location timestamp change. Distributed plans are usually recompiled; if there is a difference between the cached plan-related location timestamps and the current location timestamps, it leads to recompilation.
• METADATA TIMESTAMP CHANGED (SYSTEM): If the metadata timestamp has changed during compilation, recompilation is needed.
• NON-USER-OID NOT FOUND (OBJ-ID <id>) (SYSTEM): When the non-user object has disappeared and it is a related object of the target plan, recompilation is needed.
• NO EXECUTABLE PLAN (SYSTEM): When the plan does not have an executable plan, recompilation is needed.
• OBJECT TYPE MISMATCH (OBJ-ID <id>) (SYSTEM): When the cached object is different from the metadata object (which resides remotely), recompilation is needed.
• OBJECT VERSION MISMATCH (OBJ-ID <id>) (SYSTEM): When the cached object's version is different from the metadata object's version, recompilation is needed.
• OID NOT FOUND (OBJ-ID <id>) (SYSTEM): When the remote plan's related object has disappeared, recompilation is needed.
• ONLINE INDEX RUNNING (OBJ-ID <id>) (SYSTEM): A new online index is being created concurrently. The previously compiled plan should be recompiled.
• PROXY TABLE UPDATED (SYSTEM): When the status of the proxy table is changed, the plan should be recompiled.
• PUBLIC SYNONYM OBJECT CREATED (OBJ-ID <id>) (USER): When a PUBLIC SYNONYM OBJECT is created with the same OID in the related objects, recompilation is needed.
• PUBLIC SYNONYM SEQUENCE CREATED (OBJ-ID <id>) (USER): When a PUBLIC SYNONYM SEQUENCE is created with the same OID in the related objects, recompilation is needed.
• RECOMPILE REQUIRED (SYSTEM): If a remote query execution failed and the error code is ERR_RETRY_STATEMENT, recompilation is needed.
• RECOMPILE TRIGGERED DUE TO PARAM VALUES (SYSTEM): The initial precompiled plan is invalidated by a parameter binding.
• REPEATABLE READ OR SERIALIZABLE TRANSACTION (SYSTEM): When a REPEATABLE READ or SERIALIZABLE transaction is used, recompilation is needed.
• REPLICATED TABLE UPDATED (SYSTEM): When a replicated table that is relevant to the cached plan is updated, recompilation is needed.
• RESULT IS CLOSED (SYSTEM): When a plan has a view cache and its result is no longer valid, recompilation is needed.
• SESSION ANCHOR VOL TIMESTAMP MISMATCH (SYSTEM): There is a difference between the cached anchor volume timestamp and the target volume timestamp.
• SPECIFIED MVCC SNAPSHOT SET VIA QUERY HINT (SYSTEM): When the specified MVCC snapshot timestamp is set via a query hint, recompilation is needed.
• TABLE LOCATION CHANGED (OBJ-ID <id>) (SYSTEM): When the global temporary table's location is changed to ERN, the previously cached plan in the index server cannot be used, with the following exception: remote temp table access failure.
• TABLE STATISTICS CHANGED (OBJ-ID <id>) (SYSTEM): Table content has significantly changed since the last execution (definition of significance: more than 25 rows, change happens more than 2 times). Related configuration: parameter cs_statistics_cache_invalidation_threshold_factor in the cache section of the indexserver.ini file (default value: 2.0; to disable this feature: 0.0). For control at statement level, add the hint dev_validate_plan(use_table_statistics|no_use_table_statistics).
• USER-OID NOT FOUND (OBJ-ID <id>) (SYSTEM): When the user object has disappeared and it is a related object of the target plan, recompilation is needed.
• WITHOUT-FALLBACK RECOMPILE (SYSTEM, HEX): When the plan needs to be recompiled for any reason within HEX, it is recompiled by HEX again.
• WITH-FALLBACK RECOMPILE (SYSTEM, HEX): When the plan needs to be recompiled for any reason within HEX, it is recompiled by non-HEX.
• WITH-DISK RECOMPILE (SYSTEM, HEX): When the plan needs to be recompiled because a disk operation is needed.
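
To see which cached plans are currently invalid and why, you can query the monitoring view directly, as in the
following sketch:

 SELECT "PLAN_ID", "STATEMENT_STRING", "LAST_INVALIDATION_REASON"
 FROM "M_SQL_PLAN_CACHE"
 WHERE "IS_VALID" = 'FALSE';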

5.3.1.2 Maintenance

Although it is sometimes best to clear up the whole plan cache before you reproduce an issue, there might be
occasions where your work requires an SQL plan cache hit.

Since there is no clear answer as to whether to clear up the plan cache, it is best to have several maintenance
mechanisms at hand and to apply the appropriate ones when needed.

You can clear the plan cache or recompile a plan by using the ALTER SYSTEM statement. Often you just want
to avoid a cache hit for one particular SQL statement, leaving the existing plan cache intact for future use. You
can do so by adding the hint IGNORE_PLAN_CACHE at the end of the statement so that when the query is
executed the plan cache lookup is skipped and no cache entry is created.

You can use SQL statements like the following to manage the SQL plan cache:

ALTER SYSTEM CLEAR SQL PLAN CACHE;
ALTER SYSTEM RECOMPILE SQL PLAN CACHE ENTRY '<plan ID>';
SELECT "COLUMN1", "COLUMN2" FROM "TABLE1" WITH HINT (IGNORE_PLAN_CACHE);

5.3.2 Explain Plan

The Explain Plan is a quick lightweight tool that shows you a compiled plan in tabular form without executing it.

To display a plan in tabular form, the Explain Plan saves the plan results in a table called
EXPLAIN_PLAN_TABLE, selects the data to display, and then deletes it straight afterwards since the
information does not need to be kept.

This set of processes is started through the "EXPLAIN PLAN FOR …" statement. It can be triggered by manually
running the SQL statement or using the internal features of the SAP HANA studio.
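
As a sketch of the manual approach (the statement name, columns, and table are placeholders), you can name
the plan when generating it and then select the rows from EXPLAIN_PLAN_TABLE yourself:

 EXPLAIN PLAN SET STATEMENT_NAME = 'my_stmt' FOR
 SELECT "COLUMN1", "COLUMN2" FROM "TABLE1";
 SELECT "OPERATOR_NAME", "OPERATOR_DETAILS", "OUTPUT_SIZE", "SUBTREE_COST"
 FROM "EXPLAIN_PLAN_TABLE"
 WHERE "STATEMENT_NAME" = 'my_stmt';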

Using SAP HANA Studio Features

Using the SAP HANA studio SQL console, you can start the Explain Plan by choosing Explain Plan in the
context menu or by selecting the whole statement and pressing CTRL + SHIFT + X . Either way, the result
shows that a plan has been created, selected, and then deleted.

An example is shown below:

Since these processes all occur at once, there is no room for intervention. You cannot revisit the Explain Plan
you have just created, because it has already been deleted from the table; you would need to recreate it first. In
such cases, instead of capturing the results directly from the query, it is more useful to capture them from an

existing entry in the SQL plan cache. Caches are always stored for later use (unless evicted) and therefore their
plans as well.

Using the SQL Plan Cache and Plan ID

The best way to obtain the Explain Plan is to use the SQL plan cache, which has a consistent state and ensures
that the Explain Plans of the currently cached plans are also available.

You can use SQL statements like the following to obtain the Explain Plan from the SQL plan cache:

SELECT "PLAN_ID"
FROM "M_SQL_PLAN_CACHE"
WHERE "STATEMENT_STRING" LIKE '%<part of the SQL statement string>%';
EXPLAIN PLAN FOR SQL PLAN CACHE ENTRY <plan ID>;

Using the SQL plan cache, you can always revisit the Explain Plan based on its plan ID, which facilitates
performance troubleshooting.

Often in performance tuning projects and performance consulting, the issue cannot be reproduced. A
performance issue experienced at a customer site long ago no longer occurs even when the same query is
executed at the same site in the same manner by the same user. Also, if a trace no longer exists, the issue can
no longer be tracked.

In contrast, users generally expect the system to capture every single plan that was executed so that they can
revisit any of these plans at a later point in time. In this respect, the Explain Plan is the one and only trace that
can ensure that the displayed tabular plan is the one that was compiled in the early phase of the issue because
it pulls the plan directly from the SQL plan cache using the plan ID.

Note that even for the Plan Visualizer, you still require the exact SQL statement so that it can utilize the cache,
and it is not always easy to match that statement with the exact cache entry you want. The Explain Plan
is therefore recommended as the first performance tuning strategy for developers and performance tuning
experts.

Related Information

Analyzing the Explain Plan [page 56]


Explain Plan Characteristics [page 61]
EXPLAIN PLAN Statement (Data Manipulation)

5.3.2.1 Analyzing the Explain Plan

The Explain Plan provides a compiled plan in tabular form without it having to be executed. It provides a list of
operators used to execute a query and the relevant information about them.

An Explain Plan is shown in the example below. The entries in the OPERATOR_NAME column are incrementally
indented, where the indentation represents each operator's level relative to the root operator:

Hierarchy Information

The Explain Plan provides information about the hierarchy. The values of the OPERATOR_ID,
PARENT_OPERATOR_ID, LEVEL, and POSITION columns let you infer the location and the level of the operator
within the tabular plan. For example, the values given in the following table would represent the hierarchy
shown below it:

OPERATOR_NAME    OPERATOR_ID    PARENT_OPERATOR_ID    LEVEL    POSITION
COLUMN SEARCH    1              NULL                  1        1
JOIN             2              1                     2        1
COLUMN TABLE     3              2                     3        1
COLUMN SEARCH    4              2                     3        2
AGGREGATION      5              4                     4        1
JOIN             6              5                     5        1
COLUMN TABLE     7              6                     6        1
COLUMN TABLE     8              6                     6        2

Each node is given a unique operator ID, and nodes can be located using their parent operator ID. For
example, the second column search from the top (ID 4) is the child of its parent join (ID 2) and the parent of its
child aggregation (ID 5).

Along with the parent-child hierarchy, the Explain Plan also offers a level-based hierarchy by simply displaying
the level on which each node resides. For example, the column search (ID 4) is on the same level as the table
(ID 3), that is, on level 3, where the table (ID 3) is the first child and the column search (ID 4) the second child,
and their positions are indicated by 1 and 2 respectively.

Operator Information

The OPERATOR_NAME, OPERATOR_DETAILS, and OPERATOR_PROPERTIES values provide information about


the operators. Operator names range from the simple JOIN, FILTER, and LIMIT operators to SAP HANA-
specific operators like COLUMN SEARCH, ROW SEARCH, and MATERIALIZED UNION ALL.

Some of the operators, like FILTER and LIMIT, are not engine-specific, so you need the EXECUTION_ENGINE
information to determine whether they are column engine operations or row engine operations. The operator
details provide information about filter conditions, join conditions, aggregation type, grouping columns,
partition information, and so on. This information is generally human readable, so you can use this column
to check for problems, such as an unwanted grouping column during an aggregation operation, a complex
calculation performed simultaneously with other operations like an aggregation or join, or redundant columns
pulled for projection.

Size and Cost Information

Information about the estimated size and cost is provided in the columns TABLE_SIZE, OUTPUT_SIZE, and
SUBTREE_COST.

Table Size
Table size is the estimated number of rows in the base tables. Estimations are not shown for any of the
operators but only for the tables that are directly accessed, for example, physical column-store tables and
physical row-store tables. Table size is one of the figures which you can easily obtain using the SQL console and
which can sometimes help you find a root cause. In rare cases when the table size estimation is incorrect, it
can cause the entire size and cost estimation of the plan to be wrong. Therefore, this column is worth checking
particularly when a thorough analysis has not helped you find the root cause of a performance degradation.
Also, if the tables are too small, too big, or they do not have any data at all, there is a higher risk of incorrect
table size estimation, which may need your attention if there is a performance issue.

Output Size
Output size is the estimated output size of each operator. For example, if the output size of the join with the
operator ID 4 (denoted as "4. Join") in the example below is 30, this means that the optimizer expects the join

to generate 30 rows if you select the data right after that join. In other words, the intermediate result of this
subtree is equal to 30:

Whenever you address any kind of performance issue, you focus on the operators, levels, engines, and any
large or comparatively large sizes. The Explain Plan makes it easier to spot the level with a large estimated
output size. This is a starting point for analyzing the performance issue.

Subtree Cost
Subtree cost is the calculated cost of the subtree. Since cost is essentially the result of arithmetic calculations
that use size as their core variable, the same applies to subtree cost. Subtree cost is to some degree proportional
to subtree size (the bigger the subtree, the larger the cost). Cost values are relative rather than absolute and
do not have a unit of measure. A large subtree cost is a relative measure of the time needed to execute the
subtree, so the final goal of your performance tuning should be to minimize the subtree cost.

A subtree is shown in the example below:

Related Information

EXPLAIN_PLAN_TABLE System View

5.3.2.2 Explain Plan Characteristics

The Explain Plan is a fast lightweight tool that provides a trace in tabular form.

Fast and Lightweight

One of the main advantages of the Explain Plan is that it does not require the query to be executed. This
makes the Explain Plan useful when it comes to hard-to-reproduce issues like out of memory, system hang,
and system crash. These types of issues should not be reproduced, or, if they must be, this should be done
with great care, because reproducing them means causing identical system failures. Therefore, as
a performance consultant, performance tuning expert, or developer, you should not run the SQL statement
at issue, particularly when it is in a production system. In those circumstances the Explain Plan is very useful
because the query only needs to be compiled.

Tabular Data

A tabular form makes it easier to understand, compare, and analyze information. Since the Explain Plan is the
one and only trace that is in tabular form, it can be of great use when summarized information is needed.
The Explain Plan is particularly good for analyzing small-sized plans because the whole table can fit onto the
screen. You can easily recognize the levels through the indentation, see patterns, and follow figures to find the
hot spot.

Repeated Engine Usage Pattern

It is sometimes easier to see execution engine patterns with the Explain Plan than with the Plan Visualizer even
though the Plan Visualizer is unquestionably easier to follow due to its visualized format. However, when more
than three engines are involved in one plan, the shape of the tree cannot be clearly seen in the Plan Visualizer.
Although this is not often the case, the pattern of execution engine changes available in the Explain Plan is
sometimes better than in the Plan Visualizer.

5.3.3 Plan Visualizer

The Plan Visualizer (PlanViz) is used to create a visualized plan. A visualized plan is a visual map of the
operators and their relationships and hierarchies, which are described in a large single tree.

The Plan Visualizer might seem simple because it just makes existing information visual, but it is a powerful
tool. You typically use it multiple times throughout an entire analysis: at the beginning to find a hot spot, in the
middle to get more ideas and information, and at the end to confirm your findings.

Although the Plan Visualizer is intuitive, it is also complicated and comprehensive at the same time. This means
that it can serve as both a starter and an enhancer depending on your context, the level of analysis you are
applying, the information you are looking for, and how deeply involved you are with the Plan Visualizer.

Single Statement Tracing

Like the Explain Plan, the Plan Visualizer is a feature of the SAP HANA studio. You can use the Plan Visualizer by
selecting the statement you want to trace and choosing Visualize Plan in the context menu.

Offered as a sub-option, the trace can be done after execution or just after preparation. The Executed Plan
provides actual information and not only planned information. Actual information can include the actual sizes
of the used operators, the inclusive/exclusive execution time for each level, the execution timeline, network
information, thread information, and so on. In contrast, the Prepared Plan only shows the data that is available
before the execution, such as the estimated size. Compared to the Executed Plan, the Prepared Plan is
lightweight but lacking in the information it provides. Most of the time, you will find the Executed Plan more
useful.

An Executed Plan and Prepared Plan are shown in the example below:

The statements that you can trace with the Executed Plan include procedure calls as well as plain SELECT
statements. In the case of procedure calls, there are multiple visualized plans for the individual SELECT
statements within the procedure. The number of statements and how they are formatted depend on how the
procedure is inlined.

Time-based Multiple Statement Tracing

The Plan Trace is another way of tracing SQL queries and their execution plans. For example, if you go to the
trace section of the Administration editor in the SAP HANA studio, the Plan Trace is on the bottom right.

The Plan Trace creates a visualized plan for all statements that are executed during the trace time. The scope
of the trace is not confined to a single statement. Therefore, it is only useful when you want to make sure that
every statement is executed.

A case in point is when one of the clients running multiple statements on the database has a performance issue,
but it is not certain which of those statements is causing the issue. Other traces like the debug traces can also
be used in those circumstances. In fact, the debug traces are recommended over the Plan Trace because the
Plan Trace has a very high memory usage. If you need traces for all SQL statements that are executed during a
certain time frame, you should opt for lighter traces like the debug traces unless the visualization is critical for
your analysis.

 Note

The SQL analyzer tool for SAP HANA can be used instead of the SAP HANA studio (deprecated) to
analyze the performance of SAP HANA SQL queries. It supports features that are available in the PlanViz
perspective of the SAP HANA studio, as well as new features. The tool is available as an extension for
Microsoft Visual Studio Code and SAP Business Application Studio.

For more information, see the SQL Analyzer Tool for SAP HANA Guide.

Related Information

Plan Visualizer Overview [page 64]


Node Information [page 64]
Logical versus Physical Executed Plan [page 65]
Actual versus Estimated Size [page 67]
Timeline, Operator List, Tables Used, Performance Trace, Network [page 68]
Plan Visualizer Characteristics [page 69]
NO_INLINE and INLINE Hints

5.3.3.1 Plan Visualizer Overview

After executing the Plan Visualizer or opening a Plan Visualizer file on the disk, the initial page you see is the
Overview. It contains all kinds of information, including most importantly the compilation time, execution time,
dominant operators, node distribution, SQL statement, and possibly system version.

An Overview is shown in the example below:

The time KPIs tell you how much time was consumed. The execution time indicates how critical an issue is.
An execution time of 2 seconds is not very critical even if users complain that the query is slow. But a single
SQL statement that takes 3,000 seconds to run can be very critical because users need to wait 50 minutes
for a result. This obviously also depends on the business use case and requirements. From a developer's
perspective, however, execution time can be regarded as a standard for understanding how important an issue
is.

Another factor to consider is the compilation time. A query running slowly is not necessarily the result of slow
execution. Around 20 percent of these issues are caused by slow compilation. Therefore, you should always
consider the compilation time as well, particularly when the execution time is not as high as reported. It is
highly likely that this difference is caused by the compilation time.

5.3.3.2 Node Information

One of the simplest ways to find hidden information in the Plan Visualizer is to hover your mouse over any of
the displayed nodes.

This method is particularly useful for getting information about join conditions, filter conditions, calculated
column information, expressions, projected columns, and even intermediate result names, and so on. However,
you should note that the displayed window disappears as you move your mouse around, so you may want to
copy and paste the information to a text editor to have a better look at it.

Hover to display more information, as shown in the example below:

5.3.3.3 Logical versus Physical Executed Plan

The logical plan provides the big picture and an overview of the plan, but it does not give detailed information.
Physical plans contain more detailed information, including the information which is provided by the execution
engines. Physical plans are usually complex.

The logical plan shows the shape of the query optimizer (QO) tree and contains structural information. Before
analyzing the physical plan, you should first analyze the logical plan to find starting points or a hot spot.
Generally, you should analyze the logical plan before the physical plan and not the other way around.

A logical plan and physical plan are shown in the example below:

Note that when analyzing a physical or logical plan, you don't have to change an entire plan from physical to
logical or vice versa just to view certain information. In the physical plan, column searches are folded by default
into a single node to ensure a compact display. To keep your physical plan compact or to retain the logical plan
but still see a physical plan of one of the logical nodes, you can open a separate tab to display the additional
information. To do so, select the node and from the context menu choose Open Inner Plan.

The example below shows how to open a physical or logical inner plan. It can either be selected as the default
type for the open tab under Default Inner Plan, or an additional tab can be opened with the physical or logical
inner plan by using the context menu:

5.3.3.4 Actual versus Estimated Size

To understand the behavior of the SQL optimizer, it is very helpful to compare the actual value to the estimated
value.

In the Executed Plan, there are two figures for data size. The figure in parentheses is the size estimated by
the optimizer and used during query compilation. The figure without parentheses is the actual figure and is
only available after execution. Since the optimization process of a query depends heavily on cost estimation
and therefore also on size estimation, a large difference between the two figures is significant. Optimization
is essentially an assumption despite its scientific, algorithmic, and mathematic traits, and assumptions are
only accurate if they have a sound basis. When a large difference is seen between the actual and estimated
sizes, the inaccuracy of the estimated size is a potential cause of the performance issue. A large difference is
generally regarded as a difference of over a million. Smaller differences might have a small impact or even no
impact at all.

When the difference is tracked down to the base table layer, the table size is sometimes found to have been
incorrectly estimated in the first place. If this is the case, you need to check other traces like the Explain Plan
and the debug traces to confirm your findings. It could be that the estimated table size is 10,000 whereas the
table is in fact empty. This behavior is expected because when a table is empty the optimizer considers its size
to be 10,000. This shows that your SAP HANA system is working correctly. In other cases, more investigation is
needed.

In the same way that empty tables can legitimately result in faulty size estimations, so can huge tables and
large operations. As intermediate results get bigger, the estimations are expected to become less precise.
In these cases, one effective measure is to reduce the intermediate sizes by using filters or SQL hints.

The example below shows an empty table with an estimated size of zero:

Apart from the difference in the figures discussed above, the size figure itself can also indicate the root cause
of the problem. For example, suppose an out-of-memory condition occurs that can be reproduced by executing
an SQL statement and its visualized plan is obtained. In the Plan Visualizer, you can easily see on which
operation the execution failed (unless the system did not have enough memory to even draw the visualized
plan). If the failure point is one of the join nodes and one of its children is very large, containing for example a
billion rows, this immediately gives you an idea why the operation failed. SAP HANA is an in-memory database

that requires memory to be used smartly and efficiently. It will run out of memory if it has to process such
a large number of records. It is worth remembering that the maximum number of records allowed in a single
table in SAP HANA is 2 billion and temporary tables are not an exception. So if the failed join in this case had
been materialized, the execution would have failed anyway.

The actual and estimated sizes are therefore important. It is highly recommended that you keep an eye on
these sizes whenever you start analyzing a performance issue.

5.3.3.5 Timeline, Operator List, Tables Used, Performance Trace, Network

Although it depends on how you organize your display, the panels in the SAP HANA studio that contain more
detailed information are by default located at the bottom. When you examine the visualized plan in the main
window, it is easy to overlook these additional windows.

The Timeline view provides detailed information about the duration of each operation, for example, the
duration of a fetch or pop operation for a single column search. The parallel thread information is generally
more important as well as any dominant time-consuming thread. In the main window of the visualized plan,
you cannot see which operations were started or executed at the same time or ended before the others. In the
Timeline view, all operations are shown on a timeline and the parallel executions are explicitly displayed with
their starting points and completion points.

The Timeline view is also useful when there is a dominant thread that cannot be explained. On the Overview
tab, it is simple to pinpoint the dominant operator by clicking Dominant Operators. In the Timeline view, you
can analyze whether the performance lag is the result of a slow pop operation or other slow processes, like
a slow fetch operation or slow compilation. If it proves to be a slow pop operation, you can start analyzing
the plan and tackling the hot spot. If the slowness is caused by non-reducible factors like opening, fetching
and closing, this is outside the scope of your analysis. If it concerns compilation, like ce-qo (calculation engine
query optimization) compilation, you need to investigate it further using the debug traces.

In the example below, the first column search that is outlined is problematic, the second is not:

The Operator List, Tables Used, and Performance Trace are useful when you want to search for a text or order
the operators because they are in a tabular form. For example, when the plan is complex and you want to focus
on a specific filter or expression, you can search for the terms here. You can also order the operators by their
exclusive execution times in descending order so that you can compare the durations between the operators.

The Network view allows you to check the network packages transmitted between the different nodes. This can
be very useful and even decisive for root-cause analysis, particularly when there is a network issue and the
Network view shows abnormal package transfers. It is worth checking the Network view when multiple nodes
are involved.

5.3.3.6 Plan Visualizer Characteristics

The Plan Visualizer is a good tool to start with because it lets you see a query plan in a visualized manner.

Visualized Tree

By zooming out of the tree, you can get a better overview. The overview can give you an idea of the query's
complexity, volume, repeated tree patterns, and repeated operators. This is beneficial when there are two or
more traces that you want to compare, for example, you have a plan that runs slowly and another plan with
the identical query that runs faster. To find the differences, you can open both visualized plans side by side and
compare the zoomed-out trees.

A visualized plan is a huge file that contains complex information distributed in several locations like bottom
panels, hover boxes, and side windows. The Plan Visualizer serves as both a starting point and a finishing point.
It gives you a broad (macroscopic) and a detailed (microscopic) view of the same plan at the same time.

Actual Size of Executed Operators

Actual execution information is only available in the Plan Visualizer. It is not provided by the Explain Plan. Actual
sizes are shown without parentheses whereas planned figures are shown within parentheses. A large difference
between the two figures can be significant.

The actual size shown in the Plan Visualizer will not be applicable to every case you have, because this trace
is only available after the plan has been executed. Let's say there is a query raising an out of memory (OOM),
but you don't have any traces. Although you know that the Plan Visualizer could help your analysis, you cannot
create this trace, because the system would run out of memory again. It is even worse when the case concerns
a system crash. In the OOM case you would get a visualized plan at the cost of an OOM. In the case of a system
crash, your only option is to use the Explain Plan to analyze the plan because it doesn't require the query to be
executed. So you need to be aware of the pros and cons of the Plan Visualizer and Explain Plan so that you can
apply the correct one when needed.

5.3.4 SQL Trace

The SQL trace works on the session layer and captures every single SQL statement that enters the database.

When you cannot find an SQL statement in the SQL trace there are two possible reasons. Either the statement
was never executed in this database or the statement never reached the session layer because it was executed

internally, in which case it would not have had the form of an SQL statement. One exception is where a
statement is propagated internally and transmitted back to the session layer in the form of an SQL statement.
This statement is not captured by the default SQL trace setting. The internal statement can be logged if you
enable the "internal statement" configuration setting, as described below.

Single Statement Tracing

You can configure the SQL trace in two ways, on the Trace Configuration tab of the Administration editor in the
SAP HANA studio or by simply running an SQL statement in the SQL console.

Enabling the Trace with SAP HANA Studio


The SQL trace is on the top right of the Trace Configuration tab of the Administration editor in the SAP HANA
studio. You can activate the trace by clicking the pencil icon and entering the appropriate context information.
It is recommended that you use as many filters as possible because otherwise the trace file will become very
large. It is important to use file-based traces efficiently because it can take a very long time to download a large
trace file. This applies not only to the SQL trace but to all other kinds of configurable or file-based traces as well,
such as the SQL Optimization Step Trace.

You can turn off the SQL trace by clicking the same pencil icon again and setting the trace to inactive. You can
find the trace file on the Diagnosis Files tab of the Administration editor.

Enabling the Trace with SQL


You can enable and disable the SQL trace by running the appropriate SQL statements in the SQL console. The
SQL statements for setting system configuration, including SQL trace configuration, start with ALTER SYSTEM.
The ALTER SYSTEM statement is used more frequently than the Administration UI because it is available even
without the SAP HANA studio interface and is therefore very convenient to use.

You can enable the SQL trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('sqltrace','trace') = 'on',
      ('sqltrace','user') = '<database_user>',
      ('sqltrace','application_user') = '<application_user>'
  WITH RECONFIGURE;

You can disable the SQL trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'System')
  UNSET ('sqltrace', 'trace'), ('sqltrace','user'),
        ('sqltrace','application_user')
  WITH RECONFIGURE;

Multiple Statements from a Procedure Call

Some SQL statements are initiated internally and then sent all the way back to the session layer. A
representative case in point is when a procedure is run. Although a procedure is a collection of multiple SQL
statements, the statements contained in the definition are not executed individually.

When compiling the procedure, the SQLScript optimizer combines two or more of the statements if their
combined form is considered to be more efficient. This process is called inlining and is usually beneficial for

most procedures. Unless configured otherwise or prevented through hints, a procedure is inlined, so you never
know what SQL statements the procedure was divided into. This information is available when the SQL trace
has the "internal statement" option activated. Because the SQL statements spawned from the procedure are
built and executed internally, they are only captured if the SQL trace is explicitly configured to include internal
statements. This is done by enabling the internal option (setting it to true) for the SQL trace. When this option
is enabled, the internal statements are recorded in the SQL trace. If the query_plan_trace is also enabled and
the level option set appropriately, the trace will also include each statement's plan explanation as well as its
records, which are shown in arrays.

You can enable the internal SQL trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('sqltrace','trace') = 'on',
      ('sqltrace','user') = '<database_user>',
      ('sqltrace','application_user') = '<application_user>',
      ('sqltrace','query_plan_trace') = 'on',
      ('sqltrace','internal') = 'true',
      ('sqltrace', 'level') = 'all_with_results'
  WITH RECONFIGURE;

You can disable the internal SQL trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'System')
  UNSET ('sqltrace', 'trace'), ('sqltrace','user'),
        ('sqltrace','application_user'), ('sqltrace','internal'),
        ('sqltrace','query_plan_trace'), ('sqltrace', 'level')
  WITH RECONFIGURE;

Related Information

Analyzing the SQL Trace [page 71]

5.3.4.1 Analyzing the SQL Trace

The purpose of the SQL trace is to find the statement at issue.

The following example shows an SQL statement and the context information that can help you find it:

It is important to know the time frame during which the statement ran. A narrower time frame will make it
easier. If you don't know the thread number or the transaction ID associated with the statement, you should
search for the schema name instead. In the editor where the trace file is open, search for the name of the
schema that contains the relevant data. There might be several statements that were executed during the
specific time frame using the tables in the specified schema. You should be able to find other information such
as the thread and transaction ID.

The SQL trace can also be used in reverse. For example, there might be cases where you already know the
thread number and you want to track down the execution time or the statement string. By juggling these key
pieces of information, you should be able to find the details you need.

5.3.5 SQL Optimization Step Debug Trace

The SQL optimization step debug trace (also referred to as the SqlOptStep trace) logs the shape and contents
of the query optimization (QO) tree captured in each step of the optimization.

As described in the SQL optimizer topics, there are two main categories of optimization, rule-based rewriting
and cost-based enumeration. In the case of rule-based rewriting, the SQL optimization step trace shows what
the QO tree looks like after each rule has been applied. For example, you can see that the QO trees are different
before and after calculation view unfolding. In the case of cost-based enumeration, the SQL optimization step
trace shows the logical tree and the physical tree of the final execution plan. While the logical execution plan
indicates which operators are located where, the physical execution plan tells you which types of operators
were selected.

The SQL optimization step trace also provides more detailed information at the debug level. This
includes estimated sizes, estimated subtree costs, applied rule names, applied enumerator names, partition
information, column IDs, relation IDs, and much more.

The fastest and most effective way to learn about the SQL optimization step trace is to use it yourself. This
means working through all the activities required to collect the trace and analyze it.

The example below shows the format of the SQL optimization step trace:

Collecting the SQL Optimization Step Trace

The SQL optimization step trace logs the optimization steps applied by the SQL optimizer and can therefore be
used to investigate the optimizer's behavior during the optimization phase. The trace is only produced during
query compilation.

It is important to ensure that the query undergoes the compile phase because otherwise the SQL optimization
step trace will be empty. If the query is not compiled, the result will be a cache hit, the SQL optimizer will be
skipped, and the executors will use the cached plan.

There are two ways to obtain an SQL optimization step trace when a cache already exists. You can either
delete the cache or force the query to skip the cache hit. For information about how to delete the SQL plan
cache, see the SQL plan cache Maintenance topic. You can force the query to skip the cache by using the

hint IGNORE_PLAN_CACHE. This prevents the query execution from reading from and writing to the SQL plan
cache.
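
For example, before recollecting the trace, either of the following could be used (the statement is a placeholder;
see also the Maintenance topic):

 -- Option 1: clear the plan cache so that the next execution compiles again
 ALTER SYSTEM CLEAR SQL PLAN CACHE;
 -- Option 2: bypass the cache for this statement only
 SELECT "COLUMN1", "COLUMN2" FROM "TABLE1" WITH HINT (IGNORE_PLAN_CACHE);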

Related Information

Maintenance [page 54]


Enabling the SQL Optimization Step Trace with SQL [page 74]
Using Tags for Easier Analysis [page 75]
Extra Filters for a More Efficient Trace [page 75]
Session Debug Trace [page 76]
Analyzing the SQL Optimization Step Trace [page 76]
Optimized Logical and Physical Plans [page 78]

5.3.5.1 Enabling the SQL Optimization Step Trace with SQL

The most basic way to enable and disable the SQL optimization step trace is to use the ALTER SYSTEM
statement and set the parameter ('trace', 'sqloptstep').

The trace will then be included in the index server trace, the database trace that is always on by default to
log every activity that occurs on the index server at the INFO level. By enabling the additional trace, you are in
effect raising the severity level of the index server trace for this specific component. To make your investigation
more efficient, you might want to add a user filter by setting the sql_user parameter to the database user ID.
This should give you a shorter but more productive result.

But in most cases, you need a separate trace file rather than an index server trace that
contains the SQL optimization step trace together with all kinds of other INFO traces. The
"traceprofile" configuration enables you to customize the name of the file to which the trace will
be written. For example, by setting the configuration to traceprofile_MyTrace01, a file named
indexserver_<hostname>.<port>_MyTrace01.trc will be written which only contains the SQL
optimization step trace.

Although the simple way of enabling the trace with ('trace', 'sqloptstep') has been mentioned, this
is only to give you an idea of how the trace works. For most use cases, the more practical and preferred
method is to use ('traceprofile_<keywords>', 'sqloptstep') to obtain a separate trace file. You
can also use this parameter to indicate the purpose of the trace, for example, you could use keywords like
('traceprofile_20190620_17sec', 'sqloptstep') or ('traceprofile_SqlOptStep_GoodCase',
'sqloptstep'). These labels will help you spot the file you need in the list more easily.

You can enable and disable the SQL optimization step trace with the following SQL statements:

• Basic method:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('trace','sql_user') = '<database_user>',
      ('trace','sqloptstep') = 'debug'
  WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  UNSET ('trace','sql_user'), ('trace','sqloptstep')
  WITH RECONFIGURE;

• Advanced method:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('traceprofile_MyTrace01','sql_user') = '<database_user>',
      ('traceprofile_MyTrace01','sqloptstep') = 'debug'
  WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  UNSET ('traceprofile_MyTrace01','sql_user'),
        ('traceprofile_MyTrace01','sqloptstep')
  WITH RECONFIGURE;

5.3.5.2 Using Tags for Easier Analysis

When you start any type of trace, you should know how you intend to analyze it. To make the analysis easier, it
helps to add tags in the executed statement.

For example, you could add /*literal*/ and /*parameterized*/ if you want to compare a literal query
with a parameterized query. Or you could use /*Good_17_seconds*/ and /*Bad_300_seconds*/ if you
intend to compare plans with different levels of performance.

The SQL optimization step trace often involves a comparison. Therefore, it is important to make each file or
trace unique so that it can be easily recognized and is not confused with other ones.
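
For example, two runs of the same statement could be tagged as follows so that each compilation is easy to
find in the trace (table, columns, and tag names are illustrative):

 SELECT /*Good_17_seconds*/ "COLUMN1", "COLUMN2" FROM "TABLE1";
 SELECT /*Bad_300_seconds*/ "COLUMN1", "COLUMN2" FROM "TABLE1";

Because the comment is part of the statement string, each tagged run is compiled and cached separately,
which keeps the two traces clearly apart.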

5.3.5.3 Extra Filters for a More Efficient Trace

If you have the SQL user filter and trace file name configurations in place but still get noise from other queries
executed by the same user, you need more meticulous filters to remove that noise. The statement hash and
connection ID are the two main filters you can use.

You can add extra filters to the SQL optimization step trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('traceprofile_MyTrace01','sql_user') = '<database_user>',
      ('traceprofile_MyTrace01','sqloptstep') = 'debug',
      ('traceprofile_MyTrace01','statement_hash') = '<stmt_hash>',
      ('traceprofile_MyTrace01','connection_id') = '<connection_id>'
  WITH RECONFIGURE;

You can remove the extra filters from the SQL optimization step trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  UNSET ('traceprofile_MyTrace01','sql_user'),
        ('traceprofile_MyTrace01','sqloptstep'),
        ('traceprofile_MyTrace01','statement_hash'),
        ('traceprofile_MyTrace01','connection_id')
  WITH RECONFIGURE;

5.3.5.4 Session Debug Trace

You can use a separate session and trace only that session for the component you are interested in. For more
information, see SAP Note 2380176.

You can set a session for the SQL optimization step trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  SET ('traceprofile_MyTrace01','sqloptstep') = 'debug' WITH RECONFIGURE;
SET SESSION 'TRACEPROFILE' = 'MyTrace01';

You can remove a session for the SQL optimization step trace with the following SQL statement:

UNSET SESSION 'TRACEPROFILE';
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
  UNSET ('traceprofile_MyTrace01','sqloptstep') WITH RECONFIGURE;

Related Information

SAP Note 2380176

5.3.5.5 Analyzing the SQL Optimization Step Trace

Compiled Query String

To be able to analyze trace information efficiently, it is important to know on which line of the trace file the trace
you requested starts.

You can find the starting point by looking for Compiled Query String. The information it provides is not that
helpful, because it simply tells you the thread, connection ID, transaction ID, and timestamp. However, it
represents the starting line of the trace and is therefore the point at which you should start your analysis.

For example:

Query Rewriting

This section has the prefix Query Optimizer contained within square brackets. The SQL optimization step
trace lists the names of the query rewriting rules and specifies the QO tree directly after each rewriting
step. For example, under [Query Optimizer] Select Push-down (SeqId:nn), you would see the QO tree after
the Pushdown rule has been applied to the tree. This pattern is repeated until query rewriting reaches the
cost-based optimization step.

An example of a [Query Optimizer] Query Rewriting section is shown below:

QO Tree Component Basics

The components that form the QO trees under the rewriting rules are also important for the analysis.

For example:

The overall shape of the tree is very similar to the Explain Plan. Indentation is used to represent the hierarchy,
that is, the more a line is indented the lower it is in the hierarchy. Each line starts with a hash character (#),
which is followed by the name of the operator, such as PROJECT, INNER JOIN, or TABLE, with its ID number at
the end in the form (opId : nn). The opId is a unique ID given to each operator and used by the optimizer
to manage them. These IDs can also be used for plan analysis because they are unique and make the operators
easy to identify. For example, if you want to track every change made to a specific operator, you just need
to track its operator ID. To help, there are additional lines that indicate all changes that were made to the
operators. These are shown at the bottom of each QO tree after the rewriting rule names using the categories
Deleted Operator, Parent-updated Operator, and Contents-updated Operator. The number
after d@ following Deleted Operator: is the ID of the deleted operator. Similarly, the numbers after pu@ and cu@
indicate those operators where either their parents or their own contents have been updated.

An understanding of the basic concepts of relation IDs and column IDs is needed for trace analysis. A relation
is a logical data set produced or consumed during an operation. For example, a table is a relation and an
aggregated set of the table is another relation. Each relation is given a unique ID starting from the outermost

relation and moving inwards. Since a relation is a set of columns filled with data, each column is also given
an ID starting from the leftmost column. A column in a relation is identified by two ID numbers in brackets
with a comma between them. For example, (4,1) denotes the second column from the left of the fifth outer
relation but does not indicate what data it contains. To find that information, you need to trace it back to the
top, possibly to table level, where you can see the name of the column.

Although it seems complicated, it is important to follow the relation IDs and column IDs because small changes
in these IDs within a set of operations can make a significant difference even though there might be no
apparent reason. Sometimes one or two columns are dropped during one of the aggregations, resulting in final
values that are unexpected. Other times there are just too many relations for the same table and you need to
find a way to combine them into one relation by deriving the appropriate filters or removing joins.

On the SQL surface, you see the names of tables and columns, but inside the optimizer they are distinguished
by their IDs. Therefore, you need to be familiar with these internal structures to be able to analyze the internal
operations of the optimizer.

Heuristic Rules

Heuristic rules are part of the rewriting rules, so the basic principles that apply to rewriting rules also apply to
heuristic rules. Heuristic rules, however, also show you the estimated sizes of the operators in the trace. This is
because heuristic rules themselves use estimated sizes.

Unless you specifically need the sizes estimated by the heuristic rules in order to know where an estimation
was started, the estimated values are not a good tool for analysis, because they are premature. If you intend to
analyze sizes and costs, it is better to do so using the values given in the Optimized Physical Plan.

For example, sizes are shown but can be ignored:

5.3.5.6 Optimized Logical and Physical Plans


Once rule-based query rewriting has finished, cost-based enumeration starts. The information provided by the
SQL optimization step trace about cost-based enumeration is limited to a few details about the logical and
physical aspects of the final product. This is because it is not necessary to know about all the processes that
occur during enumeration.

More detailed information is provided by the enumeration trace when it is enabled. However, since the detailed
optimization processes were omitted from the SQL optimization step trace to make it as compact as possible,
it is not recommended to enable the enumeration trace unless a developer explicitly advises you to do so.

Although the two final products of enumeration look simple and short, they are very important and among the
most frequently used items in the entire trace. The Optimized Logical Plan resembles the logical subtree in
the Plan Visualizer and Explain Plan, whereas the Optimized Physical Plan is like a physical subtree in the Plan
Visualizer and Performance Trace. While the Optimized Logical Plan shows the trunk and branches of the tree,
indicating the locations and logical shapes of the operators, the Optimized Physical Plan shows the types of
leaves the tree selected as well as execution engine-specific information and column search information.

For example:

In plan analysis, you usually start by searching for the tag that was commented out in the statement string
to find the first line of the trace. This takes you to the Compiled Query String line, where you can search for
the term Optimized Physical Plan to find the plan details. Then you can locate the Optimized Logical
Plan higher above, which is where you can start the actual analysis. The Optimized Logical and Physical Plans
should give you an impression of what the entire plan looks like and some information about why it was shaped
the way it was.

5.3.6 SQL Optimization Time Debug Trace

The SQL optimization time trace (SqlOptTime trace) is a stopwatch that measures how much time it takes to
finish each step of the optimization process. It records not only the time required for the overall rule-based or
cost-based optimization process but also the time consumed by each rewriting rule in rule-based optimization
and the number of repeated enumerators during cost-based optimization.

The stopwatch for the SQL optimization time trace starts when compilation begins and stops when it finishes.
The SQL optimization time trace is primarily a compilation trace rather than an execution trace and can even
be obtained by compiling a query and cancelling it immediately before execution. There is no point in collecting
this trace if your aim is to improve execution time, particularly when this involves an out-of-memory condition
or a long-running thread caused by a large amount of materialization.

The SQL optimization time trace is helpful when there is a slow compilation issue. There are also cases where
a long execution time can only be reduced at the expense of a longer compilation time, so the total time taken
from compilation to execution remains the same irrespective of which one you choose to reduce. Although
these are execution time-related performance issues, you would still need the SQL optimization time trace
to investigate whether the compilation time could be improved. Many cases involving long execution times
are closely related to compilation, such as when the optimizer reaches the enumeration limit, which causes it
to stop and generate a plan. Or there might simply be a slow compilation issue where the purpose of the
investigation is to reduce the compilation time.

The SQL optimization time trace can also be used in a general sense to get a summarized report of the rules
and enumerators used to construct the final plan. The SQL optimization step trace is usually very long, so a
brief version that just tells you what choices the optimizer made rather than how they were made might be
preferable. In that case, the SQL optimization time trace would be the appropriate tool.

The example below shows the format of the SQL optimization time trace:
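As a rough sketch assembled from the fields described in the analysis section below (the values are
placeholders, and the exact layout varies by revision):

=============================================================
[Query Compile] Compiled Query String: SELECT ...
* Parsing time: 0.8 ms
* Checking time: 1.5 ms
* QC to QO conversion time: 3.5 ms
* Query rewriting time: 120.0 ms
* Cost-based optimization time: 850.0 ms
...
=== RewriteRule Time Analysis ===
[Rule name ] Prepare: 0.00 ms, MainBody: 0.30 ms, Finalize: 0.00 ms, success: True
...
=============================================================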

Related Information

Enabling the SQL Optimization Time Trace
Analyzing the SQL Optimization Time Trace

5.3.6.1 Enabling the SQL Optimization Time Trace
The SQL optimization time trace is as simple to use as the SQL optimization step trace. You can enable
the trace by replacing the configuration parameter sqloptstep with sqlopttime.

The only difference between the two traces is that the SQL optimization time trace starts logging when
compilation begins and ends at the point of execution. Even if you cancel the transaction, the SQL optimization
time trace will still be available provided compilation was completed.

You can enable the internal SQL optimization time trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
SET ('traceprofile_MyTrace01','sql_user') = '<database_user>',
    ('traceprofile_MyTrace01','sqlopttime') = 'debug'
WITH RECONFIGURE;

You can disable the internal SQL optimization time trace with the following SQL statement:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
UNSET ('traceprofile_MyTrace01','sql_user'),
      ('traceprofile_MyTrace01','sqlopttime')
WITH RECONFIGURE;

Note that because query compilation also occurs as part of calculation engine query optimization (CeQo) or
calculation engine SQL (CeSql) during execution, these compilation logs are also written to the trace. If you had
CeQo in the plan, you would have two sets of SQL optimization time traces in one trace file at the end.

This can benefit your analysis because long CeQo compilation time is often mistaken for execution time when
it is actually compilation time. There are cases where long execution times are caused by long CeQo compilation
times. The solution to this problem would be to remove the CeQo step by enforcing calculation view unfolding or
to make the view simpler by modifying the base calculation views.

5.3.6.2 Analyzing the SQL Optimization Time Trace


The SQL optimization time trace has a distinct appearance compared to the other traces and so is relatively
easy to find even when the index server trace contains a large number of lines.

The SQL optimization time trace is delimited by several dashed separator lines, and the content between these
lines is presented in the form of a report rather than a log or trace. It lists the measured parameters with the
values given in milliseconds (ms) or megabytes (MB).

The SQL optimization time trace starts with the query string shown under [Query Compile] Compiled
Query String. It then lists the parsing time, checking time, query rewriting time, and cost-based optimization
time. In very rare cases, the parsing time or checking time could be unexpectedly long, say 10 seconds. These
cases need to be investigated further by the SAP HANA frontend experts. 99% of long compilation issues
involve either long query rewriting time or long cost-based optimization time.

Cost-based Optimization Time Analysis

This analysis provides filter and join selectivity estimation times, the number of enumerations for each
enumerator, and the enumeration limit. The local filter selectivity estimation time is the time taken to calculate
the selectivity of the filters. The selectivity is the ratio of rows that match the filter condition to the total number
of rows in the data set. For example, if the values are very distinct and only one out of a hundred matches the
filter condition, the selectivity would be 0.01.

When the table or data source is too large for the entire set to be examined in a short time, SAP HANA
resorts to estimation strategies like sampling or histogram analysis. Almost all customer cases depend on
these estimation strategies, particularly sampling, where by default a maximum of 1,000 rows is collected as
a sample set and the result is extrapolated to the whole data set. The local filter selectivity estimation time
refers to the time required for this process.


The join selectivity estimation time is the time required to calculate the join cardinality. The SQL optimization
step trace shows cardinality information like (N-N) or (N-1) next to the JOIN. The cardinality information
allows the execution time to be shortened because the rows that will eventually be dropped do not need to be
combined beforehand.

These estimation times are shown in the trace, but they rarely cause any problems because they are
mathematical calculations based on algorithms. Even so, if there are any problematic estimations, it is
advisable to provide this information to the SQL optimization experts at SAP.

The remaining information about cost-based optimization time consists of the number of enumerations
of the various types. For example, Number of JOIN_THRU_AGGR means how many times the enumerator
JOIN_THRU_AGGR (join through aggregation) was applied. If this number is too high, you may want to lower
it by using an SQL optimization hint. Let’s say JOIN_THRU_AGGR was done 1000 times. By adding the hint
NO_JOIN_THRU_AGGR, you can make the SQL optimizer choose not to push down the joins. This approach is
not an analysis strategy as such, but it might help you get closer to your goal.
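As a sketch of how such a hint is attached to a statement (the object and column names here are placeholders;
the WITH HINT syntax is the same one used in the SQL tuning guidelines later in this chapter):

-- placeholder query; the point is only the appended hint clause
SELECT "COL_A", SUM("COL_B")
FROM "MY_SCHEMA"."MY_VIEW"
GROUP BY "COL_A"
WITH HINT(NO_JOIN_THRU_AGGR);  -- stop the optimizer from pushing joins through aggregations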

Rule-Based Optimization Time Analysis

A separate section called RewriteRule Time Analysis indicates whether each rewriting rule was applied
successfully and how long this took. For your analysis, only the successfully applied rewriting rules are relevant,
together with the times (in milliseconds) given in all three phases (Prepare, MainBody, and Finalize), not just in
Prepare or MainBody.


This section is helpful when one of the rules took too many milliseconds to complete. The most common
example is when the calculation view unfolding process takes too long. In this case, you should first apply the
hint NO_CALC_VIEW_UNFOLDING so that this step is skipped. It is highly likely that the compilation time will
be reduced at the cost of a longer execution time if the calculation engine takes the CeQo path instead. Again,
this demonstrates the trade-off between the two major factors, compilation and execution. To find a solution,
you should consider how to remove the complex evaluation inside the calculation view so that the translation
into a QO tree does not take so long.
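A sketch of applying the hint, with a placeholder view name:

SELECT * FROM "_SYS_BIC"."package/MY_CALC_VIEW"  -- placeholder calculation view
WITH HINT(NO_CALC_VIEW_UNFOLDING);               -- skip the unfolding step during compilation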

5.3.7 Views and Tables

Various views and tables are available for analyzing performance.

Expensive Statements

The expensive statements trace records information about a statement if its execution time exceeds the
specified threshold, which is 1 second by default. The contents of this trace are available in the monitoring
view M_EXPENSIVE_STATEMENTS. The monitoring view collects the information from the disk files where
the records are stored. This means that once you have enabled the trace, you can simply download the
expensive_statements file from the SAP HANA trace directory or the Diagnosis Files tab of the SAP HANA
studio administration editor.

The trace is disabled by default, so it cannot be used to analyze issues that occurred before it was enabled.
However, it is generally not recommended to leave the trace enabled all the time because this can increase
system load and memory usage. It is advisable to only enable the trace for one or two days when you detect a
disruption in SQL performance and need to find the specific cause of the issue.
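For example, once the trace has been enabled, a query along the following lines lists the slowest recorded
statements (a sketch; the selected columns are assumptions based on the monitoring view):

SELECT START_TIME, DURATION_MICROSEC, STATEMENT_STRING
FROM M_EXPENSIVE_STATEMENTS
ORDER BY DURATION_MICROSEC DESC
LIMIT 20;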

The expensive statements trace is similar to the SQL trace.

Executed Statements

The executed statements trace should not be confused with the expensive statements trace. The executed
statements trace is not designed for performance analysis, because it only captures DDL statements like
CREATE, DROP, IMPORT, COMMENT, and so on. It does not provide any information about SELECT statements.

Active Threads

The information about active threads, which is available on the Performance tab of the Administration editor in
the SAP HANA studio, is helpful for analyzing performance.

The Threads subtab of the Performance tab shows all threads that are currently running. For performance
analysis, the best time to obtain as much contextual information as possible is while the issue is occurring. The
Threads panel allows you to obtain this information.

Other panels provided on the Performance tab of the administration console are similar and indicate the
current status of the system. It is recommended to monitor these panels as well and not to rely solely on the
information provided by the traces even though they are very good sources of information.

5.4 Case Studies

This collection of case studies demonstrates common performance issues.

Related Information

Simple Examples
Performance Fluctuation of an SDA Query with UNION ALL
Composite OOM due to Memory Consumed over 40 Gigabytes by a Single Query
Performance Degradation of a View after an Upgrade Caused by Calculation View Unfolding

5.4.1 Simple Examples

These simple examples introduce some common performance issues.

Related Information

Compilation Issue and Heuristic Rewriting Rules
Column Search and Intermediate Results
Execution Engines

5.4.1.1 Compilation Issue and Heuristic Rewriting Rules

A query is extremely slow. The visualized plan shows that the majority of the time was consumed during
compilation and not execution. For further analysis, the SQL optimization time trace is collected.

Statement

SELECT MANDT, ORDERID, ORDERNAME, ……, PRICE
FROM "CSALESORDERFS"
WHERE "MANDT" = '700' AND "SALESORDER" = '00002153'
LIMIT 100000;

Analysis

The SQL Optimization Time trace shows the following:

=============================================================
* QC to QO conversion time: 3.579 ms
* Query rewriting time: 45844.013 ms
** Number of operators: 68
** Memory pool size: 6.000 MB

=== RewriteRule Time Analysis ===
[Predicate Grant Injection ] Prepare: 0.00 ms, MainBody: 0.28 ms, Finalize: 0.00 ms, success: False
[Remove Unnecessary Col    ] Prepare: 0.00 ms, MainBody: 0.28 ms, Finalize: 0.00 ms, success: False
...
[Heuristic Join Reorder    ] Prepare: 0.00 ms, MainBody: 34150.66 ms, Finalize: 0.00 ms, success: True
...
=============================================================

The trace indicates that the performance degradation is mainly caused by applying the rule Heuristic
Join Reorder. The reason why the process takes so long needs to be investigated separately, but there
is a valid workaround to mitigate the problem. One of the SQL hints that adjust the level of optimization
and disable specific optimization processes could help. The optimization level is set by default to the highest
level (COST_BASED). It is recommended to keep the default setting except for when there is an occasional
unexpected performance lag.

As seen in the table below, heuristic join reorder can be avoided by using the hint
OPTIMIZATION_LEVEL(RULE_BASED):

Hint Name                                 Description
OPTIMIZATION_LEVEL(MINIMAL)               Disables all optimization rules except the mandatory ones to execute the user query as it is
OPTIMIZATION_LEVEL(RULE_BASED)            MINIMAL plus all rewriting rules
OPTIMIZATION_LEVEL(MINIMAL_COST_BASED)    RULE_BASED plus heuristic rules (including heuristic join reorder)
OPTIMIZATION_LEVEL(COST_BASED)            MINIMAL_COST_BASED plus available cost-based optimizations (including logical enumeration)

OPTIMIZATION_LEVEL(RULE_BASED) omits all the later optimization steps, including cost-based enumeration.
You need to be careful when you apply this hint because a suboptimal plan can cause an out-of-memory
situation or a long-running thread when it is executed. You should first test it in your sandbox system before
moving it to production sites.
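Applied to the statement above, the workaround would look roughly as follows:

SELECT MANDT, ORDERID, ORDERNAME, ……, PRICE
FROM "CSALESORDERFS"
WHERE "MANDT" = '700' AND "SALESORDER" = '00002153'
LIMIT 100000
WITH HINT(OPTIMIZATION_LEVEL(RULE_BASED));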

5.4.1.2 Column Search and Intermediate Results

There is a severe fluctuation in the performance of a single SQL statement. When it is fast it runs in 1 second,
but when it is slow it takes up to 2 minutes. This considerable difference needs to be explained and a solution
found to make the statement's performance consistent.

Statement

SELECT
column1, column2, column3, ……, column20,
(SELECT
COUNT(DISTINCT C.columnA)
FROM "view1" A
INNER JOIN "view2" B ON A.ID = B.ID
INNER JOIN "view3" C ON C.ID = B.ID
WHERE A.columnB = E.columnB)
FROM "view4" D
INNER JOIN "view5" E ON D.ID = E.ID
INNER JOIN "view6" F ON D.ID = F.ID
INNER JOIN "view7" G ON F.ID = G.ID
……
INNER JOIN "view12" G ON K.ID = G.ID;

Analysis

Two different Plan Visualizer traces are collected to compare the fast-running case with the slow-running case.
Overall they look very different.

The plan with a shorter execution time (fast case) is shown below:

The plan with a longer execution time (slow case) is shown below:

You first try to find the hot spot, which is the point with the longest exclusive execution time. In the slow case,
60.1 seconds out of 60.9 seconds are spent in the second-to-last column search. For a clearer view, you can
expand the column search by clicking the downward arrow in its top right corner.

In the expanded version of the heaviest column search (slow case), you can see that there is a join that takes 41
seconds and produces 148 million rows, which are then aggregated for another 17 seconds:

The logical plan of this join shows that 14 different tables are joined at once:

The two plans show that although the data size is small before and after the column search, there is a huge
intermediate result inside the column search between the aggregation and the join. You can only see these
details by expanding the thread. The heavy complex join is the root cause of the large amount of intermediate
data and the performance degradation. The same joins and aggregations also exist in the fast case, but
they take a different form. In the very first figure above, the red box is the equivalent part of the joins and
aggregations for the fast case.

To compare the two more easily, you might want to simplify the structure of the statement and create diagrams
of the join conditions and the numbers of resulting rows, as shown below. The big boxes contain the join
operators with their join conditions, including the join type and the coordinates of the relation and column. The
numbers between the boxes are the estimated sizes of the intermediate result.

A simplified diagram of the joins in the fast case is shown below:

A simplified diagram of the joins in the slow case is shown below:

By comparing the two diagrams, you can see that the two inner joins (34, 3)=(35,2) and (33,0)=(34,2) are
done separately in the fast case, whereas they are absorbed into the later column search in the slow case.
Because these two joins are already done beforehand in the fast case, the next column search does not have
to hold as much data and has a reduced number of joins (4) and therefore reduced complexity. To ensure that
the statement always uses the fast plan and does not revert to the slow plan, you should ensure that it does
not push down the joins, because otherwise the joins will be absorbed into the huge column search whenever
possible. Enumeration rules could help to achieve this.

The performance fluctuation, which is the main issue in this scenario, is caused by the fluctuating data. The
SQL optimizer is sensitive to the size of the data. Therefore, varying data sizes can diminish the level of
consistency in execution performance. Such issues arise particularly when the data size oscillates around
several internal thresholds that the optimizer uses. One way to make the performance consistent is to apply a
hint that can ensure that the fast plan is chosen.
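As a sketch only (the exact hint must be verified against the actual plans; NO_JOIN_THRU_AGGR, which blocks
join pushdown through aggregations, is one candidate given the analysis above):

SELECT column1, column2, column3, ……, column20, ……
FROM "view4" D
INNER JOIN "view5" E ON D.ID = E.ID
……
INNER JOIN "view12" G ON K.ID = G.ID
WITH HINT(NO_JOIN_THRU_AGGR);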

5.4.1.3 Execution Engines

There are three very similar SQL statements that use the same view as their source. Although they are very
similar, their performance is very different.

For example, Query1 takes around 30 times longer than Query2 although the only difference between them
is their aggregated columns. Also, Query2 is 15 times faster than Query3 even though the former is an
aggregation of the latter. The logic behind these statements needs to be analyzed.

Statements

-- Query1: 6 seconds
-- executed in CALC engine
SELECT DISTINCT COUNT(CASE_KEY)
FROM "SCHEMA"."CALCULATION_VIEW"
WHERE "ID" = '002';
-- Query2: 236 ms
-- executed in OLAP engine
SELECT DISTINCT COUNT(ID)
FROM "SCHEMA"."CALCULATION_VIEW"
WHERE "ID" = '002';
-- Query3: 3 seconds
-- executed in JOIN engine
SELECT ID
FROM "SCHEMA"."CALCULATION_VIEW"
WHERE "ID" = '002';

Analysis

The main difference between Query3 and the other queries is that it does not have any aggregation or group
by operations. If a query string does not have any aggregations, the OLAP engine will not be used for any
part of its execution, because it is designed to operate only when there is an aggregation. (The presence of
one or more aggregations, however, does not guarantee that the OLAP engine will be used.) Although Query2
appears more complex than Query3 because it has an additional operation, an aggregation, it is highly likely
that it would be executed by the OLAP engine. Since the OLAP engine can demonstrate a very high level of
performance, particularly when a query contains aggregations with no complex expressions or functions, this
meant that Query2 took 0.2 seconds while Query3 took 3 seconds.

The visualized plan of Query1 is shown below:

The visualized plan of Query2 is shown below:

The visualized plan of Query3 is shown below:

When comparing Query1 and Query2, the only difference between them is an aggregated column. As
mentioned above, the presence of an aggregation in the query string does not guarantee that the OLAP engine
will be used to execute the query. OLAP execution could be blocked due to a limitation, such as too little data
in the data source. Another reason could be that the plan was not unfolded in the case of a calculation view. As
explained in previous sections, calculation views are unfolded by default except in certain cases where there are
unfolding blockers or missing features. If unfolding is blocked, the complete optimization task is handed over
to the calculation engine, which then executes the operation instead of the OLAP engine. This was the case for
Query1. Due to calculation view unfolding blockers, the view was not unfolded, which meant that it could not be
processed by the OLAP engine but instead had to be processed by the calculation engine.

The analysis of the three queries' performance has shown that their behavior was more different than
expected. Since a set of complex calculations, and not just one simple rule, determines which engine is used
for which operation, there is no guarantee which engine will be used in each case. More importantly, one
execution engine is not better than another; they exist for different purposes. So, there is no need to question
the validity of the execution engine selected to execute the generated plan.

5.4.2 Performance Fluctuation of an SDA Query with UNION ALL

The performance of the same statement ranges from 0.5 seconds to 90 seconds. The system is a scale-out
system with more than 10 nodes with a remote data source connected through Smart Data Access (SDA).

The query statement is a SELECT statement on a calculation view which consumes SDA virtual tables with
UNION ALL, INNER JOIN, and AGGREGATION operations on those sources. Depending on how much data is
contained in the table and which variants are used for the query, the query transmitted to the remote source
changes in a way that has a significant impact on the overall performance. The best option needs to be found to
minimize the performance fluctuation and ensure that the query is always fast.

Statements

SELECT query on the main node

SELECT
sum("COLUMN_1") AS "COLUMN_1 ",
sum("COLUMN_2") AS "COLUMN_2",
sum("COLUMN_3") AS "COLUMN_3",
sum("COLUMN_4") AS "COLUMN_4",
sum("COLUMN_5") AS "COLUMN_5",
sum("COLUMN_6") AS "COLUMN_6",
sum("COLUMN_7") AS "COLUMN_7",
sum("COLUMN_8") AS "COLUMN_8",
sum("COLUMN_9") AS "COLUMN_9",
sum("COLUMN_10") AS "COLUMN_10",
sum("COLUMN_11") AS "COLUMN_11",
sum("COLUMN_12") AS "COLUMN_12",
sum("COLUMN_13") AS "COLUMN_13",
sum("COLUMN_14") AS "COLUMN_14"
FROM "SCHEMA"."VIEW_ABCD"
('PLACEHOLDER' = ('$$INPUT_PARAM$$', '''ZAAA'''))
where "SALESORG" = '1000' and "CALDAY" between '20190101' and '20191231';

The query created and transmitted to the remote source through SDA

SELECT
SUM("VIEW_ABCD"."COLUMN_1"),
SUM("VIEW_ABCD"."COLUMN_2"),
SUM("VIEW_ABCD"."COLUMN_3"),
SUM("VIEW_ABCD"."COLUMN_4"),
SUM("VIEW_ABCD"."COLUMN_5"),
SUM("VIEW_ABCD"."COLUMN_6"),
SUM("VIEW_ABCD"."COLUMN_7"),
SUM("VIEW_ABCD"."COLUMN_8"),
SUM("VIEW_ABCD"."COLUMN_9"),
SUM("VIEW_ABCD"."COLUMN_10"),
SUM("VIEW_ABCD"."COLUMN_11"),
SUM("VIEW_ABCD"."COLUMN_12"),
SUM("VIEW_ABCD"."COLUMN_13"),
SUM("VIEW_ABCD"."COLUMN_14")
FROM (SELECT
"COLUMN_1",
"COLUMN_2",
"COLUMN_3",
"COLUMN_4",

SAP HANA Performance Guide for Developers


SQL Query Performance PUBLIC 97
"COLUMN_5",
"COLUMN_6",
"COLUMN_7",
"COLUMN_8",
"COLUMN_9",
"COLUMN_10",
"COLUMN_11",
"COLUMN_12",
"COLUMN_13",
"COLUMN_14
FROM "_SYS_BIC"."Model.Package/VIEW_ABCD"
( PLACEHOLDER."$$INPUT_PARAM$$" => '''ZAAA''' ) ) "VIEW_ABCD"
WHERE "VIEW_ABCD"."SALESORG" = '1000'
AND "VIEW_ABCD"."CALDAY" >= '20190101'
AND "VIEW_ABCD"."CALDAY" <= '20191231';

Related Information

Initial Analysis
Apply the Hint NO_CS_UNION_ALL
Reinvestigate the Visualized Plan from an Overall Perspective
Case-Study Conclusions

5.4.2.1 Initial Analysis

You can use the Plan Visualizer to compare the plan with the short execution time (good plan) with the plan
with the long execution time (bad plan).

The visualized plan of the bad plan is shown below:

The visualized plan of the good plan is shown below:

Although the plans are obviously different, they do not show which operations were changed or how they were
changed. Since the problem involves a long execution time, you need to find the point in the execution plan
where the long execution time occurred. In the bad plan, Column Union All required 65 seconds out of the 80
seconds of inclusive time.

Comparing the two plans, you can see that different algorithms are used in the UNION operation. The bad plan
uses Column Union All whereas the good plan uses Materialized Union All:

It would be wrong to conclude that Materialized Union All is a faster operator than Column Union All. In
many cases, particularly when complex business data is stored in column store tables as numbers, column
operators perform better than row operators. This is demonstrated in the next step where you apply the hint
NO_CS_UNION_ALL.

5.4.2.2 Apply the Hint NO_CS_UNION_ALL

You have seen that the different physical Union All operators affect performance. By applying the
NO_CS_UNION_ALL hint, you can force the SQL optimizer to give more weight to the row operator when it
processes the physical enumeration step of Union All.

The hint will force the Materialized Union All operator to be chosen unless the estimated benefit of the column
operator outweighs the additional weight given to the row operator. When this hint was applied in the real-case
scenario, execution time improved considerably.
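The hint is appended to the end of the original statement, roughly as follows (select list abbreviated):

SELECT
    sum("COLUMN_1") AS "COLUMN_1",
    ……
    sum("COLUMN_14") AS "COLUMN_14"
FROM "SCHEMA"."VIEW_ABCD"
    ('PLACEHOLDER' = ('$$INPUT_PARAM$$', '''ZAAA'''))
WHERE "SALESORG" = '1000' AND "CALDAY" BETWEEN '20190101' AND '20191231'
WITH HINT(NO_CS_UNION_ALL);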

However, it is often not helpful to simply change a physical operator from column to row. One of the main
features of the SQL optimizer is to prefer column operators by default because of the benefits of a fully
columnar plan. When a plan is made into one single column search, no materialization is involved and
performance is expected to improve. Therefore, the optimizer prefers to use columnar operators and push
them into column searches whenever it can.

With respect to the optimizer's preference for column operators, further analysis should be done to identify the
fundamental scope and root cause of the issue.

5.4.2.3 Reinvestigate the Visualized Plan from an Overall Perspective

Instead of viewing the visualized plan in the way it is initially presented, you can drill down into the column
search operators by clicking the downward arrows. This gives you more information not only about the
execution engines but also about the operations that are hidden at first glance.

The visualized plan of the bad plan with Column Union All unfolded is shown below:

The visualized plan of the good plan with one of the Analytical Searches unfolded is shown below:

By expanding the column searches, you can see the overall structure of the plan.

The visualized plan of the bad plan with estimated data sizes is shown below:

The visualized plan of the good plan with estimated data sizes is shown below:

In the bad plan, Column Union All took 65 seconds out of a total execution time of 80 seconds, and most of the
65 seconds were consumed by the operation called Network Data Transfer. Network Data Transfer occurs when
the data in one node of the system needs to be transferred through the network to another node for further
processing. Considering that the same Network Data Transfers already happened at the beginning to fetch the
data from the base tables, this plan has a heavy network load compared to the faster plan. The location of
the operation is determined internally in the SQL optimizer, taking multiple values as variables and therefore
making it very difficult to relocate one operation without affecting the other operations. So you need to find the
reason why these operations needed to be transferred and why they caused such a significant performance
degradation.

The main difference between the two plans is that in the good plan the Union operation is after the joins,
whereas in the bad plan the Union operations are before the join. In the good plan, the INNER JOINs are under
the Analytical Search (column search of the OLAP engine), which is hidden when it is collapsed. The good plan
is even faster due to the AGGREGATIONs that are located right after the table scans. In the Analytical Search
at the bottom, the data is already aggregated with a single record as the intermediate result, which helps
reduce the execution time. The other Analytical Searches share the same structure, which means that there
are already four AGGREGATIONs and four INNER JOINs before the Materialized Union All operation is reached.

In the bad plan, there is nothing significant inside the base Column Searches, but the calculation scenario
inside Column Union All indicates that part of the plan could not be unfolded and that the calculation
engine had to be used. A calculation scenario can involve a large complex model which generally has many
aggregations, joins, and expressions, so it is likely that the counterparts of the AGGREGATIONs and INNER
JOINs are inside the calculation scenario in the bad plan. In the real case scenario, there were multiple
aggregations in the calculation model. The location of the aggregations and joins is important because they
need to be done at the beginning to reduce the intermediate data sizes. When heavy operations are performed
later, the intermediate results are larger and therefore performance is slower.

Overall, the difference in the logical structure is a potential cause of the performance degradation. You
could decide to apply hints that would push down the joins through the Union All operation and even the
aggregations as well.

Apply Hint PREAGGR_BEFORE_UNION

There are many hints that you could try applying. They include PREAGGR_BEFORE_JOIN, AGGR_THRU_JOIN,
PREAGGR_BEFORE_UNION, and JOIN_THRU_UNION. You could choose one or a combination of these hints
and test the resulting performance.

In the real-case scenario, PREAGGR_BEFORE_UNION alone was sufficiently effective because the aggregations
resulted in just one record, which is a cardinality that can improve the entire query performance. When data
is aggregated at an earlier stage, a greater performance gain is to be expected, which was confirmed by the
considerable improvement in performance when the aggregations were positioned before the joins.
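In the same way as the earlier hint, it is appended to the end of the original statement:

SELECT sum("COLUMN_1") AS "COLUMN_1", ……, sum("COLUMN_14") AS "COLUMN_14"
FROM "SCHEMA"."VIEW_ABCD"
    ('PLACEHOLDER' = ('$$INPUT_PARAM$$', '''ZAAA'''))
WHERE "SALESORG" = '1000' AND "CALDAY" BETWEEN '20190101' AND '20191231'
WITH HINT(PREAGGR_BEFORE_UNION);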

5.4.2.4 Case-Study Conclusions

These were the key takeaways from this case study.

• The Plan Visualizer is the appropriate starting point in most cases.
Although the Plan Visualizer is not a perfect tool and it does not contain all the information that you might
need, it is the best tool for getting an analysis started on the right track. Since it uses a visualized format,
it is easier to understand than other traces and it allows you to get a better overview of the entire process.
You can drag the plan to see its overall shape. The node colors help you to more precisely compare plans
when you have multiple plans. Even if you know that the Plan Visualizer will not help you find the actual
solution, it can still help to ensure that your efforts are focused in the right direction.
• There is no single solution to the problems.
Even though you have potential workarounds that seem to work for a specific scenario, you can never know
which one or which combination will work before you test them on the query. The plans are complicated
and in most cases you will not be able to correctly predict how the hints will work. SQL optimization is not
simple math and it takes numerous comparisons, calculations, and evaluations to reach a final decision.
You should never rely on suggested workarounds until they are proven to work in the production system.

5.4.3 Composite OOM due to Memory Consumed over 40 Gigabytes by a Single Query

A single query runs forever and does not return any results. With the statement memory limit set to 40
gigabytes, the query execution results in a composite out-of-memory error. Such behavior was first seen a few
days ago and since then it happens every time the query runs.

You need to find the root cause and a solution.

Statement

SELECT
/* FDA READ */ /* CDS access control applied */ "COMPANYCODE" ,
"ACCOUNTINGDOCUMENT" ,
"FISCALYEAR" ,
"ACCOUNTINGDOCUMENTITEM" ,
"TAXCODE" ,
"TRANSACTIONTYPEDETERMINATION" ,
"STATRYRPTGENTITY" ,
"STATRYRPTCATEGORY" ,
"STATRYRPTRUNID" ,
"TAXNUMBER1" ,
"COUNTRY" ,
"DEBITCREDITCODE" ,
"TAXCALCULATIONPROCEDURE" ,
"TAXRATE" ,
"IPITAXRATE" ,
"ADDITIONALTAX1RATE" ,
"CONDITIONAMOUNT" ,
"CUSTOMER" ,
"GLACCOUNT" ,
"TAXITEMGROUP" ,
"BUSINESSPLACE" ,
"TAXJURISDICTION" ,
"LOWESTLEVELTAXJURISDICTION" ,
"ACCOUNTINGDOCUMENTTYPE" ,
"REFERENCEDOCUMENTTYPE" ,
"REVERSEDOCUMENT" ,
"REVERSEDOCUMENTFISCALYEAR" ,

SAP HANA Performance Guide for Developers


SQL Query Performance PUBLIC 105
"CONDITIONRECORD" ,
"ROUNDINGDECIMALPLACES" ,
"DOCUMENTREFERENCEID" ,
"LEDGER" ,
"LEDGERGROUP" ,
"POSTINGDATE" ,
"DOCUMENTDATE" ,
"TAXREPORTINGDATE" ,
"REPORTINGDATE" ,
"FISCALPERIOD" ,
"EXCHANGERATE" ,
"ISREVERSAL" ,
"ACCOUNTINGDOCUMENTHEADERTEXT" ,
"COMPANYCODECOUNTRY" ,
"REPORTINGCOUNTRY" ,
"DIFFERENCETAXAMTINCOCODECRCY" ,
"DIFFTAXBASEAMOUNTINCOCODECRCY" ,
"TAXTYPE" ,
"TARGETTAXCODE" ,
"TAXNUMBER2" ,
"ACTIVETAXTYPE" ,
"TAXNUMBER3" ,
"CALCULATEDTXAMTINCOCODECRCY" ,
"CALCULATEDTAXAMOUNTINTRANSCRCY" ,
"BUSINESSPARTNER" ,
"CALCDTXBASEAMTINCOCODECRCY" ,
"CALCULATEDTXBASEAMTINTRANSCRCY" ,
"TOTALGROSSAMOUNTINCOCODECRCY" ,
"TOTALGROSSAMOUNTINTRANSCRCY" ,
"NONDEDUCTIBLEINPUTTAXAMOUNT" ,
"CONFIGURATIONISACTIVE" ,
"TRIGGERISACTIVE" ,
"DCBLVATINCRDCOSTINRPTGCRCY" ,
"BUSINESSPARTNERNAME" ,
"TAXTYPENAME" ,
"CUSTOMERSUPPLIERADDRESS" ,
"TAXISNOTDEDUCTIBLE" ,
"BALANCEAMOUNTINTRANSACCURRENCY" ,
"CONDITIONTYPE"
FROM /* Entity name: I_STRPTAXRETURNCUBE CDS access controlled */
"ISRTAXRETURNC" ( "P_ISPARAMETER" => 'N' , "P_CONDITIONISMANDATORY" => 'N' )
"I_STRPTAXRETURNCUBE"
WHERE "MANDT" = '100';

Related Information

Plan Visualizer Overview
Filter Pushdown to Other Join Candidates
Columnize the Filter
Push Down the Aggregation Through the Joins
Case-Study Conclusions

5.4.3.1 Plan Visualizer Overview

Use the Plan Visualizer to start off your analysis.

The Plan Visualizer overview of the badly performing query is shown below:

You can see that the views VIEW A, VIEW B and VIEW C share the same base table (TABLE A) as their data
source. The join between the three views is followed by another join with five other tables, and then yet another
join, which is followed by an aggregation. These joins involve multiple tables, but the data size of each table is
not that big. It therefore seems strange that this query should have a memory problem.

5.4.3.2 Filter Pushdown to Other Join Candidates

The filter applied to VIEW A is pushed down from one of the parent nodes. For example, if the filter is positioned
after the join between VIEW A, VIEW B, and VIEW C, it can be pushed down to the join candidates depending on
their join conditions.

In this case, only VIEW A inherited the filter even though the three views have the same base table as the data
source. If you could therefore push down the same filter to the other two views as well, the join size could
be reduced dramatically. In the current situation, the maximum joined data size is 43.2 quadrillion (30,000 *
1,200,000 * 1,200,000 = 43,200,000,000,000,000). If you could reduce the data from the other two views,
it could be lowered to 27 trillion (30,000 * 30,000 * 30,000 = 27,000,000,000,000). This would be a huge
improvement with the data size reduced by more than a factor of a thousand.

Consider filter pushdown to TABLE A, which is used by each of the views:

You could consider altering some of the join conditions to force the filter to be pushed down to the other two
views. In the real case scenario, this was solved by adding a join condition to each relation (VIEW A to VIEW B,
and VIEW A to VIEW C). The column used in the join condition was the filtered column in VIEW A. In this way,
the filter could be pushed down from the parent node and even from one join candidate to another.
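A minimal sketch of this technique, with invented view and column names (in the real case, the column used
was the filtered column of VIEW A):

-- Before: the filter reaches only VIEW A
SELECT ……
FROM "VIEW_A" A
INNER JOIN "VIEW_B" B ON A.ID = B.ID
INNER JOIN "VIEW_C" C ON A.ID = C.ID
WHERE A."FILTER_COL" = '100';

-- After: the filtered column is added to each join condition, so the
-- optimizer can propagate the filter to VIEW B and VIEW C as well
SELECT ……
FROM "VIEW_A" A
INNER JOIN "VIEW_B" B ON A.ID = B.ID AND A."FILTER_COL" = B."FILTER_COL"
INNER JOIN "VIEW_C" C ON A.ID = C.ID AND A."FILTER_COL" = C."FILTER_COL"
WHERE A."FILTER_COL" = '100';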

5.4.3.3 Columnize the Filter

Another aspect to consider would be changing the physical operators. Although all the source tables used in
this query are column-store tables, several row operators are used.

This is not "wrong", because if there is a column-search blocker, such as in VIEW A in this scenario, the logical
decision would be to execute the subsequent operations in row engines. However, if these operators were
executed in column engines and potentially even absorbed into existing column searches, this would change
the structure completely. If the filter in the diagram below were changed into a column operator, it is highly
likely that it would be absorbed into the column search and that the existing column searches would be merged
into one or two searches.

Consider columnizing the filter:

In the real case scenario, the row filter was turned into a columnar one by applying the hint CS_FILTER, which
reduced the memory usage of the query. The query required 2 minutes, which was still not entirely satisfactory,
but the main goal of avoiding the composite OOM was achieved.

5.4.3.4 Push Down the Aggregation Through the Joins

The aggregation at the end of the plan is very effective because it reduces 120,000 rows to 30 rows. However,
there is no reason why an effective data reducer needs to be positioned at the end of the plan, where the data
reduction might no longer be very significant.

It might be better to push down the aggregation below the joins to minimize the intermediate result as much as
possible. The intermediate result size is a critical factor when it comes to huge memory usage problems.

In the real case scenario, the hint NO_JOIN_THRU_AGGR was used together with the hint CS_FILTER. This
combination reduced the execution time to 29.6 seconds.
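As a sketch, both hints are appended to the case statement (select list abbreviated):

SELECT "COMPANYCODE", ……, "CONDITIONTYPE"
FROM "ISRTAXRETURNC" ( "P_ISPARAMETER" => 'N' , "P_CONDITIONISMANDATORY" => 'N' )
    "I_STRPTAXRETURNCUBE"
WHERE "MANDT" = '100'
WITH HINT(CS_FILTER, NO_JOIN_THRU_AGGR);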

Consider pushing down the aggregation through the joins:

5.4.3.5 Case-Study Conclusions

These were the key takeaways from this case study.

• For memory problems, the key is to reduce the intermediate result.


Even when the datasets of join candidates are small, the result can be huge when they are multiplied. This
is a problem particularly when the join is an inner join because inner joins do not guarantee data reduction
after the join. In the worst case, the intermediate result is the cross product between the candidates.
The first step to minimizing the intermediate results is to try adding more filters. It is best to change
the statement or use ALTER commands to modify the definition of the views. However, there are cases
where you cannot modify the definitions because some of the objects are standard objects that cannot
be changed. Nevertheless, you can still modify the plan by changing the factors around those standard
objects. A good example is to add join conditions for filter pushdown.
• It is worth thinking about the physical operators that are used.
In general, one operator is not better than another, so it might be helpful to consider which physical
operator is more efficient for a given situation. For example, the tables used are mostly column-store
tables. If there is no clear evidence why the row operators are needed in the plan, it is worth putting more
weight on the column engines because that allows the benefits of absorption to be utilized.

5.4.4 Performance Degradation of a View after an Upgrade Caused by Calculation View Unfolding

After an upgrade, one of the calculation views is slower than before the upgrade, whereas the other views have
an overall performance gain. The problematic view is a standard SAP HANA model, so changes to this model
are not allowed.

An obvious difference is that the plan is unfolded in the upgraded system whereas it was not unfolded in
the previous revision. This is because calculation views are unfolded by default unless they are configured
otherwise or they cannot be unfolded. Generally, calculation view unfolding allows a better optimization to
be achieved, but that does not seem to be the case in this scenario. You want to restore the previous level of
performance or at least mitigate the performance degradation in the current version.

Statement

select * from "_SYS_BIC"."sap.is.ddf.dc/DistributionCurveQuery"(


PLACEHOLDER."$$P_SAPClient$$" => '100',
PLACEHOLDER."$$P_TaskId$$" => '01',
PLACEHOLDER."$$P_PlanningDate$$" => '20190101');

Related Information

Overview of a Complex Execution Model
Hot Spot Analysis
Operation Detail Analysis
Plan Analysis
Case-Study Conclusions

5.4.4.1 Overview of a Complex Execution Model

The execution plan at issue is extremely complex when displayed in the Plan Visualizer trace. However, it is
worth remembering that the Plan Visualizer is a good starting point because it is one of the least complex
traces.

To get started, it is important to find the hot spot. The hot spot is the operation with the longest execution
time. In the real case scenario, two points were found that required most of the execution time. You should start
off by focusing on these two points; enlarge the area around them and simplify the components for a better
overview.

The complete visualized plan for this case is shown below, expanded on the left and collapsed on the right:

5.4.4.2 Hot Spot Analysis

While the entire plan takes 33 seconds to execute, most of the execution time involves the later part of the
execution, which is shown by the operators at the top of the visualized plan.

The upper part of the graph is shown below:

You can see above that two column searches consumed most of the time, 14 seconds and 18 seconds
respectively. Given that the total execution time was 33 seconds, the performance of the query would improve
considerably if these two issues were resolved.

The logical structure shows that both column searches at issue are joins. As in the previous case study, you
should try to work out what happened by looking at the join candidates' data sizes. In this case, the join
candidates are not heavy. One of the join candidates of the 14-second column search only has one record. The
join of the 18-second column search has five different tables as join candidates, but they are not very big. Also,
the output of this join is just one record, which implies that the cardinality of this join must have been very
small.

It is strange that these two joins took so long. To investigate further, you need to look at the details of the join
operations.

5.4.4.3 Operation Detail Analysis

The operation details normally include the operator name, its internal ID, schema, inclusive and exclusive
execution times, execution start and end times, and other detailed information depending on the type of
operation. For joins, it shows the join type, conditions, and cardinality.

You can see the operation details by hovering over the operator.

The upper part of the visualized plan under investigation is shown below with the operation details added on
the right:

The first join from the top of the two previously mentioned joins has two inner join conditions, "A.col1 = B.col1"
and "A.col2 = NCASE WHEN HEXTOBIN (CASE WHEN…". While the first condition is quite simple, the second
condition contains multiple layers of evaluation. These evaluations contain expressions like HEXTOBIN(),
BINTOHEX(), and TO_BINARY().

Normally, multiple layers of CASE do not cause any problems. Depending on the situation, however, the CASE
predicate can act as a pushdown blocker, but the presence of multiple CASEs does not necessarily mean that
they should be rewritten or fixed. The expressions used to alter the values into different text types are more
worthy of attention. Turning hexadecimal codes into binaries and vice versa requires the actual value of each
row in the column to be expressed in the target type. This is a row-wise process that needs to be done by
reading the value of each individual row. In this context, it does not seem appropriate that these expressions
are processed in the column engines, as in the case scenario. It is not clear what the benefit would be of a
columnar operation in this situation.

The reason behind this is that row-wise expressions have gradually acquired columnar counterparts. At
first, these expressions probably only existed as row engine operators and therefore probably served as
pushdown blockers and undermined the chance of building columnar plans. To tackle those situations,
equivalent expressions were also built on the column engine side. In this particular case, it is likely that
column expressions are chosen and pushed down because the SQL optimizer prefers column operations to row
operations.

It is worth trying the hint NO_CS_EXPR_JOIN because this will remove the weight on the column engine when
the physical operators are determined for the expression joins. The NO_CS_EXPR_JOIN hint could also have a
positive effect on the other heavy join, which also has a row-wise expression (HEXTOBIN).

In the real case scenario, the NO_CS_EXPR_JOIN hint cut the execution time down to a quarter of its previous
length, that is, to around 9 seconds. With the hint applied, the upper join that originally took 14 seconds was
reduced to less than one second, and the other join requiring 18 seconds was reduced to 5 seconds.
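As a sketch, the hint is appended to the case statement:

select * from "_SYS_BIC"."sap.is.ddf.dc/DistributionCurveQuery"(
    PLACEHOLDER."$$P_SAPClient$$" => '100',
    PLACEHOLDER."$$P_TaskId$$" => '01',
    PLACEHOLDER."$$P_PlanningDate$$" => '20190101')
WITH HINT(NO_CS_EXPR_JOIN);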

5.4.4.4 Plan Analysis

More room for improvement was found by looking at the plan from a distance. The plan is big and complex,
with heavily repeated joins and aggregations. The joins and aggregations in this plan were mostly executed by
the join engine (column search) and row engine (hash join).

In SAP HANA, repeated joins with aggregations are often assigned to the OLAP engine, which is good at
integrating large datasets particularly when they are in a fact-and-dimension relationship. In this scenario,
however, OLAP operations (analytical search) were rare.

The lower part of the visualized plan is shown below:

The execution time at the beginning of the plan is not very long, but when the pattern is captured in the logical
structure, it is evident that there is still room for improvement. The estimated sizes of the output after each
operation are all less than 5000. By default, OLAP engines process aggregations only when the input data
has more than 5000 rows. In other words, when aggregations have an input data size that is less than 5000
rows, the SQL optimizer does not even consider using the OLAP engine to execute them. This means that
plans with small datasets cannot utilize the benefits of the OLAP engine even when it would be beneficial. The
decision rule determining whether the OLAP engine or join engine is used is designed to make the compilation
process more efficient and allow the SQL optimizer to avoid redundancy. From time to time, you might need to
intervene depending on the situation.

To make the SQL optimizer consider the OLAP engine for join and aggregation executions even with small
datasets, you can try using the hint USE_OLAP_PLAN. In the real case scenario, the hint USE_OLAP_PLAN
together with the hint NO_CS_EXPR_JOIN reduced the total execution time to 7 seconds.
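The combined form used in the real case would be appended in the same way, roughly:

select * from "_SYS_BIC"."sap.is.ddf.dc/DistributionCurveQuery"( …… )
WITH HINT(USE_OLAP_PLAN, NO_CS_EXPR_JOIN);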

The lower part of the visualized plan is shown below with the estimated sizes highlighted in red:

5.4.4.5 Case-Study Conclusions

These were the key takeaways from this case study.

• It is worth trying different column engines.


There are various rules that help the SQL optimizer decide which engine to use. Some rules are relative,
whereas others are absolute. When a rule is absolute, certain options either do not apply under the given
conditions or are completely blocked. One example is the estimated size of the aggregation input: an
aggregation is not considered for the OLAP engine when the number of input rows is less than 5000. In such
cases, there are no alternatives to compare.
example, that the OLAP engine is an alternative to the join engine. If the SQL optimizer does not respond,
this means that the SQL optimizer still prefers its previous choice even after considering the alternatives
you have just enabled. However, if the SQL optimizer changes its behavior and chooses the alternative,
there is a chance that your workaround could solve the problem. Although there is a risk of performance
degradation after the workaround, it is still worth trying a different engine.
• There is no magic word in performance tuning.
Finding the problematic expressions when there are dozens of operators in the Plan Visualizer is neither
luck nor intuition. It is time that matters, and the more time you spend analyzing traces, the more
information you will get. There is no magical way to solve a problem, so you should try to read and
understand the traces. In many cases, performance issues can be mitigated by using simple hints.

Whenever there is a performance issue to solve, it is important to be patient. The Plan Visualizer should
be your first step before you move on to the other resources that are available to enrich your analysis.
Remember to spend time reading and analyzing the trace before jumping to conclusions. You should
always test your solution before implementing it in the production system.

5.5 SQL Tuning Guidelines

You can significantly improve the performance of SQL queries by knowing how the SAP HANA database and
SAP HANA engines process queries and by adapting them accordingly.

As a general guideline for improving SQL query performance, we recommend avoiding operations that are not
natively supported by the various SAP HANA engines since they can significantly increase the time required to
process the queries.

Please note that the specific recommendations described here may help to improve the performance of SQL
queries involving column tables.

Caution

Throughout this section, adding generated columns is mentioned as a possible workaround for improving
query performance. However, it should be noted that adding generated columns improves query
performance at the expense of increased memory consumption, and increased insertion and update cost.
You should be aware of this trade-off before deciding to add generated columns.

Related Information

Avoiding Implicit Type Casting in Queries
Avoiding Inefficient Predicates in Joins
Avoiding Inefficient Predicates in EXISTS/IN
Avoiding Set Operations
Improving Performance for Multiple Column Joins
Using Hints to Alter a Query Plan

5.5.1 Avoiding Implicit Type Casting in Queries

You can avoid implicit type casting by instead using explicit type casting or by adding generated columns.

The system can generate type castings implicitly even if you did not explicitly write a type-casting operation.
For example, if there is a comparison between a VARCHAR value and a DATE value, the system generates
an implicit type-casting operation that casts the VARCHAR value into a DATE value. Implicit type casting is
performed from the lower precedence type to the higher precedence type. For information about the data
type precedence rules, see the SAP HANA SQL Reference Guide. If two columns are frequently compared by
queries, it is better to ensure that they both have the same data type from the beginning.

One way to avoid the cost of implicit type casting is by using explicit type casting on an inexpensive part of
the query. For instance, if a VARCHAR column is compared with a DATE value and you know that casting the
DATE value into a VARCHAR value gives you what you want, it is recommended to cast the DATE value into
a VARCHAR value. If the VARCHAR column contains only date values in the form of 'YYYYMMDD', it could be
compared with a string generated from a DATE value in the form of 'YYYYMMDD'.

In the example below, date_string is a VARCHAR column. Note that the result when strings are compared is
generally different from the result when dates are compared, and only in some cases is it identical:

Problematic query

SELECT * FROM T WHERE date_string < CURRENT_DATE;

Workaround

SELECT * FROM T WHERE date_string < TO_VARCHAR(CURRENT_DATE, 'YYYYMMDD');

If explicit type casting is not an option, you can avoid the issue entirely by adding generated columns.

In the example below, the revised query matches the values '1', '1.0', and '1.00' stored in a VARCHAR column
while avoiding implicit type casting:

Problematic query

SELECT * FROM T WHERE varchar_value = 1;

Workaround

ALTER TABLE T ADD (num_value DECIMAL GENERATED ALWAYS AS varchar_value);
SELECT * FROM T WHERE num_value = 1;

Related Information

Data Type Conversion

5.5.2 Avoiding Inefficient Predicates in Joins

This section lists the predicate conditions that are not natively supported by the column engine.

Depending on the condition, intermediate results are materialized and consumed by the row engine or column
engine. It is always good practice to try to avoid intermediate result materialization since materialization can be
costly if results are large and have a significant impact on the performance of the query.

Related Information

Non-Equijoin Predicates in Outer Joins
Calculated Column Predicates
Rearranging Join Predicates
Cyclic Joins
Subquery Filter Predicates Over Multiple Tables Inside Outer Joins
Subquery Projection of Constant or Calculated Values Below Outer Joins

5.5.2.1 Non-Equijoin Predicates in Outer Joins

The column engine natively supports an outer join with a join predicate of equality. It does not natively support
an outer join with join predicates other than the equality condition (that is, non-equijoin) and join predicates
connected by OR (even if each predicate is an equality condition).

Also, if equijoin predicates are connected to non-equijoin predicates by AND, they are processed in the same
way as cases with only non-equijoin predicates.

When non-equijoin predicates are used, the row engine executes the join operation using the appropriate join
algorithm (nested loop, range, hashed range) after materializing the intermediate results from both children.

An example of how a non-equijoin predicate can be rewritten as an equijoin predicate is shown below. In the
example, M is a table containing the first and last dates of the months of interest:

Problematic query

SELECT M.year, M.month, SUM(T.ship_amount)
FROM T LEFT OUTER JOIN M ON T.ship_date BETWEEN M.first_date AND M.last_date
GROUP BY M.year, M.month;

Workaround

SELECT M.year, M.month, SUM(T.ship_amount)
FROM T LEFT OUTER JOIN M ON EXTRACT(YEAR FROM T.ship_date) = M.year
    AND EXTRACT(MONTH FROM T.ship_date) = M.month
GROUP BY M.year, M.month;

5.5.2.2 Calculated Column Predicates

Intermediate results from calculations that are not natively supported by the column engine are materialized,
but this can be avoided by using generated columns.

Most calculations are natively supported by the column engine. If a calculation is not natively supported by the
column engine, the intermediate results from both children are materialized and consumed by the row engine.
One way to avoid this is to add generated columns.

An example of how calculated join columns can be avoided by using generated columns is shown below. In the
example, M is a table containing the first and last dates of the months of interest:

Problematic query

SELECT * FROM T JOIN M ON NEXT_DAY(T.ship_date) = M.last_date;

Workaround

ALTER TABLE T ADD (next_day DATE GENERATED ALWAYS AS ADD_DAYS(ship_date, 1));


SELECT * FROM T JOIN M ON T.next_day = M.last_date;

Calculations with parameters are not natively supported by the column engine. In these cases, the
intermediate results are materialized and consumed by the column engine. This can have an impact on
performance if there is a large amount of materialization.

Also, calculations involving multiple tables are not natively supported by the column engine. This results in
the intermediate results being materialized and consumed by the row engine. One way to avoid these types of
calculation is to maintain the needed columns in different tables by changing the schema design.

We recommend trying to rewrite the query to avoid the following conditions, if possible:

Predicate with parameter is not natively supported

SELECT * FROM T JOIN M ON T.ship_amount + ? = M.target_amount;

Calculation involving multiple tables is not natively supported

SELECT * FROM T JOIN M ON IFNULL(T.ship_amount, M.pre_order_amount) =
M.target_amount;

You can also try using the following hints to optimize performance for predicates with calculated columns:

WITH HINT(CS_EXPR_JOIN) --prefer column engine calculation
WITH HINT(NO_CS_EXPR_JOIN) --avoid column engine calculation

5.5.2.3 Rearranging Join Predicates

A filter predicate that refers to two or more relations on one side of the predicate (for example, "A1" - "B1" =
1) may not be efficiently handled, and is likely to cause an inefficient post-join filter operator above CROSS
PRODUCT in the resulting query plan.

Since there is a chance of runtime errors occurring, such as numeric overflow or underflow, it is usually not
possible for the SQL optimizer to rewrite the predicate to "A1" = "B1" + 1. However, you can rewrite the predicate
into a friendlier version by rearranging the terms across the comparison operator, as shown in the example
below (be aware, though, of the possibility of runtime errors):

Problematic query

SELECT * FROM TA, TB WHERE a1 - b1 = 1;

SELECT * FROM TA, TB WHERE DAYS_BETWEEN(a1, b1) = -1;

Workaround

SELECT * FROM TA, TB WHERE a1 = b1 + 1;

SELECT * FROM TA, TB WHERE ADD_DAYS(b1, -1) = a1;

5.5.2.4 Cyclic Joins

The column engine does not natively support join trees that have cycles in the join edges if an outer join is
involved in the cycle (also known as a cyclic outer join). If there is such a cycle involving an outer join, the result
from a child of the join that completes the cycle is materialized to break the cycle.

The column engine does support the cyclic inner join natively, but it is better to avoid it because its
performance is inferior to the acyclic inner join.

One way of breaking the cycle is to maintain the needed columns in different tables by changing the schema
design. For the acyclic join in the example below, the nation column in the supplier table is moved to a
line-item table:

Cyclic join

SELECT * FROM supplier S, customer C, lineitem L
WHERE L.supp_key = S.key AND L.cust_key = C.key AND S.nation = C.nation;

Acyclic join

SELECT * FROM supplier S, customer C, lineitem L
WHERE L.supp_key = S.key AND L.cust_key = C.key AND L.supp_nation = C.nation;

The SQL optimizer selects a cyclic inner join based on cost, so it might sometimes be worth using hints to
guide the optimizer to break the cyclic join into two column searches, and vice versa.

You can try using the following hints to optimize performance for cyclic inner joins:

WITH HINT(CYCLIC_JOIN) --prefer cyclic join
WITH HINT(NO_CYCLIC_JOIN) --break cyclic join

5.5.2.5 Subquery Filter Predicates Over Multiple Tables Inside Outer Joins

Filter predicates over multiple tables are not natively supported by the column engine if they are inside an outer
join. In these cases, the result from the child with the filter predicate is materialized before executing the join.

However, filter predicates over the left child of a left outer join and filter predicates over the right child of a
right outer join are exceptions because moving those predicates upwards in the outer join produces the same
results. This is done automatically by the SQL optimizer.

The example below shows a filter predicate that triggers the materialization of the intermediate result. One way
of avoiding the materialization in the example would be by maintaining the priority column in the lineitem
table instead of the orders table:

Problematic query

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT * FROM orders O JOIN lineitem L ON O.order_key = L.order_key
WHERE L.shipmode = 'AIR' OR O.priority = 'URGENT') ON C.cust_key = L.cust_key;

Workaround

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT * FROM lineitem L WHERE L.shipmode = 'AIR' OR L.priority = 'URGENT')
ON C.cust_key = L.cust_key;

5.5.2.6 Subquery Projection of Constant or Calculated Values Below Outer Joins

The column engine does not natively support constant or calculated value projection below outer joins. If there
is a constant or calculated value projection below an outer join, the result from the child with the projection is
materialized before executing the join.

However, constant or calculated value projections over the left child of the left outer join or the right child of
the right outer join are exceptions because moving those projections above the join produces the same results.
This is done automatically by the SQL optimizer.

Also, if a calculation cannot guarantee that NULL will be returned when the input is NULL, the intermediate
result will be materialized.

The two examples below contrast a plain projection, which does not trigger materialization, with a COALESCE
calculation that does trigger it, because the COALESCE function cannot guarantee that NULL will be returned
when the input is NULL:

Calculation that will not trigger materialization

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT L.order_key FROM orders O JOIN lineitem L ON O.order_key =
L.order_key
) ON C.cust_key = L.order_key;

Calculation that will trigger materialization

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT coalesce(L.order_key, 0) FROM orders O JOIN lineitem L ON
O.order_key = L.order_key
) ON C.cust_key = L.order_key;

A possible workaround to avoid the materialization of the intermediate result is to add a generated column for
constant or calculated values:

Problematic query

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT 1 const FROM orders O JOIN lineitem L ON O.order_key =
L.order_key ) ON C.cust_key = L.const;

Workaround

ALTER TABLE ORDERS ADD (const INTEGER GENERATED ALWAYS AS 1);

SELECT * FROM customer C LEFT OUTER JOIN (
SELECT * FROM orders O JOIN lineitem L ON O.order_key = L.order_key ) ON
C.cust_key = L.const;

5.5.3 Avoiding Inefficient Predicates in EXISTS/IN

This section lists inefficient predicate conditions.

Related Information

Disjunctive EXISTS and IN Predicates [page 124]
NOT IN Predicates [page 125]

5.5.3.1 Disjunctive EXISTS and IN Predicates

If possible, avoid disjunctive EXISTS and IN predicates.

When an EXISTS or NOT EXISTS predicate is connected to other predicates by OR, it is internally mapped to
a left outer join. Since left outer join processing is generally more expensive than inner join processing, we
recommend avoiding such disjunctive EXISTS and NOT EXISTS predicates, if possible.

The example below shows how a disjunctive EXISTS predicate can be avoided:

Problematic query

SELECT * FROM T WHERE EXISTS (SELECT * FROM S WHERE S.a = T.a AND S.b = 1) OR
EXISTS (SELECT * FROM S WHERE S.a = T.a AND S.b = 2);

Workaround 1

SELECT * FROM T WHERE EXISTS (SELECT * FROM S WHERE S.a = T.a AND (S.b = 1 OR
S.b = 2));

Another workaround is to use UNION ALL in the nested query. If the nested query results are very small, this
benefits query execution:

Workaround 2

SELECT * FROM T WHERE EXISTS ((SELECT * FROM S WHERE S.a = T.a AND S.b = 1)
UNION ALL (SELECT * FROM S WHERE S.a = T.a AND S.b = 2));

A similar problem occurs with the IN predicate:

Problematic query

SELECT * FROM T WHERE a IN (SELECT a FROM S WHERE S.b = 1) OR a IN (SELECT a
FROM S WHERE S.b = 2);

Workaround 1

SELECT * FROM T WHERE a IN (SELECT a FROM S WHERE S.b = 1 OR S.b = 2);

Workaround 2

SELECT * FROM T WHERE a IN ((SELECT a FROM S WHERE S.b = 1) UNION ALL (SELECT a
FROM S WHERE S.b = 2));

These recommendations also apply to the following cases (an illustrative sketch follows the list):

• A filter predicate inside a NOT EXISTS predicate accessing multiple tables
• A filter predicate inside a disjunctive EXISTS predicate accessing multiple tables
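For illustration, a minimal sketch of the first case is shown below, reusing the tables T and S from the
examples above (the second filter condition is hypothetical). Because the predicate inside NOT EXISTS
references both T and S, it is a filter predicate accessing multiple tables:

SELECT * FROM T WHERE NOT EXISTS (SELECT * FROM S WHERE S.a = T.a AND S.b + T.b = 10);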

5.5.3.2 NOT IN Predicates


The NOT IN predicate is much more expensive to process than NOT EXISTS. It is recommended to use NOT
EXISTS instead of NOT IN if possible.

In general, NOT IN requires an entire subquery to be processed first before the overall query is processed,
matching entries based on the condition provided. However, with NOT EXISTS, true or false is returned when
the provided condition is checked, so unless the subquery result is very small, using NOT EXISTS is much
faster than NOT IN (the same applies for EXISTS/IN).

The example below shows how the NOT IN predicate can be avoided. Note that the transformation in the
example is not valid in general. It is valid only if there are no null values in the columns of interest. The
transformation is automatically applied by the SQL optimizer if all columns of interest have NOT NULL
constraints declared explicitly:

NOT IN query

SELECT * FROM T WHERE a NOT IN (SELECT a FROM S);

Possibly equivalent query in some cases

SELECT * FROM T WHERE NOT EXISTS (SELECT * FROM S WHERE S.a = T.a);

5.5.4 Avoiding Set Operations

Since UNION ALL, UNION, INTERSECT, and EXCEPT are not natively supported by the column engine, avoiding
them may improve performance.

Examples of how UNION, INTERSECT, and EXCEPT can be avoided are shown below. Note that the
transformations in the examples are not valid in general. They are valid only if there are no null values in
the columns of interest. The transformations for INTERSECT and EXCEPT are automatically applied by the SQL
optimizer if all columns of interest have NOT NULL constraints declared explicitly:

UNION query

SELECT a, b FROM T UNION SELECT a, b FROM S;

Possibly equivalent query in some cases

SELECT DISTINCT COALESCE(T.a, S.a) a, COALESCE(T.b, S.b) b FROM T FULL OUTER
JOIN S ON T.a = S.a AND T.b = S.b;

INTERSECT query

SELECT a, b FROM T INTERSECT SELECT a, b FROM S;

Possibly equivalent query in some cases

SELECT DISTINCT T.a a, T.b b FROM T JOIN S ON T.a = S.a AND T.b = S.b;

EXCEPT query

SELECT a, b FROM T EXCEPT SELECT a, b FROM S;

Possibly equivalent query in some cases

SELECT DISTINCT T.a a, T.b b FROM T WHERE NOT EXISTS (SELECT * FROM S WHERE T.a
= S.a AND T.b = S.b);

 Note

UNION ALL is usually faster than UNION because it does not require set comparison. If duplicates are
acceptable or if you know that there will not be any duplicates, UNION ALL should be preferred over
UNION.
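For instance, if duplicate rows are acceptable in the result, the UNION query shown above can simply be
rewritten with UNION ALL (note that, unlike UNION, this keeps any duplicates that exist in T and S):

SELECT a, b FROM T UNION ALL SELECT a, b FROM S;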

5.5.5 Improving Performance for Multiple Column Joins

Creating a concatenated column index can improve query performance when multiple columns are involved in
a join.

One way to optimize this type of query is to create a concatenated column index explicitly. However, note that
the creation of the index will increase memory consumption.

The example below shows a query that needs concatenated columns and the syntax that can be used to create
those columns:

Problematic query

SELECT M.year, M.month, SUM(T.ship_amount)
FROM T JOIN M ON T.year = M.year AND T.month = M.month
GROUP BY M.year, M.month;

Workaround

CREATE INDEX T_year_month ON T(year, month);
CREATE INDEX M_year_month ON M(year, month);

You can try using the following hints to optimize performance for multiple-column join predicates:

WITH HINT(OPTIMIZE_METAMODEL) --prefer creation of concatenated attribute during query processing time
WITH HINT(NO_OPTIMIZE_METAMODEL) --avoid creation of concatenated attribute

5.5.6 Using Hints to Alter a Query Plan

This section lists the hints that can be used to alter a query plan.

Related Information

Changing an Execution Engine Decision [page 128]
Changing Query Transformation [page 128]
Changing Operator Order [page 129]
Choosing Preferred Algorithms [page 131]
Using ALTER SYSTEM to Pin Hints to Problematic Queries [page 132]

5.5.6.1 Changing an Execution Engine Decision

The SQL optimizer chooses an execution engine (join engine, OLAP engine, HEX engine, or ESX engine) based
on the cost model. For various reasons, the optimal plan may not be executed using the best engine.

You can use hints to explicitly state which engine should be used to execute a query. To exclude an engine,
apply the NO_USE_<engine>_PLAN hint. To force the use of an engine, apply the USE_<engine>_PLAN hint. If
a query cannot be executed in the specified engine, the USE_<engine>_PLAN hint does not have any effect.

For example, you can apply the following hints either to choose the OLAP engine or to avoid it:

WITH HINT(USE_OLAP_PLAN) -- guides the SQL optimizer to prefer the OLAP engine
(if possible) over other engines

WITH HINT(NO_USE_OLAP_PLAN) -- guides the SQL optimizer to avoid the use of the
OLAP engine

Note that the NO_USE_OLAP_PLAN hint could lead to the join engine, HEX engine, or ESX engine being chosen
instead.
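As a usage sketch (the query itself is hypothetical), an engine hint is simply appended to the statement it
should influence:

SELECT a, SUM(b) FROM T GROUP BY a WITH HINT(USE_OLAP_PLAN);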

Related Information

HINT Details

5.5.6.2 Changing Query Transformation

You can use hints to apply or remove query transformation rules.

By examining the plan, you can apply different hints to alter the query transformation as needed. By using
NO_<hint>, the rules can be disabled to produce the opposite effect. For a full list of available hints, see HINT
Details in the SAP HANA SQL and System Views Reference.

Table 1: Hint: GROUPING_SIMPLIFICATION (before/after plan diagrams)

Table 2: Hint: GROUPING_REMOVAL (before/after plan diagrams)
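As a sketch of how such a transformation hint is applied (the query is hypothetical, and whether the rewrite
actually fires depends on the plan), the hint is appended to the statement, with NO_<hint> producing the
opposite effect:

SELECT a, SUM(b) FROM T GROUP BY a WITH HINT(GROUPING_REMOVAL);
SELECT a, SUM(b) FROM T GROUP BY a WITH HINT(NO_GROUPING_REMOVAL);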

Related Information

HINT Details

5.5.6.3 Changing Operator Order

You can use hints to change the order of operators during plan generation.

By examining the plan, you can apply different hints to change the operator order as needed. By using
NO_<hint>, the operators can be disabled to produce the opposite effect. For a full list of available hints, see
HINT Details in the SAP HANA SQL and System Views Reference.

Table 3: Hint: AGGR_THRU_JOIN (before/after plan diagrams)

Table 4: Hint: PREAGGR_BEFORE_JOIN (before/after plan diagrams)

Table 5: Hint: FILTER_THRU_JOIN (before/after plan diagrams)
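As a hypothetical usage sketch (the tables, columns, and filter are illustrative), an operator-order hint is
likewise appended to the query whose plan it should alter:

SELECT T.a, S.b FROM T JOIN S ON T.id = S.id WHERE S.c = 1 WITH HINT(FILTER_THRU_JOIN);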

Related Information

HINT Details

5.5.6.4 Choosing Preferred Algorithms

You can use hints to select preferred algorithms for execution (column engine versus row engine).

By using NO_<hint>, they can be disabled to produce the opposite effect. For a full list of available hints, see
HINT Details in the SAP HANA SQL and System Views Reference.

Table 6: Hint: CS_JOIN (before/after plan diagrams)

Table 7: Hint: CS_ORDERBY (before/after plan diagrams)

Table 8: Hint: RANGE_JOIN (before/after plan diagrams)
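For example, for a non-equijoin predicate of the kind discussed earlier (which the row engine can execute
with a nested loop, range, or hashed range join), you could try guiding the algorithm choice as follows; the
tables and columns are hypothetical:

SELECT * FROM TA, TB WHERE TA.a BETWEEN TB.low AND TB.high WITH HINT(RANGE_JOIN);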

Related Information

HINT Details

5.5.6.5 Using ALTER SYSTEM to Pin Hints to Problematic Queries

A hint table can be used to persist the binding between a query and its hints.

If you want to persist the binding between a query and its hints or you are unable to append hints to a query
during runtime (that is, for application-generated queries), a hint table can be used. For a full list of available
hints, see HINT Details in the SAP HANA SQL and System Views Reference.
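A minimal sketch of creating and removing such a binding is shown below; the SELECT statement is
illustrative, and the pinned bindings can be inspected in the STATEMENT_HINTS system view:

ALTER SYSTEM ADD STATEMENT HINT (NO_USE_OLAP_PLAN) FOR SELECT * FROM T WHERE a = 1;
ALTER SYSTEM REMOVE STATEMENT HINT (NO_USE_OLAP_PLAN) FOR SELECT * FROM T WHERE a = 1;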

Related Information

HINT Details

6 SQLScript Performance

SQLScript is a programming language used in SAP HANA to embed data-intensive application logic into the
database.

To implement data-intensive application logic (for example, complex business logic) inside the SAP HANA
database, SQLScript supports imperative programming constructs like assignments, branches, loops, and
exception handlers, as well as declarative structured query language (SQL) queries. Conceptually SQLScript
is related to stored procedures as defined in the SQL standard, but SQLScript is designed to provide superior
optimization capabilities to take advantage of modern hardware, for example, by having data residing in main
memory and allowing massive parallelization on multi-core CPUs.

Related Information

SQLScript Optimizer [page 134]
SQLScript Performance Analysis - First Step [page 139]
SQLScript Performance Tuning Tips - Advanced Step [page 152]
SQLScript Performance Guidelines [page 178]

6.1 SQLScript Optimizer

An SQLScript procedure can consist of multiple SQL statements and control-flow statements. The SQLScript
optimizer first compiles and optimizes both types of statements. Then, the SQL queries in the body of the
SQLScript procedure are reshaped to derive more benefit from the SQL query optimizer. For example, multiple
SQL statements can be merged into a single statement by inlining the statements. Afterwards, the SQL query
optimizer compiles each reshaped SQL statement.

Put simply, first the SQLScript procedure body is optimized by the SQLScript optimizer and then the
reshaped SQL queries are optimized by the SQL query optimizer. The SQLScript optimizer applies rule-based
optimizations in which certain rules are applied repeatedly.

The figure below shows how the SQLScript optimizer first compiles and optimizes the body of an SQLScript
procedure using defined rules, and then generates an execution plan which the SQLScript engine executes.

The SQLScript engine calls the SQL query engine to execute the SQL statements, whereas the control-flow
statements are handled by the SQLScript engine:

SQLScript Optimizer Rules

There are four typical rules the SAP HANA SQLScript optimizer applies:

• Constant propagation
This involves evaluating and folding constant SQL expressions. The value is propagated through the value
chain.
• Procedure flattening
The body of the called SQLScript procedure is unfolded into the body of the calling SQLScript procedure.
• SQL statement inlining and parallelization
The SQLScript optimizer decides whether to apply statement inlining or parallel execution based on a
dependency analysis. If the statements have dependencies, it reshapes them into a single statement with
inlining to derive additional benefit (for example, cost-based optimization) from the SQL query optimizer. If
the queries do not have any dependencies, they are executed in parallel.
Given multiple single statements (for example, S1, S2, S3), SQL inlining is used to replace the multiple
statements with a new single statement (for example, S4 = "WITH S1, S2, S3, SELECT ...").
• Control-flow simplification
This involves dead-code elimination. The SQLScript optimizer eliminates the following:
• Statements where it is guaranteed that their generated results will never be used in the SQLScript
procedure
• Unreachable branches where it is determined that the conditions for the branches evaluate to false
after constant propagation

Example

The following example shows how the SQLScript optimizer applies its rules during plan preparation and what
the final query looks like.

There are two procedures, P1 and P2:

CREATE TABLE TAB1 (MATERIAL NVARCHAR(10), FABRIC_ID NVARCHAR(10));
INSERT INTO TAB1 VALUES ('ABC', 'BI');
CREATE TYPE T_RESULT AS TABLE (MATERIAL NVARCHAR(10), FABRIC_ID NVARCHAR(10));
CREATE PROCEDURE P2(IN F2 NVARCHAR(13), OUT T_RESULT T_RESULT)
AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
IF :F2 = 'TA' THEN
V_F_ID = 'BI';
END IF;
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
END;
CREATE PROCEDURE P1(IN F1 NVARCHAR(13))
AS BEGIN
CALL P2(:F1, :COMPOSITION);
SELECT * FROM :COMPOSITION ORDER BY MATERIAL;
END;

The procedure P1 is now executed with the parameter value TA as follows:

CALL P1('TA');

1. Constant propagation
The input value for F1 in the procedure P1 can be replaced with the given constant TA. The SQLScript
optimizer replaces all F1 instances with the TA constant value.
Before constant propagation in the procedure P1:

AS BEGIN
CALL P2(:F1, :COMPOSITION);
SELECT * FROM :COMPOSITION ORDER BY MATERIAL;
END;

After constant propagation in the procedure P1:

AS BEGIN
CALL P2('TA', :COMPOSITION);
SELECT * FROM :COMPOSITION ORDER BY MATERIAL;
END;

The input value for F2 in the procedure P2 can also be replaced accordingly.
Before constant propagation in the procedure P2:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
IF :F2 = 'TA' THEN
V_F_ID = 'BI';
END IF;
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
END;

After constant propagation in the procedure P2:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
IF 'TA' = 'TA' THEN
V_F_ID = 'BI';
END IF;
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
END;

2. Procedure flattening
The body of the procedure P2 can be unfolded into the body of the caller procedure P1. Instead of CALL
P2('TA', :COMPOSITION), the body of the procedure P2 is unfolded into the procedure P1. The OUT
parameter :COMPOSITION is also changed to :T_RESULT accordingly.
Before procedure flattening in the procedure P1:

AS BEGIN
CALL P2('TA', :COMPOSITION);
SELECT * FROM :COMPOSITION ORDER BY MATERIAL;
END;

After procedure flattening in the procedure P1:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
IF 'TA' = 'TA' THEN
V_F_ID = 'BI';
END IF;
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

3. Control-flow simplification
Unnecessary conditions for branches are eliminated. Since 'TA' = 'TA' is always true, this IF statement
can be removed.
Before control-flow simplification in the procedure P1:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
IF 'TA' = 'TA' THEN
V_F_ID = 'BI';
END IF;
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

After control-flow simplification in the procedure P1:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
V_F_ID = 'BI';
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

4. Constant propagation
After the IF statement is removed, it is evident that V_F_ID always has the constant value BI. The
DECLARE statement for V_F_ID can be removed and V_F_ID replaced with BI.
Before constant propagation in the procedure P1:

AS BEGIN
DECLARE V_F_ID NVARCHAR(13);
V_F_ID = 'BI';
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE :V_F_ID = '' OR :V_F_ID = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

After constant propagation in the procedure P1:

AS BEGIN
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE 'BI' = '' OR 'BI' = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

5. SQL statement inlining
Multiple statements can be merged into a single statement using the WITH clause. Three SQL statements
in the procedure P1 can be merged into a single SQL statement.
Before SQL statement inlining in the procedure P1:

AS BEGIN
T_TABLE = SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE 'BI' = '' OR 'BI' = FABRIC_ID;
T_RESULT = SELECT * FROM :T_TABLE ORDER BY MATERIAL;
SELECT * FROM :T_RESULT ORDER BY MATERIAL;
END;

After SQL statement inlining in the procedure P1:

AS BEGIN
WITH "_SYS_T_TABLE_2" AS
(SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE 'BI' = '' OR 'BI' = FABRIC_ID),
"_SYS_T_RESULT_1" AS
(SELECT * FROM "_SYS_T_TABLE_2" "T_TABLE"
ORDER BY MATERIAL)
SELECT * FROM "_SYS_T_RESULT_1" "COMPOSITION"
ORDER BY MATERIAL;
END;

6. Final optimized query
CALL P1('TA') is finally compiled and optimized as follows:

WITH "_SYS_T_TABLE_2" AS
(SELECT DISTINCT MATERIAL, FABRIC_ID FROM TAB1
WHERE CAST(N'BI' AS NVARCHAR(13)) = '' OR
CAST(N'BI' AS NVARCHAR(13)) = FABRIC_ID),
"_SYS_T_RESULT_1" AS
(SELECT * FROM "_SYS_T_TABLE_2" "T_TABLE"
ORDER BY MATERIAL)
SELECT * FROM "_SYS_T_RESULT_1" "COMPOSITION"
ORDER BY MATERIAL;

When CALL P1('TA'); is executed, this reshaped SQL statement is executed.

6.2 SQLScript Performance Analysis - First Step

SQLScript performance issues are mainly caused by anti-patterns in the SQLScript code or by the SQLScript
procedure associated with the SQL plan.

• Anti-patterns include inefficient logic (for example, row-based calculations that become slow) and inefficient
use of imperative constructs. For more information about SQLScript anti-patterns, see the SQLScript
Performance Guidelines, which describe best practices for programming with SQLScript.
• SQLScript performance issues are often associated with the SQL plan. You can analyze these issues
accurately and quickly, as described in the topics in this section, using SAP HANA SQLScript supportability
tools designed for this purpose.

Let's assume that you have a procedure written in SQLScript and it ran well. After your application or SAP
HANA was upgraded, it suddenly started to run more slowly than before. You want to find the reason and
take the necessary action to improve its performance immediately. SQLScript performance can differ for many
reasons, for example, due to SQLScript code changes after application upgrades, changes to inner statements
by the SQLScript optimizer, or SQL plan changes for each statement.

When you have confirmed that there were no changes to the SQLScript code, you need to focus on the inner
statements of the SQLScript procedure. The first step is to find the dominant inner statement. Once you have
narrowed down the problem to statement level, you then need to find the dominant operation in the dominant
inner query, using an SQL performance analysis strategy. For more information, see SQL Query Performance.

An overview of the overall procedure is shown below:

SQLScript Supportability Tools

When you drill down into an SQLScript performance issue, two steps are sometimes needed, firstly to find
the dominant statement in the SQLScript procedure and secondly to find the dominant operator within that
SQL statement. SAP HANA provides various SQLScript supportability tools to analyze performance issues
resulting from inefficient logic and/or imperative constructs, as well as issues related to the query plans of
inner statements. Choosing the right supportability tool to use can be difficult. The following overview shows
which supportability tools can be used in which step:

The following table shows the monitoring scope of each supportability tool for SQLScript:

Engine      Component      Tool
SQLScript   Optimization   SQLScript plan profiler
                           Explain plan for call
                           Plan visualizer for SQLScript
            Execution      SQL trace
                           SQLScript plan profiler
                           M_ACTIVE_PROCEDURES monitoring view
                           Plan visualizer for SQLScript

The following supportability tools are described in dedicated topics with examples:

• SQL trace
• SQLScript plan profiler
• M_ACTIVE_PROCEDURES monitoring view
• Explain plan for call
• Plan visualizer for SQLScript

Related Information

SQLScript Performance Guidelines [page 178]
SQL Query Performance [page 25]
SQL Trace [page 142]
SQLScript Plan Profiler [page 144]

M_ACTIVE_PROCEDURES Monitoring View [page 146]
Explain Plan for Call [page 147]
Plan Visualizer for SQLScript [page 148]

6.2.1 SQL Trace

The information collected by the SQL trace includes the overall execution time of each statement, the number
of records affected, as well as other details. The SQL trace is a good starting point for understanding all inner
statements executed within an SQLScript procedure and for identifying potential performance bottlenecks at
statement level.

The SQL trace can also show the intermediate results of an inner query of an SQLScript procedure. This means
that you can reproduce the target inner query with temporary table results without having to execute the whole
procedure. The SQL trace information is saved as a Python program (by default, as sqltrace_<...>.py).

The SQL trace is recommended when you want to do the following:

• Check the inner SQL statements reshaped by the SQLScript optimizer.
• Find out which inner queries have a high memory consumption or slow execution.
• Check each executed SQL plan for inner statements.
• See the intermediate results of each inner query.

The SQL trace can be used to monitor SQLScript execution and SQL execution.

You can configure the SQL trace by changing the values of the configuration parameters in the [sqltrace]
section of the indexserver.ini or nameserver.ini file. In particular, the configuration parameter
internal must be set to true to collect the inner statements in a procedure. The configuration parameter
query_plan_trace is optional, but it is helpful to check the explain plan of inner statements. For more
information about other options, see SQL Trace in the SAP HANA Troubleshooting and Performance Analysis
Guide.

Example

The following example shows how to collect an SQL trace.

Turn on the SQL trace as follows:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','DATABASE') SET
('sqltrace', 'trace') = 'on',
('sqltrace', 'internal') = 'true',
('sqltrace', 'query_plan_trace') = 'on'
WITH RECONFIGURE;

Execute the procedure P1 used previously:

CALL P1('TA');

Turn off the SQL trace as follows:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','DATABASE') UNSET
('sqltrace', 'trace'),
('sqltrace', 'internal'),
('sqltrace', 'query_plan_trace')
WITH RECONFIGURE;

The collected SQL trace looks like this:

The trace shows the execution of CALL P1('TA') and its compiled inner statement. The inner query and its
query plan are shown twice, once for each compilation and execution phase.

Related Information

SQL Trace

6.2.2 SQLScript Plan Profiler

The SQLScript plan profiler is a performance analysis tool designed mainly for use with stored procedures and
functions.

When the SQLScript plan profiler is enabled, a single tabular result per call statement is generated. The result
table contains the start time, end time, CPU time, wait time, thread ID, and some additional details for each
predefined operation. A predefined operation can be anything that is considered important for analyzing the
engine performance of stored procedures and functions, covering both compilation and execution times. The
tabular results are displayed in the SAP HANA monitoring view M_SQLSCRIPT_PLAN_PROFILER_RESULTS.

If the query is canceled for any reason, the SQLScript plan profiler information cannot be captured. In such
cases, you can use the SQL trace instead.

The SQLScript plan profiler is recommended when you want to do the following:

• Understand the entire process of SQLScript procedure runs (for example, compilation, execution,
parallelization).
• Find out which inner query has a high memory consumption or slow execution.

The SQLScript plan profiler can be used to monitor SQLScript optimization and SQLScript execution.

You can use the SQLScript plan profiler with either the ALTER SYSTEM statement or the SQL hint
SQLSCRIPT_PLAN_PROFILER.

Using the Plan Profiler with the ALTER SYSTEM Statement

The following example shows how to use the plan profiler using the ALTER SYSTEM statement.

Start the plan profiler as follows:

ALTER SYSTEM START SQLSCRIPT PLAN PROFILER;

Execute the procedure P1:

CALL P1('TA');

Stop the plan profiler:

ALTER SYSTEM STOP SQLSCRIPT PLAN PROFILER;

Check the result of the plan profiler:

SELECT * FROM M_SQLSCRIPT_PLAN_PROFILER_RESULTS;

You can see the results in the M_SQLSCRIPT_PLAN_PROFILER_RESULTS monitoring view:

For more information about the monitoring view, see M_SQLSCRIPT_PLAN_PROFILER_RESULTS View.

You can narrow down the scope of the collected profiler information to session level or to a certain procedure
as follows:

• Collect all procedures on a certain connection as follows:

ALTER SYSTEM START SQLSCRIPT PLAN PROFILER FOR SESSION <connection_id>;

• Collect only the specified procedure:

ALTER SYSTEM START SQLSCRIPT PLAN PROFILER FOR PROCEDURE <procedure_name>;

You can clear the plan profiler as follows:

ALTER SYSTEM CLEAR SQLSCRIPT PLAN PROFILER;

Using the Plan Profiler with the SQLSCRIPT_PLAN_PROFILER Hint

You can also use the SQLScript plan profiler directly with an SQL hint instead of using the ALTER SYSTEM
statement. By adding an SQL hint ‒ WITH HINT(SQLSCRIPT_PLAN_PROFILER) ‒ when executing a
procedure, you can collect and return a profiling result as well as the result set of the procedure.

In the following example, the procedure P1 is executed with the SQLSCRIPT_PLAN_PROFILER hint:

CALL P1('TA') WITH HINT(SQLSCRIPT_PLAN_PROFILER);

Note that a profiling result collected by using the SQL hint WITH HINT(SQLSCRIPT_PLAN_PROFILER) is not
shown in the M_SQLSCRIPT_PLAN_PROFILER_RESULTS view.

Related Information

M_SQLSCRIPT_PLAN_PROFILER_RESULTS View
SQLScript Plan Profiler
ALTER SYSTEM {START | STOP | CLEAR} SQLSCRIPT PLAN PROFILER Statement (System Management)

6.2.3 M_ACTIVE_PROCEDURES Monitoring View
The M_ACTIVE_PROCEDURES monitoring view shows all internally executed statements of a procedure call,
except the SQLScript compilation step. This also includes any remotely executed statements.

The M_ACTIVE_PROCEDURES monitoring view is very similar to the M_ACTIVE_STATEMENTS
monitoring view, but it retains the records of completed internal statements of the procedure
call until the parent procedure finishes, and shows them in hierarchical order of their nested
level. By default, you can see the M_ACTIVE_PROCEDURES content only while the parent
procedure is running. When the parent procedure finishes, the content in M_ACTIVE_PROCEDURES
associated with this procedure call is deleted. If you want to retain the content after
execution, you can configure the parameters NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION and/or
RETENTION_PERIOD_FOR_SQLSCRIPT_CONTEXT in the [sqlscript] section of the ini file. For more
information, see M_ACTIVE_PROCEDURES.

The M_ACTIVE_PROCEDURES monitoring view is recommended when you want to do the following:

• Monitor SQLScript queries on the fly.
• Find out which inner query has a high memory consumption or slow execution.

The M_ACTIVE_PROCEDURES monitoring view can be used to monitor SQLScript execution.

Example

In the following example, the NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION parameter is configured
to keep the latest results, so you can see which internal statements of the procedure P1 are executed even after
the execution of P1 has finished.

Set the NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION parameter as follows:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','DATABASE')
SET ('sqlscript','NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION') = '10' WITH
RECONFIGURE;

Execute the procedure P1:

CALL P1('TA');

Check the result in M_ACTIVE_PROCEDURES for an execution of P1:

SELECT * FROM M_ACTIVE_PROCEDURES WHERE PROCEDURE_NAME = 'P1';

Unset the NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION parameter as follows:

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','DATABASE')
UNSET ('sqlscript','NUMBER_OF_CALLS_TO_RETAIN_AFTER_EXECUTION') WITH RECONFIGURE;

The result shows which statement strings were executed internally, their compilation time, execution time,
materialization time, materialization memory size, and execution memory size:

Related Information

M_ACTIVE_PROCEDURES

6.2.4 Explain Plan for Call

The explain plan provides information about the compiled plan of a given procedure without executing it.
The information is stored in the EXPLAIN_CALL_PLANS table and shows operators, inner query strings, and
execution engines. It also shows whether inner queries will be executed in parallel or sequentially.

The explain plan is recommended when you want to see the compiled plan of an SQLScript procedure without
executing it.

The compiled plan of an SQLScript procedure contains the following:

• Inner final optimized statement strings
• Sequential or parallel executions
• SQLScript logic (for example, loops, conditions)

However, it does not show the explain plan of inner statements. You can check the query plan of inner queries
by collecting the SQL trace or the explain plan with the plan IDs of the inner statements stored in the SQL plan
cache.
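For example, assuming you have looked up the plan ID of an inner statement in the M_SQL_PLAN_CACHE
view, you can request the explain plan of that cached entry directly (<plan_id> is a placeholder):

SELECT STATEMENT_STRING, PLAN_ID FROM M_SQL_PLAN_CACHE WHERE STATEMENT_STRING LIKE '%TAB1%';
EXPLAIN PLAN SET STATEMENT_NAME = 'inner_stmt' FOR SQL PLAN CACHE ENTRY <plan_id>;
SELECT * FROM EXPLAIN_PLAN_TABLE WHERE STATEMENT_NAME = 'inner_stmt';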

The explain plan for call can be used to monitor SQLScript optimization.

Running the Explain Plan from the SQL Command Line

The following example shows how to execute the EXPLAIN PLAN FOR CALL statement directly for the
procedure P1.

Execute EXPLAIN PLAN FOR CALL as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'P1' FOR CALL P1('TA');

Retrieve the result from the EXPLAIN_CALL_PLANS table as follows:

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'P1';

The result looks like this:

The OPERATOR_NAME column gives the name of the logical operator. The operators have a hierarchical
structure. The OPERATOR_STRING column contains the statement string or code string in text format. You can
check the full inner statements of the procedure call.

Explain Plan in SAP HANA Studio
The SAP HANA studio provides an Explain Plan menu to collect the explain plan, but this menu does not show
the results for a procedure, although its explain plan is collected in the EXPLAIN_CALL_PLANS table. You need
to query the EXPLAIN_CALL_PLANS table directly, as shown below:

SELECT * FROM EXPLAIN_CALL_PLANS WHERE CONNECTION_ID = CURRENT_CONNECTION;

Explain Plan in SAP HANA Database Explorer


Unlike the SAP HANA studio, the SAP HANA database explorer provides a feature to both collect and show the
explain plan for a procedure. It is available in the Analyze > Explain Plan menu, as shown here:

Managing the EXPLAIN_CALL_PLANS Table


Since EXPLAIN_CALL_PLANS is a system temporary (no logging) table, it can grow unexpectedly if you forget
to delete the rows in it after executing the EXPLAIN PLAN FOR CALL statement or after using the Explain Plan
menu for a CALL procedure in the SAP HANA studio.

You can delete rows in the EXPLAIN_CALL_PLANS table by running a delete query on the table. For example:

DELETE FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'P1';

Related Information

EXPLAIN PLAN for Call
Explain Plan [page 54]

6.2.5 Plan Visualizer for SQLScript

The plan visualizer is a very useful tool for investigating performance issues at both SQLScript and SQL levels.
It captures the executed plan of each statement as well as its execution time, memory usage, result record
count, and explain plan. It provides an SQLScript plan graph showing, for example, inner statements, loops, and
conditional branches, as well as an SQL plan graph for inner SQLScript queries.

The plan visualizer is recommended when you want to do the following:

• Check the detailed execution of inner statements of an SQLScript procedure.
• Find out which inner query has a high memory consumption or slow execution.
• Find the dominant SQL operators (that is, the hot spots) of inner statements.

The plan visualizer for SQLScript can be used to monitor SQLScript execution and SQL execution.

Visualize Plans in SAP HANA Studio

The following example shows how to access the plan visualizer in the SAP HANA studio. You can get a
plan visualizer trace (PLV file) by choosing Visualize Plan > Execute in the context menu of the selected
SQLScript procedure (here CALL P1('T1')):

In the SAP HANA database explorer, you can access it by choosing Analyze > Generate SQL Analyzer Plan
File:

Once the PLV file has been downloaded, you can open it using the SAP HANA studio.

The various pages of the plan visualizer are described briefly in the example below based on the PLV file for
executing CALL P1('T1').

Overview

On the Overview page (shown below), several important KPIs required to begin a performance analysis are
provided so that you can first get a big picture of what is going on before going into the complex details:

Statement Statistics

The Statement Statistics page displays a set of statistics for each inner SQL statement involved in a procedure,
as shown here:

Executed Plan

The Executed Plan page provides a plan graph showing how the SQLScript procedure was executed as well as
the inner SQL statements:

Timeline
The Timeline view shows the complete overview of the execution plan based on a visualization of sequential
timestamps:

In this Timeline view, you can see two Compile operators. The first Compile operator (shown at the top)
represents the compilation of the procedure P1 itself, which is done by the SQLScript optimizer. Information
about this Compile operator is also shown on the Overview page as Compilation 9.51 ms. The Statement
Statistics page also shows the same information, that is, the total compile time for CALL P1('T1') is given as
9.51 ms.

The other Compile operator (shown under the Select operator) is the compilation of the inner SQL statement.
This compilation is done during the execution of the procedure, so this time is included in the execution time
shown on the Overview page, not in the compilation time. On the Statement Statistics page, you can see that
the total compile time of this inner statement is 20.08 ms. This is also included in the total execution time of
CALL P1('T1').

For more information about the plan visualizer, see Plan Visualizer and Analyzing SQL Execution with the Plan
Visualizer.

 Note

The SQL analyzer tool for SAP HANA can be used instead of the SAP HANA studio (deprecated) to
analyze the performance of SAP HANA SQL queries. It supports features that are available in the PlanViz
perspective of the SAP HANA studio, as well as new features. The tool is available as an extension for
Microsoft Visual Studio Code and SAP Business Application Studio.

For more information, see the SQL Analyzer Tool for SAP HANA Guide.

Related Information

Plan Visualizer [page 61]
Analyzing SQL Execution with the Plan Visualizer

6.3 SQLScript Performance Tuning Tips - Advanced Step

The use of inlining, parameterized queries, and SQL hints can help improve the performance of SQLScript
procedures. In this advanced step, you learn more about these tuning techniques, how they can benefit
performance, and how to decide on a case-by-case basis whether a tuning method should be applied.

Related Information

SQLScript Inlining Feature [page 152]
Query Parameterization - BIND_AS_PARAMETER and BIND_AS_VALUE [page 166]
Using Hints with SQLScript Procedures [page 173]

6.3.1 SQLScript Inlining Feature

Inlining is an SQLScript optimization feature that can enhance performance.

From a user perspective, it is always recommended to reduce the complexity of SQL statements by using
variables. By splitting a very long and complex query into multiple short queries with table variables, you can
make the logic easier to comprehend. For more information, see Reduce the Complexity of SQL Statements.
However, from a performance perspective, it can be expensive to execute multiple statements and materialize
their results repeatedly. So, to reduce this overhead and run an SQLScript procedure faster, the SQLScript
optimizer tries to combine as many dependent statements as possible. This is called inlining and involves
rewriting multiple statements into a single equivalent statement using a WITH clause. Once the queries have
been inlined by the SQLScript optimizer, the statements themselves are reshaped. Since inlining is one of the
optimization features of SQLScript, it is beneficial for performance. For example, the materialization of the
intermediate results of table variables can be omitted and a better join order can be processed.

Related Information

Reduce the Complexity of SQL Statements [page 179]
Inlining Process [page 153]
Blocking Statement Inlining with Hints (NO_INLINE) [page 155]
Blocking Inlining for Better Performance [page 158]

6.3.1.1 Inlining Process

Inlining is one of the SQLScript optimization processes and can be beneficial for performance.

The following example shows how inlining is processed. The procedure get_results consists of four SQL
statements that are linked by table variables, as follows:

CREATE TABLE "Products_A" (PRODUCTID NVARCHAR(10), CATEGORY VARCHAR(40),


TYPECODE VARCHAR(2));
INSERT INTO "Products_A" VALUES ('HANA_NOTE1', 'Notebooks', 'PR');
CREATE TABLE "Item_A" (PRODUCTID NVARCHAR(10), DELIVERYDATE DATE);
INSERT INTO "Item_A" VALUES ('HANA_NOTE1', '2022-07-01');
CREATE PROCEDURE "get_results"(
IN im_category VARCHAR(40),
IN im_typecode VARCHAR(2),
IN im_start_date DATE DEFAULT '2022-06-02',
IN im_end_date DATE DEFAULT '2022-08-31',
OUT RESULTS TABLE (
PRODUCTID NVARCHAR(10),
RESULTDATE DAYDATE,
NUM_RESULT BIGINT
)
)
LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
-- Statement#1
final_products = SELECT * FROM "Products_A" WHERE category = :im_category AND
typecode = :im_typecode;
-- Statement#2
items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE
FROM :final_products AS p
INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID" WHERE i.DELIVERYDATE
>=:im_start_date
AND i.DELIVERYDATE <= :im_end_date;
-- Statement#3
aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
FROM :items
GROUP BY PRODUCTID, RESULTDATE;
-- Statement#4
RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE;

END;

The procedure get_results is called as follows:

CALL "get_results"(
IM_CATEGORY => 'Notebooks',
IM_TYPECODE => 'PR',
RESULTS => ?);

The SQLScript optimizer checks the dependency of the four SQL statements. The table variable
final_products contains the result of statement 1 and is used in statement 2. The table variable items
is the result of statement 2 and is used in statement 3. Lastly, the table variable aggregated_items comes
from statement 3 and is consumed in statement 4:

-- Statement#1
final_products = SELECT * FROM "Products_A" WHERE category = :im_category AND
typecode = :im_typecode;
-- Statement#2
items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE FROM
:final_products
AS p INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID" WHERE
i.DELIVERYDATE >=:im_start_date
AND i.DELIVERYDATE <= :im_end_date;
-- Statement#3
aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
FROM :items
GROUP BY PRODUCTID, RESULTDATE;
-- Statement#4
RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE;

The SQLScript optimizer confirms the dependency of the four statements and their table variables. It
combines them into a single statement using a WITH clause. Statement 1 is merged into the WITH clause
as a subquery with the query name _SYS_FINAL_PRODUCTS_2. Statement 2 and statement 3 are also merged
into the clause with proper query names:

WITH "_SYS_FINAL_PRODUCTS_2" AS
(SELECT * FROM "Products_A"
WHERE category = CAST('Notebooks' AS VARCHAR(40)) AND
typecode = CAST('PR' AS VARCHAR(2))
),
"_SYS_ITEMS_2" AS
(SELECT i."PRODUCTID" AS PRODUCTID,
i."DELIVERYDATE" AS RESULTDATE
FROM "_SYS_FINAL_PRODUCTS_2" AS p
INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID"
WHERE i.DELIVERYDATE >=__typed_Daydate__($1) AND
i.DELIVERYDATE <= __typed_Daydate__($2)
),
"_SYS_AGGREGATED_ITEMS_2" AS
(SELECT PRODUCTID, RESULTDATE,
COUNT(PRODUCTID) AS NUM_RESULT
FROM "_SYS_ITEMS_2" "ITEMS"
GROUP BY PRODUCTID, RESULTDATE
)
SELECT * FROM "_SYS_AGGREGATED_ITEMS_2" "AGGREGATED_ITEMS"
ORDER BY PRODUCTID, RESULTDATE

When the procedure get_results is executed, an inlined single SQL statement is now executed instead of
four individual SQL statements.

You can use the EXPLAIN PLAN FOR CALL statement to check the compiled SQLScript plan. You execute the
statement as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'INLINE' FOR
CALL "get_results"(
IM_CATEGORY => 'Notebooks',
IM_TYPECODE => 'PR',
RESULTS => ?);

You retrieve the result from the EXPLAIN_CALL_PLANS table as follows:

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'INLINE';

In the OPERATOR_STRING column, there is only one statement: the four statements have been inlined into a
single statement using a WITH clause, as shown in the Assign row. Due to the inlining feature, the table variables
final_products, items, and aggregated_items do not have to be materialized. The inlined statement is
therefore beneficial for performance from a memory-consumption perspective.

6.3.1.2 Blocking Statement Inlining with Hints (NO_INLINE)

Inlining is one of the optimization features in SQLScript and is generally beneficial for performance. The
SQLScript optimizer therefore tries to inline as many SQL statements as possible.

There might be cases, however, where the combined statements do not result in an optimal plan. This
can affect performance when the inlined statement is executed. You therefore have the option of blocking
statement inlining by using the SQL hint NO_INLINE. This hint prevents a statement from being combined with
other statements. To block inlining, you must add the NO_INLINE hint at the end of each statement within the
procedure.

The following example shows how to add the NO_INLINE hint to the statements within a procedure:

CREATE OR REPLACE PROCEDURE "get_results"(


IN im_category VARCHAR(40),
IN im_typecode VARCHAR(2),
IN im_start_date DATE DEFAULT '2022-06-02',
IN im_end_date DATE DEFAULT '2022-08-31',
OUT RESULTS TABLE (
PRODUCTID NVARCHAR(10),
RESULTDATE DAYDATE,
NUM_RESULT BIGINT
)
)
LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
-- Statement#1
final_products = SELECT * FROM "Products_A" WHERE category = :im_category AND
typecode = :im_typecode
WITH HINT (NO_INLINE);
-- Statement#2
items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE
FROM :final_products AS p
INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID" WHERE i.DELIVERYDATE
>=:im_start_date
AND i.DELIVERYDATE <= :im_end_date WITH HINT (NO_INLINE);
-- Statement#3
aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
FROM :items
GROUP BY PRODUCTID, RESULTDATE WITH HINT (NO_INLINE);
-- Statement#4
RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE;

END;

Because statement 1 has the SQL hint NO_INLINE, the SQLScript optimizer does not merge statement 1
into the other statements. The SQLScript optimizer checks whether statement 2 and statement 3 can be
combined, but statement 2 also has the NO_INLINE hint, so statement 2 is also executed separately. The same
applies for statement 3. Consequently, the results of statements 1, 2, and 3 are materialized into the table
variables final_products, items, and aggregated_items. This is illustrated below:

You can use the EXPLAIN PLAN FOR CALL statement to check the compiled SQLScript plan. You execute the
statement as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'NO_INLINE' FOR
CALL "get_results"(
IM_CATEGORY => 'Notebooks',
IM_TYPECODE => 'PR',
RESULTS => ?);

You retrieve the result from the EXPLAIN_CALL_PLANS table as follows:

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'NO_INLINE';

The OPERATOR_STRING column shows that inlining is blocked by the NO_INLINE hint and that each statement
is executed individually.

The SQL hint NO_INLINE not only blocks statement inlining optimization, but it also prevents procedure
flattening. The following example illustrates this behavior:

CREATE OR REPLACE PROCEDURE P1(IN F1 NVARCHAR(13))
AS BEGIN
CALL P2(:F1, :COMPOSITION) WITH HINT (NO_INLINE);
SELECT * FROM :COMPOSITION ORDER BY MATERIAL;
END;

EXPLAIN PLAN SET STATEMENT_NAME = 'P1_NO_INLINE' FOR CALL P1('TA');
SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'P1_NO_INLINE';

Previously, when the EXPLAIN PLAN FOR CALL statement was executed for the procedure P1, the result
showed procedure P2 flattened inside procedure P1 (see Explain Plan for Call). Now CALL P2 is executed
independently inside procedure P1 and its result is passed with output variable COMPOSITION_2, as shown
below:

Related Information

Explain Plan for Call [page 147]

6.3.1.3 Blocking Inlining for Better Performance

Although inlining is generally beneficial for performance in terms of memory consumption and better join
plans, there might be rare cases where combined statements lead to a non-optimal plan. If this is the case,
blocking the combination of specific statements can help improve performance.

For example, inlining can make a query long and complex. A long and complex query can be difficult for the
SQL optimizer to optimize, resulting in a long compilation time, which in turn can affect overall performance.
A long compilation time does not always mean a long execution time, but there are cases where it is one of
the main causes of slow execution. When a long compilation time does cause a performance issue, you can
try blocking inlining to reduce the compilation time through a simpler initial query tree. For an example, see
Case Study 1.

Another case for blocking inlining is when a suboptimal plan is chosen with a large join size. Let's assume
there are two queries called query 1 and query 2 inside a procedure, and each query has several joins. The
two queries are merged into a single query by the SQLScript optimizer. At this point, after the joins of query
1 and query 2 are merged, the joins could become very large and complex, causing an out-of-memory (OOM)
situation. This would be unexpected, since query 1 and query 2 were designed to be separate. As a workaround,
you could try adding the NO_INLINE hint to the query to break inlining.
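
The following sketch illustrates the workaround (the table names T1 to T3, the columns, and the join conditions
are placeholders, not taken from a real system):

query1 = SELECT a.ID, a.X, b.Y FROM T1 AS a
    INNER JOIN T2 AS b ON a.ID = b.ID WITH HINT (NO_INLINE);
query2 = SELECT q.ID, c.Z FROM :query1 AS q
    INNER JOIN T3 AS c ON q.ID = c.ID;

Because of the hint, query 1 is materialized before query 2 runs, so the join trees of the two queries are never
merged into one large join.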

The NO_INLINE hint can be a good way of trying to avoid a suboptimal plan. When the NO_INLINE hint is
applied, the queries in the procedure can be processed individually by the SQLScript optimizer and in a more
efficient manner because the initial query tree is simpler than when inlining is used.

Related Information

Case Study 1 [page 159]

6.3.1.4 Case Study 1

A slow running procedure is observed. It takes about 18 seconds, which is much longer than expected.

In this case study, you use the plan visualizer and SQL analyzer to identify and analyze the specific query that
causes performance slowness in the SQLScript procedure.

Plan Visualizer

To identify where most of the time was spent, the plan visualizer trace is checked first. The collected plan
visualizer trace looks like this:

On the Statement Statistics page, the total execution time of call … is 17,983.04 ms, but the execution time of
all the inner statements is less than 100 ms.

As explained in Plan Visualizer for SQLScript, the compilation time for each statement within a procedure
belongs to the execution time of the procedure, so you need to check the total compile time of each inner
statement on the Statement Statistics page.

You can see that the compile time of one of the inner statements is 13,907.53 ms and that its line number is
295.

You can also find the same information on the Timeline page:

SQL Analyzer

Using the SQL analyzer tool for SAP HANA, you can check further information about the inner statements, as
shown below:

By selecting the text in the Plan field and then copying and pasting it elsewhere, you can get the explain plan of
an inner statement. When you click the open link in the Deep Dive field, you can see the plan visualizer trace of
that inner statement too.

The Comments field shows the comment generated by the SQLScript optimizer, including information about
which table variables were merged by inlining. The comment for the inner query with the long compilation time
of 13,907.53 ms is shown below:

The comment indicates that 17 table variables are used. This means that 17 inner statements are combined
into this inner statement by the inlining feature.

The explain plan of the same inner query is shown below. The query plan is over 400 lines:

By clicking the open link in the Deep Dive field for this inner query, you can view its plan visualizer trace. On the
COMPILATION SUMMARY tab at the bottom, the rewriting time shown is 10,124.637 ms:

Based on the indications so far, this inner statement within the procedure appears to be inlined with too many
statements. The inlined statement consists of many joins and filters, resulting in a large query plan, which in
turn also means a long compilation time.

In most cases, inlining makes an SQLScript procedure run faster, but there can be cases where this does not
occur. This case study shows an exceptional case where the inlining feature makes the query complicated and
slows down performance.

Applying a Tuning Method

A possible measure would be to add the SQL hint NO_INLINE and check what happens. First, you need to find
the positions where the NO_INLINE hint should be added.

When opening the definition of this procedure, you can see that another procedure,
CL_PPH_READ_NHP_BASIC=>GET_MRP_ELEMENT2, is called inside it, as shown below:

When opening the Comments field of the inlined statement, you can see that all the table variables also come
from the CL_PPH_READ_NHP_BASIC=>GET_MRP_ELEMENT2 procedure:

Therefore, the position for adding the NO_INLINE hint must be the end of the statement of each of these table
variables. In this example, according to the Comments value shown, the table variable LT_RESB is located in
line 21, so the NO_INLINE hint must be added at the end of that statement in line 45, as follows:

After the NO_INLINE hint has been added to the end of each statement, the plan visualizer trace of this call
procedure looks like this:

The execution time of the call procedure has been reduced from 17,987.54 ms to 2,357.647 ms, so it appears
that adding the NO_INLINE hint makes this call procedure much faster.

Comparing Query Performance

You now compare the Statement Statistics views from the previous plan visualizer trace and the new one. As
expected, the inner statement with the long compilation time has been replaced with 17 statements due to the
NO_INLINE hint that was added. The execution time of that inner statement was 423.51 ms and its compile time
was 13,907.53 ms. In the new trace, the sum of the execution times of the 17 new statements is 3,646.26 ms and
the sum of their compile times is 16,424.82 ms. Their combined execution and compile time is therefore longer
than that of the single inlined statement. How, then, can this call procedure with the NO_INLINE hint run faster
than before?

When checking the Timeline view (shown below), you can confirm that these 17 statements are compiled in
parallel and executed independently, if possible:

Due to parallel compilation and execution, more CPU resources are consumed than before, but considering the
faster execution time, this can be a reasonable trade-off if your system has enough CPU power.

The use of the NO_INLINE hint needs to be decided on a case-by-case basis. As this case has demonstrated,
you need to consider the resources and workload of the system when making this type of tuning decision.

In summary, this case study shows how you can approach a long-compilation problem inside an SQLScript
procedure by using the NO_INLINE hint to block inlining. When the NO_INLINE hint is added to the end of each
statement, the statements are compiled individually and in parallel, resulting in a shorter overall elapsed
compilation time, even though the sum of the individual compile times can be higher.

Related Information

Plan Visualizer for SQLScript [page 148]

6.3.2 Query Parameterization - BIND_AS_PARAMETER and BIND_AS_VALUE

The advantage of using a parameterized query is that the generated query plan cache entry can be reused even
if the values of the variables change. A potential disadvantage is that you might not get the most optimal query
plan, because optimization using parameter values is done for all possible parameters, and optimal plans could
be different for different parameter values.

When constant values are used, a new query plan is always prepared, resulting in different query plan cache
entries for different parameter values. This also means that additional time is spent on query preparation and
that there could be potential flooding effects in the SQL plan cache in scenarios with fast-changing parameter
values.

You can use the BIND_AS_PARAMETER and BIND_AS_VALUE functions to explicitly control the
parameterization behavior of scalar parameters. With the BIND_AS_PARAMETER function, a query parameter
is always used, so the time spent on query preparation can be reduced. With the BIND_AS_VALUE function, a
constant value is always used, so different query plan cache entries are created for different parameter values.

Depending on the business scenario, you can choose to use either the BIND_AS_PARAMETER or
BIND_AS_VALUE function to control parameterization. For more information about the BIND_AS_VALUE
function, see Case Study 2.

The examples below illustrate how the following procedure works with constant values, with query parameters,
and when the BIND_AS_PARAMETER and BIND_AS_VALUE functions are used:

CREATE OR REPLACE PROCEDURE "get_results"(
    IN im_category VARCHAR(40),
    IN im_typecode VARCHAR(2),
    IN im_start_date DATE DEFAULT '2022-06-02',
    IN im_end_date DATE DEFAULT '2022-08-31',
    OUT RESULTS TABLE (
        PRODUCTID NVARCHAR(10),
        RESULTDATE DAYDATE,
        NUM_RESULT BIGINT
    )
)
LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
    final_products = SELECT * FROM "Products_A" WHERE category = :im_category
        AND typecode = :im_typecode WITH HINT (NO_INLINE);

    items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE
        FROM :final_products AS p
        INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID"
        WHERE i.DELIVERYDATE >= :im_start_date
        AND i.DELIVERYDATE <= :im_end_date WITH HINT (NO_INLINE);

    aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
        FROM :items
        GROUP BY PRODUCTID, RESULTDATE WITH HINT (NO_INLINE);
    RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE
        WITH HINT (NO_INLINE);
END;

The parameters im_category and im_typecode are the input parameters of the procedure get_results,
and they are used in the filter of the table variable final_products.

Constant Values

In this example, the procedure is used with the following constant values:

CALL /*constant_values*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?);

You run the procedure with constant values, using the EXPLAIN PLAN FOR CALL statement to check the result
as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'CONSTANT_VALUES' FOR
CALL /*constant_values*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?);

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'CONSTANT_VALUES';

In the first inner statement, you can see that CAST('Notebooks' AS VARCHAR(40)) and CAST('PR' AS
VARCHAR(2)) are used as constant values instead of the parameters im_category and im_typecode.
The input parameters im_category and im_typecode of the procedure are represented as constant
values because this SQLScript procedure was compiled with the given constant values and not with query
parameters.

Query Parameters

In this example, the same procedure is used with query parameters:

CALL /*query_parameter*/ "get_results"(
    IM_CATEGORY => ?,
    IM_TYPECODE => ?,
    RESULTS => ?);

When the procedure is executed, you need to enter the parameters as shown below and execute the procedure:

You can use the EXPLAIN PLAN FOR CALL statement to check how it works as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'QUERY_PARAMETER' FOR
CALL /*query_parameter*/ "get_results"(
    IM_CATEGORY => ?,
    IM_TYPECODE => ?,
    RESULTS => ?);

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'QUERY_PARAMETER';

You can see that the input parameters im_category and im_typecode are now represented as the query
parameters __typed_String__($1, 40) and __typed_String__($2, 2). Before the query is executed,
the parameter values are bound to the query parameters.

BIND_AS_PARAMETER and BIND_AS_VALUE Functions

The following example shows the use of the BIND_AS_PARAMETER and BIND_AS_VALUE functions:

CREATE OR REPLACE PROCEDURE "get_results"(
    IN im_category VARCHAR(40),
    IN im_typecode VARCHAR(2),
    IN im_start_date DATE DEFAULT '2022-06-02',
    IN im_end_date DATE DEFAULT '2022-08-31',
    OUT RESULTS TABLE (
        PRODUCTID NVARCHAR(10),
        RESULTDATE DAYDATE,
        NUM_RESULT BIGINT
    )
)
LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
    final_products = SELECT * FROM "Products_A"
        WHERE category = BIND_AS_PARAMETER(:im_category)
        AND typecode = BIND_AS_VALUE(:im_typecode) WITH HINT (NO_INLINE);

    items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE
        FROM :final_products AS p
        INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID"
        WHERE i.DELIVERYDATE >= :im_start_date
        AND i.DELIVERYDATE <= :im_end_date WITH HINT (NO_INLINE);

    aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
        FROM :items
        GROUP BY PRODUCTID, RESULTDATE WITH HINT (NO_INLINE);

    RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE
        WITH HINT (NO_INLINE);
END;

You can use the EXPLAIN PLAN FOR CALL statement to check the compiled SQLScript plan as follows:

DELETE FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'BIND_AS';

EXPLAIN PLAN SET STATEMENT_NAME = 'BIND_AS' FOR
CALL "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?);

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'BIND_AS';

• BIND_AS_PARAMETER
The BIND_AS_PARAMETER function always uses a query parameter to represent a scalar variable during
query preparation.
In the above result, BIND_AS_PARAMETER(:im_category) is converted to __typed_String__($1,
40), which is an internal representation of the query parameter, and the input variable im_category is
represented using this query parameter. Although you entered Notebooks (constant value) as the value of
the variable im_category, it is bound to a query parameter because the function overrides the decisions
made by the optimizer.
• BIND_AS_VALUE
The BIND_AS_VALUE function always uses a value to represent a scalar variable during query preparation.
In the above result, BIND_AS_VALUE(:im_typecode) is converted to CAST('PR' AS VARCHAR(2)),
which is a constant value of the given PR value for the input variable im_typecode.

Related Information

Case Study 2 [page 170]

6.3.2.1 Case Study 2

When you are analyzing SQLScript performance issues, the very first thing to do is to detect performance
slowness and check whether there is a suspicious inner query within an SQLScript procedure.

There are several ways to find a suspicious inner query. Collecting the plan visualizer trace might be the most
convenient way, but collecting it for a problematic SQLScript procedure can sometimes be difficult. If that
is the case, you can use the SQL trace to capture a specific query during issue reproduction, or you can use
the monitoring view M_ACTIVE_PROCEDURES. Depending on the situation, you can use various supportability
tools. For more information, see SQLScript Performance Analysis - First Step.

In this case study, you use the SQL trace and M_ACTIVE_PROCEDURES monitoring view to find a specific
query that causes performance slowness in a procedure.

Let's assume that one of your SQLScript procedures runs for a very long time in a production system but was
much faster in the quality and test systems. You could not detect any performance issues while developing
and testing the SQLScript procedure, when the statements were executed one by one. However, the application
now cancels the very long-running procedure based on its timeout logic.

SQL Trace

For your initial investigation, you try to find the suspicious query in the SQLScript procedure. One of the
useful traces for finding a suspicious query that causes performance issues is the SQL trace. To capture any
suspicious queries while reproducing the issue, you turn on the SQL trace as follows:

 Sample Code

-- SET SQLTRACE
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    SET ('sqltrace','tracefile') = 'sqltrace_$HOST_${PORT}_${COUNT:3}_issue_reproduce.py' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    SET ('sqltrace','trace') = 'on' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    SET ('sqltrace','level') = 'all_with_results' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    SET ('sqltrace','query_plan_trace') = 'on' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    SET ('sqltrace','internal') = 'true' WITH RECONFIGURE;

-- REPRODUCE ISSUE --

-- UNSET SQLTRACE
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    UNSET ('sqltrace','tracefile') WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    UNSET ('sqltrace','trace') WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    UNSET ('sqltrace','level') WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    UNSET ('sqltrace','query_plan_trace') WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini','SYSTEM')
    UNSET ('sqltrace','internal') WITH RECONFIGURE;

The SQL trace that was collected when the query execution was canceled due to its long execution time looks
like this:

M_ACTIVE_PROCEDURES Monitoring View

Since the issue can be reproduced, you can also query the M_ACTIVE_PROCEDURES monitoring view to find
the suspicious query. This monitoring view is helpful for analyzing long-running procedures.
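
While the procedure is running, a quick query on the view is enough to spot the active inner statement. The
following sketch uses SELECT * to avoid relying on specific column names:

SELECT * FROM M_ACTIVE_PROCEDURES;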

When the issue is reproduced, the M_ACTIVE_PROCEDURES monitoring view shows that one inner statement
has not finished and has been executing for a very long time:

By checking both the SQL trace and M_ACTIVE_PROCEDURES monitoring view, you can successfully identify
the specific query that causes performance slowness in the procedure, and its parameter values. It looks like
this:

INSERT INTO "MY_INVENTORY_SUPPLY_PERSISTENT"


SELECT "MANDT",
……
__typed_Integer__($1) as fiscal_week,
……
FROM "_SYS_BIC"."BI_MY_FACT.MyInventory/CV_FACT_SUPPLY"
(placeholder."$$P_LastArunDate$$" => __typed_NString__($2, 8),
placeholder."$$P_LastPreviewDate$$" => __typed_NString__($3, 8)
)
WHERE
LENGTH(WERKS) = 4

According to the SQL trace and M_ACTIVE_PROCEDURES monitoring view, the problematic query within the
procedure appears to have three parameters ($1, $2, and $3) with the values 202204, u'20210426', and
u'20210424'.

Analyzing the Problematic Query

Once you have found the suspicious query from the SQL trace and/or the M_ACTIVE_PROCEDURES
monitoring view, you can reproduce the issue with the suspicious query instead of running the SQLScript
procedure. Narrowing down the issue is the best way to find the reason for the slow performance.

Fortunately, you can compare the slow query execution with the fast one you tested before, to find the
difference. One of the base tables for the CV_FACT_SUPPLY calculation view is a multistore table, which is
linked with SAP HANA dynamic tiering. When the query was fast in the test system, the SNAPSHOT_DATE
filter was passed to dynamic tiering. However, when it was slow in the production system, this filter was
not contained in the M_REMOTE_STATEMENTS monitoring view, which shows all statements executed in
a remote source, including dynamic tiering. When the pruning predicate and the constant values of the
SNAPSHOT_DATE filter are combined, all the external partitions of the multistore table are successfully
pruned. That was the reason why the query in the test system was fast.
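
To check this kind of behavior yourself, you can query the monitoring view while the statement is running. The
following sketch again uses SELECT * to avoid assuming specific column names:

SELECT * FROM M_REMOTE_STATEMENTS;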

To find out why the SNAPSHOT_DATE filter was passed in the test system but not in the production system,
you compare the queries used in the two systems. You find that a literal query was used when this statement
was run in the test system, whereas a parameterized query was used in the production system. This means
that you used constant values in the test system. For the real execution, however, the three parameter values
are selected by other statements and passed to the parameters of this statement during the procedure call.

Applying a Tuning Method

The BIND_AS_VALUE function could be a good way to enforce literal string input to the nested query.
Therefore, BIND_AS_VALUE is added where the parameters of the above statement are used in the SQLScript
procedure, as follows:

INSERT INTO "MY_INVENTORY_SUPPLY_PERSISTENT"


SELECT "MANDT",
……
BIND_AS_VALUE(:current_fiscal_week) as fiscal_week,
……
FROM "_SYS_BIC"."BI_MY_FACT.MyInventory/CV_FACT_SUPPLY"
(placeholder."$$P_LastArunDate$$" => BIND_AS_VALUE(:last_arun_date),
placeholder."$$P_LastPreviewDate$$" => BIND_AS_VALUE(:last_preview_date)
)
WHERE
LENGTH(WERKS) = 4

As expected, the SQLScript procedure now runs much faster due to the addition of the BIND_AS_VALUE
function.

When the BIND_AS_VALUE function is used and different constant values are given, the inner statement
needs to be compiled for each new value because a separate plan is created and cached for each set of
constant values. This means that there could be a compilation overhead, with many SQL plan cache entries
being generated. When the BIND_AS_PARAMETER function is used, however, even if different parameter values
are given, the query plan only needs to be compiled once and can then be reused. These factors need to be
considered before you apply these functions as tuning methods.

In summary, this case shows how you can find a long-running inner statement by using the SQL trace and/or
the M_ACTIVE_PROCEDURES monitoring view, and then how the BIND_AS_VALUE function can be applied in
this specific case as a performance tuning method.

Related Information

SQLScript Performance Analysis - First Step [page 139]

6.3.3 Using Hints with SQLScript Procedures

When you find that a query is running slowly due to a sub-optimal plan, you can apply SQL hints on that SQL
statement to improve its performance. Similarly, SQL hints can also be applied to SQLScript inner statements
to improve the query's performance or plans.

There are two ways to apply hints:

• Add hints to SQLScript procedure code
• Add hints using the pinning hint or statement hint feature

Related Information

Adding Hints to SQLScript Procedure Code [page 173]
Pinning SQL Hints to SQLScript Procedures [page 175]

6.3.3.1 Adding Hints to SQLScript Procedure Code

One of the ways to add SQL hints to the inner statements of an SQLScript procedure is to modify the code of
the procedure.

You can modify the code of an SQLScript procedure directly as follows:

 Sample Code

CREATE OR REPLACE PROCEDURE "get_results"(
    IN im_category VARCHAR(40),
    IN im_typecode VARCHAR(2),
    IN im_start_date DATE DEFAULT '2022-06-02',
    IN im_end_date DATE DEFAULT '2022-08-31',
    OUT RESULTS TABLE (
        PRODUCTID NVARCHAR(10),
        RESULTDATE DAYDATE,
        NUM_RESULT BIGINT
    )
)
LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
    -- Statement#1
    final_products = SELECT * FROM "Products_A" WHERE category = :im_category
        AND typecode = :im_typecode;
    -- Statement#2
    items = SELECT i."PRODUCTID" AS PRODUCTID, i."DELIVERYDATE" AS RESULTDATE
        FROM :final_products AS p
        INNER JOIN "Item_A" AS i ON p.productid = i."PRODUCTID"
        WHERE i.DELIVERYDATE >= :im_start_date
        AND i.DELIVERYDATE <= :im_end_date;
    -- Statement#3
    aggregated_items = SELECT PRODUCTID, RESULTDATE, COUNT(PRODUCTID) AS NUM_RESULT
        FROM :items GROUP BY PRODUCTID, RESULTDATE WITH HINT (USE_HEX_PLAN);
    -- Statement#4
    RESULTS = SELECT * FROM :aggregated_items ORDER BY PRODUCTID, RESULTDATE;
END;

When you modify the SQLScript procedure by adding hints, the inner statements are recompiled and reshaped,
and the execution plans are changed to reflect the new inner statements and hints.

The hint not only performs its own functions, but it also breaks inlining at the statement where the hint was
added. For example, if you add a hint at the end of the table variable aggregated_items, which is at the end
of statement 3, it will break inlining at aggregated_items even though there is no NO_INLINE hint. This can
be seen as follows:

EXPLAIN PLAN SET STATEMENT_NAME = 'SQL_HINT' FOR
CALL /*SQL_HINT*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?);

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'SQL_HINT';

The result shows that three statements (statement 1, statement 2, and statement 3) are combined into a single
statement that contains the hint USE_HEX_PLAN.

When an SQL hint is applied to a CALL statement, as shown below, this hint is not propagated to the inner
statements of the SQLScript procedure:

CALL /*SQL_HINT*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?) WITH HINT(USE_HEX_PLAN);

In this example, the SQL hint USE_HEX_PLAN is added to the CALL procedure get_results, but it does not
have any effect on the inner statements of the SQLScript procedure get_results. You can run the EXPLAIN
PLAN FOR CALL statement to confirm that the hint is not propagated at all.
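
The check follows the same pattern as the earlier examples (the statement name NO_PROPAGATION is just an
arbitrary label):

EXPLAIN PLAN SET STATEMENT_NAME = 'NO_PROPAGATION' FOR
CALL /*SQL_HINT*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?) WITH HINT(USE_HEX_PLAN);

SELECT * FROM EXPLAIN_CALL_PLANS WHERE STATEMENT_NAME = 'NO_PROPAGATION';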

From SAP HANA 2.0 SPS 07, you can propagate hints from a CALL statement to the inner SQL statements of
the procedure by adding CASCADE after the hint name, as follows:

CALL /*SQL_HINT*/ "get_results"(
    IM_CATEGORY => 'Notebooks',
    IM_TYPECODE => 'PR',
    RESULTS => ?) WITH HINT(USE_HEX_PLAN CASCADE);

Related Information

HINT Details
CALL Statement (Procedural)

6.3.3.2 Pinning SQL Hints to SQLScript Procedures

If you cannot modify the SQLScript procedure code directly, you can also add SQL hints to the inner
statements of a procedure by using the ALTER SYSTEM ADD STATEMENT HINT statement or the ALTER
SYSTEM PIN statement.

However, if the SQLScript procedure contains table parameters, the name of each table parameter will be
changed to an internal temporary table name with a random GUID that is automatically generated by the
SQLScript optimizer. An example is shown below:

 Sample Code

CREATE TABLE PRODUCT (
    PRODUCT_ID VARCHAR(10), PRODUCT_NAME VARCHAR(100)
);
CREATE TYPE TT_PRODUCT AS TABLE (
    PRODUCT_ID VARCHAR(10), PRODUCT_NAME VARCHAR(100)
);
CREATE TABLE SALES (
    PRODUCT_ID VARCHAR(10), SALES_AMOUNT BIGINT
);
CREATE TYPE TT_SALES AS TABLE (
    PRODUCT_NAME VARCHAR(100), SALES_AMOUNT BIGINT
);
CREATE OR REPLACE PROCEDURE SALES_REPORT(
    IN IN_PRODUCT TT_PRODUCT,
    OUT OUT_RESULT TT_SALES
) AS BEGIN
    OUT_RESULT = SELECT P.PRODUCT_NAME, SUM(S.SALES_AMOUNT) AS SALES_AMOUNT
        FROM :IN_PRODUCT P, SALES S WHERE P.PRODUCT_ID = S.PRODUCT_ID
        GROUP BY P.PRODUCT_NAME;
END;
CREATE OR REPLACE PROCEDURE ALL_PRODUCT_SALES_REPORT (
    OUT OUT_SALES TT_SALES
) AS BEGIN
    TAB_PRODUCT = SELECT PRODUCT_ID, PRODUCT_NAME FROM PRODUCT;
    CALL SALES_REPORT(:TAB_PRODUCT, :OUT_SALES) WITH HINT (NO_INLINE);
END;

You can check how it is executed by using the SQLScript plan profiler, as follows:

CALL ALL_PRODUCT_SALES_REPORT(?) WITH HINT (SQLSCRIPT_PLAN_PROFILER);

In this example, the NO_INLINE hint is explicitly used to show the usage of internal temporary table variables
by blocking procedure flattening. In the highlighted lines above, there are four internal temporary tables with a
GUID value in their names:

• _SYS_SS2_TMP_TABLE_170893_TAB_PRODUCT_2_FE58DA1C10C65B40BB413C058D185FEE_3
• _SYS_SS2_TMP_TABLE_170882_IN_PRODUCT_4519FC4A03D9424CA17949128AB8CBAC_3
• _SYS_SS2_TMP_TABLE_170882_OUT_RESULT_4519FC4A03D9424CA17949128AB8CBAC_3
• _SYS_SS2_TMP_TABLE_170893_OUT_SALES_FE58DA1C10C65B40BB413C058D185FEE_3

The first internal temporary table is for the table variable TAB_PRODUCT. It is used to store the result of
the first SQL statement of the SQLScript procedure ALL_PRODUCT_SALES_REPORT. The second temporary
table is for the table parameter IN_PRODUCT of the SQLScript procedure SALES_REPORT. Due to the
NO_INLINE hint, the CALL statement for the SALES_REPORT procedure cannot be flattened, so its table
variable IN_PRODUCT is generated as an input parameter. The third temporary table is for the output table
parameter OUT_RESULT of the SQLScript procedure SALES_REPORT. The last temporary table is for the
output table parameter OUT_SALES of the SQLScript procedure ALL_PRODUCT_SALES_REPORT. The name
of each temporary table also contains the name of the table variable or parameter as well as the GUID value,
which lets you easily find out where the temporary table comes from and is used.

The internal temporary table name is regenerated whenever a procedure is compiled. However, it is not
possible to predict what the temporary table name will be due to the random GUIDs, so it is not easy to find
the statement hash, plan ID, or exact statement strings of the inner SQL statement you want to bind with SQL
hints.

Therefore, it is recommended to pin SQL hints on the SQL statement associated with the table variables or
parameters with a GUID value in their names. This means that you need to pin a plan of the parent CALL
statement as well as the inner query to which the SQL hints will be applied. Similarly, if the same applies for the
parent statement, then you also need to pin its parent statement. Since the parent statement has ownership of
the temporary tables' life cycle (that is, when a parent statement is evicted, the temporary tables of the inner
statements are automatically dropped), it would not make sense to add hints only for the target query.

In this example, if you want to bind the SQL hint to the SQL statement of the SALES_REPORT procedure, then
you must pin two CALL statements. To pin them, you need their plan ID or statement hash, which you can find
in the M_SQL_PLAN_CACHE system view, as follows:

CALL ALL_PRODUCT_SALES_REPORT(?);
SELECT PLAN_ID, STATEMENT_HASH, STATEMENT_STRING FROM M_SQL_PLAN_CACHE
WHERE STATEMENT_STRING LIKE '%SALES_REPORT%' ORDER BY LAST_PREPARATION_TIMESTAMP;

From this result, you can see the plan ID and statement hash values of the CALL statements which should be
pinned in the first and third rows. The fourth row shows the inner SQL statement to which you want to bind the
SQL hint. You can now do the following:

• Pin the SQL plan without a hint on the parent CALL statements shown in the first and third rows.
• Pin the SQL plan with a hint on the target statement shown in the fourth row.

However, if the pinned plans are invalidated because the SQLScript procedure or objects were changed, or for
any other reason, then you should unpin the invalidated plan and re-pin the new SQL plan.
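
As a sketch, the pinning could look like this, where the plan IDs are hypothetical values that you would take
from your own M_SQL_PLAN_CACHE result:

ALTER SYSTEM PIN SQL PLAN CACHE ENTRY 100001;  -- parent CALL statement (first row)
ALTER SYSTEM PIN SQL PLAN CACHE ENTRY 100003;  -- parent CALL statement (third row)
ALTER SYSTEM PIN SQL PLAN CACHE ENTRY 100004 WITH HINT (USE_HEX_PLAN);  -- target inner statement (fourth row)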

ALTER SYSTEM ADD STATEMENT HINT ON PROCEDURE

Adding SQL hints to the inner SQL statements of a procedure can be problematic because the actual query
string can change due to the use of random GUID values, as described above. When a generated query string
for inner SQL statements is changed by the recompilation of the parent CALL statement, any SQL hints
previously added become ineffective. To overcome this limitation, the ALTER SYSTEM ADD STATEMENT HINT
statement was extended to allow SQL hints to be added to the inner SQL statements of a procedure. The ON
PROCEDURE option of the ALTER SYSTEM ADD STATEMENT HINT statement allows statement hints to be
added based on the statement string in the procedure definition.

For the above example, the SQL hint USE_HEX_PLAN can be added to the SELECT statement inside the
SALES_REPORT procedure using the following ALTER SYSTEM ADD STATEMENT HINT statement:

ALTER SYSTEM ADD STATEMENT HINT (USE_HEX_PLAN) ON PROCEDURE SALES_REPORT
FOR SELECT P.PRODUCT_NAME, SUM(S.SALES_AMOUNT) as SALES_AMOUNT
FROM :IN_PRODUCT P, SALES S WHERE P.PRODUCT_ID = S.PRODUCT_ID
GROUP BY P.PRODUCT_NAME;

The SQL hint USE_HEX_PLAN can be added to the inner SQL statement of the
ALL_PRODUCT_SALES_REPORT procedure in the same way, as follows:

ALTER SYSTEM ADD STATEMENT HINT (USE_HEX_PLAN) ON PROCEDURE ALL_PRODUCT_SALES_REPORT
FOR SELECT PRODUCT_ID, PRODUCT_NAME FROM PRODUCT;

After the ALTER SYSTEM ADD STATEMENT HINT statement is executed for a procedure, the affected
procedure is recompiled the next time it is executed, to apply the newly added SQL hints. The given SQL
hints are always applied when a procedure is recompiled, even if the generated query string for the inner SQL
statements changes due to newly generated GUIDs.

To remove an attached SQL hint, you can use the ALTER SYSTEM REMOVE STATEMENT HINT statement. The
target SQL statement must be specified in exactly the same way as when the hint was added, including
whitespaces. You cannot use the STATEMENT HASH to add or remove SQL hints when using the ON
PROCEDURE option:

ALTER SYSTEM REMOVE STATEMENT HINT ON PROCEDURE SALES_REPORT
FOR SELECT P.PRODUCT_NAME, SUM(S.SALES_AMOUNT) as SALES_AMOUNT
FROM :IN_PRODUCT P, SALES S WHERE P.PRODUCT_ID = S.PRODUCT_ID
GROUP BY P.PRODUCT_NAME;

Related Information

ALTER SYSTEM {PIN | UNPIN} SQL PLAN CACHE ENTRY Statement (System Management)
ALTER SYSTEM {ADD | ALTER | REMOVE} STATEMENT HINT Statement (System Management)

6.4 SQLScript Performance Guidelines

Conceptually SQLScript is related to stored procedures as defined in the SQL standard, but SQLScript is
designed to provide superior optimization capabilities. These are key to avoiding massive data copies to the
application server and to leveraging the sophisticated parallel execution strategies of the SAP HANA database.

SQLScript should be used in cases where other modeling constructs of SAP HANA are not sufficient.

This part of the guide discusses recommended programming patterns that yield optimal performance as
well as anti-patterns that should be avoided. The last sections discuss recommended patterns in terms of
programmer productivity.

Related Information

Reduce the Complexity of SQL Statements [page 179]
Reduce Dependencies Between Statements [page 179]
Blocking Statement Inlining with the NO_INLINE Hint [page 179]
Avoid Using Cursors [page 181]
Working with Tables and Table Variables [page 182]
Skipping Expensive Queries [page 194]
Using Dynamic SQL Carefully [page 195]
Simplifying Application Coding with Parallel Operators [page 197]
Replacing Row-Based Calculations with Set-Based Calculations [page 205]
Avoiding Busy Waiting [page 208]
Calling Procedures [page 210]
Miscellaneous [page 212]

6.4.1 Reduce the Complexity of SQL Statements

Variables in SQLScript enable you to arbitrarily break up a complex SQL statement into many simpler ones.
This makes an SQLScript procedure easier to comprehend.

To illustrate this point, consider the following query:

books_per_publisher = SELECT publisher, COUNT (*) AS cnt
    FROM :books GROUP BY publisher;
largest_publishers = SELECT * FROM :books_per_publisher
    WHERE cnt >= (SELECT MAX (cnt)
                  FROM :books_per_publisher);

Writing this query as a single SQL statement requires either the definition of a temporary view (using WITH),
or the multiple repetition of a sub-query. The two statements above break the complex query into two simpler
SQL statements that are linked by table variables. This query is much easier to understand because the names
of the table variables convey the meaning of the query and they also break the complex query into smaller
logical pieces.
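
For comparison, the same result expressed as a single SQL statement needs a common table expression (a
sketch, assuming books is an ordinary table here rather than a table variable):

WITH books_per_publisher AS (
    SELECT publisher, COUNT(*) AS cnt FROM books GROUP BY publisher
)
SELECT * FROM books_per_publisher
WHERE cnt >= (SELECT MAX(cnt) FROM books_per_publisher);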

The SQLScript compiler will combine these statements into a single query or identify the common sub-
expression using the table variables as hints. The resulting application program is easier to understand without
sacrificing performance.

6.4.2 Reduce Dependencies Between Statements

One of the most important methods for speeding up processing in the SAP HANA database is through
massively parallelized query execution.

Parallelization is exploited at multiple levels of granularity. For example, the requests of different users can be
processed in parallel, and single relational operators within a query can also be executed on multiple cores in
parallel. It is also possible to execute different statements of a single SQLScript procedure in parallel if these
statements are independent of each other. Remember that SQLScript is translated into a dataflow graph, and
independent paths in this graph can be executed in parallel.

As an SQLScript developer, you can support the database engine in its attempt to parallelize execution by
avoiding unnecessary dependencies between separate SQL statements, and by using declarative constructs if
possible. The former means avoiding variable references, and the latter means avoiding imperative features,
such as cursors.
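
The following sketch illustrates the principle (table and column names are placeholders): tab1 and tab2 do not
depend on each other and can therefore be computed in parallel, while tab3 has to wait for both:

tab1 = SELECT id, amount FROM T1 WHERE A = 1;
tab2 = SELECT id, amount FROM T2 WHERE B = 2;
tab3 = SELECT * FROM :tab1 UNION ALL SELECT * FROM :tab2;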

6.4.3 Blocking Statement Inlining with the NO_INLINE Hint

The SQLScript compiler combines as many inline (that is, dependent) statements as possible to optimize the
code. There might be cases, however, where the combined statements do not result in an optimal plan. This
therefore affects performance when the statement is executed.

You can prevent a statement from being combined with other statements by using the NO_INLINE hint. It
needs to be placed at the end of the SELECT statement.

 Note

Use the NO_INLINE hint carefully since it directly impacts performance. Separate statements are generally
more expensive.

The examples below show a scenario where the use of the NO_INLINE hint is beneficial:

Inlined

DO (OUT tab2 TABLE (c int) => ?)
BEGIN
    tab = SELECT A, B, C
        FROM T WHERE A = 1;
    tab2 = SELECT C FROM :tab WHERE C = 0;
END;

Combined query

WITH "VAR_TAB" AS (SELECT A, B, C


FROM T WHERE A = 1) SELECT C FROM "VAR_TAB"
WHERE C = 0

Not Inlined (Hint)

DO (OUT tab2 TABLE (c int) => ?)
BEGIN
    tab = SELECT A, B, C
        FROM T WHERE A = 1 WITH HINT(NO_INLINE);
    tab2 = SELECT C FROM :tab WHERE C = 0;
END;

tab

SELECT A, B, C FROM T
WHERE A = 1 WITH HINT ( NO_INLINE )

tab2

SELECT C FROM (SELECT A, B, C FROM "VAR_TAB")
WHERE "C" = 0

• In the first example, the resulting code of the combined query indicates that the SQLScript optimizer was
not able to determine the best way to combine the two statements.
• In the second example, the table variable assignments show that each statement was executed separately
due to the addition of WITH HINT(NO_INLINE) in the first statement. Note that the table variable name has
been simplified to "VAR_TAB" to make the example more readable.

6.4.4 Avoid Using Cursors

While the use of cursors is sometimes required, they imply row-by-row processing, so opportunities for
optimizations by the SQL engine are missed. You should therefore consider replacing cursor loops with
set-based SQL statements.

Read-Only Access

For read-only access to a cursor, consider using simple selects or joins:

CREATE PROCEDURE foreach_proc LANGUAGE SQLSCRIPT
READS SQL DATA AS
BEGIN
    DECLARE val decimal(34,10) = 0;
    DECLARE CURSOR c_cursor1 FOR
        SELECT isbn, title, price FROM books;
    FOR r1 AS c_cursor1 DO
        val = :val + r1.price;
    END FOR;
END;

This sum can also be computed by the SQL engine:

SELECT sum(price) into val FROM books;

Computing this aggregate in the SQL engine may result in parallel execution on multiple CPUs inside the SQL
executor.

Updates and Deletes

For updates and deletes, consider using the following:

CREATE PROCEDURE foreach_proc LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE val INT = 0;
    DECLARE CURSOR c_cursor1 FOR
        SELECT isbn, title, price FROM books;
    FOR r1 AS c_cursor1 DO
        IF r1.price > 50 THEN
            DELETE FROM Books WHERE isbn = r1.isbn;
        END IF;
    END FOR;
END;

This delete can also be computed by the SQL engine:

DELETE FROM Books
WHERE isbn IN (SELECT isbn FROM books WHERE price > 50);

Computing this in the SQL engine reduces the calls through the runtime stack of the SAP HANA database. It
also potentially benefits from internal optimizations like buffering and parallel execution.

Insertion into Tables

CREATE PROCEDURE foreach_proc LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE val INT = 0;
    DECLARE CURSOR c_cursor1 FOR SELECT isbn, title, price FROM books;
    FOR r1 AS c_cursor1 DO
        IF r1.price > 50 THEN
            INSERT INTO ExpensiveBooks VALUES(..., r1.title, ...);
        END IF;
    END FOR;
END;

This insertion can also be computed by the SQL engine:

INSERT INTO ExpensiveBooks
SELECT ..., title, ... FROM Books WHERE price > 50;

Like updates and deletes, computing this statement in the SQL engine reduces the calls through the runtime
stack of the SAP HANA database. It also potentially benefits from internal optimizations like buffering and
parallel execution.

6.4.5 Working with Tables and Table Variables

The recommended programming patterns allow you to efficiently access and manipulate tables and table
variables.

Related Information

Checking Whether a Table or Table Variable is Empty Using IS_EMPTY() [page 183]
Determining the Size of a Table or Table Variable Using RECORD_COUNT() [page 184]
Accessing a Specific Table Cell Using Index-Based Cell Access [page 185]
Searching for Key-Value Pairs in Table Variables [page 187]
Inserting Table Variables into Other Table Variables [page 188]
Inserting Records into Table Variables [page 189]
Updating Individual Records in Table Variables [page 191]
Deleting Individual Records in Table Variables [page 192]

6.4.5.1 Checking Whether a Table or Table Variable is Empty Using IS_EMPTY()

The recommended way to check whether a table variable or table is empty is to use the predicate IS_EMPTY.

Compare the following examples:

Problematic: SELECT COUNT(*)

CREATE PROCEDURE P (
    IN iv_tab TABLE(…),…)
AS
BEGIN
    DECLARE size INTEGER;

    SELECT COUNT(*) INTO size
        FROM :iv_tab;

    IF :size <= 0 THEN
        RETURN;
    END IF;
    ...
END;

Problematic: CARDINALITY

CREATE PROCEDURE P (
    IN iv_tab TABLE(…),…)
AS
BEGIN
    DECLARE size INTEGER;

    size = CARDINALITY(
        ARRAY_AGG(:iv_tab.A)
    );
    IF :size <= 0 THEN
        RETURN;
    END IF;
    ...
END;

Recommended: IS_EMPTY

CREATE PROCEDURE P (
    IN iv_tab TABLE(…),…)
AS
BEGIN
    IF IS_EMPTY(:iv_tab) THEN
        RETURN;
    END IF;
    ...
END;

• In the first example, SELECT COUNT is used to determine the size of the table variable. This approach is not
recommended because it involves a select query, which is expensive.

• The second example, which uses the CARDINALITY function, is faster than the first example because it
does not involve a query. However, it requires the data to be copied into an array, which has an impact on
performance.
• The recommended method is to use IS_EMPTY, which involves one direct operation on the table variable.
Note that another recommended option is to use RECORD_COUNT.

Related Information

Emptiness Check for Tables and Table Variables

6.4.5.2 Determining the Size of a Table or Table Variable Using RECORD_COUNT()

For the best performance, you should use RECORD_COUNT rather than COUNT or CARDINALITY.
RECORD_COUNT is the fastest option. It simply takes the name of the table or table variable as the argument
and returns the number of records.

COUNT and CARDINALITY are both slower and have a greater impact on performance (see the previous topic).

The two examples below show the use of CARDINALITY and RECORD_COUNT, where the derived size of the
table variable is used to construct a loop:

Problematic: CARDINALITY

DO (IN inTab TABLE(i int) => TAB,...)
BEGIN
    DECLARE i int;
    DECLARE v nvarchar(200);

    FOR i IN 1 .. CARDINALITY(ARRAY_AGG(:inTab.I)) DO
        v = :t.col_a[:i];
        ...
    END FOR;
END;

Recommended: RECORD_COUNT

DO (IN inTab TABLE(i int) => TAB,...)
BEGIN
    DECLARE i int;
    DECLARE v nvarchar(200);

    FOR i IN 1 .. record_count(:inTab) DO
        v = :t.col_a[:i];
        ...
    END FOR;
END;

6.4.5.3 Accessing a Specific Table Cell Using Index-Based Cell Access

To access a specific cell of a table variable, use index-based access rather than a select query or an array. This
is faster and has a lower impact on performance.

Reading a Value

The examples below read the values from the first row of the two columns A and B. Note that for read access a
colon (:) is needed before the table variable:

Problematic: Array

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)),…)
AS
BEGIN
    DECLARE var_a_array INT ARRAY;
    DECLARE var_b_array NVARCHAR(28) ARRAY;
    DECLARE var_a INT;
    DECLARE var_b NVARCHAR(28);

    var_a_array = ARRAY_AGG(:iv_tab.A);
    var_b_array = ARRAY_AGG(:iv_tab.B);

    var_a = var_a_array[1];
    var_b = var_b_array[1];
    ...
END;

Recommended: Index-Based Cell Access

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)),…)
AS
BEGIN
    DECLARE var_a INT;
    DECLARE var_b NVARCHAR(28);
    var_a = :iv_tab.A[1];
    var_b = :iv_tab.B[1];
    ...
END;

• The first example shows the slower option, where ARRAY_AGG is used to get an array of values.
• The second example uses index-based access to read the values from the first row of the two columns A
and B.

Writing a Value

The examples below write a value to the second row of column A:

Problematic: Array

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)),…)
AS
BEGIN
    DECLARE a_arr INT ARRAY;
    DECLARE b_arr NVARCHAR(28) ARRAY;
    a_arr = ARRAY_AGG(:iv_tab.A);
    b_arr = ARRAY_AGG(:iv_tab.B);

    a_arr[2] = 5;

    iv_tab = UNNEST(:a_arr, :b_arr) AS (A,B);
    ...
END;

Recommended: Index-Based Cell Access

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)),…)
AS
BEGIN
    iv_tab.A[2] = 5;
    ...
END;

• The slower option in the first example involves converting the two columns A and B of the table variable
iv_tab into arrays, then writing the new value 5 to the second cell in array a_arr. The two arrays are then
converted back into a table using the UNNEST function and assigned to the table variable iv_tab.

• The second example shows how index-based access can be used to directly access the second row of
column A in the iv_tab table variable and insert the value 5.

6.4.5.4 Searching for Key-Value Pairs in Table Variables

The SEARCH operator provides the most efficient way to search table variables by key-value pairs. It returns
the position of the first record that matches the search criteria, or NULL if there are no matches.

The syntax is as follows:

position = <tabvar>.SEARCH((<column_list>), (<value_list>) [, <start_position>])

For example, you have the following table named LT1:

Key 1   Key 2   Val 1
A       1       V11
E       5       V12
M       3       V15

You now run the following code:

DO (IN lt1 TABLE
    (KEY1 NVARCHAR(1), Key2 INT, Val1 NVARCHAR(3)) => tab,...)
BEGIN
    DECLARE pos INT;
    DECLARE val LIKE :lt1.Val1;

    IF :lt1.SEARCH((key1,key2), ('I',3)) IS NULL THEN
        :lt1.insert(('I',3,'X'));
    END IF;

    pos = :lt1.SEARCH((key1,key2), ('M',3));
    :lt1.delete(:pos);

    val = :lt1.val1[:lt1.SEARCH((Key1,Key2),('E',5))];
    ...
END;

The first SEARCH operation is used to determine whether the table already contains the key-value pair
('I',3) so that, if not, the record ('I',3,'X') can be inserted. Since there is no record that matches,
the new record is inserted:

Key 1   Key 2   Val 1
A       1       V11
E       5       V12
M       3       V15
I       3       X

The second SEARCH operation is used to determine whether the key-value pair ('M',3) exists. Since the
value of pos (not NULL) confirms that it does, the record can be deleted:

Key 1   Key 2   Val 1
A       1       V11
E       5       V12
M       3       V15
I       3       X

The third SEARCH operation retrieves the position of the key-value pair ('E',5), which is then used to
assign the value of val1 to val (the value of val is then V12). After the deletion, the table contains:

Key 1   Key 2   Val 1
A       1       V11
E       5       V12
I       3       X

6.4.5.5 Inserting Table Variables into Other Table Variables

The most efficient way to insert the content of one table variable into another is to use the INSERT operator.
The insertion is done in one operation and does not involve using an SQL query. An alternative approach using
the UNION operator has a higher performance cost.

The following two examples compare the use of the UNION and INSERT operators:

Problematic: Query

DO (OUT ret_tab TABLE(col_a nvarchar(200)) => ?)
BEGIN
    DECLARE i int;
    DECLARE varb nvarchar(200);
    ...

    FOR i IN 1 .. record_count(:t) DO
        CALL p (:varb, out_tab);

        ret_tab = SELECT COL_A FROM :out_tab
            UNION SELECT COL_A FROM :ret_tab;
        ...
    END FOR;
END;

Recommended: INSERT

DO (OUT ret_tab TABLE(col_a nvarchar(200)) => ?)
BEGIN
    DECLARE i int;
    DECLARE varb nvarchar(200);
    ...

    FOR i IN 1 .. record_count(:t) DO
        CALL p (:varb, out_tab);

        :ret_tab.insert(:out_tab);
        ...
    END FOR;
END;

The examples combine col_a from the table variable out_tab with col_a in the table variable ret_tab:

• In the first example, the UNION operator is used to combine the result sets of two select queries. This is a
costly operation in terms of performance.
• The syntax used for the INSERT operation (second example) is as follows:

:<target_table_var>[.(<column_list>)].INSERT(:<source_table_var>[,
<position>])

• A position is not specified in the example, so the values are simply appended at the end. Similarly,
no columns are specified, so all columns are inserted. In this example, there is only the one column,
col_a.
• The source and target columns must be of the same data type, in this case NVARCHAR.

6.4.5.6 Inserting Records into Table Variables

For scenarios requiring individual data records to be inserted at the end of a table variable, the recommended
way is to use the INSERT operator. The insertion is done in one operation and is faster and more efficient than
the alternative approaches using arrays or index-based cell access.

The three examples below compare the ways in which 10 rows can be inserted at the end of a table variable
using index-based array access, index-based cell access, and the INSERT operator:

Problematic: Array

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28),…),
    …)
AS
BEGIN
    DECLARE a_arr INT ARRAY;
    DECLARE b_arr NVARCHAR(28) ARRAY;
    DECLARE c_arr NVARCHAR(28) ARRAY;
    DECLARE sizeTab, i BIGINT;

    a_arr = ARRAY_AGG(:iv_tab.A);
    b_arr = ARRAY_AGG(:iv_tab.B);
    c_arr = ARRAY_AGG(:iv_tab.C);

    sizeTab = CARDINALITY(:a_arr);

    FOR i IN 1 .. 10 DO
        a_arr[:sizeTab+i] = 1;
        b_arr[:sizeTab+i] = 'ONE';
        c_arr[:sizeTab+i] = 'EINS';
    END FOR;

    iv_tab = UNNEST(:a_arr, :b_arr, :c_arr)
        AS (A,B,C);
END;

Problematic: Index-Based Cell Access

CREATE PROCEDURE P (
IN iv_tab TABLE(A INT, B NVARCHAR(28),… ))
…)
AS
BEGIN

DECLARE sizeTab,i BIGINT;

sizeTab = RECORD_COUNT(:iv_tab);

FOR i IN 1 .. 10 DO
iv_tab.A[:sizeTab+i] = 1;
iv_tab.B[:sizeTab+i] = 'ONE';
iv_tab.C[:sizeTab+i] = 'EINS';
END FOR;

END;

Recommended: INSERT

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28),…),
    …)
AS
BEGIN
    DECLARE i BIGINT;

    FOR i IN 1 .. 10 DO
        :iv_tab.INSERT((1,'ONE','EINS'));
    END FOR;
END;

• The first example has the highest impact on performance:


• Each column of the table variable needs to be converted into an array.
• The CARDINALITY function needs to be executed to determine the size of the array.
• The determined size is used within the loop to append the values at the end of each array.
• UNNEST is used to convert the arrays back into a table variable.
• The second example is faster but still requires the following:
• The size of the table variable is determined using RECORD_COUNT.
• The determined size is used within the loop to append the values at the end of each column (see the
next topic for an example of how this could be improved).
• The third example uses INSERT, which is recommended for the following reasons:
• It is not necessary to determine the size of the table variable. If a position is not given in the INSERT
statement, the new values are simply appended at the end of the table variable.
• INSERT allows entire rows to be inserted. The syntax is as follows:

:<table_variable>.INSERT((<value1>, …, <valueN>), <index>)

6.4.5.7 Updating Individual Records in Table Variables

It is more efficient to update all applicable values in a table row in one operation, for example, using the
UPDATE operator or index-based row access, rather than individual table cells (index-based cell access).

The two examples below show how a record at a specific position can be updated:

Problematic: Index-Based Cell Access

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)…),…)
AS
BEGIN
    iv_tab.A[42] = 1;
    iv_tab.B[42] = 'ONE';
    iv_tab.C[42] = 'EINS';
    ...
END;

Recommended: UPDATE

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28)),…)
AS
BEGIN
    iv_tab[42] = (1, 'ONE', 'EINS');

    :iv_tab.update((1, 'ONE', 'EINS'), 42);
    ...
END;

• In the first example, index-based cell access is used to assign new values to each column separately.
• In the second example, the entire row is updated in a single operation. There are two equivalent syntax
options:

• :<table_variable>.UPDATE((<value1>,…, <valuen>), <index>)

• <table_variable>[<index>] = (<value1>,…, <valuen>)

6.4.5.8 Deleting Individual Records in Table Variables

The DELETE operator is the most efficient way of deleting records from a table variable. It allows you to delete
single records, a range of records, or selected records.

The two examples below show how the last record in the table variable can be deleted:

Problematic: Query

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28),...))
AS
BEGIN
    DECLARE rc BIGINT = RECORD_COUNT(:iv_tab);
    ...
    iv_tab_temp = SELECT
        ROW_NUMBER() OVER() AS ROW_NO,
        it.*
        FROM :iv_tab AS it;

    iv_tab = SELECT itt.A, itt.B, itt.C
        FROM :iv_tab_temp AS itt
        WHERE row_no <> :rc;
    ...
END;

Recommended: DELETE

CREATE PROCEDURE P (
    IN iv_tab TABLE(A INT, B NVARCHAR(28), ...))
AS
BEGIN
    :iv_tab.delete(RECORD_COUNT(:iv_tab));  -- delete the last record
    :iv_tab.delete(42..100);                -- delete a range of records
    :iv_tab.delete();                       -- delete all records
    ...
END;

• The first example uses queries:


• The size of the table variable is determined using RECORD_COUNT.
• A SELECT query copies the entire contents of the table variable iv_tab to a temporary table
iv_tab_temp. The ROW_NUMBER function is used to number the rows within the result set.
• A SELECT query copies all rows except the last row (determined by RECORD_COUNT) from the
temporary table back to the table variable iv_tab.
• The second example uses the DELETE operator:
• The size of the table variable is determined using RECORD_COUNT.
• The syntax of the DELETE operator is as follows:

:<table_variable>.DELETE(<index>)

The index of the last row has been determined by RECORD_COUNT.


Note that if an index is not specified, all rows are deleted. For example:

:iv_tab.delete();

Other syntax options supported by the DELETE operator:
• Delete a block:

:<table_variable>.DELETE(<from_index>..<to_index>)

• Delete a selection:

:<table_variable>.DELETE(<array_of_integers>)

6.4.6 Skipping Expensive Queries

You can skip a query within a procedure by using a constant predicate that evaluates to false. Skipping an
expensive query is an effective measure for improving performance.

The three examples below show the ways in which an expensive query can be handled within a procedure:

Problematic

CREATE PROCEDURE get_product_by_filter(IN im_input_string VARCHAR(5000))
AS
BEGIN
    step1 = SELECT * FROM VIEW1;
    step2 = SELECT * FROM VIEW2; -- expensive query

    res = SELECT * FROM :step1 UNION ALL
          SELECT * FROM :step2;
    ...
END;

Recommended

CREATE PROCEDURE get_product_by_filter(IN im_input_string VARCHAR(5000))
AS
BEGIN
    step1 = SELECT * FROM VIEW1;
    step2 = SELECT * FROM VIEW2
        WHERE :im_input_string NOT IN ('foo', 'bar'); -- expensive query with input filter

    res = SELECT * FROM :step1 UNION ALL
          SELECT * FROM :step2;
    ...
END;

Recommended

CREATE PROCEDURE get_product_by_filter(IN im_input_string VARCHAR(5000))
AS
BEGIN
    step1 = SELECT * FROM VIEW1;

    IF :im_input_string IN ('foo', 'bar') THEN
        step2 = SELECT * FROM VIEW2; -- expensive query
    END IF;

    res = SELECT * FROM :step1 UNION ALL
          SELECT * FROM :step2;
    ...
END;

CALL get_product_by_filter('baz');

-- internally rewritten by SQLScript optimizer rule (constant parameter/variable propagation)
get_product_by_filter('baz')
AS
BEGIN
    step1 = SELECT * FROM VIEW1;

    IF 'baz' IN ('foo', 'bar') THEN
        step2 = SELECT * FROM VIEW2; -- pruned by dead-branch elimination
    END IF;

    res = SELECT * FROM :step1 UNION ALL
          SELECT * FROM :step2;
    ...
END;

• In the first example, the expensive query (marked with a comment) is always executed.
• In the second example, a condition is applied in the WHERE clause of the expensive query. When the condition evaluates to false, the complex query processing is pruned and an empty result is returned.
• In the third example, the constant input parameter value is propagated to the referenced conditional branch. When the condition evaluates to false, the query is pruned and consequently not executed. The value 'baz' is propagated as a constant during compilation.

6.4.7 Using Dynamic SQL Carefully

Using dynamic SQL can be slow. Because the query string is not known during compilation, it limits the
opportunities for optimizations, such as parallel execution and statement inlining. Also, the statement is
potentially recompiled every time it is executed if it is not used with a parameterized query.

Another related problem is security because constructing SQL statements without proper checks of the
variables used can create a security vulnerability, for example, SQL injection. Using variables in SQL
statements prevents these problems because type checks are performed at compile time and parameters
cannot inject arbitrary SQL code.

When there is an alternative to dynamic SQL using variables, you should choose that option. If you do decide to
use dynamic SQL, you need to be careful about how you use it, given the limitations described above.

The table below summarizes potential use cases for dynamic SQL:

Feature Proposed Solution

Projected attributes Dynamic SQL

Projected literals SQL + variables

FROM clause SQL + variables; result structure must remain unchanged
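For example, a projected literal can be supplied through a variable instead of string concatenation (a minimal sketch, assuming a table T with a column A):

 DO BEGIN
     DECLARE lit NVARCHAR(10) = 'flag';
     -- dynamic variant, recompiled for every new literal:
     -- EXEC 'SELECT ''' || :lit || ''' AS marker, A FROM T';
     -- variable variant, compiled once:
     res = SELECT :lit AS marker, A FROM T;
     SELECT * FROM :res;
 END;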

Related Information

Using Input and Output Parameters [page 196]

6.4.7.1 Using Input and Output Parameters

The INTO and USING clauses of the EXECUTE IMMEDIATE and EXEC statements provide additional support for
parameterizing dynamic SQL. The INTO clause can be used to assign the result sets or output parameters of a
query to scalar or table variables.

The two examples below compare a string built using dynamic SQL and a parameterized version of the same
code:

Problematic: Building the String

DO (IN I INT => ?, IN J NVARCHAR(28) => ?)
BEGIN
    DECLARE TNAME nvarchar(32) = 'MYTAB';
    DECLARE CNAME nvarchar(32) = 'AGE';
    EXECUTE IMMEDIATE
        'SELECT MAX(' || :CNAME || ') AS MAX, NAME FROM '
        || :TNAME ||
        ' WHERE A = ' || :I || ' OR B = ''' || :J || ''' GROUP BY name';
END;

SELECT MAX(AGE) AS MAX,NAME FROM MYTAB WHERE A=1 OR B='ONE' GROUP BY name

SELECT MAX(AGE) AS MAX,NAME FROM MYTAB WHERE A=2 OR B='TWO' GROUP BY name

Recommended: With Parameters

DO (IN I INT => 1, IN J NVARCHAR(28) => 'ONE',
    OUT K TABLE (MAX INT, NAME NVARCHAR(256)) => ?)
BEGIN
    DECLARE TNAME nvarchar(32) = 'MYTAB';
    DECLARE CNAME nvarchar(32) = 'AGE';
    EXECUTE IMMEDIATE
        'SELECT MAX(' || :CNAME || ') AS MAX, NAME FROM '
        || :TNAME ||
        ' WHERE A = ? OR B = ? GROUP BY NAME'
        INTO K
        USING :I, :J;
END;

SELECT MAX(AGE) AS MAX,NAME FROM MYTAB WHERE A=? OR B=? GROUP BY NAME

• In the first example, the concatenated SQL strings contain the input values because parameters were not used in the SELECT statement. Whenever the input changes, a different query string is generated, and each resulting SQL string is stored separately in the plan cache. If the string changes frequently, this can flood the plan cache.
• In the second example, the code is parameterized as follows:
• The table variable K is defined as an output parameter. The INTO clause is used to assign the result set
to the table variable K. This allows it to be used in further processing.
• The select statement is parameterized. The USING clause is used to pass in the values of the scalar
variables I and J. Although the input values are constants, the query is still parameterized.
• The concatenated SQL string has a more generic form and will not cause the conflicts seen above
when stored in the plan cache.
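To observe this in a running system, you can search for the generated statements in the plan cache (a sketch; M_SQL_PLAN_CACHE is the standard monitoring view, while the filter pattern is an assumption):

 SELECT STATEMENT_STRING, EXECUTION_COUNT
 FROM M_SQL_PLAN_CACHE
 WHERE STATEMENT_STRING LIKE '%FROM MYTAB%';  -- one row per distinct concatenated string

In the parameterized version, the same check returns a single generic entry whose EXECUTION_COUNT increases with each call.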

6.4.8 Simplifying Application Coding with Parallel Operators

Parallel operators allow sequential execution to be replaced with parallel execution. By identifying costly
sequential coding and using parallel operators in its place, you can significantly improve performance. The
supported parallel operators are MAP_REDUCE and MAP_MERGE (a specialization of MAP_REDUCE).

The targeted sequential coding patterns are those consisting of FOR loops, where operations are performed
separately on each row of a table, and UNION operators, which typically combine the results.

With the MAP_MERGE operator, a mapper function is applied to each row of a mapper table, and the results are returned in an intermediate result table for each row. On completion, all intermediate result tables are combined into a single table.

This form of parallel execution allows significant performance gains:

• See the Map Reduce Operator topic for a graphical illustration of the process.
• See the Map Merge Operator topic for an example in which sequential execution is replaced with parallel
execution.

Related Information

Map Reduce Operator [page 198]


Map Merge Operator [page 204]

6.4.8.1 Map Reduce Operator

MAP_REDUCE is a programming model introduced by Google that allows easy development of scalable parallel
applications for processing big data on large clusters of commodity machines. The MAP_MERGE operator is a
specialization of the MAP_REDUCE operator.

Syntax

 Code Syntax

MAP_REDUCE(<input table/table variable name>, <mapper specification>, <reducer specification>)

<mapper spec>                 ::= <mapper TUDF>(<list of mapper parameters>)
                                  group by <list of columns in the TUDF> as <ID>
<reducer spec>                ::= <reduce TUDF>(<list of reducer TUDF parameters>)
                                | <reduce procedure>(<list of reducer procedure parameters>)
<mapper parameter>            ::= <table/table variable name>.<column name> | <other scalar parameter>
<reducer TUDF parameter>      ::= <ID> | <ID>.<key column name> | <other scalar parameter>
<reducer procedure parameter> ::= <reducer TUDF parameter> | <output table parameter>

Example

We take as an example a table containing sentences with their IDs. If you want to count the number of
sentences that contain a certain character and the number of occurrences of each character in the table, you
can use the MAP_REDUCE operator in the following way:

Mapper Function

 Sample Code

Mapper Function

create function mapper(in id int, in sentence varchar(5000))
returns table (id int, c varchar, freq int) as
begin
    using sqlscript_string as lib;
    declare tv table(result varchar);
    tv = lib:split_to_table(:sentence, ' ');
    return select :id as id, result as c, count(result) as freq
           from :tv group by result;
end;

Reducer Function

 Sample Code

Reducer Function

create function reducer(in c varchar, in vals table(id int, freq int))
returns table (c varchar, stmt_freq int, total_freq int) as
begin
    return select :c as c, count(distinct(id)) as stmt_freq, sum(freq) as total_freq
           from :vals;
end;

 Sample Code

do begin
declare result table(c varchar, stmt_freq int, total_freq int);
result = MAP_REDUCE(tab, mapper(tab.id, tab.sentence) group by c as X,
reducer(X.c, X));
select * from :result order by c;
end;

The code above works in the following way:

1. The mapper TUDF processes each row of the input table and returns a table.

2. When all rows are processed by the mapper, the output tables of the mapper are aggregated into a single big table (like MAP_MERGE).
3. The rows in the aggregated table are grouped by key columns.

4. For each group, the key values are separated from the table. The grouped table without key columns is called the 'value table'. The order of the remaining columns is preserved. It is possible to have multiple key columns. If the layout of the output table is table(a int, b varchar, c timestamp, d int) and the key columns are b and c, the layout of the value table is table(a int, d int).

5. The reducer TUDF (or procedure) processes each group and returns a table (or multiple tables).

6. When all groups are processed, the output tables of the reducer are aggregated into a single big table (or
multiple tables, if the reducer is a procedure).
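To make these steps concrete, here is a small worked run of the word-count model above (the two sample rows are assumptions; the intermediate values follow from the mapper and reducer definitions):

 CREATE COLUMN TABLE tab (id INT, sentence VARCHAR(5000));
 INSERT INTO tab VALUES (1, 'a b a');
 INSERT INTO tab VALUES (2, 'b c');
 -- Step 1: mapper output per input row
 --   row 1 -> (1, 'a', 2), (1, 'b', 1)
 --   row 2 -> (2, 'b', 1), (2, 'c', 1)
 -- Steps 2-4: the outputs are merged and grouped by the key column c
 -- Steps 5-6: reducer output per group
 --   'a' -> stmt_freq = 1, total_freq = 2
 --   'b' -> stmt_freq = 2, total_freq = 2
 --   'c' -> stmt_freq = 1, total_freq = 1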

Retrieving Multiple Outputs from MAP_REDUCE

If you use a read-only procedure as a reducer, you can fetch multiple table outputs from a MAP_REDUCE
operator. To bind the output of MAP_REDUCE operators, you can simply apply the table variable as the
parameter of the reducer specification. For example, if you want to change the reducer in the example above to
a read-only procedure, apply the following code.

create procedure reducer_procedure(in c varchar, in values table(id int, freq int),
                                   out otab table (c varchar, stmt_freq int, total_freq int))
reads sql data as
begin
    otab = select :c as c, count(distinct(id)) as stmt_freq, sum(freq) as total_freq
           from :values;
end;

do begin
    declare result table(c varchar, stmt_freq int, total_freq int);
    MAP_REDUCE(tab, mapper(tab.id, tab.sentence) group by c as X,
               reducer_procedure(X.c, X, result));
    select * from :result order by c;
end;

Passing Extra Arguments as a Parameter to a Mapper or a Reducer

It is possible to pass extra arguments as parameters of a mapper or a reducer.

 Sample Code

create function mapper(in id int, in sentence varchar(5000),
                       in some_extra_arg1 int, in some_extra_arg2 table(...), ...)
returns table (id int, c varchar, freq int) as
begin
    ...
end;

create function reducer(in c varchar, in values table(id int, freq int),
                        in some_extra_arg1 int, in some_extra_arg2 table(...), ...)
returns table (c varchar, stmt_freq int, total_freq int) as
begin
    ...
end;

do begin
    declare result table(c varchar, stmt_freq int, total_freq int);
    declare extra_arg1, extra_arg2 int;
    declare extra_arg3, extra_arg4 table(...);
    ... more extra args ...
    result = MAP_REDUCE(tab,
                        mapper(tab.id, tab.sentence, :extra_arg1, :extra_arg3, ...) group by c as X,
                        reducer(X.c, X, :extra_arg2, :extra_arg4, 1+1, ...));
    select * from :result order by c;
end;

 Note

There is no restriction about the order of input table parameters, input column parameters, extra
parameters and so on. It is also possible to use default parameter values in mapper/reducer TUDFs or
procedures.

Restrictions

The following restrictions apply:

• Only Mapper and Reducer are supported (no other Hadoop functionalities like group comparator, key
comparator and so on).
• The alias ID in the mapper output and the ID in the Reducer TUDF (or procedure) parameter must be the
same.
• The Mapper must be a TUDF, not a procedure.
• The Reducer procedure should be a read-only procedure and cannot have scalar output parameters.
• The order of the rows in the output tables is not deterministic.

Related Information

Map Merge Operator [page 204]

6.4.8.2 Map Merge Operator

Description

The MAP_MERGE operator is used to apply each row of the input table to the mapper function and unite all
intermediate result tables. The purpose of the operator is to replace sequential FOR-loops and union patterns,
like in the example below, with a parallel operator.

 Sample Code

DO (OUT ret_tab TABLE(col_a nvarchar(200)) => ?)
BEGIN
    DECLARE i int;
    DECLARE varb nvarchar(200);
    t = SELECT * FROM tab;
    FOR i IN 1 .. record_count(:t) DO
        varb = :t.col_a[:i];
        CALL mapper(:varb, out_tab);
        ret_tab = SELECT * FROM :out_tab
                  UNION SELECT * FROM :ret_tab;
    END FOR;
END;

 Note

The mapper procedure is a read-only procedure with only one output that is a tabular output.

Syntax

<table_variable> = MAP_MERGE(<table_or_table_variable>,
                   <mapper_identifier>(<table_or_table_variable>.<column_name>
                   [ {, <table_or_table_variable>.<column_name>} … ] [, <param_list>]))

<param_list> ::= <param> [{, <param>} …]
<param>      ::= <table_or_table_variable> | <string_literal> | <numeric_literal> | <identifier>

The first input of the MAP_MERGE operator is the mapper table <table_or_table_variable> . The mapper
table is a table or a table variable on which you want to iterate by rows. In the above example, it would be table
variable t.

The second input is the mapper function <mapper_identifier> itself. The mapper function is a function
you want to have evaluated on each row of the mapper table <table_or_table_variable>. Currently, the
MAP_MERGE operator supports only table functions as <mapper_identifier>. This means that in the above
example you need to convert the mapper procedure into a table function.

You also have to pass the mapping argument <table_or_table_variable>.<column_name> as an input of the mapper function. Going back to the example above, this would be the value of the variable varb.

Example

As an example, let us rewrite the above example to leverage the parallel execution of the MAP_MERGE operator.
We need to transform the procedure into a table function, because MAP_MERGE only supports table functions
as <mapper_identifier>.

 Sample Code

CREATE FUNCTION mapper (IN a nvarchar(200))
RETURNS TABLE (col_a nvarchar(200))
AS
BEGIN
    ot = SELECT :a AS COL_A FROM dummy;
    RETURN :ot;
END;

After transforming the mapper procedure into a function, we can now replace the whole FOR loop by the
MAP_MERGE operator.

Sequential FOR-Loop Version

 DO (OUT ret_tab TABLE(col_a nvarchar(200)) => ?)
 BEGIN
     DECLARE i int;
     DECLARE varb nvarchar(200);
     t = SELECT * FROM tab;
     FOR i IN 1 .. record_count(:t) DO
         varb = :t.col_a[:i];
         CALL mapper(:varb, out_tab);
         ret_tab = SELECT * FROM :out_tab
                   UNION SELECT * FROM :ret_tab;
     END FOR;
 END;

Parallel MAP_MERGE Operator

 DO (OUT ret_tab TABLE(col_a nvarchar(200)) => ?)
 BEGIN
     t = SELECT * FROM tab;
     ret_tab = MAP_MERGE(:t, mapper(:t.col_a));
 END;

6.4.9 Replacing Row-Based Calculations with Set-Based Calculations

Row-based calculations are costly, particularly when they involve multiple nested loops. It is highly
recommended to rewrite them as set-based calculations. In the example below, which is a simplified illustration
of a real-life example, performance was improved by a factor of 85 through rewriting.

1. Compare the high-level overviews of the row-based calculation and set-based calculation.

The original scenario uses cursors with scalar functions. The scalar functions are called for each row. This
is repeated within the nested loops (n iterations each time):

Below, the scalar functions are rewritten as procedures. All values are contained in tables:

2. Operate on a set of values.


The original scenario uses the cursor C1 to do a row-by-row calculation for each B value:

CREATE FUNCTION sfunc2(...)
RETURNS v_result NVARCHAR(2)
AS
    CURSOR c1 FOR
        SELECT *
        FROM (SELECT SFUNC_3(:A, ip.B, :im_cti, :im_ci, :im_rd, :im_sp) AS C
                    ,ip.*
                    ,view1.esi
                    ,view1.esti
                    ,view1.au
              FROM ( SELECT ...
                     FROM table_pes es
                         ,table_es stat
                     WHERE es.si = :A
                       AND es.ensi = stat.ensi
                   ) view1
              RIGHT OUTER JOIN (...) ip ON view1.B = ip.B);
BEGIN
    FOR rec AS c1 DO
        IF (rec.C = 1 OR rec.C = 3) THEN
            v_result := rec.C;
            RETURN;
        END IF;
    END FOR;
    v_result := NULL;
END;

Below, the cursor C1 is replaced by a table assignment TAB_B. The scalar function SFUNC_3 is rewritten as a procedure PROC_3 that takes TAB_B as an input. All statuses are calculated at once:

CREATE PROCEDURE proc2(..., OUT v_result NVARCHAR(2))
READS SQL DATA AS
BEGIN
    TAB_B = SELECT ip.B, ip.*, view1.esi, view1.esti, view1.au
            FROM ( SELECT ...
                   FROM table_pes es
                       ,table_es stat
                   WHERE es.si = :A
                     AND es.ensi = stat.ensi
                 ) view1
            RIGHT OUTER JOIN (...) ip ON view1.B = ip.B;
    CALL PROC_3(:A, :TAB_B, :im_cti, :im_ci, :im_rd, :im_sp, TAB_C);

    TAB_RES = SELECT max(C) FROM :TAB_C WHERE C IN (1, 3);

    IF (NOT IS_EMPTY(:TAB_RES)) THEN
        v_result = :TAB_C.C[1];
        RETURN;
    END IF;
    v_result = NULL;
END;

3. Filter a set of values.


The original scenario uses the cursor enrl to iterate over a set of values and filter by B:

CURSOR enrl FOR
    SELECT schd_id, ...
    FROM ...
    WHERE enrl.schd_id = :p_schd_id;
FOR enrlrec AS enrl DO
    ...
    SELECT COUNT(*) AS t_count INTO var_sc
    FROM table_shd b,
         table_pes c,
         table_schdr d,
         table_cm e,
         table_schdr f,
         table_es g
    WHERE e.cti = :im_cti
      AND e.ci = :im_ci
      AND e.rev_dte = :im_rd
      AND f.shi = :B
      AND b.cti = :im_cti
      AND b.act_ci = :im_ci
      AND b.rev_dte = :im_rd
      AND c.si = :A
      AND d.shi = b.shi
      AND ABS(DAYS_BETWEEN(d.ed, f.sd)) < COALESCE(e.sdtd, 0)
      AND c.esi = g.esi
      AND g.esti != 'C';
    ...
END FOR;

Below, the B filter is replaced by a GROUP BY clause. A left outer join is used to filter the result:

...
t1 = SELECT f.B AS shi, count(f.B) AS t_count
     FROM table_shd b,
          table_pes c,
          table_schdr d,
          table_cm e,
          table_schdr f,
          table_es g
     WHERE e.cti = :im_cti
       AND e.ci = :im_ci
       AND e.rev_dte = :im_rev_dte
       AND c.A = :A
       AND b.cti = :im_cti
       AND b.act_ci = :im_ci
       AND b.rev_dte = :im_rev_dte
       AND d.B = b.B
       AND ABS(DAYS_BETWEEN(d.ed, f.sd)) < COALESCE(e.sdtd, 0)
       AND c.esi = g.esi
       AND g.esti != 'C'
     GROUP BY f.B;
t2 = SELECT a.B, b.t_count
     FROM :TAB_B AS a
     LEFT OUTER JOIN :t1 b ON a.B = b.shi
     ORDER BY a.B;
...

6.4.10 Avoiding Busy Waiting

Avoid busy waiting situations by using the SLEEP_SECONDS and WAKEUP_CONNECTION procedures
contained in the SQLSCRIPT_SYNC built-in library.

In some scenarios you might need to let certain processes wait for a while (for example, when executing
repetitive tasks). Implementing such waiting manually might lead to busy waiting and to the CPU performing
unnecessary work during the waiting time.

Compare the following examples:

Problematic: Busy Wait

CREATE PROCEDURE WAIT(IN interval INT)
AS
BEGIN
    DECLARE start_time TIMESTAMP = CURRENT_TIMESTAMP;
    -- busy loop: the thread stays active for the whole waiting time
    WHILE SECONDS_BETWEEN(:start_time, CURRENT_TIMESTAMP) < :interval DO
    END WHILE;
END;

CREATE PROCEDURE MONITOR AS
BEGIN
    WHILE 1 = 1 DO
        IF RECORD_COUNT(OBSERVED_TABLE) > 100000 THEN
            INSERT INTO LOG_TABLE VALUES (CURRENT_TIMESTAMP,
                'Table size exceeds 100000 records');
        END IF;
        CALL WAIT(300);
    END WHILE;
END;

Recommended: Sleep Procedure

DO BEGIN
    USING SQLSCRIPT_SYNC AS SYNC;
    DECLARE i INTEGER;
    lt_con = SELECT connection_id AS CON_ID
             FROM M_SERVICE_THREADS
             WHERE lock_wait_name LIKE 'SQLSCRIPT_WAIT%';
    FOR i IN 1..record_count(:lt_con) DO
        CALL SYNC:WAKEUP_CONNECTION(:lt_con.CON_ID[:i]);
    END FOR;
END;

CREATE PROCEDURE MONITOR AS
BEGIN
    USING SYS.SQLSCRIPT_SYNC AS SYNC;
    WHILE 1 = 1 DO
        IF RECORD_COUNT(OBSERVED_TABLE) > 100000 THEN
            INSERT INTO LOG_TABLE VALUES (CURRENT_TIMESTAMP,
                'Table size exceeds 100000 records');
        END IF;
        CALL SYNC:SLEEP_SECONDS(300);
    END WHILE;
END;

• In the first example, a WAIT procedure defines a period during which a process simply waits. In the
MONITOR procedure, WAIT is used to make the process wait for 300 seconds before it resumes. This
means that the thread is active all the time.
• In the second example, the SLEEP_SECONDS and WAKEUP_CONNECTION procedures are used as
follows:
• First, the SQLSCRIPT_SYNC library is referenced with USING SQLSCRIPT_SYNC.
• The WAKEUP_CONNECTION procedure is used to resume all sleeping processes.
A sleeping process is listed in the monitoring view M_SERVICE_THREADS. Its LOCK_WAIT_NAME
starts with 'SQLScript/SQLScript_Sync/Sleep/'.
• The SLEEP_SECONDS procedure is used to pause the execution of the current process for the
specified waiting time in seconds.

6.4.11 Calling Procedures

By using parameters correctly when you call procedures, you not only make your code more robust and
readable but also easier to parse and therefore faster to execute.

For better performance, SAP recommends that you use parameterized call statements:

• A parameterized query compiles once only, therefore reducing the compile time.
• A stored query string in the SQL plan cache is more generic and a precompiled query plan can be reused
for the same procedure call with different input parameters.
• When query parameters are not used in the call statement, the system triggers a new query plan
generation.
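As a minimal sketch of the difference (the procedure name and values are assumptions):

 -- parameterized: compiled once, the cached plan is reused for each execution
 CALL my_proc(?, ?);          -- values are bound by the client at execution time

 -- not parameterized: every distinct literal combination triggers a new compilation
 CALL my_proc(1, 'ONE');
 CALL my_proc(2, 'TWO');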

Related Information

Passing Named Parameters [page 210]


Changing the Container Signature [page 211]

6.4.11.1 Passing Named Parameters

When you call a procedure, you can pass named parameters. Named parameters are recommended because
they improve the readability of the code and make the purpose of the parameters clear. They also provide
flexibility because the order of the parameters in the call statement does not need to match the order in the
procedure signature.

You can pass named parameters by using the token =>.

For example, two ways in which the same procedure can be called are shown below, firstly without and
secondly with named parameters:

Without Named Parameters

DO (...)
BEGIN
    ...
    CALL INNER_PROC (:tab,
                     :f,
                     :sd,
                     :ed,
                     c_sum );
    ...
END;

With Named Parameters

DO (...)
BEGIN
    ...
    CALL INNER_PROC ( cust_list      => :tab,
                      filter_str     => :f,
                      start_date     => :sd,
                      end_date       => :ed,
                      cumulative_sum => c_sum );
    ...
END;

In the first example, the order of the parameters must be the same as in the procedure signature. In the
second, however, this order can be ignored. Note that this makes procedure calls more robust with respect to
changes to the container signature.

6.4.11.2 Changing the Container Signature


By using named parameters, you make your procedure calls more flexible and robust with respect to changes
to the container signature.

The examples below show how changes to container signatures can affect procedure calls:

Initial Container Signature


CREATE PROCEDURE P ( IN iv_tab TTYPE,
OUT ov_tab TTYPE
)
AS
BEGIN
...
END;

CALL P (iv_tab => inputTab, ov_tab => ?);

CALL P (inputTab, ?);

Changed Container Signature

 CREATE PROCEDURE P ( IN iv_tab TTYPE,
                      OUT ov_tab TTYPE,
                      IN var INTEGER DEFAULT NULL,
                      IN iv_tab2 DUMMY DEFAULT DUMMY,
                      IN iv_tab3 TTYPE DEFAULT EMPTY
                    )
 AS
 BEGIN
     ...
 END;

 CALL P (iv_tab => inputTab, ov_tab => ?);

 CALL P (inputTab, ?);

• The first example shows the initial container signature and two procedure calls created for it, firstly with
and secondly without named parameters.

• In the second example, the container signature includes three additional parameters with default values:
• Parameters with default values can be omitted in procedure calls with named parameters. The first
procedure call is therefore not affected by the changes.
• Parameters with default values cannot be omitted in procedure calls without named parameters. The
second procedure call is therefore affected by the changes, even though the initial parameter order has
not changed.

6.4.12 Miscellaneous

This section contains some additional best practices.

Related Information

Accessing and Assigning Variable Values [page 212]


Avoiding the No Data Found Exception [page 213]

6.4.12.1 Accessing and Assigning Variable Values

By using a colon (:) together with variables, you can make your code more readable as well as indicate to the
parser whether a variable value should be read or written.

To access a value from a variable (that is, to read) use the colon. To assign a value to a variable (that is, to
write), do not use a colon. By observing these rules, you not only make your code more readable but also easier
to parse.

Consider the following examples:

Problematic

DO (OUT tab TABLE( a INT, b INT) => ?)


BEGIN
DECLARE c INT = 10;
DECLARE b INT = 20;
...
tab = SELECT A AS A, b AS B FROM T WHERE C=c;
END;

Recommended

DO (OUT tab TABLE( a INT, b INT) => ?)
BEGIN
DECLARE c INT = 10;
DECLARE b INT = 20;
...
tab = SELECT A AS A, :b AS B FROM T WHERE C=:c;
END;

• In the first example above, it is not clear to the parser how to handle b and c. The column c is unknown and
b is interpreted as a column.
• In the second example, the colon indicates that it is the values of the two declared variables b and c that
need to be read.

Problematic

DO (...)
BEGIN
...
CALL INNER_PROC (tab, f, sd, ed, c_sum );
...
END;

Recommended

DO (...)
BEGIN
...
CALL INNER_PROC (:tab, :f, :sd, :ed, c_sum);
...
END;

• In the first example, it is not clear to the parser whether values should be read from or written to the
variables tab, f, sd, ed, and c_sum.
• In the second example, the colon indicates that the values should be read from the tab, f, sd, and ed
variables. The variable c_sum, however, should be assigned a value.

6.4.12.2 Avoiding the No Data Found Exception

The SELECT INTO statement does not accept an empty result set. To avoid a no data found exception being
thrown when a result set is empty, this condition should be handled by either an exit handler, an emptiness
check, or by assigning default values.

For example, the following query is run, but the result set is empty:

SELECT EID INTO ex_id FROM :tab;


SAP DBTech JDBC: [1299]: no data found: [1299] "SESSION28480"."GET_ID": line 14
col 4 (at pos 368):
[1299] (range 3) no data found exception: no data found

The examples below show how the three options mentioned above can be applied:

Exit Handler

 DECLARE EXIT HANDLER FOR SQL_ERROR_CODE 1299
 BEGIN
     ex_id = NULL;
 END;
 SELECT EID INTO ex_id FROM :tab;

Emptiness Check

 IF is_empty(:tab) THEN
     ex_id = NULL;
 ELSE
     SELECT EID INTO ex_id FROM :tab;
 END IF;

Default Value

 SELECT EID INTO ex_id DEFAULT NULL FROM :tab;

• In the first example, an exit handler is defined specifically for the SQL error code 1299. If the result set is
empty, it assigns the value NULL to the target variable ex_id.
• In the second example, the IS_EMPTY predicate is used to check whether the result set is empty. If this is
the case, the target variable ex_id is assigned the value NULL.
• The third example makes use of the DEFAULT values feature supported by the SELECT INTO statement, which has the following syntax:

SELECT <select_list> INTO <var_name_list> [DEFAULT <scalar_expr_list>]


<from_clause>
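For example, several variables can be supplied with defaults in one statement (a sketch; the ENAME column and the variable names are assumptions):

 -- assigns NULL and 'unknown' instead of raising error 1299 if :tab is empty
 SELECT EID, ENAME INTO ex_id, ex_name DEFAULT NULL, 'unknown' FROM :tab;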

7 Optimization Features in Calculation Views

The calculation engine pre-optimizes queries before they are worked on by the SQL optimizer.

As shown in the overview below, a calculation view is instantiated at runtime when a query is executed. During
the instantiation process, the calculation engine simplifies the calculation view into a model that fulfills the
requirements of the query. This results in a reduced model that can be further optimized by the calculation
engine. After this, the SQL optimizer applies further optimizations and determines the query execution plan:

Some of the main optimization features include the following:

Feature Description

Join cardinality The specified join cardinality allows the optimizer to decide whether a
join needs to be executed. In models in which many joins are defined,
join pruning can lead to a significant performance boost and reduction in
temporary memory consumption.

Optimized join columns By explicitly allowing join columns to be optimized, you can potentially reduce the number of records that need to be processed at later stages of query processing, resulting in better performance and reduced resource consumption.

Dynamic joins Dynamic joins help improve the join execution process by reducing the
number of records processed by the join view node at runtime.

Union node pruning This allows the data sources of union nodes to be pruned, which helps
reduce resource consumption and benefits performance.

Column pruning Column pruning is an intrinsic feature of calculation views. It automatically removes any columns that are not needed for the final results during the early stages of processing.

 Note

In general, we recommend that you set a join cardinality and consider setting the Optimized Join Columns
flag.

 Tip

Union node pruning has proved to be a helpful modeling pattern in many scenarios.

7.1 Calculation View Instantiation

Calculation views are processed by the calculation engine, which applies a variety of optimizations. These can sometimes result in nonrelational behavior, that is, behavior that is not typical of SQL, and this in turn can have an impact on performance. One of the reasons for nonrelational behavior is the instantiation process.

The instantiation process transforms a stored calculation model into an executed calculation model based on a
query that is run on top of a calculation view. The calculation view is technically a column view that references
one specific node of the stored calculation model. Therefore, during the instantiation process, the query and
the stored calculation model are combined to build the executed calculation model.

The main difference between a relational view, or SQL with subselects, and a calculation model is that the
projection list in a relational view is stable even when another SQL statement is stacked on top of it. In a
calculation model, on the other hand, the projection list of each calculation node in the calculation model
depends on the columns requested by the query or the parent calculation node. The requested columns are
the superset of all columns that are used in the query, for example, the columns in the projection list, the
columns in the WHERE and HAVING clauses, the columns used for sorting, and the columns in the ON clause
for joins, and so on.

The following examples illustrate the difference between the stored model, that is, the calculation view, and the
optimized model that results when the calculation view is matched to the query.

Example 1: Aggregation over a table

In this example, you can see that the table request contains only the attribute A and column K1 in its projection
list:
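A query of roughly the following shape produces this reduced model (an assumed illustration; A is an attribute and K1 a measure of the view, and only these two columns are requested):

 SELECT "A", SUM("K1") AS "K1"
 FROM "ExampleCalcView"    -- assumption: the column view exposing the stored model
 GROUP BY "A";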

Example 2: Join over two tables

This example shows that the attribute A is added to the projection list at the join node and the underlying
nodes because it is required for the join:

Example 3: Different semantics

The semantics can change, depending on which attributes and columns are requested. In this example, a calculated attribute is defined on the aggregation node in the middle. You can see that in the aggregation over the table, the attribute A is added to the projection list. The reason for this is the presence of the calculated attribute C_Attr. When the calculated attribute is not queried, the attribute A is not added to the projection list of the aggregation:

Example 4: Enforcing attributes using the keep flag

To force an attribute to be requested irrespective of the projection list of the parent node, you can set the keep
flag for this attribute. The effect of the keep flag on the executed model in example 1 is shown below. Now the
attribute B is requested on the lower nodes, despite not being requested by the query:

7.2 Setting Join Cardinality

A specified join cardinality allows the optimizer to decide based on the fields requested in the query whether
a join needs to be executed. In models in which many joins are defined, join pruning can lead to significant
performance boosts and reductions in temporary memory consumption.

Imagine a defined star join involving 100 or more tables through further join chains. If no dimensional values
are requested and the cardinality setting is n..1 or 1..1, all joins can be omitted. As a result, the query can be
reduced to one single table access to the central table, instead of involving all the potential join cascades.

However, while join pruning allows performance to be improved and resource consumption reduced, there is
also a risk that it will return surprising results if the cardinality setting does not reflect the actual cardinality of
the data. Also, should the data cardinality change at any stage, for example, due to data remodeling or data
loading, the cardinality setting will need to be adjusted. Ideally, your data loading process should ensure that
the specified cardinality is upheld.

Related Information

Analytical Model Health Check [page 312]

7.2.1 Join Cardinality

A cardinality setting can be applied to joins in calculation views. It specifies for each entry in one table how
many matching entries there are in the other table of the join.

Join cardinality is denoted using two numbers. The left number describes how many matching entries each
entry in the right table can have in the left table, while the right number describes how many matching entries
each entry in the left table can have in the right table. For example, there is a join on the EMPLOYEE field
between table 1 (left table) and table 2 (right table):

• A join cardinality of 1..n specifies that each entry in table 2 has at most one matching entry in table 1.
Conversely, each entry in table 1 can have zero to n matching entries in table 2. The symbol n denotes
an arbitrary positive number. For example, the entry Alice in table 1 might have zero, one, or an arbitrary
number of matches in table 2.
• A join cardinality of 1..1 indicates that each entry in table 1, for example, the entry Alice in table 1, has zero
or one matching entry in table 2. In the same way, the entry Alice in table 2 also has at most one match in
table 1.

Join Pruning Prerequisites

The cardinality setting is used by the optimizer to decide, based on the fields requested in the query, whether a join needs to be executed or whether it can be omitted without compromising the correctness of the data. A join can be omitted if executing the join does not add or remove any records, and provided that no fields are requested from the table that is to be omitted.

While inner joins can add records (multiple matching entries in the other table) and remove records (no
matching entry, remember that ..1 includes zero), outer joins can only add records. Therefore, by using join
cardinality to ensure that the table to be pruned has at most one matching item, you allow join pruning to occur
for outer joins. Text joins behave like left-outer joins in this respect. In the case of referential joins, pruning
can only occur if the referential integrity is set for the table that is not to be pruned. Note that the referential
integrity can be placed on the left table, the right table, or on both tables.

There is one exception to the rule that requires that the table to be pruned has a setting of ..1. This applies when the query only requests measures with the count distinct aggregation mode. In this case, any repeated values that could potentially exist in the tables to be pruned are made unique again by the count distinct calculation. Consequently, join execution does not change the results of the count distinct measure even if there is an n..m cardinality.

All the following prerequisites therefore need to be met for join pruning to occur:

• No fields are requested from the table to be pruned.


• The join type is either an outer join, a referential join with integrity on the table that is not to be pruned, or a
text join where the table to be pruned contains the language column.
• The join cardinality is either ..1 for the table to be pruned, or the query only requests measures with
count distinct aggregation or does not request any measures at all.
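Expressed in plain SQL, the pruning that these prerequisites allow looks like this (a sketch using the SalesOrders and Employees example tables introduced in the next section; the equivalence only holds while the ..1 cardinality is actually true in the data):

 -- left outer join, ..1 on the right side, no right-table fields requested:
 SELECT s.EMPLOYEE, SUM(s.AMOUNT) AS AMOUNT
 FROM SalesOrders s
 LEFT OUTER JOIN Employees e ON s.EMPLOYEE = e.EMPLOYEE
 GROUP BY s.EMPLOYEE;

 -- can be reduced by the optimizer to a single table access:
 SELECT EMPLOYEE, SUM(AMOUNT) AS AMOUNT
 FROM SalesOrders
 GROUP BY EMPLOYEE;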

Join Cardinality Proposals

If tables are directly involved in the join, cardinality proposals can be obtained from the modeling tool. These
values are based on the data cardinality at the time of the proposal. These proposals are not available if the join
includes further nodes (for example, the table is added through a projection node).

Irrespective of whether the cardinality setting is achieved through a proposal or manual setting, the optimizer
relies on these values when making decisions about join pruning. There are no runtime checks. If the setting
gives a lower cardinality than the actual cardinality of the data (for example, it is set to n..1, but in the data it is
n..m), omitting the join might lead to changes in the measure values, compared to when the join is executed.
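Because there are no runtime checks, it can be worth validating a ..1 assumption directly against the data before relying on it, for example with a duplicate check on the join column (a sketch; the table and column names are placeholders):

 SELECT EMPLOYEE, COUNT(*) AS matches
 FROM Employees
 GROUP BY EMPLOYEE
 HAVING COUNT(*) > 1;   -- any returned row violates a ..1 setting on this side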

7.2.2 Examples

The following examples demonstrate how the join cardinality setting can influence the outcome of queries.

All examples are based on two tables and a join between these two tables. In some examples, additional
projection or aggregation nodes are added. Otherwise, the examples are intentionally simple to illustrate the
mechanisms at work. Obviously, performance gains cannot be observed in these simple models even though
they exist.

Related Information

Example Tables [page 223]


Get Join Cardinality Proposals [page 223]
Impact of Join Cardinality on Pruning Decisions [page 225]
Impact of Incorrect Join Cardinality Settings [page 233]

7.2.2.1 Example Tables

These two tables are used in the examples.

Table 9: SalesOrders Table

 SALESORDER   EMPLOYEE   AMOUNT
 001          Donald     20
 002          Alice      22
 003          Hans       40
 004          Horst      33

Table 10: Employees Table

 MANAGER   EMPLOYEE
 null      Alice
 Alice     Donald
 Alice     Dagobert
 Alice     Hans
 Hans      Heinz
 Hans      Jane
 ?         Horst

There is a 1..1 relationship between the EMPLOYEE field in the SalesOrders table and the EMPLOYEE field in
the Employees table. In contrast, there is a 1..n relationship between the EMPLOYEE field in the SalesOrders
table and the MANAGER field in the Employees table.
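If you want to reproduce the examples in an SQL console, the two tables can be created as follows (a sketch; the column types are assumptions, the rows match the tables above):

 CREATE COLUMN TABLE SalesOrders (SALESORDER NVARCHAR(3), EMPLOYEE NVARCHAR(20), AMOUNT INTEGER);
 INSERT INTO SalesOrders VALUES ('001', 'Donald', 20);
 INSERT INTO SalesOrders VALUES ('002', 'Alice', 22);
 INSERT INTO SalesOrders VALUES ('003', 'Hans', 40);
 INSERT INTO SalesOrders VALUES ('004', 'Horst', 33);

 CREATE COLUMN TABLE Employees (MANAGER NVARCHAR(20), EMPLOYEE NVARCHAR(20));
 INSERT INTO Employees VALUES (NULL, 'Alice');
 INSERT INTO Employees VALUES ('Alice', 'Donald');
 INSERT INTO Employees VALUES ('Alice', 'Dagobert');
 INSERT INTO Employees VALUES ('Alice', 'Hans');
 INSERT INTO Employees VALUES ('Hans', 'Heinz');
 INSERT INTO Employees VALUES ('Hans', 'Jane');
 INSERT INTO Employees VALUES ('?', 'Horst');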

7.2.2.2 Get Join Cardinality Proposals

In this example, the modeling tool is used to get a proposal for the join cardinality. The proposal is derived
from the data cardinality applicable for the join fields between the tables at the time of the proposal. To get a
cardinality proposal, the join must be based directly on tables.

Procedure

1. Create a calculation view.

Create a calculation view that only includes a join between the SalesOrders and Employees tables. Join the
two tables with a left outer join on the EMPLOYEE field, where the SalesOrders table is the left table:

Note that the Cardinality field above has a Propose Cardinality button for requesting proposals.

As described earlier, the SalesOrders table only has distinct values in the EMPLOYEE column. Similarly, the
EMPLOYEE column in the Employees table also has no repeated values. Therefore, if you wanted to set the
cardinality for a join on the EMPLOYEE fields, you would set it to 1..1. Since only tables are involved in this
join, you can now get the cardinality proposals.
2. Get a join cardinality proposal:
a. To get a join cardinality proposal, click the Propose Cardinality button.

Based on the cardinality of the EMPLOYEE field in both tables, 1..1 is proposed.
b. For the purposes of the example, remove the join and create a new left-outer join, this time between
the EMPLOYEE field in the left table and the manager field in the right table.
c. Get a new join cardinality proposal.

Since there are multiple duplicates in the manager column (for example, Alice), the tool proposes
1..n.

The modeler therefore proposes a cardinality for the join field based on the current data cardinality,
provided you have only included tables in your join.
3. To see that join cardinality proposals are not available if the join includes other nodes, create another
calculation view.
a. This time do not add the Employees table directly to the join node, but place it in a projection, as shown
below:

b. Request a join cardinality proposal.

The Propose Cardinality button is now grayed out. This is because a proposal is only available when
tables are joined directly. In this example, one table is joined indirectly through a projection.

This example has therefore demonstrated that join cardinality proposals are only supported when tables
are directly involved in the join and are not passed up through other nodes.

7.2.2.3 Impact of Join Cardinality on Pruning Decisions

The following examples demonstrate how setting the join cardinality in calculation view joins can lead to join
pruning and thus better performance.

Related Information

Example 1: All pruning conditions are satisfied [page 226]


Example 2: Additional field is requested from the right table [page 227]
Example 3: Different cardinality settings [page 228]
Example 4: Requested measures [page 230]

7.2.2.3.1 Example 1: All pruning conditions are satisfied

Join pruning can occur when all pruning conditions are satisfied.

Procedure

1. Manually set the cardinality of the calculation view developed earlier in Get Join Cardinality Proposals to
1..1 and add all fields so you can build the calculation view.
2. Make sure that the EMPLOYEE_1 field comes from the left table. If this is not the case, for example,
because you built the model in a slightly different way, change the column mapping in the join node so that
EMPLOYEE_1 refers only to the left table.
3. After the calculation view has been built, start debugging by selecting the Semantics node and then
choosing the Debug button:

4. Replace the default debug query with a query that only requests fields from the left table and run the query
(you probably need to change the calculation view name):

SELECT
    "SALESORDER",
    "EMPLOYEE_1",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityNoEstimate"
GROUP BY
    "SALESORDER", "EMPLOYEE_1"

The executed debug query shows the pruned nodes and fields in gray:

Based on the grayed-out objects, you can conclude that the employees table was pruned away.

Pruning could occur because all of the following conditions were met:

• No fields were requested from Employees (EMPLOYEE_1 comes from the left table).
• Only left-outer joins were involved, which meant that records could not be removed by executing the
join.
• The cardinality was set such that the number of records would not be expected to increase by
executing the join.

7.2.2.3.2 Example 2: Additional field is requested from the right table

If you include fields from the right table (for example, the MANAGER field), join pruning cannot occur, because
the join has to be executed to associate the records from both tables.

Procedure

To test this, run the following debug query that additionally requests the MANAGER field from the right table:

SELECT
    "SALESORDER",
    "EMPLOYEE_1",
    "MANAGER",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityNoEstimate"
GROUP BY
    "SALESORDER", "EMPLOYEE_1", "MANAGER"

As you can see, this debug query prevents pruning. The Employees table is not grayed out, confirming that it
was not pruned:

7.2.2.3.3 Example 3: Different cardinality settings

To see more demonstrations of the impact of the cardinality setting on join pruning, set the cardinality to the
following values sequentially before running the debug queries below:

1. n..1
2. 1..n
3. Leave it empty

Don’t forget to save and build the calculation view each time.

Procedure

1. Run the following debug query, which only accesses the left table:

SELECT
"SALESORDER",
"EMPLOYEE_1",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityNoEstimate"
GROUP BY
"SALESORDER","EMPLOYEE_1"

You should see that pruning only occurs with a left-outer join and n..1 (or 1..1 from before). As
discussed above, inner joins prevent pruning.
2. In the same way for right-outer joins, only 1..1 and 1..n work. To check, swap the join sides by choosing
the Swap Table button.
3. Define a right outer join with the cardinality 1..n:

4. Run the following debug query, for example:

SELECT
"EMPLOYEE_1",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityRightOuter"
GROUP BY
"EMPLOYEE_1"

Join pruning should now occur.

7.2.2.3.4 Example 4: Requested measures

This example shows how the type of requested measure influences the outcome. In the previous examples, join
pruning only worked when there was an outer-join setting and the table to be pruned had a cardinality of ..1.

Context

In this example, a cardinality of n..m also allows pruning to occur when the query only requests a count distinct measure. The reason for this is that count distinct removes any potential duplicates and the outer join ensures that no records are lost. In sum, since no records can be lost due to the outer join and potential duplicate records do not influence the outcome of the count distinct measure, executing the join does not influence the value of the measure. Therefore, the join can be omitted if no fields are requested from the table to be pruned.

Procedure

1. For a working example, define a calculated column of type count distinct on the EMPLOYEE_1 field:

2. Set the cardinality to n..m:

Debug queries that only request the count distinct measure and fields from the left table should now
show table pruning.
3. Instead of running the query in debug view, use the Database Explorer to check whether join pruning
occurs. Open the Database Explorer by clicking the Database Explorer button on the left.
4. Open an SQL console and enter the debug query, for example:

SELECT
    "EMPLOYEE_1",
    SUM("COUNTDISTINCT") AS "COUNTDISTINCT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::JoinEstimateCardinalityCountDistinct"
GROUP BY "EMPLOYEE_1"

5. From the Run dropdown menu, choose Analyze SQL.

On the Operators tab, you should see that only one table was used. For example:

This demonstrates that join pruning also occurs with a cardinality setting of n..m when the only measure
used is of type count distinct.
6. Add an additional AMOUNT measure to the query to see that two tables are then involved:
a. Open an SQL console and enter the debug query, for example:

SELECT "EMPLOYEE_1", SUM("COUNTDISTINCT") AS "COUNTDISTINCT",


SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::JoinEstimateCardinality
CountDistinct"
GROUP BY "EMPLOYEE_1"

b. From the Run dropdown menu, choose Analyze SQL.

On the Operators tab, you should see that both tables are used and no join pruning occurs. For
example:

7.2.2.4 Impact of Incorrect Join Cardinality Settings

The three examples in this section illustrate how the resulting values of measures are affected depending on
whether join pruning occurs when an incorrect cardinality is defined. Each example shows different reasons for
the incorrect cardinalities: wrong from the outset, after remodeling, and after loading additional data.

Related Information

Example 1: Wrong from the Outset [page 234]


Example 2: Remodeling [page 236]
Example 3: Loading Additional Data [page 239]

7.2.2.4.1 Example 1: Wrong from the Outset

This example shows how different results are obtained for the AMOUNT measure when the join is executed
because the cardinality is set incorrectly. In this case, the cardinality does not match the cardinality present in
the data.

Procedure

1. In the calculation view, set the cardinality of a left-outer join between the EMPLOYEE field in the SalesOrders
table and the MANAGER field in the Employees table to 1..1, as shown below:

The defined cardinality of 1..1 is not reflected in the data, which leads to the problems shown in the next
steps.
2. Compare the results of the two queries below:
a. Query on fields from the left table only:

SELECT
    "EMPLOYEE_1",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongFromOutset"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
    "EMPLOYEE_1",
    "MANAGER",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongFromOutset"
GROUP BY "EMPLOYEE_1", "MANAGER"

Result: Query on Fields from the Left Table Only

 EMPLOYEE_1   AMOUNT
 Donald       20
 Alice        22
 Hans         40
 Horst        33

Result: Query on Fields from Both Tables

 EMPLOYEE_1   MANAGER   AMOUNT
 Donald       NULL      20
 Alice        Alice     66
 Hans         Hans      80
 Horst        NULL      33

Different values are obtained for the AMOUNT measure for both Alice and Hans:

• In the first query, join pruning can occur because all fields come from the left table and the cardinality setting allows it. Because the join is not executed, the multiple entries matching Alice in the right table are not found:

  Left Table: SalesOrders

   SALESORDER   EMPLOYEE   AMOUNT
   001          Donald     20
   002          Alice      22
   003          Hans       40
   004          Horst      33

  Right Table: Employees

   MANAGER   EMPLOYEE
   null      Alice
   Alice     Donald
   Alice     Dagobert
   Alice     Hans
   Hans      Heinz
   Hans      Jane
   ?         Horst

• In the second query, a join is forced because the MANAGER field is queried from the right table. In this
case, join execution leads to the record for Alice being tripled in the left table, and the result for the
AMOUNT field is 3*22=66. Similarly, Hans has two matching entries in the right table (Hans has two
employees) and therefore the AMOUNT measure results in 2*40=80.
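The multiplication can be reproduced in plain SQL against the example tables (a sketch; compare the two aggregates):

 -- forced join on EMPLOYEE = MANAGER: Alice matches 3 rows, Hans matches 2
 SELECT s.EMPLOYEE, SUM(s.AMOUNT) AS AMOUNT
 FROM SalesOrders s
 LEFT OUTER JOIN Employees e ON s.EMPLOYEE = e.MANAGER
 GROUP BY s.EMPLOYEE;    -- Alice: 3 * 22 = 66, Hans: 2 * 40 = 80

 -- pruned variant: a single table access
 SELECT EMPLOYEE, SUM(AMOUNT) AS AMOUNT
 FROM SalesOrders
 GROUP BY EMPLOYEE;      -- Alice: 22, Hans: 40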

7.2.2.4.2 Example 2: Remodeling

Examples of remodeling include adding another field, changing aggregations into projections, and so on.

Context

In the example below, the model firstly gives the same values for AMOUNT irrespective of whether join pruning
occurs. After an additional field is included with a keep flag, the AMOUNT for Alice and Hans changes depending
on whether join pruning occurs.

Procedure

1. Use a copy of the previous calculation view and insert an aggregation node before the join. The aggregation
node maps only the MANAGER field (therefore removing the EMPLOYEE field):

When only the MANAGER field is mapped in the aggregation node, multiple values for the same manager are
condensed into one value by the aggregation node.
2. Run the two SQL statements below. The first statement allows join pruning to occur as it only requests
fields from the left table. The second statement prevents join pruning by also selecting the MANAGER field
from the right table:
a. Query on fields from the left table only:

SELECT
    "EMPLOYEE_1",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
    "EMPLOYEE_1",
    "MANAGER",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1", "MANAGER"

Result: Query on Fields from the Left Table Only

 EMPLOYEE_1   AMOUNT
 Donald       20
 Alice        22
 Hans         40
 Horst        33

Result: Query on Fields from Both Tables

 EMPLOYEE_1   MANAGER   AMOUNT
 Donald       NULL      20
 Alice        Alice     22
 Hans         Hans      40
 Horst        NULL      33

The values for AMOUNT are the same irrespective of whether join pruning occurs. The reason is that the
aggregation on the MANAGER join field ensures that only distinct values enter the join from the right. Your
cardinality setting of 1..1 is therefore correct.

3. Now also map the EMPLOYEE field in the aggregation node and set the Keep flag to make sure that the
EMPLOYEE field is not pruned away if it is not requested:

4. Run the same queries again. Alice and Hans should now have different values for AMOUNT depending on
whether join pruning occurs:
a. Query on fields from the left table only:

SELECT
    "EMPLOYEE_1",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
    "EMPLOYEE_1",
    "MANAGER",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1", "MANAGER"

Result: Query on Fields from the Left Table Only

 EMPLOYEE_1   AMOUNT
 Donald       20
 Alice        22
 Hans         40
 Horst        33

Result: Query on Fields from Both Tables

 EMPLOYEE_1   MANAGER   AMOUNT
 Donald       NULL      20
 Alice        Alice     66
 Hans         Hans      80
 Horst        NULL      33

Different results occur when the EMPLOYEE field is kept in the aggregation, leading to multiple occurrences
of the same value for the MANAGER field.

This shows that a change to your model (adding another field) below the join node might mean that
your right table entries are no longer unique when the join node is reached, which violates the cardinality
setting. Consequently, the values for AMOUNT depend on whether join pruning occurs.

5. Using the same calculation view, replace the aggregation with a projection:

When the aggregation node is changed to a projection node, the MANAGER field no longer provides
exclusively distinct values to the join.
6. Run the two queries below and compare their results:
a. Query on fields from the left table only:

SELECT
    "EMPLOYEE_1",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
    "EMPLOYEE_1",
    "MANAGER",
    SUM("AMOUNT") AS "AMOUNT"
FROM
    "JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAfterRemodeling"
GROUP BY "EMPLOYEE_1", "MANAGER"

Result: Query on Fields from the Left Table Only

 EMPLOYEE_1   AMOUNT
 Donald       20
 Alice        22
 Hans         40
 Horst        33

Result: Query on Fields from Both Tables

 EMPLOYEE_1   MANAGER   AMOUNT
 Donald       NULL      20
 Alice        Alice     66
 Hans         Hans      80
 Horst        NULL      33

When the aggregation node is changed to a projection node, join pruning has an impact on the resulting
values for the AMOUNT measure. Therefore, any changes to the model below the join node, such as
changing an aggregation into a projection, might result in a violation of your join cardinality setting.

7.2.2.4.3 Example 3: Loading Additional Data

Like the second example, the third example also starts with a correct cardinality setting.

Procedure

1. Use the first calculation view without any additional nodes, but replace the Employees table with a copy of
the table. A copy of the table is used so that it can be modified without influencing the other models built
earlier:

The join is defined on the EMPLOYEE field on both sides and the cardinality setting of 1..1 reflects the true
data cardinality. Now also query the EMPLOYEE field in the right table to check the impact of join pruning.
2. Run the two queries below and compare their results:
a. Query on fields from the left table only:

SELECT
"EMPLOYEE_1",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAft
erLoading"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
"EMPLOYEE_1",
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAft
erLoading"
GROUP BY "EMPLOYEE_1","EMPLOYEE"

Result: Query on Fields from the Left Table Only

EMPLOYEE_1    AMOUNT
Donald        20
Alice         22
Hans          40
Horst         33

Result: Query on Fields from Both Tables

EMPLOYEE_1    EMPLOYEE    AMOUNT
Donald        Donald      20
Alice         Alice       22
Hans          Hans        40
Horst         Horst       33

As expected, when the cardinality setting in the calculation view reflects the data cardinality, the same
values are returned for AMOUNT irrespective of whether join pruning occurs.

3. Imagine that Donald gets a second manager, Herbert, and add this record to the CopyOfEmployees table:

MANAGER    EMPLOYEE
null       Alice
Alice      Donald
Alice      Dagobert
Alice      Hans
Hans       Heinz
Hans       Jane
?          Horst
Herbert    Donald
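
For example, the new record could be added like this (a sketch; the table name CopyOfEmployees and the column order MANAGER, EMPLOYEE follow the table above):

INSERT INTO "CopyOfEmployees" VALUES ('Herbert', 'Donald');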

Due to this change, your cardinality setting of 1..1 for the EMPLOYEE field is no longer reflected in the
data. Consequently, you will see different values for Donald’s AMOUNT depending on whether the join is
executed.

4. Run the two queries below and compare their results:


a. Query on fields from the left table only:

SELECT
"EMPLOYEE_1",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAft
erLoading"
GROUP BY "EMPLOYEE_1"

b. Query on fields from both tables:

SELECT
"EMPLOYEE_1",
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::joinCardinalityWrongAft
erLoading"
GROUP BY "EMPLOYEE_1","EMPLOYEE"

Result: Query on Fields from the Left Table Only

EMPLOYEE_1    AMOUNT
Donald        20
Alice         22
Hans          40
Horst         33

Result: Query on Fields from Both Tables

EMPLOYEE_1    EMPLOYEE    AMOUNT
Donald        Donald      40
Alice         Alice       22
Hans          Hans        40
Horst         Horst       33

As shown above, inconsistent results are returned when additional data leads to a difference between
the cardinality setting and the data cardinality. Donald’s AMOUNT varies depending on whether the join is
executed. This is a consequence of the cardinality setting, which was initially correct but became incorrect
after additional data was loaded.

7.3 Optimizing Join Columns

By explicitly allowing join columns to be optimized, you can potentially reduce the number of records that need
to be processed at later stages of query processing, resulting in better performance and reduced resource
consumption.

This is because aggregation can then occur at an earlier stage of query processing, which allows the number of
records to be reduced early on. This is particularly relevant in scenarios in which there are many joins defined
on different fields, but the columns requested by the query are from just a few of the join partners. In these
cases, taking the join field out of the aggregation level might allow the number of records that need to be
processed further to be significantly reduced.

Note this is not the default query behavior because when columns are removed, the aggregation behavior of
the query could be changed, which could lead to measures having different resulting values. Therefore, by
default, all columns on which joins are defined are requested during query processing, irrespective of whether
they are required by the query, to ensure consistent aggregation behavior.

7.3.1 Optimize Join Columns Option

There may be cases where join columns can be omitted. Consider a scenario in which a query requests
fields from only one join partner, the cardinality to the other join partner (from which none of the fields are
requested) is set to 1, and the join is not an inner join.

In this case, executing the join does not affect the result. Since the join does not have to be executed and the
join field has not been requested by the query, it is a candidate for pruning. By default, it will not be pruned, but
if the Optimize Join Columns option is set, it can be removed.

In cases where there are multiple join partners on different join fields and where queries typically only select
fields from a small subset of join partners, the Optimize Join Columns option can be used to allow various join
fields to be excluded from the aggregation. This is because the option explicitly states that the join fields do not
have to be included in the aggregation. By omitting the join field and thus omitting a field that might otherwise
increase the granularity of the aggregation, the number of records that need to be further processed can be
significantly reduced. As a result, better performance and lower memory consumption can be achieved. The
extent of improvement depends on which fields the queries request at runtime and on which fields the joins are
defined.

However, when using the Optimize Join Columns option, you should be aware of the following side effects:

• An incorrect cardinality setting might lead to the wrong result, from your perspective (technically, the
optimizer has behaved correctly). The use of the Optimize Join Columns option requires that the cardinality
is set correctly (see the prerequisites in this section).
• When the Optimize Join Columns option is set, the optimizer changes the aggregation level by excluding
the join column from the aggregation, if possible. Whether the join column can be excluded also depends
on which fields are queried. This means that the inclusion or exclusion of the join column in the aggregation
level varies depending on the fields that are requested by the query. If you have aggregation functions for
which the aggregation level matters (such as finding the maximum), the results can change depending on
the aggregation level. This is demonstrated by the example in this section.

7.3.2 Prerequisites for Pruning Join Columns

Join column pruning can only occur if all the following conditions are met:

• The join field is not requested by the query.


• Only fields from one of the join partners are requested by the query.
• The join is either an outer join, referential join, or text join.
• The cardinality to the join partner from which none of the fields are requested is set to 1.

The reasons for these prerequisites are as follows:

• If a join field is requested by the query, it cannot simply be excluded.


• If fields are requested from both join partners, the join must be executed and therefore the join field must
be requested.
• If the join is not an outer, referential, or text join, the join might add or delete records. The example (below)
shows the difference between inner and outer joins in terms of preserving the number of records.
• If the table to be pruned might deliver more than one matching partner, join pruning should not occur.
Therefore, the cardinality of the table that does not deliver any fields needs to be set to 1. However, the
optimizer does not check at runtime whether more than one matching partner could occur, so it is your
responsibility to ensure that your cardinality setting of 1 reflects the actual data cardinality.

Example: The effect of inner joins and outer joins on the number of records

The example below shows the difference between inner and outer joins in terms of preserving the number of
records.

Table 1 has the following entries in its join field:

a,b,c

Table 2 has the following entries in its join field:

a,a,a,b

You can create the tables as follows:

create column table leftTable (joinField NVARCHAR(10), measure INT);
create column table rightTable (joinField NVARCHAR(10));
insert into leftTable values ('a',10);
insert into leftTable values ('b',20);
insert into leftTable values ('c',30);
insert into rightTable values ('a');
insert into rightTable values ('a');
insert into rightTable values ('a');
insert into rightTable values ('b');

Inner join
When an inner join is used, only the entries that have a matching partner are retrieved.

Run the following query:

select
l.joinfield "leftJoinfield",
l.measure "measure",
r.joinfield "rightJoinfield"
from
leftTable l
inner join
rightTable r
on
l.joinField=r.joinField

The output is as follows:

leftJoinField measure rightJoinField

a 10 a

a 10 a

a 10 a

b 20 b

The output contains four records, with the join field values a,a,a,b. The record with the value a from table 1 is
tripled because there are three matching entries in table 2. The entry c is omitted because there is no matching
entry in table 2.

This shows that an inner join can change the number of output records depending on whether the join is
executed. In this example, the 3 records of the left table become 4 output records, but this number could be
higher or lower depending on the entries in the right table.

Outer join

When an outer join is used, entries without a matching entry in the right table are retrieved as well.

Run the following query:

select
l.joinfield "leftJoinfield",
l.measure "measure",
r.joinfield "rightJoinfield"
from
leftTable l
left outer join
rightTable r
on
l.joinField=r.joinField

This time the output is as follows:

leftJoinField measure rightJoinField

a 10 a

a 10 a

a 10 a

b 20 b

c 30 ?

You can see that the entry c is not omitted even though there is no match in the right table. To this extent,
outer joins preserve the number of output records. However, you can also see that the entry a has once again
been tripled. This means that the number of output records can change if there is more than one matching
entry in the right table. Therefore, it is important to ensure that the correct cardinality is specified (the fourth
prerequisite above).

By setting the cardinality, for example, to n:1 so that there is at most one match in the right table, entries will
not be duplicated when the join is executed, because every entry in the left table has at most one entry in the
right table. If an entry in the left table does not have a matching entry in the right table, this is not a problem,
because the value will still be kept, as can be seen in the case of entry c above.

This means that when a cardinality of 1 is set for the right table, the execution of the join does not influence the
number of records that are processed further, and the join can therefore be omitted if there are no fields that
must be read from the right table.
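
To make this concrete, the following sketch deduplicates the right table so that the cardinality of 1 actually holds; both queries then return identical aggregates, which is exactly the situation in which the join can be pruned (rightTableUnique is a hypothetical helper table; leftTable and rightTable are the tables created above):

create column table rightTableUnique (joinField NVARCHAR(10));
insert into rightTableUnique select distinct joinField from rightTable;

-- With at most one match per left row, the outer join does not change the aggregate:
select l.joinField "joinField", sum(l.measure) "measure"
from leftTable l
left outer join rightTableUnique r on l.joinField = r.joinField
group by l.joinField;

-- Identical result without the join, so the join is a candidate for pruning:
select joinField, sum(measure) "measure"
from leftTable
group by joinField;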

Note that if you use the wrong cardinality, the output might be incorrect because pruning of the join might
occur even though the join would influence the number of records returned for further processing.

7.3.3 Example

This example shows the impact of the Optimize Join Columns option on query processing. In the example, a
simple calculation view is modeled first with the Optimize Join Columns option set and then without it.

The debug mode is used to check that join pruning occurs and that the intermediate aggregation level is
changed when the option is set. In addition, an aggregation function is used that is affected by the level of
aggregation. It is used to demonstrate how the fields requested in the query can change the resulting values of
the measures.

The example consists of the following steps:

1. Create the Tables [page 245]


2. Create the Calculation View [page 246]
3. Run the Queries [page 249]
4. Debug the Queries [page 250]
5. Analyze the Queries [page 253]
6. Change the Requested Fields [page 254]

7.3.3.1 Create the Tables

Create the tables needed for this example.

Table 11: ItemsMD Table

ITEM    DESCRIPTION
001     Diapers
002     Diapers
003     Milk
004     Wine

Sample Code

INSERT INTO "ItemsMD" VALUES ('001','Diapers');
INSERT INTO "ItemsMD" VALUES ('002','Diapers');
INSERT INTO "ItemsMD" VALUES ('003','Milk');
INSERT INTO "ItemsMD" VALUES ('004','Wine');

Table 12: SalesItems Table

ITEM    EMPLOYEE    AMOUNT
001     Donald      10
002     Donald      50
003     Alice       40
004     Dagobert    21

Sample Code

INSERT INTO "SalesItems" VALUES ('001','Donald',10);
INSERT INTO "SalesItems" VALUES ('002','Donald',50);
INSERT INTO "SalesItems" VALUES ('003','Alice',40);
INSERT INTO "SalesItems" VALUES ('004','Dagobert',21);

7.3.3.2 Create the Calculation View

Create the calculation view needed for this example.

Procedure

1. Create a calculation view:


a. Add an aggregation node and then add the SalesItems table to it as a data source.
b. Map all columns to the output and select the aggregation type MAX for the AMOUNT column.

The calculation view should now look like this:

2. Add a join node:


a. Connect the join node to both aggregation nodes.
b. Join the ItemsMD table on the ITEM column with a left outer join. Use the aggregation node as the left
join partner.
c. Keep the cardinality setting n..m.
d. Select the Optimize Join Columns checkbox.

The view should now look like this:

3. Map all columns to the output except the ITEM column from the ItemsMD table.
4. In the higher-level aggregation node, map all columns to the output.
5. In the semantics node, change the aggregation type of the AMOUNT column to SUM:

6. Try to build the calculation view.

The build fails. An error message states that you need a cardinality setting of N:1 or 1:1 when using the
Optimize Join Columns option. This tells you that not all prerequisites for the Optimize Join Columns option
have been met. The n..m cardinality setting has therefore prevented join pruning from occurring.
7. Set the cardinality to n:1. As a best practice, check that this cardinality reflects the data cardinality and
that there is never more than one matching entry for the join column item in the right table. In a real
scenario, you might want to enforce this by applying some underlying loading logic.

8. Build the view again.

7.3.3.3 Run the Queries

Test the calculation view by running the queries below with and without the Optimize Join Columns option set.

Procedure

1. With the Optimize Join Columns option still selected, run the following query, which requests fields from
the SalesItems table only:

SELECT
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns"
GROUP BY "EMPLOYEE"

You should see the following output:

EMPLOYEE AMOUNT

Donald 50

Alice 40

Dagobert 21

2. Deselect the Optimize Join Columns option and build the view again.
3. Run the same query as above.

You should see different results:

EMPLOYEE AMOUNT

Donald 60

Alice 40

Dagobert 21

Results

The amount for Donald differs depending on whether the Optimize Join Columns option is set. The reason for
this is that in the first query the join field ITEM is excluded (pruned) because the prerequisites outlined above
are met:

• The join field ITEM is not requested by the query.


• Only the fields from one join partner are requested; the only fields requested are EMPLOYEE and AMOUNT
and they both come from the left table.
• The join is a left outer join.
• The cardinality to the join partner from which none of the fields are requested was set to 1.

The pruning of the join field ITEM changes the aggregation level for the maximum calculation that was defined
in the aggregation node.

7.3.3.4 Debug the Queries

For a better understanding of why the values changed, run the queries above in debug mode.

Procedure

1. Open the calculation view. The Optimize Join Columns option should still be deselected. To start
debugging, select the Semantics node and then choose the Debug button.
2. Replace the default query with the query previously used:

SELECT
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns"
GROUP BY "EMPLOYEE"

3. Click Execute to run the query.

You should see the following:


• The join node is grayed out. This indicates that join pruning has occurred:

• The debug query of the lower-level aggregation node contains the ITEM column even though ITEM was
not requested in the end-user query:

SELECT "EMPLOYEE", "ITEM", "AMOUNT" FROM


"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns/dp
/Aggregation_1"('PLACEHOLDER'=('$$client$$','$$client$$'),
'PLACEHOLDER'=('$$language$$','$$language$$'))

4. Execute the debug query of the lower-level aggregation node.

You should see the following intermediate values:

EMPLOYEE ITEM AMOUNT

Donald 001 10

Donald 002 50

Alice 003 40

Dagobert 004 21

5. Exit the debug session, select the Optimize Join Columns option, and build the calculation view again.

6. Start the debug session as described above and execute the same debug query:

SELECT
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns"
GROUP BY "EMPLOYEE"

You should see the following:


• Like before, the join node is grayed out. This indicates that join pruning has occurred.
• This time the debug query of the lower-level aggregation node does not contain the ITEM column.
This shows that the Optimize Join Columns feature excludes the ITEM join column at an early stage of
processing.
7. Execute the debug query of the lower-level aggregation node.

You should see the following intermediate values:

EMPLOYEE AMOUNT

Donald 50

Alice 40

Dagobert 21

Results

The maximum values are calculated based on the intermediate values of the lower-level aggregation node.
When the Optimize Join Columns option is not selected, the ITEM column is kept in the aggregation granularity
and the maximum calculation therefore involves both of Donald's records (item 001 and 002), leading to
max(10)+max(50)=60. The summing stems from the fact that you selected the aggregation mode SUM in the
semantics node. In contrast, when the Optimize Join Columns option is selected, just one record is calculated
for Donald: max(10,50)=50.

You therefore see different results for Donald depending on whether the join column is omitted. If you had
not used MAX as the aggregation function but, for example, SUM in the aggregation node, you would not see
the impact of join column pruning. That is because SUM is not sensitive to the aggregation level on which it is
calculated.
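
The difference can also be reproduced directly in SQL (a sketch against the SalesItems table created earlier; the inner query mimics the lower-level aggregation node with and without the ITEM column):

-- Aggregation level includes ITEM (option not set): Donald gets max(10) + max(50) = 60
SELECT "EMPLOYEE", SUM("MAX_AMOUNT") AS "AMOUNT"
FROM (
    SELECT "EMPLOYEE", "ITEM", MAX("AMOUNT") AS "MAX_AMOUNT"
    FROM "SalesItems"
    GROUP BY "EMPLOYEE", "ITEM"
) t
GROUP BY "EMPLOYEE";

-- Aggregation level excludes ITEM (option set): Donald gets max(10, 50) = 50
SELECT "EMPLOYEE", MAX("AMOUNT") AS "AMOUNT"
FROM "SalesItems"
GROUP BY "EMPLOYEE";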

Note that you see fewer records at the intermediate nodes when the Optimize Join Columns option is set
compared to when it is not set (3 versus 4 records in this example). In this example the impact is small, but if
you have many join fields that could be pruned and the join fields also have high cardinalities, pruning might
have a substantial impact on the level of aggregation and thus the number of records that need to be processed
further. This means that performance benefits can be achieved when the Optimize Join Columns option is used
correctly.

7.3.3.5 Analyze the Queries

You can use the Database Explorer to run the queries above in SQL analysis mode and check whether the join
field is omitted.

Procedure

1. Check the calculation view with the Optimize Join Columns option selected. To do so, open an SQL console
and enter the query to be run. For example:

SELECT
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns"
GROUP BY "EMPLOYEE"

2. From the Run dropdown menu, choose Analyze SQL.


3. On the Operators tab, check the Aggregation details. For example:

4. Repeat the steps above on the same calculation view but this time without the Optimize Join Columns
option set.
5. Again, on the Operators tab, check the Aggregation details. For example:

Results

The query on the calculation view with the Optimize Join Columns option set does not contain the ITEM field in
the aggregation level. However, the query on the calculation view without the Optimize Join Columns option set
does contain the ITEM field in the grouping. Therefore, as this example confirms, join column pruning has been
effective.

7.3.3.6 Change the Requested Fields

Finally, to demonstrate how join column pruning can change the resulting measure values depending on the
fields requested, request one of the fields (in this example, DESCRIPTION) from the right table so that join
pruning and therefore join column pruning cannot occur.

Procedure

Run the following query with the Optimize Join Columns option set:

SELECT
"DESCRIPTION",
"EMPLOYEE",
SUM("AMOUNT") AS "AMOUNT"
FROM
"OPTIMIZEJOINCOLUMN_HDI_DB_1"."exampleOptimizeJoinColumns"
GROUP BY "DESCRIPTION"",EMPLOYEE"

Result: Query with the DESCRIPTION Field

DESCRIPTION    EMPLOYEE    AMOUNT
Diapers        Donald      60
Wine           Dagobert    21
Milk           Alice       40

Result: Query without the DESCRIPTION Field

EMPLOYEE    AMOUNT
Donald      50
Alice       40
Dagobert    21

The results are different when the additional column is requested from the right table and the Optimize Join
Columns option is set.

7.4 Using Dynamic Joins

Dynamic joins help improve the join execution process by reducing the number of records processed by the join
view node at runtime.

A dynamic join is a special join that comprises more than one join field. Which fields are evaluated at runtime
is determined based on the requested fields. For example, Table1 and Table2 are joined on Field1 and Field2.
However, when only Field1 or only Field2 is requested by the client, the tables (Table1 and Table2) are joined
based only on the requested field (Field1 or Field2).

This is different from the classical join behavior. In the classical join, the join condition is static, meaning that
the join condition does not change according to the client query. In the case of the dynamic join, on the other
hand, the join condition changes dynamically based on the query.

This difference in behavior can lead to differences in the query result sets, so it is important to understand the
implications of using the dynamic join before applying it.

Note that to use dynamic joins, at least one of the fields involved in the join condition that comes from the
first source of the join node must be contained in the requested list of fields of the query. Otherwise, a query
runtime error will occur.

Related Information

Analytical Model Health Check [page 312]

7.4.1 Dynamic Join Example

This simple example shows the effect of the dynamic join and how it differs from the classical join. In this
example, you are evaluating material sales data and want to calculate the sales share of a material.

Dataset

The dataset shown below contains the sales data for materials at region and country level:

Table 13: DynamicSalesShare Table


REGION COUNTRY MATERIAL SALES

APJ CHN PROD1 10

APJ CHN PROD2 10

APJ CHN PROD3 0

APJ IND PROD1 20

APJ IND PROD2 50

APJ IND PROD3 60

EUR DE PROD1 50

EUR DE PROD2 100

EUR DE PROD3 60

EUR UK PROD1 20

EUR UK PROD2 30

EUR UK PROD3 40

Model with a Dynamic Join

In the model below, the dataset shown above is joined using two aggregation nodes (both with the data source
DynamicSalesShare):

Dynamic Join
The two aggregation nodes are joined dynamically on the fields REGION and COUNTRY:

Total Sales
The aggregation node on the right (Aggregation_2) does not contain the Material field as one of its output
columns. Therefore, this node always returns the total sales of a given region or country:

Sales Share
The sales share is defined as a calculated column in the projection node:

Model with a Classical Join

You can build a similar model to the one above using a classical join on the fields REGION and COUNTRY.

Queries

The following SELECT statements illustrate the difference in the join logic of the two joins and the resulting
difference in the query result.

Queries at Country Level

A simple query on the models evaluates the sales share of the materials at country level.

Query on the model with the dynamic join:

SELECT "COUNTRY","MATERIAL",SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::TestDynamicJoin"
GROUP BY "COUNTRY","MATERIAL"
ORDER BY "COUNTRY","MATERIAL";

COUNTRY    MATERIAL    SUM(SALES)    SUM(TOTAL_SALES)    SUM(SALES_SHARE)
CHN        PROD1       10            20                  0.5
CHN        PROD2       10            20                  0.5
CHN        PROD3       0             20                  0
DE         PROD1       50            210                 0.23
DE         PROD2       100           210                 0.47
DE         PROD3       60            210                 0.28
IND        PROD1       20            130                 0.15
IND        PROD2       50            130                 0.38
IND        PROD3       60            130                 0.46
UK         PROD1       20            90                  0.22
UK         PROD2       30            90                  0.33
UK         PROD3       40            90                  0.44

Query on the model with the classical join:

SELECT "COUNTRY","MATERIAL",SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::TestStaticJoin"
GROUP BY "COUNTRY","MATERIAL"
ORDER BY "COUNTRY","MATERIAL";

COUNTRY    MATERIAL    SUM(SALES)    SUM(TOTAL_SALES)    SUM(SALES_SHARE)
CHN        PROD1       10            20                  0.5
CHN        PROD2       10            20                  0.5
CHN        PROD3       0             20                  0
DE         PROD1       50            210                 0.23
DE         PROD2       100           210                 0.47
DE         PROD3       60            210                 0.28
IND        PROD1       20            130                 0.15
IND        PROD2       50            130                 0.38
IND        PROD3       60            130                 0.46
UK         PROD1       20            90                  0.22
UK         PROD2       30            90                  0.33
UK         PROD3       40            90                  0.44

The result sets of both queries are identical irrespective of the join type (dynamic or classical).

Queries at Region Level


When the query is changed to evaluate the sales share at region level (which is basically a higher aggregation
level than country), the difference becomes apparent. The result sets showing the sales share of materials per
region are as follows.

Query on the model with the dynamic join:

SELECT "REGION","MATERIAL",SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::TestDynamicJoin"
GROUP BY "REGION","MATERIAL"
ORDER BY "REGION","MATERIAL";

REGION MATERIAL SUM(SALES) SUM(TOTAL_SALES) SUM(SALES_SHARE)

APJ PROD1 30 150 0.2

APJ PROD2 60 150 0.4

APJ PROD3 60 150 0.4

EUR PROD1 70 300 0.23

EUR PROD2 130 300 0.43

EUR PROD3 100 300 0.33

Query on the model with the classical join:

SELECT "REGION","MATERIAL",SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::TestStaticJoin"
GROUP BY "REGION","MATERIAL"
ORDER BY "REGION","MATERIAL";

REGION MATERIAL SUM(SALES) SUM(TOTAL_SALES) SUM(SALES_SHARE)

APJ PROD1 30 150 0.65

APJ PROD2 60 150 0.88

APJ PROD3 60 150 0.46

EUR PROD1 70 300 0.46

EUR PROD2 130 300 0.8

EUR PROD3 100 300 0.73

The dynamic join returns the sales share at region level by aggregating the sales value before joining the
datasets. The classical join model, however, first calculates the sales share at region plus country level
(because the join condition contains both region and country) and then aggregates the resulting sales share
after the join has been executed.
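
The region-level behavior of the dynamic join can be illustrated in plain SQL (a sketch of the aggregation order only, not the plan the calculation engine actually generates; it assumes direct access to the DynamicSalesShare table):

SELECT s."REGION", s."MATERIAL",
    SUM(s."SALES") AS "SALES",
    MAX(t."TOTAL_SALES") AS "TOTAL_SALES",
    SUM(s."SALES") / MAX(t."TOTAL_SALES") AS "SALES_SHARE"
FROM "DynamicSalesShare" s
JOIN (
    -- Total sales are aggregated to region level before the join:
    SELECT "REGION", SUM("SALES") AS "TOTAL_SALES"
    FROM "DynamicSalesShare"
    GROUP BY "REGION"
) t ON s."REGION" = t."REGION"
GROUP BY s."REGION", s."MATERIAL"
ORDER BY s."REGION", s."MATERIAL";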

7.4.2 Workaround for Queries Without Requested Join Attributes

A query on a calculation view with a dynamic join results in an error when none of the columns from the first
source of the join node that are involved in the join are requested.

Context

The example below shows a typical error:

Output Code

Could not execute 'SELECT SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM …'
Error: (dberror) [2048]: column store error: search table error: [34023]
Instantiation of calculation model failed;exception 306116:
At least one join attribute of the dynamic join should be requested on node
'Join_1'

You can handle situations where a model could be queried without any join-relevant columns in the drilldown
(in other words, in the GROUP BY clause) by enhancing the model with a DUMMY column, as described in the
example below.

Procedure

1. Add a calculated column (called DUMMY) to the aggregation nodes on both sides of the join. The column
contains a constant value:

2. Include the DUMMY columns in the join condition:

3. In the output projection node, set the Keep Flag option on the DUMMY column to ensure that the DUMMY
column is always contained in the drilldown even if it is not explicitly requested:

4. In the semantics node, set the DUMMY column to Hidden to avoid confusion when the calculation view is
used (the DUMMY column has been created purely for technical reasons):

5. Execute the same query that previously resulted in an error, as shown below:

SELECT SUM("SALES"),SUM("TOTAL_SALES"),SUM("SALES_SHARE")
FROM
"JOINCARDINALITY_1"."joinCardinalityExample.db.CVs::TestDynamicJoinNoAttribute
s";

SUM(SALES) SUM(TOTAL_SALES) SUM(SALES_SHARE)

450 450 1

7.5 Union Node Pruning

Pruning data in union view nodes helps to optimize query execution.

To specify the pruning conditions for data sources, you have the following options:

• Constant mapping
Define a constant with specific values for each data source, and select the appropriate value in your query.
Constant mapping has the advantage that it can result in slightly improved performance compared to
using a configuration table.
• Pruning configuration table
Use a table or view to specify data slices contained in the individual data sources. At query runtime, this
information is matched to the query filter. As a result, only the required data sources are processed.
Using a pruning configuration table offers greater flexibility, but can slightly impact performance compared
to union node pruning based on constant mapping.

Unfolding

Unfolding is normal query execution behavior where query execution is passed to the SQL engine or the
optimizer after the calculation engine instantiates the query. Unfolding is not always possible for complex
calculation views.

If unfolding happens, successful pruning based on configuration tables requires a query that requests
aggregations.

Generally, it is not recommended to disable unfolding. However, if the queries executed on the union view node
unfold and do not perform any aggregation, no data is pruned. In such cases, to allow union pruning, disable
the unfolding behavior using the hint NO_CALC_VIEW_UNFOLDING.
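
For example, a query could disable unfolding like this (a sketch; the view name is taken from the pruning configuration example below, and the hint syntax is described in Influencing Calculation View Behavior with Hints):

SELECT "SALESORDER", "PRODUCT", "YEAR", SUM("SALES") AS "SALES"
FROM "JOINCARDINALITYEXAMPLE_HDI_DB_1"."p1.p2::CV_PRUNING"
WHERE "YEAR" >= '2020'
GROUP BY "SALESORDER", "PRODUCT", "YEAR"
WITH HINT(NO_CALC_VIEW_UNFOLDING);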

Example

Assume that you are creating a sales report for a product across the years using a calculation view. The
calculation view consists of two data sources, one with the current sales data (for YEAR >= 2020) and
the other with the archived sales data (YEAR <= 2019). Both are provided as input to a union node in the
calculation view, as shown below:

In this scenario, you want to query the calculation view to get the results of CURRENT_SALES only. You
therefore want the query to be executed on the CURRENT_SALES data source only, and the operation on the
archived data source to be excluded (that is, pruned). To do so, you have the following options:

• You provide a pruning definition in a pruning configuration table.


• You create a constant that maps the data sources to specific constant values.

Note

If a union node is consumed by a rank node, query filters are by default not pushed down to it when
the query is executed. This applies to filter expressions and variables used in a WHERE clause. However,
the union node needs the filters to be able to apply pruning. Therefore, it only makes sense to use a
combination of rank node and union node with pruning if the Allow Filter Push Down option has been
activated for the rank node. If pruning is based on a configuration table and the view is unfolded (the typical
case), you should ensure that the front-end tool uses a GROUP BY clause in its query because otherwise
pruning will not occur.

Note

When calculation views are called from SQLScript, SQLScript variables can be passed as parameters or as
literal values. The SQL optimizer determines which option is chosen. For views that are unfolded, however,
union pruning only works with variables that are passed as values.

Therefore, if a variable is relevant for union pruning, it should be passed as a value. An example where a
variable would be relevant for union pruning is if you use union pruning to select data sources based on this
variable.

To override the decision made by the SQLScript optimizer and to ensure that a literal value is passed, you
should enclose the variable with "BIND_AS_VALUE". For example:

SELECT
"SalesOrderId",
"pruningConstant",
SUM("NetAmount") AS "NetAmount"
FROM "UnionPruning"
WHERE "pruningConstant"=BIND_AS_VALUE(:value)
GROUP BY "SalesOrderId", "pruningConstant";

For the same reason, you should avoid passing variables that are relevant for union pruning as bind
variables (for example, WHERE "pruningConstant"=?).

Related Information

Pruning Configuration Table [page 266]


Example with a Pruning Configuration Table [page 267]
Example with a Constant Mapping [page 269]
Example Tables [page 275]
SAP Note 2752070

7.5.1 Pruning Configuration Table

A pruning definition in a pruning configuration table has a specific format.

The format of a pruning configuration table is as follows:

Column Name      Description

CALC_SCENARIO    Fully qualified name of the calculation view to which union node pruning is to be
                 applied. For example, p1.p2::CV_PRUNING, where p1.p2 is the namespace and
                 CV_PRUNING is the name of the calculation view.
INPUT            Name of the data source in the union node of the calculation view.
COLUMN           Target column name.
OPTION           Operator (=, <, >, <=, >=, BETWEEN).
LOW_VALUE        Filter condition.
HIGH_VALUE       Leave empty if a high value is not needed.

To assign a pruning configuration table to a calculation view, select the Semantics node and choose View
Properties > Advanced. Note that if you have not already created a pruning configuration table, you can both
create and build it from here.

7.5.2 Example with a Pruning Configuration Table

In this example, you use a pruning configuration table to specify the filter conditions for pruning the union
node. A matching filter is then applied in the WHERE clause of the query so that only one of the data sources is
processed.

Context

The calculation view is based on the example shown earlier. The Union node is used to combine the two data
sources Current_Sales and Old_Sales. In the procedure below, you query the calculation view to get the current
sales results only.

Procedure

1. Create a pruning definition in a pruning configuration table. For example:

Table 14: UNION_PRUNING Table

CALC_SCENARIO        INPUT           COLUMN    OPTION    LOW_VALUE    HIGH_VALUE
p1.p2::CV_PRUNING    OldSales        YEAR      <=        2019         NULL
p1.p2::CV_PRUNING    CurrentSales    YEAR      >=        2020         NULL

Note

You can also create the table within step 3.
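
If you create the pruning configuration table manually, a minimal sketch could look like this (the table name and column lengths are assumptions; the rows correspond to the pruning definition above):

CREATE COLUMN TABLE "UNION_PRUNING" (
    "CALC_SCENARIO" NVARCHAR(256),
    "INPUT"         NVARCHAR(128),
    "COLUMN"        NVARCHAR(128),
    "OPTION"        NVARCHAR(10),
    "LOW_VALUE"     NVARCHAR(128),
    "HIGH_VALUE"    NVARCHAR(128)
);

INSERT INTO "UNION_PRUNING" VALUES ('p1.p2::CV_PRUNING', 'OldSales', 'YEAR', '<=', '2019', NULL);
INSERT INTO "UNION_PRUNING" VALUES ('p1.p2::CV_PRUNING', 'CurrentSales', 'YEAR', '>=', '2020', NULL);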

2. In the calculation view, select the Semantics node and choose View Properties > Advanced.
3. In the Pruning Configuration Table field, use the search to find and add the pruning configuration table
created above, or choose + to create the table. For example:

The calculation view now contains the pruning configuration table with the two data sources CurrentSales
and OldSales.
4. Build the calculation view.
5. Execute a query on the calculation view with a filter condition that is the same as the condition defined in
the pruning configuration table. For example:

SELECT
"SALESORDER",
"PRODUCT",
"YEAR",
SUM("SALES") AS "SALES"
FROM "JOINCARDINALITYEXAMPLE_HDI_DB_1"."p1.p2::CV_PRUNING"
WHERE "YEAR" >= '2020'
GROUP BY "SALESORDER", "PRODUCT", "YEAR";

6. From the Run dropdown menu, choose Analyze SQL.


7. On the Operators tab, filter by the Column Table operator.

For example:

Only the Current_Sales table is accessed. This confirms that the union node could be pruned because the
filter condition matches the one in the pruning configuration table.
8. Remove the pruning configuration table from the view properties of the calculation view and build it again.
9. Enter the same query as above and choose Analyze SQL from the Run dropdown menu.
10. On the Operators tab, filter by the Column Table operator.

For example:

The query now invokes both the archived sales data and current sales data although only the current sales
data is required. The presence of the Old_Sales table indicates that the union node was not pruned.

Results

As demonstrated in this example, union node pruning in calculation views allows the execution flow to be
determined dynamically based on the query. This in turn allows resource consumption to be reduced and
performance improved.

7.5.3 Example with a Constant Mapping

In this example, you use a constant to define the criteria for pruning the union node and then apply it in the
WHERE clause of the query.

Context

The calculation view is based on the example shown earlier. The Union node is used to combine the two
data sources Current_Sales and Old_Sales. In the procedure below, you use the debug feature to visualize the
pruning.

Procedure

1. Create a calculation view with a Union node as follows:

Note that the SALESYEAR column shown above is created in the next step.
2. Create a constant to differentiate between the data sources:
a. On the Mapping tab of the Union node, choose + (Create Constant) in the Output Columns toolbar.

b. Enter the required details and the constant values to be used for the two data sources.

3. Save and build the calculation view.


4. To start debugging, select the Semantics node and then choose the Debug button.
5. On the Debug Query tab, add the following WHERE clause to the default debug query:

WHERE "SALESYEAR" = 'CURRENT'

SELECT "SALESORDER",
"PRODUCT",
"YEAR",
SUM("SALES") AS "SALES"
FROM "JOINCARDINALITYEXAMPLE_HDI_DB_1"."p1.p2::UNION_PRUNING"
WHERE "SALESYEAR" = 'CURRENT'
GROUP BY "SALESORDER", "PRODUCT", "YEAR", "SALESYEAR";

6. Click Execute to run the query.

The Old_Sales data source is now grayed out, indicating that it is not being processed. For example:

The reason why it has been excluded is that the SALESYEAR filter does not match the constant value OLD
that you defined for it.
7. Rerun the debug query but replace CURRENT with OLD.

SELECT "SALESORDER",
"PRODUCT",
"YEAR",
SUM("SALES") AS "SALES"
FROM "JOINCARDINALITYEXAMPLE_HDI_DB_1"."p1.p2::UNION_PRUNING"
WHERE "SALESYEAR" = 'OLD'

SAP HANA Performance Guide for Developers


270 PUBLIC Optimization Features in Calculation Views
GROUP BY "SALESORDER", "PRODUCT", "YEAR", "SALESYEAR";

This time the Current_Sales data source is grayed out. For example:

Results

When a constant is added to the union node of a calculation view, queries that apply a filter on that constant
prune the applicable data sources in the Union node. This technique can be used to improve the runtimes of
complex models.

Note

Constant mapping only works when a direct filter is applied on the source. For an example of how to use
an indirect filter, see Example with a Constant Mapping and an Indirect Filter in the SAP HANA Performance
Guide for Developers.

Related Information

Example with a Constant Mapping and an Indirect Filter

7.5.4 Example with a Constant Mapping and an Indirect Filter

In this example, you use a constant to define the criteria for pruning the union node but instead of applying a
direct filter on the union node, you use a filter that is derived from a mapping table.

Context

Union pruning based on a constant mapping requires explicit filters on the column that is used as the pruning
constant. If the pruning value is determined indirectly, for example through a join with a mapping table, union
pruning will not occur. In this situation, union pruning can be achieved by using a function that does the
mapping and then fills the input parameter which is used to filter the constant pruning column.

In the example below, you create a mapping table called mapping, a function called
FillInputParameterUseInputParameter, and the input parameters IP_1 and filter. When you execute a
query on the calculation view, you specify a filter value for the filter column of the table mapping.

Based on the filter value, the FillInputParameterUseInputParameter function returns the corresponding
usedForPruningDecision value. The input parameter IP_1 is filled with this return value, which is then
used as a filter on the constant pruning column SALESYEAR.

Note

The IP_1 parameter can only be filled with one value (a string or numeric value). Therefore, this scenario
works best when only one single value is expected for each value of filter. The string would otherwise
need to be parsed into several filter values first.

Procedure

1. Create a calculation view with a Union node as follows:

Note that the SALESYEAR column shown above is created in the next step.
2. Create a constant to differentiate between the data sources:
a. On the Mapping tab of the Union node, choose + (Create Constant) in the Output Columns toolbar.
b. Enter the required details and the constant values to be used for the two data sources. For example:

3. Create a mapping table that maps the filter value to a value that is used for the pruning decision.
For example, the table mapping:

filter usedForPruningDecision

2020 CURRENT

2019 OLD

2018 OLD
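
A minimal definition of the mapping table could look like this (a sketch; the design-time name SourceData::mapping is taken from the function in the next step, and the column types are assumptions):

CREATE COLUMN TABLE "SourceData::mapping" (
    "filter" NVARCHAR(20),
    "usedForPruningDecision" NVARCHAR(10)
);

INSERT INTO "SourceData::mapping" VALUES ('2020', 'CURRENT');
INSERT INTO "SourceData::mapping" VALUES ('2019', 'OLD');
INSERT INTO "SourceData::mapping" VALUES ('2018', 'OLD');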

4. Create a function that performs the mapping based on the table above and the filter value entered.
For example:

FUNCTION "CalculationViews.db::FillInputParameterUseInputParameter"(IN filter


NVARCHAR(20) )
RETURNS result NVARCHAR(10)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
SELECT top 1 "usedForPruningDecision" into result from "SourceData::mapping"
where "filter"=:filter;
END;

5. On the Filter Expression tab, enter the filter.


For example:

"SALESYEAR"='$$IP_1$$'

You can see the filter defined here:

6. On the Parameters tab, add the input parameters IP_1 and filter. For example:

Name      Parameter Type                            Procedure/Scalar Function
IP_1      Derived from Procedure/Scalar Function    CalculationViews.db::FillInputParameterUseInputParameter
filter    Direct

The input parameter mapping for the function should look like this:

7. Save and build the calculation view.


8. Execute a query on the calculation view, specifying a filter value for the filter column of the table
mapping.
For example:

SELECT TOP 1000
"SALESORDER",
"PRODUCT",
"YEAR",
SUM("SALES") AS "SALES"
FROM "CalculationViews.db::UNION_PRUNING_CONSTANT"
(placeholder."$$filter$$"=>'2020')
GROUP BY "SALESORDER", "PRODUCT", "YEAR";

9. From the Analyze dropdown menu, choose Explain Plan.


The result shown confirms that only the Current_Sales table is accessed:

7.5.5 Example Tables

The following tables are used in the examples.

Table 15: Current_Sales


SALESORDER PRODUCT YEAR SALES

006 PROD1 2020 450

006 PROD2 2020 330

006 PROD3 2020 510

007 PROD3 2020 1020

007 PROD1 2020 1350

007 PROD2 2020 990

008 PROD1 2020 450

008 PROD2 2020 330

009 PROD1 2020 900

009 PROD3 2020 1020

Table 16: Old_Sales


SALESORDER PRODUCT YEAR SALES

001 PROD3 2018 510

001 PROD1 2018 450

001 PROD2 2018 330

002 PROD1 2019 900

002 PROD3 2019 1020

002 PROD2 2019 660

003 PROD3 2019 510

004 PROD1 2019 450

004 PROD3 2019 510

005 PROD2 2019 330
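
To reproduce the union pruning examples, the two tables could be created and filled as follows (a sketch; the column types are assumptions, and YEAR is kept as a string because the example queries compare it to string literals):

CREATE COLUMN TABLE "Current_Sales" (
    "SALESORDER" NVARCHAR(3),
    "PRODUCT"    NVARCHAR(10),
    "YEAR"       NVARCHAR(4),
    "SALES"      INTEGER
);
CREATE COLUMN TABLE "Old_Sales" LIKE "Current_Sales";

INSERT INTO "Current_Sales" VALUES ('006','PROD1','2020',450);
INSERT INTO "Current_Sales" VALUES ('006','PROD2','2020',330);
INSERT INTO "Current_Sales" VALUES ('006','PROD3','2020',510);
INSERT INTO "Current_Sales" VALUES ('007','PROD3','2020',1020);
INSERT INTO "Current_Sales" VALUES ('007','PROD1','2020',1350);
INSERT INTO "Current_Sales" VALUES ('007','PROD2','2020',990);
INSERT INTO "Current_Sales" VALUES ('008','PROD1','2020',450);
INSERT INTO "Current_Sales" VALUES ('008','PROD2','2020',330);
INSERT INTO "Current_Sales" VALUES ('009','PROD1','2020',900);
INSERT INTO "Current_Sales" VALUES ('009','PROD3','2020',1020);
INSERT INTO "Old_Sales" VALUES ('001','PROD3','2018',510);
INSERT INTO "Old_Sales" VALUES ('001','PROD1','2018',450);
INSERT INTO "Old_Sales" VALUES ('001','PROD2','2018',330);
INSERT INTO "Old_Sales" VALUES ('002','PROD1','2019',900);
INSERT INTO "Old_Sales" VALUES ('002','PROD3','2019',1020);
INSERT INTO "Old_Sales" VALUES ('002','PROD2','2019',660);
INSERT INTO "Old_Sales" VALUES ('003','PROD3','2019',510);
INSERT INTO "Old_Sales" VALUES ('004','PROD1','2019',450);
INSERT INTO "Old_Sales" VALUES ('004','PROD3','2019',510);
INSERT INTO "Old_Sales" VALUES ('005','PROD2','2019',330);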

7.6 Influencing the Degree of Parallelization

SAP HANA uses an involved algorithm to determine the degree of parallelization with which queries are
executed. However, there might be situations where you want to influence the degree of parallelization on top
of the general SAP HANA mechanism and control it explicitly for individual parts of your models based on
business considerations.

SAP HANA calculation views provide an option for doing this. Although this option might be helpful for
individual highly time-critical models, it should not be used in an uncontrolled way, because the overall
parallelization in SAP HANA could otherwise be negatively affected.

You can control parallel execution by modeling the number of logical threads to be used to execute
specific parts of the model. The term logical threads is used because parallelization is described from the
perspective of the model. Each logical thread might be further parallelized based on SAP HANA parallelization
optimizations. For example, an aggregation node can be parallelized into 6 logical threads based on the model
definition. Within each logical thread, additional parallelization might occur irrespective of the model settings.

A flag called Partition Local Execution indicates the start and stop nodes between which the degree of logical
parallelization is defined. The start flag needs to be set on a node that is based on a table. Typically, this
is a projection or an aggregation node. This node indicates the beginning of the parallelization. The end of
parallelization is denoted by a union node in which this flag is set again. The degree of parallelization is defined
at the node where the start flag is set.

Parallelization Based on Table Partitions

The degree of parallelization can be determined by the number of partitions in the table used as the data
source of the start node. Parallelization based on table partitions can be particularly helpful in scale-out
solutions where data needs to be transferred between different hosts. Imagine a table whose partitions are
distributed over several hosts. Without parallelization based on table partitioning, data from the individual
partitions would be sent over the network in order to be processed according to the model on one host.
With parallelization based on table partitioning, the logic that is modeled in the parallelization block would be
processed on the local host on which the table partitions are located. This local processing would not only
distribute the processing load but would probably also reduce the number of records that need to be sent over
the network. Therefore, it is a good idea to include an aggregation node at the end of your parallelization block
to reduce the amount of data before it is sent over the network.

Parallelization Based on Distinct Entries in a Column

Alternatively, the degree of parallelization can be determined by the number of distinct values in a column of
the table that serves as the data source of the start node.

Note

The Partition Local Execution flag gives you a high level of control over the degree of parallelization for
certain parts of your models based on your business logic and their relevance. Parallelization based on
table partitioning can be particularly helpful in scenarios in which tables are distributed across several
hosts. However, care must be taken when influencing parallelization through modeling to avoid over-
parallelizing all models at the cost of other processes.

Related Information

Example: Create the Table [page 277]


Example: Create the Model [page 278]
Example: Apply Parallelization Based on Table Partitions [page 280]
Example: Apply Parallelization Based on Distinct Entries in a Column [page 283]
Verifying the Degree of Parallelization [page 285]
Constraints [page 288]

7.6.1 Example: Create the Table

The following table is used in the parallelization examples.

Procedure

Create the table partitionedTable.hdbtable with the following table definition:

COLUMN TABLE "tables::partitionedTable" ("SalesOrderId" NVARCHAR(10) NOT NULL


COMMENT 'Sales Order ID',
"CreatedBy" NVARCHAR(10) NOT NULL COMMENT 'Created By',
"CreatedAt" DATE CS_DAYDATE NOT NULL COMMENT 'Created At - Date and Time',
"ChangedBy" NVARCHAR(10) COMMENT 'Last Changed By',
"ChangedAt" DATE CS_DAYDATE COMMENT 'Last Changed At - Date and Time',
"NoteId" NVARCHAR(10) COMMENT 'SO Note Text ID',
"PartnerId" NVARCHAR(10) COMMENT 'Partner ID',
"Currency" NVARCHAR(5) NOT NULL COMMENT 'Currency Code',
"GrossAmount" DECIMAL(15,2) CS_FIXED DEFAULT 0 NOT NULL COMMENT 'Total
Gross Amount',
"NetAmount" DECIMAL(15,2) CS_FIXED DEFAULT 0 NOT NULL COMMENT 'Total Net
Amount',
"TaxAmount" DECIMAL(15,2) CS_FIXED DEFAULT 0 NOT NULL COMMENT 'Total Tax
Amount',
"LifecycleStatus" NVARCHAR(1) COMMENT 'SO Lifecycle Status',
"BillingStatus" NVARCHAR(1) COMMENT 'Billing Status',
"DeliveryStatus" NVARCHAR(1) COMMENT 'Delivery Status',
PRIMARY KEY ("SalesOrderId")) UNLOAD PRIORITY 5 AUTO MERGE PARTITION BY
HASH ("SalesOrderId") PARTITIONS 6

7.6.2 Example: Create the Model

The following model is used in the parallelization examples.

Procedure

1. Create the calculation view.

The calculation view consists of a projection node (startParallelization) that feeds into an
aggregation node (runInParallel), which in turn feeds into a union node (stopParallelization).

For example:

2. To trigger the aggregation in the aggregation node, do not map all columns to the output. For example:

Aggregation is enforced because the output columns are a subset of the input columns.
3. Define a calculated column in the aggregation node. For example:

The calculated column in the aggregation node will be parallelized later.

7.6.3 Example: Apply Parallelization Based on Table Partitions

This example shows how you can control parallelization in your model using table partitions.

Procedure

1. Define the start node for parallelization:


a. Select the projection node startParallelization.
b. On the Mapping tab, select the data source.
c. Under Properties, select the Partition Local Execution option:

2. Define the stop node for parallelization:
a. Select the union node stopParallelization.
b. On the Mapping tab, select the data source.
c. Under Properties, select the Partition Local Execution option:

Results

Every node between the startParallelization and stopParallelization nodes will be executed in
parallel for each partition of the data source table of the startParallelization node. For the sake of
simplicity, only one node was inserted between the start and stop nodes of the parallelization block, but you
could have multiple nodes between the start and stop nodes.

7.6.4 Example: Apply Parallelization Based on Distinct Entries in a Column

This example shows how you can control parallelization in your model using distinct entries in a table column.

Procedure

1. Define the start node for parallelization:


a. Select the projection node startParallelization.
b. On the Mapping tab, select the data source.
c. Under Properties, select the Partition Local Execution option.
d. Enter CreatedBy in the Partition Column field:

2. Define the stop node for parallelization:


a. Select the union node stopParallelization.
b. On the Mapping tab, select the data source.

c. Under Properties, select the Partition Local Execution option:

Results

Every node between the startParallelization and stopParallelization nodes will be executed
in parallel for each distinct value in the CreatedBy column of the data source table of the
startParallelization node. For the sake of simplicity, only one node was inserted between the start and
stop nodes of the parallelization block, but you could have multiple nodes between the start and stop nodes.

7.6.5 Verifying the Degree of Parallelization

You can verify the degree of parallelization at runtime by adding a calculated column that contains the
logical partition ID, or by using the SQL analyzer, or by tracing query execution. All methods work both for
parallelization based on table partitions and parallelization based on distinct entries.

Adding a Calculated Column that Contains the Logical Partition ID

To see the ID of the logical thread that processed a certain record, you can add the column engine expression
partitionid() as a calculated column and display this column in the results.

For example, you can add a calculated column that evaluates the partition identifier as shown below:

Assuming that parallelization is based on a table that consists of 6 partitions, an integer between 1 and 6 is
shown for each record. The number corresponds to the logical thread that processed the record. This means
that records with the same partition ID were processed by the same thread.

For example:

The calculated column partitionIndicator, which is filled by the column engine expression
partitionid(), indicates the logical thread that processed the record.

Using the SQL Analyzer

The SQL analyzer can be used to check whether parallelization was applied. You need to run the SQL
analyzer for your statement and navigate to the nodes that should have been parallelized. For example, if
the parallelization is based on 6 partitions of a table, the nodes are shown 6 times.

In the following example, the SQL analyzer shows 6 parallel processing threads for the aggregation. The
aggregation node is split into 5 identical nodes plus an additional aggregation node where the results are
merged:

• The 6th node (aggregation):

• The 5 nodes:

By drilling down into the aggregation ceAggregationPop, you can find the calculated columns that were
parallelized. As the example below shows, the calculated column is evaluated in each parallel thread:

Tracing Query Execution

By tracing query execution, you can also check whether parallelization was applied.

You can trace an individual query by adding the following, for example, to the query:

WITH PARAMETERS ('PLACEHOLDER' = ('$$CE_SUPPORT$$', ''))

When you execute the query, a trace file with details about the calculation view execution is created and stored
as an index server trace file with a name that ends in _cesupport.trc.

When you open the trace file, you should see something like the following:

"Partitioned execution rule: 6 partitions found for table


'LD0::EXAMPLEPARALLELIZATION_HDI_DB:tables::partitionedTable (t -1)'.
The partitioned execution branch will be cloned."

7.6.6 Constraints

The constraints that apply when using the parallelization setting are listed below:

• The start of parallelization can only be defined in the nodes with the table data source (the lowermost
nodes).
• Parallelization is not supported across views.
• Only one parallelization block is allowed per query.
• Multiple start parallelization nodes are allowed only if a parallelization column is defined.

Also note that setting the parallelization flag will block unfolding (for background information about unfolding,
see SAP Note 2291812 and SAP Note 2223597 ).

7.7 Influencing Calculation View Behavior with Hints

You can apply hints to specific calculation views or SQL statements to influence the behavior of individual
query executions, for example as a workaround or for performance tuning.

You can apply hints at the statement level or at the calculation view level. When a hint is applied at the
calculation view level, it is relevant for all queries that involve this view.

Applying Hints at Statement Level

You can set a hint at the statement level by adding the relevant hint to the query.

In the following example, the NO_CALC_VIEW_UNFOLDING hint is applied to block calculation view unfolding:

SELECT * FROM CV1 WITH HINT(NO_CALC_VIEW_UNFOLDING);

For more information about the syntax, see HINT Details in the SAP HANA SQL Reference Guide for SAP HANA
Platform.

Applying Hints at Calculation View Level

You can set a hint at the calculation view level as follows:

1. Select the Semantics node of the calculation view and choose the View Properties > Advanced tab.
2. In the Execution Hints section, add the hint in either of the following ways:
• Choose Import Execution Hints to add the hint manually. For example, to block unfolding, enter the
following:
• Name: no_calc_view_unfolding
• Value: 1
• Choose Extract Semantics to extract the predefined hint from an underlying view.

You have the following options when entering hints at calculation view level:

Name: query_level_sql_hints
Value: Comma-separated list of SQL hints
Use Case: Can be used to apply SQL hints when unfolding is done. If the view is not unfolded (see Checking Whether a Query is Unfolded), it has no effect. The values CALC_VIEW_UNFOLDING and NO_CALC_VIEW_UNFOLDING cannot be used.

Name: qo_pop_hints
Value: Comma-separated list of SQL hints
Use Case: Can be used if unfolding is not possible and the calculation view property Execute In is set to SQL Engine. This hint only applies to the part of the plan which is optimized by the SQL engine.

Name: no_calc_view_unfolding
Value: 1
Use Case: Can be used to disable unfolding for the calculation view.

Name: sql_opt_hints
Value: CALC_VIEW_UNFOLDING
Use Case: Can be used when unfolding is globally disabled.

Note the following:

• Hint names are case sensitive and must be in lowercase letters.


• In a stacked scenario, the hints need to be applied to the top-level calculation view.

Monitoring View

The monitoring view M_CE_CALCSCENARIO_HINTS displays the hints used in a calculation scenario.
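For example, you can inspect the hints that are in effect by querying the monitoring view directly:

 SELECT * FROM M_CE_CALCSCENARIO_HINTS;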

Related Information

Checking Whether a Query is Unfolded [page 294]


HINT Details
SAP Note 3019087

7.8 Using "Execute in SQL Engine" in Calculation Views

The Execute in SQL Engine option allows you to override the default execution behavior in calculation views.

When calculation views are included in queries, the query is first optimized in the calculation engine. This has
a drawback when the query is included in a larger SQL query because two different optimization processes
are then involved, one for the SQL query itself and one for the calculation view. This can lead to inefficiencies
between the different optimization processes.

To avoid these inefficiencies, a global optimization is applied automatically. This means that after certain
calculation engine-specific optimizations have been applied, the resulting plan is translated into an SQL
representation, referred to as a "QO" (query optimization), which allows the SQL optimizer to work on the
whole query. The translation into an SQL representation is called "unfolding". For more information about
unfolding, see SAP Notes 2618790 and 2223597, for example.

However, some calculation view features cannot be readily translated into an SQL optimization due to their
non-relational behavior. As a result, the unfolding process is blocked for the whole calculation view. In such

situations, you can direct the calculation engine to try to unfold parts of the view even though the view itself
cannot be fully unfolded. You can do this by setting the Execute in SQL Engine option. For example:

The most prominent features that block unfolding are queries that include non-SQL hierarchy views and
anonymization nodes.

Currently, some MDS queries (for example, queries from SAP Analytics Cloud) cannot be fully unfolded. It is
therefore often beneficial to set the Execute In option to SQL Engine in the topmost calculation views that are
queried by SAP Analytics Cloud or other front-end tools that use MDS.

Related Information

Impact of the "Execute in SQL Engine" Option [page 292]


Checking Whether a Query is Unfolded [page 294]
Influencing Whether a Query is Unfolded [page 294]
Understand Why a Query Is Not Unfolded [page 296]
SAP Note 2618790
SAP Note 2223597
SAP Note 1857202
SAP Note 2441054

7.8.1 Impact of the "Execute in SQL Engine" Option

The mechanism of the Execute in SQL Engine option is illustrated using a calculation view in which a feature
(k-anonymity) prevents unfolding.

The following calculation view is used in this example. It includes an anonymization node as well as a join with a
table:

Due to the anonymization node, the calculation view cannot be fully unfolded. If Execute in SQL Engine is not
set, unfolding is blocked and the entire optimization occurs in the calculation engine. If you set the Execute in
SQL Engine option, all parts of the plan that can be translated into an SQL representation are unfolded. This is
effective for the view on which the flag is set as well as its included views. As a result, unfolding is not blocked
for the entire view but only for parts of the plan.

In this example, an SQL translation of a join and data retrieval is created, but the anonymization itself is not
unfolded. You can confirm this by analyzing the SQL plan.

The plan on the left shows the execution when the Execute in SQL Engine option is set. The plan on the right shows the execution when the Execute In option is left empty. The ceQoPop operators on the left are the SQL representation of the parts of the calculation engine plan that could be translated even though the calculation view as a whole could not be unfolded. In contrast, there are no ceQoPop operators on the right, and optimization is done in the calculation engine only:

On the left, the first ceQoPop operator (bottom left) retrieves the data for the anonymization node and the second ceQoPop operator executes the join. On the right, the table is read with a calculation engine operator (bottom right) and the join is executed using calculation engine operators. This confirms that by setting the Execute in SQL Engine option, you can enforce partial unfolding even if global unfolding of the view is blocked.

7.8.2 Checking Whether a Query is Unfolded

You can check whether a query is unfolded by using the Explain Plan functionality. If all tables that are used in
the query appear in the plan, unfolding has taken place successfully.

The example below shows the Explain Plan of a query that has been executed on the calculation view above.
The k-anonymity node was replaced with a projection node. Unfolding therefore takes place by default and all
tables involved are shown in the plan:

Since the calculation view is unfolded by default, the hint NO_CALC_VIEW_UNFOLDING is used in the example
below to block unfolding:

A column view is now shown, but no tables can be seen at the lowest level.
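A minimal sketch of this check from the SQL console, assuming a calculation view named CV1 as in the earlier hint example:

 -- Generate the plan for a query on the view
 EXPLAIN PLAN SET STATEMENT_NAME = 'unfold_check' FOR
 SELECT * FROM CV1;

 -- If the underlying base tables appear among the operators, the view was
 -- unfolded; if only a column view appears, unfolding was blocked
 SELECT OPERATOR_NAME, SCHEMA_NAME, TABLE_NAME
 FROM EXPLAIN_PLAN_TABLE
 WHERE STATEMENT_NAME = 'unfold_check';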

7.8.3 Influencing Whether a Query is Unfolded

Unfolding is applied by default and you should avoid blocking it. The long-term goal is to be able to unfold all
queries, so future optimizations are highly focused on unfolded queries.

However, several options are available for blocking unfolding when needed (see SAP Note 2441054):

• Attach a hint to an individual statement: WITH HINT (NO_CALC_VIEW_UNFOLDING)


• Pin a hint to every execution of a statement (see SAP Note 2400006):

 ALTER SYSTEM ADD STATEMENT HINT (NO_CALC_VIEW_UNFOLDING) FOR <sql_statement>
 ALTER SYSTEM PIN SQL PLAN CACHE ENTRY <plan_id> WITH HINT (NO_CALC_VIEW_UNFOLDING)

• Add the hint to a specific view using an execution hint in a calculation view:

Name Value

no_calc_view_unfolding 1

• Block unfolding globally by setting the configuration parameter [calcengine] no_calc_view_unfolding to 1 in the indexserver.ini file.

Related Information

SAP Note 2441054


SAP Note 2400006

7.8.4 Understand Why a Query Is Not Unfolded

Information available about query optimization (QO) blockers allows you to see why a query could not be
unfolded and can help you decide whether to try to remove the blocker.

Context

Most QO blockers can be identified using the QO blocker feature in the SQL analyzer. An alternative way is to
use the calculation engine trace.

 Note

QO blockers are only recorded if the Execute in SQL Engine option is set for the calculation view. QO
blockers are not traced if it is not set (the default setting).

SQL Analyzer

Procedure

1. In the SAP HANA cockpit, choose Browse Database Objects in the Database Administration section.
The SAP HANA database explorer opens.
2. In the SAP HANA database explorer, open the SQL console and enter the SQL query to be analyzed.

3. Choose Analyze > Analyze SQL.


The SQL analyzer opens.
4. In the SQL analyzer, choose Plan Graph at the top and drill down to the calculation scenario.
5. Click the operator to display its details.
For example, operator details with QO blocker information:

An alternative way to retrieve this information is to use the calculation engine trace.

Calculation Engine Trace

You can activate a tracing profile for the calculation engine for individual queries from the SQL console.

Procedure

1. In the SQL console, add the following to the query before executing it:

SELECT [...] WITH PARAMETERS ('PLACEHOLDER' = ('$$CE_SUPPORT$$', 'i_ceqo'))

 Sample Code

SELECT a, cc_1, sum(i)
FROM "TEST"."BlockedView"
GROUP BY a, cc_1
WITH PARAMETERS ('PLACEHOLDER' = ('$$CE_SUPPORT$$', 'i_ceqo'));

• The placeholder parameter $$CE_SUPPORT$$ enables support mode processing.


• The value pair i_ceqo specifies that the ceqo trace should be enabled with the trace level info. The
ceqo trace records QO blocker information.
2. Open the generated cesupport.trc trace file.
You can access the trace file from the SAP HANA cockpit or directly on operating system level:

Option Description

SAP HANA cockpit:
1. In the Alerting and Diagnostics tile, choose View trace and diagnostic files. The SAP HANA database explorer opens.
2. In the directory structure under Database Diagnostic Files, expand the indexserver folder. The trace files are named as follows: indexserver_<host>.<port>_<statement_id>_cesupport.trc

Operating system level:
/usr/sap/<sid>/HDB<inst>/<host>/trace/indexserver_<host>.<port>_<statement_id>_cesupport.trc

3. In the trace file, check for QO unfolding/QO conversion information. For example:

 i ceQo ceNodeIsQOCompliantPattern.cpp(00131) : QO unfolding/QO conversion is not possible.
 Tag: EXPRESSION_SQL_CONVERSION_FAILED
 Reason: RuntimeNode finalNode cannot be converted to QO because an expression cannot be converted to SQL:
 not supported function: 'string(abap_lower("A")[here])'

Related Information

SAP Note 2594739

7.9 Push Down Filters in Rank Nodes

Filters used in a query are by default not pushed down by rank nodes. This applies to filter expressions and
variables used in a WHERE clause. In these cases, a rank node accesses all the data of the nodes below it. You
can override this behavior by setting the Allow Filter Push Down option.

Context

Set the Allow Filter Push Down option only if it is semantically correct for the particular use case.

Procedure

1. In the calculation view, select the rank node.


2. Choose the Mapping tab.
3. In the Properties section, select the Allow Filter Push Down option.
4. Save and build the calculation view.

Results

When the query is executed, the filter is pushed down to the nodes below the rank node.

7.10 Creating Snapshots of Queries on Calculation Views

Snapshots can be used to speed up queries if the queries do not need to be based on the most current data.

You define a "snapshot query" directly in the model. During deployment, procedures are generated for creating,
inserting data into, truncating, and dropping a "snapshot table" based on the defined snapshot query. The
procedures and snapshot table are named according to the following naming conventions:

• Procedures: <calculation_view_name>/<query_name>/SNAP/SNAPSHOT/CREATE|DROP|TRUNCATE|INSERT
• Snapshot table: <calculation_view_name>/<query_name>/SNAP/SNAPSHOT
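For example, for a calculation view snapshotCV with a snapshot query Query_1 (the names used in the example later in this section), filling and reading the snapshot table might look like the following sketch, assuming the snapshot query has no mandatory input parameters:

 -- Fill the snapshot table using the generated INSERT procedure
 CALL "snapshotCV/Query_1/SNAP/SNAPSHOT/INSERT"();

 -- Read the snapshot data directly from the generated table
 SELECT * FROM "snapshotCV/Query_1/SNAP/SNAPSHOT";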

You can create snapshots on the Semantics node (View Properties > Snapshots) of the calculation view.

The life cycle of the procedures and tables is coupled to the respective calculation view in which the snapshot
query is defined. Thus, deleting (that is, undeploying) the calculation view will also delete the corresponding
tables and procedures.

Authorizations

By default, the invoker mode is used for the procedures, which means that the authorizations of the caller are
applied. Therefore, snapshots contain the data that is visible to the caller of the INSERT procedure. Several
different snapshots can be generated based on an individual calculation view, and also by different database
users. In this way, data reflecting different privileges can be exposed.

To run SELECT statements on snapshot tables, you must have SELECT privileges for the respective snapshot
tables.

Large Inserts

For large data sets, that is, when particularly large queries lead to a very large number of records, it can be
beneficial to schedule multiple inserts and to restrict each insert to a non-overlapping set of records based on
an input parameter. For example, you could use an input parameter, named IP_year in the example below, to
load individual years into the snapshot table:

 Sample Code

CALL "snapshotCV/Query_1/SNAP/SNAPSHOT/INSERT"(IP_YEAR => '2022')

If you want to avoid being prompted for the "technical" input parameter when the calculation view is not being
used to fill the snapshot, you can do so by adding a default value to the input parameter in the calculation view
and then checking against this default value in a filter expression.

For example, if the input parameter IP_year has the default value 'no influence', you could use the
following filter expression:

'$$IP_year$$'='no influence' or "year"='$$IP_year$$'

This would filter on the column "year" if IP_year differs from its default value. Otherwise, the filter would be
ignored.

Generating an Interface Calculation View

You also have the option of generating a calculation view based on both the original view and the snapshot
table. The generated calculation view acts like an interface view. It makes use of union pruning so that when
a query is run on the generated calculation view, either the original calculation view or the snapshot table
is accessed depending on the value specified for the input parameter I_SOURCE. The default value of the
parameter is SNAPSHOT, so the snapshot table is accessed by default. It can be set to BASE (base view) to allow
the original calculation view to be accessed instead.

For example, the following statement accesses the original calculation view:

 Sample Code

SELECT TOP 1000
 "year",
 "country",
 "product",
 SUM("amount") AS "amount"
FROM "I_Query_1" (placeholder."$$I_SOURCE$$"=>'BASE')
GROUP BY "year", "country", "product"

The snapshot tables can still be accessed directly if the interface view is not needed.

Considerations

The data in snapshot tables should be used as is, and the results read from a snapshot table should be processed with caution. If
you decide to place additional logic on top of a snapshot table, keep in mind that the data in the snapshot table
reflects the following:

• Status at the time the data was inserted: The data might be outdated.
• Visibility to the user who executed the insert: The privileges of the user who runs a SELECT statement on
the snapshot table might be different.
• Aggregation granularity of the snapshot query, that is, measures used in the snapshot query are calculated
at the granularity level of the snapshot query: Additional aggregations might lead to surprising data if the
granularity of the snapshot table is not considered.
If you want to mimic the aggregation granularity behavior of the snapshot query in the original calculation
view, you can do so by setting the Keep Flag option for every attribute that is included in the snapshot
query. Attributes that are used in calculated measures in the snapshot query also need the keep flag. For
more information, see Example: Using Keep Flag.

Monitoring Snapshots

You might want to find out how frequently snapshots are accessed and the approximate size of the snapshots.

You can execute the following statement, for example, to get the number of SELECT statements executed on all
the snapshot tables whose metadata is visible to you (that is, the user who runs the query):

select SELECT_COUNT, * from SYS.M_TABLE_STATISTICS
where TABLE_NAME like '%SNAPSHOT'

You can get an estimation of the size of a snapshot by running the following statement:

select MEMORY_SIZE_IN_TOTAL, * from M_CS_TABLES
where TABLE_NAME like '%SNAPSHOT'

For the M_TABLE_STATISTICS monitoring view, the following parameters must be enabled in the sql section of
the indexserver.ini file:

• table_statistics_modify_enabled
• table_statistics_select_enabled
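As a sketch, both parameters could be enabled as follows (assuming the boolean values are set as the strings 'true', as is usual for INI parameters):

 ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
 SET ('sql', 'table_statistics_modify_enabled') = 'true',
 ('sql', 'table_statistics_select_enabled') = 'true' WITH RECONFIGURE;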

Related Information

Create Snapshots
Example: Using Keep Flag
M_TABLE_STATISTICS System View
M_CS_TABLES System View

7.11 Static Cache

The static cache allows the query results of calculation views to be cached in memory. This avoids having to
re-evaluate the view each time the view is accessed. By using cached results for frequent queries, you may be
able to reduce memory and CPU consumption and increase performance.

Because the cache is refreshed only periodically, the cached data might be stale if the tables used by the
view were updated since the last refresh. However, the data retention period that determines for how long
the cached data is reused before a refresh occurs is configurable, allowing you to determine the maximum
acceptable age of the data snapshot.

To keep the cache size to a minimum, you can select a subset of the columns to be cached (by default all
columns are cached). In addition, you can further reduce the cache size by adding frequently used filters.

When implementing cached views, you must test them carefully to validate that the queries use the cached
data and that the correct results are returned. You should apply caching only on processed and reduced data
(topmost view). It should not be applied on raw data.

Related Information

Prerequisites for Using the Static Cache [page 302]


Enable the Static Cache [page 303]
Apply a Filter [page 305]
Instruct Queries to Use the Static Cache [page 306]
Combine Data from Cached and Non-Cached Objects [page 306]
Use Analytic Privileges [page 307]
Force the Static Cache [page 308]
Static Cache Monitoring [page 309]

7.11.1 Prerequisites for Using the Static Cache

Queries can only use the static cache if certain conditions are met.

These are as follows:

• The Enable Cache option has been selected for the calculation view. See Enable the Static Cache [page
303].
• The calculation view can be unfolded. See Checking Whether a Query is Unfolded [page 294].
• The calculation view does not have any granularity-tracking calculations, which prevent the use of the
cache (to avoid unexpected results). For information about granularity-tracking calculations, see SAP Note
2618790 .
• The query has been directed to use the static cache. See Instruct Queries to Use the Static Cache [page
306].

If a query cannot use the cache for any reason, it will not fail but will be processed fully in the normal manner.

7.11.2 Enable the Static Cache

You must explicitly enable the static cache for a calculation view. You can also select the columns to be cached
and specify a retention period. By selecting individual columns, you can reduce the size of the cache.

Procedure

1. Select the Semantics node of the calculation view.


2. On the View Properties tab, choose the Static Cache tab.
3. Set the Enable Cache option.

When the static cache is enabled, the RESULT_CACHE hint is automatically added to the view (View Properties > Advanced tab):

4. Specify the retention period in minutes for the cached results.

The retention period determines for how long the cached data is used to serve a query. When the end
of the retention period is reached, the cache is refreshed the next time it is accessed by a query. Longer
retention periods mean that there are fewer cache refreshes and therefore the data does not have to be
reprocessed as often.
5. Select the columns to be cached.

To reduce memory consumption and to allow the cache to be refreshed more quickly, keep the number of
cached columns to a minimum. If you do not select any columns, all columns are cached by default.

For example:

 Note

Queries can use a subset of the cached columns.

Example

You have cached the columns COLUMN1, COLUMN2, and COLUMN4. You run the following queries:

Query Result

SELECT COLUMN1, COLUMN2, COLUMN4 FROM … Accesses the static cache

SELECT COLUMN1, COLUMN3 FROM … Doesn't access the static cache

SELECT COLUMN1, COLUMN4 FROM … Accesses the static cache

7.11.3 Apply a Filter

A cache filter allows a subset of records to be cached. Queries can only use the cache if their filter matches the
filter that is defined for the cache or reduces the data further.

Context

If you know that the queries you use frequently contain a filter, include this filter in the cache definition to
reduce memory consumption and the time required to refresh the cache.

You should make the filter as specific as possible while still ensuring that it can be used by most of the queries.

Procedure

1. Select the Semantics node of the calculation view.


2. On the View Properties tab, choose the Static Cache tab.
3. In the Filters section, choose + to add a row.
4. Enter the operator and value as used in the query filter.
If you add multiple filters, they will be combined using the AND operator.

Example

You have set the following filter on COLUMN3:

Column Operator Value

COLUMN3 = X

You run queries with the following filters:

Query Filter Result

WHERE COLUMN3='X' The query requests cached data, so the static cache is used.

WHERE COLUMN3='X' AND COLUMN2='Y' The query requests cached data, so the static cache is used.
The cached data can be further reduced by the additional
condition COLUMN2='Y'.

WHERE COLUMN3='Y' The query requests data that is not cached, so the static
cache cannot be used.

WHERE COLUMN3='X' OR COLUMN2='Y' The query requests data that is not cached
(COLUMN2='Y'), so the static cache cannot be used.

7.11.4 Instruct Queries to Use the Static Cache

To make use of a view's static cache setting, queries must be explicitly directed to use the static cache.

Context

You can direct queries to use the static cache by applying the RESULT_CACHE hint. There are various ways to
apply the hint.

Procedure

Set the RESULT_CACHE hint using one of the following options depending on the desired scope:

• Set the hint for a specific calculation view


Set the RESULT_CACHE hint in the topmost calculation view:
1. Select the Semantics node and choose the View Properties > Advanced tab.
2. In the Execution Hints section, do either of the following:
• Choose Import Execution Hints to add the RESULT_CACHE hint manually.
• Choose Extract Semantics to extract the RESULT_CACHE hint from the underlying view, if it has
already been defined for that view.
• Set the hint for a specific query
Add the RESULT_CACHE hint to the query as follows:

SELECT … WITH HINT(RESULT_CACHE);

• Set the hint to apply globally


Enable the result cache globally by setting the parameter enabled in the [result_cache] section of the indexserver.ini file.
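A sketch of this setting using SQL (the value 'true' for enabling the boolean parameter is an assumption):

 ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
 SET ('result_cache', 'enabled') = 'true' WITH RECONFIGURE;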

7.11.5 Combine Data from Cached and Non-Cached Objects

You can create joins and unions between cached and non-cached objects. However, mixing cached and non-
cached data can lead to a combination of data from different points in time. For this reason, you need to
explicitly request this type of combination by using the relevant hint.

Context

To combine data from cached and non-cached objects, you must set the hint RESULT_CACHE_NON_TRANSACTIONAL in the topmost calculation view. The hint RESULT_CACHE_NON_TRANSACTIONAL allows join and union operations to be performed using the data contained in the static cache. Any potential inconsistencies in data currentness are ignored.

Procedure

1. Select the Semantics node of the topmost calculation view.


Note that the static cache is not enabled in the topmost view.

2. Choose the View Properties > Advanced tab.


3. In the Execution Hints section, choose Import Execution Hints and select the
RESULT_CACHE_NON_TRANSACTIONAL hint.
The hint is added as shown below:

7.11.6 Use Analytic Privileges

The static cache can only be used for calculation views that do not have underlying views with analytic
privileges. Therefore, when using the static cache, you should define analytic privileges on the topmost view
only. If they are used on underlying views, you need to apply a hint.

Context

If analytic privileges have been defined on underlying views, you can use the hint
RESULT_CACHE_BEFORE_ANALYTIC_PRIVILEGE so that the cache is built before the authorization
restrictions are applied. This ensures that the relevant data is cached in the normal manner and not per user
based on a user-specific analytic privilege. The analytic privilege is applied when the cached data is queried so
that users can only select the cached data they are authorized to access.

Procedure

1. Select the Semantics node of the topmost calculation view.

2. Choose the View Properties > Advanced tab.


3. In the Execution Hints section, choose Import Execution Hints and select the
RESULT_CACHE_BEFORE_ANALYTIC_PRIVILEGE hint.
The hint is added as shown below:

Related Information

Static Result Cache


Creating and Using the Static Result Cache

7.11.7 Force the Static Cache

Sometimes a calculation view cannot be deployed with the static cache setting. This might be because it
cannot be fully unfolded or because it uses granularity-tracking calculations. In these cases, you can use the
Force option to deploy it.

Prerequisites

You have enabled the static cache. See Enable the Static Cache [page 303].

Context

If the calculation view uses granularity-tracking calculations and the level of granularity differs between the
cached columns and the query, there is a risk that the data will be inconsistent (the calculation at the cached
granularity level does not match the calculation at the granularity level requested by the query).

You should test your queries extensively if their granularity changes, that is, if the set of columns they reference changes. You can enforce the granularity level by using the Keep Flag option. However, it is better to avoid the Force option altogether. If you do use it, make sure that you understand the impact of the granularity-tracking calculations.

Procedure

1. Select the Semantics node of the calculation view.


2. On the View Properties tab, choose the Static Cache tab.
3. Select the Force option.

7.11.8 Static Cache Monitoring

There are dedicated system views for monitoring the use of the static cache. You can also use the explain plan
to check whether a query is served by the static cache.

System Views

The system views that provide information about the static cache are listed below together with some of their
main columns:

M_RESULT_CACHE:
• MEMORY_SIZE: The memory size of the cached entry. Allows you to check that the data is not too large to be effectively cached.
• LAST_REFRESH_TIME: The time when the cached entry was last refreshed. Allows you to see how old the data in the cache is.
• LAST_ACCESS_TIME: The last time the cache entry was accessed. Allows you to verify that the query used the cache.
• ACCESS_COUNT: The number of times the cache entry was accessed. Allows you to see how important the cached view is and whether queries are accessing the cache.

RESULT_CACHE:
• CACHE_TYPE: The type of cache that is enabled: STATIC, FULL; STATIC, PARTIAL (SELECTED COLUMNS); STATIC, PARTIAL (FILTERED); STATIC, PARTIAL (SELECTED COLUMNS, FILTERED)
• CACHE_RETENTION: The cache refresh interval in minutes, indicating the maximum age of the data returned from the cache.
• CACHE_FILTER: The filter condition specified so that only a subset of the records is cached.

RESULT_CACHE_COLUMNS:
• COLUMN_NAME: The name of the cached column. Allows you to see which columns are cached.
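For example, a quick way to check the size, age, and usage of the cached entries using the columns listed above:

 SELECT MEMORY_SIZE, LAST_REFRESH_TIME, LAST_ACCESS_TIME, ACCESS_COUNT
 FROM M_RESULT_CACHE;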

Explain Plan

The explain plan (context menu in the SQL Console) allows you to see whether a query will try to use the static
cache. The OPERATOR_NAME value should be RESULT_CACHE.

For example:

Related Information

M_RESULT_CACHE System View


RESULT_CACHE System View
RESULT_CACHE_COLUMNS System View

7.12 Reduce Retrieval Times for Value Helps

If a value help takes too long to be displayed, you should explicitly specify a smaller view or table to be used as
the source of the value help. This should make it significantly quicker for an analytics tool to retrieve the values
of the value help.

Context

This problem can occur when the attributes to be accessed are contained in a large view, requiring the whole
view to be executed even though you might only be interested in the values contained in one of the columns.
For example, you might want to provide a selection for a filter (an airline wants to list all airports for selection),
which would be the same as a SELECT DISTINCT statement on the attributes in that column.

Procedure

1. On the Columns tab of the Semantics node, choose Assign Value Help in the Assign Semantics dropdown
menu. For example:

2. In the Assign Value Help dialog, specify the view or table and column as follows:

• In the View/Table Value Help field, browse to select the view or table to be used in the value help.
• In the Reference Column field, open the search help to select the column to be used.

Results

The values for the value help for the selected column can be read from the view or table and column specified
above. An analytics tool can retrieve the information about the newly specified source as follows, for example:

SELECT
 dimension_column_name "value help column",
 columnobject_name "value help view"
FROM _SYS_BI.BIMC_DIMENSION_VIEW
WHERE cube_name='<name_of_cube>'
AND column_name='<name_of_column>'

It can then issue a SELECT DISTINCT statement on the "value help column" of the newly specified source. It is
the responsibility of the analytics tool to make use of this information.
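For example, the follow-up statement issued by the tool might look like this sketch (the names are placeholders for the values returned by the query above):

 SELECT DISTINCT "<value help column>" FROM "<value help view>";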

Related Information

View BIMC_DIMENSION_VIEW

7.13 Analytical Model Health Check

You can use the CHECK_ANALYTICAL_MODEL procedure to check the health of analytical models.

The CHECK_ANALYTICAL_MODEL procedure currently supports two checks:

• A check to determine whether the calculation view result can be cached from an MDS perspective.
This is particularly useful in the SAP HANA multi-dimensional services (MDS) or SAP Analytics Cloud
(SAC) context to find out whether the MDS cube cache can be used. The information about whether the
calculation view can be cached from an MDS perspective is specific to the user that queries the view.
Therefore, the same user should be used to run this procedure.
• Various checks relevant to join nodes, such as whether cardinalities have been defined or whether dynamic
join attributes are used.

The CHECK_ANALYTICAL_MODEL procedure has the following syntax:

CHECK_ANALYTICAL_MODEL ('<action>', '<schema_name>', '<object_name>', <return_object>)

• action (VARCHAR(64)): Action to be performed. This is currently always empty.
• schema_name (NVARCHAR(256)): Name of the schema containing the calculation view.
• object_name (NVARCHAR(256)): Name of the calculation view.

The following columns are returned: ENTRY_ID, PARENT_ENTRY_IDS, LEVELS, SCHEMA_NAME, OBJECT_NAME, OBJECT_TYPE, MESSAGE_TYPE, MESSAGE_TEXT.

The following messages are displayed, depending on the result:

Each entry below lists the MESSAGE_TEXT, the MESSAGE_TYPE in parentheses, and where to find more information:

• Result is cacheable (INFO): see Model Cacheability [page 322]
• Join node "Join_<number>": Check if inner join can be changed to referential join (INFO): see Setting Join Cardinality [page 220]
• Join node "Join_<number>": Check if referential integrity can be set to "Both" (INFO): see Setting Join Cardinality [page 220]
• Join node "Join_<number>": Check if setting "Dynamic Join" can be used (INFO): see Using Dynamic Joins [page 254]
• Join node "Join_<number>": Check if setting "Optimize Join Columns" can be used (INFO): see Optimizing Join Columns [page 242]
• Result is not cacheable because the analytic privilege is not deterministic (WARNING): see Cacheability of SQL Analytic Privileges [page 322]
• Result is cacheable per user because the analytic privilege is session user deterministic (INFO)

The information messages provide generic suggestions on how the model performance could be improved by
additional settings in the respective views. You should consider the semantic impact these settings could have
when deciding whether to apply the suggested changes.

Example

The check is called on an example calculation view as follows:

CALL CHECK_ANALYTICAL_MODEL('', 'TEST', 'IncludingView', ?);

The result shows that the calculation view result is not cacheable because the analytic privilege is not deterministic. It also shows that the join node Join_1_1 does not have a cardinality set and that join node Join_1_2 is an inner join. None of the join nodes have optimized join columns. You are advised to check whether a cardinality could be set for the join node Join_1_1 and whether the join node Join_1_2 could be changed to a referential join. For all three joins, you are advised to check whether the Optimize Join Columns option could be set:

7.14 Condensed Performance Suggestions

The following list summarizes the main performance recommendations.

• Join cardinality and key fields: It is highly recommended to set a join cardinality because this makes the calculation view easier to optimize.
• Optimized join columns: It is highly recommended to set the Optimize Join Columns option where appropriate because it allows unnecessary columns to be pruned.
• Join columns and partitioning criteria in scale-out scenarios: It is recommended to use matching partitions for tables that are joined. The join columns should be used as the keys for partitioning tables. If the join columns are used for partitioning on both sides, the first-level partition criteria must be identical on both sides (particularly for round robin and hash partitioning). This allows efficient parallel processing.
• Join cardinality and referential integrity: It is recommended to provide as much information as possible about the calculation model to support optimization. When providing join cardinality information, check whether you can also provide information about referential integrity.
• Calculated attributes in join conditions: Calculated attributes should be avoided in join conditions because they might prevent certain internal optimizations.
• Calculated attributes in filter expressions: Calculated attributes should be avoided in filter expressions because they can prevent certain internal optimizations related to filter pushdown.
• Measures in join conditions: Measures should not be used in join conditions because of potentially high cardinalities that lead to longer runtimes.
• Measures in filter expressions: Measures should be avoided in filter expressions because they might prevent internal optimizations.
• Data type conversion in filters: Data type conversion should be avoided in filters. Ideally, it should not be necessary to convert data types. However, because a filter must be defined with the correct format, it might be necessary to convert the data types involved, for example, for comparison purposes. In these cases, it is better to cast individual values from one data type to another within the filter expression, rather than casting an entire column. For example, if column1 is compared to value1, it is preferable to convert value1 rather than column1 (see the sketch after this list).
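The following sketch illustrates the data type conversion recommendation in plain SQL (the table and column names are placeholders):

 -- Preferable: convert the literal value once
 SELECT * FROM "myTable" WHERE "dateColumn" = CAST('2023-04-04' AS DATE);

 -- Avoid: casting the entire column can block filter optimizations
 SELECT * FROM "myTable" WHERE CAST("dateColumn" AS VARCHAR(10)) = '2023-04-04';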

7.15 Avoid Long Build Times for Calculation Views

When a calculation view takes a long time to build, it might be because it has many dependent hierarchy views.

Context

When you build a calculation view, all views based on the calculation view also need to be built. This can lead to
very long build times if there are many dependent views.

In general, you should try to reduce the number of views that are based on the calculation view you want to
build. You should also make sure that all views placed on top of the calculation view are valid views.

If the long build time is caused by the creation of technical hierarchies defined on the calculation view, it
is recommended that you check whether you can switch the data category of the calculation view to SQL
Access Only. This setting prevents technical hierarchies from being created.

The procedures below describe how to check whether long build times are caused by technical hierarchies and
how to set the data category of a calculation view to SQL Access Only.

Check Whether Long Build Times are Caused by Technical Hierarchies

The diserver trace files provide information that can be used to analyze issues related to calculation view
build times.

Procedure

1. Set the trace level of the hana_di component of the diserver service to at least level INFO (the highest
trace level is DEBUG). You can do this from the SAP HANA cockpit or the SQL console:

Option Description

SAP HANA cockpit 1. Click the Browse Database Objects tile. The SAP
HANA database explorer opens.
2. Select your database and from the context menu
choose Trace Configuration.
3. In the Database Trace tile, click Edit. The Database
Trace Configuration dialog box opens.
4. Under DISERVER, set the trace level of hana_di to
INFO or higher.
5. Choose OK to save your settings.

SQL:
ALTER SYSTEM ALTER CONFIGURATION ('diserver.ini', 'SYSTEM')
SET ('trace', 'hana_di') = 'INFO';

2. Build the calculation view.


3. Open the current diserver trace file.
You can access the trace file from the SAP HANA cockpit or directly on operating system level:

Option Description

SAP HANA cockpit:
1. In the Alerting and Diagnostics tile, choose View trace and diagnostic files. The SAP HANA database explorer opens.
2. In the directory structure under Database Diagnostic Files, expand the diserver folder. The diserver trace files are named as follows: diserver_<host>.<port>.<counter>.trc

Operating system level:
/usr/sap/<sid>/HDB<inst>/<host>/trace/diserver_<host>.<port>.<counter>.trc

4. In the diserver trace file, check whether both of the following conditions apply:

• You can find a lot of entries that are related to views (for example, CREATE VIEW, DROP VIEW). These entries have the following pattern: <view_name>/<column_name>/hier/<column_name>
• These statements take up a significant amount of the overall build time.

Avoid Creating Technical Hierarchies (Optional)

Depending on the data category assigned to a calculation view, technical hierarchies are created for each
column of the view. Technical hierarchies are required for MDX queries.

Context

Calculation views are classified using one of the following data categories: CUBE, DIMENSION, or SQL ACCESS
ONLY. In the case of SQL ACCESS ONLY, technical hierarchies are not created. If your front-end tool does not
rely on technical hierarchies, you can avoid creating them by assigning this data category to the calculation
view. You can assign the data category either when you create the calculation view or later.

Procedure

• To create a calculation view with the SQL ACCESS ONLY data category, do the following:
a. In the New Calculation View dialog box, enter SQL ACCESS ONLY in the Data Category field.
• To switch the data category of a calculation view that has been created as a CUBE or DIMENSION, proceed
as follows:
a. Double-click the Semantics node.
b. On the View Properties tab, enter SQL ACCESS ONLY in the Data Category field.
c. Save the calculation view.

 Note

If you change the data category from CUBE or DIMENSION to SQL ACCESS ONLY and the view is
directly accessed by a front-end tool, verify that the front-end tool can still report on the respective
view. Also, do not change the data category to SQL ACCESS ONLY if you have explicitly modeled
hierarchies in the Semantics node.

8 Multi-Dimensional Services (MDS) Query Performance

SAP HANA multi-dimensional services (MDS) is used to process multidimensional queries, including
aggregations, transformations, and calculations.

The queries are translated into an SAP HANA calculation engine execution plan or SQL, which is executed by
the SAP HANA core engines.

The query languages currently supported use the Information Access (InA) model. The InA model simplifies
the definition of queries with rich or complex semantics. Data can be read from all kinds of SAP HANA views,
EPM plan data containers, and so on. The InA model also includes spatial (GIS) and search features.

Related Information

Analyzing Multi-Dimensional Services (MDS) Queries [page 318]


MDS Cube Cache [page 321]

8.1 Analyzing Multi-Dimensional Services (MDS) Queries

You can use the monitoring view M_MULTIDIMENSIONAL_STATEMENT_STATISTICS and the SAP HANA
performance tools to investigate issues with multidimensional queries, also referred to as InA (Information
Access) queries.

Identifying Long-Running MDS Queries

You can get an overview of the MDS activity in your SAP HANA database by using the monitoring
view M_MULTIDIMENSIONAL_STATEMENT_STATISTICS, which provides extensive information about MDS
queries. The M_MULTIDIMENSIONAL_STATEMENT_STATISTICS view is an aggregating view, similar to the
M_SQL_PLAN_CACHE view, which means that when the same query is executed several times, it is recorded in
a single row in the view.

To identify the expensive queries, you can sort the view by the average execution time (AVG_EXECUTION_TIME
column) to list the most expensive queries at the top.
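For example (a sketch using the view and columns described in this section):

 -- List the most expensive MDS queries first
 SELECT TOP 10 STATEMENT_STRING, AVG_EXECUTION_TIME
 FROM M_MULTIDIMENSIONAL_STATEMENT_STATISTICS
 ORDER BY AVG_EXECUTION_TIME DESC;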

The STATEMENT_STRING column shows the actual MDS statement. The LAST_PERFORMANCE_DATA column
contains MDS-internal performance data in JSON format, which you can copy, if needed. Batch queries are
also listed, as well as the individual queries they contain.

Note that the M_MULTIDIMENSIONAL_STATEMENT_STATISTICS monitoring view does not provide
information about session variables. These are part of the SQL session and are not logged by MDS. It also
does not show if a certain view uses session variables.

The M_EXPENSIVE_STATEMENTS monitoring view should not be used for analyzing MDS queries due to certain limitations. For example, it does not show the correct connection IDs or statement IDs. Also, since a stored procedure is called, the actual MDS statement is not shown.

Managing the M_MULTIDIMENSIONAL_STATEMENT_STATISTICS Monitoring View

The M_MULTIDIMENSIONAL_STATEMENT_STATISTICS monitoring view contains aggregated values, such as the average execution time.

The content of the view is reset when the SAP HANA database instance is restarted. You can also clear
the content manually using the ClearStatisticsView request action in the EXECUTE_MDS procedure, as
follows. Note that you need the INIFILE ADMIN privilege.

CALL SYS.EXECUTE_MDS('Analytics', '', '', '', '',
 REQUEST=>'{"Analytics": {"Actions":[{"Type":"ClearStatisticsView"}]}}',
 RESPONSE=>?);

The following INI parameters are available for configuring the M_MULTIDIMENSIONAL_STATEMENT_STATISTICS view:

• disable_statistics (default value: false): When set to true, statistics entries are not created.
• max_statistics_entry_count (default value: 10000): Maximum number of entries in the statistics view.

Analyzing an MDS Query

You can analyze an MDS query as follows:

1. In the M_MULTIDIMENSIONAL_STATEMENT_STATISTICS monitoring view, copy the statement shown in the STATEMENT_STRING column.
2. In the SQL console of the SAP HANA database explorer, call the built-in stored procedure
SYS.EXECUTE_MDS with the copied statement as the REQUEST parameter. Both REQUEST and
RESPONSE are in JSON format (data type NCLOB):

CALL SYS.EXECUTE_MDS('Analytics', '', '', '', '',
 REQUEST=>'{"Analytics"…',
 RESPONSE=>?);

The results are returned. Any errors that occurred are shown in the response. You should be able to find the error in the index server alert trace.
3. For long-running queries, in particular, analyze the plan:

1. In the SQL console, choose Analyze > Analyze SQL.
Alternatively, save the plan as follows so that you can use it with the SQL analyzer in Visual Studio
Code:
1. Choose Analyze > Analyze SQL and Save Plan and specify a file name for the saved plan. Then
choose Analyze & Save and Close.
2. Under Database Diagnostic Files, open the folder given in the previous step, and in the context
menu of the saved plan, choose Download.
3. Upload the saved plan into Visual Studio Code. For more information, see Import a File (SQL
Analyzer).
2. On the Timeline tab of the SQL analyzer, drill down to the main steps of the MDS query.
3. For expensive parts, for example involving the calculation engine, on the Plan Graph tab, drill down to the Execute Plan node and then the Calculation Model and Execute Model nodes.
4. Drill down further to see details of the actual execution.

To check if a query is still being processed, proceed as follows:

1. On the Database Overview page of the SAP HANA cockpit, on the Threads card, click either the number of
active or blocked threads on the card or the card title to open the Threads page.
2. Using the Thread Detail and Duration columns, select the long-running MDS query.
3. Choose Call Stack and refresh the screen. The call stack below shows details of the execution of the
selected thread.
4. To cancel execution, select the thread and choose Cancel Operations in the footer toolbar. In the MDS
system, you should also see that the operation was canceled.

Additional Information

You can find detailed information about MDS and troubleshooting in the following SAP Notes:

• SAP Note 2670064: FAQ: SAP HANA Multi-Dimensional Services (MDS)
• SAP Note 2691501: SAP HANA MDS: Information needed by Development Support
• SAP Note 2550833: HANA built-in procedure: EXECUTE_MDS

Related Information

M_MULTIDIMENSIONAL_STATEMENT_STATISTICS System View


Import a File (SQL Analyzer)

8.2 MDS Cube Cache

SAP HANA multi-dimensional services (MDS) is an SAP HANA component that is responsible for processing
analytical and multidimensional queries. MDS uses the MDS cube cache to cache the results of aggregation
and calculation.

The MDS cache is one of the SAP HANA result caches, and the MDS cube cache is a subcomponent of the
MDS cache. It caches the results of a query execution after the required post processing has been performed.
Internally MDS stores a data structure called a cube, which is used to produce the query results. Therefore, the
cache is referred to as a cube cache and not a result cache. This cache is more versatile than a plain result
cache since the same cube structure can be used to serve different queries as long as they share certain key
aspects. For example, if you change the numeric formatting or the specific sort orders of a query, MDS still
uses the same cached cube to serve the results.

The main characteristics of the cube cache are as follows:

• The cube cache normally respects data and metadata refreshes (unless default settings for the cache are
overridden). Cached items are flushed when changes are made to the underlying model definitions or data.
• The cube cache is access control aware. The cube cache key consists of the elements of the query that
influence the query's cube, as well as the analytic privileges of the user that issues the query. The cached
item can therefore be reused for another user that issues the same query and has the same access level.
This is a common scenario for enterprise reporting applications.
• Not all models are cacheable. Cacheability depends on the definition of the model and the SQL analytic
privileges that are applied. A model's cacheability can be determined by running the model health check
procedure for that model (see Analytical Model Health Check).
• The cube cache default behavior can be overridden using SAP HANA configuration parameters. The cube
cache can be forced to cache items by time-to-live or to ignore analytic privileges and instead cache items
based on the user name, in which case the cube cache loses its access control awareness and/or ability to
flush items based on data source updates. Changing the default behavior in this way is generally not desirable.

Related Information

Query Execution Engine Overview [page 22]


Analytical Model Health Check [page 312]

8.2.1 Cube Cache Behavior

The MDS cube cache caches the cube structure.

The cache key represents not only the query content but also the SQL analytic privileges of the user that is
executing the query. Therefore, the MDS cube cache is access control aware. For example, if user A issues a
query and user B has the same access level as user A (that is, their analytic privileges are identical), then user
B's query will generate a cache hit. Also, analytic privileges can apply to the model that is directly being queried
or to any dependency models or views. If a model is not cacheable or its analytic privileges are not cacheable,
this behavior can be modified so that items are cached per user or at a predetermined interval.

The number of cache hits, and hence the performance benefit of the MDS cube cache, depends on how many end users and how many access levels there are. If N end users issue queries and there are M access levels (that is, the N users map to M access levels), the cache hit rate increases when M < N. In a large corporate environment, the number of access levels (M) is typically much lower than the number of employees (N), so once an end user issues a query, all other users with the same access level are able to retrieve the results of that query from the cache. Therefore, it is important that model designers consider cache performance when designing a model. Incorporating too many access levels into a model's definition will render the cache ineffective.

In general, MDS relies on the SAP HANA cache infrastructure to flush cached items when their underlying
runtime objects are changed. A change to a runtime object can involve either a change to the data or to
the model's metadata. For example, when new data is added to the underlying tables, the cached items that
depend on those tables are flushed. Therefore, the cached items are always up to date. If the model cannot be
cached, the default behavior can be overridden to forcefully cache query results based on a time interval. The
query results may then not reflect the current status of the model's metadata or data.

Related Information

Structure of SQL-Based Analytic Privileges

8.2.1.1 Model Cacheability

A model's cacheability refers to the properties of that model that enable SAP HANA to detect whether the data tables on which the model is built, or the model's definition (that is, the model's metadata), have changed since a query result was cached.

Basically, if a model is cacheable, both data table updates and changes to the model definition (or metadata)
can be detected by SAP HANA. The cached items for such a model can therefore be flushed when changes are
detected and the cube cache will be always be up to date.

The model metadata refers to both the model's definition and any dependencies the model has. For example, if a calculation view is built on top of another calculation view (that is, it uses another calculation view as its data source) and the dependency calculation view changes, all cached items of the top view are flushed. Several calculation views can be daisy-chained; if all of them are cacheable, the top view is also cacheable. This property also extends to any hierarchy or dimension view that the model depends on.

8.2.1.2 Cacheability of SQL Analytic Privileges

The concept of cacheability also extends to a model's analytic privileges (SQL). Because a model's analytic
privileges also determine which portion of data a user can (or cannot) access, they affect the result of a query.

Different types of analytic privileges can be applied to a model. The following describes which types of analytic
privileges are cacheable and which are not:

• Classical XML-based analytic privileges: This is the simplest type of analytic privilege. Views with this type of analytic privilege are cacheable.
• SQL-based dynamic analytic privileges: This type of SQL analytic privilege is based on a procedure. The procedure returns a WHERE clause (as in an SQL WHERE clause) and must follow certain templates in terms of input arguments and return type. Views with this type of analytic privilege are cacheable.
• SQL-based static analytic privileges: This type of SQL analytic privilege is based on an arbitrary SQL statement. Views with this type of analytic privilege are not cacheable, because dependency analysis and determinism checks for an arbitrary SQL statement are currently not possible in MDS.

 Note

Although static SQL analytic privileges are considered uncacheable (other than trivial cases like WHERE 1=1), MDS provides an allowlist containing the names of SQL analytic privileges that are treated as cacheable at runtime. If you add a static analytic privilege name to the allowlist (see cube_cache_analytic_privileges_allow_list in the Cube Cache Configuration topic), it is treated as a cacheable analytic privilege during a cache write. The user or modeler must take special care to verify that the static SQL analytic privilege really is deterministic before adding it to the allowlist; otherwise, the result is permanent cache misses, which in turn cause the cache to grow rapidly and run out of space.
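
As an illustration, a verified privilege could be added to the allowlist with a standard configuration change. This is a sketch only: the privilege name AP_SALES_REGION is hypothetical, and the mds section is assumed to reside in indexserver.ini.

 Sample Code

 -- Sketch: treat a static SQL analytic privilege as cacheable.
 -- 'AP_SALES_REGION' is a hypothetical privilege name; verify that the
 -- privilege is deterministic before adding it to the allowlist.
 ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
   SET ('mds', 'cube_cache_analytic_privileges_allow_list') = 'AP_SALES_REGION'
   WITH RECONFIGURE;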

Related Information

Structure of XML-Based Analytic Privileges


Structure of SQL-Based Analytic Privileges

8.2.2 Cube Cache Configuration

You can configure the cube cache using the appropriate SAP HANA configuration parameters.

You can modify the SAP HANA configuration parameters using one of the following standard methods:

• Directly by setting the configuration parameters on the SAP HANA instance (see the sketch after this list)
• Overriding the set parameters by defining them as hints on a specific calculation view
• Overriding the set parameters by defining them as InA (Information Access) query hints
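
For example, the first method could look as follows. This is a sketch that assumes the mds parameters are maintained in the indexserver.ini file at the SYSTEM layer; adjust the file name and layer to your installation.

 Sample Code

 -- Sketch: enable the cube cache for the entire instance.
 -- The indexserver.ini file and SYSTEM layer are assumptions.
 ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
   SET ('mds', 'enable_cube_cache') = 'true'
   WITH RECONFIGURE;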

The following SAP HANA parameters can be used to configure the cube cache. The parameters need to be set
in the mds section of the SAP HANA INI configuration file:

enable_cube_cache
  Default value: true
  Enables cube caching. This is the switch that enables the cube cache for the database instance.

The following settings control cache key generation and the caching mechanism:

force_cube_cache
  Default value: false
  Forces cube cache usage for two cases:
  • If the view is not cacheable, it simply forces caching.
  • If the analytic privilege is not cacheable, it forces caching and appends the user name to the cache key.

cube_cache_with_timeout_only
  Default value: false
  Cache invalidation with timeout only. Changes in the data source do not trigger invalidation in this case. The view should still be cacheable unless this setting is combined with force_cube_cache.

cube_cache_with_user
  Default value: false
  Always appends the user name to the cache key and skips the evaluation of the analytic privileges, even if the analytic privilege is cacheable.

cube_cache_timeout_ms
  Default value: -1 (unlimited)
  Time to live of cube cache entries in milliseconds. If any of the following conditions are present, cache behavior is governed by the timeout and not by changes in the data source:
  • The view is not cacheable but forced (with force_cube_cache).
  • The analytic privilege is not cacheable but forced (with force_cube_cache).
  • cube_cache_with_user is true.
  • cube_cache_with_timeout_only is true.

force_cube_cache_overwrite
  Default value: false
  Forces re-execution and overwrites the existing cube cache entry (if one already exists). This setting must be applied as an InA query hint to force refresh a cached item.
  Caution: Do not set this parameter in the SAP HANA configuration file. It is designed to be used as a query hint only. Setting this value to true in the SAP HANA configuration file causes all cached items to be refreshed every time a query is run, which not only effectively disables the cache but also makes queries slightly slower overall.
cube_cache_analytic_privileges_allow_list
  Default value: (empty)
  Semicolon-separated list of static SQL analytic privileges that are allowed in the cube cache. Other than trivial cases (for example, WHERE 1=1), static SQL analytic privileges cannot be cached. This behavior can be circumvented by adding the name of the analytic privilege to the allowlist. However, the SAP HANA administrator is responsible for ensuring that the analytic privilege really can be cached. The static WHERE clause of the analytic privilege is appended to the cache key in the same way as for truly cacheable analytic privileges.
  Caution: If an uncacheable static SQL analytic privilege is added to the allowlist, it can cause permanent cache misses, which can then cause the cache to quickly run out of space. This is because the cache key is made up of the SQL analytic privilege's WHERE clause plus other information. If the WHERE clause contains ephemeral information (such as a timestamp or session ID), the cache key will never match subsequent instances of the same query, because every subsequent execution of the same query results in a different cache key.

cube_cache_entry_size_limit_bytes
  Default value: 100,000,000 bytes
  Maximum size limit per cache entry. If the size of the serialized cube is larger than this value, the item is not cached.

cube_cache_min_execution_threshold_ms
  Default value: 500 ms
  Minimum execution time in milliseconds for a query to be cached. If the query execution time is lower than this value, the cube is not cached.

The following settings control the entire cache behavior:

flush_cube_cache
  Default value: false
  If set to true, cube cache entries are flushed before the next query execution. Once the entries have been flushed, the parameter is automatically set back to false. It is currently only used for testing purposes.

cube_cache_max_memory_bytes
  Default value: -1 (unlimited)
  Maximum memory used by the cube cache. This is not a hard limit, since garbage collection occurs at regular intervals.

cache_component_memory_check_interval
  Default value: 600 s
  The time interval in seconds for the garbage collection of the MDS cache. Currently, this is only applicable to the cube cache, but it can be extended to other cache components if needed.

cache_measure_performance
  Default value: false
  Enables performance data measurement for cube cache operations.

8.2.3 Manual Cache Flushing and Invalidation

If you have forced the cube cache (that is, your model was not cacheable), you must either wait for the invalidation period to expire or manually evict the cached items to invalidate them. Cached items can only be removed for a whole model (or calculation view) at a time.

You can do so by issuing an INA action. An INA action requests MDS to perform a specific task, which in this
case is to remove the cached items for a specific view.

The following example shows how to issue a cache invalidation INA action:

 Sample Code

 {
   "Analytics": {
     "Actions": [{
       "Type": "FlushCubeCache",
       "DataSources": [{
         "PackageName": "liquid-sqe",
         "ObjectName": "LIQUID_SALES_AV1"
       },
       {
         ...
       }]
     }]
   }
 }

You can execute INA actions using the built-in stored procedure EXECUTE_MDS from the SAP HANA SQL
console. For more information about the built-in procedure, see SAP Note 2550833 .

The following example shows how this built-in procedure can be used:

 Sample Code

 CALL EXECUTE_MDS ('Analytics', '', '', '', '', '{
   "Analytics": {
     "Actions": [{
       "Type": "FlushCubeCache",
       "DataSources": [{
         "PackageName": "liquid-sqe",
         "ObjectName": "LIQUID_SALES_AV1"
       },
       {
         ...
       }]
     }]
   }
 }',
 ?);

To prevent users from flushing everyone else's cache, the user calling this INA action must have the INIFILE ADMIN system privilege. This means that the user needs write permission to change the SAP HANA configuration.
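
A minimal sketch of granting that privilege to a dedicated administration user (the user name MDS_ADMIN is hypothetical):

 Sample Code

 -- Sketch: grant the system privilege required to issue the
 -- FlushCubeCache action. 'MDS_ADMIN' is a hypothetical user name.
 GRANT INIFILE ADMIN TO MDS_ADMIN;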

8.2.4 Examples

These examples illustrate some common scenarios.

Using the Cube Cache System-Wide with Auto-Update

Since the cube cache is always up to date for cacheable models, you simply need to enable the cube cache for
the entire SAP HANA instance. You can enable the cube cache by setting the following SAP HANA configuration
parameter:

enable_cube_cache = true

Caching is now enabled for your cacheable models. Cached items are always up to date because SAP HANA
keeps track of data and metadata updates. Also, the amount of memory the cache uses is controlled by SAP
HANA. When the system runs low on memory, less frequently used cached items are flushed from the cache.

Using the Cube Cache with a Specific Size Limit

Suppose that you want to use the above scenario and also set a specific size limit for the cache to ensure that
it does not use more than a certain amount of memory. You can enable the cache and specify a size limit by
setting the following SAP HANA configuration parameters:

enable_cube_cache = true
cube_cache_max_memory_bytes = 4294967295

The maximum memory that can be used by the cache is set to approximately 4 GB. Once the cache reaches this size, the least recently read objects are removed at the next garbage collection run (see the cache_component_memory_check_interval parameter, 600 seconds by default).

Using the Cube Cache as a Time-to-Live Buffer

To use the cube cache as a time-to-live buffer, you can set the following SAP HANA configuration parameters in
the MDS section of the configuration file:

enable_cube_cache = true
force_cube_cache = true
cube_cache_with_timeout_only = true
cube_cache_timeout_ms = 180000

The cube cache is enabled and then forced so that it is also enabled for views that are not cacheable. MDS is instructed to cache queries (or rather the cubes created from those queries) by timeout only, which means that the cache ignores data and model metadata refreshes. The timeout is set to three minutes (180,000 milliseconds).

Activating the Cube Cache for a Specific Model

As mentioned in the Cube Cache Configuration section, the SAP HANA configurations for MDS can also be
defined as hints on a specific calculation view in the same way as calculation engine configuration. For more
information, see SAP Note 2509161 .

For example, suppose you have enabled the cube cache system-wide (as shown above), but one of your views is not cacheable (say, the view is defined with the now() function to filter on today's sales values). You can then enable a time-based cache strategy for that particular calculation view by setting the hints only for that view.

9 Important Disclaimer for Features in SAP HANA

For information about the capabilities available for your license and installation scenario, refer to the Feature
Scope Description for SAP HANA.

Important Disclaimers and Legal Information

Hyperlinks
Some links are classified by an icon and/or a mouseover text. These links provide additional information.
About the icons:

• Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your
agreements with SAP) to this:

• The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.

• SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any
damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.

• Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering an SAP-hosted Web site. By using
such links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this
information.

Videos Hosted on External Platforms


Some videos may point to third-party video hosting platforms. SAP cannot guarantee the future availability of videos stored on these platforms. Furthermore, any
advertisements or other content hosted on these platforms (for example, suggested videos or by navigating to other videos hosted on the same site), are not within
the control or responsibility of SAP.

Beta and Other Experimental Features


Experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by
SAP at any time for any reason without notice. Experimental features are not for productive use. You may not demonstrate, test, examine, evaluate or otherwise use
the experimental features in a live operating environment or with data that has not been sufficiently backed up.
The purpose of experimental features is to get feedback early on, allowing customers and partners to influence the future product accordingly. By providing your
feedback (e.g. in the SAP Community), you accept that intellectual property rights of the contributions or derivative works shall remain the exclusive property of SAP.

Example Code
Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax
and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of
example code unless damages have been caused by SAP's gross negligence or willful misconduct.

Bias-Free Language
SAP supports a culture of diversity and inclusion. Whenever possible, we use unbiased language in our documentation to refer to people of all cultures, ethnicities,
genders, and abilities.

www.sap.com/contactsap

© 2023 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/about/legal/trademark.html for additional trademark information and notices.
