Lewis Cunningham, Shepherd Systems, September 26, 2005
SQL Analytics
Lewis R Cunningham, Database Architect, Shepherd Systems
An Expert's Guide to Oracle: http://blogs.ittoolbox.com/oracle/guide
An expert is a person who has made all the mistakes that can be made in a very narrow field. - Niels Bohr (1885-1962)
Introduction to Oracle Analytic Functions
Author: David Wong
05/25/06
Introduction
Analytic functions were introduced in Oracle8i Release 2 (8.1.6). They greatly simplify the computation of pivot reports and OLAP-style queries in straight, non-procedural SQL. Before analytic functions, such reports could be produced in SQL only through complex self-joins, subqueries and inline views, which were resource-intensive and inefficient.
Introduction
Furthermore, if a question was too complex to answer in SQL, it had to be written in PL/SQL, which by its nature is usually less efficient than a single SQL statement.
Addresses These Problems
- Calculate a running total
- Top-N queries
- Compute a moving average
- Rankings and percentiles
- Lag/lead analysis
- First/last analysis
- Linear regression statistics
- And more
How Analytic Functions Work
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group: every input row is kept and carries the computed value. Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause; all joins and all WHERE, GROUP BY and HAVING clauses are completed before they are processed. Therefore, analytic functions can appear only in the select list or the ORDER BY clause.
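The difference between an aggregate and an analytic function, and the rule that analytic functions run after WHERE, can be checked with a small sketch. It uses Python's built-in sqlite3 module, assuming SQLite 3.25 or later (the release that added window functions) as a stand-in for Oracle; the OVER syntax shown is common to both, and the table and rows are illustrative, borrowed from the sample data later in this deck.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Whalen", 10, 4400), ("Fay", 20, 6000), ("Hartstein", 20, 13000)],
)

# Aggregate function: GROUP BY collapses each department to one row.
grouped = conn.execute(
    "SELECT department_id, SUM(salary) FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# Analytic function: every row is kept and carries its department's total.
analytic = conn.execute(
    "SELECT last_name, department_id, "
    "SUM(salary) OVER (PARTITION BY department_id) AS dept_total "
    "FROM employees ORDER BY department_id, last_name"
).fetchall()

# Analytic functions are evaluated after WHERE, so WHERE cannot reference them.
try:
    conn.execute("SELECT last_name FROM employees WHERE SUM(salary) OVER () > 0")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

print(grouped)         # [(10, 4400), (20, 19000)]
print(analytic)        # [('Whalen', 10, 4400), ('Fay', 20, 19000), ('Hartstein', 20, 19000)]
print(where_rejected)  # True
```

The WHERE restriction is why the Top-N examples in this deck wrap the analytic query in an inline view and filter on its alias in the outer query.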
The Syntax
Analytic-Function(<Argument>, <Argument>, ...) OVER (
    <Query-Partition-Clause>
    <Order-By-Clause>
    <Windowing-Clause>
)

Analytic-Function: AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, and more.
The Syntax
Query-Partition-Clause - Logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic function is applied to each group independently and is reset for each group.
Order-By-Clause - Specifies how the data is sorted within each group (partition). The sort order can change the result of an analytic function: for example, SUM with an ORDER BY produces a running total, while SUM without one returns the total of the whole partition on every row.
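A minimal sketch of both clauses, again assuming SQLite via Python's sqlite3 as a stand-in for Oracle and using a few rows from this deck's sample data: the partition total restarts at each department, and adding an ORDER BY turns the same SUM into a running total.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rows taken from the deck's sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Fay", 20, 6000), ("Hartstein", 20, 13000),
     ("Baida", 30, 2900), ("Colmenares", 30, 2500)],
)

rows = conn.execute(
    """
    SELECT last_name, department_id,
           -- No ORDER BY: the window is the whole partition.
           SUM(salary) OVER (PARTITION BY department_id) AS dept_total,
           -- ORDER BY added: the sum accumulates row by row, restarting per partition.
           SUM(salary) OVER (PARTITION BY department_id ORDER BY last_name) AS running
    FROM employees
    ORDER BY department_id, last_name
    """
).fetchall()

for row in rows:
    print(row)
```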
The Syntax
Windowing-Clause - Defines a sliding or anchored window of data within a group, on which the analytic function operates. This lets the function compute its value over any arbitrary sliding or anchored window within the group, for example ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. When an Order-By-Clause is present but no windowing clause is given, the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
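The moving average from the problem list is the classic use of a sliding window. This sketch assumes SQLite via Python's sqlite3 (Oracle accepts the same ROWS BETWEEN clause) and uses department 30 from the running-total data; the three-row window size is an arbitrary choice for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rows are department 30 from the deck.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Baida", 30, 2900), ("Colmenares", 30, 2500), ("Himuro", 30, 2600),
     ("Khoo", 30, 3100), ("Raphaely", 30, 11000), ("Tobias", 30, 2800)],
)

# Sliding window: the current row plus up to two rows before it.
rows = conn.execute(
    """
    SELECT last_name, salary,
           ROUND(AVG(salary) OVER (
                     PARTITION BY department_id
                     ORDER BY last_name
                     ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS moving_avg
    FROM employees
    ORDER BY last_name
    """
).fetchall()

for name, salary, moving_avg in rows:
    print(name, salary, moving_avg)
```

The first two rows average over fewer than three salaries because the window cannot extend before the start of the partition.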
Running Total Example
Calculate a cumulative salary total within each department, row by row
LAST_NAME   DEPARTMENT_ID  SALARY
Whalen      10             4400
Fay         20             6000
Hartstein   20             13000
Baida       30             2900
Colmenares  30             2500
Himuro      30             2600
Khoo        30             3100
Raphaely    30             11000
Tobias      30             2800
Running Total Example
SELECT last_name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id ORDER BY last_name) AS running_total,
       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY last_name) AS emp_sequence
FROM employees
ORDER BY department_id, last_name;
Running Total Example
LAST_NAME   DEPARTMENT_ID  SALARY  RUNNING_TOTAL  EMP_SEQUENCE
Whalen      10             4400    4400           1
Fay         20             6000    6000           1
Hartstein   20             13000   19000          2
Baida       30             2900    2900           1
Colmenares  30             2500    5400           2
Himuro      30             2600    8000           3
Khoo        30             3100    11100          4
Raphaely    30             11000   22100          5
Tobias      30             2800    24900          6
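To check the result table, the running-total query can be run against the same nine rows. This is a sketch assuming SQLite through Python's sqlite3 in place of Oracle. Because the ORDER BY inside OVER implies a window from the start of the partition to the current row, SUM yields a running total that restarts in each department, and ROW_NUMBER restarts at 1 in the same way.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rows copied from the sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Whalen", 10, 4400), ("Fay", 20, 6000), ("Hartstein", 20, 13000),
     ("Baida", 30, 2900), ("Colmenares", 30, 2500), ("Himuro", 30, 2600),
     ("Khoo", 30, 3100), ("Raphaely", 30, 11000), ("Tobias", 30, 2800)],
)

rows = conn.execute(
    """
    SELECT last_name, department_id, salary,
           SUM(salary) OVER (PARTITION BY department_id ORDER BY last_name) AS running_total,
           ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY last_name) AS emp_sequence
    FROM employees
    ORDER BY department_id, last_name
    """
).fetchall()

for row in rows:
    print(row)
```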
ROW_NUMBER function
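ROW_NUMBER is an analytic function. It assigns a unique number to each row to which it is applied (either each row in the partition or each row returned by the query), in the ordered sequence of rows specified in the order_by_clause, beginning with 1. Rows that tie on the ORDER BY expressions still get distinct numbers, and which of them gets the lower number is not guaranteed.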
Top-N Query Example
Find the four highest-paid employees in each department
LAST_NAME   DEPARTMENT_ID  SALARY
Ozer        80             11500
Errazuriz   80             12000
Partners    80             13500
Russell     80             14000
Cambrault   80             11000
Hunold      60             9000
Ernst       60             6000
Austin      60             4800
Pataballa   60             4800
Lorentz     60             4200
Top-N Query Example
ROW_NUMBER SOLUTION
SELECT *
FROM (
    SELECT department_id, last_name, salary,
           ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS top4
    FROM employees
)
WHERE top4 <= 4;
Top-N Query Example
ROW_NUMBER SOLUTION
LAST_NAME   DEPARTMENT_ID  SALARY  TOP4
Hunold      60             9000    1
Ernst       60             6000    2
Austin      60             4800    3
Pataballa   60             4800    4
Russell     80             14000   1
Partners    80             13500   2
Errazuriz   80             12000   3
Ozer        80             11500   4
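Austin and Pataballa both earn 4800, so ROW_NUMBER may number them either way round and, with a tie at the cut-off, could silently drop one of them. A sketch, assuming SQLite via Python's sqlite3 and the sample rows above, that adds last_name as a tiebreaker (an addition for illustration, not part of the original query) so the result is repeatable:

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rows copied from the sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ozer", 80, 11500), ("Errazuriz", 80, 12000), ("Partners", 80, 13500),
     ("Russell", 80, 14000), ("Cambrault", 80, 11000), ("Hunold", 60, 9000),
     ("Ernst", 60, 6000), ("Austin", 60, 4800), ("Pataballa", 60, 4800),
     ("Lorentz", 60, 4200)],
)

# The inline view is required: WHERE cannot reference an analytic function directly.
top_rows = conn.execute(
    """
    SELECT department_id, last_name, salary, top4
    FROM (
        SELECT department_id, last_name, salary,
               ROW_NUMBER() OVER (PARTITION BY department_id
                                  ORDER BY salary DESC, last_name) AS top4
        FROM employees
    )
    WHERE top4 <= 4
    ORDER BY department_id, top4
    """
).fetchall()

for row in top_rows:
    print(row)
```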
DENSE_RANK function
DENSE_RANK computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query. Rank values are not skipped in the event of ties. Rows with equal values for the ranking criteria receive the same rank.
Top-N Query Example
DENSE_RANK SOLUTION
SELECT *
FROM (
    SELECT department_id, last_name, salary,
           DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS top4
    FROM employees
)
WHERE top4 <= 4;
Top-N Query Example
DENSE_RANK SOLUTION
LAST_NAME   DEPARTMENT_ID  SALARY  TOP4
Hunold      60             9000    1
Ernst       60             6000    2
Austin      60             4800    3
Pataballa   60             4800    3
Lorentz     60             4200    4
Russell     80             14000   1
Partners    80             13500   2
Errazuriz   80             12000   3
Ozer        80             11500   4
RANK function
RANK calculates the rank of a value in a group of values. Rows with equal values for the ranking criteria receive the same rank. Oracle then adds the number of tied rows to the tied rank to calculate the next rank. Therefore, the ranks may not be consecutive numbers.
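The numbering rules of RANK and DENSE_RANK can be sketched in plain Python. This is an illustration of the rules described above, not Oracle's implementation; it ranks salaries in descending order, as the Top-N queries do, and the function name is hypothetical.

```python
def rank_and_dense_rank(values):
    """Number values in descending order the way RANK and DENSE_RANK do.

    Returns a list of (value, rank, dense_rank) tuples. A plain-Python
    illustration of the numbering rules only, not Oracle's implementation.
    """
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = None
    for position, value in enumerate(ordered, start=1):
        if position == 1 or value != previous:
            # New value: RANK jumps to this row's position (skipping past any
            # earlier ties), DENSE_RANK just moves to the next integer.
            rank = position
            dense_rank += 1
            previous = value
        # Tied value: both functions repeat the previous rank.
        result.append((value, rank, dense_rank))
    return result


for value, rank, dense in rank_and_dense_rank([9000, 6000, 4800, 4800, 4200]):
    print(value, rank, dense)
```

For the department 60 salaries it prints ranks 1, 2, 3, 3, 5 and dense ranks 1, 2, 3, 3, 4: the tie at 4800 uses up two positions, so RANK jumps to 5 (3 plus 2 tied rows), while DENSE_RANK continues at 4.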
Top-N Query Example
RANK SOLUTION
SELECT *
FROM (
    SELECT department_id, last_name, salary,
           RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS top4
    FROM employees
)
WHERE top4 <= 4;
Top-N Query Example
RANK SOLUTION
LAST_NAME   DEPARTMENT_ID  SALARY  TOP4
Hunold      60             9000    1
Ernst       60             6000    2
Austin      60             4800    3
Pataballa   60             4800    3
Russell     80             14000   1
Partners    80             13500   2
Errazuriz   80             12000   3
Ozer        80             11500   4
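Putting the three functions side by side on department 60 shows why the DENSE_RANK solution returns five rows for that department while ROW_NUMBER and RANK return four. The sketch assumes SQLite via Python's sqlite3 in place of Oracle; only department 60 is loaded, so the PARTITION BY clause is omitted, and the aliases row_num, rnk and dense_rnk are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only department 60 is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Hunold", 60, 9000), ("Ernst", 60, 6000), ("Austin", 60, 4800),
     ("Pataballa", 60, 4800), ("Lorentz", 60, 4200)],
)

rows = conn.execute(
    """
    SELECT last_name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, last_name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, last_name
    """
).fetchall()

# How many rows each function lets through a "<= 4" filter.
kept = {
    "ROW_NUMBER": sum(1 for r in rows if r[2] <= 4),
    "RANK": sum(1 for r in rows if r[3] <= 4),
    "DENSE_RANK": sum(1 for r in rows if r[4] <= 4),
}

for row in rows:
    print(row)
print(kept)  # {'ROW_NUMBER': 4, 'RANK': 4, 'DENSE_RANK': 5}
```

Lorentz is the deciding row: ROW_NUMBER and RANK both give him 5, while DENSE_RANK gives him 4.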
First and Last Rows
The FIRST_VALUE and LAST_VALUE functions return a value from the first and last rows of an ordered window within a group. These rows are especially valuable because they are often used as baselines in calculations.
First Row Example
Find the employee with the lowest salary in each department
LAST_NAME   DEPARTMENT_ID  SALARY
Hunold      60             9000
Ernst       60             6000
Austin      60             4800
Russell     80             14000
Partners    80             13500
Errazuriz   80             12000
Ozer        80             11500
First Row Example
SELECT department_id, last_name, salary,
       FIRST_VALUE(last_name) OVER (PARTITION BY department_id ORDER BY salary ASC) AS min_sal
FROM employees
ORDER BY department_id, salary DESC;
First Row Example
LAST_NAME   DEPARTMENT_ID  SALARY  MIN_SAL
Hunold      60             9000    Austin
Ernst       60             6000    Austin
Austin      60             4800    Austin
Russell     80             14000   Ozer
Partners    80             13500   Ozer
Errazuriz   80             12000   Ozer
Ozer        80             11500   Ozer
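LAST_VALUE needs more care than FIRST_VALUE: with an ORDER BY and no windowing clause, the window ends at the current row, so LAST_VALUE returns the current row rather than the last row of the department. A sketch, assuming SQLite via Python's sqlite3 and the sample rows above, that compares the default window with an explicit whole-partition window (last_default and max_sal are hypothetical columns added for illustration):

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rows copied from the sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Hunold", 60, 9000), ("Ernst", 60, 6000), ("Austin", 60, 4800),
     ("Russell", 80, 14000), ("Partners", 80, 13500),
     ("Errazuriz", 80, 12000), ("Ozer", 80, 11500)],
)

rows = conn.execute(
    """
    SELECT last_name, department_id, salary,
           FIRST_VALUE(last_name) OVER (PARTITION BY department_id
                                        ORDER BY salary ASC) AS min_sal,
           -- Default window ends at the current row (and its ties), so with
           -- no salary ties this returns the current row's name.
           LAST_VALUE(last_name)  OVER (PARTITION BY department_id
                                        ORDER BY salary ASC) AS last_default,
           -- Widening the window to the whole partition gives the true last row.
           LAST_VALUE(last_name)  OVER (PARTITION BY department_id
                                        ORDER BY salary ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) AS max_sal
    FROM employees
    ORDER BY department_id, salary DESC
    """
).fetchall()

for row in rows:
    print(row)
```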
Best Use for Me
- I can use the result of a grouping (aggregate) function within each record of a group: much more flexible, much less pain.
- I can perform relative ranking within a group, which used to be tortuous with straight SQL.
- I can perform calculations in the SELECT clause based on neighboring row values.
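Calculations based on neighboring rows are what LAG and LEAD (both in the function list on the syntax slide) are for. A minimal sketch, assuming SQLite via Python's sqlite3 in place of Oracle and three rows from department 30; the column names diff_from_prev and next_name are hypothetical:

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; three rows from department 30.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (last_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Baida", 30, 2900), ("Colmenares", 30, 2500), ("Himuro", 30, 2600)],
)

rows = conn.execute(
    """
    SELECT last_name, salary,
           -- LAG looks one row back; NULL on the first row of the partition.
           salary - LAG(salary) OVER (PARTITION BY department_id
                                      ORDER BY salary) AS diff_from_prev,
           -- LEAD looks one row ahead; NULL on the last row.
           LEAD(last_name) OVER (PARTITION BY department_id
                                 ORDER BY salary) AS next_name
    FROM employees
    ORDER BY department_id, salary
    """
).fetchall()

for row in rows:
    print(row)
```

Without LAG and LEAD, the same result would need a self-join that matches each row to its predecessor and successor.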
Summary
Analytic functions provide an easy mechanism to compute result sets that, before 8i, were inefficient, impractical and, in some cases, impossible in "straight SQL". In addition to their flexibility and power, they are usually far more efficient than the self-join and subquery workarounds they replace.
Conclusion
This new set of functionality opens up exciting possibilities and a whole new way of looking at the data. It can replace a lot of procedural code and complex or inefficient queries that would have taken a long time to develop. Add analytic functions to your SQL arsenal and actively seek opportunities to use them.
Where to Get More Information
Oracle 9i Data Warehousing Guide, Oracle documentation, technet.oracle.com, March 2002
Oracle SQL Reference, Oracle documentation, technet.oracle.com, October 2002