SQL window functions
Window functions and syntax
Please do not copy without permission. © ExploreAI 2023.
Window functions and syntax
What are window functions?
SQL aggregate functions are powerful tools that summarise
data across partitions. They provide valuable insights but
often restrict detailed analysis due to their summary format.
Patient
Average
Patient Clinic_i _age
Age
_id d
1 1 54 61.5
2 3 22 29.3
SQL window functions address this limitation, conducting 3 1 51
Clinic_id AVG(Age)
61.5
operations across related row sets or windows. They allow 4 2 30 1 61.5
42.3
for more nuanced tasks, like calculating running totals or 5 2 36 2 42.3 42.3
finding maximum group values, extending the capabilities 6 3 20 3 29.3 29.3
of standard aggregation functions. 7 1 67 61.5
8 1 74 61.5
9 2 61 42.3
10 3 46 29.3
In this example, we calculate the average age of patients
per clinic, but instead of viewing the aggregated results, we
use a window function to create a new column with the
average age now in each row, depending on which clinic
the patient visited.
Window function syntax
Data overview
We will use the following table called Employee that contains information about employees in a company located
in South Africa. We assume the database is selected, so we don’t specify it.
Start_date Department Province First_name Gender Salary
2015-06-01 Finance Gauteng Lily Female 35760
2020-08-01 Marketing Western_Cape Gabriel Male 30500
2022-03-01 Data_analytics Free_State Maryam Female 46200
2022-07-15 Marketing Gauteng Sophia Female 36900
2019-05-01 Data_analytics Western_Cape Alex Male 36200
2012-01-01 Finance Free_State Martha Female 48500
2014-05-01 Finance Western_Cape Joshua Male 35760
2017-06-15 Data_analytics Gauteng Emily Female 37800
2016-01-01 Marketing Western_Cape David Male 31000
3
Window function syntax
Syntax of window functions
Window functions use data from specified row sets or windows. argument
specifies columns or parameters passed to the function.
SELECT
Column1,
…, Window function
WIN_FUNCTION(arg) OVER( examples
PARTITION BY ColumnX
ORDER BY ColumnY)
FROM Aggregate Ranking Analytical
Db_name.Table_name;
● AVG()
● MAX() ● ROW_NUMBER() ● LAG()
● MIN() ● RANK() ● LEAD()
● SUM() ● DENSE_RANK() ● AVG()
● COUNT() ● NTILE() ● SUM()
● FIRST_VALUE()
● LAST_VALUE()
Window function syntax
Syntax of window functions
This keyword indicates that you are using a window function.
SELECT
Column1, (OPTIONAL)
…,
We use PARTITION BY when we want to divide our result set into
WIN_FUNCTION(arg) OVER(
partitions/windows or subsets, and apply a function to each window, based on
PARTITION BY ColumnX
the values in ColumnX.
ORDER BY ColumnY)
FROM This is similar to how GROUP BY works with aggregate functions, except
Db_name.Table_name; PARTITION BY keeps the individual row data within each partition.
Window function syntax
Syntax of window functions
(OPTIONAL)
We use ORDER BY within a window function to sort the rows within each
partition by the values in ColumnY.
SELECT This is important for ranking functions where the order of rows affects the
Column1, result. Using ORDER BY with aggregate functions we can calculate running
…, metrics like running totals (sum of all values in a partition, up to that row) or
WIN_FUNCTION(arg) OVER( running averages.
PARTITION BY ColumnX
ORDER BY ColumnY)
FROM
Db_name.Table_name;
Window function syntax
Example: Using PARTITION BY
| Suppose we want to compare each employee's salary with the average salary of the
corresponding department.
This aggregate window
function calculates the
SELECT Department, average salary from the
First_name, Salary column within each
partition, defined by the
Salary,
Query AVG(Salary) OVER (PARTITION BY Department)
Department column.
AS Average_salary In other words, the average
FROM will be calculated based on
Employee; the whole set of rows within
each partition.
7
Window function syntax
Example: Using PARTITION BY
Department First_name Salary Average_
salary Manual calculation:
(37800 + 36200 + 46200)/3 = R 40067
Data_analytics Emily 37800 40067
Data_analytics Alex 36200 40067
Manual calculation:
Data_analytics Maryam 46200 40067
(35760 + 35760 + 48500)/3 = R 40007
Finance Joshua 35760 40007
Finance Lily 35760 40007
Manual calculation:
Finance Martha 48500 40007 (31000 + 30500 + 36900)/3 = R 32800
Marketing David 31000 32800
Marketing Gabriel 30500 32800 This result set has been arranged to make the point
clear. SQL will return a table with the rows in their
Marketing Sophia 36900 32800 original positions.
8
Window function syntax
Example: ORDER BY
| Suppose we want to rank employees according to salary across the whole organisation.
SELECT RANK() assigns a unique
Department, rank to each row within a
partition, ordered by the
First_name,
specified column, such as
Salary,
Query Salary.
RANK() OVER (
ORDER BY DESC Salary) AS Rank_
FROM When RANK() assigns a
Employee; rank to a row, duplicate
values are “tied” and so they
receive the same rank and
Since there are no partitions, ORDER BY will just rank all then the next rank is
rows together. skipped.
9
Window function syntax
Example: ORDER BY
Department First_name Salary Rank_
Finance Martha 48500 1
Data_analytics Maryam 46200 2
Since there are no partitions, RANK() ranks all of the rows
Data_analytics Emily 37800 3
together.
Marketing Sophia 36900 4
Data_analytics Alex 36200 5
Note that Lily and Gabriel have the same salaries, so they
Finance Lily 35760 6 share rank 6, and David is 8th.
Finance Joshua 35760 6
Marketing David 31000 8
Marketing Gabriel 30500 9
10
Window function syntax
Example: ORDER BY
| Suppose we want to rank employees in each department from lowest to highest salary. We add
PARTITION BY Department, and then ORDER BY Salary.
SELECT
Department, RANK() arranges rows in
Query
First_name, each partition, specified by
Salary, the PARTITION BY column
RANK() OVER (PARTITION BY Department (Department), based on the
column we specify in the
ORDER BY Salary) AS Rank
ORDER BY column (Salary).
FROM
Employee;
By adding PARTITION BY, we will rank employees in each department separately.
11
Window function syntax
Example: ORDER BY
Department First_name Salary Rank
Each employee is ranked in their own department.
Data_analytics Maryam 46200 1
Data_analytics Alex 36200 2
Data_analytics Emily 37800 3 Joshua and Lily have the same salaries, so they are both
“tied” for rank 1. RANK() skips the next rank, and assigns
Finance Joshua 35760 1 rank 3 to Marta.
Finance Lily 35760 1
Manual calculation:
Finance Martha 48500 3
(31000 + 30500 + 36900)/3 = R 32800
Marketing Sophia 36900 1
Marketing David 31000 2 This result set has been arranged to make the point
clear. SQL will return a table with the rows in their
Marketing Gabriel 30500 3 original positions.
12
Window function syntax
Aggregate window functions with ORDER BY
| Suppose we want to compare the average salaries for each department over time. We
PARTITION BY Department, and then ORDER BY Date_started.
SELECT
Date_started,
Using PARTITION BY and
Department, ORDER BY with an
First_name, aggregate function orders
Query Salary, each partition and we get
AVG(Salary) OVER (PARTITION BY Department “running” metrics in the
result set, like running
ORDER BY Date_started) AS Average_salary
totals or running averages.
FROM
Employee;
By using ORDER BY with an aggregate function, we calculate the running average of
Salary for each department that changes as we hire more employees.
13
Window function syntax
Aggregate window functions with ORDER BY
Date First_ Average
_started Department name Salary _salary 01. Average salary = salary for Martha
2012-01-01 Finance Martha 48500 48500 01.
2014-05-01 Finance Joshua 29500 39000 02. Average salary between Martha and Joshua
02.
(48500 + 29500)/2 = R 39000
2015-06-01 Finance Lily 35760 37920 03.
2016-01-01 Marketing David 31000 31000 04.
Average salary among Martha, Joshua, and
2020-08-01 Marketing Gabriel 30500 30750 03. Lily
(36200 + 37800 + 46200)/3 = R 40067
2022-07-15 Marketing Sophia 36900 32700
Average_salary resets to R 31000 in the
2017-06-15 Data_analytics Emily 37800 37800 04.
Marketing partition.
2019-05-01 Data_analytics Alex 36200 37000
This result set has been arranged to make the point clear.
2022-03-01 Data_analytics Maryam 46200 40033
SQL will return a table with the rows in their original positions.
14