QUERY PROCESSING
Relational Algebra
University of Waterloo
1-1
List of Slides
1
2 How do we Execute Queries?
3 Relational Algebra
4 Examples
5 Projection
6 Selection
7 Product
8 Union
9 Difference
10 Calculus-to-Algebra Translation
11 Joins
12 Duplicate Operations
13 Example
14 Algebra Equivalences
15 Implementation of the Operators
16 Atomic Relations
17 Joins
18 Duplicates and Aggregates
19 The rest of the lot
20 Summary
Query Processing: Relational Algebra: 2
How do we Execute Queries?
1. Parsing, typechecking, etc.
2. Relational Calculus (SQL) translated
to Relational Algebra
3. Optimization:
generates an efficient query plan
uses statistics collected about the stored data
4. Plan execution:
access methods to access stored relations
physical relational operators to combine relations
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 3
Relational Algebra
Idea: “queries = functions over a universe of relations”.
is implemented as
universe : finite relations over DOM
and relational operations:
Projection: ! " ,
Selection: #$ "
Product: % % "
Union: & % "
Difference: ' % "
operators are easy to implement and can be composed
Introduction to Data Management University of Waterloo
3-1
Algebraic Approach:
Boolean algebra vs. propositional logic
Tarski, 1931
Cylindric Algebra
Codd, 1970
Relational Algebra
matches range-restricted queries
defined only for finite relations (no top)
Query Processing: Relational Algebra: 4
Examples
Account
Acnt# Type Balance Bank Branch
1234 CHK $1000 TD 1
1235 SAV $20000 TD 2
1236 CHK $2500 CIBC 1
1237 CHK $2500 Royal 5
2000 BUS $10000 Royal 5
2001 BUS $10000 TD 3
Bank
Name Address
TD TD Centre
CIBC CIBC Tower
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 5
Projection
Definition:
where is a set of column numbers.
Example:
!#"%$
1234 CHK
1235 SAV
1236 CHK
1237 CHK
2000 BUS
2001 BUS
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 6
Selection
Definition:
#
where is a built-in selection condition.
Example:
#
!#" $
1235 SAV $20000 TD 2
2000 BUS $10000 Royal 5
2001 BUS $10000 TD 3
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 7
Product
Definition:
%
Example: #" $ %
"
1234 CHK $1000 TD 1 TD TD Centre
1235 SAV $20000 TD 2 TD TD Centre
1236 CHK $2500 CIBC 1 TD TD Centre
1237 CHK $2500 Royal 5 TD TD Centre
2000 BUS $10000 Royal 5 TD TD Centre
2001 BUS $10000 TD 3 TD TD Centre
1234 CHK $1000 TD 1 CIBC CIBC Tower
1235 SAV $20000 TD 2 CIBC CIBC Tower
1236 CHK $2500 CIBC 1 CIBC CIBC Tower
1237 CHK $2500 Royal 5 CIBC CIBC Tower
2000 BUS $10000 Royal 5 CIBC CIBC Tower
2001 BUS $10000 TD 3 CIBC CIBC Tower
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 8
Union
Definition:
&
Example:
# ! " $ & #
!#"%$
1234 CHK
1236 CHK
1237 CHK
1235 SAV
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 9
Difference
Definition:
Example:
Is there an account without a bank?
!#"%$ ' #
!#"%$ %
"
1237 Royal
2000 Royal
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 10
Calculus-to-Algebra Translation
The translation is a simple recursive procedure:
Trans
Trans Trans % Trans
Trans # Trans
Trans
Trans
Trans Trans & Trans
Trans Trans ' Trans
where and have disjoint sets of free variables
and and are union compatible.
Theorem [Codd]:
For every (safe) relational calculus
query there is an equivalent RA expression
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 11
Joins
An equality condition after product (common situation):
1234 CHK $1000 TD 1 TD TD Centre
1235 SAV $20000 TD 2 TD TD Centre
1236 CHK $2500 CIBC 1 TD TD Centre
1237 CHK $2500 Royal 5 TD TD Centre
2000 BUS $10000 Royal 5 TD TD Centre
2001 BUS $10000 TD 3 TD TD Centre
1234 CHK $1000 TD 1 CIBC CIBC Tower
1235 SAV $20000 TD 2 CIBC CIBC Tower
1236 CHK $2500 CIBC 1 CIBC CIBC Tower
1237 CHK $2500 Royal 5 CIBC CIBC Tower
2000 BUS $10000 Royal 5 CIBC CIBC Tower
2001 BUS $10000 TD 3 CIBC CIBC Tower
1234 CHK $1000 TD 1 TD TD Centre
1235 SAV $20000 TD 2 TD TD Centre
2001 BUS $10000 TD 3 TD TD Centre
1236 CHK $2500 CIBC 1 CIBC CIBC Tower
we introduce a special composite binary operator join:
# %
absolutely necessary for performance.
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 12
Duplicate Operations
Projection and duplicate elimination
(set) projection ( ) is usually split to
1. a duplicate preserving projection ( )
2. a duplicate elimination operator ( )
Aggregates, counting, etc.
additional operator
groups by columns in
applies aggregates (new columns in result).
Duplicate versions of standard operators
Product, Selection (always preserve duplication)
Duplicate preserving Union and Difference
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 13
Example
SELECT V.Vno, Vname, Count(*), Sum{Amount}
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno
AND V.Vno Between 1000 and 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100
Result
Scan
(Sum(Amount) > 100)
Grouping
(Vno, Vname)
Join
(V.Vno = T.Vno)
Scan
(Vno between 1000 and 2000)
Vendor Transaction
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 14
Algebra Equivalences
1. Selections can be “staged”:
#$
#$ #$
2. Selections are commutative:
#$ #$
#$ #$
3. only the last projection counts:
4. product can be replaced by join:
#$ %
5. joins are associative ( only over columns of ):
6.
for involving columns of only (and vice versa):
#
#
etc, etc, etc.
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 15
Implementation of the Operators
every of the operators of Relational Algebra can be
implemented in several ways
not always clear which is the best choice
we implement as many as possible (so we can
pick depending on the particular query and
database instance).
the operators are composed using an iterator
protocol.
in practice, the operations are often decomposed to
more primitive operations (e.g., retrieve a pointer to a
record and extract a field from previously retrieved
record).
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 16
Atomic Relations
We use the Access Methods (defined in last lecture) to
gain access to the stored data:
if an index (where is the search attribute)
is available we replace a subquery of the form
#
with accessing directly,
Otherwise: check all file blocks holding tuples for .
Even if an index is available, scanning the entire relation
may be faster in certain circumstances:
the relation is very small
the relation is large, but we expect most of the tuples
in the relation to satisfy the selection criteria
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 17
Joins
THE most studied operation of relational algebra;
There are many other ways to perform a join.
1. The Nested Loop Join
for t in R do for u in S do
if C(t,u) then output (tu)
with the optional use of indices on S
2. The Sort-Merge Join
sort the tuples of R and of R on the common
values, then merge the sorted relations.
3. The Hash Join
hash each tuple of R and of S to “buckets” by
applying a hash function to columns involved in
the join condition. Within each bucket, look for
tuples with the matching values.
the cost of the join depends on the chosen method
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 18
Duplicates and Aggregates
How do we eliminate duplicates in results of operations?
How do we group tuples for aggregation?
Similar solution:
1. sort the result and then eliminate
duplicates/aggregate
2. hash the result and do the same
often an index (e.g., a B+ tree) can be used to avoid
the sorting/hashing phase
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 19
The rest of the lot
we assume a natural implementation for selection,
duplicate-preserving projection, and duplicate
preserving union.
set difference can be evaluated similarly to a join.
additional operations:
sorts (used for Sort-Merge Join, Aggregation, and
Duplicate Elimination). Uses an external sort
algorithm (essentially a merge-sort adopted for
disk)
temporary store (to avoid recomputation of
subqueries; can be inserted anywhere in the
query plan)
...
Introduction to Data Management University of Waterloo
Query Processing: Relational Algebra: 20
Summary
Queries are translated to relational algebra
simple algebraic formalism
easy to manipulate
lot of equivalences of RA expressions
many implementations of basic operators
a physical plan for a query is a relational algebra
expression with choice of implementation for every
operator
the choices leave room for query optimization
Introduction to Data Management University of Waterloo