0% found this document useful (0 votes)
9 views

08 Query Processing Strategies and Optimization

Uploaded by

beshahashenafi32
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views

08 Query Processing Strategies and Optimization

Uploaded by

beshahashenafi32
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

Query Processing

Strategies and
Optimization
CPS352: Database Systems

Simon Miner
Gordon College
Last Revised: 10/25/12
Agenda

• Check-in

• Design Project Presentations

• Query Processing

• Programming Project

• Exam 1 (time permitting)

• Homework 4
Check-in
Design Project
Presentations
Query Processing and
Optimization
Different Ways to Execute
Queries
• Database creates a plan to get the results for a query
• Not just one way to do this.

• Example : Find the titles of all books written by


Korth.
• π title σ author = ‘Korth’ Book |X| BookAuthor
• π title Book |X| σ author = ‘Korth’ BookAuthor

• Good DBMS will transform queries to make them


as efficient as possible
• Minimize disk accesses
Selection Strategies
• Linear search – full table scan
• Cost of potentially accessing each disk block containing the
desired data

• Binary search (with B+ tree index)


• Exact matches
• Multiple matches
• Range queries
• Complex queries

• Index often requires disk accesses the index structure as


well as for actual data
• Typically far fewer for data than linear search
• Index root and (perhaps) first level kept in buffer pool
Query Type vs. Index Type

Condition Example Clustering / Secondary Hashed Index


Primary Index Index
Exact match id = 12345 Great! Great! Great!
on candidate
key
Exact match status = N/A Find first Find first
on non-key ‘Active’ match (+ match (+
potential scan) potential scan)
Range query age between 21 Find first Less helpful Not useful
and 65 match +
sequential scan
Complex query color = ‘blue’ Not useful Not useful Not useful
or status = (multiple
‘Inactive’ indexes help)
Join Strategies

• Joins are most expensive part of query processing


• Number of tuples examined can approach the product of the
number of records in tables being joined

• Example
• σ Borrower.lastName = BookAuthor.authorName Borrower X BookAuthor
• Where BookAuthor has 10K tuples and Borrower has 2K tuples
• Cartesian join yields 20 million tuples to process
Nested Loop Join
Nested Block Join
Buffering an Entire Relation
Using Indexes to Speed Up
Joins
• Example: Borrower |X| CheckedOut
• Assume
• 2K Borrower tuples, 1K CheckedOut tuples
• 20 records per block (so 100 and 50 blocks for each table, respectively)
• We cannot buffer either table entirely
• Without indexes – nested block join takes 5050 or 5100 disk accesses,
depending on which table is in the outer loop
• With index on Borrower.borrowerID – exactly one match (PK)
• Scan all 1000 CheckedOut records (50 blocks) – each matches exactly one
Borrower record, which can be looked up in the index
• Requires processing only 2000 tuples
• Not quite as good as it seems
• Each borrower may require a separate disk access (50 + 1000 = 1050 accesses)
• Traversing index might take multiple disk accesses (especially B+ Tree indexes)
Temporary Indexes

• Indexes created and buffered for the purpose of a single


query and then discarded
• Example: neither Borrower nor CheckedOut is indexed
• Borrower |X| CheckedOut might cause a temporary index
to be built on Borrower.borrowerID
• If each (dense) index entry takes ~10 bytes, entire index will
be ~20K
• Index construction requires reading all 2K borrowers = 100
disk accesses
• Join itself costs up to 1050 disk accesses (see previous slide)
• Total of 1150 disk accesses
Merge Join
Order of Joins
• For multiple joins, performance can be greatly impacted by the order
in which the joins are done
• Example
• π last, first, authorName Borrower |X| BookAuthor |X| CheckedOut
• Assume 2K borrowers, 1K CheckedOut records, and 10K authors
• Each book has an average of 2 authors
• 3 ways to do the (binary commutative) join operations
• ( Borrower|X| BookAuthor ) |X| CheckedOut
• ( BookAuthor |X| CheckedOut ) |X| Borrower
• ( Borrower |X| CheckedOut ) \X| BookAuthor
• Final number of tuples is the same, but intermediate joins create
temporary tables which are then joined with the remaining table
• Which way is most efficient in light of this?
Rules of Equivalence
• Two formulations of a query are equivalent if the
produce the same set of results
• Not necessarily in the same order

• Example : Find the titles of all books written by Korth.


• select title
from Book natural join BookAuthor
where authorName = ‘Korth’;
• Equivalent relational algebra queries
• π title σ author = ‘Korth’ Book |X| BookAuthor

• π title Book |X| σ author = ‘Korth’ BookAuthor

• Equivalent, but the same in terms of performance


Equivalence Rules

1. Conjunctive selection operations can be deconstructed into a


sequence of individual selections.
s q Ùq ( E ) = s q (s q ( E ))
1 2 1 2
2. Selection operations are commutative.
s q (s q ( E )) = s q (s q ( E ))
1 2 2 1

3. Only the last in a sequence of projection operations is


needed, the others can be omitted.
 L1 ( L2 (( Ln ( E ))))   L1 ( E )

4. Selections can be combined with Cartesian products and


theta joins.
a. (E1 X E2) = E1  E2
b. 1(E1 2 E2) = E1 1 2 E2

Database System Concepts - 6th Edition 1.18 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
E1  E2 = E2  E1
6. (a) Natural join operations are associative:
(E1 E2) E3 = E1 (E2 E3)

(b) Theta joins are associative in the following manner:

(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)

where 2 involves attributes from only E2 and E3.

Database System Concepts - 6th Edition 1.19 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join operation
under the following two conditions:
(a) When all the attributes in 0 involve only the attributes of one
of the expressions (E1) being joined.

0E1  E2) = (0(E1))  E2

(b) When  1 involves only the attributes of E1 and 2 involves


only the attributes of E2.
1 E1  E2) = (1(E1))  ( (E2))

Database System Concepts - 6th Edition 1.20 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation
as follows:
(a) if  involves only attributes from L1  L2:
 L1 L2 ( E1  E2 )  ( L1 ( E1 ))  ( L2 ( E2 ))

(b) Consider a join E1  E2.


l Let L1 and L2 be sets of attributes from E1 and E2,
respectively.
l Let L3 be attributes of E1 that are involved in join condition ,
but are not in L1  L2, and
l let L4 be attributes of E2 that are involved in join condition ,
but are not in L1  L2.
 L  L ( E1
1 2  E2 )   L  L (( L  L ( E1 ))
1 2 1 3  ( L  L ( E 2 )))
2 4

Database System Concepts - 6th Edition 1.21 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)

9. The set operations union and intersection are commutative


E1  E2 = E2  E1
E1  E2 = E2  E1
n (set difference is not commutative).
10. Set union and intersection are associative.
(E1  E2)  E3 = E1  (E2  E3)
(E1  E2)  E3 = E1  (E2  E3)
11. The selection operation distributes over ,  and –.
 (E1 – E2) =  (E1) – (E2)
and similarly for  and  in place of –
Also:  (E1 – E2) = (E1) – E2
and similarly for  in place of –, but not for 
12. The projection operation distributes over union
L(E1  E2) = (L(E1))  (L(E2))
Database System Concepts - 6th Edition 1.22 ©Silberschatz, Korth and Sudarshan
Push Selections Inward
• Do selections as early as possible
• Reduces (“flattens”) the number of records in the relation(s) being
joined

• Example:
• π title σ author = ‘Korth’ Book |X| BookAuthor
• π title Book |X| σ author = ‘Korth’ BookAuthor

• Sometimes this is not feasible


• σ Borrower.lastName = BookAuthor.authorName Borrower X BookAuthor
• i.e. when there are no shared attributes

• Alter the structure of the selection itself


• Find late checked out books that cost more than $20.00.
• σ purchasePrice > 20 ∧ dateDue < today Book |X| CheckedOut
• σ purchasePrice > 20 Book |X|σ dateDue < today CheckedOut
Push Projections Inward

• Do projections as early as possible


• Reduces (“narrows”) the number of columns in the relation(s)
being joined

• Example:
• π lastName, firstName, title, dateDue Borrower|X| CheckedOut |X| Book
• π lastName, firstName, title, dateDue Borrower|X|
(π borrowerID, title, dateDue CheckedOut |X| Book )
• Reduces the number of columns in the temporary table from the
intermediate join
Statistics and Query
Optimization
• Using statistics about database objects can help speed
up queries

• Maintaining statistics as the data in the database


changes is a manageable process

• Types of statistics
• Table statistics
• Column statistics
Table Statistics

• On a relation r
• nr = number of tuples in the relation
• br = number of blocks used by the relation
• lr = size (in bytes) of a tuple in the relation
• fr = blocking factor, number of tuples per block
• Note that fr = floor( block size / lr ) if tuples do not span
blocks
• Note that br = ceiling( nr / fr ) if tuples in r reside in a single
file and are not clustered with other relations
Column Statistics
• on a column A
• V( A, r ) = number of distinct values in the column
• If A is a superkey, then V( A, r ) = nr
• If A is not a superkey, the number of times each
column value occurs can be estimated by nr / V( A, r )
• If column A is indexed, V( A, r ) s relatively easy to
maintain
• Keep track of the count of entries in the index

• Histogram of the relative frequency of column


values in different ranges
Estimating the Size of a Join
• Cartesian product– r X s
• Number of tuples in join = nr X s = nr * ns
• Size of each tuple in join = lr X s = lr + ls

• Natural join – r |X| s, where r and s have A in common


• The size of the join can be estimated in two ways
• The ns tuples of s will join with nr / V( A, r ) tuples of r
for ns * nr / V( A, r ) total tuples
• The nr tuples of r will join with ns / V( A, s ) tuples of s
for nr * ns / V( A, s ) total tuples
• We want to use the smaller of these estimates
• min(nr * ns / V(A, s) , ns * nr / V(A, r) ) = ns * nr / max( V(A, r), V(A, s) )
• Also note that V(A, r |X| s) = min( V(A, r), V(A, s) )
• Some tuples in the relation with the larger number of column values do not join
with any tuples in the other relation
Example Join Estimation
• π last, first, authorName Borrower |X| BookAuthor |X| CheckedOut

• 3 ways to do the join operations – Which is most efficient?


• ( Book |X| BookAuthor ) |X| CheckedOut
• ( BookAuthor |X| CheckedOut ) |X| Borrower
• ( Borrower |X| CheckedOut |X| BookAuthor

• Statistics
nr V( A, r )
nBorrower = 2000 V( borrowerID, Borrower ) = 2000
nCheckedOut = 1000 V( borrower, CheckedOut ) = 100
nBookAuthor = 10,000 V( callNo, CheckedOut ) = 500
V( callNo, BookAuthor ) = 5000
Programming Project
Part I
Exam 1
Homework 4

You might also like