Query Optimization

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 30

Query Optimization

Heuristic Optimization

Lecturer: Dr Hui Ma
Engineering and Computer Science
Outline
 Query Processing in DBMS
 Query optimization techniques
 Heuristic query optimization
 Translating SQL into relational algebra
 Reordering query operations
 Transformation rules for relational algebra operations

 Readings from the textbook:


 Chapter 19: Algorithms for Query Processing and
Optimization
 Chapters 17: Disk Storage, Basic File Structures, and
Hashing (Sections: 17.2, to 17.8)
 Chapter 18: Indexing Structures for Files
(Sections: 18.1 to 18.5)
SWEN 304 Lect10: Heuristic Query Optimization 2
Query Processing in DBMS
 Users/applications submit queries to the DBMS
 The DBMS processes queries before evaluating them
 Recall: DBMS mainly use declarative query languages (such
as SQL)
 Queries can often be evaluated in different ways

 SQL queries do not determine how to evaluate them

SWEN 304 Lect10: Heuristic Query Optimization 3


Query Processing in DBMS

SWEN 304 Lect10: Heuristic Query Optimization 4


Query Optimisation
 Query preparation:
 Decompose query into query blocks containing
 Exactly one SELECT and FROM clause
 At most one WHERE, GROUP BY and HAVING clause

 Nested queries within a query are identified as separate


query blocks
 Aggregate operators in SQL must be included in the extended
algebra
SWEN 304 Lect10: Heuristic Query Optimization 5
Heuristic Query Optimization: An Example
Student({Lname, Fname, StudId, Major})
Grades({StudId, CrsId, Grade})
Course({Cname, CrsId, Points, Dept})

SQL query:
SELECT StudId, Lname
FROM Student s, Grades g, Course c
WHERE s.StudId=g.StudId AND g.CrsId= c.CrsId
AND Grade = ‘A+’ AND Dept = ‘Comp’ AND Major = ‘Math’;

• Relational Algebra:
StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major = ‘Math’ ˄ Dept =
‘Comp’ (Course × (Student × Grades)))

SWEN 304 Lect10: Heuristic Query Optimization 6


Query Tree
 Query tree: A tree data structure that corresponds to a
relational algebra expression
 It represents the input relations of the query as leaf nodes
of the tree, and
 Represents the relational algebra operations as internal
nodes
 The initial query tree for the example SQL query:
 Lower level nodes, starting from leaves, contain Cartesian
product operators
 These are applied onto relations from the SQL FROM clause
 After that are (relational) select and join conditions from
SQL WHERE clause to upper tree nodes applied
 Finally is project operator of the SELECT clause attribute
list to tree root applied

SWEN 304 Lect10: Heuristic Query Optimization 7


Query Tree Example
• Relational Algebra:
StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major= ‘Math’ ˄ Dept =
‘Comp’ (Course × (Student × Grades)))

SWEN 304 Lect10: Heuristic Query Optimization 8


Query Tree Exercise
• Draw query trees for the following relational algebra expressions

StudId (σGrade=‘A+’ (Grades))

StudId, LName (σGrade=‘A+’ (Student * Grades))

(StudId,(σ CrsId = ‘M214’ (Grades))) - (StudId,(σ CrsId = ‘C201’ (Grades)))

SWEN 304 Lect10: Heuristic Query Optimization 9


Query Execution Plan
 For each query tree, computation proceeds bottom-up:
 Child nodes must be executed before their parent nodes
 But there can exist multiple methods of executing sibling
nodes, e.g.
 Process sequentially
 Process in parallel
 A query execution plan consists of a query tree with
additional annotation at each node indicating the
implementation method for each RA operator
 The query optimizer determines which query execution
plan is optimal, using a variety of algorithms
 Realistically, we cannot expect to always find the best
plan but we expect to consistently find a plan that is
good (near-optimal)
SWEN 304 Lect10: Heuristic Query Optimization 10
Query Optimisation
 Query optimizer rewrites the naïve (canonical) query plan
into a more efficient evaluation plan

 Choosing a suitable query execution plan is called Query


Optimization and it mainly means making decisions
 On the order of the execution of basic relational algebra
operations and
 On data access methods
SWEN 304 Lect10: Heuristic Query Optimization 11
Query Optimization Techniques
 Two main query optimization techniques
 One relies on the heuristic reordering of the relational algebra
operations,
 The other involves systematic estimating the cost of different
execution plans, and choosing one with the lowest cost

 Heuristics vs. cost-based optimization


 General heuristics allow to improve performance of most
queries
 Costs estimated from statistics allow for a good optimization
of each specific query
 Most DBMS use a hybrid approach between heuristics and
cost estimations

SWEN 304 Lect10: Heuristic Query Optimization 12


Query Optimization Techniques

 Enumerating potential query plans:

SWEN 304 Lect10: Heuristic Query Optimization 13


Heuristic Query Optimization
 Process for heuristics optimization
1. The parser of a high-level query generates an initial
internal representation
2. Apply heuristics rules to optimize the internal
representation
3. A query execution plan is generated to execute
groups of operations based on the access paths
available on the files involved in the query

 The main heuristic is to apply first the operations that


reduce the size of intermediate results
 E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations

SWEN 304 Lect10: Heuristic Query Optimization 14


Using Heuristics in Query Optimization (1)
 General Transformation Rules for Relational Algebra
Operations:
1. Cascade of : A conjunctive selection condition can be
broken up into a cascade (sequence) of individual 
operations:
c1 ʌ c2 ʌ ... ʌ cn(R) = c1 (c2 (...(cn(R))...) )
 

2. Commutativity of : The  operation is commutative:


  ( (R)) =  ( (R))
c1 c2 c2 c1
3. Cascade of : In a cascade (sequence) of  operations,
all but the last one can be ignored:
List1 (List2 (...(Listn(R))...) ) = List1(R)
 

4. Commuting  with : If the selection condition c


involves only the attributes A1, ..., An in the projection
list, the two operations can be commuted:
A1, A2, ..., An (c (R)) = c (A1, A2, ..., An (R))
 

SWEN 304 Lect10: Heuristic Query Optimization 15


Using Heuristics in Query Optimization (2)
 General Transformation Rules for Relational Algebra
Operations (contd.):
5. Commutativity of ( and x ):

 R1 C R2 = R2 CR1; R1 x R2 = R2 x R1
6. Commuting  with (or x ): If all the attributes in the selection
condition c involve only the attributes of one of the relations
being joined—say, R1—the two operations can be commuted
as follows:
  (R R2 ) = c (R1) R2
c 1
 Alternatively, if the selection condition c can be written as (c1
and c2), where condition c1 involves only the attributes of R1
and condition c2 involves only the attributes of R2, the
operations commute as follows:
  (R R2) = c1(R1) c2 (R2)
c 1
SWEN 304 Lect10: Heuristic Query Optimization 16
Using Heuristics in Query Optimization (3)
 General Transformation Rules for Relational Algebra
Operations (contd.):
7. Commuting  with (or x): Suppose that the projection list
is AL = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are
attributes of R1 and B1, ..., Bm are attributes of R2. If the join
condition c involves only attributes in AL, the two operations
can be commuted as follows:
 AL (R1 C R2) = (A1, ..., An (R1)) C ( B1, ..., Bm (R2))
 If the join condition C contains additional attributes not in
AL, these must be added to the projection list, and a final 
operation is needed.
 AL(R1 C R2) = AL(AL1(R1)  (R2)),
C AL2

where ALi contains the common attributes in Ri and AL and


the common attributes of R1 and R2
SWEN 304 Lect10: Heuristic Query Optimization 17
Using Heuristics in Query Optimization (4)

 
General Transformation Rules for Relational Algebra
Operations (contd.):
8. Commutativity of set operations: The set operations
and ∩ are commutative but “–” is not.
9. Associativity of , x, , and ∩ : These four operations
are individually associative; that is, if  stands for any
one of these four operations (throughout the
expression), we have
(R1  R2)  R3 = R1  (R2  R3)

10. Commuting  with set operations: The  operation


commutes with , ∩, and –. If  stands for any one of
these three operations, we have
 c (R1  R2) = (c (R1))  (c (R2))

SWEN 304 Lect10: Heuristic Query Optimization 18


Using Heuristics in Query Optimization (5)

 
 General Transformation Rules for Relational Algebra
Operations (contd.):
11. The  operation commutes with .
AL (R1 R2) = (AL (R1)) (AL (R2))

12. Converting a (, x) sequence into : If the condition c


of a  that follows a x corresponds to a join condition,
convert the (, x) sequence into a as follows:
(C (R1 x R2)) = R1 C R2

SWEN 304 Lect10: Heuristic Query Optimization 19


Using Heuristics in Query Optimization (6)
 Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations
that reduce the size of intermediate results.
2. Perform select operations as early as possible to
reduce the number of tuples and perform project
operations as early as possible to reduce the number
of attributes. (This is done by moving select and
project operations as far down the tree as possible,
e.g. rule 6, 7, 10, 11)
3. The select and join operations that are most
restrictive should be executed before other similar
operations. (This is done by reordering the leaf
nodes of the tree among themselves and adjusting
the rest of the tree appropriately.)
SWEN 304 Lect10: Heuristic Query Optimization 20
An Example Initial Query Tree:
 Relational algebra query
StudId, LName (σGrade=‘A+’ ˄ Major = ‘Math’ ˄ Dept = ‘Comp’ ˄ s.StudId=g.StudId ˄
g.CrsId=c.CrsId (Course × (Student × Grades)))

 Query tree

SWEN 304 Lect10: Heuristic Query Optimization 21


Analysis of the Initial Query Tree
 According to the structure of the initial query
tree, two Cartesian products should be executed
first

 The main heuristic: Apply SELECT and PROJECT


operations before applying the JOIN or other
binary operations

 Hence, move select operations down the tree

SWEN 304 Lect10: Heuristic Query Optimization 22


Query Tree After Moving Down Selects
 Move select operations down the tree (Rule 6)

SWEN 304 Lect10: Heuristic Query Optimization 23


Query Tree After Introducing Joins
 Replacing each Cartesian product followed by a select
according to a join condition with a join operator (Rule 12)

SWEN 304 Lect10: Heuristic Query Optimization 24


Query Tree After Switching Join Nodes
 Assume there are less Comp courses than Math major students
 switching Course and Student so that the very restrictive select
operation could be applied as early as possible (Rule 9)

SWEN 304 Lect10: Heuristic Query Optimization 25


Query Tree after Pushing down Project
 keeping in intermediate relations only the attributes needed by
subsequent operations by applying project () operations as early as
possible (Rule 7)

SWEN 304 Lect10: Heuristic Query Optimization 26


Effect of Project Operations
 Performing project operations as early as
possible brings an improvement in the efficiency
of a query execution
 tuples get shorter after projection, more of them
will fit into a file block of the same size
 the same number of tuples will be contained in a
smaller number of blocks
 As there will be less blocks to be processed by
subsequent operations, the query execution will
be faster

SWEN 304 Lect10: Heuristic Query Optimization 27


Query Optimisation
 All transformations can be applied to the canonical
evaluation plan
 However, there is no best operator sequence that is
always optimal
 Efficiency depends on the current data instance, the
actual implementation of base operations, access paths
and indexes, etc.
 Idea: assign average costs to operators (nodes) and
aggregate costs for each query plan

SWEN 304 Lect10: Heuristic Query Optimization 28


Summary

 Heuristic optimization converts a declarative


query to a canonical algebraic query tree that is
then gradually transformed using transformation
rules
 The main heuristics is to perform unary
relational operations (selection and projection)
before binary operations (joins, set theoretic),
and aggregate functions with (or without)
grouping

SWEN 304 Lect10: Heuristic Query Optimization 29


Next

 Cost-based query optimization


 Readings:
 Chapter 19: Algorithms for Query Processing and
Optimization
 Chapters 17: Disk Storage, Basic File Structures, and Hashing
(Sections: 17.2, to 17.8)
 Chapter 18: Indexing Structures for Files

(Sections: 17.1 to 17.5)


 Supposed knowledge: File Organization – COMP261

SWEN 304 Lect10: Heuristic Query Optimization 30

You might also like