Query Processing and
Optimization
Dr. S. RENUKA DEVI
Associate Professor
SCSE
VIT Chennai Campus
Introduction to Query Processing
Introduction to Query Processing
Two internal representations of a query:
Query Tree
Query Graph
Query optimization:
The process of choosing a suitable execution strategy for
processing a query.
Query Optimization
There are two main techniques that are employed during
query optimization
First technique is based on heuristic rules for ordering
the operations in a query execution strategy
A heuristic is a rule that works well in most cases but
is not guaranteed to work well in every case
Second technique involves systematically estimating
the cost of different execution strategies and choosing
the execution plan with the lowest cost estimate.
These techniques are usually combined in a query
optimizer
Translating SQL Queries into Relational Algebra
Query block:
The basic unit that can be translated into the algebraic
operators and optimized.
A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these
are part of the block.
Nested queries within a query are identified as separate
query blocks.
Aggregate operators in SQL must be included in the
extended algebra.
Translating SQL Queries into Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
SELECT LNAME, FNAME SELECT MAX (SALARY)
FROM EMPLOYEE FROM EMPLOYEE
WHERE SALARY > C WHERE DNO = 5
πLNAME, FNAME (σSALARY>C(EMPLOYEE)) ℱMAX SALARY (σDNO=5 (EMPLOYEE))
Using Heuristics in Query Optimization (1)
Process for heuristics optimization
1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristics rules to optimize the internal
representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
The main heuristic is to apply first the operations that
reduce the size of intermediate results.
E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations.
Using Heuristics in Query Optimization (2)
Query tree:
A tree data structure that is used to represent a relational
algebra expression.
It represents the input relations of the query as leaf nodes of
the tree, and represents the relational algebra operations as
internal nodes.
An execution of the query tree consists of executing an
internal node operation whenever its operands are available
and then replacing that internal node by the relation that
results from executing the operation.
The order of execution of operations starts at the leaf nodes
and ends at the root node.
The execution terminates when the root node operation is
executed and produces the result relation for the query.
Using Heuristics in Query Optimization (3)
Example:
For every project located in ‘Stafford’, retrieve the project
number, the controlling department number and the
department manager’s last name, address and birthdate.
Relation algebra:
PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((PLOCATION=‘STAFFORD’(PROJECT))*
DNUM=DNUMBER (DEPARTMENT)) * MGRSSN=SSN (EMPLOYEE))
SQL query:
Q2: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS,
E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE
AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;
Using Heuristics in Query Optimization (4)
Query tree corresponding to the relational algebric expression for Q2
Using Heuristics in Query Optimization (5)
Heuristic Optimization of Query Trees:
The same query could correspond to many different relational
algebra expressions — and hence many different query trees.
The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNMUBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;
Slide 15- 11
Using Heuristics in Query Optimization (6)
Slide 15- 12
Using Heuristics in Query Optimization (7)
Slide 15- 13
Using Heuristics in Query Optimization (8)
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that
reduce the size of intermediate results
2. Perform select operations as early as possible to reduce the
number of tuples and perform project operations as early as
possible to reduce the number of attributes
3. The select and join operations that are most restrictive
should be executed before other similar operations.
Slide 15- 14
Any Queries?