Query Trees and Heuristics For Query Optimization
Query Trees and Heuristics For Query Optimization
Query Trees and Heuristics For Query Optimization
Query Optimization
1
2
3
4
5
- Query tree: relations at the leaf, operators at the nodes
- Perform operations until the root node is reached
- Node 1, 2, 3 must be executed in sequence
- Query tree represents a specific order of execution
Example:
Find the last names of employees born after 1957 who work on a
project name ‘Aquarius’.
SELECT E.Lname
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.Pname = ‘Aquarius’ AND P.Pnumber = W.Pno AND
E.Ssn=W.Ssn AND E.Bdate = ‘1957-12-31’;
Fig. 19.2(a) is the Query tree, not optimized
Fig. 19.2(b) is the Query tree with improvement
Fig. 19.2 (c ) more improvement
Fig. 19.2 (d ) more improvement
A query can be transformed step by step into an equivalent query
that is more efficient to execute (need some rules to do this)
6
7
8
9
10
11
General Transformation Rules for Relational Algebra Operations
1. A conjunctive selection condition can be broken up into a
cascade of individual σ operations.
5. Commutativity of and X
R S = S R
RXS=SXR
6. Commuting σ c with
If the attributes in the selection condition c involve only the
attributes of the one of the relation being joined then
σc ( R S) = (σ c ( R ) ) S
12
if the selection condition c is c1 AND c2, c1 involves only attributes
in R and c2 involves only attributes in S then
σc ( R S) = (σ c1 ( R )) (σ c2 ( S ))
7. Commuting ∏ with
(R φ S ) φ T = R φ (S φ T)
σ (R φ S ) = (σ (R)) φ (σ (S))
σc (R X S) = (R c S)
σc (R - S) = σc (R ) - σc (S)
However, selection may be applied to only one relation.
σc (R - S) = σc (R ) - S
14
Outline of a heuristic optimization algorithm
1. Using rule 1, break up the select operation, this will allow
moving selection down the tree at different branches
2. Using rules 2, 4, 6, 10, 13, 14 move each select operation as far
down the query tree as is permitted by the attributes
3. Using rule 5 and 9, rearrange the leaf nodes
4. Using rule 12, combine X with a selection and JOIN
5. Using rule 3, 4, 7, 11 cascading of projection, breakdown and
move down the tree
6. Reduce the size of intermediate results; perform selection as
early as possible
7. Use projection attributes as early as possible to reduce number
of attributes
8. Execute select and join operations that are more restrictive or
result in less tuples
- Query tree includes the information about the access methods for
each relation and algorithms to execute the tree
- To execute this query plan, optimizer might choose an index for
SELECT operation on DEPARTMENT; assume also that an index
15
exits for Dno attribute of EMPLOYEE. Optimizer may also use a
pipelined operation.
16
For materialized evaluation, the result is stored as a temporary
(intermediate) relation.
For example, Join operation can produce intermediate results and
then apply projection. In a pipelined operation, one operation
results are fed to the next operation as they arrive.
The inner query is evaluated once and the outer query uses this
value. This is a non-correlated query.
17
The inner subquery has to be evaluated for every outer query
tuple, which is inefficient.
18
1. For a given query expression, multiple equivalent rules may exist;
there is no definitive convergence; it is difficult to do this in a
limited time
2. It is necessary to use some quantitative measures for evaluating
alternatives; reduce them to common metric called cost
3. Keep cheaper ones and prune the expensive ones
4. Scope is a query block; various index access paths, join
permutations, join methods, group by methods, etc…
5. In global query optimization, multiple query blocks used
19
- The blocking factor (bfr)
- Primary file organization
- Indexes
- The number of levels of multiindex x
- The number of first level index blocks (bl1)
- The number of distinct values of an attribute (d)
- Selectivity of an attribute (sl)
- Avg number of records that satisfy an equality condition on an
attribute (selection cardinality s=sl*r; selectivity and no of
records)
Histograms
- Tables or data structures maintained by DBMS to record
distribution of data
- Without histograms, it is uniform distribution
20
21
Examples of cost functions for SELECT
22
o If A is a key of R (A=B condition)
23
4. Overview of Query Optimization in Oracle
24
Global Query Optimizer
- Query optimization consists of logical and physical phases
- In Oracle, logical transformation and physical optimization are
integrated to generate optimal execution plan (Fig. 19.7)
- Transformation can be heuristic based on cost based
- Cost based query transformation (CBQT) introduced in 10g.
- Applies one or more transformations
- An SQL statement may consist of multiple blocks, which are
transformed by physical optimizer
25
- This process is repeated several times, each time applying a
different transformation and its cost is computed
- At the end one or more transformations are applied to the
original SQL statement if they result in optimal execution plan
- To avoid combinatorial explosion, it provides efficient search
strategies for searching the state space of various transformations
- Major transformations include: group-by, distinct, subquery
merging, subquery unnesting, predicate move aroung, common
subexpression elimination, join predicate push down, OR
expansion, subquery coalescing, join factorization, subquery
removal through window function, start transformation, group-by
placement, and bushy join trees
Adaptive Optimization
- Oracle’s physical optimizer is adaptive
- Uses feedback loop from execution level to improve on its
previous decisions (backtrack)
- Optimal execution plan for a given SQL statement (uses object
statistics, system statistics)
- Optimality depends on accuracy of the statistics fed to the model
and the sophistication of the model itself
- Execution engine and physical optimizer has the feedback loop
(Fig. 19.7)
- Based on the estimated value of the table cardinality, optimizer
may choose index based nested loop join method; during
execution, the actual cardinality may be different from the
estimated value; during the execution, this may trigger the
physical optimizer to change the decision to use hash join method
instead of index join.
26
Array Processing
- Oracle lacks N-dimensional array based computation
- Extensions are made for OLAP features
- Improves performance in complex modeling and analysis
- Computation clause allows a table to be treated as a multi-
dimensional array and specify a set of formulas over it, the
formulas replace multiple joins and union operations
Hints
- Application developer can provide hints to query optimizer (query
annotations or directives)
- Hints are embedded in the text of query statement
- Hints are used to address infrequent cases to help optimizer
- Occasionally, application developer can override the optimizer in
case of suboptimal plans chosen by the optimizer
- E.g EMPLOYEE record ‘Sex’ attributes may assume half male and
half female; it is possible in the database all are male; then
application developer can specify that to optimize the column
index
- Some types of hints:
o The access path for a given table
o The join order for a query block
o A particular join method for a join between tables
o Enabling or disabling of tranformations
Outlines
- Outlines are used to preserve execution plans of SQL statements
or queries
- They are implemented as a collection of hints
27
- Outlines are used for plan stability, what-if analysis, and
performance experiments
SQL Plan Management
- Execution plans have a significant impact on overall performance
of a database management system
- SQL Plan Management (SPM) was introduced in Oracle 11g
- This option can be enabled for all execution plans or for a specific
SQL statements
- Execution plans may become obsolete due to a variety of reasons;
new statistics, configuration parameter changes, software
updates, new optimization techniques
- SMP will use optimal plans and avoid semi-optimal ones, create
new plans and add to the system as needed
28
- Consider another example:
SELECT Lname, Salary
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.Dno=DEPARTMENT.Dnumber AND
EMPLOYEE.Salary > 100000;
It can be rewritten:
SELECT Lname, Salary
FROM EMPLOYEE
WHERE EMPLOYEE.Dno IS NOT NULL AND
EMPLOYEE.Salary > 100000;
(The referential integrity constraint that EMPLOYEE Dno is a
foreign key that refers to DEPARTMENT Dnumber primary
key. All the attributes referenced in the query are from
EMPLOYEE. Thus, there is no need for DEPARTMENT and it
can be eliminated and there is no need for join.
29