Unit-5 Query Processing and Optimization
Unit-5 Query Processing and Optimization
Unit-5 Query Processing and Optimization
Query Processing
and
Optimization
Steps in Query Processing
Parser checks the syntax of Translator translates the
query and verifies attribute query into its internal
name and relation name form (relational algebra)
Database Catalog
Data Statistics about Data
Steps in Query Processing
Steps in Query Processing
Steps of query processing are:
1. Query decomposition.
2. Query optimization.
3. Code generation
4. Runtime query execution.
Steps in Query Processing
1.Query Decomposition
Its include scanning, parsing and validating process.
Scanning: The scanner finds the language tokens like SQL keywords,
attribute names, and relation names in the text of the query.
Parsing: Parser checks the syntax of query to find out whether it is
written according to the syntax rules of the query language.
Validating:
Query validation is done by checking that all attribute and relation
names are valid.
An internal representation of the query is then shown as a tree data
structure called a query tree.
An internal representation of the query is then shown as a graph
data structure called a query graph.
Steps in Query Processing
2. Query optimization
Query optimization is one of the most important tasks of a relational
DBMS.
It generates alternative plans and chooses the best plan with the
least estimated cost.
Branch=‘CE’ (Student)
RollNo Name Branch SPI
101 Raj CE 8
104 Punit CE 9
Search algorithm for selection operation
1. Linear search (A1)
2. Binary search (A2)
Linear search (A1)
It scans each blocks and tests all records to see whether they
satisfy the selection condition.
• Cost of linear search (worst case) = br
br denotes number of blocks containing records from relation r
If the selection condition is there on a (primary) key
attribute, then system can stop searching if the required
record is found.
• cost of linear search (best case) = (br /2)
Linear search can be applied regardless of
• selection condition or
• ordering of records in the file (relation)
This algorithm is slower than binary search algorithm.
Binary search (A2)
Generally, this algorithm is used if selection is an equality
comparison on the (primary) key attribute and file
(relation) is ordered (sorted) on (primary) key attribute.
cost of binary search = [log2(br)]
• br denotes number of blocks containing records from relation r
If the selection is on non (primary) key attribute then
multiple block may contains required records, then the cost
of scanning such blocks need to be added to the cost
estimate.
This algorithm is faster than linear search algorithm.
Evaluation of expressions
Expression may contain more than one operations, solving
expression will be difficult if it contains more than one
expression.
Cust_Name ( Balance<2500 (account) customer )
To evaluate such expression we need to evaluate each
operation one by one in appropriate order.
3 Cust_Name
2
Bottom to top
Execution
1 Balance<2500 customer
account
Evaluation of expressions
Methods for evaluating an entire expression tree are:
1. Materialization
2. Pipelining
Materialization
Materialization evaluates the expression tree of the
relational algebra operation from the bottom and
performs the innermost or leaf-level operations first.
The intermediate result of each operation is materialized
(store in temporary relation) and becomes input for
subsequent (next) operations.
The cost of materialization is the sum of the individual
operations plus the cost of writing the intermediate results
to disk.
The problem with materialization is that
• it creates lots of temporary relations
• it performs lots of I/O operations
Pipelining
In pipelining, operations form a queue, and results are
passed from one operation to another as they are
calculated.
To reduce number of intermediate temporary relations, we
pass results of one operation to the next operation in the
pipelines.
Combining operations into a pipeline eliminates the cost of
reading and writing temporary relations.
Pipelines can be executed in two ways:
1. Demand driven (System makes repeated requests for tuples from the
operation at the top of pipeline)
2. Producer driven (Operations do not wait for request to produce
tuples, but generate the tuples eagerly.)
Query optimization
It is a process of selecting the most efficient query
evaluation plan from the available possible plans.
Cust_Name ( Balance<2500 (Account) Customer )
Customer Output
Cid Ano Cust_name Balance Cid Ano Cust_name Balance
C01 1 Raj 3000 C02 2 Meet 1000
C02 2 Meet 1000
C03 3 Harsh 2000
C04 4 Punit 4000
E1 θ E2 = E2 θ E1
Transformation of relational expressions
6. Natural join operations are associative.
(E1 E2) E3 = E1 (E2 E3)
7. Selection operation distribute over theta join operation
under the following condition
• When all the attributes in the selection condition θ0 involves only the
attributes of the one of the expression (says E1) being joined.
σθ0 (E1 θ E2) = (σθ0 (E1)) θ E2
• When the selection condition θ1 involves only the attributes of E1 and
θ2 involves only the attributes of E2.
σθ1Λθ2 (E1 θ E2) = (σθ1(E1) θ (σθ2 (E2)))
Transformation of relational expressions
8. Set operations union and intersection are commutative.
set difference is not commutative Union Intersect
Customer Employee Output Output
Cust_name Emp_name Name Name
Raj Meet Raj Meet
Meet Suresh Meet
Suresh
E1 U E2 = E2 U E1
E1 ∩ E2 = E2 ∩ E1
Transformation of relational expressions
9. Set operations union and intersection are associative.
Union Intersect
Customer Employee Student Output Output
Cust_name Emp_name Emp_name Name Name
Raj Meet Raj Raj Meet
Meet Suresh Meet Meet
Suresh