0% found this document useful (0 votes)
4 views42 pages

Query Processing and Optimization

This document covers query processing and optimization in Database Management Systems (DBMS), detailing the steps involved in query processing, measures of query cost, and various operations such as selection and evaluation of expressions. It discusses methods for query optimization, including exhaustive and heuristic-based approaches, as well as transformations of relational expressions to improve efficiency. Key concepts include different search algorithms, evaluation techniques like materialization and pipelining, and the importance of minimizing query execution costs.

Uploaded by

Vishvas Panchal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views42 pages

Query Processing and Optimization

This document covers query processing and optimization in Database Management Systems (DBMS), detailing the steps involved in query processing, measures of query cost, and various operations such as selection and evaluation of expressions. It discusses methods for query optimization, including exhaustive and heuristic-based approaches, as well as transformations of relational expressions to improve efficiency. Key concepts include different search algorithms, evaluation techniques like materialization and pipelining, and the importance of minimizing query execution costs.

Uploaded by

Vishvas Panchal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 42

Database Management Systems

(DBMS)
GTU # 3130703

Unit-5
Query
Processing
and
Optimization
Prof. Firoz A Sherasiya
Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
firoz.sherasiya@darshan.ac.in
9879879861
 Looping
Outline
• Steps in query processing
• Measures of query cost
• Selection operation
• Evaluation of expressions
• Query optimization
• Transformation of relational
expressions
• Sorting and join
Steps in query
processing
Section – 1
Steps in Query Processing
Parser checks the Translator
syntax of query and translates the
verifies attribute query into its
name and relation internal form
name Parser (relational algebra)
Relational
Quer and
algebra
y translat
expression
or
Choose best execution
plan Optimiz
Execute the query-
er
evaluation plan and
returns output
Evaluati
Query Execution
on
output plan
engine

Database
Data Catalog
Statistics about
#3130703 (DBMS)  Data
Unit 5 – Query Processing
Prof. Firoz A Sherasiya 4
Measures of query cost
Section – 2
Measures of Query Cost
 Cost is generally measured as the total time required to execute a
statement/query.
 Factors contribute to time cost
 Disk accesses (time to process a data request and retrieve the required data from
the storage device)
 CPU time to execute a query
 Network communication cost
 Disk access is the predominant (major) cost, since disk access is
slow as compared to in-memory operation.
 Cost to write a block is greater than cost to read a block because
data is read back after being written to ensure that the write was
successful.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 6
Selection operation
Section – 3
Selection Operator
 Symbol: σ (Sigma)
 Notation: σ (Relation)
condition
 Operation: Selects tuples from a relation that satisfy a given
condition.
 Operators: =, <>, <, >, <=, >=, Λ (AND), V (OR)
Exam σ
Display the detail of students belongs to Answ Branch=‘CE’
ple “CE” Branch. er (Student)
Stud Outp
ent
Roll Nam Bran SP ut
Roll Nam Bran SP
No e ch I No e ch I
101 Raju CE 8 101 Raju CE 8
102 Mites ME 9 104 Meet CE 9
h
103 Niles CI 9
h
104 Meet
Prof. Firoz CE
A Sherasiya 9 #3130703 (DBMS)  Unit 5 – Query Processing
8
Search algorithm for selection operation
 Linear search (A1)
 Binary search (A2)

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 9
Linear search (A1)
 It scans each blocks and tests all records to see whether they satisfy
the selection condition.
 Cost of linear search (worst case) = br
br denotes number of blocks containing records from relation r
 If the selection condition is there on a (primary) key attribute, then
system can stop searching if the required record is found.
 cost of linear search (best case) = (br /2)
 If the selection is on non (primary) key attribute then multiple block
may contains required records, then the cost of scanning such
blocks need to be added to the cost estimate.
 Linear search can be applied regardless of
 selection condition or
 ordering of records in the file (relation)
 This algorithm is slower than binary search algorithm.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 10
Binary search (A2)
 Generally, this algorithm is used if selection is an equality comparison
on the (primary) key attribute and file (relation) is ordered
(sorted) on (primary) key attribute.
 cost of binary search = [log2(br)]
 br denotes number of blocks containing records from relation r
 This algorithm is faster than linear search algorithm.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 11
Evaluation of
expressions
Section – 4
Evaluation of expressions
 Expression may contain more than
one operations, solving expression ΠCust_Nam
will be difficult if it contains more
than one operations. e

ΠCust_Name ( σBalance<2500 (account)

Bottom to top
(customer) )

Execution
 To evaluate such expression we
need to evaluate each operations σBalance<25 (custom
one by one in appropriate order. er)
00
 Two methods for evaluating an
expression carrying multiple
(accou
operations are: nt)
 Materialization
 Pipelining

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 13
Materialization
 Materialization evaluates the expression tree of the relational algebra
operation from the bottom and performs the innermost or leaf-level
operations first.
 The intermediate result of each operation is materialized (store in
temporary relation) and becomes input for subsequent (next)
operations.
 The cost of materialization is the sum of the individual operations
plus the cost of writing the intermediate results to disk.
 The problem with materialization is that
 it creates lots of temporary relations
 it performs lots of I/O operations

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 14
Pipelining
 In pipelining, operations form a queue, and results are passed from
one operation to another as they are calculated.
 To reduce number of intermediate temporary relations, we pass
results of one operation to the next operation in the pipelines.
 Combining operations into a pipeline eliminates the cost of reading
and writing temporary relations.
 Pipelines can be executed in two ways:
 Demand driven (System makes repeated requests for tuples from the operation at
the top of pipeline)
 Producer driven (Operations do not wait for request to produce tuples, but
generate the tuples eagerly.)

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 15
Query optimization
Section – 5
Query optimization
 It is a process of selecting the most efficient query evaluation plan
from the available possible plans.

ΠCust_Name ( σBalance<2500 (account) Custo Accoun


(customer) ) mer Na t Balan
CID ANO ANO
Efficient me ce
plan 2 4 C01 A01 Raj A01 3000
records records C02 A02 Meet A02 1000
C03 A03 Jay A03 2000
ΠCust_Name ( σBalance<2500 (account C04 A04 Ram A04 4000
customer) )

4 4
records records

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 17
Approaches to Query Optimization
 Exhaustive Search Optimization
 Generates all possible query plans and then the best plan is selected.
 It provides best solution.
 Heuristic Based Optimization
 Heuristic based optimization uses rule-based optimization approaches for
query optimization.
 Performs select and project operations before join operations. This is done
by moving the select and project operations down the query tree. This reduces the
number of tuples available for join.
 Avoid cross-product operation because they result in very large-sized
intermediate tables.
 This algorithms do not necessarily produce the best query plan.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 18
Transformation of
relational expressions
Section – 6
Transformation of relational expressions
 Two relational algebra expressions are said to be equivalent if the two
expressions generate the same set of tuples.
 Example: Custo Accoun
mer Na t Balan
CID ANO ANO
me ce
C01 A01 Raj A01 3000
C02 A02 Meet A02 1000
C03 A03 Jay A03 2000
C04 A04 Ram A04 4000
ΠName ( σBalance<2500 (Account) ΠName ( σBalance<2500 (Account
(Customer) ) Customer) )
Custo
mer
Name
Meet
Jay
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 20
Transformation of relational expressions
 Combined selection operation can be divided into sequence of
individual selections. This transformation is called cascade of σ.
 Example:

Custo
mer
CI AN Na Balan
D O me ce σANO<3 Λ Balance<2000 Output
C0 1 Raj 3000 (Customer) CI AN Na Balan
OUTP
1 D O me ce
UT
C0 2 Meet 1000 C0 2 Meet 1000
2
σANO<3 (σBalance<2000 2
C0 3 Jay 2000
(Customer))
3
C0
4
4 Ram 4000 σθ1Λθ2 (E) = σθ1 (σθ2
(E))

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 21
Transformation of relational expressions
 Selection operations are commutative.
 Example:

Custo
mer
CI AN
D O
Na
me
Balan
ce σANO<3 (σBalance<2000 Output
C0 1 Raj 3000 (Customer)) CI AN Na Balan
OUTP
1 D O me ce
UT
C0 2 Meet 1000 C0 2 Meet 1000
2
σBalance<2000 (σANO<3 2
C0 3 Jay 2000
(Customer))
3
C0
4
4 Ram σθ1 (σθ2 (E))
4000 = σθ2 (σθ1
(E))

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 22
Transformation of relational expressions
 If more than one projection operation is used in expression then only
the outer projection operation is required. So skip all the other
inner projection operation.
 Example:
Custo Custo
mer mer
CI AN
D O
Na
me
Balan
ce ΠName (ΠANO, Name Name

(Customer)) Raj
C0 1 Raj 3000 OUTP
1 Meet
UT
C0 2 Meet 1000 Jay
2
ΠName (Customer) Ram
C0 3 Jay 2000
3
C0
4
4 Ram ΠL1 (ΠL2 (…(Π Ln (E))…))
4000 = ΠL1
(E)

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 23
Transformation of relational expressions
 Selection operation can be joined with Cartesian product and
theta join.
 Example:
Custo Accoun
mer tAN Balan
CI AN
D O
Na
me O ce σANO<3 (Customer Output

Account) CI AN Na Balan
C0 1 Raj 1 3000 OUTP D O me ce
1 UT
2 1000 C0 1 Raj 3000
C0 2 Meet 1
3 2000
2
(Customer) σANO<3 C0 2 Meet 1000
C0 3 Jay 4 4000 (Account) 2
3
C0
4
4 Ram σθ (E1 E2)) = E1 θE2
σθ1 (E1 θ2 E2)) = E1
θ1Λ θ2 E2
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 24
Transformation of relational expressions
 Theta operations are commutative.
 Example:

Custo Accoun
mer tAN Balan
CI AN
D O
Na
me O ce (Account) σANO<3 Output

(Customer) CI AN Na Balan
C0 1 Raj 1 3000 OUTP D O me ce
1 UT
2 1000 C0 1 Raj 3000
C0 2 Meet 1
3 2000
2
(Customer) σANO<3 C0 2 Meet 1000
C0 3 Jay 4 4000 (Account) 2
3
C0
4
4 Ram E1 σθ E2 = E2 σθ E1

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 25
Transformation of relational expressions
 Natural join operations are associative.
(E1 E2) E3 = E1
(E2 E3)
 Selection operation distribute over theta join operation under the
following condition
 When all the attributes in the selection condition θ0 involves only the attributes of
σθ0 (E1 = (σθ0 (E1))
the one of the expression (says E1) being joined.
θ E2)
θ E2

σ θ1Λθ2
only the attributes of E2.
( θ1(E1)= θ σ
 When the selection condition θ1 involves only the attributes of E1 and θ2 involves
(E1 θ E2)

(σθ2 (E2)))

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 26
Transformation of relational expressions
 Set operations union and intersection are commutative.
Output
Custo Emplo Customer U
mer yee Employee Name
Cst_Na Emp_Na
me me OUTP Raj
Raj Meet UT Meet
Meet Suresh Employee U Suresh
Customer

Custo Emplo Customer ∩


mer yee Employee Output
Cst_Na Emp_Na
me me OUTP Name
Raj Meet UT Meet
Meet Suresh Employee ∩
Customer
E1 U E2 = E2 U E1 & E1 ∩ E2 = Set difference is not
E2 ∩ E1#3130703 (DBMS)  Unit 5 – Query Processing commutative
Prof. Firoz A Sherasiya 27
Transformation of relational expressions
 Set operations union and intersection are associative.
Output
Custo Emplo Studen (Customer U Employee) U
mer yee tStu_Na Student Name
Cst_Na Emp_N
me ame me OUTP Raj
Raj Meet Raj UT Meet
Meet Suresh Meet Customer U (Employee U Suresh
Student)

Custo Emplo Studen (Customer ∩ Employee) ∩


mer yee tStu_Na Student Output
Cst_Na Emp_N
me ame me OUTP Name
Raj Meet Raj UT Meet
Meet Suresh Meet Customer ∩ (Employee ∩
Student)
(E1 U E2) U E3 = E1 U (E2 U E3) & (E1 ∩ E2) ∩ E3 = E1
∩ (E2 ∩ E3)
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 28
Transformation of relational expressions
 Selection operation distributes over U, ∩ and –.
σθ(E1 U E2) = σθ(E1) U
σθ(E2)
σθ(E1 ∩ E2) = σθ(E1) ∩
σθ(E2)
σθ(E1 – E2) = σθ(E1) –
σθ(E2)

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 29
Sorting and joins
Section – 7
Sorting
 Several of the relational operations, such as joins, can be implemented
efficiently if the input relations are first sorted.
 We can sort a relation by building an index on the relation and then
using that index to read the relation in sorted order.
 Such a process orders the relation only logically rather than physically.
 Hence reading of tuples in the sorted order may lead to disk access for
each record, which can be very expensive.
 So it is desirable to order the records physically.
 Sorting of relation that fit into main memory, standard sorting
techniques such as quick-sort can be used.
 Sorting of relations that do not fit in main memory is called
external sorting.
 Most commonly used algorithm for this type of sorting is external sort
merge algorithm.
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 31
External Sort-Merge (Example)
 Blocks=3
2 1 1
2 4 9 4 2
4 1 2 3
1
1 9 4 6 7
9 3
3 1 1 1
3 1
3 1
4 9 4
1
1 1 2 1
3 4 6 4 4
3
1 3 3
2 1
1 6 3 1 6
4 1 3
2 6 3 1
1 1 7
3
2 6
6 merg 1 merg
creat 3
2 1
2 1
1 e 4 e
e 7 7 9
6 pass- 1 pass-
runs 1 1 2
2 1 6 2
initial 4 run
4 run sorted
1
1
relation s s2 output
2
3 #3130703 (DBMS) 1 Unit 5 – Query Processing
4
Prof. Firoz A Sherasiya 32
External Sort-Merge (Algorithm)
 Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
 Repeatedly do the following till the end of the relation:
1) Read M blocks of relation into memory
2) Sort the in-memory blocks
3) Write sorted data to run Ri; then increment i.
 Let the final value of i be N
2. Merge the runs (N-way merge). We assume (for now) that N < M.
 Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the
first block of each run into its buffer page
 repeat
Select the first record (in sort order) among all buffer pages
Write the record to the output buffer. If the output buffer is full write it to disk.
Delete the record from its input buffer page.
 If the buffer page becomes empty then read the next block (if any) of the run into the
buffer.
 until all input buffer pages are empty:
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 33
External Sort-Merge (Algorithm)
 If N  M, several merge passes are required.
 In each pass, contiguous groups of M - 1 runs are merged.
 A pass reduces the number of runs by a factor of M -1, and creates runs longer by
the same factor.
 E.g. If M=11, and there are 90 runs, one pass reduces the number of runs to 9, each 10
times the size of the initial runs
 Repeated passes are performed till all runs have been merged into one.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 34
Sum (Nested loop join)
 Assuming worst case memory availability and the following given
statistics for the relations customer and depositor
 Number of records of customer: 10,000 (ncustomer)
 Number of records of depositor: 5,000 (ndepositor)
 Number of blocks of customer: 400 (bcustomer)
 Number of blocks of depositor: 100 (bdepositor)
 Estimate the cost
1. with depositor as outer relation
2. with customer as outer relation

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 35
Sum (Nested loop join)
 Assuming worst case memory availability and the following given
statistics for the relations customer and depositor
 Number of records of customer: 10,000 (ncustomer)
 Number of records of depositor: 5,000 (ndepositor)
 Number of blocks of customer: 400 (bcustomer)
 Number of blocks of depositor: 100 (bdepositor)
 Estimate the cost
1. with depositor as outer relation
No. of blocks access = ndepositor * bcustomer + bdepositor
= 5000 * 400 + 100
= 2000100

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 36
Sum (Nested loop join)
 Assuming worst case memory availability and the following given
statistics for the relations customer and depositor
 Number of records of customer: 10,000 (ncustomer)
 Number of records of depositor: 5,000 (ndepositor)
 Number of blocks of customer: 400 (bcustomer)
 Number of blocks of depositor: 100 (bdepositor)
 Estimate the cost
1. with customer as outer relation
No. of blocks access = ncustomer * bdepositor + bcustomer
= 10000 * 100 + 400
= 1000400

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 37
Sum (Nested loop join)
 Assuming best case memory availability and the following given
statistics for the relations customer and depositor
 Number of records of customer: 10,000 (ncustomer)
 Number of records of depositor: 5,000 (ndepositor)
 Number of blocks of customer: 400 (bcustomer)
 Number of blocks of depositor: 100 (bdepositor)
 Estimate the cost
1. with customer as outer relation
No. of blocks access = bdepositor + bcustomer
= 100 + 400
= 500

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 38
Join Operations
 There are several different algorithms that can be used to implement joins
 Nested-Loop Join
 Block Nested-Loop Join
 Index Nested-Loop Join
 Sort-Merge Join
 Hash-Join

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 39
Cost of computing for all joins
 R is the outer and S is the inner relation of the join.
 Number of records of R: (NR)
 Number of records of S: (NS)
 Number of blocks of R: (BR)
 Number of blocks of S: (BS)
Join Worst Case Best Case
Nested-Loop Join BR + NR ∗ BS BR + BS
Block Nested-Loop
BR + BR ∗ BS BR + BS
Join
Index Nested-Loop BR + NR ∗ c
Join
Merge Join BR + BS
Hash-Join 3 ∗ (BR + BS)
 c is the cost of a single selection on S using the join condition.
#3130703 (DBMS)  Unit 5 – Query Processing
Prof. Firoz A Sherasiya 40
Questions asked in GTU
1. Explain query processing steps. OR Discuss various steps of query
processing with proper diagram.
2. Explain Heuristics in optimization.
3. Explain the purpose of sorting with example with reference to query
optimization.
4. Explain the measures of finding out the cost of a query in query
processing.

#3130703 (DBMS)  Unit 5 – Query Processing


Prof. Firoz A Sherasiya 41
Database Management Systems
(DBMS)
GTU # 3130703

Thank
You

Prof. Firoz A Sherasiya


Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
firoz.sherasiya@darshan.ac.in
9879879861

You might also like