0% found this document useful (0 votes)
3 views44 pages

Query Processing and Optimization

Chapter 6 discusses query processing and optimization in database systems, emphasizing the importance of transforming high-level SQL queries into efficient execution strategies. It covers relational algebra, the steps involved in query processing, and various optimization methods including heuristic and cost-based approaches. The chapter also illustrates how different execution plans can significantly impact resource usage and performance.

Uploaded by

tsiyonketema7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views44 pages

Query Processing and Optimization

Chapter 6 discusses query processing and optimization in database systems, emphasizing the importance of transforming high-level SQL queries into efficient execution strategies. It covers relational algebra, the steps involved in query processing, and various optimization methods including heuristic and cost-based approaches. The chapter also illustrates how different execution plans can significantly impact resource usage and performance.

Uploaded by

tsiyonketema7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 44

Chapter 6: Query

Processing and
Optimization
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
CSEg 2208: Database Systems
(2023)
Outline

 Query Processing and Optimization: Why?


 Relational Algebra
 Steps of Processing
 Methods of Optimization
 Heuristic (Logical Transformations)
Transformation Rules
Heuristic Optimization Guidelines
 Cost Based (Physical Execution Costs)
Data Storage/Access Refresher
 Semantics based
08/26/25 2
Introduction to Query
Optimization
 With declarative languages such as SQL, the user specifies
what data is required
E.g. SELECT name FROM Staff
WHERE gender = ‘F’;
rather than how it is to be retrieved.
 This relieves the user of knowing what constitutes good
execution strategy.
 It also gives DBMS more control over system performance.

08/26/25 3
Introduction to Query
Optimization
 What is Query Processing?
 Steps required to transform high level SQL query into a correct and
“efficient” strategy for execution and retrieval.
 What is Query Optimization?
 The activity of choosing a single “efficient” execution strategy
(from hundreds) as determined by database catalog statistics.
 Which relational algebra expression, equivalent to the given query,
will lead to the most efficient solution plan?
 How do operations pass data (main memory buffer, disk buffer,…)?
 Will this plan minimize resource usage? (CPU/Response
Time/Disk)
08/26/25 4
DBMS techniques to process a
query
 A query expressed in a high-level query language such as SQL must first
be scanned, parsed, and validated.
 The scanner identifies the query tokens—such as SQL keywords,
attribute names, and relation names, that appear in the text of the query,
the parser checks the query syntax to determine whether it is formulated
according to the syntax rules (rules of grammar) of the query language.
 The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful names in the
schema of the particular database being queried.

08/26/25 5
DBMS techniques to process a
query
 A query has many possible execution strategies, and the process of
choosing a suitable one for processing a query is known as query
optimization.
 we will primarily focus on how queries are processed and what
algorithms are used to perform individual operations within the
query.
 The query optimizer module has the task of producing a good execution
plan, and the code generator generates the code to execute that plan.
 The runtime database processor has the task of running (executing) the query code,
whether in compiled or interpreted mode, to produce the query result.
 If a runtime error results, an error message is generated by the runtime database processor.

08/26/25 6
Relational Algebra

 The basic set of operations for the relational model is the relational
algebra.
 These operations enable a user to specify basic retrieval requests as
relational algebra expressions.
 A sequence of relational algebra operations forms a relational
algebra expression, whose result will also be a relation that
represents the result of a database query (or retrieval request).
 A table corresponds to the mathematical concept of a relation:
 Relations corresponds to tables,
 Tuples corresponds to rows.

08/26/25 7
Unary Relational Operations:
SELECT and PROJECT

The SELECT Operation


 The SELECT operation is used to choose a subset of the tuples from a relation that
satisfies a selection condition.
 For example, to select the EMPLOYEE tuples whose department is 4, or those whose
salary is greater than $30,000

 For example, to select the tuples for all employees who either work in department 4
and make over $25,000 per year, or work in department 5 and make over $30,000,

08/26/25 8
Unary Relational Operations:
SELECT and PROJECT
The PROJECT Operation
 The PROJECT operation, on the other hand, selects certain columns from the table
and discards the other columns.
 For example, to list each employee’s first and last name and salary

 In general, the PROJECT operation is denoted by:

 The PROJECT operation removes any duplicate tuples, so the result of the
PROJECT operation is a set of distinct tuples, and hence a valid relation.

 the keyword DISTINCT from this SQL query, then duplicates will not be
eliminated.
08/26/25 9
Unary Relational Operations:
SELECT and PROJECT
 Boolean conditions AND, OR, and NOT have their normal
interpretation, as follows:
 (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE;
otherwise, it is FALSE.
 (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are
TRUE; otherwise, it is FALSE.
 (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.

08/26/25 10
Unary Relational Operations:
SELECT and PROJECT

08/26/25 11
Relational Algebra Operations
from Set Theory: The UNION, INTERSECTION,
and MINUS Operations
 UNION: The result of this operation, denoted by R  S, is a relation that
includes all tuples that are either in R or in S or in both R and S. Duplicate
tuples are eliminated.
 INTERSECTION: The result of this operation, denoted by R  S, is a relation
that includes all tuples that are in both R and S.
 SET DIFFERENCE (or MINUS): The result of this operation, denoted by R –
S, is a relation that includes all tuples that are in R but not in S.
 For example, to retrieve the Social Security numbers of all employees who
either work in department 5 or directly supervise an employee who works in
department 5

08/26/25 12
Relational Algebra Operations
from Set Theory: The UNION,
INTERSECTION, and MINUS Operations

08/26/25 13
The Cartesian Product (Cross
Product)
Operation
CARTESIAN PRODUCT operation—also known as Cross Product Or
Cross Join—which is denoted by ×.
This is also a binary set operation, but the relations on which it is
applied do not have to be union compatible.
In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a
relation Q with degree n + m attributes Q(A1, A2, ..., An, B1, B2, ..., Bm),
in that order.

08/26/25 14
The JOIN Operation

JOIN operation, denoted by ⨝.


used to combine related tuples from two relations into single tuples.
For example, we want to retrieve the name of the manager of each department. To
get the manager’s name, we need to combine each department tuple with the
employee tuple whose Ssn value matches the Mgr_ssn value in the department tuple.
We do this by using the JOIN operation and then projecting the result over the
necessary attributes, as follows:

08/26/25 15
The JOIN Operation

THETA JOIN:
where each <condition> is of the form Ai θ Bj, Ai is an attribute of R,
Bj is an attribute of S, Ai and Bj have the same domain, and θ (theta) is
one of the comparison operators {=, <, ≤, >,≥, ≠}

08/26/25 16
Query Optimization
Example:
Identify all managers who work in a London branch
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND s.position = ‘Manager’ AND
b.city = ‘london’;

Results in these equivalent relational algebra statements


(1)(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position=‘Manager’)^(city=‘London’) (Staff wvStaff.branchNo = Branch.branchNo Branch)
(3) [(position=‘Manager’) (Staff)] Staff.branchNo = Branch.branchNo [(city=‘London’)
(Branch)]

08/26/25 17
Query Optimization
Assume:

1000 tuples in Staff.


 50 Managers
50 tuples in Branch.
 5 London branches
No indexes or sort key
All temporary results are written back to disk (memory is small)

Query Tuples
1 (Bad)are accessed one at a time (not in blocks)
s(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
 Requires (1000+50) disk accesses to read from Staff and Branch relations
 Creates temporary relation of Cartesian Product (1000*50) tuples
 Requires (1000*50) disk access to read in temporary relation and test predicate
Total Work = (1000+50) + 2*(1000*50) =
08/26/25 101,050 I/O operations 18
Query Optimization

Query 2 (Better)
(position=‘Manager’)^(city=‘London’) (Staff Staff.branchNo = Branch.branchNo
Branch)
 Again requires (1000+50) disk accesses to read from Staff and Branch
 Joins Staff and Branch on branchNo with 1000 tuples
(1 employee : 1 branch )
 Requires (1000) disk access to read in joined relation and check predicate
Total Work = (1000+50) + 2*(1000) =
3050 I/O operations
3300% Improvement over Query 1

08/26/25 19
Query Optimization

Query 3 (Best)
[ (position=‘Manager’) (Staff) ] Staff.branchNo = Branch.branchNo [ (city=‘London’) (Branch)

 Read Staff relation to determine ‘Managers’ (1000 reads)


 Create 50 tuple relation(50 writes)

 Read Branch relation to determine ‘London’ branches (50 reads)


 Create 5 tuple relation(5 writes)

 Join reduced relations and check predicate (50 + 5 reads)

Total Work = 1000 + 2*(50) + 5 + (50 + 5) =


1160 I/O operations

08/26/25 20
8700% Improvement over Query 1
Query Processing Steps

08/26/25 21
Query Processing Steps
 Parser and translator
 Parser checks syntax (e.g., correct relation and operator names)
 Translate the (SQL) query into relational algebra
 Optimizer
 Chooses the most efficient implementation to execute the query
 Annotates them with instructions (algorithms): query execution plan (QEP)
 Estimates the cost of each equivalent QEP, according to a given cost mod
 Choose the “best” QEP
 Evaluation engine
 The query-execution engine takes a query-evaluation plan, executes that
plan, and returns the answers to the query

08/26/25 22
Approaches to Query
Optimization
Using Heuristic in Query Optimization

Cost Estimates in Query Optimization

08/26/25 23
Approaches to Query
Optimization
A. Heuristics Approach
 Heuristics Approach uses the knowledge of the characteristics
of the relational algebra operations and the relationship
between the operators to optimize the query.
 Thus the heuristic approach of optimization will make use of:
Properties of individual operators
Association between operators
Query Tree: a graphical representation of the operators,
relations, attributes and predicates and processing sequence
during query processing.
08/26/25 24
Approaches to Query Optimization

 It is composed of three main parts:


 The Leafs: the base relations used for processing the query/
extracting the required information
 The Root: the final result/relation as an out put based on the
operation on the relations used for query processing
 Nodes: intermediate results or relations before reaching the
final result.
 Sequence of execution of operation in a query tree will start
from the leaves and continues to the intermediate nodes and
ends at the root.
08/26/25 25
Transformation Rules for Relational
Algebra

08/26/25 26
Transformation Rules for Relational
Algebra

08/26/25 27
Transformation Rules for Relational
Algebra

08/26/25 28
Implementation of Heuristic
Approach

08/26/25 29
Using Heuristics in Query
Optimization
 Process for heuristics optimization
1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristics rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
 The main heuristic is to apply first the operations that reduce the
size of intermediate results.
 E.g. Apply SELECT and PROJECT operations before applying
the JOIN or other binary operations.
08/26/25 30
Using Heuristics in Query
Optimization
 Query block: The basic unit that can be translated into the algebraic
operators and optimized.
 A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are
part of the block.
 Nested queries within a query are identified as separate query
blocks.
 Query tree:
A tree data structure that corresponds to a relational algebra
expression. It represents the input relations of the query as leaf
nodes of the tree, and represents the relational algebra operations
as internal nodes.

08/26/25 31
Using Heuristics in Query
Optimization
 Query graph:
 A graph data structure that corresponds to a relational calculus expression. It
does not indicate an order on which operations to perform first. There is only
a single graph corresponding to each query.

 Example:
 For every project located in ‘Stafford’, retrieve the project number, the controlling
department number and the department manager’s last name, address and birthdate.
 Relation algebra:
 π
PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σPLOCATION=‘STAFFORD’(PROJECT))
DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE))
 SQL query:
Q2: SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P ,DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;
08/26/25 32
Using Heuristics in Query
Optimization
Q2: SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P ,DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;

08/26/25 33
Using Heuristics in Query
Optimization

08/26/25 34
Using Heuristics in Query
Optimization
 Heuristic Optimization of Query Trees:
 The same query could correspond to many different relational algebra
expressions — and hence many different query trees.
 The task of heuristic optimization of query trees is to find a final query tree
that is efficient to execute.
 Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNMUBER=PNO AND
ESSN=SSN AND BDATE > ‘1957-
12-31’;
08/26/25 35
Using Heuristics in Query
Optimization

08/26/25 36
Using Heuristics in Query
Optimization

08/26/25 37
Using Heuristics in Query
Optimization
Summary of Heuristics for Algebraic Optimization:
 The main heuristic is to apply first the operations that reduce the size of
intermediate results.
 Perform select operations as early as possible to reduce the number of tuples
and perform project operations as early as possible to reduce the number of
attributes. (This is done by moving select and project operations as far down
the tree as possible.)
 The select and join operations that are most restrictive should be executed
before other similar operations. (This is done by reordering the leaf nodes of
the tree among themselves and adjusting the rest of the tree appropriately.)

08/26/25 38
Cost Estimation Approach
to Query Optimization
 The main idea is to minimize the cost of processing a query. The cost function is
comprised of:
 I/O cost + CPU processing cost + communication cost + Storage cost
 These components might have different weights in different processing
environments
 The DBMs will use information stored in the system catalogue for the purpose of
estimating cost.
 The main target of query optimization is to minimize the size of the intermediate
relation. The size will have effect in the cost of:
 Disk Access
 Data Transportation
 Storage space in the Primary Memory
 Writing on Disk
08/26/25 39
Cost Estimation Approach
to Query Optimization
 Cost Components for Query Execution
 Access cost to secondary storage
 Storage cost
 Computation cost
 Memory usage cost
 Communication cost
Access Cost of Secondary Storage
 Data is going to be accessed from secondary storage, as a query will be needing
some part of the data stored in the database. The disk access cost can again be
analyzed in terms of:
 Searching
 Reading, and
 Writing, data blocks used to store some portion of a relation.
08/26/25 40
Cost Estimation Approach
to Query Optimization
 Remark: The disk access cost will vary depending on
 The file organization used and the access method implemented for the file
organization.
 whether the data is stored contiguously or in scattered manner, will affect the
disk access cost.
Storage Cost
 While processing a query, as any query would be composed of many database
operations, there could be one or more intermediate results before reaching the final
output.
 These intermediate results should be stored in primary memory for further
processing.
 The bigger the intermediate relation, the larger the memory requirement, which will
have impact on the limited available space.

08/26/25 41
Cost Estimation Approach
to Query Optimization
Computation Cost
Query is composed of many operations. The operations could be database operations
like reading and writing to a disk, or mathematical and other operations like:
 Searching
 Sorting
 Merging
 Computation on field values
Communication Cost
 In most database systems the database resides in one station and various queries
originate from different terminals.
This will have impact on the performance of the system adding cost for query
processing.
Thus, the cost of transporting data between the database site and the terminal from
where the query originate should be analyzed.

08/26/25 42
Question & Answer

08/26/25 43
Thanks !!!

08/26/25 44

You might also like