Lecture 3: Query processing and
optimization
Objectives:
Reduce processing time
Reduce buffer memory
Reduce communication cost between sites
Low resource usage
Hoa Dinh Nguyen, PhD.
PTIT 1 Department of Information Technology
Introduction
Functions of query processing:
www.ptit.edu.vn
• Transform a complicated query into a much
simpler query.
• This transformation must ensure correctness
and efficiency
• Each transformation method leads to different
resource usages → lowest resource usage.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 2 Department of Information Technology
Introduction
Simple transformation methods
Relational algebra:
www.ptit.edu.vn
Simplify queries based on equivalent relational algebra
expressions such that the querying time is minimized.
This method disregards database structure and size.
Cost estimation:
Specify data size and processing time of each element in the
query.
This method takes into account the data size and real
processing time of the query.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 3 Department of Information Technology
Querying Process
Procedure of query execution
Query pre-processing
www.ptit.edu.vn
Query transformation
Query optimization
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 4 Department of Information Technology
www.ptit.edu.vn Querying Process
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 5 Department of Information Technology
Querying Process
Query preprocessor:
Scanning: key words, attribute names,
www.ptit.edu.vn
relations,…
Parsing: syntax checking, parse tree
representation
Validating: semantic checking (relations,
attributes, data types)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 6 Department of Information Technology
Querying Process
Query optimizer:
Select suitable strategy for query processing
www.ptit.edu.vn
Query code generator:
Generate codes to implement the plan
Runtime database processor:
Compile codes to provide query results
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 7 Department of Information Technology
www.ptit.edu.vn Querying Process
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 8 Department of Information Technology
Querying Process
Transform in to relational algebra expressions
Query blocks: SELECT-FROM-WHERE-
www.ptit.edu.vn
GROUP BY-HAVING
Integrated queries: separate into query blocks
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 10 Department of Information Technology
Querying Process
Transform in to relational algebra expressions
SELECT Name, Salary
www.ptit.edu.vn
FROM Staff
WHERE Salary > (SELECT AVG(Salary)
FROM Staff
WHERE Gender = “Male”)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 11 Department of Information Technology
Query Processing in Centralized DBS
Comparison between centralized and distributed query
processing:
Centralized:
www.ptit.edu.vn
Select the best relational algebra expression equivalent to the
query
Query processing strategies are described using relational algebra
extensions
Distributed:
Inherited from centralized environment
More issues to concern:
Math expressions of data transmission between sites
Choose the best site to process data
Data transmission methods
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 12 Department of Information Technology
SQL
Syntax checking Optimization in RAE
www.ptit.edu.vn
Query with correct syntax Optimized RAE
Suitability checking Select optimized solution
Resonable SQL Implementation planning
Compile SQL query Generate codes
Relational algebra expression Query codes
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 13 Department of Information Technology
Query on local relations
Global
Query decomposition Schema
Algebraic query on global relations
Control Fragment
www.ptit.edu.vn
Site Data localization Schema
Algebraic query on fragments
Local
Global Optimization Schema
Distributed query execution plan
Local sites Distributed execution
Optimized local queries
Generic layering scheme for distributed query processing 14
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 14 Department of Information Technology
Query Processing in Centralized DBS
Optimization strategies in centralized databases:
Distributed queries must be composed and processed in
a centralized manner
www.ptit.edu.vn
All distributed query optimization techniques are
extensions from centralized query processing
approaches
Centralized query optimization is often simple
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 15 Department of Information Technology
Query Processing in Centralized DBS
INGRES algorithm: recursively breaks up a query expressed
in SQL into smaller pieces which are executed along the way
The query is first decomposed into a sequence of queries
www.ptit.edu.vn
having a unique relation in common
Each mono-relation query is processed by selecting, based
on the predicate, the best access method to that relation
E.g: For example, if the predicate is of the form A = value, an
index available on attribute A would be used if it exists.
However, if the predicate is of the form A ≠ value, an index
on A would not help, and sequential scan should be used.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 16 Department of Information Technology
Query Processing in Centralized DBS
INGRES algorithm:
Executes first the unary (monorelation) operations and
tries to minimize the sizes of intermediate results in
www.ptit.edu.vn
ordering binary (multirelation) operations
Denote by qi-1qi a query q decomposed into two
subqueries, qi-1 and qi, where qi-1 is executed first and its
result is consumed by qi
Decomposes q into n subqueries q1q2 …qn using
detachment and substitution.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 17 Department of Information Technology
Query Processing in Centralized DBS
Detachment: breaks a query q into q’q’’, based on a
common relation that is the result of q’
q: SELECT R2.A2, R3.A3,. . ., Rn.An
www.ptit.edu.vn
FROM R1, R2,. . . , Rn
WHERE P1(R1.A’1) AND P2(R1.A1, R2.A2, . . . , Rn.An);
q’: SELECT R1A1 INTO R’1
FROM R1
WHERE P1(R1.A1);
q”:SELECT R2A2,. . ., RnAn
FROM R’1, R2,. . . , Rn
WHERE P2(R1.A1, R2.A2,. . ., Rn.An);
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 18 Department of Information Technology
EMP (E) ASG (G)
ENO ENAME TITLE ENO PNO RESPONSIBILITY DUR
A1 Nam Phân tích HT A1 D1 Quản lý 12
A2 Trung Lập trình viên A2 D1 Phân tích 34
A3 Đông Phân tích HT A2 D2 Phân tích 6
A4 Bắc Phân tích HT A3 D3 Kỹ thuật 12
www.ptit.edu.vn
A5 Tây Lập trình viên A3 D4 Lập trình 10
A6 Hùng Kỹ sư điện A4 D2 Quản lý 6
A7 Dũng Phân tích HT A5 D2 Quản lý 20
A8 Chiến Thiết kế DL A6 D4 Kỹ thuật 36
A7 D3 Quản lý 48
A8 D3 Lập trình 15
PRJ (J) PAY(S)
PNO PNAME BUDGET TITLE SALARY
D1 CSDL 20000 Kỹ sư điện 1000
D2 CÀI ĐẶT 12000 Phân tích HT 2500
D3 BẢO TRÌ 28000 Lập trình viên 3000
D4 PHÁT TRIỂN 25000 Thiết kế DL 4000
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 19 Department of Information Technology
Example
q1=“Find names of staffs working on project CSDL”
q 1: SELECT E.ENAME
FROM E, G, J
www.ptit.edu.vn
WHERE E.ENO = G.ENO AND G.PNO = J.PNO
AND PNAME = “CSDL”;
q1 is detached into q11q’, where TGIAN1 is an intermediate relation.
q11: SELECT J.PNO INTO TEMP1
FROM J
WHERE PNAME = “CSDL”;
q’: SELECT E.ENAME
FROM E, G, TEMP1
WHERE E.ENO = G.ENO
AND G.PNO = TEMP1.PNO;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 20 Department of Information Technology
Example
The successive detachments of q’ may generate
q12: SELECT G.ENO INTO TEMP2
FROM G, TEMP1
www.ptit.edu.vn
WHERE G.PNO =TEMP1.PNO;
q13: SELECT E.ENAME
FROM E, TEMP2
WHERE E.ENO = TEMP2.ENO;
q1 has been decomposed into q11q12q13
Query q11 is mono-relation and can be executed. However,
q12 and q13 are not mono-relation and cannot be reduced by
detachment.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 21 Department of Information Technology
Query Processing in Centralized DBS
Tuple substitution: Given an n-relation query q, the tuples of
one relation are substituted by their values, thereby producing
a set of (n-1)-relation queries.
www.ptit.edu.vn
One relation in q is chosen for tuple substitution. Let R1 be that
relation.
For each tuple t1i ∈ R1, the attributes referred to by in q are replaced
by their actual values in t1i, thereby generating a query q’ with n-1
relations.
q(R1,R2,...,Rn) is replaced by {q’(t1i,R2,R3, ... ,Rn),t1i ∈ R1}
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 22 Department of Information Technology
Query Processing in Centralized DBS
Example: Let’s consider the query q13
q13: SELECT E.ENAME
FROM E, TEMP2
www.ptit.edu.vn
WHERE E.ENO = TEMP2.ENO ;
The relation TEMP2 is over a single attribute (ENO). Assume that it
contains only two tuples: <E1> and <E2>. The substitution of TEMP2
generates two one-relation subqueries:
q131: SELECT E.ENAME
FROM E
WHERE E.ENO = “E1”;
q132: SELECT E.ENAME
FROM E
WHERE E.ENO = “E2”;
These queries may then be executed
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 23 Department of Information Technology
INGRES- QOA Algorithm
Input: MRQ: multirelation query with n relations
Output: Result of execution
Begin
Output
If n=1 then
Output run(MRQ)
www.ptit.edu.vn
Else
{ORQ1, ..., ORQm, MRQ’} MRQ
For i1 to m
Output’ run(ORQi)
Output output output’
Endfor
R CHOOSE_ RELATION(MRQ’)
For each tuple t R
MRQ” substitute values for t in MRQ’
Output’ INGRES-QOA(MRQ”)
Output output output’
Endfor
Endif
End.
Lecture 3: {INGRES-QOA}
Query processing and optimization 24 Hoa Dinh Nguyen, PhD.
Department of Information Technology
Query Decomposition
Normalization: to transform the query to a
normalized form to facilitate further processing.
www.ptit.edu.vn
Conjunctive normal form:
(p11 p12 ... p1n) ... (pm1 pm2 ... pmn)
Disjunctive normal form
(p11 p12 ... p1n) ... (pm1 pm2 ... pmn)
where pij is a simple predicate
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 25 Department of Information Technology
Query Decomposition
Equivalence rules
p1 p2 ⟺ p2 p1
p1 p2 ⟺ p2 p1
www.ptit.edu.vn
p1 (p2 p3) ⟺ (p1 p2 ) p3
p1 (p2 p3) ⟺ (p1 p2) p3
p1 (p2 p3) ⟺ (p1 p2) (p1 p3)
p1 (p2 p3) ⟺ (p1 p2 ) (p1 p3)
¬(p1 p2) ⟺ ¬p2 ¬p1
¬(p1 p2) ⟺ ¬p2 ¬p1
¬(¬p) ⟺ p
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 26 Department of Information Technology
Query Decomposition
Example: “Find the names of employees who have been
working on project P1 for 12 or 24 months”
SELECT ENAME
www.ptit.edu.vn
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = "P1"
AND (DUR = 12 OR DUR = 24);
conjunctive normal form is
EMP.ENO=ASG.ENO ASG.PNO = “P1” (DUR =12 DUR=24)
disjunctive normal form is
(EMP.ENO = ASG.ENO ASG.PNO = “P1” DUR = 12)
(EMP.ENO = ASG.ENO ASG.PNO = “P1” DUR = 24)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 27 Department of Information Technology
Query Decomposition
Analysis: enables rejection of normalized queries for
which further processing is either impossible or
www.ptit.edu.vn
unnecessary.
A query is type incorrect: if any of its attribute or
relation names are not defined in the global schema,
or if operations are being applied to attributes of the
wrong type.
A query is semantically incorrect if its components
do not contribute in any way to the generation of the
result.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 28 Department of Information Technology
Query Decomposition
Example: This query is type incorrect
SELECT E#
www.ptit.edu.vn
FROM EMP
WHERE ENAME > 200;
For 2 reasons:
Attribute E# is not declared in the schema
the operation “>200” is incompatible with the type
string of ENAME.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 29 Department of Information Technology
Query Decomposition
In the context of relational calculus, it is not possible
to determine the semantic correctness of general
www.ptit.edu.vn
queries.
It is possible to do so based on the representation of
the query as a graph, called a query graph or
connection graph
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 30 Department of Information Technology
Query Decomposition
Query graph or connection graph
One node indicates the result relation, and any other node
indicates an operand relation
www.ptit.edu.vn
An edge between two nodes one of which does not
correspond to the result represents a join, whereas an edge
whose destination node is the result represents a project
A non-result node may be labeled by a select or a self-join
(join of the relation with itself) predicate
An important subgraph of the query graph is the join
graph, in which only the joins are considered
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 31 Department of Information Technology
Query Decomposition
Example: “Find the names and responsibilities of programmers
who have been working on the CAD/CAM project for more
than 3 years.”
www.ptit.edu.vn
SELECT ENAME, RESP
FROM E, G, J
WHERE E.ENO = G.ENO
AND G.PNO = J.PNO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = “Programmer”;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 32 Department of Information Technology
Query Decomposition
DUR 36
E.ENO=G.ENO G.PNO=J.PNO
www.ptit.edu.vn
TITLE= “Programer” E G.RESPONSIBILITY J
PNAME=”CAD/CAM”
E.ENAME
Result
(a) Query graph
G
G.ENO=G.ENO G.PNO=J.PNO
E J
(b) Joint graph
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 33 Department of Information Technology
Query Decomposition
Example: Consider the following query
SELECT ENAME, RESP
www.ptit.edu.vn
FROM E, G, J
WHERE E.ENO = G.ENO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = "Programmer”;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 34 Department of Information Technology
Query Decomposition
Its query graph is disconnected, which tells us that the query
is semantically incorrect.
www.ptit.edu.vn
DUR 36
E.ENO=G.ENO
TITLE= “Programer” E G.RESPONSIBILITY J
PNAME=”CAD/CAM”
E.ENAME
Result
Query graph
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 35 Department of Information Technology
Query Decomposition
Solutions to this problem:
Reject the query
www.ptit.edu.vn
Assume that there is an implicit Cartesian product
between relations G and J,
Infer (using the schema) the missing join predicate
G.PNO = J.PNO which transforms the query into
that of previous Example.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 36 Department of Information Technology
Query Decomposition
Elimination of Redundancy: The enriched query
qualification may contain redundant predicates. A
www.ptit.edu.vn
naive evaluation of a qualification with redundancy
can well lead to duplicated work. Such redundancy
and thus redundant work may be eliminated by
simplifying the qualification with the following well-
known idempotency rules:
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 37 Department of Information Technology
Query Decomposition
1. p p p 2. p true true
3. p p p 4. p p false
www.ptit.edu.vn
5. p true p 6. p p true
7. p false p 8. p1 (p1 p2) p1
9. p false false 10.p1 (p1 p2) p1
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 38 Department of Information Technology
Query Decomposition
Example: the SQL query
SELECT TITLE
www.ptit.edu.vn
FROM E
WHERE (NOT (TITLE = "Programmer")
AND (TITLE = "Programmer" OR TITLE = "Elect. Eng.")
AND NOT (TITLE = "Elect. Eng."))
OR ENAME = “J. Doe”;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 39 Department of Information Technology
Query Decomposition
Let:
p1: TITLE = “Programmer”,
www.ptit.edu.vn
p2: TITLE = “Elect. Eng.”,
p3: ENAME = “J. Doe”.
The query application is
( p1 (p1 p2) p2) p3
(( p1 p1 p2) ( p1 p2 p2)) p3
((false p2) ( p1 false) ) p3
(false false ) p3
p3
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 40 Department of Information Technology
Query Decomposition
The query can be simplified using the previous rules to
become
www.ptit.edu.vn
SELECT TITLE
FROM E
WHERE ENAME = "J. Doe";
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 41 Department of Information Technology
Query Decomposition
Rewriting: rewrites the query in relational algebra. it
is customary to represent the relational algebra query
www.ptit.edu.vn
graphically by an operator tree.
An operator tree is a tree in which a leaf node is a
relation stored in the database, and a non-leaf node is
an intermediate relation produced by a relational
algebra operator.
The sequence of operations is directed from the
leaves to the root, which represents the answer to
the query.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 42 Department of Information Technology
Query Decomposition
The transformation of a tuple relational calculus query into an
operator tree can easily be achieved as follows
A different leaf is created for each different tuple variable
www.ptit.edu.vn
(corresponding to a relation). In SQL, the leaves are
immediately available in the FROM clause.
The root node is created as a project operation involving the
result attributes. These are found in the SELECT clause in
SQL.
The qualification (SQL WHERE clause) is translated into the
appropriate sequence of relational operations (select, join,
union, etc.) going from the leaves to the root.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 43 Department of Information Technology
Query Decomposition
Example: “Find the names of employees other than J. Doe who
worked on the CAD/CAM project for either one or two
years”
www.ptit.edu.vn
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME != "J. Doe"
AND PROJ.PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24) ;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 44 Department of Information Technology
www.ptit.edu.vn Query Decomposition
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 45 Department of Information Technology
Query Decomposition
1. Commutativity of binary operators
R×SS×R
www.ptit.edu.vn
R⋈SS⋈R
2. Associativity of binary operators
R × (S × T) (R × S) × T
R ⋈ (S ⋈ T) (R ⋈ S) ⋈ T
3. Idempotence of unary operators
A’(A’’(R)) A’(R) , where A’, A’’ R and A’ A’’
p ( A ) ( p
1 1 2 ( A2 )
( R)) p1 ( A1 ) p2 ( A2 ) ( R)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 46 Department of Information Technology
Query Decomposition
4. Commuting selection with projection
A1 ,...,An ( p ( Ap ) ( R)) A1 ,...,An ( p ( Ap ) ( A1 ,...,An , Ap ( R)))
www.ptit.edu.vn
Note that if Ap is already a member of {A1,…,An}, the last
projection on [A1,…, An] on the right-hand side of the
equality is useless.
A1 ,...,An ( p ( Ap ) ( R)) p ( Ap ) ( A1 ,...,An , Ap ( R))
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 47 Department of Information Technology
Query Decomposition
5. Commuting selection with binary operators
p ( A ) ( R S ) p ( A ) ( R) S
www.ptit.edu.vn
p p
p( A ) (R
i p ( Ai , Bk )
S ) p ( Ai ) ( R ) p ( Ai , Bk ) S
p ( A ) ( R T ) p ( A ) ( R) p ( A ) (T )
i i i
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 48 Department of Information Technology
Query Decomposition
6. Commuting projection with binary operators
If C=A’B’, where A’ A, B’ B, and A, B are the sets of
attributes over which relations R and S, respectively, are
www.ptit.edu.vn
defined, we have
C ( R S ) A' ( R) B ' (S )
C ( R p(Ai , B j ) S ) A ' ( R) p(Ai ,B j )
B' (S )
C ( R S ) A' ( R) B ' (S )
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 49 Department of Information Technology
Query Decomposition
These rules can be used in four different ways:
First, they allow the separation of the unary operations,
simplifying the query expression.
www.ptit.edu.vn
Second, unary operations on the same relation may be
grouped so that access to a relation for performing unary
operations can be done only once.
Third, unary operations can be commuted with binary
operations so that some operations (e.g., selection) may be
done first.
Fourth, the binary operations can be ordered.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 50 Department of Information Technology
www.ptit.edu.vn Query Decomposition
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 51 Department of Information Technology
www.ptit.edu.vn Query Decomposition
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 52 Department of Information Technology
Localization of Distributed Data
General techniques for decomposing and restructuring
queries apply to both centralized and distributed DBMSs and
do not take into account the distribution of data.
www.ptit.edu.vn
Localization layer translates an algebraic query on global
relations into an algebraic query expressed on physical
fragments
Localization uses information stored in the fragment schema
A global relation can be reconstructed by applying the
reconstruction (or reverse fragmentation) rules and deriving
a relational algebra program whose operands are the
fragments. We call this a localization program.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 53 Department of Information Technology
Localization of Distributed Data
A naive way to localize a distributed query is to generate a
query where each global relation is substituted by its
localization program.
www.ptit.edu.vn
Localized query: replacing the leaves of the operator tree of
the distributed query with subtrees corresponding to the
localization programs
Reduction techniques: generate simpler and optimized
queries by using the transformation rules and the heuristics,
such as pushing unary operations down the tree
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 54 Department of Information Technology
Reduction for PHF
The horizontal fragmentation function distributes a relation
based on selection predicates
Example: Relation EMP(ENO, ENAME, TITLE) can be split
www.ptit.edu.vn
into three horizontal fragments EMP1, EMP2, and EMP3
1. EMP1 = 𝜎𝐸𝑁𝑂≤"𝐸3" (EMP)
2. EMP2 = 𝜎"𝐸3"<𝐸𝑁𝑂≤"𝐸6" (EMP)
3. EMP3 = 𝜎𝐸𝑁𝑂>"𝐸6" (EMP)
The localization program for an horizontally fragmented
relation is the union of the fragments
EMP = EMP1 EMP2 EMP3
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 55 Department of Information Technology
Reduction with Selection
Selections on fragments that have a qualification
contradicting the qualification of the fragmentation rule
generate empty relations.
www.ptit.edu.vn
Given a relation R that has been horizontally fragmented as
R1, R2, …, RN, where Ri = 𝜎𝑝𝑖 (R), the rule can be stated
formally as follows:
Rule 1:𝜎𝑝𝑗 𝑅𝑖 = ∅ if ∀𝑥 in R: ¬ 𝑝𝑖 𝑥 ⋀𝑝𝑗 𝑥
where pi and pj are selection predicates, x denotes a tuple, and
p(x) denotes “predicate p holds for x.”
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 56 Department of Information Technology
Reduction with Selection
Example:
SELECT *
www.ptit.edu.vn
FROM EMP
WHERE ENO = “E5” ;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 57 Department of Information Technology
Reduction with Join
Joins on horizontally fragmented relations can be simplified
when the joined relations are fragmented according to the
join attribute.
www.ptit.edu.vn
The simplification consists of distributing joins over unions
and eliminating useless joins
𝑅1 ∪ 𝑅2 ⋈ 𝑆 = 𝑅1 ⋈ 𝑆 ∪ 𝑅2 ⋈ 𝑆
where Ri are fragments of R and S is a relation.
Rule 2: 𝑅𝑖 ⋈ 𝑅𝑗 = ∅ if ∀𝑥 in Ri, ∀𝑦 in Rj :
¬ 𝑝𝑖 𝑥 ⋀𝑝𝑗 𝑦
fragments Ri and Rj are defined, respectively, according to
predicates pi and pj on the same attribute
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 58 Department of Information Technology
Reduction with Join
Example: Assume that relation EMP is fragmented between
EMP1, EMP2, and EMP3, as above, and that relation ASG is
fragmented as
www.ptit.edu.vn
1. ASG1 = 𝜎𝐸𝑁𝑂≤"𝐸3" (ASG)
2. ASG2 = 𝜎𝐸𝑁𝑂>"𝐸3" (ASG)
Consider the join query:
SELECT *
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 59 Department of Information Technology
www.ptit.edu.vn Reduction with Join
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 60 Department of Information Technology
Reduction for Vertical Fragmentation
The vertical fragmentation function distributes a relation
based on projection attributes
The localization program for a VF relation consists of the
www.ptit.edu.vn
join of the fragments on the common attribute.
Example: Relation EMP can be divided into two vertical
fragments where the key attribute ENO is duplicated:
1. EMP1 = ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 (EMP)
2. EMP2 = ∏𝐸𝑁𝑂,𝑇𝐼𝑇𝐿𝐸 (EMP)
The localization program is
EMP = EMP1 ⋈𝐸𝑁𝑂 EMP2
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 61 Department of Information Technology
Reduction for Vertical Fragmentation
Queries on vertical fragments can be reduced by
determining the useless intermediate relations and removing
the subtrees that produce them
www.ptit.edu.vn
Projections on a vertical fragment that has no attributes in
common with the projection attributes (except the key of
the relation) produce useless, though not empty relations.
Rule 3: ∏𝐷,𝐾 𝑅𝑖 is useless if the set of projection
attributes D is not in A’
Where relation R(A1, A2,…, An) is vertically fragmented as
𝑅𝑖 = ∏𝐴′ 𝑅 , and 𝐴′ ⊆ 𝐴1 , 𝐴2 , … , 𝐴𝑛
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 62 Department of Information Technology
Reduction for Vertical Fragmentation
Example: consider the query
SELECT ENAME
www.ptit.edu.vn
FROM EMP
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 63 Department of Information Technology
Reduction for Derived Fragmentation
If relation R is subject to derived horizontal fragmentation
due to relation S, the fragments of R and S that have the
same join attribute values are located at the same site.
www.ptit.edu.vn
S can be fragmented according to a selection predicate.
Derived fragmentation should be used only for one-to-many
(hierarchical) relationships of the form S → R, where a
tuple of S can match with n tuples of R, but a tuple of R
matches with exactly one tuple of S.
Note that derived fragmentation could be used for many-to-many
relationships provided that tuples of S (that match with n tuples of R)
are replicated.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 64 Department of Information Technology
Reduction for Derived Fragmentation
Example: Given a one-to-many relationship from EMP to
ASG, relation ASG(ENO, PNO, RESP, DUR) can be
indirectly fragmented according to the following rules:
www.ptit.edu.vn
1. ASG1 = ASG ⋈𝐸𝑁𝑂 EMP1
2. ASG2 = ASG ⋈𝐸𝑁𝑂 EMP2
Where: EMP1 = 𝜎𝑇𝐼𝑇𝐿𝐸="𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑒𝑟" (EMP)
EMP2 = 𝜎𝑇𝐼𝑇𝐿𝐸≠"𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑒𝑟" (EMP)
The localization program for a horizontally fragmented
relation is the union of the fragments
ASG = ASG1 ASG2
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 65 Department of Information Technology
Reduction for Derived Fragmentation
Queries on derived fragments can also be reduced: certain
joins will produce empty relations if the fragmentation
predicates conflict.
www.ptit.edu.vn
Example: the predicates of ASG1 and EMP2 conflict, thus
ASG1 ⋈ EMP2 = Ø
The reduced query is always preferable to the localized
query because the number of partial joins usually equals the
number of fragments of R.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 66 Department of Information Technology
Reduction for Derived Fragmentation
Example:
SELECT *
FROM EMP, ASG
www.ptit.edu.vn
WHERE ASG.ENO = EMP.ENO
AND TITLE = "Mech. Eng."
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 67 Department of Information Technology
www.ptit.edu.vn Reduction for Derived Fragmentation
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 68 Department of Information Technology
www.ptit.edu.vn Reduction for Derived Fragmentation
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 69 Department of Information Technology
Reduction for Hybrid Fragmentation
The goal of hybrid fragmentation is to efficiently support
queries involving projection, selection, and join.
www.ptit.edu.vn
The optimization of an operation or of a combination of
operations is always done at the expense of other
operations.
The localization program for a hybrid fragmented
relation uses unions and joins of fragments
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 70 Department of Information Technology
Reduction for Hybrid Fragmentation
Example:
1. EMP1 = 𝜎𝐸𝑁𝑂≤"𝐸4" ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 𝐸𝑀𝑃
www.ptit.edu.vn
2. EMP2 = 𝜎𝐸𝑁𝑂>"𝐸4" ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 𝐸𝑀𝑃
3. EMP3 = ∏𝐸𝑁𝑂,𝑇𝐼𝑇𝐿𝐸 𝐸𝑀𝑃
The localization program is
𝐸𝑀𝑃 = 𝐸𝑀𝑃1 ∪ 𝐸𝑀𝑃2 ⋈𝐸𝑁𝑂 𝐸𝑀𝑃3
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 71 Department of Information Technology
Reduction for Hybrid Fragmentation
Queries on hybrid fragments can be reduced by
combining the rules:
www.ptit.edu.vn
1. Remove empty relations generated by contradicting
selections on horizontal fragments.
2. Remove useless relations generated by projections on
vertical fragments.
3. Distribute joins over unions in order to isolate and
remove useless joins.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 72 Department of Information Technology
Reduction for Hybrid Fragmentation
Example: application of rules (1) and (2)
SELECT ENAME
FROM EMP
www.ptit.edu.vn
WHERE ENO="E5“
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 73 Department of Information Technology
Query Optimization in DDBS
Query decomposition and data localization are the two
successive functions that map a calculus query, expressed on
distributed relations, into an algebraic query (query
www.ptit.edu.vn
decomposition), expressed on relation fragments (data
localization).
Query decomposition can generate an algebraic query simply
by translating into relational operations the predicates and
the target statement as they appear.
Data localization can, in turn, express this algebraic query on
relation fragments, by substituting for each distributed
relation an algebraic query corresponding to its
fragmentation rules.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 76 Department of Information Technology