UNIT 1 QUERY PROCESSING AND EVALUATION
Structure
1.0 Introduction
1.1 Objectives
1.2 Query Processing: An Introduction
1.2.1 Optimisation
1.2.2 Measure of Query Cost
1.3 Select Operation
1.4 Sorting
1.5 Join Operation
1.5.1 Nested-Loop Join
1.5.2 Block Nested-Loop Join
1.5.3 Indexed Nested-Loop Join
1.5.4 Merge-Join
1.5.5 Hash-Join
1.5.6 Complex Join
1.6 Other Operations
1.7 Representation and Evaluation of Query Expressions
1.8 Creation of Query Evaluation Plans
1.8.1 Transformation of Relational Expressions
1.8.2 Query Evaluation Plans
1.8.3 Choice of Evaluation Plan
1.8.4 Cost Based Optimisation
1.8.5 Storage and Query Optimisation
1.9 View and Query Processing
1.9.1 Materialised View
1.9.2 Materialised Views and Query Optimisation
1.10 Summary
1.11 Solutions/Answers
1.0 INTRODUCTION
The query language SQL is one of the main reasons for the success of RDBMSs. A user
only needs to specify a query in SQL, which is close to the English language, and does
not need to specify how the query is to be evaluated. However, the query needs to be
evaluated efficiently by the DBMS. But how is a query evaluated efficiently? This
unit attempts to answer this question. The unit covers the basic principles of query
evaluation, the cost of query evaluation and the evaluation of join queries in detail.
It also provides information about query evaluation plans and the role of storage in
query evaluation and optimisation. This unit thus introduces you to the complexity of
query evaluation in a DBMS.
1.1 OBJECTIVES

1.2 QUERY PROCESSING: AN INTRODUCTION
Before defining the measures of query cost, let us begin by defining query processing.
Let us take a look at Figure 1.
In the first step, scanning, parsing and validating are done to translate the query into its
internal form. This is then further translated into relational algebra (an intermediate
query form). The parser checks the syntax and verifies the relations. The query is then
optimised into a query plan, which is compiled into code that can be executed by the
database runtime processor.
Each relational algebraic operation can be evaluated using one of several different
algorithms. Correspondingly, a relational-algebraic expression can be evaluated in
many ways.
Query Optimisation: Amongst all equivalent plans choose the one with the lowest
cost. Cost is estimated using statistical information from the database catalogue, for
example, number of tuples in each relation, size of tuples, etc.
Thus, in query optimisation we find an evaluation plan with the lowest cost. The cost
estimation is made on the basis of heuristic rules.
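For illustration only (this example is not part of the unit), once cost estimates are available for a set of equivalent plans, choosing the plan reduces to taking a minimum; the plan names and costs below are invented:

# Hypothetical cost estimates (in block transfers) for three
# equivalent evaluation plans of the same query.
plans = {'plan_a': 1800, 'plan_b': 50100, 'plan_c': 1000100}

best = min(plans, key=plans.get)   # choose the lowest estimated cost
print(best, plans[best])           # plan_a 1800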
Typically, disk access is the predominant cost, as disk transfer is a very slow event; it
is also relatively easy to estimate. It is measured by taking into account the number of
disk seeks, the number of blocks read and the number of blocks written.
Please note that the cost of writing a block is higher than the cost of reading a block,
because the data is read back after being written to verify that the write was
successful. However, for the sake of simplicity, we will just use the number of block
transfers from disk as the cost measure. We will also ignore the difference in cost
between sequential and random I/O, as well as CPU and communication costs. The
I/O cost depends on the search criterion, i.e., a point/range query on an ordering/other
field, and on the file structure: heap, sorted or hashed. It also depends on the use of
indices, such as primary, clustering, secondary, B+ tree and multilevel indices. There
are other cost factors as well; these include buffering, disk placement, materialisation
and overflow/free space management.
In the subsequent sections, let us try to find the cost of various operations.

1.3 SELECT OPERATION

File scan: These are the search algorithms that locate and retrieve records that fulfil a
selection condition in a file. The following are the two basic file scan algorithms for
the selection operation:
1) Linear search: This algorithm scans each file block and tests all records to see
whether they satisfy the selection condition.
The cost of this algorithm (in terms of block transfers) is defined as:
Cost of searching records satisfying a condition = Number of blocks in the file = Nb.
Cost of searching a key attribute value = Average number of block transfers for
locating the value (on average, half of the file needs to be traversed), so it is = Nb/2.
These two values may be calculated from the statistics of the database.
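As a small illustration (not part of the original text), both estimates can be computed directly from the catalogue statistic Nb; the Python sketch below uses an assumed file size of 500 blocks:

def linear_scan_cost(nb):
    # Non-key condition: every block of the file must be read.
    return nb

def key_search_cost(nb):
    # Equality on a key attribute: on average, half of the file is
    # traversed before the single matching record is found.
    return nb / 2

print(linear_scan_cost(500), key_search_cost(500))   # 500 250.0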
Index scan: Search algorithms that use an index are restricted because the selection
condition must be on the search-key of the index.
3) (a) Primary index scan for equality: This search retrieves a single record that
satisfies the corresponding equality condition. The cost here can be calculated as:
Cost = Height traversed in the index to locate the block pointer + 1 (the block
containing the record is transferred for access).
(b) Hash key: It retrieves a single record in a direct way; thus, the cost in this case
may be considered as:
Cost = Block transfers needed for finding the hash target + 1.
4) Primary index scan for comparison: Assuming that the relation is sorted on the
attribute(s) being compared (<, >, etc.), we first need to locate the first record
satisfying the condition, after which the records are scanned forward or backward, as
the condition requires, displaying all the records. Thus, the cost in this case would be:
Cost = Number of block transfers to locate the value in the index + Transfer of all the
blocks of data satisfying that condition.
Please note that, from the cost point of view, we can roughly calculate the number of
blocks satisfying the condition as:
Number of values that satisfy the condition × average number of tuples per attribute
value / blocking factor of the relation.
5) (a) Secondary index, equality: For an equality condition on a secondary index, the
cost is:
Cost = Cost of accessing the index + Number of records retrieved (it can be very
expensive).
Each record may be on a different block; thus, one block access is required for each
retrieved record (this is the worst-case cost).
(b) Secondary index, comparison: For a query of the type that uses a comparison on a
secondary index value (≥ a value), the index can be used to find the first index entry
greater than that value; the index is then scanned sequentially from there till the end,
while the pointers to records are followed.
For a ≤ type query, just scan the leaf pages of the index from the beginning, following
the pointers to records, till the first entry exceeding the value is found.
In either case, retrieving the records that are pointed to may require one I/O per
record. Please note that a linear file scan may be cheaper if many records are to be
fetched.
7) Conjunctive selection using one index: In such a case, select any of the algorithms
given earlier on one or more of the conditions. If possible, test the other conditions on
these tuples after fetching them into the memory buffer.
1.4 SORTING
Now we need to take a look at the sorting techniques and their costs. Sorting can be
performed in the following ways:
1) Use an existing applicable ordered index (e.g., B+ tree) to read the relation in
sorted order.
2) Build an index on the relation, and then use the index to read the relation in
sorted order. (Options 1 & 2 may lead to one block access per tuple.)
3) For relations that fit in the memory, techniques like quicksort can be used.
4) For relations that do not fit in the memory, external sort-merge is a good choice.
The external sort-merge algorithm, with M buffer blocks available in the memory,
works as follows:
Phase 1 (creation of sorted runs): Let i be 0 initially. Repeat steps (a) to (d) till the
end of the relation:
(a) Read M blocks of the relation into the memory;
(b) Sort these in-memory blocks;
(c) Write the sorted data to a temporary file, temp (i);
(d) i = i + 1;
Let the final value of i be N.
Phase 2 (merging of runs): If N < M, read one block of each of the N temporary files
into an input buffer block and reserve one block as the output buffer block. Repeat
steps (a) to (e) until all input buffer blocks are empty:
(a) Select the first record (in sort order) among all input buffer blocks;
(b) Write the record to the output buffer block;
(c) If the output buffer block is full then write it to disk and empty it for the
next set of data. This step may be performed automatically by the Operating
System;
(d) Delete the record from its input buffer block;
(e) If the buffer block becomes empty then read the next block (if any) of the
temporary file into the buffer.
If N ≥ M, several merge passes are required. In each pass, contiguous groups of M − 1
partitions are merged, and a pass reduces the number of temporary files temp (i) by a
factor of M − 1. For example, if M = 11 and there are 90 temporary files, one pass
reduces the number of temporary files to 9, each temporary file being 10 times the
size of the earlier partitions.
Repeated passes are performed till all partitions have been merged into one.
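A compact Python sketch of external sort-merge is given below (an illustration, not from the unit: runs are kept in memory as lists, whereas a real implementation would write each run to a temporary file on disk):

import heapq

def external_sort(records, M):
    # Phase 1: create sorted runs of at most M records each
    # (M plays the role of the number of memory buffer blocks).
    runs = [sorted(records[i:i + M]) for i in range(0, len(records), M)]
    # Phase 2: merge the runs, repeatedly selecting the smallest
    # record among the fronts of all the runs, as in steps (a) to (e).
    return list(heapq.merge(*runs))

print(external_sort([5, 1, 9, 3, 7, 2, 8, 4], M=3))   # [1, 2, 3, 4, 5, 7, 8, 9]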
Figure 2 shows an example of external sorting using sort-merge.
Cost Analysis:
Cost analysis may be performed by counting block transfers. The creation of the
sorted runs reads and writes every block of the relation once. The number of merge
passes is ⌈log M−1 (Nb / M)⌉, and each pass reads and writes every block, except that
the final result need not be written back. Thus, the total number of block transfers for
external sorting of a relation of Nb blocks is:
Nb × (2 ⌈log M−1 (Nb / M)⌉ + 1)

1.5 JOIN OPERATION

There are several different algorithms that can be used to implement joins:
• Nested-loop join
• Block nested-loop join
• Indexed nested-loop join
• Merge-join
• Hash-join
• Complex join.
The choice of join algorithm is based on the cost estimate. Let us use the following
information to elaborate the same:
MARKS (enrolno, subjectcode, marks): 10000 rows, 500 blocks
STUDENT (enrolno, name, dob): 2000 rows, 100 blocks
1.5.1 Nested-Loop Join
• In the nested-loop join there is one outer relation and one inner relation.
• It does not use or require indices. It can be used with any kind of join condition.
However, it is expensive, as it examines every pair of tuples in the two relations.
• If there is only enough memory to hold one block of each relation, the number
of disk accesses can be calculated as:
For each tuple of STUDENT, all the blocks of MARKS need to be accessed; in
addition, each block of STUDENT is read once. Thus, the number of block accesses =
Number of tuples of STUDENT × Number of blocks of MARKS + Number of blocks
of STUDENT = 2000 × 500 + 100 = 1,000,100.
However, if one or both of the relations fit entirely in the memory, each block needs
to be transferred only once, so the total number of block transfers in such a case =
Number of blocks of STUDENT + Number of blocks of MARKS = 100 + 500 = 600.
If only the smaller of the two relations fits entirely in the memory, then use that as the
inner relation and the bound of 600 block transfers still holds.
Cost for the worst case:
Number of tuples of outer relation × Number of blocks of inner relation + Number of
blocks of outer relation.
1.5.2 Block Nested-Loop Join
A variant of the nested-loop join, the block nested-loop join, pairs every block of the
inner relation with every block of the outer relation, so that the inner relation is
scanned once per block (rather than once per tuple) of the outer relation. With
STUDENT as the outer relation and MARKS as the inner relation, the worst case of
block accesses in this case = Number of blocks of outer relation (STUDENT) ×
Number of blocks of inner relation (MARKS) + Number of blocks of outer relation
(STUDENT) = 100 × 500 + 100 = 50,100. A sketch of this algorithm follows.
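Here is a minimal Python sketch of the block nested-loop join (illustrative only; blocks are modelled as lists of dictionary tuples, and the relation and attribute names are assumptions):

def block_nested_loop_join(outer_blocks, inner_blocks, attr):
    # Every block of the inner relation is read once per *block*
    # (rather than once per tuple) of the outer relation.
    result = []
    for ob in outer_blocks:
        for ib in inner_blocks:            # one inner pass per outer block
            for t_out in ob:
                for t_in in ib:
                    if t_out[attr] == t_in[attr]:
                        result.append({**t_out, **t_in})
    return result

student = [[{'enrolno': 1001, 'name': 'Ajay'},
            {'enrolno': 1002, 'name': 'Aman'}]]
marks = [[{'enrolno': 1001, 'subjectcode': 'MCS-011', 'marks': 55}]]
print(block_nested_loop_join(student, marks, 'enrolno'))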
1.5.3 Indexed Nested-Loop Join
Index scans can replace file scans if the join is an equi-join or natural join, and an
index is available on the inner relation’s join attribute.
For each tuple si in the outer relation STUDENT, use the index to look up tuples in
MARKS that satisfy the join condition with tuple si.
In the worst-case scenario, the buffer has space for only one page of STUDENT;
then, for each tuple of STUDENT, we perform an index lookup on the MARKS
index. If a supporting index does not exist, it can be constructed as and when needed.
If indices are available on the join attributes of both STUDENT and MARKS, then
use the relation with fewer tuples as the outer relation.
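The following Python sketch illustrates the idea (an assumption-laden toy: a plain dictionary stands in for a real B+ tree or hash index on the inner relation's join attribute):

from collections import defaultdict

def build_index(relation, attr):
    # In-memory stand-in for an index on the join attribute.
    index = defaultdict(list)
    for t in relation:
        index[t[attr]].append(t)
    return index

def indexed_nested_loop_join(outer, index, attr):
    # One index lookup per outer tuple replaces a scan of the
    # whole inner relation.
    result = []
    for t_out in outer:
        for t_in in index.get(t_out[attr], []):
            result.append({**t_out, **t_in})
    return result

student = [{'enrolno': 1001, 'name': 'Ajay'}]
marks = [{'enrolno': 1001, 'subjectcode': 'MCS-011', 'marks': 55}]
print(indexed_nested_loop_join(student, build_index(marks, 'enrolno'), 'enrolno'))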
1.5.4 Merge-Join
The merge-join is applicable to equi-joins and natural joins only. It has the following
process:
1) Sort both relations on their join attribute (if not already sorted).
2) Merge the sorted relations to join them. The join step is similar to the merge stage
of the external sort-merge algorithm; the only difference lies in the manner in which
duplicate values of the join attribute are treated, i.e., every pair of tuples with the
same value of the join attribute must be matched.
STUDENT                              MARKS
Enrolno   Name     .....            Enrolno   Subjectcode   Marks
1001      Ajay     .....            1001      MCS-011       55
1002      Aman     .....            1001      MCS-012       75
1005      Rakesh   .....            1002      MCS-013       90
1100      Raman    .....            1005      MCS-015       75
......    ......   .....            ......    .........     ......

Figure 3: Sample relations for computing join
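A minimal Python sketch of the join step follows (illustrative, not from the unit); both inputs must already be sorted on the join attribute, and every pair of tuples with the same join-attribute value is matched:

def merge_join(r, s, attr):
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][attr] < s[j][attr]:
            i += 1
        elif r[i][attr] > s[j][attr]:
            j += 1
        else:
            v = r[i][attr]
            j_start = j                      # rescan duplicates in s
            while i < len(r) and r[i][attr] == v:
                j = j_start
                while j < len(s) and s[j][attr] == v:
                    result.append({**r[i], **s[j]})
                    j += 1
                i += 1
    return result

student = [{'enrolno': 1001, 'name': 'Ajay'}, {'enrolno': 1002, 'name': 'Aman'}]
marks = [{'enrolno': 1001, 'subjectcode': 'MCS-011', 'marks': 55},
         {'enrolno': 1001, 'subjectcode': 'MCS-012', 'marks': 75}]
print(merge_join(student, marks, 'enrolno'))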
Hybrid Merge-Join
This is applicable only when the join is an equi-join or a natural join and one relation
is sorted and the other has a secondary B+-tree index on the join attribute.
1.5.5 Hash-Join
This is applicable to both equi-joins and natural joins. A hash function h is used to
partition the tuples of both relations, where h maps the joining attribute (enrolno in
our example) values to {0, 1, ..., n−1}.
The join attribute is hashed to the join-hash partitions. In the example of Figure 4,
we have used the mod 100 function for hashing, with n = 100.
Figure 4: Hash partitioning of the STUDENT and MARKS relations
Once the partition tables of STUDENT and MARKS are made on the enrolment
number, then only the corresponding partitions will participate in the join as:
A STUDENT tuple and a MARKS tuple that satisfy the join condition will have the
same value for the join attribute. Therefore, they will be hashed to the same partition
and thus can be joined easily.
• For each partition si of s, load the partition into memory and build an in-memory
hash index on the join attribute.
• Read the tuples in ri from the disk, one block at a time. For each tuple in ri, locate
each matching tuple in si using the in-memory hash index, and output the
concatenation of their attributes.
In this method, the relation s is called the build relation and r is called the probe
relation. The value n (the number of partitions) and the hash function h are chosen in
such a manner that each partition si fits in the memory. Typically, n is chosen as
⌈(Number of blocks of s) / M⌉ × f, where f is a “fudge factor”, typically around 1.2.
The probe relation partitions ri need not fit in memory. Using the formula for n as
above, the average size of a partition si will be less than M blocks, thereby leaving
some room for the in-memory index. If the build relation s is very large, the value of
n given by the above formula may be greater than M − 1, i.e., the number of buckets
exceeds the number of buffer pages. In such a case, the relation s can be partitioned
recursively: instead of partitioning n ways, use M − 1 partitions for s and further
partition each of the M − 1 partitions using a different hash function. The same
partitioning method should be used on r. Recursive partitioning is rarely required; for
example, it is not needed for relations of 1 GB or less with a memory size of 2 MB
and a block size of 4 KB.
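The partition-then-build-and-probe structure described above can be sketched in Python as follows (illustrative only; Python's built-in hash stands in for the hash function h, and n is kept small):

def hash_join(build, probe, attr, n=8):
    # Step 1: partition both relations with the same hash function h.
    h = lambda v: hash(v) % n
    s_parts = [[] for _ in range(n)]
    r_parts = [[] for _ in range(n)]
    for t in build:
        s_parts[h(t[attr])].append(t)
    for t in probe:
        r_parts[h(t[attr])].append(t)
    # Step 2: for each partition, build an in-memory hash index on
    # s_i and probe it with the tuples of r_i.
    result = []
    for i in range(n):
        index = {}
        for t in s_parts[i]:
            index.setdefault(t[attr], []).append(t)
        for t in r_parts[i]:
            for m in index.get(t[attr], []):
                result.append({**m, **t})
    return result

student = [{'enrolno': 1001, 'name': 'Ajay'}]
marks = [{'enrolno': 1001, 'subjectcode': 'MCS-011', 'marks': 55}]
print(hash_join(student, marks, 'enrolno'))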
The cost of the hash-join can be computed as follows:
(i) Cost of partitioning the two relations: both relations are read once and written
back once during partitioning, so
Cost 1 = 2 (blocks of r + blocks of s)
(ii) Cost of performing the hash-join using build and probe, which requires one more
block transfer for reading each block of the partitions:
Cost 2 = (blocks of r + blocks of s)
(iii) There are a few more blocks in the main memory that may be used for
evaluation; they may be read or written back. We ignore this cost, as it is very small
in comparison to Cost 1 and Cost 2.
Thus, the total cost = Cost 1 + Cost 2
= 3 (blocks of r + blocks of s)
If recursive partitioning is required, the partitioning phase is repeated
⌈log M−1 (blocks of s)⌉ − 1 times, each pass reading and writing both relations; the
cost for steps (ii) and (iii) here will be the same as that given in steps (ii) and (iii)
above.
Thus, the total cost = 2 (blocks of r + blocks of s) × (⌈log M−1 (blocks of s)⌉ − 1)
+ (blocks of r + blocks of s).
Since the number of partitioning passes grows with the size of s, it is advisable to
choose the smaller relation as the build relation. If the entire build input can be kept in
the main memory, n can be set to 1 and the algorithm need not partition the relations,
but may still build an in-memory index; in such a case, the cost estimate goes down to
(Number of blocks of r + Number of blocks of s).
Handling of Overflows
Hash-table overflow can occur even if s is recursively partitioned, i.e., some partition
si may not fit in the memory. This may happen if there are many tuples in s with the
same value for the join attributes, or if the hash function is bad.
Partitioning is said to be skewed if some partitions have significantly more tuples than
the others. This is the overflow condition. The overflow can be handled in a variety of
ways:
Resolution (during the build phase): The overflowed partition of s is further
partitioned using a different hash function. The corresponding partition of r must be
partitioned further in the same way.
Avoidance (during the build phase): Partition the build relation into many small
partitions, then combine some of them in such a way that each combined partition
fits in the memory.
However, such approaches for handling overflows fail with large numbers of
duplicates. One option of avoiding such problems is to use block nested-loop join on
the overflowed partitions.
Let us explain the hash-join and its cost for the natural join STUDENT ⋈ MARKS.
Assume a memory size of 25 blocks ⇒ M = 25.
Select the build relation s as STUDENT, since it has the smaller number of blocks
(100 blocks), and the probe relation r as MARKS (500 blocks).
Number of partitions to be created for STUDENT = (blocks of STUDENT / M) ×
fudge factor (1.2) = (100/25) × 1.2 = 4.8, i.e., 5 partitions of about 20 blocks each.
Since 5 ≤ M − 1, no recursive partitioning is required, and the total cost
= 3 × (100 + 500) = 1800 block transfers.
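The same arithmetic can be wrapped into a small Python helper (a sketch of the cost formulas above, not part of the unit):

import math

def hash_join_cost(b_r, b_s, M, fudge=1.2):
    # Number of partitions needed for the build relation s.
    n = math.ceil(b_s / M * fudge)
    if n <= M - 1:                      # no recursive partitioning
        return 3 * (b_r + b_s)
    passes = math.ceil(math.log(b_s, M - 1)) - 1
    return 2 * (b_r + b_s) * passes + (b_r + b_s)

print(hash_join_cost(b_r=500, b_s=100, M=25))   # 1800 block transfers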
Hybrid Hash-Join
This is useful when the size of the memory is relatively large, but the build input is
still larger than the memory. Hybrid hash-join keeps the first partition of the build
relation in the memory. In our example, the first partition of STUDENT is maintained
in the first 20 blocks of the buffer and is not written to the disk. Similarly, the tuples
of MARKS that hash to the first partition are used right away for probing, instead of
being written out and read back. Thus, hybrid hash-join has a cost of
3 (80 + 400) + 20 + 100 = 1560 block transfers, instead of the 1800 needed with plain
hash-join.
Hybrid hash-join is most useful if M is large, such that we can have bigger partitions.
1.5.6 Complex Join
A join with a disjunctive condition can be computed either by using the nested-loop
or block nested-loop join, or as the union of the records of the individual joins.
1.6 OTHER OPERATIONS
There are many other operations that are performed in database systems. Let us
introduce these processes in this section.
Duplicate Elimination: Duplicate elimination may be implemented using hashing or
sorting. On sorting, duplicates will be adjacent to each other and thus may be
identified and deleted. An optimisation is to delete duplicates during run generation
as well as during the intermediate merge steps of external sort-merge. Hashing is
similar: duplicates will fall into the same bucket and can therefore be eliminated
easily.
Projection: It may be implemented by performing the projection process on each
tuple, followed by duplicate elimination.
Aggregate Function Execution: Aggregate functions can be implemented in a
manner similar to duplicate elimination. Sorting or hashing can be used to bring tuples
in the same group together, and then aggregate functions can be applied to each group.
An optimised solution is to combine tuples of the same group during run generation
and intermediate merges, by computing partial aggregate values. For count, min, max
and sum, a single running aggregate value can be kept over the tuples found so far in
the group; when combining partial aggregates for count or sum, the partial values are
added up. For calculating the average, keep the running sum and the count separately,
and divide the sum by the count at the end.
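A minimal Python sketch of this partial-aggregation idea for the average function (illustrative names, not from the unit):

def grouped_average(tuples, group_attr, value_attr):
    # Maintain the partial aggregates (sum, count) per group; the
    # average is computed only at the very end.
    partial = {}
    for t in tuples:
        s, c = partial.get(t[group_attr], (0, 0))
        partial[t[group_attr]] = (s + t[value_attr], c + 1)
    return {g: s / c for g, (s, c) in partial.items()}

marks = [{'subjectcode': 'MCS-011', 'marks': 55},
         {'subjectcode': 'MCS-011', 'marks': 75},
         {'subjectcode': 'MCS-012', 'marks': 90}]
print(grouped_average(marks, 'subjectcode', 'marks'))
# {'MCS-011': 65.0, 'MCS-012': 90.0}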
Set operations (∪, ∩ and −) can be implemented using either a variant of merge-join
after sorting, or a variant of hash-join.
Hashing:
1) Partition both relations using the same hash function, thereby creating
r0, ..., rn−1 and s0, ..., sn−1.
2) Process each partition i as follows: build an in-memory hash index on ri, then
probe it with the tuples of si. For r ∪ s, add to the index those tuples of si not already
present and output all the indexed tuples; for r ∩ s, output only those tuples of si that
are found in the index; for r − s, delete from the index the tuples found in si and
output the remaining tuples.
1.7 REPRESENTATION AND EVALUATION OF QUERY EXPRESSIONS

To find the results of the student(s) whose phone number is ‘1129250025’, the
following query may be given:
SELECT enrolno, name, subjectcode, grade
FROM STUDENT s, MARKS m
WHERE s.enrolno = m.enrolno AND phone = ’1129250025’
The equivalent relational algebraic query for this will be:
Π enrolno, name, subjectcode, grade (σ phone = ’1129250025’ (STUDENT ⋈ MARKS))
This is a very good internal representation; however, it may be a good idea to convert
the relational algebraic expression into a query tree, on which the algorithms for
query optimisation can be designed easily. In a query tree, the internal nodes are the
operators and the leaves are the relations. The query tree for the relational expression
above would be:
Π enrolno, name, subjectcode, grade
            |
σ phone = ’1129250025’
            |
            ⋈
          /   \
   STUDENT     MARKS

Figure 5: A sample query tree
In the previous section we have seen the algorithms for individual operations. Now let
us look at the methods for evaluating an entire expression. In general we use two
methods:
• Materialisation
• Pipelining.

Materialisation
Evaluate one operation at a time, starting at the lowest-level operations in the
expression tree. Materialise (store) the result of each evaluated operation in a
temporary relation on the disk and use it as input for the evaluation of the next-level
operations. The cost of this method is the sum of the costs of the individual
operations plus the cost of writing the intermediate results to the disk and reading
them back.
Pipelining
Evaluate several operations simultaneously in a pipelined, multi-threaded manner,
passing the tuples output by one operation to the next parent operation as input, even
while the first operation is still being executed. In the previous expression tree, the
selection does not store (materialise) its result; instead, it passes its tuples directly to
the join. Similarly, the join does not store its result, but passes its tuples directly to
the projection. Thus, there is no need to store a temporary relation on the disk for
each operation. Pipelining may not always be possible or easy, for example, for sort
or hash-join.
One of the pipelined execution methods may involve a buffer that is filled with result
tuples by the lower-level operation, while records are picked up from the buffer by
the higher-level operation.
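As an illustrative Python sketch (not from the unit), such a pipeline can be built from generators, each operator pulling tuples from its child on demand:

def scan(relation):
    # Leaf operator: produce the stored tuples one at a time.
    for t in relation:
        yield t

def select(predicate, child):
    # Pass qualifying tuples straight to the parent operator,
    # without materialising an intermediate relation.
    for t in child:
        if predicate(t):
            yield t

def project(attrs, child):
    for t in child:
        yield {a: t[a] for a in attrs}

student = [{'enrolno': 1001, 'name': 'Ajay', 'phone': '1129250025'},
           {'enrolno': 1002, 'name': 'Aman', 'phone': '1129250099'}]
plan = project(['enrolno', 'name'],
               select(lambda t: t['phone'] == '1129250025', scan(student)))
for row in plan:        # tuples flow through the pipeline on demand
    print(row)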
Complex Joins
When an expression involves three relations then we have more than one strategy for
the evaluation of the expression. For example, join of relations such as:
STUDENT ⋈ MARKS ⋈ SUBJECTS
may involve the following three strategies:
Strategy 1: Compute STUDENT ⋈ MARKS first, and then join the result with
SUBJECTS.
Strategy 2: Compute MARKS ⋈ SUBJECTS first, and then join STUDENT with the
result.
Strategy 3: Perform the pair of joins at once. For each tuple m in MARKS, look up
the corresponding tuples in STUDENT and the corresponding tuples in SUBJECTS.
Each tuple of MARKS will be examined only once.
Strategy 3 combines two operations into one special-purpose operation that may be
more efficient than implementing two joins of two relations each.
The cost difference between a good and a bad method of evaluating a query can be
enormous. We therefore need to estimate the cost of operations using statistical
information about the relations, for example, the number of tuples and the number of
distinct values of an attribute. Statistics help in estimating the size of intermediate
results, which is needed to compute the cost of complex expressions.
1.8 CREATION OF QUERY EVALUATION PLANS

Let us now discuss the steps of query evaluation plan development in more detail.
1.8.1 Transformation of Relational Expressions
Let us first define certain equivalence rules that may be used to generate equivalent
relational expressions.
Equivalence Rules
1) The conjunctive selection operations can be equated to a sequence of individual
selections. It can be represented as:
σ θ1∧θ2 (E) = σ θ1 (σ θ2 (E))
2) The selection operations are commutative, that is,
σ θ1 (σ θ2 (E)) = σ θ2 (σ θ1 (E))
3) Only the last of the sequence of projection operations is needed, the others can
be omitted.
π attriblist1 (π attriblist2 (π attriblist3 ... (E) ...)) = π attriblist1 (E)
4) The selection operations can be combined with Cartesian products and theta
join operations.
σ θ1 (E1 × E2) = E1 ⋈θ1 E2
and
σ θ2 (E1 ⋈θ1 E2) = E1 ⋈θ2∧θ1 E2
5) The theta-join operations are commutative:
E1 ⋈θ E2 = E2 ⋈θ E1
6) The natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Theta joins are also associative, but with the proper distribution of the joining
conditions.
7) The selection operation distributes over the theta-join operation when all the
attributes in the selection predicate involve only the attributes of one of the
expressions (E1) being joined:
σ θ1 (E1 ⋈θ E2) = (σ θ1 (E1)) ⋈θ E2
8) The projection operation distributes over the theta-join operation, provided that the
join condition involves only attributes present in the projection lists:
π attrlist1 ∪ attrlist2 (E1 ⋈θ E2) = (π attrlist1 (E1)) ⋈θ (π attrlist2 (E2))
9) The set operations of union and intersection are commutative. But set difference
is not commutative.
10) The set union and intersection operations are associative.
11) The selection operation can be distributed over the union, intersection and
set-difference operations.
The rules given above are quite general, and a few heuristic rules may be derived
from them that help in modifying a relational expression beneficially. These rules
are:
(1) Combining a cascade of selections into a conjunction and testing all the
predicates on the tuples at the same time:
σ θ2 (σ θ1 (E)) = σ θ2∧θ1 (E)
(2) Combining a cascade of projections into the single outermost projection:
π4 (π3 (……(E) ..)) = π4 (E)
(3) Commuting the selection and projection, or vice versa, sometimes reduces cost.
(4) Using associative or commutative rules for Cartesian product or joining to find
various alternatives.
(5) Moving the selection and projection (they may have to be modified) before joins.
The selection and projection result in the reduction of the number of tuples and
attributes and therefore may reduce the cost of joining.
(6) Commuting the projection and selection with Cartesian product or union.
Let us explain the use of some of these rules with the help of an example. Consider
the query on the relations:
STUDENT (enrolno, name, phone)
MARKS (enrolno, subjectcode, grade)
SUBJECTS (subjectcode, sname)
Consider the query: Find the enrolment number, name, and grade of those students
who have secured an A grade in the subject DBMS. One of the possible solutions to
this query may be:
σ sname=DBMS ∧ grade=A ((STUDENT ⋈ MARKS) ⋈ SUBJECTS)
The selection conditions may be moved closer to the relations. The selection
conditions given in the expression above are: sname = DBMS and grade = A. These
two conditions relate to different tables: sname is available only in the SUBJECTS
table and grade only in the MARKS table. Thus, the selection conditions will be
mapped accordingly, as shown below. The equivalent expression will be:
(STUDENT ⋈ σ grade=A (MARKS)) ⋈ σ sname=DBMS (SUBJECTS)
The expected sizes of MARKS and SUBJECTS after selection will be small, so it
may be a good idea to join MARKS with SUBJECTS first. Hence, the associative law
of JOIN may be applied, giving:
STUDENT ⋈ (σ grade=A (MARKS) ⋈ σ sname=DBMS (SUBJECTS))
Finally, the projections may be moved inside (modified so that the join attributes are
retained). Thus, the resulting expression may be:
STUDENT ⋈ (π enrolno, subjectcode, grade (σ grade=A (MARKS)) ⋈
π subjectcode (σ sname=DBMS (SUBJECTS)))
1.8.2 Query Evaluation Plans
An evaluation plan defines exactly which algorithm is to be used for each operation
and how the execution of the operations is coordinated. For example, Figure 6 shows
the previous query tree annotated with an evaluation plan.
                          merge-join
                         /          \
              (pipeline)            (sort on enrolno)
               STUDENT               block nested-loop join
                                     /                  \
                             (pipeline)              (pipeline)
                            σ grade=A               σ sname=DBMS
                      (use clustering index        (use linear scan)
                            on grade)
                             MARKS                    SUBJECTS

Figure 6: A query evaluation plan
1.9 VIEW AND QUERY PROCESSING

1.9.1 Materialised View
Materialising a view would be very useful if its result is required frequently, as it
saves the effort of finding and aggregating multiple tuples every time. The task of
keeping a materialised view up-to-date with the underlying data is known as
materialised view maintenance. Materialised views can be maintained by
recomputation on every update. A better option is incremental view maintenance, in
which only the affected part of the view is modified. View maintenance can be done
manually, by defining triggers on insert, delete and update of each relation in the
view definition; by manually writing code to update the view whenever the database
relations are updated; or it may be supported directly by the database.
1.9.2 Materialised Views and Query Optimisation
A materialised view can also be used to rewrite queries. Suppose, for example, that a
materialised view a = s NATURAL JOIN t is available while a query
z = r NATURAL JOIN s NATURAL JOIN t is being optimised. Then this query
would be rewritten using the materialised view a as:
z = r NATURAL JOIN a
Do we need to perform materialisation? It depends on the cost estimates for the two
alternatives, viz., evaluating the query using the materialised view, or evaluating it
directly from the base relations using the view definition. The query optimiser should
be extended to consider all the alternatives of view evaluation and choose the best
overall plan. This decision must be made on the basis of the system workload.
Indices, in such decision-making, may be considered as specialised views. Some
database systems provide tools to help the database administrator with index and
materialised view selection.
Check Your Progress 3
1) Define the methods used for the evaluation of expressions.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) How do you define cost-based optimisation?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) How do you define an evaluation plan?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
1.10 SUMMARY

In this unit, we have discussed query processing and evaluation. Query processing is
a very important operation in a DBMS and needs to be efficient. It involves parsing
the query, representing it in alternative forms, finding the best plan for evaluating it
and then actually evaluating it. The major query evaluation cost is the disk access
time. In this unit, we have discussed the costs of the different operations in detail.
Note, however, that the overall cost of a query is not simply the sum of all such costs.
1.11 SOLUTIONS/ANSWERS
Check Your Progress 1
1) (a) In the first step, scanning, parsing and validating is done
(b) Translation into relational algebra
(c) The parser checks syntax and verifies relations
(d) The query execution engine takes a query evaluation plan, executes it and
returns the answer to the query.
3) Cost of hash-join:
(i) If recursive partitioning is not required:
3 (blocks of r + blocks of s)
(ii) If recursive partitioning is required:
2 (blocks of r + blocks of s) × (⌈log M−1 (blocks of s)⌉ − 1) + (blocks of r +
blocks of s)
4) Other operations
(a) Duplicate elimination
(b) Projection
(c) Aggregate functions.
3) An evaluation plan defines exactly which algorithms are to be used for each
operation and the manner in which the operations are coordinated.