0% found this document useful (0 votes)

25 views

BCSE302L-Database Systems Module - 4 Part2

Uploaded by

krishtiwari2122

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

25 views

BCSE302L-Database Systems Module - 4 Part2

Uploaded by

krishtiwari2122

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 71

Dynamic Multilevel Indexes B+-Tree

Index Files
Construct B+ tree on the key values of name. Order of the tree is 4

Dynamic Multilevel Indexes B+-Tree

Index Files

Srini

Srini Wu

Mozar Srini Wu
Einstein

Srini

Einst Mozar Srini Wu

3
Dynamic Multilevel Indexes B+-Tree
Index Files

Srini

Einst El Sa Mozar Srini Wu

Dynamic Multilevel Indexes B+-Tree

Index Files

Srini

Einst El Sa Mozar Srini Wu

Gold
5
Dynamic Multilevel Indexes B+-Tree
Index Files

Gold Srini

Einst El Sa Gold Mozar Srini Wu 6

Dynamic Multilevel Indexes B+-Tree

Index Files

Gold Srini

Calif Einst El Sa Gold Katz Mozar Srini Wu 7

Dynamic Multilevel Indexes B+-Tree
Index Files

Dynamic Multilevel Indexes B+-Tree

Index Files
Properties of B+tree (n order)
• All paths from root to leaf are of the same length
• The number of pointers in a node is called the fanout of the node.
• Nonleaf nodes are also referred to as internal nodes.
• Each node that is not a root or a leaf has between n/2 and n children.
• A leaf node has between n/2-1 and n–1 values
• Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the tree), it can have
between 0 and (n–1) values.
9
Dynamic Multilevel Indexes B+-Tree
Index Files
When an entry is deleted, it is always removed from the leaf level.
If the value is only present in the leafnode:
• Check for the minimum condition, the node should have more than minimum key values, if so
delete it from the leaf
• If the node contains only minimum number of values, then deletion may cause underflow.
• In such case try to find immediate sibling left or right to it and redistribute the entries among the
node and its sibling so that both should satisfy the required condition; Add the key of the node
to the parent appropriately
• Otherwise, the node is merged with its siblings and the number of leaf nodes is reduced.
If it happens to occur in an internal node, it must also be removed from there.
• If there is more than the minimum number of keys in the node, simply delete the key from the
leaf node and delete the key from the internal node as well.
• Fill the empty space in the internal node with the inorder successor.
• If there is an exact minimum number of keys in the node, then delete the key and borrow a key
from its immediate sibling (through the parent).
• Fill the empty space created in the index (internal node) with the key.
10

Dynamic Multilevel Indexes B+-Tree

Index Files
If the value is present in the leafnode and root node:
• Check for the minimum condition, the node should have more than minimum key
values, if so delete it from the leaf
• Delete it from the root node and replace it with minimum value of the leaf node in
the right side
When the minimum condition for deletion is not satisfied
• A common method is to try to redistribute entries with the left sibling;
• if this is not possible, redistribute with the right sibling
• If this is also not possible, the three nodes are merged into two leaf nodes.
• In such a case, underflow may propagate to internal nodes because one fewer tree
pointer and search value are needed. This can propagate and reduce the tree levels
After deletion of each key, check whether all the properties of B+tree holds
11
Hashing Techniques
• Primary file organization hashingprovides very fast access to records under
certain search conditions- hash file.
• The search condition must be an equality condition on a single fieldhash field.
• If the hash field is a key field hash key.
• Hashing
• hash function or randomizing function - function h
• Applied to the hash field value of a record  yields the address of the disk block
in which the record is stored.
• Need only a single-block access to retrieve that record.

Hashing Techniques
Internal Hashing
• Hashing is implemented as a hash table
• Use of an array of records
• Consider the array index range is from 0 to M – 1 then we have M slots.
• Choose a hash function that transforms the hash field value into an integer between
0 and M − 1.
• Common hash function h(K) = K mod M
• Non integer hash field values  transformed into integers before the mod function
is applied
• For character strings, the numeric (ASCII) codes associated with characters can be
used in the transformation. 13
Hashing Techniques
Other hashing Functions
• Folding
• Applies arithmetic function -addition or a logical function - exclusive or
• For example, with an address space from 0 to 999 to store 1,000 keys,
• 6-digit key 235469 may be folded
• stored at the address: (235+964) mod 1000 = 199
• Another technique involves picking some digits of the hash field value—for
instance, the third, fifth, and eighth digits—to form the hash address
• Social Security numbers of 10 digits 301-67-8923 hash value of 172

Hashing Techniques
• Hashing functions do not guarantee that distinct values will hash to distinct
addresses
• hash field space is usually much larger than the address space
• hash field space—the number of possible values a hash field can take
• address space—the number of available addresses for records.
• The hashing function maps the hash field space to the address space

15
Hashing Techniques

Hashing Techniques
Collision
• Collision hash field value of a record that is being inserted hashes to an address
that already contains a different record.
• Insert the new record in some other position?
• Collision resolution- Process of finding another position
• Methods for collision resolution
• Open addressing
• Chaining
• Multiple hashing

17
Hashing Techniques
• Open addressing
• Proceeding from the occupied position specified by the hash addresschecks the
subsequent positions in order until an unused (empty) position is found

Hashing Techniques
• Chaining
• Various overflow locations are kept  by extending the array with a number of
overflow positions.
• A pointer field is added to each record location
• Places the new record in an unused overflow location
• setting the pointer of the occupied hash address location to the address of that
overflow location.

19
Hashing Techniques
• Chaining

Hashing Techniques
• Multiple hashing
• Applies a second hash function if the first results in a collision.
• If another collision results uses open addressing or applies a third hash function
and then uses open addressing if necessary.
• The series of hash functions are used in the same order for retrieval.

21
Hashing Techniques
• Goal of a good hashing function
• To distribute the records uniformly over the address space to minimize collisions,
to locate a record with a given key in a single access.
• Best to keep a hash file between 70 and 90% full the number of collisions
remains low and not waste too much space

Hashing Techniques
External Hashing for Disk Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets
• A bucket corresponds to one (or a fixed number of) disk block.
• One of the file fields hash key of the file.
• The record with hash key value K is stored in bucket i,
• i=h(K), and h is the hashing function
• Collisions occur when a new record hashes to a bucket that is already full.
• An overflow file is kept for storing such records.
• Overflow records that hash to each bucket can be linked together.

23
Hashing Techniques
External Hashing for Disk Files

Matching bucket numbers to disk

24
block addresses.

Hashing Techniques
External Hashing for Disk Files
• Static hashing - fixed number of buckets M is allocated.
• The function does key-to-address mappingfixing the address space
• Main disadvantages of static external hashing:
• Fixed number of buckets M if the number of records in the file grows or
shrinks.
• Ordered access on the hash key is inefficient requires sorting the records

25
Hashing Techniques
External Hashing for Disk Files
Dynamic and Extendible Hashing Techniques
• Dynamic and Extendible Hashing Techniques
• Allow the dynamic growth and shrinking of the number of file records.
• dynamic hashing, extendible hashing, and linear hashing.
• Dynamic and extendible hashing use the binary representation of the hash
value h(K) in order to access a directory.
• In dynamic hashing the directory is a binary tree.
• In extendible hashing the directory is an array of size 2d where d is called the global
depth.

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Global depth of directory:

Max # of bits needed to tell
which bucket an entry belongs
to.

Local depth of a bucket: # of

bits used to determine if an
entry belongs to this bucket.

Structure of the extendible hashing scheme 27

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
• Suppose that we are using extendable hashing on a file that contains records with
the following search-key values:

• Show the extendable hash structure for this file if the hash function is h(x) = x and
buckets can hold three records

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0
1
Global Depth=1

29
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1
1
Global Depth=1

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7
1
Global Depth=1

31
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8

33
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12 14

35
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12 14

Inserting 11 will produce overflow in bucket1.

Increase the global depth to 2

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

00 1 7 3
01
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2

Inserting 2 will produce overflow in bucket 0.

37
Increase the local depth to 2
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=2

1 3 2
00
01 7
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2

Inserting 10, 13, 4 38

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=2

1 3 2
00
01 7 4
10
8 11 10 Local Depth=2
11
Global Depth=2 12 14 13 Local Depth=2

Inserting 9 causes overflow of Bucket10

Increase the global depth to 3
39
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

000 1 3 2 Local Depth=2

001
010 7 4 Local Depth=2

011
100
8 9 Local Depth=3

101
11 10 Local Depth=3
110
111 12 14 13 Local Depth=2
40
Global Depth=3

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

41
Hashing Techniques
External Hashing for Disk Files
Dynamic hashing

Structure of the dynamic hashing scheme

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
• Show the Dynamic hash structure for the file if the hash function is h(x) = x and
buckets can hold three records

43
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0
1

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 7 3
1
8 12 14

Inserting 11 will produce overflow in bucket1.

45
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 7 3
1
0 8 11
1
12 14

Inserting 2 will produce overflow in bucket0. 46

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 3 2
1
0 7
1
0 8 11
1
12 14

Insert 10, 13, 4 47

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 3 2
1
0 7 4
1
0 8 11 10
1
12 14 13
Inserting 9 causes overflow of Bucket10
48

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

1 3 2
0
1 7 4
0
1 8 9
0
0
1
1 11 10
12 14 13
49
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
• Allow a hash file to expand and shrink its number of buckets dynamically without
needing a directory.
• For example file starts with M buckets numbered 0, 1, … , M − 1 and uses the mod
hash function h(K) = K mod Minitial hash function hi.
• When a collision leads to an overflow record in any file bucket the first bucket in
the file—bucket 0—is split into two buckets: the original bucket 0 and a new bucket M
at the end of the file.
• Records originally in bucket 0 are distributed between the two buckets based on a
different hashing function hi+1(K) = K mod 2M.
• Any records that hashed to bucket 0 based on hi will hash to either bucket 0 or
50
bucket M based on hi+1

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Show the Linear hash structure for the file with bucket size 2 and hashing parameter
M=3
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

h1(k)=k mod m Bucket 0

h1(k)=k mod 3
h1(1)=1 mod 3=1 Bucket 1
h1(7)=7 mod 3=1
h1(3)=3 mod 3=0 Bucket 2
h1(8)=8 mod 3=2
h1(12)=12 mod 3=0
51
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Show the Linear hash structure for the file with bucket size 2 and hashing parameter
M=3
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

h1(k)=k mod m Bucket 0 3 12

h1(k)=k mod 3
h1(1)=1 mod 3=1 Bucket 1 1 7
h1(7)=7 mod 3=1
h1(3)=3 mod 3=0 Bucket 2 8
h1(8)=8 mod 3=2
h1(12)=12 mod 3=0
h1(4)=4 mod 3=1Overflow occurs, maintaining individual overflow chains for
52

bucket 1

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Since overflow occurs split the bucket 0 into bucket 0 and bucket M(3) new
bucket is at the end

Distribute the values in bucket0

between bucket0 and bucket3 by Bucket 0 3 12
applying hash function
h2(k)=k mod 2M Bucket 1 1 7 4
h2(k)=k mod 6
Bucket 2 8
Bucket 3 53
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Distribute the values in bucket0

between bucket0 and bucket3 by
applying hash function
h2(k)=k mod 6
h2(3)=3 mod 6 =3 Bucket 0 12
h2(12)=12 mod 6 =0
Bucket 1 1 7 4
Bucket 2 8
Bucket 3 3 54

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Insert 11
h1(11)=11 mod 3=2
Bucket 0 12

Bucket 1 1 7 4
Bucket 2 8 11
Bucket 3 3
55
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 2
h1(2)=2 mod 3=2 Overflow occurs, maintaining individual overflow chains for bucket 2
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Bucket 0 12

Bucket 1 1 7 4
Bucket 2 8 11 2
Bucket 3 3 56

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Distribute the values in bucket1 Bucket 0 12
between bucket1 and bucket4 by
applying hash function Bucket 1 1 7 4
h2(k)=k mod 2M
h2(k)=k mod 6 Bucket 2 8 11 2
Bucket 3 3
Bucket 4
57
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Distribute the values in bucket1
between bucket1 and bucket4 by
applying hash function Bucket 0 12
h2(1)=1 mod 6=1
h2(7)=7 mod 6=1 Bucket 1 1 7
h2(4)=4 mod 6=4
Bucket 2 8 11 2
Bucket 3 3
Bucket 4 4
58

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 10
h1(10)=10 mod 6=4
Bucket 0 12
Insert 13
h1(13)=13 mod 6=1 Bucket 1 1 7 13
Overflow occurs, maintaining
individual overflow chains Bucket 2 8 11 2
for bucket 1
Bucket 3 3
Bucket 4 4 10
59
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the
bucket 2 into bucket 2 and bucket
Bucket 1 1 7 13
M+2(5) new bucket is at the end
Bucket 2 8 2
Distribute the values in bucket2
between bucket2 and bucket5 by Bucket 3 3
applying hash function
h2(k)=k mod 2M Bucket 4 4 10
h2(k)=k mod 6
Bucket 5
h2(8)=8 mod 6=2 11
h2(11)=11 mod 6=5
60
h2(2)=2 mod 6=2

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 3=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2
individual overflow chains for bucket 2
Bucket 3 3
End of round 1, splitting has to be from
the beginning Bucket 4 4 10
h1(k)=k mod 3 is changed to
h1(k)=k mod 6, h2(k)=k mod 12 Bucket 5 11
61
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 6=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2 14
individual overflow chains for bucket 2
Since overflow occurs split the bucket 0 Bucket 3 3
into bucket 0 and bucket 6 new bucket is
at the end Bucket 4 4 10
Distribute the values in bucket0
between bucket0 and bucket6 by Bucket 5 11
applying hash function
h2(k)=k mod 2M, h2(12)=12 mod 12 =0 Bucket 6
62

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 9
h1(9)=9 mod 6=3
Bucket 1 1 7 13
Bucket 2 8 2 14
Bucket 3 3 9
Bucket 4 4 10
Bucket 5 11
Bucket 6 63
Relational Algebra
• Procedural query language
• Set of operations  take one or two relations as input and produce a
new relation as their result.
• The fundamental operations select, project, union, set difference,
Cartesian product, and rename.
• Other operations—set intersection, natural join, and assignment,
Extended Relational-Algebra Operations

Relational Algebra
• select, project, and rename unary operations operate on one
relation.
• Set operations, Cartesian product, joins  binary operations operate
on pairs of relations

2
Relational Algebra
The Select Operation
• Selects tuples that satisfy a given predicate.
• The lowercase Greek letter sigma (σ ) to denote selection.
σ Condition/Predicate(Relation/Table name)
o Where predicate is consisting of terms connected by :  (and),  (or),
 (not)
o Each term is one of: <attribute>op <attribute> or <constant>
o where op is one of: =, , >, , <, 

Relational Algebra
The Select Operation
• Select operation is commutative
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))

• The degree of the resulting relationits number of attributes—is the

same as the degree of Relation

4
Relational Algebra
Employee relation
The Select Operation

 emp_name=“ram”(employee)

Relational Algebra
The Select Operation
 salary>6000 (employee)

 emp_name= “kumar”  salary>6000 (employee)

6
Relational Algebra
The Project Operation
• Returns its argument relation, with certain attributes left out.
• Any duplicate rows are eliminated.
• Denoted by the uppercase Greek letter pi (П).
П A1, A2, …, Ak (r)
where A1, A2 are attribute names and r is a relation name.

Relational Algebra
The Project Operation
• Number of tuples in a relation resulting from a Project operation is
always less than or equal to the number of tuples in R.
• Commutativity does not hold on Project.

8
Relational Algebra
The Project Operation Employee relation

emp_no, emp_name (employee)

Relational Algebra
Relational algebra expression Employee relation

Composition of
Relational Operations

Find the names of all employees who get

salary more than 6000.

emp_name (salary > 6000 (employee))

10
Relational Algebra
Set Operations
• UNION, INTERSECTION, and SET DIFFERENCE (MINUS or EXCEPT)-binary
operationsapplied to two sets (of tuples).
• The two relations on these three operations are applied must have the same
type of tuples called union compatibility or type compatibility.
• Two relations have the same number of attributes and each corresponding
pair of attributes has the same domain.

Relational Algebra
The Union Operation
• Unify tuples from two relations
• Notation: r ∪ s
For r ∪ s to be valid
• r, s must have the same arity (same number of attributes)
• The attribute domains must be compatible (e.g., 2nd column of r deals
with the same type of values as does the 2nd column of s)
• No duplicates present after the union operation

12
Relational Algebra
The Union Operation

depositor relation borrower relation

customer-name (depositor)  customer-name (borrower)

Relational Algebra
The Set-Difference / Minus Operation
• To find tuples that are in one relation but are not in another.
• Denoted by –
• r – s produces a relation containing those tuples in r but not in s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible

14
Relational Algebra
The Set-Difference Operation

depositor relation borrower relation

Example: Find all customers of the bank who have an

account but not a loan.

customer-name (depositor) - customer-name (borrower)

Relational Algebra
The Intersection Operation
• Includes all tuples that are in both relations
r∩s
• r, s have the same arity
• attributes of r and s are compatible
r ∩ s = r - (r - s)
• Set intersection is not a fundamental operation

16
Relational Algebra
The Intersection Operation

depositor relation borrower relation

Example: Find all the customers who have both a loan and
an account from the borrower and depositor relation.

customer-name (depositor) ∩ customer-name (borrower)

Relational Algebra
Set Operations
• Union and Intersection are commutative operations
R ∪ S = S ∪ R and R ∩ S = S ∩ R
• Union and Intersection n-ary operations applicable to any number of
relations because both are also associative operations
R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T )

18
Relational Algebra
The Cartesian-Product Operation
• Denoted by a cross (x)combine information from any two relations.
rxs
• Also called Cross Product or Cross Join
• R(A1, A2, … , An) × S(B1, B2, … , Bm) is a relation Q with degree n + m
attributes Q(A1, A2, … , An, B1, B2, … , Bm)
• R has i tuples and S has k tuples  Q will have i*k tuples

Relational Algebra
The Cartesian-Product Operation

EMPNAMES × DEPENDENT

20
Relational Algebra
The Cartesian-Product Operation

Relational Algebra
Assignment operation
• Denoted by ← (left arrow)

22
Relational Algebra
Assignment operation

Fname, Lname, Salary ( dno=5(EMPLOYEE))

Relational Algebra
Assignment operation
Fname, Lname, Salary ( dno=5(EMPLOYEE))
TEMP  dno=5(EMPLOYEE)
RESULT Fname, Lname, Salary (TEMP)

Result
24
Relational Algebra
Rename operation
• Denoted by the lowercase Greek letter rho ()
• Allows to nameto refer to, the results of relational-algebra
expressions.
• Allows to refer to a relation by more than one name.
• Example:
 x (E)
• Returns the expression E under the name x
• If a relational-algebra expression E has arity n, then
x (A1, A2, …, An) (E) 25

Relational Algebra
Rename operation

person (ID, name) (emp_no, emp_name (employee))

Employee relation person
ID name

26
Relational Algebra
Join operation
• Denoted by ⋈ is used to combine related tuples from two relations into single
tuples
• Only combinations of tuples satisfying the join condition
• Specified as a Cartesian product operation followed by a Select operation
• Natural join- Forms a Cartesian product of its two argumentsperforms a
selection forcing equality on those attributes that appear in both relation
schemas, and finally removes duplicate attributes
• Sometimes denoted by *
• Join involves join conditions with equality comparisons onlythe only comparison
operator used is =, is called an Equijoin.
27

Relational Algebra
Join operation

ACTUAL_DEPENDENTS ← EMPNAMES ⋈Ssn=Essn DEPENDENT

28
Relational Algebra
Join operation

ACTUAL_DEPENDENTS ← EMPNAMES ⋈Ssn=Essn DEPENDENT

• Equivalent to

TEMP← EMPNAMES × DEPENDENT

ACTUAL_DEPENDENTS ← σSsn=Essn(TEMP)

Relational Algebra
Join operation
Theta Join
R ⋈<join condition>S
A general join condition is of the form
<condition> AND <condition> AND … AND <condition>
Each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an
attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the
comparison operators {=, <, ≤, >, ≥, ≠}
A Join operation with such a general join condition is called a Theta Join
30
Relational Algebra
Join operation
Outer Join
• Preserves those tuples that would be lost in an join by creating tuples in
the result containing null values.
• Three forms of the operation:
• left outer join, denoted ⟕
• right outer join, denoted ⟖
• full outer join, denoted ⟗
• All three forms of outer join compute the join, and add extra tuples to
the result of the join.
31

Relational Algebra
Join operation
Outer Join
• Left outer join ⟕
• Takes all tuples in the left relation that did not match with any tuple in the right
relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join
• Right outer join ⟖
• Pads tuples from the right relation that did not match any from the left relation
with nulls and adds them to the result of the natural join.
• Full outer join ⟗
• Does both the left and right outer join operations 32
Relational Algebra
Generalized Projection
• Extends the projection operation by allowing operations such as
arithmetic and string functions to be used in the projection list.
 F1,F2,...,Fn(E)
• where E is any relational-algebra expression
• Each of F1, F2, . . . , Fn is an arithmetic expression involving constants
and attributes in the schema of E, permits operations such as
concatenation of strings.

Relational Algebra
Generalized Projection
Consider the relation
EMPLOYEE (Ssn, Salary, Deduction, Years_service)
A report may be required to show
Net Salary = Salary – Deduction,
Bonus = 2000 * Years_service
Tax = 0.25 * Salary
(Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
(Ssn, Net_salary, Bonus, Tax) (Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
34
Relational Algebra
Aggregate Functions and Grouping
• Take a collection of values and return a single value as a result.
• The symbol is the letter G in calligraphic font; read it as “calligraphic G.”
•  aggregation and its subscript specifies the aggregate operation to
be applied.
• Aggregate functions- sum, avg, count, min and max
• To eliminate duplicates, use function names with hyphenated string
“distinct” appended to the end of the function name (for example, count-
distinct).
• In some notations aggregate function represented by ℑ
35

Relational Algebra
Aggregate Functions and Grouping
<grouping attributes> <function list> (R)
Employee relation

Find out the sum of salaries of all employees

SQL: Select sum(salary) from employee;
sum(salary)(employee) 36
Relational Algebra
Aggregate Functions and Grouping
Employee relation

Find the average salary in each department

SQL: Select dept_no, average(salary) from employee group by dept_no;

dept_no average(salary)(employee) 37

Relational Algebra
Aggregate Functions and Grouping

• Retrieve each department number, the number of employees in the department,

and their average salary, while renaming the resulting attributes

R(Dno, No_of_employees, Average_sal) (Dno count (Ssn), average(salary)(employee))

38
Relational Algebra
Aggregate Functions and Grouping

count (Ssn), average(salary)(employee)

Dno count (Ssn), average(salary)(employee)

R(Dno, No_of_employees, Average_sal) (Dno count (Ssn), average(salary)(employee))

Relational Algebra

40
Relational Algebra

Relational Algebra

42
Relational Algebra
Examples (For the relations given above)

Relational Algebra
Examples (For the relations given above)

44
Relational Algebra
Examples (For the relations given above)

Relational Algebra
Examples (For the relations given above)

46
Relational Algebra
Examples (For the relations given above)

Relational Algebra
Examples (For the relations given above)

48
Relational Algebra
Examples (For the relations given above)

Relational Algebra
Examples

50
Query Processing
Query processing
• Refers to the range of activities involved in extracting data from a
database.
• Include translation of queries in high-level database languages into
expressions that can be used at the physical level of the file system,
• A variety of query-optimizing transformations, and actual evaluation of
queries

Query Processing
Query processing
• Steps
1. Parsing and translation.
2. Optimization.
3. Evaluation.

52
Query Processing
Query processing
• Given a query, there are a variety of methods for computing the answer
• Annotate it with instructions specifying how to evaluate each operation
• A relational algebra operation annotated with instructions on how to
evaluate it is called an evaluation primitive.
• A sequence of primitive operations that can be used to evaluate a
query is a query-execution plan or query-evaluation plan.

Query Processing
Query processing
Query tree
• Used in relational DBMSs (RDBMSs) to represent queries internally.
• A query tree or a query evaluation tree or query execution tree.
• A query tree is a tree data structure that corresponds to a relational algebra
expression.
• Represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
• Execution consists of executing an internal node operation whenever its
operands (represented by its child nodes) are available, and then replacing
that internal node by the relation that results from executing the operation. 54
Query Processing
Query processing
Query tree
• The execution terminates when the root node is executed and produces the
result relation for the query.
• The query parser will typically generate a standard initial query tree to
correspond to an SQL query, without doing any optimization.
• Canonical query tree represents a relational algebra expression that is very inefficient if
executed directly
• Heuristic query optimizer will transform this initial query tree into an
equivalent final query tree that is efficient to execute

Query Processing
Example

Fname, Lname, Salary ( dno=5(EMPLOYEE))

56
Query Processing
Example

Fname, Lname, Salary ( dno=5(EMPLOYEE))

Fname, Lname, Salary

 dno=5

EMPLOYEE
57

Query Processing and Optimization

Heuristic Optimization of Query Trees
• Different relational algebra expressions—hence many different query
trees—can be semantically equivalent;
• Represent the same query and produce the same results.
• The query parser will generate a standard initial query tree to
correspond to an SQL query, without doing any optimization
• The CARTESIAN PRODUCT of the relations specified in the FROM clause is
first applied;
• Then the selection and join conditions of the WHERE clause are applied
• Followed by the projection on the SELECT clause attributes
• very inefficient if executed directly
58
Query Processing and Optimization
Heuristic Optimization of Query Trees
• The heuristic query optimizer will transform this initial query tree
into an equivalent final query tree that is efficient to execute.
• Include rules for equivalence

Query Processing and Optimization

Steps in converting a query tree during heuristic optimization
(a) Initial (canonical) query tree for SQL query Q.
(b) Moving SELECT operations down the query tree.
(c) Applying the more restrictive SELECT operation first.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN
operations.
(e) Moving PROJECT operations down the query tree

60
Query Processing and Optimization
Steps in converting a query tree during heuristic optimization
Example: Consider the following relations

Find the last names of employees born after

1957 who work on a project named
61
‘Aquarius’.

Query Processing and Optimization

Find the last names of employees born after 1957 who work on a
project named ‘Aquarius’.

SELECT E.Lname
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.Pname=‘Aquarius’ AND P.Pnumber=W.Pno AND E.Ssn=W.ESsn
AND E.Bdate > ‘1957-12-31’;

62
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a
project named ‘Aquarius’.
Initial (canonical) query tree for SQL query Q

Query Processing and Optimization

Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Moving SELECT operations down the query tree

64
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Applying the more restrictive SELECT
operation first

Query Processing and Optimization

Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.

66
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Moving PROJECT operations
down the query tree

Query Processing and Optimization

Transformation of Relational Expressions
• Two relational-algebra expressions are said to be equivalent if, on every
legal database instance, the two expressions generate the same set of
tuples
• An equivalence rule says that expressions of two forms are equivalent.
• Replace an expression of the first form by an expression of the second form, or vice
versa
• , 1, 2, and so on to denote predicates
• L1, L2, L3, and so on to denote lists of attributes
• E, E1, E2, and so on to denote relational-algebra expressions
68
Query Processing and Optimization
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
s q1 q 2 ( E ) = s q1 (s q 2 ( E ))

2. Selection operations are commutative.

s q (s q ( E )) = s q (s q ( E ))
1 2 2 1

3. Only the last in a sequence of projection operations is needed, the others can
be omitted.
 L1 ( L2 ( ( Ln ( E )) )) =  L1 ( E )

4. Selections can be combined with Cartesian products and theta joins.

a. (E1 X E2) = E1 ⋈ E2
b. 1(E1 ⋈2 E2) = E1 ⋈ 1 2 E2 69

Query Processing and Optimization

5.Theta-join operations (and natural joins) are commutative.
E1 ⋈  E2 = E2 ⋈ E1
6.(a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

(b) Theta joins are associative in the following manner:

(E1 ⋈ 1 E2) ⋈2 3 E3 = E1 ⋈ 1 3 (E2 ⋈ 2 E3)

where 2 involves attributes from only E2 and E3.

70
Query Processing and Optimization

Query Processing and Optimization

7.The selection operation distributes over the theta join operation under the
following two conditions:
(a) When all the attributes in 0 involve only the attributes of one of the
expressions (E1) being joined.

0E1 ⋈ E2) = (0(E1)) ⋈ E2

(b) When  1 involves only the attributes of E1 and 2 involves only the
attributes of E2.
1 E1 ⋈ E2) = (1(E1)) ⋈ ( (E2))

72
Query Processing and Optimization
8.The projection operation distributes over the theta join operation as
follows:
(a) if  involves only attributes from L1  L2:

(b) Consider a join E1 ⋈ E2.

• Let L1 and L2 be sets of attributes from E1 and E2, respectively.
• Let L3 be attributes of E1 that are involved in join condition , but
are not in L1  L2, and
• let L4 be attributes of E2 that are involved in join condition , but are
not in L1  L2.
73

Query Processing and Optimization

9. The set operations union and intersection are commutative
E1  E2 = E2  E1

E1  E2 = E2  E1
(set difference is not commutative).
10.Set union and intersection are associative.
(E1  E2)  E3 = E1  (E2  E3)

(E1  E2)  E3 = E1  (E2  E3)

74
Query Processing and Optimization
11. The selection operation distributes over ,  and –.
s (E1 – E2) = s (E1) – s(E2)
and similarly for  and  in place of –
Also: s (E1 – E2) = s(E1) – E2
and similarly for  in place of –, but not for 

12. The projection operation distributes over union

L(E1  E2) = (L(E1))  (L(E2))

Query Processing and Optimization

Transformation Examples
instructor(ID, name, dept name, salary)
teaches(ID, course id, sec id, semester, year)
course(course id, title, dept name, credits)
Query: Find the names of all instructors in the Music department, along with the
titles of the courses that they teach
name, title(dept_name= “Music”(instructor ⋈ (teaches ⋈ course_id, title (course))))
• Transformation using rule 7a.
name, title((dept_name= “Music”(instructor)) ⋈ (teaches ⋈ course_id, title (course)))
• Performing the selection as early as possible reduces the size of the relation to
be joined. 76
Query Processing and Optimization
instructor(ID, name, dept name, salary)
teaches(ID, course id, sec id, semester, year)
course(course id, title, dept name, credits)

Find the names of all instructors in the

Music department together with the course
title of all the courses that the instructors
teach

Query Processing and Optimization

Initial expression tree

Constructs a large
intermediate relation
78
Query Processing and Optimization
Initial expression
tree
Transformed expression
tree

Query Processing and Optimization

Transformed expression tree

cd3281 Final Copy Lab Manual
100% (1)
cd3281 Final Copy Lab Manual
44 pages
Unit 3 - DBMS (Indexing, Hashing, B+-Tree)
No ratings yet
Unit 3 - DBMS (Indexing, Hashing, B+-Tree)
7 pages
Unit-3 Part 2 Indexing and Hashing
No ratings yet
Unit-3 Part 2 Indexing and Hashing
36 pages
DBMS Unit-3 Notes
No ratings yet
DBMS Unit-3 Notes
9 pages
Unit Iv Implementation Techniques
No ratings yet
Unit Iv Implementation Techniques
91 pages
DSAD Dynamic Hashing
No ratings yet
DSAD Dynamic Hashing
79 pages
Data Management: INFO125
No ratings yet
Data Management: INFO125
111 pages
Unit-4 Hand Written
No ratings yet
Unit-4 Hand Written
35 pages
DSA_240404_220052 (1)
No ratings yet
DSA_240404_220052 (1)
9 pages
CO3 Session 6
No ratings yet
CO3 Session 6
29 pages
Unit 6.2 Indexing and Hashing
No ratings yet
Unit 6.2 Indexing and Hashing
37 pages
Unit_6
No ratings yet
Unit_6
38 pages
Unit-5 B+Trees & Hashing
No ratings yet
Unit-5 B+Trees & Hashing
37 pages
Chap. 6 Hash-Based Indexing: Abel J.P. Gomes
No ratings yet
Chap. 6 Hash-Based Indexing: Abel J.P. Gomes
15 pages
Hashing
No ratings yet
Hashing
16 pages
Chap 12. Extendible Hashing: File Structures
No ratings yet
Chap 12. Extendible Hashing: File Structures
40 pages
22-File Organization-06-09-2024
No ratings yet
22-File Organization-06-09-2024
23 pages
Hashing in DBMS
No ratings yet
Hashing in DBMS
11 pages
Maps
No ratings yet
Maps
36 pages
Week 9_Hash Functions and Collision
No ratings yet
Week 9_Hash Functions and Collision
73 pages
L2.2-File Organization Techniques
No ratings yet
L2.2-File Organization Techniques
42 pages
Unit III-Hashing
100% (1)
Unit III-Hashing
135 pages
Hashing
No ratings yet
Hashing
29 pages
Ch11 Hash Indexes 1perpage Annotated
No ratings yet
Ch11 Hash Indexes 1perpage Annotated
28 pages
Dsa 4
No ratings yet
Dsa 4
55 pages
mod 5
No ratings yet
mod 5
13 pages
Indexing and Hashing: Solutions To Practice Exercises
No ratings yet
Indexing and Hashing: Solutions To Practice Exercises
11 pages
Hash Function
No ratings yet
Hash Function
9 pages
Lecture 6 - Searching
No ratings yet
Lecture 6 - Searching
41 pages
Chapter 12: Indexing and Hashing
No ratings yet
Chapter 12: Indexing and Hashing
31 pages
Indexing
No ratings yet
Indexing
77 pages
DSA MK Lect2 PDF
No ratings yet
DSA MK Lect2 PDF
92 pages
Tutorial 10 Indexing
No ratings yet
Tutorial 10 Indexing
36 pages
DSA Chapter 08 (Searching)
No ratings yet
DSA Chapter 08 (Searching)
65 pages
CO3 Notes Hashing
No ratings yet
CO3 Notes Hashing
10 pages
Final Hashing
No ratings yet
Final Hashing
41 pages
Database Systems (資料庫系統) : November 26/28, 2007 Lecture #9
No ratings yet
Database Systems (資料庫系統) : November 26/28, 2007 Lecture #9
43 pages
Hashing in Data Structure
No ratings yet
Hashing in Data Structure
43 pages
IT3020 L06 Indexing
No ratings yet
IT3020 L06 Indexing
41 pages
14-HashTable
No ratings yet
14-HashTable
38 pages
Hashing
No ratings yet
Hashing
23 pages
Week 3
No ratings yet
Week 3
29 pages
DBMS Chapter 4 Record Organization and Dile Management
No ratings yet
DBMS Chapter 4 Record Organization and Dile Management
36 pages
Modifed Hash
No ratings yet
Modifed Hash
42 pages
Adbs 5
No ratings yet
Adbs 5
37 pages
Unit 4-Hashing
No ratings yet
Unit 4-Hashing
24 pages
There Are Two Types of Hashing
No ratings yet
There Are Two Types of Hashing
2 pages
GROUP 15.Pptx Presentation
No ratings yet
GROUP 15.Pptx Presentation
29 pages
Hashing in DBMS
No ratings yet
Hashing in DBMS
9 pages
Implementation Priority Queue Using Array
No ratings yet
Implementation Priority Queue Using Array
3 pages
Hashing Techniques
No ratings yet
Hashing Techniques
15 pages
MODULE 5_BCS304_HASHING_Leftisht trees_OBST_Notes
No ratings yet
MODULE 5_BCS304_HASHING_Leftisht trees_OBST_Notes
32 pages
Hashing2
No ratings yet
Hashing2
59 pages
Hashing
No ratings yet
Hashing
8 pages
Hashing
No ratings yet
Hashing
15 pages
Unit 7
No ratings yet
Unit 7
27 pages
Assignment (DS)
No ratings yet
Assignment (DS)
8 pages
B+ Tree & B Tree
No ratings yet
B+ Tree & B Tree
38 pages
FS Mod5
No ratings yet
FS Mod5
23 pages
Search Tree: Fundamentals and Applications
From Everand
Search Tree: Fundamentals and Applications
Fouad Sabry
No ratings yet
Hashing
From Everand
Hashing
Prakash Hegade
No ratings yet
Compiler
No ratings yet
Compiler
7 pages
Lab 11 File Streams
No ratings yet
Lab 11 File Streams
7 pages
LP48 User's - Manual EN
No ratings yet
LP48 User's - Manual EN
30 pages
ENGR 4560: Power System Control and Operation: Unit Commitment
No ratings yet
ENGR 4560: Power System Control and Operation: Unit Commitment
47 pages
Android Application Development
No ratings yet
Android Application Development
32 pages
Mingw32 Installation Details
No ratings yet
Mingw32 Installation Details
38 pages
30.unix - Useful Commands
No ratings yet
30.unix - Useful Commands
6 pages
Operating System MCQs-2
60% (5)
Operating System MCQs-2
37 pages
Data Crawler Usage
100% (1)
Data Crawler Usage
3 pages
IBM Knowledge Center - Charraymember
No ratings yet
IBM Knowledge Center - Charraymember
3 pages
Lecture 4-5 - ARM - Assembly - 3 (New)
No ratings yet
Lecture 4-5 - ARM - Assembly - 3 (New)
33 pages
Cwpuzzle PDF
No ratings yet
Cwpuzzle PDF
31 pages
11th Ch-2 Lab Activity 2025-26
No ratings yet
11th Ch-2 Lab Activity 2025-26
9 pages
Angularjs and Bootstrap
No ratings yet
Angularjs and Bootstrap
36 pages
CS101 All Finalterm Long Questions in One File VU
No ratings yet
CS101 All Finalterm Long Questions in One File VU
6 pages
gold EA
No ratings yet
gold EA
8 pages
Improve Your Software Engineer Skills Using AI
No ratings yet
Improve Your Software Engineer Skills Using AI
2 pages
SPPU High Performance Computing
No ratings yet
SPPU High Performance Computing
12 pages
Auto Increment A Field
No ratings yet
Auto Increment A Field
3 pages
Object Oriented Programming Through Java
No ratings yet
Object Oriented Programming Through Java
130 pages
CSE Sample Record
No ratings yet
CSE Sample Record
65 pages
Problem Solving and Search
No ratings yet
Problem Solving and Search
100 pages
MECSE09
No ratings yet
MECSE09
37 pages
Arduino Program
No ratings yet
Arduino Program
10 pages
CSL 210 Lab11 16122023 023049pm 14052024 090230am
No ratings yet
CSL 210 Lab11 16122023 023049pm 14052024 090230am
9 pages
Pramod Java Architect
No ratings yet
Pramod Java Architect
8 pages
BDA Lect5 Apache Spark 2023
No ratings yet
BDA Lect5 Apache Spark 2023
115 pages
dcs800 s02 0820 05+s171 - TC
No ratings yet
dcs800 s02 0820 05+s171 - TC
40 pages
Lecture Slides-CPlusPlus LectureNotes Module2
No ratings yet
Lecture Slides-CPlusPlus LectureNotes Module2
61 pages