0% found this document useful (0 votes)
25 views

BCSE302L-Database Systems Module - 4 Part2

Uploaded by

krishtiwari2122
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views

BCSE302L-Database Systems Module - 4 Part2

Uploaded by

krishtiwari2122
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 71

Dynamic Multilevel Indexes B+-Tree

Index Files
Construct B+ tree on the key values of name. Order of the tree is 4

Dynamic Multilevel Indexes B+-Tree


Index Files

Srini

Srini Wu

Mozar Srini Wu
Einstein

Srini

Einst Mozar Srini Wu


3
Dynamic Multilevel Indexes B+-Tree
Index Files

Srini

Einst El Sa Mozar Srini Wu

Dynamic Multilevel Indexes B+-Tree


Index Files

Srini

Einst El Sa Mozar Srini Wu


Gold
5
Dynamic Multilevel Indexes B+-Tree
Index Files

Gold Srini

Einst El Sa Gold Mozar Srini Wu 6

Dynamic Multilevel Indexes B+-Tree


Index Files

Gold Srini

Calif Einst El Sa Gold Katz Mozar Srini Wu 7


Dynamic Multilevel Indexes B+-Tree
Index Files

Dynamic Multilevel Indexes B+-Tree


Index Files
Properties of B+tree (n order)
• All paths from root to leaf are of the same length
• The number of pointers in a node is called the fanout of the node.
• Nonleaf nodes are also referred to as internal nodes.
• Each node that is not a root or a leaf has between n/2 and n children.
• A leaf node has between n/2-1 and n–1 values
• Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the tree), it can have
between 0 and (n–1) values.
9
Dynamic Multilevel Indexes B+-Tree
Index Files
When an entry is deleted, it is always removed from the leaf level.
If the value is only present in the leafnode:
• Check for the minimum condition, the node should have more than minimum key values, if so
delete it from the leaf
• If the node contains only minimum number of values, then deletion may cause underflow.
• In such case try to find immediate sibling left or right to it and redistribute the entries among the
node and its sibling so that both should satisfy the required condition; Add the key of the node
to the parent appropriately
• Otherwise, the node is merged with its siblings and the number of leaf nodes is reduced.
If it happens to occur in an internal node, it must also be removed from there.
• If there is more than the minimum number of keys in the node, simply delete the key from the
leaf node and delete the key from the internal node as well.
• Fill the empty space in the internal node with the inorder successor.
• If there is an exact minimum number of keys in the node, then delete the key and borrow a key
from its immediate sibling (through the parent).
• Fill the empty space created in the index (internal node) with the key.
10

Dynamic Multilevel Indexes B+-Tree


Index Files
If the value is present in the leafnode and root node:
• Check for the minimum condition, the node should have more than minimum key
values, if so delete it from the leaf
• Delete it from the root node and replace it with minimum value of the leaf node in
the right side
When the minimum condition for deletion is not satisfied
• A common method is to try to redistribute entries with the left sibling;
• if this is not possible, redistribute with the right sibling
• If this is also not possible, the three nodes are merged into two leaf nodes.
• In such a case, underflow may propagate to internal nodes because one fewer tree
pointer and search value are needed. This can propagate and reduce the tree levels
After deletion of each key, check whether all the properties of B+tree holds
11
Hashing Techniques
• Primary file organization hashingprovides very fast access to records under
certain search conditions- hash file.
• The search condition must be an equality condition on a single fieldhash field.
• If the hash field is a key field hash key.
• Hashing
• hash function or randomizing function - function h
• Applied to the hash field value of a record  yields the address of the disk block
in which the record is stored.
• Need only a single-block access to retrieve that record.

12

Hashing Techniques
Internal Hashing
• Hashing is implemented as a hash table
• Use of an array of records
• Consider the array index range is from 0 to M – 1 then we have M slots.
• Choose a hash function that transforms the hash field value into an integer between
0 and M − 1.
• Common hash function h(K) = K mod M
• Non integer hash field values  transformed into integers before the mod function
is applied
• For character strings, the numeric (ASCII) codes associated with characters can be
used in the transformation. 13
Hashing Techniques
Other hashing Functions
• Folding
• Applies arithmetic function -addition or a logical function - exclusive or
• For example, with an address space from 0 to 999 to store 1,000 keys,
• 6-digit key 235469 may be folded
• stored at the address: (235+964) mod 1000 = 199
• Another technique involves picking some digits of the hash field value—for
instance, the third, fifth, and eighth digits—to form the hash address
• Social Security numbers of 10 digits 301-67-8923 hash value of 172

14

Hashing Techniques
• Hashing functions do not guarantee that distinct values will hash to distinct
addresses
• hash field space is usually much larger than the address space
• hash field space—the number of possible values a hash field can take
• address space—the number of available addresses for records.
• The hashing function maps the hash field space to the address space

15
Hashing Techniques

16

Hashing Techniques
Collision
• Collision hash field value of a record that is being inserted hashes to an address
that already contains a different record.
• Insert the new record in some other position?
• Collision resolution- Process of finding another position
• Methods for collision resolution
• Open addressing
• Chaining
• Multiple hashing

17
Hashing Techniques
• Open addressing
• Proceeding from the occupied position specified by the hash addresschecks the
subsequent positions in order until an unused (empty) position is found

18

Hashing Techniques
• Chaining
• Various overflow locations are kept  by extending the array with a number of
overflow positions.
• A pointer field is added to each record location
• Places the new record in an unused overflow location
• setting the pointer of the occupied hash address location to the address of that
overflow location.

19
Hashing Techniques
• Chaining

20

Hashing Techniques
• Multiple hashing
• Applies a second hash function if the first results in a collision.
• If another collision results uses open addressing or applies a third hash function
and then uses open addressing if necessary.
• The series of hash functions are used in the same order for retrieval.

21
Hashing Techniques
• Goal of a good hashing function
• To distribute the records uniformly over the address space to minimize collisions,
to locate a record with a given key in a single access.
• Best to keep a hash file between 70 and 90% full the number of collisions
remains low and not waste too much space

22

Hashing Techniques
External Hashing for Disk Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets
• A bucket corresponds to one (or a fixed number of) disk block.
• One of the file fields hash key of the file.
• The record with hash key value K is stored in bucket i,
• i=h(K), and h is the hashing function
• Collisions occur when a new record hashes to a bucket that is already full.
• An overflow file is kept for storing such records.
• Overflow records that hash to each bucket can be linked together.

23
Hashing Techniques
External Hashing for Disk Files

Matching bucket numbers to disk


24
block addresses.

Hashing Techniques
External Hashing for Disk Files
• Static hashing - fixed number of buckets M is allocated.
• The function does key-to-address mappingfixing the address space
• Main disadvantages of static external hashing:
• Fixed number of buckets M if the number of records in the file grows or
shrinks.
• Ordered access on the hash key is inefficient requires sorting the records

25
Hashing Techniques
External Hashing for Disk Files
Dynamic and Extendible Hashing Techniques
• Dynamic and Extendible Hashing Techniques
• Allow the dynamic growth and shrinking of the number of file records.
• dynamic hashing, extendible hashing, and linear hashing.
• Dynamic and extendible hashing use the binary representation of the hash
value h(K) in order to access a directory.
• In dynamic hashing the directory is a binary tree.
• In extendible hashing the directory is an array of size 2d where d is called the global
depth.

26

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Global depth of directory:


Max # of bits needed to tell
which bucket an entry belongs
to.

Local depth of a bucket: # of


bits used to determine if an
entry belongs to this bucket.

Structure of the extendible hashing scheme 27


Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
• Suppose that we are using extendable hashing on a file that contains records with
the following search-key values:

• Show the extendable hash structure for this file if the hash function is h(x) = x and
buckets can hold three records

28

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0
1
Global Depth=1

29
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1
1
Global Depth=1

30

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7
1
Global Depth=1

31
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1

32

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8

33
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12

34

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12 14

35
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

0 1 7 3
1
Global Depth=1
8 12 14

Inserting 11 will produce overflow in bucket1.


Increase the global depth to 2

36

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=1

00 1 7 3
01
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2

Inserting 2 will produce overflow in bucket 0.


37
Increase the local depth to 2
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=2

1 3 2
00
01 7
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2

Inserting 10, 13, 4 38

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

Local Depth=2

1 3 2
00
01 7 4
10
8 11 10 Local Depth=2
11
Global Depth=2 12 14 13 Local Depth=2

Inserting 9 causes overflow of Bucket10


Increase the global depth to 3
39
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

000 1 3 2 Local Depth=2

001
010 7 4 Local Depth=2

011
100
8 9 Local Depth=3

101
11 10 Local Depth=3
110
111 12 14 13 Local Depth=2
40
Global Depth=3

Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques

41
Hashing Techniques
External Hashing for Disk Files
Dynamic hashing

Structure of the dynamic hashing scheme

42

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
• Show the Dynamic hash structure for the file if the hash function is h(x) = x and
buckets can hold three records

43
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0
1

44

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 7 3
1
8 12 14

Inserting 11 will produce overflow in bucket1.

45
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 7 3
1
0 8 11
1
12 14

Inserting 2 will produce overflow in bucket0. 46

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 3 2
1
0 7
1
0 8 11
1
12 14

Insert 10, 13, 4 47


Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

0 1 3 2
1
0 7 4
1
0 8 11 10
1
12 14 13
Inserting 9 causes overflow of Bucket10
48

Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques

1 3 2
0
1 7 4
0
1 8 9
0
0
1
1 11 10
12 14 13
49
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
• Allow a hash file to expand and shrink its number of buckets dynamically without
needing a directory.
• For example file starts with M buckets numbered 0, 1, … , M − 1 and uses the mod
hash function h(K) = K mod Minitial hash function hi.
• When a collision leads to an overflow record in any file bucket the first bucket in
the file—bucket 0—is split into two buckets: the original bucket 0 and a new bucket M
at the end of the file.
• Records originally in bucket 0 are distributed between the two buckets based on a
different hashing function hi+1(K) = K mod 2M.
• Any records that hashed to bucket 0 based on hi will hash to either bucket 0 or
50
bucket M based on hi+1

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Show the Linear hash structure for the file with bucket size 2 and hashing parameter
M=3
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

h1(k)=k mod m Bucket 0


h1(k)=k mod 3
h1(1)=1 mod 3=1 Bucket 1
h1(7)=7 mod 3=1
h1(3)=3 mod 3=0 Bucket 2
h1(8)=8 mod 3=2
h1(12)=12 mod 3=0
51
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Show the Linear hash structure for the file with bucket size 2 and hashing parameter
M=3
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

h1(k)=k mod m Bucket 0 3 12


h1(k)=k mod 3
h1(1)=1 mod 3=1 Bucket 1 1 7
h1(7)=7 mod 3=1
h1(3)=3 mod 3=0 Bucket 2 8
h1(8)=8 mod 3=2
h1(12)=12 mod 3=0
h1(4)=4 mod 3=1Overflow occurs, maintaining individual overflow chains for
52

bucket 1

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Since overflow occurs split the bucket 0 into bucket 0 and bucket M(3) new
bucket is at the end

Distribute the values in bucket0


between bucket0 and bucket3 by Bucket 0 3 12
applying hash function
h2(k)=k mod 2M Bucket 1 1 7 4
h2(k)=k mod 6
Bucket 2 8
Bucket 3 53
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Distribute the values in bucket0


between bucket0 and bucket3 by
applying hash function
h2(k)=k mod 6
h2(3)=3 mod 6 =3 Bucket 0 12
h2(12)=12 mod 6 =0
Bucket 1 1 7 4
Bucket 2 8
Bucket 3 3 54

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9

Insert 11
h1(11)=11 mod 3=2
Bucket 0 12

Bucket 1 1 7 4
Bucket 2 8 11
Bucket 3 3
55
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 2
h1(2)=2 mod 3=2 Overflow occurs, maintaining individual overflow chains for bucket 2
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Bucket 0 12

Bucket 1 1 7 4
Bucket 2 8 11 2
Bucket 3 3 56

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Distribute the values in bucket1 Bucket 0 12
between bucket1 and bucket4 by
applying hash function Bucket 1 1 7 4
h2(k)=k mod 2M
h2(k)=k mod 6 Bucket 2 8 11 2
Bucket 3 3
Bucket 4
57
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Distribute the values in bucket1
between bucket1 and bucket4 by
applying hash function Bucket 0 12
h2(1)=1 mod 6=1
h2(7)=7 mod 6=1 Bucket 1 1 7
h2(4)=4 mod 6=4
Bucket 2 8 11 2
Bucket 3 3
Bucket 4 4
58

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 10
h1(10)=10 mod 6=4
Bucket 0 12
Insert 13
h1(13)=13 mod 6=1 Bucket 1 1 7 13
Overflow occurs, maintaining
individual overflow chains Bucket 2 8 11 2
for bucket 1
Bucket 3 3
Bucket 4 4 10
59
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the
bucket 2 into bucket 2 and bucket
Bucket 1 1 7 13
M+2(5) new bucket is at the end
Bucket 2 8 2
Distribute the values in bucket2
between bucket2 and bucket5 by Bucket 3 3
applying hash function
h2(k)=k mod 2M Bucket 4 4 10
h2(k)=k mod 6
Bucket 5
h2(8)=8 mod 6=2 11
h2(11)=11 mod 6=5
60
h2(2)=2 mod 6=2

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 3=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2
individual overflow chains for bucket 2
Bucket 3 3
End of round 1, splitting has to be from
the beginning Bucket 4 4 10
h1(k)=k mod 3 is changed to
h1(k)=k mod 6, h2(k)=k mod 12 Bucket 5 11
61
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 6=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2 14
individual overflow chains for bucket 2
Since overflow occurs split the bucket 0 Bucket 3 3
into bucket 0 and bucket 6 new bucket is
at the end Bucket 4 4 10
Distribute the values in bucket0
between bucket0 and bucket6 by Bucket 5 11
applying hash function
h2(k)=k mod 2M, h2(12)=12 mod 12 =0 Bucket 6
62

Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 9
h1(9)=9 mod 6=3
Bucket 1 1 7 13
Bucket 2 8 2 14
Bucket 3 3 9
Bucket 4 4 10
Bucket 5 11
Bucket 6 63
Relational Algebra
• Procedural query language
• Set of operations  take one or two relations as input and produce a
new relation as their result.
• The fundamental operations select, project, union, set difference,
Cartesian product, and rename.
• Other operations—set intersection, natural join, and assignment,
Extended Relational-Algebra Operations

Relational Algebra
• select, project, and rename unary operations operate on one
relation.
• Set operations, Cartesian product, joins  binary operations operate
on pairs of relations

2
Relational Algebra
The Select Operation
• Selects tuples that satisfy a given predicate.
• The lowercase Greek letter sigma (σ ) to denote selection.
σ Condition/Predicate(Relation/Table name)
o Where predicate is consisting of terms connected by :  (and),  (or),
 (not)
o Each term is one of: <attribute>op <attribute> or <constant>
o where op is one of: =, , >, , <, 

Relational Algebra
The Select Operation
• Select operation is commutative
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))

• The degree of the resulting relationits number of attributes—is the


same as the degree of Relation

4
Relational Algebra
Employee relation
The Select Operation

 emp_name=“ram”(employee)

Relational Algebra
The Select Operation
 salary>6000 (employee)

 emp_name= “kumar”  salary>6000 (employee)

6
Relational Algebra
The Project Operation
• Returns its argument relation, with certain attributes left out.
• Any duplicate rows are eliminated.
• Denoted by the uppercase Greek letter pi (П).
П A1, A2, …, Ak (r)
where A1, A2 are attribute names and r is a relation name.

Relational Algebra
The Project Operation
• Number of tuples in a relation resulting from a Project operation is
always less than or equal to the number of tuples in R.
• Commutativity does not hold on Project.

8
Relational Algebra
The Project Operation Employee relation

emp_no, emp_name (employee)

Relational Algebra
Relational algebra expression Employee relation

Composition of
Relational Operations

Find the names of all employees who get


salary more than 6000.

emp_name (salary > 6000 (employee))


10
Relational Algebra
Set Operations
• UNION, INTERSECTION, and SET DIFFERENCE (MINUS or EXCEPT)-binary
operationsapplied to two sets (of tuples).
• The two relations on these three operations are applied must have the same
type of tuples called union compatibility or type compatibility.
• Two relations have the same number of attributes and each corresponding
pair of attributes has the same domain.

11

Relational Algebra
The Union Operation
• Unify tuples from two relations
• Notation: r ∪ s
For r ∪ s to be valid
• r, s must have the same arity (same number of attributes)
• The attribute domains must be compatible (e.g., 2nd column of r deals
with the same type of values as does the 2nd column of s)
• No duplicates present after the union operation

12
Relational Algebra
The Union Operation

depositor relation borrower relation

customer-name (depositor)  customer-name (borrower)

13

Relational Algebra
The Set-Difference / Minus Operation
• To find tuples that are in one relation but are not in another.
• Denoted by –
• r – s produces a relation containing those tuples in r but not in s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible

14
Relational Algebra
The Set-Difference Operation

depositor relation borrower relation

Example: Find all customers of the bank who have an


account but not a loan.

customer-name (depositor) - customer-name (borrower)


15

Relational Algebra
The Intersection Operation
• Includes all tuples that are in both relations
r∩s
• r, s have the same arity
• attributes of r and s are compatible
r ∩ s = r - (r - s)
• Set intersection is not a fundamental operation

16
Relational Algebra
The Intersection Operation

depositor relation borrower relation

Example: Find all the customers who have both a loan and
an account from the borrower and depositor relation.

customer-name (depositor) ∩ customer-name (borrower)


17

Relational Algebra
Set Operations
• Union and Intersection are commutative operations
R ∪ S = S ∪ R and R ∩ S = S ∩ R
• Union and Intersection n-ary operations applicable to any number of
relations because both are also associative operations
R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T )

18
Relational Algebra
The Cartesian-Product Operation
• Denoted by a cross (x)combine information from any two relations.
rxs
• Also called Cross Product or Cross Join
• R(A1, A2, … , An) × S(B1, B2, … , Bm) is a relation Q with degree n + m
attributes Q(A1, A2, … , An, B1, B2, … , Bm)
• R has i tuples and S has k tuples  Q will have i*k tuples

19

Relational Algebra
The Cartesian-Product Operation

EMPNAMES × DEPENDENT

20
Relational Algebra
The Cartesian-Product Operation

21

Relational Algebra
Assignment operation
• Denoted by ← (left arrow)

22
Relational Algebra
Assignment operation

Fname, Lname, Salary ( dno=5(EMPLOYEE))


23

Relational Algebra
Assignment operation
Fname, Lname, Salary ( dno=5(EMPLOYEE))
TEMP  dno=5(EMPLOYEE)
RESULT Fname, Lname, Salary (TEMP)

Result
24
Relational Algebra
Rename operation
• Denoted by the lowercase Greek letter rho ()
• Allows to nameto refer to, the results of relational-algebra
expressions.
• Allows to refer to a relation by more than one name.
• Example:
 x (E)
• Returns the expression E under the name x
• If a relational-algebra expression E has arity n, then
x (A1, A2, …, An) (E) 25

Relational Algebra
Rename operation

person (ID, name) (emp_no, emp_name (employee))


Employee relation person
ID name

26
Relational Algebra
Join operation
• Denoted by ⋈ is used to combine related tuples from two relations into single
tuples
• Only combinations of tuples satisfying the join condition
• Specified as a Cartesian product operation followed by a Select operation
• Natural join- Forms a Cartesian product of its two argumentsperforms a
selection forcing equality on those attributes that appear in both relation
schemas, and finally removes duplicate attributes
• Sometimes denoted by *
• Join involves join conditions with equality comparisons onlythe only comparison
operator used is =, is called an Equijoin.
27

Relational Algebra
Join operation

ACTUAL_DEPENDENTS ← EMPNAMES ⋈Ssn=Essn DEPENDENT

28
Relational Algebra
Join operation

ACTUAL_DEPENDENTS ← EMPNAMES ⋈Ssn=Essn DEPENDENT


• Equivalent to

TEMP← EMPNAMES × DEPENDENT


ACTUAL_DEPENDENTS ← σSsn=Essn(TEMP)

29

Relational Algebra
Join operation
Theta Join
R ⋈<join condition>S
A general join condition is of the form
<condition> AND <condition> AND … AND <condition>
Each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an
attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the
comparison operators {=, <, ≤, >, ≥, ≠}
A Join operation with such a general join condition is called a Theta Join
30
Relational Algebra
Join operation
Outer Join
• Preserves those tuples that would be lost in an join by creating tuples in
the result containing null values.
• Three forms of the operation:
• left outer join, denoted ⟕
• right outer join, denoted ⟖
• full outer join, denoted ⟗
• All three forms of outer join compute the join, and add extra tuples to
the result of the join.
31

Relational Algebra
Join operation
Outer Join
• Left outer join ⟕
• Takes all tuples in the left relation that did not match with any tuple in the right
relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join
• Right outer join ⟖
• Pads tuples from the right relation that did not match any from the left relation
with nulls and adds them to the result of the natural join.
• Full outer join ⟗
• Does both the left and right outer join operations 32
Relational Algebra
Generalized Projection
• Extends the projection operation by allowing operations such as
arithmetic and string functions to be used in the projection list.
 F1,F2,...,Fn(E)
• where E is any relational-algebra expression
• Each of F1, F2, . . . , Fn is an arithmetic expression involving constants
and attributes in the schema of E, permits operations such as
concatenation of strings.

33

Relational Algebra
Generalized Projection
Consider the relation
EMPLOYEE (Ssn, Salary, Deduction, Years_service)
A report may be required to show
Net Salary = Salary – Deduction,
Bonus = 2000 * Years_service
Tax = 0.25 * Salary
(Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
(Ssn, Net_salary, Bonus, Tax) (Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
34
Relational Algebra
Aggregate Functions and Grouping
• Take a collection of values and return a single value as a result.
• The symbol is the letter G in calligraphic font; read it as “calligraphic G.”
•  aggregation and its subscript specifies the aggregate operation to
be applied.
• Aggregate functions- sum, avg, count, min and max
• To eliminate duplicates, use function names with hyphenated string
“distinct” appended to the end of the function name (for example, count-
distinct).
• In some notations aggregate function represented by ℑ
35

Relational Algebra
Aggregate Functions and Grouping
<grouping attributes> <function list> (R)
Employee relation

Find out the sum of salaries of all employees


SQL: Select sum(salary) from employee;
sum(salary)(employee) 36
Relational Algebra
Aggregate Functions and Grouping
Employee relation

Find the average salary in each department


SQL: Select dept_no, average(salary) from employee group by dept_no;

dept_no average(salary)(employee) 37

Relational Algebra
Aggregate Functions and Grouping

• Retrieve each department number, the number of employees in the department,


and their average salary, while renaming the resulting attributes

R(Dno, No_of_employees, Average_sal) (Dno count (Ssn), average(salary)(employee))


38
Relational Algebra
Aggregate Functions and Grouping

count (Ssn), average(salary)(employee)

Dno count (Ssn), average(salary)(employee)

R(Dno, No_of_employees, Average_sal) (Dno count (Ssn), average(salary)(employee))


39

Relational Algebra

40
Relational Algebra

41

Relational Algebra

42
Relational Algebra
Examples (For the relations given above)

43

Relational Algebra
Examples (For the relations given above)

44
Relational Algebra
Examples (For the relations given above)

45

Relational Algebra
Examples (For the relations given above)

46
Relational Algebra
Examples (For the relations given above)

47

Relational Algebra
Examples (For the relations given above)

48
Relational Algebra
Examples (For the relations given above)

49

Relational Algebra
Examples

50
Query Processing
Query processing
• Refers to the range of activities involved in extracting data from a
database.
• Include translation of queries in high-level database languages into
expressions that can be used at the physical level of the file system,
• A variety of query-optimizing transformations, and actual evaluation of
queries

51

Query Processing
Query processing
• Steps
1. Parsing and translation.
2. Optimization.
3. Evaluation.

52
Query Processing
Query processing
• Given a query, there are a variety of methods for computing the answer
• Annotate it with instructions specifying how to evaluate each operation
• A relational algebra operation annotated with instructions on how to
evaluate it is called an evaluation primitive.
• A sequence of primitive operations that can be used to evaluate a
query is a query-execution plan or query-evaluation plan.

53

Query Processing
Query processing
Query tree
• Used in relational DBMSs (RDBMSs) to represent queries internally.
• A query tree or a query evaluation tree or query execution tree.
• A query tree is a tree data structure that corresponds to a relational algebra
expression.
• Represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
• Execution consists of executing an internal node operation whenever its
operands (represented by its child nodes) are available, and then replacing
that internal node by the relation that results from executing the operation. 54
Query Processing
Query processing
Query tree
• The execution terminates when the root node is executed and produces the
result relation for the query.
• The query parser will typically generate a standard initial query tree to
correspond to an SQL query, without doing any optimization.
• Canonical query tree represents a relational algebra expression that is very inefficient if
executed directly
• Heuristic query optimizer will transform this initial query tree into an
equivalent final query tree that is efficient to execute

55

Query Processing
Example

Fname, Lname, Salary ( dno=5(EMPLOYEE))


56
Query Processing
Example

Fname, Lname, Salary ( dno=5(EMPLOYEE))

Fname, Lname, Salary

 dno=5

EMPLOYEE
57

Query Processing and Optimization


Heuristic Optimization of Query Trees
• Different relational algebra expressions—hence many different query
trees—can be semantically equivalent;
• Represent the same query and produce the same results.
• The query parser will generate a standard initial query tree to
correspond to an SQL query, without doing any optimization
• The CARTESIAN PRODUCT of the relations specified in the FROM clause is
first applied;
• Then the selection and join conditions of the WHERE clause are applied
• Followed by the projection on the SELECT clause attributes
• very inefficient if executed directly
58
Query Processing and Optimization
Heuristic Optimization of Query Trees
• The heuristic query optimizer will transform this initial query tree
into an equivalent final query tree that is efficient to execute.
• Include rules for equivalence

59

Query Processing and Optimization


Steps in converting a query tree during heuristic optimization
(a) Initial (canonical) query tree for SQL query Q.
(b) Moving SELECT operations down the query tree.
(c) Applying the more restrictive SELECT operation first.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN
operations.
(e) Moving PROJECT operations down the query tree

60
Query Processing and Optimization
Steps in converting a query tree during heuristic optimization
Example: Consider the following relations

Find the last names of employees born after


1957 who work on a project named
61
‘Aquarius’.

Query Processing and Optimization


Find the last names of employees born after 1957 who work on a
project named ‘Aquarius’.

SELECT E.Lname
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.Pname=‘Aquarius’ AND P.Pnumber=W.Pno AND E.Ssn=W.ESsn
AND E.Bdate > ‘1957-12-31’;

62
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a
project named ‘Aquarius’.
Initial (canonical) query tree for SQL query Q

63

Query Processing and Optimization


Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Moving SELECT operations down the query tree

64
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Applying the more restrictive SELECT
operation first

65

Query Processing and Optimization


Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.

66
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Moving PROJECT operations
down the query tree

67

Query Processing and Optimization


Transformation of Relational Expressions
• Two relational-algebra expressions are said to be equivalent if, on every
legal database instance, the two expressions generate the same set of
tuples
• An equivalence rule says that expressions of two forms are equivalent.
• Replace an expression of the first form by an expression of the second form, or vice
versa
• , 1, 2, and so on to denote predicates
• L1, L2, L3, and so on to denote lists of attributes
• E, E1, E2, and so on to denote relational-algebra expressions
68
Query Processing and Optimization
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
s q1 q 2 ( E ) = s q1 (s q 2 ( E ))

2. Selection operations are commutative.


s q (s q ( E )) = s q (s q ( E ))
1 2 2 1

3. Only the last in a sequence of projection operations is needed, the others can
be omitted.
 L1 ( L2 ( ( Ln ( E )) )) =  L1 ( E )

4. Selections can be combined with Cartesian products and theta joins.


a. (E1 X E2) = E1 ⋈ E2
b. 1(E1 ⋈2 E2) = E1 ⋈ 1 2 E2 69

Query Processing and Optimization


5.Theta-join operations (and natural joins) are commutative.
E1 ⋈  E2 = E2 ⋈ E1
6.(a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

(b) Theta joins are associative in the following manner:

(E1 ⋈ 1 E2) ⋈2 3 E3 = E1 ⋈ 1 3 (E2 ⋈ 2 E3)

where 2 involves attributes from only E2 and E3.

70
Query Processing and Optimization

71

Query Processing and Optimization


7.The selection operation distributes over the theta join operation under the
following two conditions:
(a) When all the attributes in 0 involve only the attributes of one of the
expressions (E1) being joined.

0E1 ⋈ E2) = (0(E1)) ⋈ E2

(b) When  1 involves only the attributes of E1 and 2 involves only the
attributes of E2.
1 E1 ⋈ E2) = (1(E1)) ⋈ ( (E2))

72
Query Processing and Optimization
8.The projection operation distributes over the theta join operation as
follows:
(a) if  involves only attributes from L1  L2:

(b) Consider a join E1 ⋈ E2.


• Let L1 and L2 be sets of attributes from E1 and E2, respectively.
• Let L3 be attributes of E1 that are involved in join condition , but
are not in L1  L2, and
• let L4 be attributes of E2 that are involved in join condition , but are
not in L1  L2.
73

Query Processing and Optimization


9. The set operations union and intersection are commutative
E1  E2 = E2  E1

E1  E2 = E2  E1
(set difference is not commutative).
10.Set union and intersection are associative.
(E1  E2)  E3 = E1  (E2  E3)

(E1  E2)  E3 = E1  (E2  E3)

74
Query Processing and Optimization
11. The selection operation distributes over ,  and –.
s (E1 – E2) = s (E1) – s(E2)
and similarly for  and  in place of –
Also: s (E1 – E2) = s(E1) – E2
and similarly for  in place of –, but not for 

12. The projection operation distributes over union

L(E1  E2) = (L(E1))  (L(E2))

75

Query Processing and Optimization


Transformation Examples
instructor(ID, name, dept name, salary)
teaches(ID, course id, sec id, semester, year)
course(course id, title, dept name, credits)
Query: Find the names of all instructors in the Music department, along with the
titles of the courses that they teach
name, title(dept_name= “Music”(instructor ⋈ (teaches ⋈ course_id, title (course))))
• Transformation using rule 7a.
name, title((dept_name= “Music”(instructor)) ⋈ (teaches ⋈ course_id, title (course)))
• Performing the selection as early as possible reduces the size of the relation to
be joined. 76
Query Processing and Optimization
instructor(ID, name, dept name, salary)
teaches(ID, course id, sec id, semester, year)
course(course id, title, dept name, credits)

Find the names of all instructors in the


Music department together with the course
title of all the courses that the instructors
teach

77

Query Processing and Optimization

Initial expression tree


Constructs a large
intermediate relation
78
Query Processing and Optimization
Initial expression
tree
Transformed expression
tree

79

Query Processing and Optimization


Transformed expression tree

80

You might also like