BCSE302L-Database Systems Module - 4 Part2
BCSE302L-Database Systems Module - 4 Part2
Index Files
Construct B+ tree on the key values of name. Order of the tree is 4
Srini
Srini Wu
Mozar Srini Wu
Einstein
Srini
Srini
Srini
Gold Srini
Gold Srini
12
Hashing Techniques
Internal Hashing
• Hashing is implemented as a hash table
• Use of an array of records
• Consider the array index range is from 0 to M – 1 then we have M slots.
• Choose a hash function that transforms the hash field value into an integer between
0 and M − 1.
• Common hash function h(K) = K mod M
• Non integer hash field values transformed into integers before the mod function
is applied
• For character strings, the numeric (ASCII) codes associated with characters can be
used in the transformation. 13
Hashing Techniques
Other hashing Functions
• Folding
• Applies arithmetic function -addition or a logical function - exclusive or
• For example, with an address space from 0 to 999 to store 1,000 keys,
• 6-digit key 235469 may be folded
• stored at the address: (235+964) mod 1000 = 199
• Another technique involves picking some digits of the hash field value—for
instance, the third, fifth, and eighth digits—to form the hash address
• Social Security numbers of 10 digits 301-67-8923 hash value of 172
14
Hashing Techniques
• Hashing functions do not guarantee that distinct values will hash to distinct
addresses
• hash field space is usually much larger than the address space
• hash field space—the number of possible values a hash field can take
• address space—the number of available addresses for records.
• The hashing function maps the hash field space to the address space
15
Hashing Techniques
16
Hashing Techniques
Collision
• Collision hash field value of a record that is being inserted hashes to an address
that already contains a different record.
• Insert the new record in some other position?
• Collision resolution- Process of finding another position
• Methods for collision resolution
• Open addressing
• Chaining
• Multiple hashing
17
Hashing Techniques
• Open addressing
• Proceeding from the occupied position specified by the hash addresschecks the
subsequent positions in order until an unused (empty) position is found
18
Hashing Techniques
• Chaining
• Various overflow locations are kept by extending the array with a number of
overflow positions.
• A pointer field is added to each record location
• Places the new record in an unused overflow location
• setting the pointer of the occupied hash address location to the address of that
overflow location.
19
Hashing Techniques
• Chaining
20
Hashing Techniques
• Multiple hashing
• Applies a second hash function if the first results in a collision.
• If another collision results uses open addressing or applies a third hash function
and then uses open addressing if necessary.
• The series of hash functions are used in the same order for retrieval.
21
Hashing Techniques
• Goal of a good hashing function
• To distribute the records uniformly over the address space to minimize collisions,
to locate a record with a given key in a single access.
• Best to keep a hash file between 70 and 90% full the number of collisions
remains low and not waste too much space
22
Hashing Techniques
External Hashing for Disk Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets
• A bucket corresponds to one (or a fixed number of) disk block.
• One of the file fields hash key of the file.
• The record with hash key value K is stored in bucket i,
• i=h(K), and h is the hashing function
• Collisions occur when a new record hashes to a bucket that is already full.
• An overflow file is kept for storing such records.
• Overflow records that hash to each bucket can be linked together.
23
Hashing Techniques
External Hashing for Disk Files
Hashing Techniques
External Hashing for Disk Files
• Static hashing - fixed number of buckets M is allocated.
• The function does key-to-address mappingfixing the address space
• Main disadvantages of static external hashing:
• Fixed number of buckets M if the number of records in the file grows or
shrinks.
• Ordered access on the hash key is inefficient requires sorting the records
25
Hashing Techniques
External Hashing for Disk Files
Dynamic and Extendible Hashing Techniques
• Dynamic and Extendible Hashing Techniques
• Allow the dynamic growth and shrinking of the number of file records.
• dynamic hashing, extendible hashing, and linear hashing.
• Dynamic and extendible hashing use the binary representation of the hash
value h(K) in order to access a directory.
• In dynamic hashing the directory is a binary tree.
• In extendible hashing the directory is an array of size 2d where d is called the global
depth.
26
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
• Show the extendable hash structure for this file if the hash function is h(x) = x and
buckets can hold three records
28
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0
1
Global Depth=1
29
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1
1
Global Depth=1
30
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7
1
Global Depth=1
31
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7 3
1
Global Depth=1
32
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7 3
1
Global Depth=1
8
33
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7 3
1
Global Depth=1
8 12
34
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7 3
1
Global Depth=1
8 12 14
35
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
0 1 7 3
1
Global Depth=1
8 12 14
36
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=1
00 1 7 3
01
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2
Local Depth=2
1 3 2
00
01 7
10
8 11 Local Depth=2
11
Global Depth=2 12 14 Local Depth=2
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
Local Depth=2
1 3 2
00
01 7 4
10
8 11 10 Local Depth=2
11
Global Depth=2 12 14 13 Local Depth=2
001
010 7 4 Local Depth=2
011
100
8 9 Local Depth=3
101
11 10 Local Depth=3
110
111 12 14 13 Local Depth=2
40
Global Depth=3
Hashing Techniques
External Hashing for Disk Files
• Extendible Hashing Techniques
41
Hashing Techniques
External Hashing for Disk Files
Dynamic hashing
42
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
• Show the Dynamic hash structure for the file if the hash function is h(x) = x and
buckets can hold three records
43
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
0
1
44
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
0 1 7 3
1
8 12 14
45
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
0 1 7 3
1
0 8 11
1
12 14
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
0 1 3 2
1
0 7
1
0 8 11
1
12 14
0 1 3 2
1
0 7 4
1
0 8 11 10
1
12 14 13
Inserting 9 causes overflow of Bucket10
48
Hashing Techniques
External Hashing for Disk Files
• Dynamic Hashing Techniques
1 3 2
0
1 7 4
0
1 8 9
0
0
1
1 11 10
12 14 13
49
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
• Allow a hash file to expand and shrink its number of buckets dynamically without
needing a directory.
• For example file starts with M buckets numbered 0, 1, … , M − 1 and uses the mod
hash function h(K) = K mod Minitial hash function hi.
• When a collision leads to an overflow record in any file bucket the first bucket in
the file—bucket 0—is split into two buckets: the original bucket 0 and a new bucket M
at the end of the file.
• Records originally in bucket 0 are distributed between the two buckets based on a
different hashing function hi+1(K) = K mod 2M.
• Any records that hashed to bucket 0 based on hi will hash to either bucket 0 or
50
bucket M based on hi+1
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Show the Linear hash structure for the file with bucket size 2 and hashing parameter
M=3
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
bucket 1
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the bucket 0 into bucket 0 and bucket M(3) new
bucket is at the end
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 11
h1(11)=11 mod 3=2
Bucket 0 12
Bucket 1 1 7 4
Bucket 2 8 11
Bucket 3 3
55
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 2
h1(2)=2 mod 3=2 Overflow occurs, maintaining individual overflow chains for bucket 2
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Bucket 0 12
Bucket 1 1 7 4
Bucket 2 8 11 2
Bucket 3 3 56
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the bucket 1 into bucket 1 and bucket M+1(4) new
bucket is at the end
Distribute the values in bucket1 Bucket 0 12
between bucket1 and bucket4 by
applying hash function Bucket 1 1 7 4
h2(k)=k mod 2M
h2(k)=k mod 6 Bucket 2 8 11 2
Bucket 3 3
Bucket 4
57
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Distribute the values in bucket1
between bucket1 and bucket4 by
applying hash function Bucket 0 12
h2(1)=1 mod 6=1
h2(7)=7 mod 6=1 Bucket 1 1 7
h2(4)=4 mod 6=4
Bucket 2 8 11 2
Bucket 3 3
Bucket 4 4
58
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 10
h1(10)=10 mod 6=4
Bucket 0 12
Insert 13
h1(13)=13 mod 6=1 Bucket 1 1 7 13
Overflow occurs, maintaining
individual overflow chains Bucket 2 8 11 2
for bucket 1
Bucket 3 3
Bucket 4 4 10
59
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Since overflow occurs split the
bucket 2 into bucket 2 and bucket
Bucket 1 1 7 13
M+2(5) new bucket is at the end
Bucket 2 8 2
Distribute the values in bucket2
between bucket2 and bucket5 by Bucket 3 3
applying hash function
h2(k)=k mod 2M Bucket 4 4 10
h2(k)=k mod 6
Bucket 5
h2(8)=8 mod 6=2 11
h2(11)=11 mod 6=5
60
h2(2)=2 mod 6=2
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 3=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2
individual overflow chains for bucket 2
Bucket 3 3
End of round 1, splitting has to be from
the beginning Bucket 4 4 10
h1(k)=k mod 3 is changed to
h1(k)=k mod 6, h2(k)=k mod 12 Bucket 5 11
61
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 14
h1(14)=14 mod 6=2
Bucket 1 1 7 13
Overflow occurs, maintaining
Bucket 2 8 2 14
individual overflow chains for bucket 2
Since overflow occurs split the bucket 0 Bucket 3 3
into bucket 0 and bucket 6 new bucket is
at the end Bucket 4 4 10
Distribute the values in bucket0
between bucket0 and bucket6 by Bucket 5 11
applying hash function
h2(k)=k mod 2M, h2(12)=12 mod 12 =0 Bucket 6
62
Hashing Techniques
External Hashing for Disk Files
Linear Hashing
Bucket 0 12
1, 7, 3, 8, 12, 4, 11, 2, 10, 13, 14, 9
Insert 9
h1(9)=9 mod 6=3
Bucket 1 1 7 13
Bucket 2 8 2 14
Bucket 3 3 9
Bucket 4 4 10
Bucket 5 11
Bucket 6 63
Relational Algebra
• Procedural query language
• Set of operations take one or two relations as input and produce a
new relation as their result.
• The fundamental operations select, project, union, set difference,
Cartesian product, and rename.
• Other operations—set intersection, natural join, and assignment,
Extended Relational-Algebra Operations
Relational Algebra
• select, project, and rename unary operations operate on one
relation.
• Set operations, Cartesian product, joins binary operations operate
on pairs of relations
2
Relational Algebra
The Select Operation
• Selects tuples that satisfy a given predicate.
• The lowercase Greek letter sigma (σ ) to denote selection.
σ Condition/Predicate(Relation/Table name)
o Where predicate is consisting of terms connected by : (and), (or),
(not)
o Each term is one of: <attribute>op <attribute> or <constant>
o where op is one of: =, , >, , <,
Relational Algebra
The Select Operation
• Select operation is commutative
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))
4
Relational Algebra
Employee relation
The Select Operation
emp_name=“ram”(employee)
Relational Algebra
The Select Operation
salary>6000 (employee)
6
Relational Algebra
The Project Operation
• Returns its argument relation, with certain attributes left out.
• Any duplicate rows are eliminated.
• Denoted by the uppercase Greek letter pi (П).
П A1, A2, …, Ak (r)
where A1, A2 are attribute names and r is a relation name.
Relational Algebra
The Project Operation
• Number of tuples in a relation resulting from a Project operation is
always less than or equal to the number of tuples in R.
• Commutativity does not hold on Project.
8
Relational Algebra
The Project Operation Employee relation
Relational Algebra
Relational algebra expression Employee relation
Composition of
Relational Operations
11
Relational Algebra
The Union Operation
• Unify tuples from two relations
• Notation: r ∪ s
For r ∪ s to be valid
• r, s must have the same arity (same number of attributes)
• The attribute domains must be compatible (e.g., 2nd column of r deals
with the same type of values as does the 2nd column of s)
• No duplicates present after the union operation
12
Relational Algebra
The Union Operation
13
Relational Algebra
The Set-Difference / Minus Operation
• To find tuples that are in one relation but are not in another.
• Denoted by –
• r – s produces a relation containing those tuples in r but not in s
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
14
Relational Algebra
The Set-Difference Operation
Relational Algebra
The Intersection Operation
• Includes all tuples that are in both relations
r∩s
• r, s have the same arity
• attributes of r and s are compatible
r ∩ s = r - (r - s)
• Set intersection is not a fundamental operation
16
Relational Algebra
The Intersection Operation
Example: Find all the customers who have both a loan and
an account from the borrower and depositor relation.
Relational Algebra
Set Operations
• Union and Intersection are commutative operations
R ∪ S = S ∪ R and R ∩ S = S ∩ R
• Union and Intersection n-ary operations applicable to any number of
relations because both are also associative operations
R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T )
18
Relational Algebra
The Cartesian-Product Operation
• Denoted by a cross (x)combine information from any two relations.
rxs
• Also called Cross Product or Cross Join
• R(A1, A2, … , An) × S(B1, B2, … , Bm) is a relation Q with degree n + m
attributes Q(A1, A2, … , An, B1, B2, … , Bm)
• R has i tuples and S has k tuples Q will have i*k tuples
19
Relational Algebra
The Cartesian-Product Operation
EMPNAMES × DEPENDENT
20
Relational Algebra
The Cartesian-Product Operation
21
Relational Algebra
Assignment operation
• Denoted by ← (left arrow)
22
Relational Algebra
Assignment operation
Relational Algebra
Assignment operation
Fname, Lname, Salary ( dno=5(EMPLOYEE))
TEMP dno=5(EMPLOYEE)
RESULT Fname, Lname, Salary (TEMP)
Result
24
Relational Algebra
Rename operation
• Denoted by the lowercase Greek letter rho ()
• Allows to nameto refer to, the results of relational-algebra
expressions.
• Allows to refer to a relation by more than one name.
• Example:
x (E)
• Returns the expression E under the name x
• If a relational-algebra expression E has arity n, then
x (A1, A2, …, An) (E) 25
Relational Algebra
Rename operation
26
Relational Algebra
Join operation
• Denoted by ⋈ is used to combine related tuples from two relations into single
tuples
• Only combinations of tuples satisfying the join condition
• Specified as a Cartesian product operation followed by a Select operation
• Natural join- Forms a Cartesian product of its two argumentsperforms a
selection forcing equality on those attributes that appear in both relation
schemas, and finally removes duplicate attributes
• Sometimes denoted by *
• Join involves join conditions with equality comparisons onlythe only comparison
operator used is =, is called an Equijoin.
27
Relational Algebra
Join operation
28
Relational Algebra
Join operation
29
Relational Algebra
Join operation
Theta Join
R ⋈<join condition>S
A general join condition is of the form
<condition> AND <condition> AND … AND <condition>
Each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an
attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the
comparison operators {=, <, ≤, >, ≥, ≠}
A Join operation with such a general join condition is called a Theta Join
30
Relational Algebra
Join operation
Outer Join
• Preserves those tuples that would be lost in an join by creating tuples in
the result containing null values.
• Three forms of the operation:
• left outer join, denoted ⟕
• right outer join, denoted ⟖
• full outer join, denoted ⟗
• All three forms of outer join compute the join, and add extra tuples to
the result of the join.
31
Relational Algebra
Join operation
Outer Join
• Left outer join ⟕
• Takes all tuples in the left relation that did not match with any tuple in the right
relation, pads the tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join
• Right outer join ⟖
• Pads tuples from the right relation that did not match any from the left relation
with nulls and adds them to the result of the natural join.
• Full outer join ⟗
• Does both the left and right outer join operations 32
Relational Algebra
Generalized Projection
• Extends the projection operation by allowing operations such as
arithmetic and string functions to be used in the projection list.
F1,F2,...,Fn(E)
• where E is any relational-algebra expression
• Each of F1, F2, . . . , Fn is an arithmetic expression involving constants
and attributes in the schema of E, permits operations such as
concatenation of strings.
33
Relational Algebra
Generalized Projection
Consider the relation
EMPLOYEE (Ssn, Salary, Deduction, Years_service)
A report may be required to show
Net Salary = Salary – Deduction,
Bonus = 2000 * Years_service
Tax = 0.25 * Salary
(Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
(Ssn, Net_salary, Bonus, Tax) (Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary (employee))
34
Relational Algebra
Aggregate Functions and Grouping
• Take a collection of values and return a single value as a result.
• The symbol is the letter G in calligraphic font; read it as “calligraphic G.”
• aggregation and its subscript specifies the aggregate operation to
be applied.
• Aggregate functions- sum, avg, count, min and max
• To eliminate duplicates, use function names with hyphenated string
“distinct” appended to the end of the function name (for example, count-
distinct).
• In some notations aggregate function represented by ℑ
35
Relational Algebra
Aggregate Functions and Grouping
<grouping attributes> <function list> (R)
Employee relation
dept_no average(salary)(employee) 37
Relational Algebra
Aggregate Functions and Grouping
Relational Algebra
40
Relational Algebra
41
Relational Algebra
42
Relational Algebra
Examples (For the relations given above)
43
Relational Algebra
Examples (For the relations given above)
44
Relational Algebra
Examples (For the relations given above)
45
Relational Algebra
Examples (For the relations given above)
46
Relational Algebra
Examples (For the relations given above)
47
Relational Algebra
Examples (For the relations given above)
48
Relational Algebra
Examples (For the relations given above)
49
Relational Algebra
Examples
50
Query Processing
Query processing
• Refers to the range of activities involved in extracting data from a
database.
• Include translation of queries in high-level database languages into
expressions that can be used at the physical level of the file system,
• A variety of query-optimizing transformations, and actual evaluation of
queries
51
Query Processing
Query processing
• Steps
1. Parsing and translation.
2. Optimization.
3. Evaluation.
52
Query Processing
Query processing
• Given a query, there are a variety of methods for computing the answer
• Annotate it with instructions specifying how to evaluate each operation
• A relational algebra operation annotated with instructions on how to
evaluate it is called an evaluation primitive.
• A sequence of primitive operations that can be used to evaluate a
query is a query-execution plan or query-evaluation plan.
53
Query Processing
Query processing
Query tree
• Used in relational DBMSs (RDBMSs) to represent queries internally.
• A query tree or a query evaluation tree or query execution tree.
• A query tree is a tree data structure that corresponds to a relational algebra
expression.
• Represents the input relations of the query as leaf nodes of the tree, and
represents the relational algebra operations as internal nodes.
• Execution consists of executing an internal node operation whenever its
operands (represented by its child nodes) are available, and then replacing
that internal node by the relation that results from executing the operation. 54
Query Processing
Query processing
Query tree
• The execution terminates when the root node is executed and produces the
result relation for the query.
• The query parser will typically generate a standard initial query tree to
correspond to an SQL query, without doing any optimization.
• Canonical query tree represents a relational algebra expression that is very inefficient if
executed directly
• Heuristic query optimizer will transform this initial query tree into an
equivalent final query tree that is efficient to execute
55
Query Processing
Example
dno=5
EMPLOYEE
57
59
60
Query Processing and Optimization
Steps in converting a query tree during heuristic optimization
Example: Consider the following relations
SELECT E.Lname
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.Pname=‘Aquarius’ AND P.Pnumber=W.Pno AND E.Ssn=W.ESsn
AND E.Bdate > ‘1957-12-31’;
62
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a
project named ‘Aquarius’.
Initial (canonical) query tree for SQL query Q
63
64
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Applying the more restrictive SELECT
operation first
65
66
Query Processing and Optimization
Find the last names of employees born after 1957 who work on a project
named ‘Aquarius’.
Moving PROJECT operations
down the query tree
67
3. Only the last in a sequence of projection operations is needed, the others can
be omitted.
L1 ( L2 ( ( Ln ( E )) )) = L1 ( E )
70
Query Processing and Optimization
71
(b) When 1 involves only the attributes of E1 and 2 involves only the
attributes of E2.
1 E1 ⋈ E2) = (1(E1)) ⋈ ( (E2))
72
Query Processing and Optimization
8.The projection operation distributes over the theta join operation as
follows:
(a) if involves only attributes from L1 L2:
E1 E2 = E2 E1
(set difference is not commutative).
10.Set union and intersection are associative.
(E1 E2) E3 = E1 (E2 E3)
74
Query Processing and Optimization
11. The selection operation distributes over , and –.
s (E1 – E2) = s (E1) – s(E2)
and similarly for and in place of –
Also: s (E1 – E2) = s(E1) – E2
and similarly for in place of –, but not for
75
77
79
80