ch1011_tree_hash_index_csi3317 (1)
ch1011_tree_hash_index_csi3317 (1)
ch1011_tree_hash_index_csi3317 (1)
Indexes
Selected Sections of Chapters 10
& 11
P K P K 2 P K m Pm
0 1 1 2
Non-leaf
Pages
Leaf
Pages
Overflow
page
Primary pages
Leaf pages contain data entries.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Data
Comments on ISAM Pages
Index Pages
File creation: Leaf (data) pages allocated
sequentially, sorted by search key; then index
pages allocated, then space for overflow pages.
Overflow pages
Index entries: <search key value, page id>; they
`direct’ search for data entries, which are in leaf pages.
Search: Start at root; use key comparisons to go to leaf.
Cost log F N ; F = # entries/index pg, N = # leaf pgs
Insert: Find leaf data entry belongs to, and put it there.
Delete: Find and remove from leaf; if empty overflow
page, de-allocate.
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Pages
20 33 51 63
Primary
Leaf
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Pages
Pages
42*
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 55* 63*
Index Entries
(Direct search)
Data Entries
("Sequence set")
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Example B+ Tree
24* ... 13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
occupancy is
guaranteed in 2* 3* 5* 7* 8*
both leaf and
index pg splits.
Note difference
between copy- Entry to be inserted in parent node.
(Note that 17 is pushed up and only
up and push- 17
appears once in the index. Contrast
this with a leaf split.)
up; be sure you
understand the
5 13 24 30
reasons for this.
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
17
5 13 27 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
22
5 13 17 20 30
2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*
5 13 20 22 30
2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Prefix Key Compression
Important to increase fan-out. (Why?)
Key values in index entries only `direct traffic’; can
often compress them.
E.g., If we have adjacent index entries with search key
values Dannon Yogurt, David Smith and Devarakonda
Murthy, we can abbreviate David Smith to Dav. (The
other keys can be compressed too ...)
• Is this correct? Not quite! What if there is a data entry Davey
Jones? (Can only compress David Smith to Davi)
• In general, while compressing, must leave each index entry
greater than every key value (in any subtree) to its left.
Insert/delete must be suitably modified.
N-1
Primary bucket pages Overflow pages
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Static Hashing (Contd.)
Buckets contain data entries.
Hash fn works on search key field of record r.
Must distribute values over range 0 ... M-1.
h(key) = (a * key + b) usually works well.
a and b are constants; lots known about how to tune
h.
Long overflow chains can develop and degrade
performance.
Extendible and Linear Hashing: Dynamic techniques
to fix this problem.
Example
2 2
Bucket B
00 1* 5* 21* 13*
Directory is array of size 4.
01
To find bucket for r, take 2
10
last `global depth’ # bits of 10*
Bucket C
11
h(r); we denote r by h(r).
If h(r) = 5 = binary 101,
DIRECTORY 2
it is in bucket pointed to 15* 7* 19*
Bucket D
by 01.
DATA PAGES
2 2
3 2
00 1* 5* 21*13* Bucket B 000 1* 5* 21* 13* Bucket B
01 001
10 2 2
010
10* Bucket C
11 10*
011 Bucket C
100
2
DIRECTORY 101 2
Bucket D
15* 7* 19*
110 15* 7* 19* Bucket D
111
2
3
4* 12* 20* Bucket A2
DIRECTORY 4* 12* 20* Bucket A2
(`split image'
of Bucket A) (`split image'
of Bucket A)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Points to Note
20 = binary 10100. Last 2 bits (00) tell us r
belongs in A or A2. Last 3 bits needed to tell
which.
Global depth of directory: Max # of bits needed to tell
which bucket an entry belongs to.
Local depth of a bucket: # of bits used to determine if
an entry belongs to this bucket.
When does bucket split cause directory doubling?
Before insert, local depth of bucket = global depth.
Insert causes local depth to become > global depth;
directory is doubled by copying it over and `fixing’
pointer to split image page. (Use of least significant
bits enables efficient doubling via copying of directory!)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Directory Doubling
Why use least significant bits in directory?
Allows for doubling via copying!
6 = 110 3
6 = 110 3
000 000
001 100
2 2
010 010
1 00 1 00
011 110
0 6* 01 100 0 10 001
1 10 6* 101 1 6* 01
101
11 11 6*
110 6* 011 6*
111 111