ch1011_tree_hash_index_csi3317 (1)

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 42

Tree- and Hash-Structured

Indexes
Selected Sections of Chapters 10
& 11

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke


Introduction
 As for any index, 3 alternatives for data entries
k*:
 Data record with key value k
 <k, rid of data record with search key value k>
 <k, list of rids of data records with search key k>
 Choice depends on the indexing technique used
to locate data entries k*.
 Tree-structured indexing techniques support both
range searches and equality searches.
 ISAM: static structure; B+ tree: dynamic,
adjusts gracefully under inserts and deletes.
 Hash-based indexes are best for equality
selections. Cannot support range searches.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Intuition Behind Tree
Indexes: Support of Range
Searches
 ``Find all students with gpa > 3.0’’
 If data is in sorted file, do binary search to find
first such student, then scan to find others.
 Cost of binary search can be quite high since it is
proportional to the # of pages fetched.
 Solution: Create an `index’ file.
k1 k2 kN Index File

Page 1 Page 2 Page 3 Page N Data File

 Can do binary search on (smaller) index file!


atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
ISAM index entry

P K P K 2 P K m Pm
0 1 1 2

 Index file may still be quite large. But we


can apply the idea repeatedly!

Non-leaf
Pages

Leaf
Pages
Overflow
page
Primary pages
 Leaf pages contain data entries.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Data
Comments on ISAM Pages

Index Pages
 File creation: Leaf (data) pages allocated
sequentially, sorted by search key; then index
pages allocated, then space for overflow pages.
Overflow pages
 Index entries: <search key value, page id>; they
`direct’ search for data entries, which are in leaf pages.
 Search: Start at root; use key comparisons to go to leaf.
Cost log F N ; F = # entries/index pg, N = # leaf pgs
 Insert: Find leaf data entry belongs to, and put it there.
 Delete: Find and remove from leaf; if empty overflow
page, de-allocate.

atic tree structure: inserts/deletes affect only leaf pages.


atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Example ISAM Tree

 Each node can hold 2 entries; no need


for `next-leaf-page’ pointers. (Why?)
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke


After Inserting 23*, 48*, 41*,
42* ...
Root
Index 40

Pages

20 33 51 63

Primary
Leaf
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Pages

Overflow 23* 48* 41*

Pages
42*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke


... Then Deleting 42*,
51*, 97*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 55* 63*

23* 48* 41*

 Note that 51* appears in index levels, but not in leaf!


atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
B+ Tree: Most Widely Used
Index
 Insert/delete at log F N cost; keep tree height-
balanced. (F = fanout, N = # leaf pages)
 Minimum 50% occupancy (except for root). Each
node contains d <= m <= 2d entries. The
parameter d is called the order of the tree.
 Supports equality and range-searches efficiently.

Index Entries
(Direct search)

Data Entries
("Sequence set")
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Example B+ Tree

 Search begins at root, and key


comparisons direct it to a leaf (as in
ISAM).
 Search for
Root5*, 15*, all data entries >=

24* ... 13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

 Based on the search for 15*, we know it is not in the tree!


atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
B+ Trees in Practice

 Typical order: 100. Typical fill-factor: 67%.


 average fanout = 133
 Typical capacities:
 Height 4: 1334 = 312,900,700 records
 Height 3: 1333 = 2,352,637 records
 Can often hold top levels in buffer pool:
 Level 1 = 1 page = 8 Kbytes
 Level 2 = 133 pages = 1 Mbyte
 Level 3 = 17,689 pages = 133 MBytes

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Inserting a Data Entry into a
B+ Tree
 Find correct leaf L.
 Put data entry onto L.
 If L has enough space, done!
 Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.
• Insert index entry pointing to L2 into parent of L.
 This can happen recursively
 To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
 Splits “grow” tree; root split increases height.
 Tree growth: gets wider or one level taller at top.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Inserting 8* into Example B+
Tree
Entry to be inserted in parent node.
 Observe how 5 (Note that 5 is
s copied up and
minimum continues to appear in the leaf.)

occupancy is
guaranteed in 2* 3* 5* 7* 8*
both leaf and
index pg splits.
 Note difference
between copy- Entry to be inserted in parent node.
(Note that 17 is pushed up and only
up and push- 17
appears once in the index. Contrast
this with a leaf split.)
up; be sure you
understand the
5 13 24 30
reasons for this.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Example B+ Tree After
Inserting 8*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

 Notice that root was split, leading to increase in height.


 In this example, we can avoid split by re-
distributing entries; however, this is
usually not done in practice.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Deleting a Data Entry from a
B+ Tree
 Start at root, find leaf L where entry belongs.
 Remove the entry.
 If L is at least half-full, done!
 If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent
node with same parent as L).
• If re-distribution fails, merge L and sibling.
 If merge occurred, must delete entry (pointing to L
or sibling) from parent of L.
 Merge could propagate to root, decreasing height.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Example Tree After (Inserting
8*, Then) Deleting 19* and
20* ...
Root

17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

 Deleting 19* is easy.


 Deleting 20* is done with re-distribution.
Notice how middle key is copied up.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
... And Then Deleting
24*
 Must merge. 30
 Observe `toss’ of
index entry (on
22* 27* 29* 33* 34* 38* 39*
right), and `pull
down’ of index
entry (below).
Root
5 13 17 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


Example of Non-leaf Re-
distribution
 Tree is shown below during deletion of 24*.
(What could be a possible initial tree?)
 In contrast to previous example, can re-
distribute entry from left child of root to
right child.
Root

22

5 13 17 20 30

2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1


After Re-distribution
 Intuitively, entries are re-distributed by
`pushing through’ the splitting entry in the
parent node.
 It suffices to re-distribute index entry with
key 20; we’ve re-distributed
Root 17 as well for
illustration. 17

5 13 20 22 30

2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Prefix Key Compression
 Important to increase fan-out. (Why?)
 Key values in index entries only `direct traffic’; can
often compress them.
 E.g., If we have adjacent index entries with search key
values Dannon Yogurt, David Smith and Devarakonda
Murthy, we can abbreviate David Smith to Dav. (The
other keys can be compressed too ...)
• Is this correct? Not quite! What if there is a data entry Davey
Jones? (Can only compress David Smith to Davi)
• In general, while compressing, must leave each index entry
greater than every key value (in any subtree) to its left.
 Insert/delete must be suitably modified.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2


Bulk Loading of a B+ Tree
 If we have a large collection of records,
and we want to create a B+ tree on some
field, doing so by repeatedly inserting
records is very slow.
 Bulk Loading can be done much more
efficiently.
 Initialization: Sort all data entries, insert
Root
Sorted pages of data entries; not yet in B+ tree
pointer to first (leaf) page in a new (root)
page.
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2


Bulk Loading (Contd.)
Root 10 20

 Index entries for


leaf pages always 6 12 23 35
Data entry pages
not yet in B+ tree
entered into right-
most index page
just above leaf 3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44*
level. When this
fills up, it splits.
(Split may go up Root 20
right-most path to
the root.) 10 35 Data entry pages
not yet in B+ tree
 Much faster than
repeated inserts, 6 12 23 38
especially when one
considers locking!
3* 4* 6* 9* 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44*
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
A Note on `Order’
 Order (d) concept replaced by physical space
criterion in practice (`at least half-full’).
 Index pages can typically hold many more entries
than leaf pages.
 Variable sized records and search keys mean differnt
nodes will contain different numbers of entries.
 Even with fixed length fields, multiple records with
the same search key value (duplicates) can lead to
variable-sized data entries (if we use Alternative (3)).

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2


Static Hashing
 # primary pages fixed, allocated
sequentially, never de-allocated; overflow
pages if needed.
 h(k) mod M = bucket to which data entry
with key k belongs. (M = # of buckets)
0
h(key) mod N
2
key
h

N-1
Primary bucket pages Overflow pages
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Static Hashing (Contd.)
 Buckets contain data entries.
 Hash fn works on search key field of record r.
Must distribute values over range 0 ... M-1.
 h(key) = (a * key + b) usually works well.
 a and b are constants; lots known about how to tune
h.
 Long overflow chains can develop and degrade
performance.
 Extendible and Linear Hashing: Dynamic techniques
to fix this problem.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2


Extendible Hashing
 Situation: Bucket (primary page) becomes full.
Why not re-organize file by doubling # of
buckets?
 Reading and writing all pages is expensive!
 Idea: Use directory of pointers to buckets, double #
of buckets by doubling the directory, splitting just the
bucket that overflowed!
 Directory much smaller than file, so doubling it is
much cheaper. Only one page of data entries is split.
No overflow page!
 Trick lies in how hash function is adjusted!

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2


LOCAL DEPTH 2
Bucket A
GLOBAL DEPTH 4* 12* 32* 16*

Example
2 2
Bucket B
00 1* 5* 21* 13*
 Directory is array of size 4.
01
 To find bucket for r, take 2
10
last `global depth’ # bits of 10*
Bucket C
11
h(r); we denote r by h(r).
 If h(r) = 5 = binary 101,
DIRECTORY 2
it is in bucket pointed to 15* 7* 19*
Bucket D
by 01.
DATA PAGES

sert: If bucket is full, split it (allocate new page, re-distribute


necessary, double the directory. (As we will see, splitting a
bucket does not always require doubling; we can tell by
comparing global depth with local depth for the split bucket.)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Insert h(r)=20 (Causes
Doubling)
LOCAL DEPTH 2 3
Bucket A LOCAL DEPTH
GLOBAL DEPTH 32*16* 32* 16* Bucket A
GLOBAL DEPTH

2 2
3 2
00 1* 5* 21*13* Bucket B 000 1* 5* 21* 13* Bucket B
01 001
10 2 2
010
10* Bucket C
11 10*
011 Bucket C
100
2
DIRECTORY 101 2
Bucket D
15* 7* 19*
110 15* 7* 19* Bucket D
111
2
3
4* 12* 20* Bucket A2
DIRECTORY 4* 12* 20* Bucket A2
(`split image'
of Bucket A) (`split image'
of Bucket A)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Points to Note
 20 = binary 10100. Last 2 bits (00) tell us r
belongs in A or A2. Last 3 bits needed to tell
which.
 Global depth of directory: Max # of bits needed to tell
which bucket an entry belongs to.
 Local depth of a bucket: # of bits used to determine if
an entry belongs to this bucket.
 When does bucket split cause directory doubling?
 Before insert, local depth of bucket = global depth.
Insert causes local depth to become > global depth;
directory is doubled by copying it over and `fixing’
pointer to split image page. (Use of least significant
bits enables efficient doubling via copying of directory!)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2
Directory Doubling
Why use least significant bits in directory?
 Allows for doubling via copying!

6 = 110 3
6 = 110 3
000 000

001 100
2 2
010 010
1 00 1 00
011 110
0 6* 01 100 0 10 001
1 10 6* 101 1 6* 01
101
11 11 6*
110 6* 011 6*
111 111

Least Significant vs. Most Significant


atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Comments on Extendible
Hashing
 If directory fits in memory, equality search
answered with one disk access; else two.
 100MB file, 100 bytes/rec, 4K pages contains
1,000,000 records (as data entries) and 25,000
directory elements; chances are high that directory will
fit in memory.
 Directory grows in spurts, and, if the distribution of
hash values is skewed, directory can grow large.
 Multiple entries with same hash value cause problems!
 Delete: If removal of data entry makes bucket
empty, can be merged with `split image’. If each
directory element points to same bucket as its
split image, can halve directory.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Linear Hashing
 This is another dynamic hashing scheme, an
alternative to Extendible Hashing.
 LH handles the problem of long overflow chains
without using a directory, and handles duplicates.
 Idea: Use a family of hash functions h , h , h , ...
0 1 2
 hi(key) = h(key) mod(2iN); N = initial # buckets
 h is some hash function (range is not 0 to N-1)
 If N = 2d0, for some d0, hi consists of applying h and
looking at the last di bits, where di = d0 + i.
 hi+1 doubles the range of hi (similar to directory
doubling)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Linear Hashing (Contd.)
 Directory avoided in LH by using overflow
pages, and choosing bucket to split round-
robin.
 Splitting proceeds in `rounds’. Round ends when all
NR initial (for round R) buckets are split. Buckets 0
to Next-1 have been split; Next to NR yet to be split.
 Current round number is Level.
 Search: To find bucket for data entry r, find hLevel(r):
• If hLevel(r) in range `Next to NR’ , r belongs here.
• Else, r could belong to bucket hLevel(r) or bucket
hLevel(r) + NR; must apply hLevel+1(r) to find out.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3


Overview of LH File
 In the middle of a round.
Buckets split in this round:
Bucket to be split If h Level ( search key value )
Next is in this range, must use
h Level+1 ( search key value )
Buckets that existed at the
to decide if entry is in
beginning of this round: `split image' bucket.
this is the range of
hLevel
`split image' buckets:
created (through splitting
of other buckets) in this round

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3


Linear Hashing (Contd.)
 Insert: Find bucket by applying hLevel / hLevel+1:
 If bucket to insert into is full:
• Add overflow page and insert data entry.
• (Maybe) Split Next bucket and increment Next.
 Can choose any criterion to `trigger’ split.
 Since buckets are split round-robin, long
overflow chains don’t develop!
 Doubling of directory in Extendible Hashing is
similar; switching of hash functions is implicit
in how the # of bits examined is increased.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Example of Linear Hashing
 On split, hLevel+1 is used
to re-distribute entries.
Level=0, N=4 Level=0

h h PRIMARY h h PRIMARY OVERFLOW


1 0 Next=0 PAGES 1 0 PAGES PAGES

32*44* 36* 32*


000 00 000 00
Next=1
Data entry r
001 01 9* 25* 5* with h(r)=5 001 01 9* 25* 5*

14* 18*10*30* 14* 18*10*30*


010 10 Primary 010 10
bucket page
31*35* 7* 11* 31*35* 7* 11* 43*
011 11 011 11
(This info (The actual contents
is for illustration of the linear hashed 100 00 44* 36*
only!) file)
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Example: End of a Round
Level=1
PRIMARY OVERFLOW
h1 h0 PAGES PAGES
Next=0
Level=0 000 00 32*
PRIMARY OVERFLOW
h1 h0 PAGES PAGES
001 01 9* 25*
000 00 32*
010 10 66* 18* 10* 34* 50*
001 01 9* 25*
011 11 43* 35* 11*
010 10 66*18* 10* 34*
Next=3 100 00 44* 36*
011 11 31*35* 7* 11* 43*

101 11 5* 37* 29*


100 00 44*36*

101 5* 37*29* 110 10 14* 30* 22*


01

110 10 14*30*22* 111 11 31* 7*

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3


LH Described as a Variant of
EH
 The two schemes are actually quite similar:
 Begin with an EH index where directory has N elements.
 Use overflow pages, split buckets round-robin.
 First split is at bucket 0. (Imagine directory being
doubled at this point.) But elements <1,N+1>,
<2,N+2>, ... are the same. So, need only create
directory element N, which differs from 0, now.
• When bucket 1 splits, create directory element N+1, etc.
 So, directory can double gradually. Also, primary
bucket pages are created in order. If they are
allocated in sequence too (so that finding i’th is
easy), we actually don’t need a directory! Voila,
LH.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3
Summary
 Tree-structured indexes are ideal for range-
searches, also good for equality searches.
 ISAM is a static structure.
 Only leaf pages modified; overflow pages needed.
 Overflow chains can degrade performance unless
size of data set and data distribution stay constant.
 B+ tree is a dynamic structure.
 Inserts/deletes leave tree height-balanced; log F N
cost.
 High fanout (F) means depth rarely more than 3 or 4.
 Almost always better than maintaining a sorted file.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3


Summary (Contd.)
 Typically, 67% occupancy on average.
 Usually preferable to ISAM, modulo locking
considerations; adjusts to growth gracefully.
 If data entries are data records, splits can change rids!
 Key compression increases fanout, reduces height.
 Bulk loading can be much faster than repeated
inserts for creating a B+ tree on a large data set.
 Most widely used index in database management
systems because of its versatility. One of the
most optimized components of a DBMS.

atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4


Summary (Contd.)
 Hash-based indexes: best for equality searches,
cannot support range searches.
 Static Hashing can lead to long overflow chains.
 Extendible Hashing avoids overflow pages by
splitting a full bucket when a new data entry is
to be added to it. (Duplicates may require
overflow pages.)
 Directory to keep track of buckets, doubles
periodically.
 Can get large with skewed data; additional I/O if this
does not fit in main memory.
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4
Summary (Contd.)
 Linear Hashing avoids directory by splitting
buckets round-robin, and using overflow pages.
 Overflow pages not likely to be long.
 Duplicates handled easily.
 Space utilization could be lower than Extendible
Hashing, since splits not concentrated on `dense’
data areas.
• Can tune criterion for triggering splits to trade-off
slightly longer chains for better space utilization.
 For hash-based indexes, a skewed data
distribution is one in which the hash values of
data entries are not uniformly distributed!
atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

You might also like