B+ Trees
B+ Trees: Sections 10.3-10.6 and 10.8. Static and Extendible Hashing: Sections 11.1 and 11.2.
Problem with ISAM
Inserts and deletes affect only the leaf pages. As a consequence, long overflow chains may result at the leaf pages. Long overflow chains can significantly slow down the time to retrieve a record (all overflow pages have to be searched). Overflow pages are not sorted, so lookup is slow; it is possible to keep the overflow pages sorted as well, but then inserts will be slow. Overflow pages do not go away unless all records in them are deleted, or a complete reorganization is performed.
A dynamic index structure that adjusts gracefully to inserts and deletes. A balanced tree. Leaf pages are not allocated sequentially; they are linked together through pointers (a doubly linked list).
[Figure: B+ tree structure - index entries (for direct search) above the linked leaf pages of data entries]
B+ Tree: The Most Widely Used Index
Main characteristics: Insert/delete at log_F N cost (F = fanout, N = # leaf pages); keep tree height-balanced. Minimum 50% occupancy (except for root). Each node contains m entries, where d <= m <= 2d; the parameter d is called the order of the tree. Supports equality and range-searches efficiently.
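For instance (an assumed illustration of the occupancy rule, not stated on this slide): with order d = 2, every node other than the root holds between 2 and 4 entries, which is why the leaf pages in the example tree that follows contain 2 to 4 data entries each.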
Format of a Node
Same as that of ISAM. Non-leaf nodes with m index entries contain m+1 pointers to children. Pointer Pi points to a child with key values k such that ki <= k < ki+1. P0 points to a child whose key values are less than k1.
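As an illustration of this rule (a minimal sketch of my own, not from the slides; NonLeafNode and find_child are invented names), the m keys and m+1 child pointers of a non-leaf node determine where a search key belongs:

    import bisect

    class NonLeafNode:
        """Non-leaf node: index entries k1 .. km and child pointers P0 .. Pm."""
        is_leaf = False

        def __init__(self, keys, children):
            assert len(children) == len(keys) + 1
            self.keys = keys              # sorted keys k1 <= ... <= km
            self.children = children      # child pointers P0, P1, ..., Pm

        def find_child(self, k):
            """Return Pi such that ki <= k < ki+1 (P0 when k < k1)."""
            i = bisect.bisect_right(self.keys, k)   # number of keys <= k
            return self.children[i]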
Example B+ Tree
Search begins at the root, and key comparisons direct it to a leaf. At each node, a binary search or a linear search can be performed. Search for 5*, 15*, all data entries >= 24* ...
[Figure: example B+ tree - root with index entries 13, 17, 24, 30; leaf pages include 2* 3* 5* 7* and 14* 16*]
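Here is a small sketch of that search, continuing the NonLeafNode sketch from the node-format section (LeafNode, search_leaf and range_scan are my own illustrative names; the leaf chain is assumed to expose sorted (key, entry) pairs and a next pointer):

    class LeafNode:
        is_leaf = True
        def __init__(self, entries, next_leaf=None):
            self.entries = entries     # sorted (key, data entry) pairs, e.g. (5, '5*')
            self.next = next_leaf      # leaf pages are linked together

    def search_leaf(node, k):
        """Descend from the root; key comparisons direct the search to a leaf."""
        while not node.is_leaf:
            node = node.find_child(k)  # ki <= k < ki+1 rule from the node format
        return node

    def range_scan(root, low):
        """Yield all data entries with key >= low (e.g. all entries >= 24*)."""
        leaf = search_leaf(root, low)
        while leaf is not None:
            for key, entry in leaf.entries:
                if key >= low:
                    yield entry
            leaf = leaf.next

An equality search (e.g. for 5*) is the same descent followed by a scan of the single leaf reached.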
Inserting 8* into Example B+ Tree
Observe how minimum occupancy is guaranteed in both leaf and index page splits. Note the difference between copy-up and push-up; be sure you understand the reasons for this.
5 is the entry to be inserted in the parent node. (Note that 5 is copied up and continues to appear in the leaf.)
[Figure: leaf split - left leaf 2* 3*, right leaf 5* 7* 8*]
17 is the entry to be inserted in the parent node. (Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.)
[Figure: index page split over the entries 13, 24, 30; 17 is pushed up]
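A toy sketch of the two split cases above (invented helper names, operating on plain Python lists rather than real pages): a leaf split copies the middle key up while keeping it in the new leaf, whereas a non-leaf split pushes the middle key up and removes it from the children.

    def split_leaf(entries):
        """Leaf split: the middle key is COPIED up and stays in the new right leaf."""
        mid = len(entries) // 2
        left, right = entries[:mid], entries[mid:]
        copy_up_key = right[0][0]      # e.g. 5 is copied up while 5* stays in the leaf
        return left, right, copy_up_key

    def split_nonleaf(keys, children):
        """Non-leaf split: the middle key is PUSHED up and appears only in the parent."""
        mid = len(keys) // 2
        push_up_key = keys[mid]        # e.g. 17 is pushed up and not kept below
        left = (keys[:mid], children[:mid + 1])
        right = (keys[mid + 1:], children[mid + 1:])
        return left, right, push_up_key

With the leaf entries 2* 3* 5* 7* 8*, split_leaf returns the left leaf 2* 3*, the right leaf 5* 7* 8*, and the copied-up key 5, matching the figure.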
[Figure: B+ tree after inserting 8* - the root was split; index entries 13, 24, 30; leaf pages include 2* 3*, 5* 7* 8*, 14* 16*]
Notice that the root was split, leading to an increase in height. In this example, we could avoid the split by re-distributing entries; however, this is usually not done in practice.
[Figure: the same B+ tree after inserting 8*, with the index entry 5 and the leaf entry 5* both present]
Notice that the value 5 occurs redundantly: once in a leaf page and once in a non-leaf page. This is because values in a leaf page cannot be pushed up, unlike the value 17.
Insert 8* using re-distribution (instead of splitting)
[Figure: before and after inserting 8* by re-distribution - 8* is placed in the sibling leaf (8* 14* 16*) and the root's index entries become 8 17 24 30; no split occurs]
Deleting a Data Entry from a B+ Tree
Start at the root, and find the leaf L where the entry belongs. Remove the entry. If L is at least half-full, done! If L has only d-1 entries: try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L); if re-distribution fails, merge L and the sibling. If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L. The merge could propagate to the root, decreasing the height.
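A toy sketch of the re-distribute-or-merge decision (handle_underflow is an invented name; leaf and sibling are plain Python lists of data entries, the sibling is assumed to be the right neighbour, and the parent-key maintenance is only described in comments):

    def handle_underflow(leaf, sibling, d):
        """After deleting from leaf: re-distribute from the sibling or merge with it.
        leaf and sibling are lists of data entries; d is the order of the tree."""
        if len(leaf) >= d:                   # still at least half-full: nothing to do
            return "ok"
        if len(sibling) > d:                 # sibling can spare an entry
            leaf.append(sibling.pop(0))      # borrow one; the parent's separating key
            return "re-distributed"          # must be updated to sibling's new low key
        leaf.extend(sibling)                 # merge: pull the sibling's entries over;
        sibling.clear()                      # the entry pointing to the sibling is then
        return "merged"                      # deleted from the parent (may propagate up)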
[Figure: tree after deleting 19* and then 20* - index entries 13 and 27, 30; leaf pages 2* 3*, 5* 7* 8*, 14* 16*, 22* 24*, 27* 29*]
Deleting 19* is easy. Deleting 20* is done with re-distribution. Notice how the middle key is copied up.
... And Then Deleting 24*
Must merge. Observe the 'toss' of the index entry (on the right), and the 'pull down' of the index entry (below).
[Figure: tree after the merges - root with index entries 5, 13, 17, 30; leaf pages 2* 3*, 5* 7* 8*, 14* 16*, 22* 27* 29*, 33* 34* 38* 39*]
Delete 24*
[Figure: tree during the deletion of 24* - root 22; index entries 13, 17, 20 and 27, 30; leaf pages 2* 3*, 5* 7* 8*, 14* 16*, 17* 18*, 20* 21*, 22* 24*, 27* 29*]
[Figures: trees showing re-distribution of index entries through the parent during the deletion of 24* - index entries 13, 17, 20, 30 and then 13, 20, 22, 30; leaf pages 2* 3*, 5* 7* 8*, 14* 16*, 17* 18*, 20* 21*]
Bulk Loading of a B+ Tree
If we have a large collection of records and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. Bulk loading for creating a B+ tree index on existing records is much more efficient.
Bulk Loading
Sort all data entries. If data entries are (key, pointer) pairs, sort these pairs according to the key values, not the actual data records. Allocate an empty page to be the root, and insert a pointer to the first (leaf) page into the root page.
[Figure: empty root page pointing to the sorted pages of data entries (3* 4*, 6* 9*, 10* 11*, 12* 13*, 20* 22*, 23* 31*, 35* 36*), not yet in the B+ tree]
Bulk Loading
Add an entry into the root page for each page of sorted data entries. The data entry pages are doubly linked. Proceed until the root page is filled or there are no more data entry pages.
[Figure: root page with index entries 6 and 10 pointing to the first sorted leaf pages 3* 4*, 6* 9*, 10* 11* ...]
Bulk Loading
If the root page is filled and one more page of data entries must be inserted, split the root and create a new root page.
[Figure: root split during bulk loading - new root 10 above index pages with entries 6 and 12; sorted leaf pages 3* 4*, 6* 9*, 10* 11*, 12* 13*, 20* 22*, 23* 31*, 35* 36* below]
Bulk Loading
Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When this page fills up, it splits. (The split may go up the right-most path to the root.) Much faster than repeated inserts, especially when one considers locking!
[Figure: bulk loading in progress - index entries (10, 20, 12, 23, 35) added along the right-most path; after the root splits, the new root holds 20; sorted leaf pages 3* 4*, 6* 9*, 10* 11*, 12* 13*, 20* 22*, 23* 31*, 35* 36*, 38* 41*, 44*]
Summary of Bulk Loading
Option 1: multiple inserts. Slow. Does not give sequential storage of leaves. Option 2: bulk loading. Has advantages for concurrency control. Fewer I/Os during build. Leaves will be stored sequentially (and linked, of course). Can control fill factor on pages.
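A compact sketch of Option 2 (illustrative only, not the slides' algorithm; for brevity each index page here keeps one key per child instead of the m keys / m+1 pointers layout): sort the data entries, pack them into leaf pages left to right, and build each index level by filling the right-most page.

    def bulk_load(entries, fanout):
        """Build a B+ tree bottom-up from data entries sorted by key."""
        entries = sorted(entries, key=lambda e: e[0])     # sort (key, pointer) pairs
        # 1. Pack the sorted entries into leaf pages, left to right (sequential, linked).
        leaves = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
        # 2. Build the index levels: packing (key, child) pairs left to right is the
        #    same as always inserting into the right-most index page of that level.
        level = [(leaf[0][0], leaf) for leaf in leaves]
        while len(level) > 1:
            pages = [level[i:i + fanout] for i in range(0, len(level), fanout)]
            level = [(page[0][0], page) for page in pages]
        return level[0][1]                                # the root page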
Summary
Many alternative file organizations exist, each appropriate in some situation. If selection queries are frequent, sorting the file or building an index is important. Hash-based indexes only good for equality search. Sorted files and tree-based indexes best for range search; also good for equality search. (Files rarely kept sorted in practice; B+ tree index is better.) Index is a collection of data entries plus a way to quickly find entries with given key values.
Summary
Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs. Choice orthogonal to indexing technique used to locate data entries with a given key value. Can have several indexes on a given file of data records, each with a different search key. Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. Differences have important consequences for utility/performance.
Summary
Tree-structured indexes are ideal for range-searches, and also good for equality searches. ISAM is a static structure: only leaf pages are modified; overflow pages are needed. Overflow chains can degrade performance unless the size of the data set and the data distribution stay constant. B+ tree is a dynamic structure: inserts/deletes leave the tree height-balanced; log_F N cost. High fanout (F) means depth is rarely more than 3 or 4. Almost always better than maintaining a sorted file.
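For a rough sanity check (assumed numbers, not from the slides): with a fanout of F = 100 and N = 1,000,000 leaf pages, log_F N = log_100 1,000,000 = 3, so even a very large file yields a tree only about 3 levels deep.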
Hash-Based Indexing
Recall that for any index, there are 3 alternatives for data entries k*: (1) the data record with key value k, (2) <k, rid of data record with search key value k>, (3) <k, list of rids of data records with search key k>. The choice is orthogonal to the indexing technique. Hash-based indexes are best for equality selections; they cannot support range searches. Static and dynamic hashing techniques exist; trade-offs are similar to ISAM vs. B+ trees.
Static Hashing
h(k) mod N returns the bucket to which the data entry with key k belongs (N = # of buckets). We refer to this as the hash function, which maps values into a range of buckets.
[Figure: static hashing - h(key) mod N maps a key to one of the primary bucket pages 0 .. N-1, each of which may have a chain of overflow pages]
Static Hashing
The # of primary pages is fixed (= N); they are allocated sequentially and never de-allocated; overflow pages are used if needed. Buckets contain data entries, which can be in any of the three alternatives discussed earlier. The hash function works on the search key field of record r and must distribute values over the range 0 ... N-1. Typically, h(key) = (a * key + b) for some constants a and b; a lot is known about how to tune h.
Static Hashing
Search for a data entry k: apply the hash function h to k to obtain the bucket number, then search the bucket for k. Data entries in each bucket are typically maintained in sorted order to speed up the search. Insert a data entry k: apply h to k to obtain the bucket number and place the data entry in that bucket. If no space is left, allocate a new overflow page, place the data entry there, and chain up the overflow page. Delete a data entry k: search for k and delete it.
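A minimal sketch of these operations (my own illustration; PAGE_CAPACITY, the list-of-pages bucket layout, and the use of Python's built-in hash are assumptions, and the within-bucket sorting mentioned above is omitted for brevity):

    PAGE_CAPACITY = 4                          # data entries per page (assumed)

    class StaticHashIndex:
        def __init__(self, n_buckets, h=hash):
            self.h = h                         # e.g. h(key) = a * key + b
            # each bucket = primary page followed by its chain of overflow pages
            self.buckets = [[[]] for _ in range(n_buckets)]

        def _pages(self, k):
            return self.buckets[self.h(k) % len(self.buckets)]

        def search(self, k):
            # search every page of the bucket (primary page plus overflow chain)
            return [e for page in self._pages(k) for e in page if e[0] == k]

        def insert(self, entry):               # entry is a (key, rid) pair
            pages = self._pages(entry[0])
            if len(pages[-1]) == PAGE_CAPACITY:
                pages.append([])               # no space left: chain up an overflow page
            pages[-1].append(entry)

        def delete(self, k):
            for page in self._pages(k):
                page[:] = [e for e in page if e[0] != k]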
Static Hashing - Example
Assume 2 data entries per bucket and 5 buckets. Insert key values a, b, c, d, e, f, g where h(a)=1, h(b)=2, h(c)=3, h(g)=7. Then insert z, x where h(z)=1 and h(x)=5, and then p, q, r where h(p)=1, h(q)=5, and h(r)=1.
[Figure: buckets 1-5 after the inserts - bucket 1 (a, f) develops an overflow chain holding z, p and then r; bucket 2 holds b, g; bucket 3 holds c; bucket 4 holds d; bucket 5 (e, x) overflows with q]
Static Hashing
Long overflow chains can develop and degrade performance. The number of buckets is fixed: what if the file shrinks significantly through deletions? Extendible and Linear Hashing: dynamic techniques to fix these problems.
Extendible Hashing
Situation: a bucket (primary page) becomes full. Why not reorganize the file by doubling the number of buckets? Reading and writing all pages is expensive! Idea: use a directory of pointers to buckets; double the number of buckets by doubling the directory, splitting just the bucket that overflowed. The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page! The trick lies in how the hash function is adjusted!
Example
Directory is an array of size 4. Search: to find the bucket for r, take the last 'global depth' # of bits of h(r); if h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
[Figure: directory of size 4 with entries 00, 01, 10, 11 (global depth 2); bucket A holds 4* 12* 32* 16*, bucket C holds 10*; each bucket has local depth 2]
Insert h(r)=9 (1001) (Causes Splitting but no Doubling)
[Figure: before and after inserting 9* - bucket B (1* 5* 21* 13*, local depth 2) splits into B (1* 9*) and B2 (5* 21* 13*), both with local depth 3; global depth stays 3, so the directory (000-111) is not doubled and the pointers to B are redistributed between B and B2]
Points to Note
Global depth of directory, p: max # of bits needed to tell which bucket an entry belongs to. Local depth of a bucket, q: # of bits used to determine if an entry belongs to this bucket. Each bucket has 2^(p-q) pointers to it from the directory. If a bucket has only 1 pointer to it from the directory, splitting the bucket causes the directory to double. Otherwise, there is more than one pointer to the bucket; when the bucket is split, simply redistribute the pointers. Delete: search and delete. A bucket may be merged with its split image when it becomes empty; if every directory element and its split-image directory entry point to the same bucket, shrink the directory by half.
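A sketch of that directory logic (invented names; the bucket capacity and Python's built-in hash are assumptions): the last 'global depth' bits of h(k) index the directory, and a full bucket either just redistributes its directory pointers (local depth < global depth) or forces the directory to double first (local depth = global depth).

    BUCKET_CAPACITY = 4                        # data entries per bucket page (assumed)

    class Bucket:
        def __init__(self, local_depth):
            self.local_depth = local_depth
            self.entries = []                  # (key, rid) pairs

    class ExtendibleHash:
        def __init__(self):
            self.global_depth = 1
            self.directory = [Bucket(1), Bucket(1)]   # indexed by last global_depth bits

        def _dir_index(self, k):
            return hash(k) & ((1 << self.global_depth) - 1)

        def search(self, k):
            return [e for e in self.directory[self._dir_index(k)].entries if e[0] == k]

        def insert(self, k, rid):
            bucket = self.directory[self._dir_index(k)]
            if len(bucket.entries) < BUCKET_CAPACITY:
                bucket.entries.append((k, rid))
                return
            # Full bucket: if only one directory entry points to it
            # (local depth == global depth), the directory must double first.
            if bucket.local_depth == self.global_depth:
                self.directory = self.directory + self.directory
                self.global_depth += 1
            # Split the bucket and redistribute the directory pointers to it.
            bucket.local_depth += 1
            image = Bucket(bucket.local_depth)          # the split image
            high_bit = 1 << (bucket.local_depth - 1)
            for i, b in enumerate(self.directory):
                if b is bucket and (i & high_bit):
                    self.directory[i] = image
            # Re-hash the old entries (plus the new one) into bucket or image.
            # (Heavily skewed keys could still overfill one side; a real
            # implementation would then split again.)
            old = bucket.entries + [(k, rid)]
            bucket.entries = []
            for key, r in old:
                self.directory[self._dir_index(key)].entries.append((key, r))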
Comments on Extendible Hashing
If the directory fits in memory, an equality search is answered with one disk access; otherwise two. Example: for a 100MB file with 100 bytes per data entry and 4K pages, there are about 1,000,000 data entries and 25,000 directory elements; chances are high that the directory will fit in memory. The directory grows in spurts and, if the distribution of hash values is skewed, the directory can grow large. Multiple entries with the same hash value cause problems!
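Roughly where those numbers come from (a back-of-the-envelope check, not spelled out on the slide): 100MB / 100 bytes per data entry = 1,000,000 data entries; a 4K page holds about 4096 / 100 ≈ 40 entries, so about 1,000,000 / 40 = 25,000 leaf pages are needed, hence roughly 25,000 directory elements; at a few bytes per pointer that is on the order of 100-200KB, which easily fits in memory.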