Indexing and Hashing

Indexing mechanisms like indices and hashing are used to speed up access to desired data records. There are two main types of indices: ordered indices which store search keys in sorted order, and hash indices which distribute search keys uniformly using a hash function. B-tree indexing structures are commonly used as they can efficiently support insertion, deletion and search operations in logarithmic time. B-trees keep the tree balanced through splitting nodes and propagating keys during insertion if nodes overflow.

Uploaded by

Mukul Dilwaria


Indexing and Hashing

Indexing mechanisms are used to speed up access to
desired data, e.g., the author catalog in a library.
Search key: an attribute or set of attributes used to look
up records in a file.
An index file consists of records (called index entries)
of the form
(search-key, pointer)
Index files are typically much smaller than the original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted
order.
Hash indices: search keys are distributed uniformly
across buckets using a hash function.
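
The two kinds of index can be contrasted with a small sketch. The record list, the bucket count, and the function names below are illustrative assumptions, not part of the slides; the point is only that an ordered index supports binary search and range scans, while a hash index jumps straight to one bucket.

```python
from bisect import bisect_left

# Hypothetical data file: position in the list plays the role of the record pointer.
records = [(345, "tomson"), (123, "smith"), (567, "smith"),
           (234, "jones"), (456, "stevens")]

# Ordered index: (search-key, pointer) entries kept sorted by search key.
ordered_index = sorted((ssn, pos) for pos, (ssn, _) in enumerate(records))
ordered_keys = [key for key, _ in ordered_index]

def ordered_lookup(key):
    """Binary-search the sorted index; return the record or None."""
    i = bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return records[ordered_index[i][1]]
    return None

def range_lookup(lo, hi):
    """Ordered indices can also answer range queries lo <= key <= hi."""
    i = bisect_left(ordered_keys, lo)
    out = []
    while i < len(ordered_keys) and ordered_keys[i] <= hi:
        out.append(records[ordered_index[i][1]])
        i += 1
    return out

# Hash index: entries are spread over buckets by a hash function.
NUM_BUCKETS = 4  # assumed bucket count
buckets = [[] for _ in range(NUM_BUCKETS)]
for pos, (ssn, _) in enumerate(records):
    buckets[ssn % NUM_BUCKETS].append((ssn, pos))

def hash_lookup(key):
    """Hash the key to one bucket and scan only that bucket."""
    for k, pos in buckets[key % NUM_BUCKETS]:
        if k == key:
            return records[pos]
    return None
```

The hash index has no counterpart to `range_lookup`, because neighbouring keys land in unrelated buckets.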

DBMS: Rajeev Wankar

Main concepts
Search keys are sorted in the index file and point to
the actual records.
Primary vs. secondary indices
Clustering vs. non-clustering indices (a clustering index may be sparse; a non-clustering index must be dense)
Primary key index: on the primary key (no duplicates)
Index on Ssn:
123
234
345
456
567

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
678   tomson    main str
456   stevens   forbes ave
345   smith     forbes ave

Secondary key index: duplicates may exist

Address-index
forbes ave
main str

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

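Secondary key index: typically implemented with postings lists.
Each index entry points to a postings list, i.e., the list of pointers to all records with that key value.

Postings lists (Address-index)
forbes ave
main str

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave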
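
A minimal sketch of a secondary index with postings lists, built over the STUDENT rows above. The function names and the use of list positions as record pointers are assumptions made for illustration.

```python
from collections import defaultdict

# STUDENT rows as (ssn, name, address); the list position stands in for a record pointer.
students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (345, "tomson", "main str"),
    (456, "stevens", "forbes ave"),
    (567, "smith", "forbes ave"),
]

def build_postings(rows, column):
    """Map each distinct value of `column` to the list of row positions holding it."""
    postings = defaultdict(list)
    for row_id, row in enumerate(rows):
        postings[row[column]].append(row_id)
    # Keep the index entries in sorted key order, as in an ordered index.
    return dict(sorted(postings.items()))

address_index = build_postings(students, column=2)

def find_by_address(address):
    """Follow the postings list for one key value to the matching records."""
    return [students[i] for i in address_index.get(address, [])]
```

Because duplicates are allowed on a secondary key, one index entry per distinct value plus a postings list keeps the index compact.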

Clustering index (typically sparse): records are physically
sorted on that key, so not all key values are
needed in the index.
Non-clustering index (necessarily dense): the opposite; records are not sorted on that key, so every key value needs an index entry.
E.g.: Clustering/sparse index on ssn
Index entry   Covers
123           ssn >= 123
456           ssn >= 456

STUDENT (sorted on Ssn)
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

Sparse index: contains index records for only some search-key values.
Dense index: an index record appears for every search-key value in the file.
Non-clustering / dense index
Index on Ssn:
123
234
345
456
567

STUDENT (not sorted on Ssn)
Ssn   Name      Address
345   tomson    main str
234   jones     forbes ave
567   smith     forbes ave
456   stevens   forbes ave
123   smith     main str
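
The difference between the two lookups can be sketched as follows. The block size of 3 is an assumption chosen so that the sparse index comes out as the entries 123 and 456 used in the example; the function and variable names are hypothetical.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # assumed number of records per disk block

# File physically sorted on ssn, then cut into blocks.
sorted_file = [(123, "smith"), (234, "jones"), (345, "tomson"),
               (456, "stevens"), (567, "smith")]
blocks = [sorted_file[i:i + BLOCK_SIZE]
          for i in range(0, len(sorted_file), BLOCK_SIZE)]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [(block[0][0], b) for b, block in enumerate(blocks)]
sparse_keys = [key for key, _ in sparse_index]

def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    for record in blocks[sparse_index[i][1]]:
        if record[0] == key:
            return record
    return None

# Dense index: one entry per search-key value, so no block scan is needed.
dense_index = {record[0]: (b, slot)
               for b, block in enumerate(blocks)
               for slot, record in enumerate(block)}

def dense_lookup(key):
    """Go straight to the (block, slot) of the record, if the key is indexed."""
    if key not in dense_index:
        return None
    b, slot = dense_index[key]
    return blocks[b][slot]
```

The sparse lookup only works because the file is sorted on the key; on an unsorted file, the block scan would miss records, which is why a non-clustering index has to be dense.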

ISAM
What if index is too large to search sequentially?
[Figure: two-level ISAM index on STUDENT.Ssn. A top-level index (entries 123 and 3,423) points to blocks of a second-level index (entries 123 and 456, covering ssn >= 123 and ssn >= 456), which in turn point to blocks of the STUDENT file.]

Multilevel Index
If the primary index does not fit in memory, access
becomes expensive.
If the index is too large, store it on disk and keep an
index on the index.
Usually two levels of indices, with one first-level entry
per disk block (why?)
What about insertions/deletions?
Index Update: Deletion
If deleted record was the only record in the file with
its particular search-key value, the search-key is
deleted from the index also.
[Figure: inserting the record (124; peterson; fifth ave.) into the ISAM file. The index (entries 123 and 3,423 at the top level; 123 and 456 at the second level) is left unchanged; the new record does not fit in its target block of STUDENT and is placed in an overflow block.]

Problems?
Overflow chains may become very long. What to do?
Shut down and reorganize.
Start with ~80% utilization.
If the index is too large, store it on disk and keep an index
on the index (in memory).
Usually two levels of indices, with one first-level entry per
disk block (why?)
Indices like ISAM suffer in the presence of frequent updates.
Alternative indexing structure: B-trees
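
The overflow problem can be illustrated with a small sketch of an ISAM-style file whose index is built once and never restructured. The class name `IsamFile`, its parameters, and the in-memory lists standing in for disk blocks are assumptions for illustration only.

```python
from bisect import bisect_right

class IsamFile:
    """Static index over sorted blocks; inserts that do not fit go to overflow chains."""

    def __init__(self, sorted_records, block_size, fill=0.8):
        # Load each block only partly (e.g. ~80%) to leave room for later inserts.
        per_block = max(1, int(block_size * fill))
        self.block_size = block_size
        self.blocks = [list(sorted_records[i:i + per_block])
                       for i in range(0, len(sorted_records), per_block)]
        self.overflow = [[] for _ in self.blocks]
        # The index is fixed at load time: first key of each block.
        self.index_keys = [block[0][0] for block in self.blocks]

    def _block_for(self, key):
        return max(0, bisect_right(self.index_keys, key) - 1)

    def insert(self, record):
        b = self._block_for(record[0])
        if len(self.blocks[b]) < self.block_size:
            self.blocks[b].append(record)
            self.blocks[b].sort()
        else:
            # Block is full: the index is not changed, the chain just grows.
            self.overflow[b].append(record)

    def search(self, key):
        b = self._block_for(key)
        for record in self.blocks[b] + self.overflow[b]:
            if record[0] == key:
                return record
        return None

students = [(123, "smith"), (234, "jones"), (345, "tomson"),
            (456, "stevens"), (567, "smith")]

full = IsamFile(students, block_size=2, fill=1.0)    # blocks loaded 100% full
full.insert((124, "peterson"))                       # lands in an overflow chain

roomy = IsamFile(students, block_size=4, fill=0.5)   # blocks loaded half full
roomy.insert((124, "peterson"))                      # fits in its home block
```

Every search must scan the home block plus its whole overflow chain, so long chains degrade lookups until the file is reorganized.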

The most successful family of index schemes
(B-trees, B+-trees, B*-trees).
Can be used for primary/secondary, clustering/non-clustering indices.

B-trees
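E.g., B-tree of order 3:
[Figure: B-tree of order 3. The root holds keys 6 and 9, with subtree pointers labeled <6, >6, <9, and >9; keys 1, 7, and 13 are visible in the child nodes.]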

A B/B+ tree is a rooted tree satisfying the following
properties:
All paths from root to leaf are of the same length.
Each node that is not a root or a leaf has between
$\lceil n/2 \rceil$ and $n$ children.
A leaf node has between $\lceil (n-1)/2 \rceil$ and $n-1$
values.
Special cases:
If the root is not a leaf, it has at least 2 children.
If the root is a leaf (that is, there are no other nodes
in the tree), it can have between 0 and $n-1$ values.
O(log N) for everything (insertion, deletion, search).
Typically, with a fan-out of m = 50 to 100, the tree has 2 to 3 levels.
Utilization of at least 50% is guaranteed; on average it is about 69%.
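Queries: Algorithm for exact match query?
(e.g., ssn = 8?)
[Figure: exact-match search descending the B-tree from the root (pointer labels <6, >6, <9, >9; keys 3 and 7 visible), taking H steps.]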

How can a B-tree print keys in sorted order?
[Figure: B-tree of order 3 with pointer labels <6, >6, <9, >9 and keys 1, 7, 9, and 13.]

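Solution: B+-Tree Index Files
B+-trees facilitate sequential operations.
They string all leaf nodes together
AND
replicate keys from non-leaf nodes, to make sure
every key appears at the leaf level.
[Figure: B+-tree with root keys 6 and 9 and pointer labels <6, >=6, <9, >=9; keys 1 and 6 are visible at the leaf level, with 6 replicated from the root.]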

Advantage of B+-tree index files: the index automatically
reorganizes itself with small, local changes in the
face of insertions and deletions. Reorganization of the
entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
Advantages of B+-trees outweigh disadvantages,
and they are used extensively.


B+-Tree Node Structure
$K_i$ are the search-key values.
$P_i$ are pointers to children (for non-leaf nodes) or
pointers to records or buckets of records (for leaf
nodes).
The search-keys in a node are ordered:
$K_1 < K_2 < K_3 < \dots < K_{n-1}$
Leaf Nodes in B+-Trees
For $i = 1, 2, \dots, n-1$, pointer $P_i$ either points to a file
record with search-key value $K_i$, or to a bucket of
pointers to file records, each record having search-key value $K_i$. The bucket structure is needed only if the search key does not form a primary key.
$P_n$ points to the next leaf node in search-key order.


Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on
the leaf nodes. For a non-leaf node with $m$ pointers:
All the search-keys in the subtree to which $P_1$ points
are less than $K_1$.
For $2 \le i \le m-1$, all the search-keys in the subtree to
which $P_i$ points have values greater than or equal to
$K_{i-1}$ and less than $K_i$.
All the search-keys in the subtree to which $P_m$ points
are greater than or equal to $K_{m-1}$.
[Figure: B+-tree for account file (n = 3)]


Leaf nodes must have between 2 and 4 values
($\lceil (n-1)/2 \rceil$ and $n-1$, with $n = 5$).
Non-leaf nodes other than the root must have between 3 and
5 children ($\lceil n/2 \rceil$ and $n$, with $n = 5$).
The root must have at least 2 children.
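
The occupancy bounds above follow directly from the two ceiling formulas. A small helper (the name `bplus_bounds` is hypothetical) makes it easy to check them for any $n$:

```python
from math import ceil

def bplus_bounds(n):
    """Occupancy limits for a B+-tree whose nodes hold at most n pointers."""
    return {
        # Leaves hold between ceil((n-1)/2) and n-1 values.
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        # Non-leaf, non-root nodes have between ceil(n/2) and n children.
        "internal_children": (ceil(n / 2), n),
        # A root that is not a leaf needs at least 2 children.
        "root_children_min": 2,
    }

bounds_n5 = bplus_bounds(5)
bounds_n3 = bplus_bounds(3)
```

With `n = 5` this gives 2 to 4 values per leaf and 3 to 5 children per inner node; with `n = 3` it gives 1 to 2 values and 2 to 3 children.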

Observations about B+-trees
Since the inter-node connections are done by
pointers, logically close blocks need not be
physically close.
The non-leaf levels of the B+-tree form a hierarchy
of sparse indices.
The B+-tree contains a relatively small number of
levels (logarithmic in the size of the main file), thus
searches can be conducted efficiently.


Queries on B+-Trees
Find all records with a search-key value of $k$.
Start with the root node.
1. Examine the node for the smallest search-key value
$> k$.
2. If such a value exists, assume it is $K_j$. Then follow $P_j$
to the child node.
3. Otherwise $k \ge K_{m-1}$, where there are $m$ pointers in
the node. Then follow $P_m$ to the child node.
If the node reached by following the pointer above
is not a leaf node, repeat the above procedure on
the node, and follow the corresponding pointer.
Eventually reach a leaf node. If for some $i$, key $K_i = k$,
follow pointer $P_i$ to the desired record or bucket.
Otherwise no record with search-key value $k$ exists.
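
The search procedure above can be sketched in a few lines. The `Node` class and the small hand-built tree are assumptions for illustration (loosely modeled on a tree with root keys 6 and 9); with 0-based lists, "the smallest key greater than k" is exactly the position returned by `bisect_right`.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # non-leaf: child nodes
    records: list = field(default_factory=list)   # leaf: one record (or bucket) per key
    next_leaf: Optional["Node"] = None            # leaf: next leaf in key order

    @property
    def is_leaf(self):
        return not self.children

def search(root, k):
    """Descend from the root to a leaf, then look for k in that leaf."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key > k; if none exists this is the last pointer.
        j = bisect_right(node.keys, k)
        node = node.children[j]
    for i, key in enumerate(node.keys):
        if key == k:
            return node.records[i]
    return None

def scan_all(root):
    """Sequential scan: go to the leftmost leaf and follow the leaf chain."""
    node = root
    while not node.is_leaf:
        node = node.children[0]
    keys = []
    while node is not None:
        keys.extend(node.keys)
        node = node.next_leaf
    return keys

# Hypothetical three-leaf tree; the records are placeholder strings.
leaf3 = Node(keys=[9, 13], records=["rec9", "rec13"])
leaf2 = Node(keys=[6, 7], records=["rec6", "rec7"], next_leaf=leaf3)
leaf1 = Node(keys=[1, 3], records=["rec1", "rec3"], next_leaf=leaf2)
root = Node(keys=[6, 9], children=[leaf1, leaf2, leaf3])
```

For example, `search(root, 7)` follows the middle pointer because 9 is the smallest root key greater than 7, and `scan_all(root)` shows why chaining the leaves makes sorted output cheap.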
Insertions and deletions to the main file can be
handled efficiently, as the index can be restructured in logarithmic time (as we shall see).
If there are $K$ search-key values in the file, the path is
no longer than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$.
With 1 million search-key values and $n = 100$, at most
$\lceil \log_{50}(1{,}000{,}000) \rceil = 4$ nodes are accessed in a lookup.
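
The bound can be checked numerically. The helper below (a hypothetical name) counts levels with integer arithmetic rather than calling a floating-point logarithm, which avoids rounding surprises at exact powers of the fan-out; it assumes $n \ge 3$ so that the minimum fan-out is at least 2.

```python
def max_lookup_nodes(num_keys, n):
    """Smallest h with ceil(n/2)**h >= num_keys, i.e. ceil(log_{ceil(n/2)}(num_keys))."""
    min_fanout = (n + 1) // 2   # ceil(n/2), the minimum number of children
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= min_fanout
        levels += 1
    return levels

lookup_cost = max_lookup_nodes(1_000_000, 100)
```

With `n = 100` the minimum fan-out is 50, and since $50^3 = 125{,}000 < 1{,}000{,}000 \le 50^4$, four levels suffice.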

B+ tree insertion
Find the leaf node in which the search-key value
would appear.
If the search-key value is already there in the leaf
node, the record is added to the file and, if necessary, a
pointer is inserted into the bucket.
If the search-key value is not there, then add the
record to the main file and create a bucket if necessary. Then:
If there is room in the leaf node, insert the (key-value,
pointer) pair in the leaf node.
Otherwise, split the node (along with the new (key-value, pointer) entry).

Splitting a node:
Take the $n$ (search-key value, pointer) pairs (including
the one being inserted) in sorted order. Place the first
$\lceil n/2 \rceil$ in the original node, and the rest in a new
node.
Let the new node be $p$, and let $k$ be the least key value
in $p$. Insert $(k, p)$ in the parent of the node being split.
If the parent is full, split it and propagate the split
further up.
The splitting of nodes proceeds upwards until a node
that is not full is found. In the worst case the root
node may be split, increasing the height of the tree
by 1.
/* ATTENTION:
   a split at the LEAF level is handled by COPYING
   the middle key upstairs;
   a split at a higher level is handled by PUSHING
   the middle key upstairs */
INSERTION OF KEY K
find the leaf node L in which K belongs;
insert K into L such that the keys are in order;
if (L overflows) {
    split L;
    insert (i.e., COPY) the smallest search-key value
    of the new node into the parent node P;
    if (P overflows) {
        repeat the B-tree split procedure recursively;
        /* Notice: the B-TREE split (PUSH), NOT the B+-tree split (COPY) */
    }
}
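
The COPY versus PUSH distinction is easiest to see in running code. The following is a minimal sketch under stated assumptions: keys only (no record pointers or buckets), duplicates ignored, no deletion, and hypothetical class names `Node` and `BPlusTree`. A leaf split keeps the first $\lceil n/2 \rceil$ keys in the original node and copies the smallest key of the new node upward; a non-leaf split moves its middle key upward without keeping it.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # used by non-leaf nodes only
        self.next = None     # used by leaf nodes only: next leaf in key order

class BPlusTree:
    def __init__(self, n=3):
        self.n = n                       # max pointers per node, so at most n-1 keys
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # the root itself was split: tree grows by 1
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new_right_node) if node split."""
        if node.leaf:
            if key in node.keys:
                return None              # duplicate: a real index would use a bucket
            insort(node.keys, key)
            return self._split_leaf(node) if len(node.keys) > self.n - 1 else None
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        return self._split_internal(node) if len(node.keys) > self.n - 1 else None

    def _split_leaf(self, node):
        mid = (len(node.keys) + 1) // 2  # first ceil(n/2) keys stay in place
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right      # COPY: the key also stays in the leaf

    def _split_internal(self, node):
        mid = len(node.keys) // 2
        sep = node.keys[mid]             # PUSH: the key leaves this level
        right = Node(leaf=False)
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return sep, right

    def leaf_keys(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node:
            out.extend(node.keys)
            node = node.next
        return out

tree = BPlusTree(n=3)
for key in [1, 3, 6, 7, 9, 13, 8]:
    tree.insert(key)
```

In this run, inserting 8 overflows a leaf (8 is copied up and remains in a leaf) and then overflows the root (8 is pushed up into a new root and appears in no other non-leaf node). Which key ends up as the separator depends on the split convention, so a tree drawn with a different convention may differ in its upper levels while holding the same leaf keys.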

E.g., insert 8
[Figure sequence: inserting key 8 into a B+-tree whose root holds keys 6 and 9 (pointer labels <6, >=6, <9, >=9). Leaf overflow: the leaf is split and the middle key (7) is COPIED upstairs. Non-leaf overflow: just PUSH the middle key upstairs. FINAL TREE: the pointer labels shown are >=7, >=6, and >=9.]


B+-Tree before and after insertion of Clearview

B+-Tree File Organization
The index file degradation problem is solved by using
B+-tree indices.
The leaf nodes in a B+-tree file organization store
records, instead of pointers.
Since records are larger than pointers, the maximum
number of records that can be stored in a leaf node is
less than the number of pointers in a non-leaf node.
Leaf nodes are still required to be half full.
Insertion and deletion are handled in the same way
as insertion and deletion of entries in a B+-tree index.

B-Tree Index Files
Similar to a B+-tree, but a B-tree allows search-key values to appear only once; this eliminates redundant storage of search keys.
Search keys in non-leaf nodes appear nowhere else in
the B-tree; an additional pointer field for each search
key in a non-leaf node must be included.


B-tree (above) and B+-tree (below) on same data

