Unit III-Hashing
Algorithms
◻ Trees
◻ Graphs
◻ Hashing
◻ File Organization
UNIT 3 HASHING
Hashing supports very fast retrieval via a key.
Contents
1. Hash Table
◻ Hash function, Bucket, Collision, Probe
◻ Synonym, Overflow, Open hashing, Closed hashing
◻ Perfect hash function, Load density, Full table, Load factor,
rehashing
2. Issues in hashing
◻ Hash functions- properties of good hash function
◻ Division, Multiplication, Extraction, Mid-square, Folding and
universal, Collision
3. Collision resolution strategies-
◻ Open addressing and chaining
4. Hash table overflow - extended hashing
5. Dictionary- Dictionary as ADT, ordered dictionaries
6. Skip List- representation, searching and operations- insertion,
removal.
Searching - one of the most frequent and prolonged tasks
The hash function provides a way for assigning numbers to the input
such that the data can be stored at the array index corresponding to
the assigned number.
Hashing is similar to indexing as it involves associating a key with a
relative record address.
With hashing the address generated appears to be random —
No obvious connection between the key and the location of the
corresponding record.
Sometimes referred to as randomizing.
With hashing, two different keys may be transformed to the same
address
Two records may be sent to the same place in a file – Collision
Two or more records that result in the same address are known as
Synonyms.
Hash Function

Hash Table: A hash table is an array [0 to Max − 1] of size Max.
For better performance, keep the table size a prime number.

Hash Function: A hash function is a mathematical function that maps an
input value into an index / address (i.e. transforms a key into an address).
For a string key: Hash(String) = (∑ ASCII characters) % table_size

Example: Key % Table_size
25 % 10 = 5
55 % 10 = 5
Both keys map to address 5 → COLLISION
Key Terms
Open / External Hashing: collisions are stored outside the table
(e.g. with Key % 10, overflow records go to separate storage).
Application: external hashing is used, for example, in file systems
such as Linux, where records are kept in disk blocks.
Key Terms
Closed / Internal Hashing: collisions result in storing one of the
records at another slot in the table.
Limits the table size.
Extraction Method
Selected digits are extracted from the key and used as the address.
One way is to extract the first two and the last one or two digits.
For example, for key 345678 the address is 3478 if the first two and
the last two digits are extracted, or 348 if the first two and the
last digit are extracted.
OR
For a 6-digit employee number, get a 3-digit hash address (000-999):
select the first, third, and fourth digits (from the left) and use
them as the address.
140145 → 101
137456 → 174
214562 → 245
Mid-Square (middle of square)
The key is squared and the middle part of the result is extracted as
the hash value.
For example, to map the key 3121 into a hash table of size 1000, we
square it: 3121² = 9740641, and extract the middle three digits, 406,
as the hash value.
Disadvantage:
For a large key it is difficult to store its square, as it may exceed
the integer range.
Folding Method
The key is partitioned into a number of parts, each of which has the
same size.
The size of the subparts of the key is the same as that of the address.
The parts are then added together, ignoring the final carry, to form
an address.
For a key with nine digits, we can subdivide the digits into three
parts, add them up, and use the result as an address.
Example: key = 356942781, address slots of 3 digits.
Part 1 = 356, Part 2 = 942, Part 3 = 781
Add them to get the sum 2079 → ignore the final carry.
Hash value = 079
A hash table has space for 100 records. What is the probability of
collision before the table is 10% full?
(A) 0.45
(B) 0.5
(C) 0.3
(D) 0.34
For the first key there is no collision, because all the slots are empty.
Once the first slot is filled, the next key collides with probability
1/100; for the key after that, 2/100, and so on up to the tenth key.
So the probability of a collision before the table is 10% full is
1/100 + 2/100 + ... + 9/100
= 0.01 + 0.02 + 0.03 + ... + 0.08 + 0.09 = 0.45
Answer: (A)
Forms of Hashing - revisited
Open Hashing:
Allows records to be stored in unlimited space (could be the hard disk).
Places no limitation on the size of the table.
Closed Hashing:
Uses fixed space to store data.
Limits the size of the table.
Collision in Hashing - revisited
Collision Resolution
Data CANNOT be stored in the home address or demanded address
(i.e. collision)? Apply a collision resolution strategy:
◻ Open addressing
   Linear probing
   Quadratic probing
   Double hashing
   Key offset
◻ Linked List (Separate chaining)
◻ Bucket
Open addressing
Collisions are resolved by finding an available empty location other than the
home address (original address).
If Hash(Key) is not empty, positions are probed in sequence until an empty
location is found.
End of table? The search wraps around to the start and continues up to the
original collision location.
Open addressing resolves collisions in the primary area, i.e. the area that
contains all of the home addresses.
A large table space is required for open addressing.
Four methods are used in open addressing:
Linear probing
Quadratic probing
Double hashing
Key offset
Linear Probing
Function: ( Hash(x) + i ) MOD Max, where i = 1, 2, 3, ...
Example (table of size 7; slots 5 and 6 already hold 40 and 76):
Insert 10: 10 % 7 = 3 → slot 3 is FREE, store 10 there.
Insert 55: 55 % 7 = 6 → OCCUPIED (76).
( 6 + 1 ) % 7 = 0 → Occupied.
( 6 + 2 ) % 7 = 1 → FREE, store 55 at slot 1.
Advantages:
It is quite simple to implement.
Two variations exist: with replacement and without replacement.

Linear Probing – with replacement
If the key occupying the new key's home address has a different actual
home address, that existing key is moved to another empty slot and the
NEW KEY is stored at its own home address.
Without replacement, the new key is simply stored at the next empty
slot found by probing; existing keys are never moved.
Linear Probing – Example (with chains)
Store the following data into a hash table of size 10 and bucket size 1. Use linear
probing for collision resolution.
12, 01, 04, 03, 07, 08, 10, 02, 05, 14    Hashing function is key % 10

After inserting 12, 01, 04, 03, 07, 08, 10, 02 (key 02 collided at its
home slot 2 and was placed in slot 5; the Chain field of slot 2 points
to it):

Index  Key  Chain
  0     10   -1
  1      1   -1
  2     12    5
  3      3   -1
  4      4   -1
  5      2   -1
  6      -   -1
  7      7   -1
  8      8   -1
  9      -   -1

After adding keys 5 and 14: key 5's home slot 5 was occupied by 2,
whose actual home address is 2, so 2 is moved to the empty slot 6 and
the chain of slot 2 is updated (replacement); key 14 finds its home
slot 4 occupied by 4 itself and probes to the empty slot 9, chained
from slot 4:

Index  Key  Chain
  0     10   -1
  1      1   -1
  2     12    6
  3      3   -1
  4      4    9
  5      5   -1
  6      2   -1
  7      7   -1
  8      8   -1
  9     14   -1
Linear Probing – Example
Store 12, 01, 04, 03, 07, 08, 10, 02, 05, 14 Hashing function is key % 10
//hash function to get position
int hash(int key)
{
    return key % MAX;
}

//function for inserting a record using linear probing
int linear_prob(int Hashtable[], int key)
{
    int pos, i;
    pos = hash(key);
    if (Hashtable[pos] == 0)      // empty slot
    {
        Hashtable[pos] = key;
        return pos;
    }
    else                          // slot is not empty: probe forward
    {
        for (i = (pos + 1) % MAX; i != pos; i = (i + 1) % MAX)
        {
            if (Hashtable[i] == 0)
            {
                Hashtable[i] = key;
                return i;
            }
        }
    }
    return -1;                    // table is full
}
Question
Suppose you are given the following set of keys to insert into a
hash table that holds exactly 11 values:
113, 117, 97, 100, 114, 108, 116, 105, 99
Which of the following best demonstrates the contents of the
hash table after all the keys have been inserted using linear
probing?
(A) 100, __, __, 113, 114, 105, 116, 117, 97, 108, 99
(B) 99, 100, __, 113, 114, __, 116, 117, 105, 97, 108
(C) 100, 113, 117, 97, 14, 108, 116, 105, 99, __, __
(D) 117, 114, 108, 116, 105, 99, __, __, 97, 100, 113
Answer: (B). With hash function key % 11: 113→3, 117→7, 97→9,
100→1, 114→4, 108→9 (occupied, probes to 10), 116→6, 105→6
(occupied, probes past 7 to 8), 99→0.
Quadratic Probing
As the offset added is NOT 1, quadratic probing SLOWS down the growth of
primary clusters.
Function: ( Hash(x) + i² ) % M, where i = 1, 2, 3, 4, ..., (M-1)/2
Example (table of size 7):
Insert 5: 5 % 7 = 5 → FREE, store 5 at slot 5.
Insert 24: 24 % 7 = 3 → OCCUPIED.
( 3 + 1² ) % 7 = 4 → Occupied.
( 3 + 2² ) % 7 = 0 → FREE, store 24 at slot 0.
Quadratic Probing
Disadvantages:
Time is required to square the probe number.
It may be impossible to generate a new address for every element in the
list, so an empty slot can be missed even though one exists.
Quadratic Probing – Function

//hash function to get position
int hash(int key)
{
    return key % MAX;
}

//function for inserting a record using quadratic probing
int quadratic_prob(int Hashtable[], int key)
{
    int pos, i, probe;
    pos = hash(key);
    if (Hashtable[pos] == 0)        // empty slot
    {
        Hashtable[pos] = key;
        return pos;
    }
    else                            // slot is not empty: probe quadratically
    {
        for (i = 1; i <= (MAX - 1) / 2; i++)
        {
            probe = (pos + i * i) % MAX;
            if (Hashtable[probe] == 0)
            {
                Hashtable[probe] = key;
                return probe;
            }
        }
    }
    return -1;                      // no empty slot reached
}
Forms of Hashing and Probing - revisited
◻ Open Hashing
◻ Closed Hashing
◻ Open Addressing
◻ Linear Probing
◻ Quadratic Probing
◻ Primary clustering (linear probing creates runs of occupied slots around
   home addresses) and secondary clustering (in quadratic probing, synonyms
   follow the same probe sequence)
Double Hashing – Example
Table of size 10; H1(key) = Key % 10, H2(key) = M - (key % M) with M = 7.

Index  Value
  0     90
  1
  2     22
  3
  4
  5     45
  6
  7     37
  8
  9     49

Insert 17:
H1(17) = 17 % 10 = 7, but slot 7 is already occupied.
Calculate H2:
H2(key) = M - (key % M)
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
Probe ( 7 + 4 ) % 10 = 1 → slot 1 is free, store 17 there.
Chaining vs. Rehashing

Chaining                                 Rehashing
An unlimited number of synonyms          A limited but good number of
can be handled.                          synonyms are taken care of.
The additional cost to be paid is        The table size is doubled, but no
the overhead of multiple linked lists.   additional link fields are to be
                                         maintained.
Extendible Hashing – Example
1. Compute Hash(key).
To add 20: its binary value is 10100.
Take the 2 rightmost bits, i.e. 00.
Bucket 00 has local depth 2 and overflows; since the global depth is
also 2, double the directory, i.e. 000, 001, 010, ..., 111.
DICTIONARY By: Aditya Solanki
Aditya Nair
Dictionary – Basic Information
An abstract data type which stores data in the form of key-value pairs.
Each key has a value associated/paired with it.
A dictionary with duplicates is a dictionary which allows two or more
(key, value) pairs with the same key.
Dictionaries are built upon hash tables, and the keys in the key/value
pairs are stored in memory inside these hash tables at indexes which are
determined by a hash function.
We can use either the open addressing or the separate chaining approach
for the hash table that implements our dictionary.
Open addressing: (in case of collision) put the key in some other index
location, separate from the one returned to us by the hash function.
Separate chaining (also closed addressing): in case of collision, uses
linked lists to chain together keys which result in the same hash values.
The length of the chain of nodes corresponding to a bucket decides the
complexity of separate-chained hash tables.
Separate Chaining advantages over Open Addressing:
• The table never fills up; an unlimited number of synonyms can be handled.
• It is less sensitive to the hash function and to a high load factor.
• Deletion is simple.
Some disadvantages:
• It requires the implementation of a separate data structure for chains.
• The links consume extra memory, and long chains hurt cache performance.
1. Insertion Operation -
We start from the highest level in the list and compare the key of the
next node of the current node with the key to be inserted.
If the key of the next node is less than the key to be inserted, we keep
moving forward on the same level.
If the key of the next node is greater than the key to be inserted, we
store the pointer to the current node at update[i], move one level down,
and continue the search.
At level 0 we will definitely find the position to insert the given key.
Skip List: Insertion
Implementation (Pseudo Code):
Insert(list, searchKey)
    local update[0...MaxLevel+1]
    x := list -> header
    for i := list -> level downto 0 do
        while x -> forward[i] -> key < searchKey do
            x := x -> forward[i]
        update[i] := x
    lvl := randomLevel()
    if lvl > list -> level then
        for i := list -> level + 1 to lvl do
            update[i] := list -> header
        list -> level := lvl
    x := makeNode(lvl, searchKey)
    for i := 0 to lvl do
        x -> forward[i] := update[i] -> forward[i]
        update[i] -> forward[i] := x
2. Searching Operation-
Searching for an element is very similar to searching for a spot when
inserting an element in a skip list. The basic idea is:
If the key of the next node is less than the search key, we keep moving
forward on the same level.
If the key of the next node is greater than the search key, we store the
pointer to the current node at update[i], move one level down, and
continue the search.
At the lowest level (0), if the element next to the rightmost element
(update[0]) has a key equal to the search key, then we have found the
key; otherwise, failure.
Skip List Searching:
Implementation(Pseudo code):
Search(list, searchKey)
    x := list -> header
    -- loop invariant: x -> key < searchKey
    for i := list -> level downto 0 do
        while x -> forward[i] -> key < searchKey do
            x := x -> forward[i]
    x := x -> forward[0]
    if x -> key = searchKey then return x -> value
    else return failure
Skip List Operation:
3. Deletion Operation-
Deletion of an element k is preceded by locating the element in the skip
list using the search algorithm mentioned above.
Once the element is located, pointers are rearranged to remove the
element from the list, just as in a singly linked list.
We start from the lowest level and rearrange pointers at every level i
where the element next to update[i] is k.
After deletion there could be levels with no elements, so we remove
these levels as well by decrementing the level of the skip list.
Skip List Deletion:
Implementation(Pseudo code):
Delete(list, searchKey)
    local update[0..MaxLevel+1]
    x := list -> header
    for i := list -> level downto 0 do
        while x -> forward[i] -> key < searchKey do
            x := x -> forward[i]
        update[i] := x
    x := x -> forward[0]
    if x -> key = searchKey then
        for i := 0 to list -> level do
            if update[i] -> forward[i] ≠ x then break
            update[i] -> forward[i] := x -> forward[i]
        free(x)
        while list -> level > 0 and
              list -> header -> forward[list -> level] = NIL do
            list -> level := list -> level - 1
Asymptotic Analysis (Time Complexity):
Search, insertion, and deletion all take O(log n) expected time; the
worst case is O(n), but it occurs with very low probability.
Skip lists power applications the world over, ranging from mobile devices
to sites like Twitter, Apple, and Wikipedia.
LevelDB is a fast key-value storage library written at Google that
provides an ordered mapping and uses skip lists internally.
It is the backend database for Google Chrome.
Bitcoin Core and go-ethereum store their blockchain metadata using a
LevelDB database.
Other users include Minecraft and Autodesk AutoCAD 2016.
Thank You :)
Dictionaries
Examples :
Webster’s dictionary
Telephone directory
Dictionary ADT
Hashing - Applications

Skip List
A balanced tree is one of the most popular data structures used for
searching.
One of the variants of balanced trees is the skip list.
The skip list is a probabilistic data structure.
Used by many search-based applications instead of balanced trees.
A skip list stores the sorted data in the form of a linked list.
Items are stored as a hierarchy of linked lists where each list links
increasingly sparse subsequences of the items.
These supplementary lists result in an item search that is as efficient
as that of balanced binary search trees.
Since each link of the sparser lists skips over many items of the full
list in one step, the structure is called a skip list.
Skip List
These forward links are added on the basis of the probability of the
element being searched for.
Hence, insert, search, and delete operations are performed in
logarithmic expected time.
Skip list algorithms are simpler, faster, and use less space.
Diagrammatic representation of a skip list -