Hashing
Hashing
Hashing
Searching Techniques
◦ Linear Search takes O(n) time to perform the search in unsorted arrays of n
elements.
◦ Binary Search takes O(log n) time to perform the search in sorted arrays of n
elements.
◦ It takes O(log n) time to perform the search in a Binary Search Tree consisting
of n elements.
◦ Advantage-
1. Division method
2. Folding method
3. Mid-square method
4. Multiplication method
1. Division Method
◦ This is the simplest and easiest method to generate a hash value. The hash
function divides the value k by M and then uses the remainder obtained.
◦ Formula:
h(K) = k mod M
Here,
◦ In this method, we map a key K into one of the M slots by taking the
remainder of K divided by M
Example
◦ For example, let the hash function be h(k) = k mod 10
where k represents the keys. Let the keys be: 12, 11,
14, 17, 16, 15, 18 and 13. The empty table will look as
follows having a table size of 10 and index starting
from 0.
◦ To insert values into the hash table, we make use of
the hash function.
h(12) = 12mod10 = 2
h(11) = 11mod10 = 1
h(14) = 14mod10 = 4
h(17) = 17mod10 = 7
h(16) = 16mod10 = 6
h(15) = 15mod10 = 5
h(18) = 18mod10 = 8
h(13) = 13mod10 = 3
◦ The output of h(k) is the index at which we need to
place the particular value in the hash table. Thus, after
inserting values, the hash table will be:
2. Folding Method
◦ This method involves two steps:
◦ Divide the key-value k into several parts, i.e., k1, k2, k3,….,kn, where each part has the same number of digits
except for the last part, which can have lesser digits than the other parts.
◦ Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
◦ Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here, s is obtained by adding the parts of the key k
◦ For example, if our item was the phone number 12345, we would take the digits and divide them into
groups of 2 digits
K1= 12, k2= 34, k3= 5
S= 12+34+5, we get 51. so, h(12345)=51
If we assume our hash table has 11 slots, we need to perform the extra step of dividing by 11 and
keeping the remainder. In this case, 51 % 11 is 7, so the phone number 12345 hashes to slot 7.
3. Mid Square Method
◦ The mid-square method is a very good hashing method. It involves two steps to compute the hash
value-
◦ Square the value of the key k, i.e., k2
◦ Extract the middle r digits as the hash value.
◦ Formula:
h(K) = h(k x k)
Here, k is the key value.
◦ The value of r can be decided based on the table size.
◦ Example:
◦ Suppose the hash table has 100 memory locations. So, r = 2 because two digits are required to
map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
◦ The hash value obtained is 60
4. Multiplication method
◦ This method involves two steps:
◦ Determine a constant value. A, where (0, A, 1)
◦ Add A to the key value and multiply.
◦ Consider kA's fractional portion.
◦ Multiply the outcome of the preceding step by M, the hash table's size.
◦ Formula:
h(K) = floor (M (kA mod 1))
Here, M = size of the hash table, k = key value and A = constant value
◦ For example, k = 5678, A = 0.6829, M = 200
Now, calculating the new value of h(5678):
h(5678) = floor[200(5678 x 0.6829 mod 1)]
h(5678) = floor[200(3877.5062 mod 1)]
h(5678) = floor[200(0.5062)]
h(5678) = floor[101.24]
h(5678) = 101
Collision in Hashing
◦ In hashing,
◦ Hash function is used to compute the hash value for a key.
◦ Hash value is then used as an index to store the key in the hash table.
◦ Hash function may return the same hash value for two or more keys.
◦ When the hash value of a key maps to an already occupied bucket of the hash table, it is called
as a Collision.
◦ For example, let us consider a hash table with size 10, where the following items are to be
stored:12, 24, 56, 30, 62.
Let us assume that the hash function is given by h(K) = K % 10
We take the numbers one by one, apply the hash function to it, and the result tells us the
index that the numbers will occupy in the hash table. Applying the hash function, to the
given numbers,
h(12) = 12 % 10 = 2
h(24) = 24 % 10 = 4
h(56) = 56 % 10 = 6
h(30) = 30 % 10 = 0
h(62) = 62 % 10 = 2 ← Collision Occurs
Collision Resolution Techniques
◦ Collision Resolution Techniques are used to resolve or handle the
collision.
◦ Collision resolution techniques are classified as-
Separate Chaining
◦ To handle the collision,
◦ This technique creates a linked list to the slot for which collision occurs.
◦ The new key is then inserted into the linked list.
◦ These linked lists to the slots appear like chains.
◦ That is why this technique is called separate chaining.
◦ Time Complexity-
◦ For Searching-
◦ In the worst case, all the keys might map to the same bucket of the hash table. In such a case, all the keys
will be present in a single linked list.
◦ Sequential search will have to be performed on the linked list to perform the search. So, the time taken for
searching in the worst case is O(n).
◦ For Deletion-
◦ In the worst case, the key might have to be searched first and then deleted. In the worst case, the time
taken for searching is O(n). So, the time taken for deletion in the worst case is O(n).
𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒆𝒍𝒆𝒎𝒆𝒎𝒕𝒔 𝒑𝒓𝒆𝒔𝒆𝒏𝒕 𝒊𝒏 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ Load Factor (α) =
𝑻𝒐𝒕𝒂𝒍 𝒔𝒊𝒛𝒆 𝒐𝒇 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ If Load factor (α) = constant, then time complexity of Insert, Search, Delete = Θ(1)
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash
table- 50, 700, 76, 85, 92, 73 and 101
given numbers,
h(50) = 50 % 7 = 1
h(700) = 700 % 7 = 0
h(76) = 76 % 7 = 6
h(85) = 85 % 7 = 1
h(92) = 92 % 7 = 1
h(73) = 73 % 7 = 3
h(101) = 101 % 7 = 3
Open Addressing
◦ In open addressing,
◦ Unlike separate chaining, all the keys are stored inside the hash table.
Thus, a bigger hash table is needed.
◦ No key is stored outside the hash table.
◦ If a collision occurs, alternative cells are tried until an empty cell/bucket is
found.
Array Index
S. NO. Key Hash Array Index after linear 0 700
Probing
1 50
1 50 50 % 7 = 1 1 1
2 700 700 % 7 = 0 0 0
2 85
3 76 76 % 7 = 6 6 6 3 92
4 85 85 % 7 = 1 1 2 4 73
5 92 92 % 7 = 1 1 3
5 101
6 73 73 % 7 = 3 3 4
7 101 101 % 7 = 3 3 5 6 76
Quadratic Probing
◦ Quadratic Probing eliminates the primary clustering problem of linear
probing.
◦ Collision function is quadratic.
◦ When a collision occurs, we probe for the i2 th bucket in ith iteration.
◦ We keep probing until an empty bucket is found.
◦ If the hash function evaluates to h and a search in cell h is
inconclusive, we try cells h + 12, h+22, … h + i2.
◦ i.e., It examines cells 1, 4, 9 and so on away from the original probe.
◦ let hash(x)be the slot index computed using hash function.
◦ If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
◦ If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
◦ If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101. Use a Quadratic Probing technique for collision resolution.
Array
Array Quadratic Probing in
Index 0 700
S. NO. Key Hash after
Index case of collision
Quadratic 1 50
Probing
1 50 50 % 7 = 1 1 1
2 85
2 700 700 % 7 = 0 0 0 3 73
3 76 76 % 7 = 6 6 6 4 101
(85 % 7 + 1*1) % 7 = 2
4 85 85 % 7 = 1 1 2 5 92
5 92 92 % 7 = 1 1 (92 % 7 + 2*2) % 7 = 5 5 6 76
6 73 73 % 7 = 3 3 3
7 101 101 % 7 = 3 3 (101 % 7 + 1*1) % 7 = 4 4
Double Hashing
◦ The double hashing technique uses two hash functions. The
second hash function comes into use when the first function
causes a collision. It provides an offset index to store the value.
◦ We use another hash function hash2(x) and look for i * hash2(x)
slot in i’th rotation.
◦ The function hash2(x) must never evaluate to zero.
◦ let hash(x) be the slot index computed using hash function.
◦ If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
◦ If (hash(x)+1*hash2(x))%S is full, then try (hash(x) + 2*hash2(x)) % S
◦ If (hash(x) + 2*hash2(x)) % S is full, then try (hash(x) + 3*hash2(x)) % S
◦ A function such as hash2(x) = R – ( x mod R) with R is a number
that must be smaller than the size of the hash table and should be
coprime with the size of the table.
Let the size of the Array
Arra Double Hashing in Index 0
hash table be 11. S. Ke Hash y case of collision after
NO. y H1(K) = K % 11 Inde 1 34
Double
Using the hash x (H1(K) + i * H2(K)) % 11
Hashing 2
function ‘H1(K) = K % 1 20 20 % 11 = 9 9 9 3 56
11’ and ‘H2(K) = 8 - (K 2 34 34 % 11 = 1 1 1 4 45
% 8)’, insert the H1(45)= 45%11=1 5
3 45 45 % 11 = 1 1 H2(45)= 8-(45%8)=3 4
following sequence (1+1*3)%11=4 6 70
of keys in the hash H1(70)= 70%11=4 7
4 70 70 % 11= 4 4 H2(70)= 8-(70%8)=2 6 8
table- 20, 34, 45, 70, (4+1*2)%11=6
9 20
and 56. Use a Double H1(56)= 56%11=1
H2(56)= 8-(56%8)=8 10
Hashing technique for 5 56 56 % 11 = 1 1 (1+1*8)%11=9 3
(1+2*8)%11=6
collision resolution.
(1+3*8)%11=3