Hashing

HASHING
Searching Techniques
◦ Linear Search takes O(n) time to perform the search in unsorted arrays of n
elements.
◦ Binary Search takes O(log n) time to perform the search in sorted arrays of n
elements.
◦ It takes O(log n) time to perform the search in a Binary Search Tree consisting
of n elements.
◦ The main drawback of these techniques is-

◦ As the number of elements increases, the search time also increases. This
becomes problematic when the total number of elements becomes too
large.
Hashing in Data Structure
◦ Hashing is a well-known technique to search for any particular element
among several elements.
◦ It minimizes the number of comparisons while performing the search.
◦ Advantage-
Unlike other searching techniques,

◦ Hashing is extremely efficient.
◦ The time taken by it to perform the search does not depend upon the total number of
elements.
◦ It completes the search with constant time complexity O(1).
Hashing Mechanism
◦ In hashing,
◦ An array data structure called a Hash table
is used to store the data items.
◦ Based on the hash key value, data items
are inserted into the hash table.
◦ Hash Key Value-

◦ Hash key value is a special value that
serves as an index for a data item.
◦ It indicates where the data item should be
stored in the hash table.
◦ Hash key value is generated using a hash
function.
Hash Function
◦ Hash function is a function that maps any big number or string to a small integer
value.
◦ The main idea behind the hashing is to create the (key/value) pairs. If the key is
given, then the hash function algorithm computes the index at which the value
would be stored.
◦ Hash function takes the data item as an input (key) and returns a small integer
value as an output (hash value).
hash value = hash function (key)
◦ Hash value represents the original string of characters, but it is normally smaller
than the original.
◦ Hash value of the data item (key) is then used as an index for storing it in the hash
table.
Key Properties of Hash Functions
◦ Deterministic: A hash function must consistently produce the same output
for the same input.
◦ Fixed Output Size: The output of a hash function should have a fixed size,
regardless of the size of the input.
◦ Efficiency: The hash function should be able to process input quickly.
◦ Uniformity: The hash function should distribute the hash values uniformly
across the output space to avoid clustering.
◦ Pre-image Resistance: It should be computationally infeasible to reverse
the hash function, i.e., to find the original input given a hash value.
◦ Collision Resistance: It should be difficult to find two different inputs that
produce the same hash value.
◦ Avalanche Effect: A small change in the input should produce a
significantly different hash value.
Types of Hash Functions
◦ There are different ways through which we can calculate hash
functions:
1. Division method
2. Folding method
3. Mid-square method
4. Multiplication method
1. Division Method
◦ This is the simplest and easiest method to generate a hash value. The hash
function divides the value k by M and then uses the remainder obtained.
◦ Formula:
h(K) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
◦ In this method, we map a key K into one of the M slots by taking the
remainder of K divided by M
Example
◦ For example, let the hash function be h(k) = k mod 10
where k represents the keys. Let the keys be: 12, 11,
14, 17, 16, 15, 18 and 13. The empty table will look as
follows having a table size of 10 and index starting
from 0.
◦ To insert values into the hash table, we make use of
the hash function.
h(12) = 12mod10 = 2
h(11) = 11mod10 = 1
h(14) = 14mod10 = 4
h(17) = 17mod10 = 7
h(16) = 16mod10 = 6
h(15) = 15mod10 = 5
h(18) = 18mod10 = 8
h(13) = 13mod10 = 3
◦ The output of h(k) is the index at which we need to
place the particular value in the hash table. Thus, after
inserting values, the hash table will be:
2. Folding Method
◦ This method involves two steps:
◦ Divide the key-value k into several parts, i.e., k1, k2, k3,….,kn, where each part has the same number of digits
except for the last part, which can have lesser digits than the other parts.
◦ Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
◦ Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here, s is obtained by adding the parts of the key k
◦ For example, if our item was the phone number 12345, we would take the digits and divide them into
groups of 2 digits
K1= 12, k2= 34, k3= 5
S= 12+34+5, we get 51. so, h(12345)=51
If we assume our hash table has 11 slots, we need to perform the extra step of dividing by 11 and
keeping the remainder. In this case, 51 % 11 is 7, so the phone number 12345 hashes to slot 7.
3. Mid Square Method
◦ The mid-square method is a very good hashing method. It involves two steps to compute the hash
value-
◦ Square the value of the key k, i.e., k2
◦ Extract the middle r digits as the hash value.
◦ Formula:
h(K) = h(k x k)
Here, k is the key value.
◦ The value of r can be decided based on the table size.
◦ Example:
◦ Suppose the hash table has 100 memory locations. So, r = 2 because two digits are required to
map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
◦ The hash value obtained is 60
4. Multiplication method
◦ This method involves two steps:
◦ Determine a constant value. A, where (0, A, 1)
◦ Add A to the key value and multiply.
◦ Consider kA's fractional portion.
◦ Multiply the outcome of the preceding step by M, the hash table's size.
◦ Formula:
h(K) = floor (M (kA mod 1))
Here, M = size of the hash table, k = key value and A = constant value
◦ For example, k = 5678, A = 0.6829, M = 200
Now, calculating the new value of h(5678):
h(5678) = floor[200(5678 x 0.6829 mod 1)]
h(5678) = floor[200(3877.5062 mod 1)]
h(5678) = floor[200(0.5062)]
h(5678) = floor[101.24]
h(5678) = 101
Collision in Hashing
◦ In hashing,
◦ Hash function is used to compute the hash value for a key.
◦ Hash value is then used as an index to store the key in the hash table.
◦ Hash function may return the same hash value for two or more keys.
◦ When the hash value of a key maps to an already occupied bucket of the hash table, it is called
as a Collision.
◦ For example, let us consider a hash table with size 10, where the following items are to be
stored:12, 24, 56, 30, 62.
Let us assume that the hash function is given by h(K) = K % 10
We take the numbers one by one, apply the hash function to it, and the result tells us the
index that the numbers will occupy in the hash table. Applying the hash function, to the
given numbers,
h(12) = 12 % 10 = 2
h(24) = 24 % 10 = 4
h(56) = 56 % 10 = 6
h(30) = 30 % 10 = 0
h(62) = 62 % 10 = 2 ← Collision Occurs
Collision Resolution Techniques
◦ Collision Resolution Techniques are used to resolve or handle the
collision.
◦ Collision resolution techniques are classified as-
Separate Chaining
◦ To handle the collision,
◦ This technique creates a linked list to the slot for which collision occurs.
◦ The new key is then inserted into the linked list.
◦ These linked lists to the slots appear like chains.
◦ That is why this technique is called separate chaining.
◦ Time Complexity-
◦ For Searching-
◦ In the worst case, all the keys might map to the same bucket of the hash table. In such a case, all the keys
will be present in a single linked list.
◦ Sequential search will have to be performed on the linked list to perform the search. So, the time taken for
searching in the worst case is O(n).
◦ For Deletion-
◦ In the worst case, the key might have to be searched first and then deleted. In the worst case, the time
taken for searching is O(n). So, the time taken for deletion in the worst case is O(n).
𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒆𝒍𝒆𝒎𝒆𝒎𝒕𝒔 𝒑𝒓𝒆𝒔𝒆𝒏𝒕 𝒊𝒏 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ Load Factor (α) =
𝑻𝒐𝒕𝒂𝒍 𝒔𝒊𝒛𝒆 𝒐𝒇 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ If Load factor (α) = constant, then time complexity of Insert, Search, Delete = Θ(1)
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash
table- 50, 700, 76, 85, 92, 73 and 101
Applying the hash function to the
given numbers,
h(50) = 50 % 7 = 1
h(700) = 700 % 7 = 0
h(76) = 76 % 7 = 6
h(85) = 85 % 7 = 1
h(92) = 92 % 7 = 1
h(73) = 73 % 7 = 3
h(101) = 101 % 7 = 3
Open Addressing
◦ In open addressing,
◦ Unlike separate chaining, all the keys are stored inside the hash table.
Thus, a bigger hash table is needed.
◦ No key is stored outside the hash table.
◦ If a collision occurs, alternative cells are tried until an empty cell/bucket is
found.
◦ Techniques used for open addressing are-

◦ Linear Probing
◦ Quadratic Probing
◦ Double Hashing
Linear Probing
◦ In case of a collision, it probes through the hash table
linearly, looking for an empty slot, and assigns the free slot to
the value.
◦ Let hash(x) be the slot index computed using the hash
function and S be the table size
◦ If slot hash(x) % S is full, then we try (hash(x) + 1) % S
◦ If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
◦ If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
◦ Clustering:
◦ The main problem with linear probing is clustering, many
consecutive elements form groups, and it starts taking time to
find a free slot or to search for an element
Using the hash function ‘key mod 7’, insert the following sequence of keys in
the hash table- 50, 700, 76, 85, 92, 73 and 101. Use a linear probing technique
for collision resolution.
Applying the hash function to the given numbers,
Array Index
S. NO. Key Hash Array Index after linear 0 700
Probing
1 50
1 50 50 % 7 = 1 1 1
2 700 700 % 7 = 0 0 0
2 85
3 76 76 % 7 = 6 6 6 3 92
4 85 85 % 7 = 1 1 2 4 73
5 92 92 % 7 = 1 1 3
5 101
6 73 73 % 7 = 3 3 4
7 101 101 % 7 = 3 3 5 6 76
Quadratic Probing
◦ Quadratic Probing eliminates the primary clustering problem of linear
probing.
◦ Collision function is quadratic.
◦ When a collision occurs, we probe for the i2 th bucket in ith iteration.
◦ We keep probing until an empty bucket is found.
◦ If the hash function evaluates to h and a search in cell h is
inconclusive, we try cells h + 12, h+22, … h + i2.
◦ i.e., It examines cells 1, 4, 9 and so on away from the original probe.
◦ let hash(x)be the slot index computed using hash function.
◦ If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
◦ If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
◦ If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101. Use a Quadratic Probing technique for collision resolution.
Applying the hash function to the given numbers,
Array
Array Quadratic Probing in
Index 0 700
S. NO. Key Hash after
Index case of collision
Quadratic 1 50
Probing
1 50 50 % 7 = 1 1 1
2 85
2 700 700 % 7 = 0 0 0 3 73
3 76 76 % 7 = 6 6 6 4 101
(85 % 7 + 1*1) % 7 = 2
4 85 85 % 7 = 1 1 2 5 92
5 92 92 % 7 = 1 1 (92 % 7 + 2*2) % 7 = 5 5 6 76
6 73 73 % 7 = 3 3 3
7 101 101 % 7 = 3 3 (101 % 7 + 1*1) % 7 = 4 4
Double Hashing
◦ The double hashing technique uses two hash functions. The
second hash function comes into use when the first function
causes a collision. It provides an offset index to store the value.
◦ We use another hash function hash2(x) and look for i * hash2(x)
slot in i’th rotation.
◦ The function hash2(x) must never evaluate to zero.
◦ let hash(x) be the slot index computed using hash function.
◦ If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
◦ If (hash(x)+1*hash2(x))%S is full, then try (hash(x) + 2*hash2(x)) % S
◦ If (hash(x) + 2*hash2(x)) % S is full, then try (hash(x) + 3*hash2(x)) % S
◦ A function such as hash2(x) = R – ( x mod R) with R is a number
that must be smaller than the size of the hash table and should be
coprime with the size of the table.
Let the size of the Array
Arra Double Hashing in Index 0
hash table be 11. S. Ke Hash y case of collision after
NO. y H1(K) = K % 11 Inde 1 34
Double
Using the hash x (H1(K) + i * H2(K)) % 11
Hashing 2
function ‘H1(K) = K % 1 20 20 % 11 = 9 9 9 3 56
11’ and ‘H2(K) = 8 - (K 2 34 34 % 11 = 1 1 1 4 45
% 8)’, insert the H1(45)= 45%11=1 5
3 45 45 % 11 = 1 1 H2(45)= 8-(45%8)=3 4
following sequence (1+1*3)%11=4 6 70
of keys in the hash H1(70)= 70%11=4 7
4 70 70 % 11= 4 4 H2(70)= 8-(70%8)=2 6 8
table- 20, 34, 45, 70, (4+1*2)%11=6
9 20
and 56. Use a Double H1(56)= 56%11=1
H2(56)= 8-(56%8)=8 10
Hashing technique for 5 56 56 % 11 = 1 1 (1+1*8)%11=9 3
(1+2*8)%11=6
collision resolution.
(1+3*8)%11=3

Hashing

Uploaded by

Copyright:

Available Formats

Hashing

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Hashing

Uploaded by

Copyright:

Available Formats

HASHING

◦ The main drawback of these techniques is-

◦ It minimizes the number of comparisons while performing the search.

Unlike other searching techniques,

◦ Hash Key Value-

k is the key value, and

M is the size of the hash table.

Applying the hash function to the

◦ Techniques used for open addressing are-

Applying the hash function to the given numbers,

Applying the hash function to the given numbers,

You might also like