0% found this document useful (0 votes)
39 views18 pages

Hash Function in Data Structures

The document provides an overview of hash functions in data structures, explaining their role in mapping keys to indices in hash tables for efficient data retrieval. It discusses various hashing methods, collision resolution techniques, and the importance and limitations of hashing. Additionally, it outlines specific hashing methods like Division, Mid Square, Digit Folding, and Multiplication, along with their pros and cons.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views18 pages

Hash Function in Data Structures

The document provides an overview of hash functions in data structures, explaining their role in mapping keys to indices in hash tables for efficient data retrieval. It discusses various hashing methods, collision resolution techniques, and the importance and limitations of hashing. Additionally, it outlines specific hashing methods like Division, Mid Square, Digit Folding, and Multiplication, along with their pros and cons.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 18

Hash Function in Data Structures: An Overview

The hash function in Data Structures is a function that takes a key and returns an index into the hash table.
Have you ever heard of hashing but aren't sure how it works or why it's important? Hashing is the process of
mapping a variable-length input data set into a finite-sized output data set. It increases your efficiency in
retrieving the desired result from a bunch of data sets and even storing it.
In this DSA tutorial, we will endeavor to unravel key hashing concepts, shedding light on its vital role and why
it has emerged as a go-to technique for efficiently managing extensive datasets within the realm of the Dsa
Course Online
What is Hashing in Data Structures?
In this technique, we give an input called a key to the hash function. The function uses this key and generates
the unique index corresponding to that value in the hash table. After that, it returns the value stored at that
index which is known as the hash value.
Data can be hashed into a shorter, fixed-length value for quicker access using a key or set of characters. This is
how key-value pairs are stored in hash tables. The representation of the hash function looks like this:
hash=hashfunc(key)

How does Hashing in Data Structures Work?

 Hash key: It is the data you want to be hashed in the hash table. The hashing algorithm translates the
key to the hash value. This identifier can be a string or an integer. There are some types of hash keys:
1. Public key - It is an open key used solely for data encryption.
2. Private key - It is known as a symmetric key used for both purposes, encryption and decryption.
3. SSH public key - SSH is a set of both public and private keys.
 Hash Function: It performs the mathematical operation of accepting the key value as input and
producing the hash code or hash value as the output. Some of the characteristics of an ideal hash
function are as follows:

1. It must produce the same hash value for the same hash key to be deterministic.
2. Every input has a unique hash code. This feature is known as the hash property.
3. It must be collision-friendly.
4. A little bit of change leads to a drastic change in the output.
5. The calculation must be quick

 Hash Table: It is a type of data structure that stores data in an array format. The table maps keys to
values using a hash function.
Read More - Data Structure Interview Questions for Experienced
Use cases of Hashing In DSA

 Password Storage: Hash functions are commonly used to securely store passwords. Instead of storing
the actual passwords, the system stores their hash values. When a user enters a password, it is hashed
and compared with the stored hash value for authentication.
 Data Integrity: Hashing is used to ensure data integrity by generating hash values for files or messages.
By comparing the hash values before and after transmission or storage, it's possible to detect if any
changes or tampering occurred.
 Data Retrieval: Hashing is used in data structures like hash tables, which provide efficient data
retrieval based on key-value pairs. The hash value serves as an index to store and retrieve data quickly.
 Digital Signatures: Hash functions are an integral part of digital signatures. They are used to generate a
unique hash value for a message, which is then encrypted with the signer's private key. This allows for
verification of the authenticity and integrity of the message using the signer's public key.

Types of Hash functions

There are many hash functions that use numeric or alphanumeric keys. This article focuses on discussing
different hash functions:
1. Division Method.
2. Mid Square Method.
3. Folding Method.
4. Multiplication Method.

Let’s begin discussing these methods in detail.

1. Division Method:

This is the most simple and easiest method to generate a hash value. The hash function divides the value k by M
and then uses the remainder obtained.

Formula:

h(K) = k mod M

Here,
k is the key value, and
M is the size of the hash table.

It is best suited that M is a prime number as that can make sure the keys are more uniformly distributed. The
hash function is dependent upon the remainder of a division.

Example:

k = 12345
M = 95
h(12345) = 12345 mod 95
= 90

k = 1276
M = 11
h(1276) = 1276 mod 11
=0
Pros:
1. This method is quite good for any value of M.
2. The division method is very fast since it requires only a single division operation.

Cons:
1. This method leads to poor performance since consecutive keys map to consecutive hash values in the
hash table.
2. Sometimes extra care should be taken to choose the value of M.

2. Mid Square Method:

The mid-square method is a very good hashing method. It involves two steps to compute the hash value-
1. Square the value of the key k i.e. k2
2. Extract the middle r digits as the hash value.

Formula:

h(K) = h(k x k)

Here,
k is the key value.

The value of r can be decided based on the size of the table.

Example:

Suppose the hash table has 100 memory locations. So r = 2 because two digits are required to map the key to
the memory location.

k = 60
k x k = 60 x 60
= 3600
h(60) = 60

The hash value obtained is 60

Pros:
1. The performance of this method is good as most or all digits of the key value contribute to the result.
This is because all digits in the key contribute to generating the middle digits of the squared result.
2. The result is not dominated by the distribution of the top digit or bottom digit of the original key value.

Cons:
1. The size of the key is one of the limitations of this method, as the key is of big size then its square will
double the number of digits.
2. Another disadvantage is that there will be collisions but we can try to reduce collisions.

3. Digit Folding Method:

This method involves two steps:


1. Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where each part has the same
number of digits except for the last part that can have lesser digits than the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the last carry if any.

Formula:

k = k1, k2, k3, k4, ….., kn


s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s

Here,
s is obtained by adding the parts of the key k

Example:

k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51

Note:
The number of digits in each part varies depending upon the size of the hash table. Suppose for example the
size of the hash table is 100, then each part must have two digits except for the last part which can have a lesser
number of digits.

4. Multiplication Method

This method involves the following steps:


1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value with A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step 4.

Formula:

h(K) = floor (M (kA mod 1))

Here,
M is the size of the hash table.
k is the key value.
A is a constant value.

Example:

k = 12345
A = 0.357840
M = 100

h(12345) = floor[ 100 (12345*0.357840 mod 1)]


= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
Pros:

The advantage of the multiplication method is that it can work with any value between 0 and 1, although there
are some values that tend to give better results than the rest.

Cons:

The multiplication method is generally suitable when the table size is the power of two, then the whole process
of computing the index by the key using multiplication hashing is very fast.

Digit analysis is a technique for generating non-cryptographic hashes.

In general, a hash function is one that accepts an input of arbitrary length and distribution and transforms it
(typically via a technique that non-recoverably discards information) into an output that is of fixed size and
evenly distributed.

If all of the inputs that must be hashed are known in advance, then we can write a program that analyzes those
inputs and discards the digits (or characters) that are least-uniformly-distributed until we have output values
that are of the desired size and are uniformly distributed.

So (for example), if our inputs were 1938m, 3391i, 3091b, 4903a, 4930a, 6573b, and 4891c, we analyze the
digits: there are three 1s, six 9s, seven 3s, two 8s, one m, one i, three 0s, two bs, three 4s, two as, one 6, one 5,
one 7, and one c. So we might decide to eliminate 1, 8, m, i, 0, b, 4, a, 6, 5, 7 and c from the inputs to produce
93, 339, 39, 93, 93, 3, 9. Let’s further say we only want a one-character hash, so we take only the first
character: 9, 3, 3, 9, 9, 3, 9. We now have a hash function that hashes these seven identifiers more-or-less
evenly into one of two buckets.

What is a Hash Collision?


Collision in a hash table is a term used to denote the phenomena when the hashing algorithm produces the same
hash value for two or more keys using a hash function. However, it's crucial to note that collisions are not an
issue; rather, they constitute a key component of hashing algorithms. Because various hashing methods used in
data structures convert each input into a fixed-length code regardless of its length, collisions happen. The
hashing algorithms will eventually yield repeating hashes since there are an infinite number of inputs and a
finite number of outputs.

Collision Resolution Techniques in Data Structures

1. Open hashing/separate chaining/closed addressing


2. Open addressing/closed hashing

1. Open hashing/separate chaining/closed addressing

A typical collision handling technique called "separate chaining" links components with the same hash using
linked lists. It is also known as closed addressing and employs arrays of linked lists to successfully prevent
hash collisions.

Advantages:

 Implementation is simple and easy


 We can add more keys to the table because the hash table has a lot of empty places.
 Less sensitive than average to changing load factors
 Typically utilized when there is uncertainty on the number and frequency of keys to be used in the hash
table.
Disadvantages:

 Space is wasted
 The length of the chain lengthens the search period.
 Comparatively worse cache performance to closed hashing.

1. Closed hashing (Open addressing)

Instead of using linked lists, open addressing stores each entry in the array itself. The hash value is not used to
locate objects. To insert, it first verifies the array beginning from the hashed index and then searches for an
empty slot using probing sequences. The probe sequence, with changing gaps between subsequent probes, is the
process of progressing through entries. There are three methods for dealing with collisions in closed hashing:

1. Linear Probing

Linear probing includes inspecting the hash table sequentially from the very beginning. If the site requested is
already occupied, a different one is searched. The distance between probes in linear probing is typically fixed
(often set to a value of 1).

Formula

index = key % hashTableSize

Sequence

index = ( hash(n) % T)
(hash(n) + 1) % T
(hash(n) + 2) % T
(hash(n) + 3) % T … and so on.

1. Quadratic Probing

The distance between subsequent probes or entry slots is the only difference between linear and quadratic
probing. You must begin traversing until you find an available hashed index slot for an entry record if the slot is
already taken. By adding each succeeding value of any arbitrary polynomial in the original hashed index, the
distance between slots is determined.

Formula

index = index % hashTableSize

Sequence

index = ( hash(n) % T)
(hash(n) + 1 x 1) % T
(hash(n) + 2 x 2) % T
(hash(n) + 3 x 3) % T … and so on

1. Double-Hashing

The time between probes is determined by yet another hash function. Double hashing is an optimized technique
for decreasing clustering. The increments for the probing sequence are computed using an extra hash function.

Formula

(first hash(key) + i * secondHash(key)) % size of the table


Sequence

index = hash(x) % S
(hash(x) + 1*hash2(x)) % S
(hash(x) + 2*hash2(x)) % S
(hash(x) + 3*hash2(x)) % S … and so on
Importance of Hashing

1. Easy retrieval of required information from large data sets in an efficient manner.
2. The hash code produced by the hash function serves as the unique identifier in the data set thus
maintaining data integrity.
3. The data is stored in a structured manner as there is an index for every record in the hash table. This
ensures efficient storage and retrieval.

Limitations of Hashing

1. Many a time there leads to a situation of collision where two or more inputs have the same hash value.
2. The performance of the hashing algorithm depends upon the quality of the hash function. Sometimes, a
not well-thought-of hash function may lead to collisions thus reducing the efficiency of the algorithm.

Collision Resolution Techniques


In Hashing, hash functions were used to generate hash values. The hash value is used to create an index for the
keys in the hash table. The hash function may return the same hash value for two or more keys. When two or
more keys have the same hash value, a collision happens. To handle this collision, we use Collision Resolution
Techniques.

Collision Resolution Techniques


There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing
1) Separate Chaining

The idea behind Separate Chaining is to make each cell of the hash table point to a linked list of records that
have the same hash function value. Chaining is simple but requires additional memory outside the table.
Example: We have given a hash function and we have to insert some elements in the hash table using a
separate chaining method for collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let's see step by step approach to how to solve the above problem:

Collision Resolution Techniques


Last Updated : 10 Mar, 2025


In Hashing, hash functions were used to generate hash values. The hash
value is used to create an index for the keys in the hash table. The hash
function may return the same hash value for two or more keys. When two or
more keys have the same hash value, a collision happens. To handle this
collision, we use Collision Resolution Techniques.

Collision Resolution Techniques


There are mainly two methods to handle collision:
1. Separate Chaining
2. Open Addressing
1) Separate Chaining

The idea behind Separate Chaining is to make each cell of the hash table point
to a linked list of records that have the same hash function value. Chaining is
simple but requires additional memory outside the table.
Example: We have given a hash function and we have to insert some
elements in the hash table using a separate chaining method for collision
resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let's see step by step approach to how to solve the above problem:
1/5

Please You Own Hash Table with Chaining for implementation of


this technique

2) Open Addressing

In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we
examine the table slots one by one until the desired element is found or it is
clear that the element is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially that starts from the
original location of the hash. If in case the location that we get is already
occupied, then we check for the next location.
Algorithm:
1. Calculate the hash key. i.e. key = data % size
2. Check, if hashTable[key] is empty
 store the value directly by hashTable[key] = data
3. If the hash index already has some value then
 check for next index using key = (key+1) % size
4. Check, if the next index is available hashTable[key] then store
the value. Otherwise try for next index.
5. Do the above process till we find the space.
Example: Let us consider a simple hash function as “key mod 5” and a
sequence of keys that are to be inserted are 50, 70, 76, 85, 93.
2/6

Please refer Your Own Hash Table with Linear Probing in Open
Addressing for implementation details.
2.b) Quadratic Probing
Quadratic probing is an open addressing scheme in computer programming for
resolving hash collisions in hash tables. Quadratic probing operates by taking
the original hash index and adding successive values of an arbitrary quadratic
polynomial until an open slot is found.
An example sequence using quadratic probing is:
H+1 2 ,H+2 2 ,H+3 2 ,H+4 2 ...................... H + k 2

This method is also known as the mid-square method because in this method
we look for i2-th probe (slot) in i-th iteration and the value of i = 0, 1, . . . n – 1.
We always start from the original hash location. If only the location is occupied
then we check the other slots.
Let hash(x) be the slot index computed using the hash function and n be the
size of the hash table.
If the slot hash(x) % n is full, then we try (hash(x) + 1 2 ) % n.
If (hash(x) + 1 2 ) % n is also full, then we try (hash(x) + 2 2 ) % n.
If (hash(x) + 2 2 ) % n is also full, then we try (hash(x) + 3 2 ) % n.
This process will be repeated for all the values of i until an empty
slot is found
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and
collision resolution strategy to be f(i) = i 2 . Insert = 22, 30, and 50

1/4
Please refer Your Own Hash Table with Quadratic Probing in Open
Addressing for implementation.
2.c) Double Hashing
Double hashing is a collision resolving technique in Open Addressed Hash
tables. Double hashing make use of two hash function,
 The first hash function is h1(k) which takes the key and gives out a location
on the hash table. But if the new location is not occupied or empty then we
can easily place our key.
 But in case the location is occupied (collision) we will use secondary hash-
function h2(k) in combination with the first hash-function h1(k) to find the
new location on the hash table.
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
 i is a non-negative integer that indicates a collision number,
 k = element/key which is being hashed
 n = hash table size.
Complexity of the Double hashing algorithm:
Time complexity: O(n)
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where
first hash-function is h1(k) = k mod 7 and second hash-function is h2(k) = 1
+ (k mod 5)

2/5

1/5
Please You Own Hash Table with Chaining for implementation of this technique

2) Open Addressing

In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or
NIL. When searching for an element, we examine the table slots one by one until the desired element is found
or it is clear that the element is not in the table.
2.a) Linear Probing
In linear probing, the hash table is searched sequentially that starts from the original location of the hash. If in
case the location that we get is already occupied, then we check for the next location.
Algorithm:
1. Calculate the hash key. i.e. key = data % size
2. Check, if hashTable[key] is empty
 store the value directly by hashTable[key] = data
3. If the hash index already has some value then
 check for next index using key = (key+1) % size
4. Check, if the next index is available hashTable[key] then store the value. Otherwise try for
next index.
5. Do the above process till we find the space.
Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys that are to be inserted
are 50, 70, 76, 85, 93.

2/6
2.b) Quadratic Probing
Quadratic probing is an open addressing scheme in computer programming for resolving hash collisions in hash
tables. Quadratic probing operates by taking the original hash index and adding successive values of an
arbitrary quadratic polynomial until an open slot is found.
An example sequence using quadratic probing is:
H + 1 2 , H + 2 2 , H + 3 2 , H + 4 2 ...................... H + k 2
This method is also known as the mid-square method because in this method we look for i2-th probe (slot) in i-
th iteration and the value of i = 0, 1, . . . n – 1. We always start from the original hash location. If only the
location is occupied then we check the other slots.
Let hash(x) be the slot index computed using the hash function and n be the size of the hash table.
If the slot hash(x) % n is full, then we try (hash(x) + 1 2 ) % n.
If (hash(x) + 1 2 ) % n is also full, then we try (hash(x) + 2 2 ) % n.
If (hash(x) + 2 2 ) % n is also full, then we try (hash(x) + 3 2 ) % n.
This process will be repeated for all the values of i until an empty slot is found
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision resolution strategy to
be f(i) = i 2 . Insert = 22, 30, and 50
2.c) Double Hashing
Double hashing is a collision resolving technique in Open Addressed Hash tables. Double hashing make use of
two hash function,
 The first hash function is h1(k) which takes the key and gives out a location on the hash table. But if the
new location is not occupied or empty then we can easily place our key.
 But in case the location is occupied (collision) we will use secondary hash-function h2(k) in combination
with the first hash-function h1(k) to find the new location on the hash table.
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
 i is a non-negative integer that indicates a collision number,
 k = element/key which is being hashed
 n = hash table size.
Complexity of the Double hashing algorithm:
Time complexity: O(n)
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function is h1(k) = k
mod 7 and second hash-function is h2(k) = 1 + (k mod 5)

2/5

You might also like