Hashing in Data Structures

Download as pdf or txt
Download as pdf or txt
You are on page 1of 27

DATA STRUCTURES AND ALGORITHMS

Hashing in Data Structures.

What is Hashing?

• Hashing is the process of mapping large amount of data item to smaller table with the help
of hashing function. It uses hash tables to store the data in an array format. Each value in the
array has been assigned a unique index number. Hash tables use a technique to generate these
unique index numbers for each value stored in an array format. This technique is called the
hash technique. You only need to find the index of the desired item, rather than finding the
data. With indexing, you can quickly scan the entire list and retrieve the item you wish.
Indexing also helps in inserting operations when you need to insert data at a specific location.
No matter how big or small the table is, you can update and retrieve data within seconds.
• Hashing is also known as Hashing Algorithm or Message Digest Function.
• It is a technique to convert a range of key values into a range of indexes of an array.

Suppose we want to design a system for storing users and their passwords. And we want to access to

be performed efficiently:
1. Insert a user name and corresponding password.
2. Search a user and change their password.
3. Delete a user and corresponding password.

Think about it. We can use the following data structures to maintain it: array with pair element,

Linked List, Binary Tree.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 1


The above data structures all of these operations can be guaranteed to be in performed with O(1)

time using hash table. O(1) means that, no matter how much data, it will execute in constant time.

O(n) means that it is proportional to the amount of data.

USER HASH CODE / HASH VALUE INDEX

USER1 368 4

USER2 369 5

USER3 370 6

USER4 371 0

USER5 372 1

INDEX 0 1 2 3 4 5 6

PASSWORD Password4 Password5 Password1 Password2 Password3

A C program to convert string to ASCII VALUES:


/*
* C Program To Find the Sum of ASCII values of All Characters in a
* given String
*/
#include <stdio.h>
#include <string.h>
void main()
{
int sum = 0, i, len;
char string1[100];

printf("Enter the string : ");


scanf("%[^\n]s", string1);
len = strlen(string1);
for (i = 0; i < len; i++)
{

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 2


sum = sum + string1[i];
}
printf("\nSum of all characters : %d ",sum);
}

OUTPUT:

• Hashing is used to facilitate the next level searching method when compared with the linear
or binary search.
• Hashing allows to update and retrieve any data entry in a constant time O(1).
• Constant time O(1) means the operation does not depend on the size of the data.
• Hashing is used with a database to enable items to be retrieved more quickly.
• It is used in the encryption and decryption of digital signatures.

Types of hashing in data structure is a two-step process.

1. The hash function converts the item into a small integer or hash value. This integer is used
as an index to store the original data.
2. It stores the data in a hash table. You can use a hash key to locate data quickly.

Examples of Hashing in Data Structure

The following are real-life examples of hashing in the data structure –


• In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses
that roll number to retrieve information about that student.
• A library has an infinite number of books. The librarian assigns a unique number to each book.
This unique number helps in identifying the position of the books on the bookshelf.
• Hashing is designed to solve the problem of needing to efficiently find or store an item in a
collection. For example, if we have a list of 10,000 words of English and we want to check if
a given word is in the list, it would be inefficient to successively compare the word with all
10,000 items until we find a match.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 3


What is Hash Function?

A fixed process converts a key to a hash key is known as a Hash Function. The hash function in a
data structure maps the arbitrary size of data to fixed-sized data. It returns the following values: a
small integer value (also known as hash value), hash codes, and hash sums. The hashing techniques
in the data structure are very interesting, such as:

hash = hashfunc(key)
index = hash % array_size

The hash function must satisfy the following requirements:

• A good hash function is easy to compute.


• A good hash function never gets stuck in clustering and distributes keys evenly across the
hash table.
• A good hash function avoids collision when two elements or items get assigned to the same
hash value.
One of the hashing techniques of using a hash function is used for data integrity. If using a hash
function one change in a message will create a different hash.

The three characteristics of the hash function in the data structure are:
1. Collision free
2. Property to be hidden
3. Puzzle friendly

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 4


• This function takes a key and maps it to a value of a certain length which is called a Hash
value or Hash.
• Hash value represents the original string of characters, but it is normally smaller than the
original.
• It transfers the digital signature and then both hash value and signature are sent to the
receiver. Receiver uses the same hash function to generate the hash value and then
compares it to that received with the message.
• If the hash values are same, the message is transmitted without errors.
What is Hash Table?
• A hash table is a data structure that implements an associative array abstract data type, a
structure that can map keys to values. A hash table uses a hash function to compute an index,
also called a hash code, into an array of buckets or slots, from which the desired value can be
found.
• The hash table is basically the array of elements and the hash techniques of search are
performed on a part of the item i.e. key. Each key has been mapped to a number; the range
remains from 0 to table size 1.
• Hash table or hash map is a data structure used to store key-value pairs.
• It is a collection of items stored to make it easy to find them later.
• It uses a hash function to compute an index into an array of buckets or slots from which the
desired value can be found.
• It is an array of list where each list is known as bucket.
• It contains value based on the key.
• Hash table is used to implement the map interface and extends Dictionary class.
• Hash table is synchronized and contains only unique elements.

• The above figure shows the hash table with the size of n = 10. Each position of the hash
table is called as Slot. In the above hash table, there are n slots in the table, names = {0, 1,
2, 3, 4, 5, 6, 7, 8, 9}. Slot 0, slot 1, slot 2 and so on. Hash table contains no items, so every
slot is empty.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 5


• As we know the mapping between an item and the slot where item belongs in the hash table
is called the hash function. The hash function takes any item in the collection and returns
an integer in the range of slot names between 0 to n-1.
• Suppose we have integer items {26, 70, 18, 31, 54, 93}. One common method of
determining a hash key is the division method of hashing and the formula is :

Hash Key = Key Value % Number of Slots in the Table

• Division method or reminder method takes an item and divides it by the table size and
returns the remainder as its hash value.

Data Item Value % No. of Slots Hash Value


26 26 % 10 = 6 6
70 70 % 10 = 0 0
18 18 % 10 = 8 8
31 31 % 10 = 1 1
54 54 % 10 = 4 4
93 93 % 10 = 3 3

• After computing the hash values, we can insert each item into the hash table at the
designated position as shown in the above figure. In the hash table, 6 of the 10 slots are
occupied, it is referred to as the load factor and denoted by, λ = No. of items / table size.
For example , λ = 6/10.
• It is easy to search for an item using hash function where it computes the slot name for the
item and then checks the hash table to see if it is present.
• Constant amount of time O(1) is required to compute the hash value and index of the hash
table at that location.
• Hash tables must support 3 fundamental operations:

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 6


✓ Insert(key,value) -> Adds an item to the hash table.
✓ get(key) -> Fetches the value with the help of the given key.
✓ delete(key) -> Removes a value with the help of the given key.

Collision in Hashing-
In hashing,
• Hash function is used to compute the hash value for a key.
• Hash value is then used as an index to store the key in the hash table.
• Hash function may return the same hash value for two or more keys.

When the hash value of a key maps to an already occupied bucket of the hash table,
it is called as a Collision.

Two keys could possibly provide the same value since a hash function returns a little number for a
key that is a large integer or string. A collision handling mechanism must be used to deal with the
circumstance when a newly added key maps to a hash table slot that is already occupied.

How often are collisions with the big table?

Even if we have a large table to put the keys on, collisions are still highly frequent. Birthday Paradox
is a crucial finding. With only 23 people, there is a 50% chance that two people will share the same
birthday.

Collision Resolution Techniques-

Collision Resolution Techniques are the techniques used for resolving or handling the collision.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 7


Collision resolution techniques are classified as-

1. Separate Chaining / Open Hashing.


2. Open Addressing / Closed Hashing.

Separate Chaining / Open Hashing.


To handle the collision,
• This technique creates a linked list to the slot for which collision occurs.
• The new key is then inserted in the linked list.
• These linked lists to the slots appear like chains.
• That is why, this technique is called as separate chaining.

Time Complexity-
For Searching-
• In worst case, all the keys might map to the same bucket of the hash table.
• In such a case, all the keys will be present in a single linked list.
• Sequential search will have to be performed on the linked list to perform the search.
• So, time taken for searching in worst case is O(n).

For Deletion-

• In worst case, the key might have to be searched first and then deleted.
• In worst case, time taken for searching is O(n).
• So, time taken for deletion in worst case is O(n).

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 8


Load Factor (α)-
Load factor (α) is defined as-

If Load factor (α) = constant, then time complexity of Insert, Search, Delete = Θ(1)
PRACTICE PROBLEM BASED ON SEPARATE CHAINING
Problem-
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101
Use separate chaining technique for collision resolution.
Solution-
The given sequence of keys will be inserted in the hash table as-
Step-01:
• Draw an empty hash table.
• For the given hash function, the possible range of hash values is [0, 6].
• So, draw an empty hash table consisting of 7 buckets as-

Step-02:

• Insert the given keys in the hash table one by one.


• The first key to be inserted in the hash table = 50.
• Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
• So, key 50 will be inserted in bucket-1 of the hash table as-

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 9


Step-03:
• The next key to be inserted in the hash table = 700.
• Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
• So, key 700 will be inserted in bucket-0 of the hash table as-

Step-04:
• The next key to be inserted in the hash table = 76.
• Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
• So, key 76 will be inserted in bucket-6 of the hash table as-

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 10


Step-05:
• The next key to be inserted in the hash table = 85.
• Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
• Since bucket-1 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-1.
• So, key 85 will be inserted in bucket-1 of the hash table as-

Step-06:
• The next key to be inserted in the hash table = 92.
• Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
• Since bucket-1 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-1.
• So, key 92 will be inserted in bucket-1 of the hash table as-

Step-07:
• The next key to be inserted in the hash table = 73.
• Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
• So, key 73 will be inserted in bucket-3 of the hash table as-

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 11


Step-08:
• The next key to be inserted in the hash table = 101.
• Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
• Since bucket-3 is already occupied, so collision occurs.
• Separate chaining handles the collision by creating a linked list to bucket-3.
• So, key 101 will be inserted in bucket-3 of the hash table as-

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 12


Open Addressing-
In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the
table must be greater than or equal to the total number of keys (Note that we can increase table size
by copying old data if needed). This approach is also known as closed hashing.
• Unlikeseparate chaining, all the keys are stored inside the hash table.
• No key is stored outside the hash table.

Techniques used for open addressing are-


• LinearProbing
• Quadratic Probing
• Double Hashing

Operations in Open Addressing-


Insert Operation-
• Hash function is used to compute the hash value for a key to be inserted.
• Hash value is then used as an index to store the key in the hash table.
In case of collision,
• Probing is performed until an empty bucket is found.
• Once an empty bucket is found, the key is inserted.
• Probing is performed in accordance with the technique used for open addressing.
Search Operation-
To search any particular key,
• Itshash value is obtained using the hash function used.
• Using the hash value, that bucket of the hash table is checked.
• If the required key is found, the key is searched.
• Otherwise, the subsequent buckets are checked until the required key or an empty bucket is
found.
• The empty bucket indicates that the key is not present in the hash table.
Delete Operation-
• The key is first searched and then deleted.
• After deleting the key, that particular bucket is marked as “deleted”.

NOTE-
• During insertion, the buckets marked as “deleted” are treated like any other empty bucket.
• During searching, the search is not terminated on encountering the bucket marked as
“deleted”.
• The search terminates only after the required key or an empty bucket is found

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 13


Open Addressing Techniques-

1. Linear Probing:
In linear probing, the hash table is searched sequentially that starts from the original location of
the hash. If in case the location that we get is already occupied, then we check for the next
location.
The function used for rehashing is as follows: rehash(key) = (n+1)%table-size.
For example, The typical gap between two probes is 1 as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
…………………………………………..
…………………………………………..
Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50, 700, 76, 85, 92, 73, 101.

Challenges in Linear Probing :


• Primary Clustering: One of the problems with linear probing is Primary clustering,
many consecutive elements form groups and it starts taking time to find a free slot or
to search for an element.
• Secondary Clustering: Secondary clustering is less severe, two records only have the
same collision chain (Probe Sequence) if their initial position is the same.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 14


Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys that are
to be inserted are 50, 70, 76, 93.
• Step1: First draw the empty hash table which will have a possible range of hash values
from 0 to 4 according to the hash function provided.

Hash table

• Step 2: Now insert all the keys in the hash table one by one. The first key is 50. It will
map to slot number 0 because 50%5=0. So insert it into slot number 0.

Insert key 50 in the hash table

• Step 3: The next key is 70. It will map to slot number 0 because 70%5=0 but 50 is
already at slot number 0 so, search for the next empty slot and insert it.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 15


Insert key 70 in the hash table

• Step 4: The next key is 76. It will map to slot number 1 because 76%5=1 but 70 is
already at slot number 1 so, search for the next empty slot and insert it.

Insert key 76 in the hash table

• Step 5: The next key is 93 It will map to slot number 3 because 93%5=3, So insert it
into slot number 3.

Insert key 93 in the hash table

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 16


Linear Probing More Explanations:

• Take the above example, if we insert next item 40 in our collection, it would have a hash
value of 0 (40 % 10 = 0). But 70 also had a hash value of 0, it becomes a problem. This
problem is called as Collision or Clash. Collision creates a problem for hashing technique.
• Linear probing is used for resolving the collisions in hash table, data structures for
maintaining a collection of key-value pairs.
• Linear probing was invented by Gene Amdahl, Elaine M. McGraw and Arthur Samuel in
1954 and analyzed by Donald Knuth in 1963.
• It is a component of open addressing scheme for using a hash table to solve the dictionary
problem.
• The simplest method is called Linear Probing. Formula to compute linear probing is:

P = (1 + P) % (MOD) Table_size
For example,

If we insert next item 40 in our collection, it would have a hash value of 0 (40 % 10 = 0). But 70 also
had a hash value of 0, it becomes a problem.

Linear probing solves this problem:

P = H(40)
40 % 10 = 0
Position 0 is occupied by 70. so we look elsewhere for a position to store 40.

Using Linear Probing:


P= (P + 1) % table-size
0 + 1 % 10 = 1
But, position 1 is occupied by 31, so we look elsewhere for a position to store 40.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 17


Using linear probing, we try next position: 1 + 1 % 10 = 2
Position 2 is empty, so 40 is inserted there.

2. Quadratic Probing

If you observe carefully, then you will understand that the interval between probes will increase
proportionally to the hash value. Quadratic probing is a method with the help of which we can
solve the problem of clustering that was discussed above. This method is also known as the mid-
square method. In this method, we look for the i2‘th slot in the ith iteration. We always start from
the original hash location. If only the location is occupied then we check the other slots.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
…………………………………………..
…………………………………………..
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision
resolution strategy to be f(i) = i2 . Insert = 22, 30, and 50.
• Step 1: Create a table of size 7.

Hash table

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 18


• Step 2 – Insert 22 and 30
• Hash(22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can easily
insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can easily
insert 30 at slot 2.

Insert keys 22 and 30 in the hash table

• Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for slot 1+12,
i.e. 1+1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1+22, i.e.1+4 = 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.

Insert key 50 in the hash table

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 19


3. Double Hashing
The intervals that lie between probes are computed by another hash function. Double hashing is a
technique that reduces clustering in an optimized way. In this technique, the increments for the
probing sequence are computed by using another hash function. We use another hash function
hash2(x) and look for the i*hash2(x) slot in the ith rotation.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
…………………………………………..
…………………………………………..
Example: Insert the keys 27, 43, 92, 72 into the Hash Table of size 7. where first hash-function
is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
• Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.

Insert key 27 in the hash table

• Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.

Insert key 43 in the hash table

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 20


• Step 3: Insert 92
• 92 % 7 = 6, but location 6 is already being occupied and this is a collision
• So we need to resolve this collision using double hashing.
hnew = [h1(92) + i * (h2(92)] % 7
= [6 + 1 * (1 + 92 % 5)] % 7
=9%7
=2
Now, as 2 is an empty slot,
so we can insert 92 into 2nd slot.

Insert key 92 in the hash table

• Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already being occupied and this is a collision.
• So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * (h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
=5%7
= 5,
Now, as 5 is an empty slot,
so we can insert 72 into 5th slot.

Insert key 72 in the hash table

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 21


Comparison of the above three:

• Linear probing has the best cache performance but suffers from clustering. One more
advantage of Linear probing is easy to compute.
• Quadratic probing lies between the two in terms of cache performance and clustering.
• Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time as two hash functions need to be computed.

S.No. Separate Chaining Open Addressing


1. Chaining is Simpler to implement. Open Addressing requires more
computation.
2. In chaining, Hash table never fills up, we can In open addressing, table may become
always add more elements to chain. full.
3. Chaining is Less sensitive to the hash function Open addressing requires extra care to
or load factors. avoid clustering and load factor.
4. Chaining is mostly used when it is unknown Open addressing is used when the
how many and how frequently keys may be frequency and number of keys is known.
inserted or deleted.
5. Cache performance of chaining is not good as Open addressing provides better cache
keys are stored using linked list. performance as everything is stored in
the same table.
6. Wastage of Space (Some Parts of hash table in In Open addressing, a slot can be used
chaining are never used). even if an input doesn’t map to it.
7. Chaining uses extra space for links. No links in Open addressing

Note: Cache performance of chaining is not good because when we traverse a Linked List, we are
basically jumping from one node to another, all across the computer’s memory. For this reason, the
CPU cannot cache the nodes which aren’t visited yet, this doesn’t help us. But with Open
Addressing, data isn’t spread, so if the CPU detects that a segment of memory is constantly being
accessed, it gets cached for quick access.

Hash table implementation in c using arrays.


#include<stdio.h>
#define size 7
int arr[size];
void init()
{
int i;
for(i = 0; i < size; i++)
arr[i] = -1;
}
void insert(int value)
{
int key = value % size;
if(arr[key] == -1)

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 22


{
arr[key] = value;
printf("%d inserted at arr[%d]\n", value,key);
}
else
{
printf("Collision : arr[%d] has element %d already!\n",key,arr[key]);
printf("Unable to insert %d\n",value);
}
}
void del(int value)
{
int key = value % size;
if(arr[key] == value)
arr[key] = -1;
else
printf("%d not present in the hash table\n",value);
}
void search(int value)
{
int key = value % size;
if(arr[key] == value)
printf("Search Found\n");
else
printf("Search Not Found\n");
}
void print()
{
int i;
for(i = 0; i < size; i++)
printf("arr[%d] = %d\n",i,arr[i]);
}
int main()
{
init();
insert(10); //key = 10 % 7 ==> 3
insert(4); //key = 4 % 7 ==> 4
insert(2); //key = 2 % 7 ==> 2
insert(3); //key = 3 % 7 ==> 3 (collision)
printf("Hash table\n");
print();
printf("\n");
printf("Deleting value 10..\n");
del(10);
printf("After the deletion hash table\n");
print();
printf("\n");
printf("Deleting value 5..\n");

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 23


del(5);
printf("After the deletion hash table\n");
print();
printf("\n");
printf("Searching value 4..\n");
search(4);
printf("Searching value 10..\n");
search(10);
return 0;
}

OUTPUT:

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 24


Rehashing;
Rehashing means hashing again. Basically, when the load factor increases to more than its
predefined value (the default value of the load factor is 0.75), the complexity increases. So, to
overcome this, the size of the array is increased (doubled) and all the values are hashed again and
stored in the new double-sized array to maintain a low load factor and low complexity.
Why rehashing?
Rehashing is done because whenever key value pairs are inserted into the map, the load factor
increases, which implies that the time complexity also increases as explained above. This might not
give the required time complexity of O(1). Hence, rehash must be done, increasing the size of the
bucket Array so as to reduce the load factor and the time complexity.

Rehashing is a technique in which the table is resized, i.e., the size of table is doubled by creating a
new table. It is preferable is the total size of table is a prime number. There are situations in which
the rehashing is required.

• When table is completely full


• With quadratic probing when the table is filled half.
• When insertions fail due to overflow.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 25


In such situations, we have to transfer entries from old table to the new table by re computing their
positions using hash functions.
Consider we have to insert the elements 37, 90, 55, 22, 17, 49, and 87. the table size is 10 and
will use hash function:
H(key) = key mod tablesize
37 % 10 = 7
90 % 10= 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7
49 % 10 = 9
Now this table is almost full and if we try to insert more elements collisions will occur and eventually
further insertions will fail. Hence, we will rehash by doubling the table size. The old table size is 10
then we should double this size for new table, that becomes 20. But 20 is not a prime number, we will
prefer to make the table size as 23. And new hash function will be
H(key) key mod 23
37 % 23 = 14
90 % 23 = 21
55 % 23 = 9
22 % 23 = 22
17 % 23 = 17
49 % 23 = 3
87 % 23 = 18
Now the hash table is sufficiently large to accommodate new insertions.

Applications of Hash Data structure


• Hash is used in databases for indexing.
• Hash is used in disk-based data structures.
• In some programming languages like Python, JavaScript hash is used to implement
objects.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 26


Real-Time Applications of Hash Data structure

• Hash is used for cache mapping for fast access to the data.
• Hash can be used for password verification.
• Hash is used in cryptography as a message digest.

Advantages of Hash Data structure

• Hash provides better synchronization than other data structures.


• Hash tables are more efficient than search trees or other data structures
• Hash provides constant time for searching, insertion, and deletion operations on
average.

Disadvantages of Hash Data structure


• Hash is inefficient when there are many collisions.
• Hash collisions are practically not avoided for a large set of possible keys.
• Hash does not allow null values.

Conclusion

Hashing is to resolve the challenge of finding an item quickly in a collection. For example, if we
have a list of millions of English words and we wish to find a particular term then we would use
hashing to locate and find it more efficiently. It would be inefficient to check each item on the
millions of lists until we find a match. Hashing reduces search time by restricting the search to a
smaller set of words at the beginning.

GEORGE WAINAINA 0718313173/0731919231 georgewainaina58@gmail.com Page 27

You might also like