Hashing Explained

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 20

Hashing Explained

Mr O’Connor
Consider this scenario…
You own a large retail website which sells
various electronic equipment to customers.
You have millions of customers, all with their
own accounts and orders.

The task of finding a customer’s information as


become a pain, because you have so many
customer accounts to search through.

You decide to change the way you store


customers, and use a hash table
Creating a Hash Table…
You need to make a table which is big enough to store all of your
customers. In reality, the table would have over a million rows but for
the sake of explaining how hash tables work we will reduce this to 9
rows Index Data
0
1
You can very easily imagine that 2
large retail companies like Amazon 3
and eBay will have the index go into 4
the millions 5
6
7
8
Creating a Hash Function…
Now we have our table, we want to decide on a hash function which
will decide where our data should be stored in the hash table.

Ideally, a hash function should only return values that are between the
start index and end index of the table. Our table runs from index 0-8, so
we need a function which will return the values 0-8.

A customer’s data will typically be structured like this:


Customer(CustomerID, CustomerName, Age, Address)
Creating a Hash Function…
A customer’s data will typically be structured like this:
Customer(CustomerID, CustomerName, Age, Address)

Our hash function should take the CustomerID as an input, and Mod
that value by 9, therefore guaranteeing a result that is between 0 and
8.

Our hash function:


H(k) = CustomerID Mod 9
Example: Adding a customer to the table
Now we know the function. Let’s imagine we have a new customer
with the following details:
{102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
Index Data
To find where this customer should be stored in
the hash table, we will pass its Customer ID (102) 0
into the hash function 1
2
3
102 Mod 9 = 3
4
The function is telling us that this customer must 5
be stored at index 3 6
7
8
Example: Adding a customer to the table
Our table now looks like this…

Index Data
0
1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
4
5
6
7
8
Example: Adding another customer to the table
Let’s add another customer:
{107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}

Index Data
To find where this customer should be stored in
the hash table, we will pass its Customer ID (107) 0
into the hash function 1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
107 Mod 9 = 8
4
The function is telling us that this customer must 5
be stored at index 8 6
7
8
Example: Adding a customer to the table
Our table now looks like this…

Index Data
0
1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
4
5
6
7
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
What’s the point of all this?
The benefit of storing data this way is that it makes searching for data
instantaneous. In a normal table, if I wanted to search for customer ID 102, I would
either have to perform linear search or a binary search (if the data is in order)

With hashing, all I have to do is input the Customer Index Data


ID into the hash function, and it will tell me where 0
that person is in the table. 1
2
102 Mod 9 = 3
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
The function is telling us that the customer is 4
contained in index 3 5
6
This now makes searching for data O(1) rather 7
than O(n) or O(log n) 8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
What’s the point of all this?
But what if the person you are searching for doesn’t exist? Well, theoretically, it is
still O(1). Example below

If I wanted to search for Customer ID 103… Index Data


103 Mod 9 = 4 0
1
The function is telling us that the customer is 2
contained in index 4 3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
4
But there is nothing in index 4, therefore they 5
must not be in the table at all, still O(1) 6
7
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Problems with Hashing
Using our function we can add more and more to the table, but a
problem will eventually arise…

Index Data
If we insert this customer into the hash table:
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
{111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
111 Mod 9 = 3 2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
The function is telling us that this customer 4
must be stored at index 3, but someone is 5
already in row 3. A collision has occurred 6
7
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Collisions
Collisions occur when two or more keys hash to the same hash
value

This is a big problem with hash tables and there are two possible
collision handling methods to solve it:
• Open hashing
• Closed hashing
When we decide to use a collision handling method, our hash
table is no longer O(1) for searching 
Open Hashing
Open Hashing would ‘solve’ the issue by putting the new customer in
the “next available location” from the hash value
{111, ‘Charlotte Dunst’, 52, ’14 Hill Road’} Index Data
needed to go in index 3, but something else is 0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
already there.
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
2
Therefore, the next available location from
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
here is index 4. It will be stored there.
4 {111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
Annoying isn’t it? 5
6
When you search for 111 now and you see 7
102 instead, your search is inconclusive and 8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
you need to keep looking. Hence, it is no
longer O(1)
Open Hashing
• To make matters worse, imagine I wanted to delete customer ID 102

Let’s say I removed {102, ‘Joe Bloggs’, 43, ‘12 Index Data
Cleo Lane’}. 0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
I would have to move customer ID 111 up to
2
index 3 (where it should belong), and I would
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
also have to see if anything else that hashed
4 {111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
to index 3 needs to be moved up to where ID
111 used to be. 5 {120, ‘Bob Bobbins’, 48, ‘22 Bobby Avenue‘}
6
This could take a long time and is painfully 7
irritating to code 8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Closed Hashing
Closed Hashing operates differently, it would also ‘solve’ the issue, but
this way would store a pointer value which indicates a separate area of
memory where the colliding record is stored Separate memory
Index Data Pointer
In this Index Data Pointer
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’} Null
implementation, 1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’} Null
… … Null

{111, ‘Charlotte 2 Null 103 … Null


Dunst’, 52, ’14 Hill 3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’} 104 104 {111, ‘Charlotte Dunst’, Null
52, ’14 Hill Road’}
Road’} will be stored 4
in a separate place in 5 105 … Null
memory, and a 6

7
pointer reference will
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
be stored to
remember where it is
Closed Hashing
This way, colliding records can form a chain, so when you search for customer ID
111 and the hash function takes you to record 3, you can see that someone else is
there and you need to follow the pointer chain until you find them (they exist) or
you reach the end (they don’t exist)
Separate memory
Index Data Pointer
Index Data Pointer
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’} Null
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’} Null … … Null

2 Null 103 … Null

3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’} 104 104 {111, ‘Charlotte Dunst’, 105
52, ’14 Hill Road’}
4
5 105 {120, ‘Bob Bobbins’, 48, Null
6 ‘22 Bobby Avenue‘}

7 …
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Closed Hashing
Deleting in closed hashing is simpler than open hashing, I can merely change the
pointer to values to ignore the customer that has been removed.
If I wanted to remove customer ID 111, I would simply change 102’s pointer to
what 111’s pointer used to be.
Separate memory
Index Data Pointer
Index Data Pointer
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’} Null
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’} Null … … Null
2 Null 103 … Null
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’} 104 105
104 {111, ‘Charlotte Dunst’, 105
4 52, ’14 Hill Road’}
5 105 {120, ‘Bob Bobbins’, 48, Null
‘22 Bobby Avenue‘}
6
7 …

8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}

Note that Index 104 still contains Customer ID 111, but


nobody points to it, so theoretically it doesn’t exist (this
is how Recycle Bins work)
Things to remember
• Hashing is theoretically O(1) for adding, deleting and searching (but
only with a perfect hash function)
• But, collisions (hashing to the same record) causes this to be false,
and in reality searching and adding using hash tables is O(n)
• To deal with collisions you can use one of the collision handling
methods: Open Hashing or Closed Hashing
• If your hash function is resulting in too many collisions, you can
perform Re-Hashing (next slide)
Re-Hashing
Re-Hashing involves:
• Making a bigger table
• Creating a new hash function for this bigger table
• Going through all of the data in the old table, passing it
through the new function and placing it in its new place in
the new table

If that doesn’t make sense then Google it! 

You might also like