Hashing Explained
Hashing Explained
Hashing Explained
Mr O’Connor
Consider this scenario…
You own a large retail website which sells
various electronic equipment to customers.
You have millions of customers, all with their
own accounts and orders.
Ideally, a hash function should only return values that are between the
start index and end index of the table. Our table runs from index 0-8, so
we need a function which will return the values 0-8.
Our hash function should take the CustomerID as an input, and Mod
that value by 9, therefore guaranteeing a result that is between 0 and
8.
Index Data
0
1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
4
5
6
7
8
Example: Adding another customer to the table
Let’s add another customer:
{107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Index Data
To find where this customer should be stored in
the hash table, we will pass its Customer ID (107) 0
into the hash function 1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
107 Mod 9 = 8
4
The function is telling us that this customer must 5
be stored at index 8 6
7
8
Example: Adding a customer to the table
Our table now looks like this…
Index Data
0
1
2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
4
5
6
7
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
What’s the point of all this?
The benefit of storing data this way is that it makes searching for data
instantaneous. In a normal table, if I wanted to search for customer ID 102, I would
either have to perform linear search or a binary search (if the data is in order)
Index Data
If we insert this customer into the hash table:
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
{111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
111 Mod 9 = 3 2
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
The function is telling us that this customer 4
must be stored at index 3, but someone is 5
already in row 3. A collision has occurred 6
7
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Collisions
Collisions occur when two or more keys hash to the same hash
value
This is a big problem with hash tables and there are two possible
collision handling methods to solve it:
• Open hashing
• Closed hashing
When we decide to use a collision handling method, our hash
table is no longer O(1) for searching
Open Hashing
Open Hashing would ‘solve’ the issue by putting the new customer in
the “next available location” from the hash value
{111, ‘Charlotte Dunst’, 52, ’14 Hill Road’} Index Data
needed to go in index 3, but something else is 0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
already there.
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
2
Therefore, the next available location from
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
here is index 4. It will be stored there.
4 {111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
Annoying isn’t it? 5
6
When you search for 111 now and you see 7
102 instead, your search is inconclusive and 8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
you need to keep looking. Hence, it is no
longer O(1)
Open Hashing
• To make matters worse, imagine I wanted to delete customer ID 102
Let’s say I removed {102, ‘Joe Bloggs’, 43, ‘12 Index Data
Cleo Lane’}. 0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’}
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’}
I would have to move customer ID 111 up to
2
index 3 (where it should belong), and I would
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’}
also have to see if anything else that hashed
4 {111, ‘Charlotte Dunst’, 52, ’14 Hill Road’}
to index 3 needs to be moved up to where ID
111 used to be. 5 {120, ‘Bob Bobbins’, 48, ‘22 Bobby Avenue‘}
6
This could take a long time and is painfully 7
irritating to code 8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Closed Hashing
Closed Hashing operates differently, it would also ‘solve’ the issue, but
this way would store a pointer value which indicates a separate area of
memory where the colliding record is stored Separate memory
Index Data Pointer
In this Index Data Pointer
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’} Null
implementation, 1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’} Null
… … Null
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’} 104 104 {111, ‘Charlotte Dunst’, 105
52, ’14 Hill Road’}
4
5 105 {120, ‘Bob Bobbins’, 48, Null
6 ‘22 Bobby Avenue‘}
7 …
8 {107, ‘Lucy Smith’, 33, ’92a Strawberry Fields’}
Closed Hashing
Deleting in closed hashing is simpler than open hashing, I can merely change the
pointer to values to ignore the customer that has been removed.
If I wanted to remove customer ID 111, I would simply change 102’s pointer to
what 111’s pointer used to be.
Separate memory
Index Data Pointer
Index Data Pointer
0 {90, ‘Rob Jones’, 68, ‘2 Ford Road’} Null
1 {82, ‘Anne Swarbrick’, 29, ‘60 Warwick Avenue’} Null … … Null
2 Null 103 … Null
3 {102, ‘Joe Bloggs’, 43, ‘12 Cleo Lane’} 104 105
104 {111, ‘Charlotte Dunst’, 105
4 52, ’14 Hill Road’}
5 105 {120, ‘Bob Bobbins’, 48, Null
‘22 Bobby Avenue‘}
6
7 …