Hashing
Hashing
Hashing
Folding
The folding method involves dividing the key into equal parts, summing the parts,
and then taking the modulo with respect to 𝑚m.
Steps:
Divide the key into parts.
Sum the parts.
Take the modulo 𝑚m of the sum.
Simple hashes
• It's possible to have very simple hash
functions if you are certain of your keys
• For example,
› suppose we know that the keys s will be real
numbers uniformly distributed over 0 s < 1
› Then a very fast, very good hash function is
• hash(s) = floor(s·m)
• where m is the size of the table
very simple mapping
• hash(s) = floor(s·m) maps from 0 s < 1 to 0..m-1
› m = 10
s 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
floor(s*m) 0 1 2 3 4 5 6 7 8 9
Note the even distribution. There are collisions, but we will deal with them later.
Perfect hashing
• In some cases it's possible to map a known
set of keys uniquely to a set of index values
• You must know every single key beforehand
and be able to derive a function that works
one-to-one (not necessarily onto)
s 120 331 912 74 665 47 888 219
hash(s) 0 1 2 3 4 5 6 7 8 9
modulo mapping
• a mod m maps from integers to 0..m-1
› one to one? no
› onto? yes
x -4 -3 -2 -1 0 1 2 3 4 5 6 7
x mod 4 0 1 2 3 0 1 2 3 0 1 2 3
Hash function : mod
• If keys are integers, we can use the hash
function:
› Hash(key) = key mod TableSize
• Problem 1: What if TableSize is 11 and all
keys are 2 repeated digits? (eg, 22, 33, …)
› all keys map to the same index
› Need to pick TableSize carefully: often, a prime
number
Hash Function : add chars
• If keys are strings can get an integer by adding up ASCII
values of characters in key
hashValue = 0;
while (*key != ‘\0’)
hashValue += *key++;
character C S E 3 7 3 <0>
ASCII value 67 83 69 32 51 55 51 0
1. Open Addessing
2. Chaining
Open Addressing
• All the keys are kept inside the hash table, unlike separate
chaining.
• The hash table contains the only key information.
• The methods for open addressing are as follows:
• Linear Probing
• Quadratic Probing
• Double Hashing
Linear Probing
• If slot hash (x) % S is full, then
we try ( hash (x) + 1 ) % S
• If ( hash (x) + 1 ) % S is also
full, then we try ( hash (x) + 2)
%S
• If ( hash (x) + 2 ) % S is also
full, then we try ( hash (x) + 3 )
%S
Linear probing -- clustering
no collision
collision in small cluster
no collision
no collision
collision, try again
no collision at h1(x)+h2(x)
[R. Sedgewick]
Rehashing - rebuild the table
• Rehashing is the process of
increasing the size of a
hashmap and redistributing the
elements to new buckets based
on their new hash values. It is
done to improve the
performance of the hashmap
and to prevent collisions caused
by a high load factor.
• Rehashing is needed in a
hashmap to prevent collision
and to maintain the efficiency
of the data structure.
Load Factor
• The load factor is a measure that decides when to increase the
HashMap or Hashtable capacity to maintain the get() and put()
operation of complexity O(1). The default value of the load factor of
HashMap is 0.75 (75% of the map size). In short, we can say that the
load factor decides "when to increase the number of buckets to store
the key-value pair."