Hashing Powerpoint
Hashing Powerpoint
Hashing
TCSS 342
1/51
2/51
Examples
Telephone directory
Library catalogue
Books in print: key ISBN
FAT (File Allocation Table)
3/51
Main Issues
Size
Operations: search, insert, delete, ??? Create
reports??? List?
What will be stored in the dictionary?
How will be items identified?
4/51
query methods:
findElement(k)
findAllElements(k)
5/51
special element
NO_SUCH_KEY, returned by an unsuccessful
search
6/51
Implementing a Dictionary
with a Sequence
unordered sequence
searching and removing takes O(n) time
inserting takes O(1) time
applications to log files (frequent insertions,
rare searches and removals) 34 14 12 22 18
34
14
12
22
18
7/51
Implementing a Dictionary
with a Sequence
array-based ordered sequence
(assumes keys can be ordered)
- searching takes O(log n) time (binary search)
- inserting and removing takes O(n) time
- application to look-up tables
(frequent searches, rare insertions and removals)
12
14
18
22
34
8/51
Binary Search
narrow down the search range in stages
high-low game
findElement(22)
2
low
9 12 14 17 19 22 25 27 28 33 37
mid
high
9/51
Binary Search
2
9 12 14 17 19 22 25 27 28 33 37
low
mid
high
9 12 14 17 19 22 25 27 28 33 37
9 12 14 17 19 22 25 27 28 33 37
low = mid = high
10/51
Search Range
n/2
2i
n/2i
log 2n
1
12/51
13/51
Implementations
Sorted? Non Sorted?
Elementary: Arrays, vectors linked lists
Orgainization: None (log file), Sorted, Hashed
14/51
Skip Lists
Simulate Binary Search on a linked list.
Linked list allows easy insertion and
deletion.
http://www.epaperpress.com/s_man.html
15/51
A FAT Example
Directory: Key: file name. Data (time, date, size
) location of first block in the FAT table.
If first block is in physical location #23 (Disk
block number) look up position #23 in the FAT.
Either shows end of file or has the block number
on disk.
Example: Directory entry: block # 4
FAT: x x x F 5 6 10 x 23 25
3
The file occupies blocks 4,5,6,10, 3.
16/51
Hashing
Place item with key k in position h(k).
Hope: h(k) is 1-1.
Requires: unique key (unless multiple items
allowed). Key must be protected from
change (use abstract class that provides only
a constructor).
Keys must be comparable.
17/51
Key class
}
18/51
Hashing Problem
RT&T is a large phone company, and they
want to provide enhanced caller ID
capability:
Hashing Problem
We know two ways to design this dictionary:
abalanced search tree (AVL, red-black) or a skiplist with the phone number as the key has O(log n)
query time and O(n) space --- good space usage
and search time, but can we reduce the search time
to constant?
abucket array indexed by the phone number has
optimal O(1) query time, but there is a huge
amount of wasted space: O(n + R)
20/51
Bucket Array
Each cell is thought of as a bucket or a container
Holds key element pairs
In array A of size N, an element e with key k is inserted
in A[k].
(null)
(null)
000-000-0000 000-000-0001
Roberto
401-863-7639 ...
(null)
999-999-9999
21/51
Generalized indexing
Hash table
Data storage associated with a key
The key need not be an integer
22/51
Hash Tables
A data structure
The location of an item is determined
directly as a function of the item itself
Not by a sequence of trial and error comparisons
Example:
A symbol table constructed by a
compiler
Stores identifiers and information about
them
24/51
Another Solution
A Hash Table is an alternative solution with
O(1) expected query time and O(n + N)
space, where N is the size of the table
Like an array, but with a function to map
the large range of keys into a smaller one
e.g., take the original key, mod the size of the
table, and use that as an index
25/51
Example
Insert item (401-863-7639, Roberto) into a table of
size 5
4018637639 mod 5 = 4, so item (401-863-7639,
Roberto) is stored in slot 4 of the table
A lookup uses the same process: map the key to an
index, then check the array cell at that index
401863-7639
Roberto
26/51
Collision
Insert (401-863-9350, Andy)
And insert (401-863-2234, Devin). We have
a collision!
27/51
Collision Resolution
How to deal with two keys which map to
the same cell of the array?
Use chaining
Set up lists of items with the same index
28/51
Chaining
0
1
2
3
4
29/51
Chaining
The expected, search/insertion/removal time
is O(n/N), provided the indices are
uniformly distributed
The performance of the data structure can
be fine-tuned by changing the table size N
30/51
Hash Function
Function h defined by h(i) = i
Determines the location of an item i in the
hash table
Java Hash
Java provides a hashCode() method for the
Object class, which typically returns the 32bit memory address of the object.
This default hash code would work poorly
for Integer and String objects
The hashCode() method should be suitably
redefined by classes.
33/51
34/51
35/51
Random Hashing
Random hashing
Uses a simple random number generation
technique
Scatters the items randomly throughout
the hash table
37/51
More on Collisions
A key is mapped to an already occupied
table location
what to do?!?
Linear Probing
If the current location is used, try the next
table location
linear_probing_insert(K)
if (table is full) error
probe = h(K)
while (table[probe] occupied)
probe = (probe + 1) mod M
table[probe] = K
40/51
Linear Probing
Lookups walk along table until the key or an
empty slot is found
Uses less memory than chaining
dont have to store all those links
41
0
10 11 12
18 44 59 32 22 31 72
3
10 11 12
42/51
Double Hashing
Use two hash functions
If M is prime, eventually will examine every
position in the table
double_hash_insert(K)
if(table is full) error
probe = h1(K)
offset = h2(K)
while (table[probe] occupied)
probe = (probe + offset) mod M
table[probe] = K
43/51
Double Hashing
Many of same (dis)advantages as linear
probing
Distributes keys more uniformly than linear
probing does
44/51
44
0
41 73
1
10 11 12
18 32 53 31 22
4
10 11 12
45/51
Hash code
static int hashCode(long i) {
return (int)((i >> 32) + (int) i);}
46/51
Hash code
static int hashCode(String s) {
int h=0;
for (int i=0; i<s.length(); i++) {
h = (h << 5) | (h >>> 27);
// 5-bit cyclic shift of the running sum
h += (int) s.charAt(i); // add in next
character }
return h;
}
47/51
49/51
51/51
53/51
54/51
Dictionary
// methods of the dictionary ADT
public Object findElement (Object key) throws
InvalidKeyException {
int i = findItem(key); // helper method for finding a key
if (i < 0)
return Dictionary.NO_SUCH_KEY;
return element(i);
}
55/51
Dictionary
public void insertItem (Object key, Object element)
throws InvalidKeyException {
check(key);
int i = h.
hashValue(key) % N; // division method compression
map
int j = i;
56/51
Dictionary
// remember where we are starting
do {
if (empty(i) || available(i)) { // this slot is available
A[i] = new Item(key, element);
n++;
return;
}
i = (i + 1) % N; // check next slot
} while (i != j); // repeat until we return to start
throw new HashTableFullException("Hash table is
full."); }
57/51
Dictionary
public Object removeElement (Object key) throws
InvalidKeyException {
int i = findItem(key); // find this key first
if (i <0)
return Dictionary.NO_SUCH_KEY;
// nothing to remove
Object toReturn = element(i);
A[i] = AVAILABLE; // mark this slot as deactivated
n--;
return toReturn;
}
58/51