Chapter 5 - Hash Tables
Chapter 5 - Hash Tables
If the key is contained within the record at a specific offset relative to the start of the
record, it is known as internal or embedded key. If the keys are contained in a
separate table with pointers to the associated data, the key is classified as external
key.
Aside from searching, several other operations can be done on a table. The following
is a list of the possible operations:
Searching for the record where Ki = K, where K is given by the user.
Insertion.
Deletion.
Search the record with smallest (largest) key.
Given key Ki, find the record with the next larger (smaller) key.
If the key space is fixed, say m, where m is not so large, then the table can simply be
implemented as an array of m cells. With this, every key in the set is assigned a slot in the
table. If the key is the same as the index in the array, it is known as direct-address table.
A search algorithm accepts an argument and tries to find a record whose key is
equal to the one specified. If the search is successful, a pointer to the record is
returned; otherwise, a NULL pointer is returned back. Retrieval occurs when the
search is successful. This section discusses the ways to organize a table, as well as the
search operations on the different table organizations.
There are two general ways to organize a table: ordered and unordered. In an
ordered table, the elements are sorted based on their keys. Reference to the first
element, second element, and so on becomes possible. In an unordered table, there
is no relation presumed to exist among the records and their associated keys.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 8
SEQUENTIAL SEARCH IN AN
UNORDERED TABLE
Given: A table of records R0, R1, ..., Rn-1 with keys K0, K1, ... Kn-1 respectively, where n ≥
0. Search for a value K:
− Step 1: Initialize: set i = 0
− Step 2: Compare: if K = Ki, stop - search is successful
− Step 3:Advance: Increase i by 1
− Step 4: End of File?: If i < n, go to Step 2. else stop: Search is unsuccessful
▪ Binary search begins with an interval covering the whole table, that is the middle value. If the search value is less
than the item in the middle of the interval, the interval is narrowed down to the lower half. Otherwise, it is
narrowed down to the upper half. This process of reducing the search size by half is repeatedly performed until
the value is found or the interval is empty. The algorithm for the binary search makes use of the following
relations in searching for the key K:
• K = Ki : stop, the desired record is found
• K < Ki : search the lower half - records with keys K1 to Ki-1
• K > Ki : search the upper half - records with keys Ki+1 to Kn
Binary search begins with an interval covering the whole table, that is the middle value. If the
search value is less than the item in the middle of the interval, the interval is narrowed down
to the lower half. Otherwise, it is narrowed down to the upper half. This process of reducing
the search size by half is repeatedly performed until the value is found or the interval is
empty. The algorithm for the binary search makes use of the following relations in searching
for the key K:
− K = Ki : stop, the desired record is found
− K < Ki : search the lower half - records with keys K1 to Ki-1
− K > Ki : search the upper half - records with keys Ki+1 to Kn
where i is initially the middle index val
Multiplicative binary search is similar to the previous algorithm but avoids the
division needed to find the middle key. To do that, there is a need to rearrange the
records in the table:
− Assign keys K1 < K2 < K3 < …< Kn to the nodes of a complete binary tree in inorder sequence.
− Arrange the records in the table according to the corresponding level-order sequence in the binary
tree.
Ki is initially chosen such that j = Fi, where Fi is the largest Fibonacci number less
than or equal to the table size n.
It is an assumption that the table is of size n= Fi+1 -1.
0 1 1 2 3 5 8 13 F
0 1 2 3 4 5 6 7 i
i = 5; Fi = 5; (Assumption) table size = Fi+1 - 1 = 7
j = 5, p = 3, q = 2: k < key[j]
j = 3, p = 2, q = 1: k < key[j]
j = 2, p = 1, q = 1: k = key[j] Successful
CHAPTER 5: HASH TABLES Monday, 20 February 2023 20
HASHING
There are different ways of folding. A key value can be folded in half. This is ideal for
relatively small key values since they would easily fit in the available addresses. If in
any case, the key would be unevenly split, the left fold must be greater than the right
fold. A key value can also be folded in thirds this is ideal for somewhat large key
values. We can also fold alternate digits. The digits in the odd positions form one
part, and the digits in the even positions form another. Folding in half and in thirds
can be done in yet another two ways. One is boundary folding where some parts
of the folded keys are reversed (imitating the way we fold paper) and then summed.
Last is shift folding where no parts of the folded keys are reversed.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 23
SIMPLE HASH TECHNIQUES: FOLDING
CONT…
The following are some examples of shift folding: The following are some examples of boundary folding:
▪ Even digits, folding in half ▪ Even digits, folding in half
125758 => 125+758 => 883 125758 => 125+857 => 982
▪ Folding in thirds ▪ Folding in thirds
125758 => 12+57+58 => 127 125758 => 21+57+85 => 163
▪ Odd digits , folding in half ▪ Odd digits , folding in half
7453212 => 7453+212 => 7665 7453212 => 7453+212 => 7665
▪ Different digits, folding in thirds ▪ Different digits, folding in thirds
74532123 => 745+32+123 => 900 74532123 => 547+32+321 => 900
▪ Using XOR, folding in half ▪ Using XOR, folding in half
100101110 => 10010+1110 => 11100 100100110 => 10010+0110 => 10100
▪ Alternate digits ▪ Alternate digits
125758 => 155+278 => 433 125758 => 155+872 => 1027
CHAPTER 5: HASH TABLES Monday, 20 February 2023 24
COLLISION RESOLUTION TECHNIQUES:
CHAINING
For keys 125, 845, 444, 256, 345, 745, 902, 569, 254, 382,
234, 431, 947, 981, 792, 459 and 725, storage is pretty
straightforward – no overflow happened.
Hashing maps the possible range of key values into a smaller range of relative
addresses.
Simple hash techniques include the prime number division method and folding.
Chaining makes use of m linked lists, one for each possible address in the hash table.
In the use of buckets method, collision happens when a bucket overflows.
Linear probing and double hashing are two open addressing techniques.