0% found this document useful (0 votes)
47 views35 pages

Chapter 5 - Hash Tables

Uploaded by

Fenan Zeinu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
47 views35 pages

Chapter 5 - Hash Tables

Uploaded by

Fenan Zeinu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 35

BIOMEDICAL ENGINEERING

DATA STRUCTURES AND ALGORITHMS


CHAPTER 5
HASH TABLES
LEARNING OBJECTIVES

At the end of this chapter, you will be able to:


▪ Discuss the basic concepts and definitions on tables: keys, operations and implementation.
▪ Explain table organization – unordered and ordered.
▪ Perform searching on a table using sequential, indexed sequential, binary and Fibonacci
search.
▪ Define hashing and explain how hashing works.
▪ Implement simple hashing techniques.
▪ Discuss how collisions are avoided/minimized by using collision resolution techniques.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 3
DEFINITIONS AND RELATED CONCEPTS

 Table is defined as a group of elements, each of


which is called a record. Each record has a unique
key associated with it that is used to distinguish
records. A table is also referred to as a file. The
following figure illustrates a table.
 In the table, n records are stored. Ki is the key at
position i, while Xi is the associated data. The
notation used for a record is (Ki, Xi).

CHAPTER 5: HASH TABLES Monday, 20 February 2023 4


TYPES OF KEYS

 If the key is contained within the record at a specific offset relative to the start of the
record, it is known as internal or embedded key. If the keys are contained in a
separate table with pointers to the associated data, the key is classified as external
key.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 5


OPERATIONS

Aside from searching, several other operations can be done on a table. The following
is a list of the possible operations:
 Searching for the record where Ki = K, where K is given by the user.
 Insertion.
 Deletion.
 Search the record with smallest (largest) key.
 Given key Ki, find the record with the next larger (smaller) key.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 6


IMPLEMENTATION

 A table could be implemented using sequential allocation, linked allocation or a combination


of both. In implementing the ADT tree, there are several factors to consider:
− Size of the key space Uk, i.e., the number of possible keys
− Nature of the table: dynamic or static
− Type and mix of operations performed on the table

 If the key space is fixed, say m, where m is not so large, then the table can simply be
implemented as an array of m cells. With this, every key in the set is assigned a slot in the
table. If the key is the same as the index in the array, it is known as direct-address table.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 7


TABLES AND SEARCHING

 A search algorithm accepts an argument and tries to find a record whose key is
equal to the one specified. If the search is successful, a pointer to the record is
returned; otherwise, a NULL pointer is returned back. Retrieval occurs when the
search is successful. This section discusses the ways to organize a table, as well as the
search operations on the different table organizations.
 There are two general ways to organize a table: ordered and unordered. In an
ordered table, the elements are sorted based on their keys. Reference to the first
element, second element, and so on becomes possible. In an unordered table, there
is no relation presumed to exist among the records and their associated keys.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 8
SEQUENTIAL SEARCH IN AN
UNORDERED TABLE

 Sequential search reads each record


sequentially from the beginning until the
record or records being sought are found. It
is applicable to a table that is organized
either as an array or as a linked list. This
search is also known as linear search.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 9


LINEAR SEARCH ALGORITHM

 Given: A table of records R0, R1, ..., Rn-1 with keys K0, K1, ... Kn-1 respectively, where n ≥
0. Search for a value K:
− Step 1: Initialize: set i = 0
− Step 2: Compare: if K = Ki, stop - search is successful
− Step 3:Advance: Increase i by 1
− Step 4: End of File?: If i < n, go to Step 2. else stop: Search is unsuccessful

CHAPTER 5: HASH TABLES Monday, 20 February 2023 10


BINARY SEARCH IN AN ORDERED TABLE

▪ Binary search begins with an interval covering the whole table, that is the middle value. If the search value is less
than the item in the middle of the interval, the interval is narrowed down to the lower half. Otherwise, it is
narrowed down to the upper half. This process of reducing the search size by half is repeatedly performed until
the value is found or the interval is empty. The algorithm for the binary search makes use of the following
relations in searching for the key K:
• K = Ki : stop, the desired record is found
• K < Ki : search the lower half - records with keys K1 to Ki-1
• K > Ki : search the upper half - records with keys Ki+1 to Kn

 where i is initially the middle index value.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 11


BINARY SEARCH IN AN ORDERED TABLE

 Binary search begins with an interval covering the whole table, that is the middle value. If the
search value is less than the item in the middle of the interval, the interval is narrowed down
to the lower half. Otherwise, it is narrowed down to the upper half. This process of reducing
the search size by half is repeatedly performed until the value is found or the interval is
empty. The algorithm for the binary search makes use of the following relations in searching
for the key K:
− K = Ki : stop, the desired record is found
− K < Ki : search the lower half - records with keys K1 to Ki-1
− K > Ki : search the upper half - records with keys Ki+1 to Kn
where i is initially the middle index val

CHAPTER 5: HASH TABLES Monday, 20 February 2023 12


BINARY SEARCH ALGORITHM

int binarySearch(int k,Table t){


int lower = 0;
int upper = t.size - 1;
int middle;
while (lower < upper){
middle = (int) Math.floor((lower + upper) / 2);
if (k == t.key[middle]) return middle;
else if (k > t.key[middle]) lower = middle + 1;
else upper = upper - 1;
}
return(-1);
}
CHAPTER 5: HASH TABLES Monday, 20 February 2023 13
EXAMPLE: BINARY SEARCH

 Search for the key k=34567 given the following table:

lower = 0, upper = 6, middle = 3: k < kmiddle (45678)


lower = 0, upper = 2, middle = 1: k > kmiddle (23456)
lower = 2, upper = 2, middle = 2: k = kmiddle (34567) ==> successful search
 Since the search area is reduced logarithmically, i.e., halved each time the size is reduced, the
time complexity of the algorithm is O(log2n). The algorithm may be used if the table uses
indexed sequential organization. However, it can only be used with ordered tables stored as
an array.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 14
MULTIPLICATIVE BINARY SEARCH
IN AN ORDERED TABLE

 Multiplicative binary search is similar to the previous algorithm but avoids the
division needed to find the middle key. To do that, there is a need to rearrange the
records in the table:
− Assign keys K1 < K2 < K3 < …< Kn to the nodes of a complete binary tree in inorder sequence.
− Arrange the records in the table according to the corresponding level-order sequence in the binary
tree.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 15


MULTIPLICATIVE BINARY SEARCH
ALGORITHM

int multiplicativeBinarySearch(int k, int key[]){


int i = 0;
while (i < key.length){
if (k == key[i]) return i; /* successful search */
else if (k < key[i]) i = 2 * i + 1; /* go left */
else i = 2 * i + 2; /* go right */
}
return -1; /* unsuccessful search */
}
CHAPTER 5: HASH TABLES Monday, 20 February 2023 16
FIBONACCI SEARCH IN
AN ORDERED TABLE

 Fibonacci search uses the simple


properties of the Fibonacci number Fibonacci Sequence Algorithm:
sequence defined by the following
recurrence relation: int Fibonacci(int i){
− F0 = 0 if (i == 0) return 0;
else if (i == 1) return 1;
− F1 = 1
else return Fibonacci(i-1) + Fibonacci(i-2);
− Fj = Fi-2 + Fi-1 , i ≥ 2
}
 That is, the sequence 0, 1, 1, 2, 3, 5, 8,
13, 21, ...
CHAPTER 5: HASH TABLES Monday, 20 February 2023 17
FIBONACCI SEARCH ALGORITHM

 In the search algorithm, two auxiliary variables, p and q, are used:


− p = Fi-1
− q = Fi-2

 Ki is initially chosen such that j = Fi, where Fi is the largest Fibonacci number less
than or equal to the table size n.
 It is an assumption that the table is of size n= Fi+1 -1.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 18


FIBONACCI SEARCH ALGORITHM
CONT…

Three states of comparison are possible:


 If K = Ki, stop: successful search
 If K < Kj
− Discard all the keys with indexes greater than j.
− Set j = j – q
− Shift p and q one place to the left of the number sequence
 If K > Kj,
− Discard all the keys with indexes less than j
− Set j = j + q
− Shift p and q two places to the left of the number sequence
 K < Kj and q=0 or K > Kj and p=1: search unsuccessful
This algorithm finds the element from index 1 to n, and since indexing in Java starts at 0, there
is a need to handle the case where k = key[0].

CHAPTER 5: HASH TABLES Monday, 20 February 2023 19


EXAMPLE: FIBONACCI SEARCH

 Search for key k = 34567

0 1 1 2 3 5 8 13 F
0 1 2 3 4 5 6 7 i
 i = 5; Fi = 5; (Assumption) table size = Fi+1 - 1 = 7
 j = 5, p = 3, q = 2: k < key[j]
 j = 3, p = 2, q = 1: k < key[j]
 j = 2, p = 1, q = 1: k = key[j] Successful
CHAPTER 5: HASH TABLES Monday, 20 February 2023 20
HASHING

 Hashing is the application of a mathematical function


(called a hash function) to the key values that results
in mapping the possible range of key values into a
smaller range of relative addresses. The hash function
is something like a locked box that needs a key to get
the output which, in this case, is the address where
the key is stored.
 In hashing, there is a need to choose a good hash
function and consequently, select a method to
resolve, if not eliminate, collisions. A good hash
function performs fast computation, in time
complexity of O(1), and produces less (or no)
collisions.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 21


SIMPLE HASH TECHNIQUES: PRIME NUMBER
DIVISION METHOD

 This method is one of the most For example, let n = 13:


common randomizing scheme. If
the key value is divided by a
number n, the range of address
generated will be of the form 0 to
n-1.The formula is:
− h(k) = k mod n
− where k is the integer key value and n
is a prime number

CHAPTER 5: HASH TABLES Monday, 20 February 2023 22


SIMPLE HASH TECHNIQUES: FOLDING

 There are different ways of folding. A key value can be folded in half. This is ideal for
relatively small key values since they would easily fit in the available addresses. If in
any case, the key would be unevenly split, the left fold must be greater than the right
fold. A key value can also be folded in thirds this is ideal for somewhat large key
values. We can also fold alternate digits. The digits in the odd positions form one
part, and the digits in the even positions form another. Folding in half and in thirds
can be done in yet another two ways. One is boundary folding where some parts
of the folded keys are reversed (imitating the way we fold paper) and then summed.
Last is shift folding where no parts of the folded keys are reversed.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 23
SIMPLE HASH TECHNIQUES: FOLDING
CONT…
The following are some examples of shift folding: The following are some examples of boundary folding:
▪ Even digits, folding in half ▪ Even digits, folding in half
125758 => 125+758 => 883 125758 => 125+857 => 982
▪ Folding in thirds ▪ Folding in thirds
125758 => 12+57+58 => 127 125758 => 21+57+85 => 163
▪ Odd digits , folding in half ▪ Odd digits , folding in half
7453212 => 7453+212 => 7665 7453212 => 7453+212 => 7665
▪ Different digits, folding in thirds ▪ Different digits, folding in thirds
74532123 => 745+32+123 => 900 74532123 => 547+32+321 => 900
▪ Using XOR, folding in half ▪ Using XOR, folding in half
100101110 => 10010+1110 => 11100 100100110 => 10010+0110 => 10100
▪ Alternate digits ▪ Alternate digits
125758 => 155+278 => 433 125758 => 155+872 => 1027
CHAPTER 5: HASH TABLES Monday, 20 February 2023 24
COLLISION RESOLUTION TECHNIQUES:
CHAINING

 In chaining, m linked lists are


maintained, one for each possible
address in the hash table. Using
chaining to resolve collision in
storing the example in Prime
Number Division Method of
hashing.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 25


COLLISION RESOLUTION TECHNIQUES:
CHAINING CONT…

 Keys 845 and 234 both hash to


address 0 so they are linked
together in the address. The case is
the same for addresses 2, 4, 5, 7 and
10, while the rest of the addresses
have no collision. The chaining
method resolved the collision by
providing additional link nodes to
each of the values.
Initial Table After Insertions
CHAPTER 5: HASH TABLES Monday, 20 February 2023 26
COLLISION RESOLUTION TECHNIQUES:
USE OF BUCKETS
 Like chaining, this method divides the hash table into
m groups of records where each group contains
exactly b records, having each address regarded as a
bucket. The application of this was shown on the
figure to the right.
 Collision is redefined in this approach. It happens
when a bucket overflows – that is, when an insertion
is attempted on a full bucket. Hence, there is a
significant reduction in the number of collisions.
However, this method wastes some space and is not
free from overflowing further, in which case an
overflow policy must be invoked. In the above
example, there are three slots in each address. Being
static in size, problem arises when more than three
values hash to a single address.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 27
COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (PROBING)
 Linear probing is one of the simplest techniques
in handling collisions in which the file is scanned
or probed sequentially as a circular file and the
colliding key is stored in the nearest available
space to the address. This is used in systems
where a “first come, first served” basis is
followed.
 Whenever a collision occurs at a certain
address, the succeeding addresses are probed,
that is, searched sequentially until an empty one
is found. The key then uses this address. Array
should be considered circular, so that when the
last location is reached, the search proceeds to
the first location of the array.
CHAPTER 5: HASH TABLES Monday, 20 February 2023 28
COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (PROBING) CONT…

 In this technique, key 642 hashed to address 2 but it


is already full. Probing for the next available space
led to address 3 where key is safely stored. Later in
the insertion process, the key 458 hash to address 3
and it is stored on the second slot in the address.
With key 421 that hashed to a full address 5, the
next available space is at address 6, where the key is
stored.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 29


COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (DOUBLE HASHING)
 Double hashing makes use of a second hash function, say
h2(k), whenever there's a collision. The record is initially
hashed to an address using the primary function. If the
hashed address is not available, a second hash function is
applied and added to the first hashed value, and the colliding
key is hashed to the new address if there is available space. If
there is none, the process is repeated.
 Step 1: Use the primary hash function h1(k) to determine
the position i at which to place the value.
 Step 2: If there is a collision, use the rehash function rh (i, k)
successively until an empty slot is found:
− rh (i, k) = (i + h2(k)) mod m
− where m is the number of addresses
CHAPTER 5: HASH TABLES Monday, 20 February 2023 30
COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (DOUBLE HASHING)

 For keys 125, 845, 444, 256, 345, 745, 902, 569, 254, 382,
234, 431, 947, 981, 792, 459 and 725, storage is pretty
straightforward – no overflow happened.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 31


COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (DOUBLE HASHING) CONT…

 Inserting 652 into the hash table results to an overflow


at address 2 so we rehash:
h2 (652) = 652 mod 11 = 3
rh (2, 652) = (2 + 3) mod 13 = 5,
but address 5 is already full so rehash:
rh (5, 652) = (5 + 3) mod 13 = 8,
with space; so store here.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 32


COLLISION RESOLUTION TECHNIQUES:
OPEN ADDRESSING (DOUBLE HASHING) CONT…

 Hashing 421 also results in a collision, so we rehash:


h2(421) = 421 mod 11 = 3
rh(5, 421) = (5 + 3) mod 13 = 8,
but address 8 is already full so rehash:
rh(8, 421) = (8 + 3) mod 13 = 11,
with space - so store here.
▪ Lastly, key 458 is stored at address 3

CHAPTER 5: HASH TABLES Monday, 20 February 2023 33


SUMMARY

 A table is a group of elements, each of which is called a record


 Two types of keys are internal or embedded key, and external key
 Tables may be represented using sequential allocation, link allocation or both
 Searching is eliminated in direct-address table. Also, insertion and deletion are
relatively straightforward
 A table may be ordered or unordered. Linear search is used if the table is unordered,
while indexed sequential search, binary search or Fibonacci search is used if the table
is ordered
CHAPTER 5: HASH TABLES Monday, 20 February 2023 34
SUMMARY CONT…

 Hashing maps the possible range of key values into a smaller range of relative
addresses.
 Simple hash techniques include the prime number division method and folding.
 Chaining makes use of m linked lists, one for each possible address in the hash table.
 In the use of buckets method, collision happens when a bucket overflows.
 Linear probing and double hashing are two open addressing techniques.

CHAPTER 5: HASH TABLES Monday, 20 February 2023 35

You might also like