0% found this document useful (0 votes)
190 views78 pages

Hashing: Data Structures and Algorithms in Java

Quadratic probing resolves collisions by probing positions that are quadratic distances from the original hash position. This example shows how it works by incrementing i by 1 each time to calculate h'(K) = (h(K) ± i^2) mod TSize until an empty position is found.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
190 views78 pages

Hashing: Data Structures and Algorithms in Java

Quadratic probing resolves collisions by probing positions that are quadratic distances from the original hash position. This example shows how it works by incrementing i by 1 each time to calculate h'(K) = (h(K) ± i^2) mod TSize until an empty position is found.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 78

Chapter 10

Hashing

Data Structures and Algorithms in Java


Introduction

Structure Array Linked list BST


Add first O(n) O(1) O(height)
Add last O(1) O(1) O(height)
Search O(n) O(n) O(height) -> O(logn)
Remove O(n) O(1) O(height)

What we have to do if requirements in search


and remove operations needs higher efficiency?

It still be “Divide and


Hashing
Conquer principle”

Data Structures and Algorithms in Java 2


Introduction
Function/ Mapping: a process which accepts
input then give ONLY ONE output Hashing

h Sub set 0
x y

Sub set 1
Domain of h Codomain of h
( Integers ) Sub set 2
position
…..
Sub set i
Hash
item function ….
Sub set n-1
O(1)
Data Structures and Algorithms in Java 3
Contents

Discuss the following topics:


• Basic of Hashing
• Common Hash Functions
• Collision Resolution
• Deletion
• Perfect Hash Functions
• Hash Functions for Extendible( extensible)
Files
• Hashing in java.util
Your works: Re-implement project
( At the end of the slides)

Data Structures and Algorithms in Java 4


Objectives
LO7.1 Explain the concept of "hash". Define concepts
hash function and hash table and their application.
LO7.2 Demonstrate the types of hash functions: Division,
Folding,...
LO7.3 Explain the collision and collision-handling.
LO7.4 Explain the open addressing method for collision-
resolution: linear and quadratic probing.
LO7.5 Explain the chaining method for collision-resolution:
separate chaining and Coalesced chaining.
LO7.6 Define perfect hash function and extendible
hashing.

Data Structures and Algorithms in Java 5


1- Basic of Hashing …
• What is hashing?  A process in which a large
data set will be partitioned to some data subsets.
• What is the tool for hashing?  hash function
• What will hash function do?  This function is
constructed by implementer which accepts input
data ( whole initial data or a chunk of initial data
or memory address of data) and an unique index
is it’s output  index of a subset.
• Who will implement hash function?  Hashing
implementer.

Data Structures and Algorithms in Java 6


1- Basic of Hashing …
Array vs Hash table:
Similarities:
• Contiguous memory blocks
Differences:
Property Array Hash table
Index Direct a[i] Output of hash function
Storing position contiguous May not be contiguous
Content of an item Data object only Data object + extra information
Memory efficiency The best Extra memory is it’s cost
Add operation O(n) O(1) + a little
Search operation O(n) O(1) + a little
Remove O(n) O(1) + a little

Data Structures and Algorithms in Java 7


1- Basic of Hashing …
• To find a function (h) that can transform a
particular key (K) (a string, number or record)
into an index in the table used for storing
items of the same type as K, the function h is
called a hash function
• If h transforms different keys into different
numbers, it is called a perfect hash
function
• To create a perfect hash function, the table
has to contain at least the same number of
positions as the number of elements being
hashed
Data Structures and Algorithms in Java 8
1- Basic of Hashing …
• Data structure of a hash table entry:

Whether this memory Data for collission


stored data or not Used? Data object
resolution

• Why conflict resolution is needed?  Because


some data objects are mapping to only one
index.
• Ex:
K1= 1025  h(K1) = 1025%100 = 25
K2= 125  h(K2) = 125%100 = 25
K3= 25  h(K3) = 25%100 = 25
Data Structures and Algorithms in Java 9
1- Basic of Hashing…
Data index Storage

K1, val1 0
1
K2, val2 2
Hash function
K3, val3 h: Key  index 3
 Search: O(1) 4
… 5
Kn, valn 6
7
There are mn functions …
 What function should we select? …
m: number of needed blocks for storing items. …
A key can be any value in the set of keys …
 There is a situation in which some keys are m-1
mapped to one index? How to resolve this
collision?

Data Structures and Algorithms in Java 10


2- Common Hash Functions
• The division method is the preferred choice(%, modulo).
TSize =sizeof(table), as in h(K) = K mod Tsize(K%Tsize)

Folding method: h(K) = sum of parts mod Tsize


Ex: K= 123-45-6789  sum 3 parts:123 + 45 + 6789= 6957 Hash table
 h(K)=6957 mod TSize designer will
Ex: K= 123-45-6789  Sum 5 parts:12 + 34+56+78+9 = 189 decide an
approach for
Mid-square method :
hash function
Ex: TSize= 1024= 210, K= 3121  K2=9 740 641
K2= 1001010 0101000010 1100001  h(K) = 01010000102 = 32
Extraction method: Only a part of the key is used to compute the address
Ex: K = 123-45-6789, h(K) = 1289 mod Tsize
Radix transformation method:
K = 34510  4239  h(K) = 423 mod Tsize ( =23 if Tsize=100 )

Data Structures and Algorithms in Java 11


3- Collision Resolution
Collision: A situation in which 2 distinct inputs but the hash
function give the same output  Same positions
• Ex:
K1= 1025  h(K1) = 1025%100 = 25
K2= 125  h(K2) = 125%100 = 25
K3= 25  h(K3) = 25%100 = 25

Common methods are used as solutions:


- Open Addressing Method – Dò tìm vị trí kế cận
- Chaining Method/ Coalesced chaining – băm theo nhóm
- Bucket Addressing : Một phần tử là một khối data

Data Structures and Algorithms in Java 12


3- Collision Resolution…
Open Addressing Method: when a key collides with
another key, the collision is resolved by finding an
available table entry other than the position (address)
to which the colliding key is originally hashed. Common
methods:

(1) if no collision, using h(k).


(2) If collision, using h’(k) = h(k) + f(i): i varies from 1, 2, 3, 4,
5,…. until an empty position is found

f(i): probing function. It can be linear (bậc 1 – dò tuyến tính-


simplest method) or quadratic function (bậc 2 – dò bậc 2)

Data Structures and Algorithms in Java 13


3- Collision Resolution…
Open Addressing Method:
Linear Probing, p(i) =i, h’(K) = (h(K) + i) mod TSize
i= 1, 2, 3…

Resolving collisions with the linear probing method. Subscripts


indicate the home positions of the keys being hashed.
Data Structures and Algorithms in Java 14
3- Collision Resolution…
Open Addressing Method:
Quadratic method: p(i)=  i2, h’(K) = (h(K)  i2) mod TSize

Insert B9
Collision
Insert B5  probe: i=1
Collision  h(9+1)=h(10)=0 OK
 probe: i=1 Insert C2
 h(5+1)=h(6)=6 Collision
OK  probe: i=1
Insert B2  h(2+1)=h(3)=3 No OK
Collision  h(2-1)=h(1)=1  No OK
 probe: i=1  probe: i=2, i2=4
 h(2+1)=h(3)=3  h(2+4)= 6  No OK
No OK  h(2-4), -2<0  No OK
 h(2-1)=h(1)=1  probe: i=3, i2=9
 OK  h(2+9)= 1  No OK
 h(2-9)= 2-9<0 No OK
 probe: i=4, i2=16
 h(2+16)= 8  OK

Using quadratic probing for collision resolution


h’(K) = (h(K)  i2) mod 10
Data Structures and Algorithms in Java 15
3- Collision Resolution…
Evaluation:

Formulas approximating, for different hashing methods, the


average numbers of trials for successful and unsuccessful
searches (Knuth, 1998)

Data Structures and Algorithms in Java 16


3- Collision Resolution…

Chaining Method
• Keys do not have to stored in table itself, each
position of the table is associated with a linked
list or chain of structures whose info fields
store keys or references to keys
• This method is called separate chaining, and a
table of references (pointers) is called a scatter
table (bảng phân phối)

Data Structures and Algorithms in Java 17


3- Collision Resolution…
Chaining Method

h(K)  index of a
linked list of
elements having
the same value of
hash function.

K h(K)  index
traverse the
appropriate list to
find the element
having this key.

In chaining, colliding keys are put on the same linked list


Data Structures and Algorithms in Java 18
3- Collision Resolution…

Coalesced chaining
• A version of chaining called coalesced hashing-
băm theo nhóm sử dụng array- (or coalesced chaining)
combines linear probing with chaining
• An overflow area (vùng tràn, hầm chứa) known
as a cellar can be allocated to store keys for
which there is no room in the table

Data Structures and Algorithms in Java 19


3- Collision Resolution…
Coalesced chaining

Index of
next
element
in the
7
same
group

Coalesced hashing puts a colliding key in the last


available position of the table
Data Structures and Algorithms in Java 20
3- Collision Resolution…
Coalesced chaining

Main
area

When cellar
is full,
inserted
element will
be put to
the main
Cellar: overflow area
region
Mechansm: bottom-up

Coalesced hashing that uses a cellar

Data Structures and Algorithms in Java 21


3- Collision Resolution…

Bucket Addressing Method


• To store colliding elements in the same position
in the table can be achieved by associating a
bucket with each address
• A bucket (khối) is a block of space large enough
to store multiple items

Data Structures and Algorithms in Java 22


3- Collision Resolution…
Bucket Addressing
bucket

Insert C2
 Collision
Use linear probing
Bucket 3 containing
a space
Insert C2 to bucket
3

Collision resolution with


buckets ( bucket=2)
and linear probing method

Data Structures and Algorithms in Java 23


3- Collision Resolution…
Bucket
Addressing

bucket

Reference to separate
overflow area

Collision
resolution with
buckets and
overflow area

Data Structures and Algorithms in Java 24


4- Deletion
Begin

Deleted data (k)

h(k)

index
Search
and Group k
delete k

End

Data Structures and Algorithms in Java 25


4- Deletion
• Structure and collision resolution of the hash
table will decide the way by which its elements
are deleted. They can be
– Linear search for deletion
– Linear search to locate the linked list of the
subgroup then delete an element in this linked
list.
– Linear search to locate the subgroup then delete
an element in this subgroup, update references to
next elements in the same subgroup.

Data Structures and Algorithms in Java 26


4- Deletion…
H’(k) = H(k) + i
Update locations

Delete A4
Linear search in the situation
where both insertion and deletion of keys are permitted

Data Structures and Algorithms in Java 27


5- Perfect Hash Functions (*)
Hash table
Key set K |K| = |I| Index set I

One-to-
one and
onto
mapping
k i
h
Ideal
situation
h(k1) != h(k2), if k1 !=k2
h=?

Memory is available for storing all items and


no overflow area (cellar) is needed.

Data Structures and Algorithms in Java 28


5- Perfect Hash Functions (*)…

• If a function requires only as many cells in the


table as the number of data so that no empty
cell remains after hashing is completed, it is
called a minimal perfect hash function  Vừa
đủ để lưu trữ
• (*) You need to know essential features on this
part only.

Data Structures and Algorithms in Java 29


Perfect Hash Functions…
Cichelli’s method
• Cichelli's Method is implemented to minimize the number of collisions
when mapping values to a hash table, using a hash function. This program
reads key words from a text file and inserts these key words into a hash
table by following Cichelli's method.
• Cichelli’s method is an algorithm to construct a minimal perfect hash
function
• It is used to hash a relatively small number of reserved words
• Where g is the function to be constructed
h(word) = (length(word) + g(firstLetter(word)) + g(lastLetter(word))) mod Tsize
• The algorithm has three parts:
– Computation of the letter occurrences
– Ordering the words
– Searching

Data Structures and Algorithms in Java 30


5- Perfect Hash Functions…
Cichelli’s method
• Choose a value of max;
• Compute the number of occurrences of each
first letter and last letter in the set of all words.
• Order all words in accordance to the frequency
of occurrence of the first and the last letters;

Data Structures and Algorithms in Java 31


5- Perfect Hash Functions…
Cichelli’s method

Set of nine Muses – 9 nhà thơ


Calliope, Clio, Erato, Euterpe, Melpomene,
Polyhymia, Terpsichore, Thalia, Urania

Occurrence of each letter at the first and


the last position:
E (6), A(3), C(2), O(2), T(2), M(1), P(1),
U(1).

According to frequencies, list may be


ordered as :
Euterpe, Calliope, Erato, Terpsichore,
Melpomene, Thalia, Clio, Polyhymnia,
Urania

Figure 10-11 Subsequent invocations of the searching procedure with Max = 4


in Cichelli’s algorithm assign the indicated values to letters and to the list of
reserved hash values. The asterisks indicate failures.

Data Structures and Algorithms in Java 32


5- Perfect Hash Functions…
The FHCD Algorithm

• The FHCD algorithm, an extension of Cechelli’s


approach, devised by Thomas Sager, searches
for a minimal perfect hash function of the form
(modulo TSize), where g is the function to be
determined by the algorithm
h(word) = h0(word) + g(h1(word)) + g(h2(word))

Data Structures and Algorithms in Java 33


5- Perfect Hash Functions…
The FHCD Algorithm

Data Structures and Algorithms in Java 34


6- Hash Functions for Extendible Files (*)

• File=table.
• Expandable hashing, dynamic hashing, and extendible hashing
methods distribute keys among buckets in a similar fashion
• Data  h(Data)  index  buckets[index]
• The main difference is the structure of the index (directory)
• In expandable hashing and dynamic hashing, a binary tree is used
as an index of buckets
• In extendible hashing, a directory of records is kept in a table

Data Structures and Algorithms in Java 35


6- Hash Functions for Extendible Files …

• Extendible hashing accesses the data stored in


buckets indirectly through an index that is
dynamically adjusted to reflect changes in the
file
• Extendible hashing allows the file to expand
without reorganizing it, but requires storage
space for an index
• Values returned by such a hash function are
called pseudokeys

Data Structures and Algorithms in Java 36


6- Hash Functions for Extendible Files …

• It is commonly used in database files, file = hash table


• Record = <key, value>
• A bucket contains some records, a bucket has a unique index and
new buckets can be created.
• Bucket indexes are stored in an distinct area and it is called as
directory file = {directory, bucket1, bucket 2, ……}
• Extendible hashing allows the file to expand without reorganizing it,
but requires storage space for an index
• Multi-level / extendible hashing can be used
• Values returned by such a hash function are called pseudokeys
• Data  h(key)  index  access directory  file position of
buckets[index]
• Cluster = 4KB

Data Structures and Algorithms in Java 37


6- Hash Functions for Extendible Files…

Function h generates
patterns of 5 bits

2 leftmost bits present the


position in the directory
containing the reference to
the bucket containing the
key. Number of bits is
called local depth (2 in
this example).

11001 arrives.
Case of splitting
a bucket Case of increase
depth, expand
An example of extendible hashing the directory

Data Structures and Algorithms in Java 38


6- Hash Functions for Extendible Files…

• With this method, no index is necessary because new


buckets generated by splitting existing buckets are
always added in the same linear way, so there is no
need to retain indexes
• A bucket is full when its loading factor exceeds a certain
level. This bucket will be splitted.
• A reference split indicates which bucket is to be split
next
• After the bucket is divided, the keys in this bucket
are distributed between this bucket and the newly
created bucket, which is added to the end of the
table

Data Structures and Algorithms in Java 39


6- Hash Functions for Extendible Files…

Linear Hashing…

TSize=3 Some hash functions may be applied.

Splitting buckets in the linear hashing technique


Data Structures and Algorithms in Java 40
6- Hash Functions for Extendible Files…
Linear Hashing…
TSize=3 H0(K) = K mod TSize H1(K) = K mod 2*Tsize

Figure 10-15 Inserting keys to buckets and overflow areas with


the linear hashing technique

Data Structures and Algorithms in Java 41


7- Hashing in java.util

• Main classes implement hashing


technique
• The HashMap class
• The HashSet class
• The HashTable class

Data Structures and Algorithms in Java 42


Using The HashMap class

• HashMap is an implementation of the interface


Map
• A map is a collection that holds pairs (key, value)
or entries
• A hash map is a collection of singly linked
lists (buckets); that is, chaining is used as a
collision resolution technique
• In a hash map, both null values and null keys
are permitted

Data Structures and Algorithms in Java 43


Using The HashMap class…

Methods in class HashMap including three inherited methods


Data Structures and Algorithms in Java 44
Using The HashMap class…

Methods in class HashMap including three inherited methods


Data Structures and Algorithms in Java 45
Using The HashMap class…
This example (in textbook)
demonstates how to use the
HashMap class to manage a list
of person
< name, age,hashCode> in
which the hasCode is the sum
of charcter codes in the field
name.

Click to go the
HashSet class

Figure 10-17 Demonstrating the operation of the methods in class HashMap

Data Structures and Algorithms in Java 46


Using The HashMap class…

Demonstrating the operation of the methods in class HashMap


Data Structures and Algorithms in Java 47
Using The HashMap class…

Demonstrating
the operation of
the methods in
class HashMap

Data Structures and Algorithms in Java 48


Using The HashSet class

• HashSet is another implementation of a set


(an object that stores unique elements)
• Class hierarchy in java.util for HashSet is:
Object → AbstractCollection → AbstractSet → HashSet
• HashSet is implemented in terms of HashMap
public HashSet() {
map = new HashMap();
}

Data Structures and Algorithms in Java 49


Using The HashSet class…

Methods in class HashSet including some inherited methods


Data Structures and Algorithms in Java 50
Using The HashSet class…

Methods in class HashSet including some inherited methods


Data Structures and Algorithms in Java 51
Using The HashSet class…

Methods in class HashSet including some inherited methods

Data Structures and Algorithms in Java 52


Using The HashTable

• A Hashtable is roughly equivalent (gần tương đương)


to a HashMap except that it is synchronized and
does not permit null values with methods to
operate on hash tables
• The class Hashtable is considered a legacy
class, just like the class Vector
• Class hierarchy in java.util is:
Object → Dictionary → Hashtable

Data Structures and Algorithms in Java 53


Using The Hashtable class…

Figure 10-20 Methods of the class Hashtable including three


inherited methods

Data Structures and Algorithms in Java 54


Using The Hashtable class…

Figure 10-20 Methods of the class Hashtable including three


inherited methods (continued)

Data Structures and Algorithms in Java 55


Using The Hashtable class…

Figure 10-20 Methods of the class Hashtable including three


inherited methods (continued)

Data Structures and Algorithms in Java 56


Using The Hashtable class…

Figure 10-20 Methods of the class Hashtable including three


inherited methods (continued)

Data Structures and Algorithms in Java 57


Summary: Objectives
 LO7.1 Explain the concept of "hash". Define concepts
hash function and hash table and their application.
LO7.2 Demonstrate the types of hash functions: Division,
 Folding,...
LO7.2 Explain the collision and collision-handling.

LO7.3 Explain the open addressing method for collision-
resolution: linear and quadratic probing.
LO7.4 Explain the chaining method for collision-resolution:
 separate chaining and Coalesced chaining.
LO7.5 Define perfect hash function and extendible
 hashing.

Data Structures and Algorithms in Java 58


Summary

• Common hash functions include the division,


folding, mid-square, extraction and radix
transformation methods.
• Collision resolution includes the open
addressing, chaining, and bucket addressing
methods.
• Cichelli’s method is an algorithm to construct a
minimal perfect hash function

Data Structures and Algorithms in Java 59


Summary (continued)

• The FHCD algorithm searches for a minimal


perfect hash function of the form (modulo
TSize), where g is the function to be determined
by the algorithm
• In expandable hashing and dynamic hashing, a
binary tree is used as an index of buckets
• In extendible hashing, a directory of records is
kept in a table

Data Structures and Algorithms in Java 60


Summary (continued)

• A hash map is a collection of singly linked lists


(buckets); that is, chaining is used as a collision
resolution technique
• HashSet is another implementation of a set
(an object that stores unique elements)
• A Hashtable is roughly equivalent to a
HashMap except that it is synchronized and
does not permit null values with methods to
operate on hash tables

Data Structures and Algorithms in Java 61


Notices about using hash tables
• When should hashtables be used:
– Elements in a group are different and insertion
and search are main operations.
• What are things to be concerned before a
hashtable is implemented?
– Choose a key for each element: number/string?
– Choose a hash function
– Choose a collision resolution
because these things will affect on algorithms that
will be selected in our hashtable.

Data Structures and Algorithms in Java 62


Demo 1: Using HashMap to compute probabilities
of characters in a text file

Data Structures and Algorithms in Java 63


Demo 1: Using HashMap to compute probabilities
of characters in a text file

Data Structures and Algorithms in Java 64


Demo 1: Using HashMap to compute probabilities
of characters in a text file

Data Structures and Algorithms in Java 65


Demo 2: Using HashTable to manage a student list

SE140606,NGUYỄN TRỌNG HẢI,7


SE141127,VÕ TRỌNG ĐẠT,4
SE140913,TRẦN MINH HIẾU,7
SE62440,ĐOÀN LƯƠNG PHÚ,6
SE141153,THÁI ĐỨC THẢO,5
SE140244,PHẠM NHẬT TÂN,8
SE140861,PHẠM ĐĂNG HẢI,5
SE140929,NGUYỄN LÊ ANH LONG,9
SE140755,LÊ ANH DUY,8
SE140618,LÝ GIA HUY,8
SE63394,VŨ VĂN KHẢI,9
SE63391,BÙI LÊ QUỐC THẮNG,4
SE140367,CAO DUY QUANG,9
SE140130,TRẦN VĂN TÂM,4
SE140923,NGUYỄN VĂN TÂN,5
SE130182,DIỆP MINH THÔNG,6
SE140877,NGUYỄN HỒNG SƠN,6
SE140813,NGUYỄN ĐĂNG HUY,6
SE140503,LÊ VĨNH HƯNG,3
SE140874,LÊ HỮU HIẾU,6
SE141086,NGUYỄN MẠNH LỰC,9
SE140873,TÔN THẤT BẢO,4
SE140067,NGUYỄN TRẦN HOÀNG
LONG,5
SE140855,TRẦN HOÀNG HẢI DUY,5
SE140885,CAO HOÀNG QUY,7
SE140203,HÀ GIA PHƯỚC,3
SE130610,THÁI TIẾN ĐẠT,7
SE151525,TẠ MINH TIẾN,3

Data Structures and Algorithms in Java 66


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 67


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 68


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 69


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 70


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 71


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 72


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 73


Demo 2: Using
HashTable to
manage a student
list

Data Structures and Algorithms in Java 74


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 75


Demo 2: Using HashTable to manage a student list

Data Structures and Algorithms in Java 76


Case Study: Hashing with Buckets
in a file

This example(in the textbook)


depicts how to implement
hashing data in a file.
Do yourself

Data Structures and Algorithms in Java 77


Thank you.

Data Structures and Algorithms in Java 78

You might also like