0% found this document useful (0 votes)

269 views58 pages

Hashing Powerpoint

The document discusses different methods for implementing dictionaries or hash tables, including using arrays, sequences, and hash functions. It explains that dictionaries store key-value pairs and support operations like searching, insertion, and deletion by key. Binary search trees and hash tables provide efficient lookup times of O(log n) and O(1), respectively. Hash tables map keys to table indices using hash functions, but collisions may occur and need to be resolved through techniques like chaining or open addressing.

Uploaded by

Marcus Aurelius

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

269 views58 pages

Hashing Powerpoint

Uploaded by

Marcus Aurelius

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 58

Dictionaries, Tables

Hashing
TCSS 342

1/51

The Dictionary ADT

a dictionary (table) is an abstract model of a
database
like a priority queue, a dictionary stores
key-element pairs
the main operation supported by a
dictionary is searching by key

2/51

Examples

Telephone directory
Library catalogue
Books in print: key ISBN
FAT (File Allocation Table)

3/51

Main Issues
Size
Operations: search, insert, delete, ??? Create
reports??? List?
What will be stored in the dictionary?
How will be items identified?

4/51

The Dictionary ADT

simple container methods:
size()
isEmpty()
elements()

query methods:
findElement(k)
findAllElements(k)
5/51

The Dictionary ADT

update methods:
insertItem(k, e)
removeElement(k)
removeAllElements(k)

special element
NO_SUCH_KEY, returned by an unsuccessful
search
6/51

Implementing a Dictionary
with a Sequence
unordered sequence
searching and removing takes O(n) time
inserting takes O(1) time
applications to log files (frequent insertions,
rare searches and removals) 34 14 12 22 18

18
7/51

Implementing a Dictionary
with a Sequence
array-based ordered sequence
(assumes keys can be ordered)
- searching takes O(log n) time (binary search)
- inserting and removing takes O(n) time
- application to look-up tables
(frequent searches, rare insertions and removals)

34
8/51

Binary Search
narrow down the search range in stages
high-low game
findElement(22)
2
low

9 12 14 17 19 22 25 27 28 33 37
mid

high

9/51

Binary Search
2

9 12 14 17 19 22 25 27 28 33 37
low

mid

high

9 12 14 17 19 22 25 27 28 33 37

low mid high

9 12 14 17 19 22 25 27 28 33 37
low = mid = high

10/51

Pseudocode for Binary Search

Algorithm
BinarySearch(S, k, low, high)
if low > high then
return NO_SUCH_KEY
else
mid (low+high) / 2
if k = key(mid) then
return key(mid)
else if k < key(mid) then
return BinarySearch(S, k, low, mid-1)
else
return BinarySearch(S, k, mid+1, high)
11/51

Running Time of Binary

Search
The range of candidate items to be searched
is halved after each comparison
Comparison

Search Range

n/2

n/2i

log 2n

1
12/51

Running Time of Binary

Search
In the array-based implementation, access
by rank takes O(1) time, thus binary search
runs in O(log n) time
Binary Search is applicable only to Random
Access structures (Arrays, Vectors)

13/51

Implementations
Sorted? Non Sorted?
Elementary: Arrays, vectors linked lists
Orgainization: None (log file), Sorted, Hashed

Advanced: balanced trees

14/51

Skip Lists
Simulate Binary Search on a linked list.
Linked list allows easy insertion and
deletion.
http://www.epaperpress.com/s_man.html

15/51

A FAT Example
Directory: Key: file name. Data (time, date, size
) location of first block in the FAT table.
If first block is in physical location #23 (Disk
block number) look up position #23 in the FAT.
Either shows end of file or has the block number
on disk.
Example: Directory entry: block # 4
FAT: x x x F 5 6 10 x 23 25
3
The file occupies blocks 4,5,6,10, 3.
16/51

Hashing
Place item with key k in position h(k).
Hope: h(k) is 1-1.
Requires: unique key (unless multiple items
allowed). Key must be protected from
change (use abstract class that provides only
a constructor).
Keys must be comparable.
17/51

Key class

public abstract class KeyID {

Private Comparable searchKey;
Public KeyID(Comparable m) {
searchKey = m;
}//Only one constructor
public Comparable getSearchKey() {
return searchKey;
}

}
18/51

Hashing Problem
RT&T is a large phone company, and they
want to provide enhanced caller ID
capability:

given a phone number, return the callers name

phone numbers are in the range 0 to R = 10101
n is the number of phone numbers used
want to do this as efficiently as possible
19/51

Hashing Problem
We know two ways to design this dictionary:
abalanced search tree (AVL, red-black) or a skiplist with the phone number as the key has O(log n)
query time and O(n) space --- good space usage
and search time, but can we reduce the search time
to constant?
abucket array indexed by the phone number has
optimal O(1) query time, but there is a huge
amount of wasted space: O(n + R)
20/51

Bucket Array
Each cell is thought of as a bucket or a container
Holds key element pairs
In array A of size N, an element e with key k is inserted
in A[k].

(null)

000-000-0000 000-000-0001

Roberto
401-863-7639 ...

(null)
999-999-9999
21/51

Generalized indexing
Hash table
Data storage associated with a key
The key need not be an integer

22/51

Hash Tables
A data structure
The location of an item is determined
directly as a function of the item itself
Not by a sequence of trial and error comparisons

Commonly used to provide faster searching

O(n) for linear searches
O (logn) for binary search
O(1) for hash table
23/51

Example:
A symbol table constructed by a
compiler
Stores identifiers and information about
them

24/51

Another Solution
A Hash Table is an alternative solution with
O(1) expected query time and O(n + N)
space, where N is the size of the table
Like an array, but with a function to map
the large range of keys into a smaller one
e.g., take the original key, mod the size of the
table, and use that as an index
25/51

Example
Insert item (401-863-7639, Roberto) into a table of
size 5
4018637639 mod 5 = 4, so item (401-863-7639,
Roberto) is stored in slot 4 of the table
A lookup uses the same process: map the key to an
index, then check the array cell at that index
401863-7639
Roberto

26/51

Collision
Insert (401-863-9350, Andy)
And insert (401-863-2234, Devin). We have
a collision!

27/51

Collision Resolution
How to deal with two keys which map to
the same cell of the array?
Use chaining
Set up lists of items with the same index

28/51

Chaining
0
1

2
3
4

29/51

Chaining
The expected, search/insertion/removal time
is O(n/N), provided the indices are
uniformly distributed
The performance of the data structure can
be fine-tuned by changing the table size N

30/51

Hash Function
Function h defined by h(i) = i
Determines the location of an item i in the
hash table

Called a hash function.

To reduce the large size of a hash table
use
h(i) = i mod 25;
31/51

From Keys to Indices

The mapping of keys to indices of a hash table is
called a hash function
A hash function is usually the composition of two
maps:
hash code map: key integer
compression map: integer [0, N -1]

An essential requirement of the hash function is to

map equal keys to equal indices
A good hash function minimizes the probability
of collisions
32/51

Java Hash
Java provides a hashCode() method for the
Object class, which typically returns the 32bit memory address of the object.
This default hash code would work poorly
for Integer and String objects
The hashCode() method should be suitably
redefined by classes.
33/51

Popular Hash-Code Maps

Integer cast: for numeric types with 32 bits
or less, we can reinterpret the bits of the
number as an int
Component sum: for numeric types with
more than 32 bits (e.g., long and double),
we can add the 32-bit components.

34/51

Popular Hash-Code Maps

Polynomial accumulation: for strings of a
natural language, combine the character
values (ASCII or Unicode) a 0 a 1 ... a n-1 by
viewing them as the coefficients of a
polynomial: a 0 + a 1 x + ...+ x n-1 a n-1

35/51

Popular Hash-Code Maps

The polynomial is computed with Horners
rule, ignoring overflows, at a fixed value x:
a0 + x (a1 + x (a2 + ... x (an-2 + x an-1 ) ... ))
The choice x = 33, 37, 39, or 41 gives at most 6
collisions on a vocabulary of 50,000 English
words

Why is the component-sum hash code bad

for strings?
36/51

Random Hashing
Random hashing
Uses a simple random number generation
technique
Scatters the items randomly throughout
the hash table

37/51

Popular Compression Maps

Division: h(k) = |k| mod N
the choice N =2 k is bad because not all the bits are
taken into account
the table size N is usually chosen as a prime
number
certain patterns in the hash codes are propagated

Multiply, Add, and Divide (MAD):

h(k) = |ak + b| mod N
eliminates patterns provided a mod N 0
same formula used in linear congruential (pseudo)
random number generators
38/51

More on Collisions
A key is mapped to an already occupied
table location
what to do?!?

Use a collision handling technique

Weve seen Chaining
Can also use Open Addressing
Double Hashing
Linear Probing
39/51

Linear Probing
If the current location is used, try the next
table location
linear_probing_insert(K)
if (table is full) error
probe = h(K)
while (table[probe] occupied)
probe = (probe + 1) mod M
table[probe] = K
40/51

Linear Probing
Lookups walk along table until the key or an
empty slot is found
Uses less memory than chaining
dont have to store all those links

Slower than chaining

may have to walk along table for a long way

Deletion is more complex

either mark the deleted slot
or fill in the slot by shifting some elements down
41/51

Linear Probing Example

h(k) = k mod 13
Insert keys:
18 41 22 44 59 32 31 73
0

41
0

10 11 12

18 44 59 32 22 31 72
3

10 11 12
42/51

Double Hashing
Use two hash functions
If M is prime, eventually will examine every
position in the table
double_hash_insert(K)
if(table is full) error
probe = h1(K)
offset = h2(K)
while (table[probe] occupied)
probe = (probe + offset) mod M
table[probe] = K
43/51

Double Hashing
Many of same (dis)advantages as linear
probing
Distributes keys more uniformly than linear
probing does

44/51

Double Hashing Example

h1(K) = K mod 13
h2(K) = 8 - K mod 8
we want h2 to be an offset to add
18 41 22 44 59 32 31 73
0

44
0

41 73
1

10 11 12

18 32 53 31 22
4

10 11 12

45/51

Hash code
static int hashCode(long i) {
return (int)((i >> 32) + (int) i);}

46/51

Hash code
static int hashCode(String s) {
int h=0;
for (int i=0; i<s.length(); i++) {
h = (h << 5) | (h >>> 27);
// 5-bit cyclic shift of the running sum
h += (int) s.charAt(i); // add in next
character }
return h;
}
47/51

Linear Probing Hash Table

public class LinearProbingHashTable implements Dictionary {
/** Marker for deactivated buckets */
private static Item AVAILABLE = new Item(null, null);
/** number of items in the dictionary */
private int n = 0;
/** capacity of the bucket array */
private int N;
/** bucket array */
private Item[] A;
48/51

Linear Probing Hash Table

/** hash comparator */
private HashComparator h;
/** constructor providing the hash comparator */
public LinearProbingHashTable(HashComparator hc) {
h = hc;
N = 1023; // default capacity
A = new Item[N];
}

49/51

Linear Probing Hash Table

/** constructor providing the hash comparator and the
capacity
* of the bucket array */
public LinearProbingHashTable(HashComparator hc, int
bN) {
h = hc;
N = bN;
A = new Item[N];
}
50/51

Linear Probing Hash Table

// auxiliary methods
private boolean available(int i) {
return (A[i] == AVAILABLE);
}
private boolean empty(int i) {
return (A[i] == null);
}

51/51

Linear Probing Hash Table

private Object key(int i) {
return A[i].key();
}
private Object element(int i) {
return A[i].element();
}
private void check(Object k) {
if (!h.isComparable(k)) throw new
InvalidKeyException("Invalid key.");
}
52/51

Helper search method

/** helper search method */
private int findItem(Object key) throws
InvalidKeyException {
check(key);
int i = h.hashValue(key) % N;
// division method compression map
int j = i;

53/51

Helper search method

do {
if (empty(i))
return -1; // item is not found
if (available(i))
i = (i + 1) % N; // bucket is deactivated
else if (h.isEqualTo(key(i), key)) // we have found our item
return i;
else // we must keep looking
i = (i + 1) % N;
} while (i != j);
return -1; // item is not found
}

54/51

Dictionary
// methods of the dictionary ADT
public Object findElement (Object key) throws
InvalidKeyException {
int i = findItem(key); // helper method for finding a key
if (i < 0)
return Dictionary.NO_SUCH_KEY;
return element(i);
}

55/51

Dictionary
public void insertItem (Object key, Object element)
throws InvalidKeyException {
check(key);
int i = h.
hashValue(key) % N; // division method compression
map
int j = i;

56/51

Dictionary
// remember where we are starting
do {
if (empty(i) || available(i)) { // this slot is available
A[i] = new Item(key, element);
n++;
return;
}
i = (i + 1) % N; // check next slot
} while (i != j); // repeat until we return to start
throw new HashTableFullException("Hash table is
full."); }
57/51

Dictionary
public Object removeElement (Object key) throws
InvalidKeyException {
int i = findItem(key); // find this key first
if (i <0)
return Dictionary.NO_SUCH_KEY;
// nothing to remove
Object toReturn = element(i);
A[i] = AVAILABLE; // mark this slot as deactivated
n--;
return toReturn;
}

58/51

GD32F3x0 User Manual en v2.1-1
No ratings yet
GD32F3x0 User Manual en v2.1-1
686 pages
9 DictionaryandHashing-1
No ratings yet
9 DictionaryandHashing-1
32 pages
9.map 1 HashTable
No ratings yet
9.map 1 HashTable
31 pages
Report of Analysis Structure Project
25% (4)
Report of Analysis Structure Project
40 pages
Ders7 - Data Structures and Search Algorithms
No ratings yet
Ders7 - Data Structures and Search Algorithms
41 pages
CS Index Albania
No ratings yet
CS Index Albania
54 pages
Hashing: Presented by
No ratings yet
Hashing: Presented by
35 pages
BD FACSVerse Flow Cytometer User Manual PDF
100% (1)
BD FACSVerse Flow Cytometer User Manual PDF
184 pages
Bike PUC
No ratings yet
Bike PUC
1 page
092 - MA8402, MA6453 Probability and Queueing Theory PQT - Notes 1
No ratings yet
092 - MA8402, MA6453 Probability and Queueing Theory PQT - Notes 1
337 pages
Measuring the Spillovers From Technical Advance
No ratings yet
Measuring the Spillovers From Technical Advance
15 pages
Electrical Diagram
100% (1)
Electrical Diagram
21 pages
Nature of Problem Based and Project Based TTL Maam Princes - 010249
No ratings yet
Nature of Problem Based and Project Based TTL Maam Princes - 010249
24 pages
Vertical Farming
100% (5)
Vertical Farming
16 pages
Chapter 1 - Types and Components of Computer Systems
No ratings yet
Chapter 1 - Types and Components of Computer Systems
16 pages
3rd Merit List of AD Leading to BS Hospitality and Tourism Program Fall 2024
No ratings yet
3rd Merit List of AD Leading to BS Hospitality and Tourism Program Fall 2024
7 pages
Exercise The Doctrine of Ultra Vires VZ
No ratings yet
Exercise The Doctrine of Ultra Vires VZ
19 pages
Tarjeta Kardex
No ratings yet
Tarjeta Kardex
15 pages
Technical Program Schedule With Session Link
No ratings yet
Technical Program Schedule With Session Link
9 pages
K-Map (Karnaugh Map) : Truth Table
No ratings yet
K-Map (Karnaugh Map) : Truth Table
20 pages
Chapter 3 - Syntax Analysis Part One
No ratings yet
Chapter 3 - Syntax Analysis Part One
17 pages
The Bicycle Wheel
No ratings yet
The Bicycle Wheel
147 pages
India Mumbai Office Q1 2023
No ratings yet
India Mumbai Office Q1 2023
2 pages
Network Theory MQP 2
No ratings yet
Network Theory MQP 2
4 pages
Thesis Outline - Impact of Capital Structure On Firm Value
No ratings yet
Thesis Outline - Impact of Capital Structure On Firm Value
6 pages
80 Questions With Answers Batch 4
No ratings yet
80 Questions With Answers Batch 4
15 pages
Augmentin I V
No ratings yet
Augmentin I V
13 pages
MEL-Top Notch 2e - Teacher Walkthrough
No ratings yet
MEL-Top Notch 2e - Teacher Walkthrough
9 pages
Description and Operation: Identification Codes
No ratings yet
Description and Operation: Identification Codes
47 pages
Crypto QB Module 1, 2
100% (2)
Crypto QB Module 1, 2
3 pages
Catalog Austrocasa
No ratings yet
Catalog Austrocasa
12 pages
Performance Task
No ratings yet
Performance Task
4 pages
Klasifikasi Jis Japan Industrial Standard
No ratings yet
Klasifikasi Jis Japan Industrial Standard
5 pages
Numerical of Paging and Segmentation
No ratings yet
Numerical of Paging and Segmentation
12 pages
Materials Characterization Techniques
100% (1)
Materials Characterization Techniques
1 page
Gulfood Exhibitor List N 1
No ratings yet
Gulfood Exhibitor List N 1
19 pages
Unit2 Skiplist
No ratings yet
Unit2 Skiplist
10 pages
CPP MODEL PAPERS
No ratings yet
CPP MODEL PAPERS
6 pages
stacks ppt
No ratings yet
stacks ppt
9 pages
Practical Assignment 3: 1. Produce A Combine Table in Which Each Student Is Combine With Every Other Application
100% (1)
Practical Assignment 3: 1. Produce A Combine Table in Which Each Student Is Combine With Every Other Application
20 pages
Daa Notes Complete
No ratings yet
Daa Notes Complete
203 pages
Zfi Print4
No ratings yet
Zfi Print4
2 pages
Horspool Algorithm
No ratings yet
Horspool Algorithm
6 pages
Module 4: Dynamic Programming: Design and Analysis of Algorithms 21CS42
No ratings yet
Module 4: Dynamic Programming: Design and Analysis of Algorithms 21CS42
105 pages
Cse Department - ANNA UNIVERSITY Important Question and Answers - Regulation 2013,2017 - STUDY MATERIAL, Notes
No ratings yet
Cse Department - ANNA UNIVERSITY Important Question and Answers - Regulation 2013,2017 - STUDY MATERIAL, Notes
5 pages
DSA 2022 Question Paper
100% (1)
DSA 2022 Question Paper
2 pages
DBMS Lab Manual With ER Diagrams
67% (3)
DBMS Lab Manual With ER Diagrams
53 pages
Lecture 1 DS Using C++
No ratings yet
Lecture 1 DS Using C++
26 pages
MCS 021 Data and File Structures
No ratings yet
MCS 021 Data and File Structures
22 pages
Cgpa Calculator
No ratings yet
Cgpa Calculator
21 pages
Different Approach of Software Design
No ratings yet
Different Approach of Software Design
8 pages
COSC2429 - Intro To Programming Assessment 2 - Sem A 2021: RMIT Classification: Trusted
100% (1)
COSC2429 - Intro To Programming Assessment 2 - Sem A 2021: RMIT Classification: Trusted
3 pages
Anchoring of Monolithic Refractories - Uk
100% (1)
Anchoring of Monolithic Refractories - Uk
9 pages
Bahria University, Islamabad Campus: Department of Computer Science
No ratings yet
Bahria University, Islamabad Campus: Department of Computer Science
2 pages
Chapter-2 - Array Searching and Sorting
No ratings yet
Chapter-2 - Array Searching and Sorting
21 pages
Practical - 1: AIM:-Implement RSA Algorithm. Take Two Prime Numbers P, Q N PXQ
No ratings yet
Practical - 1: AIM:-Implement RSA Algorithm. Take Two Prime Numbers P, Q N PXQ
42 pages
S J C Institute of Technology Dept. of Computer Science & Engineering SUB: Data Structures & Applications (18CS32) Class: Tutorial 3
No ratings yet
S J C Institute of Technology Dept. of Computer Science & Engineering SUB: Data Structures & Applications (18CS32) Class: Tutorial 3
2 pages
CS8792 - CNS
100% (1)
CS8792 - CNS
29 pages
Lab03: Constructor and Destructor
No ratings yet
Lab03: Constructor and Destructor
9 pages
DAA 2020 Week 02 Assignment 01
No ratings yet
DAA 2020 Week 02 Assignment 01
3 pages
AoA Important Question
100% (1)
AoA Important Question
3 pages
Hashing
No ratings yet
Hashing
24 pages
DAA Notes PDF
No ratings yet
DAA Notes PDF
55 pages
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
No ratings yet
Analysis of Algorithms CS 477/677: Hashing Instructor: George Bebis
53 pages
Numerical Based On Indexing: Problem 1.2
No ratings yet
Numerical Based On Indexing: Problem 1.2
3 pages
DATA STRUCTURES AND ALGORITHMS - Unit 5
No ratings yet
DATA STRUCTURES AND ALGORITHMS - Unit 5
35 pages
Computer Networks and Security: Model Question Paper-1 With Effect From 2019-20 (CBCS Scheme)
No ratings yet
Computer Networks and Security: Model Question Paper-1 With Effect From 2019-20 (CBCS Scheme)
3 pages
Iv Semester Ooc (18CS45) Notes
No ratings yet
Iv Semester Ooc (18CS45) Notes
181 pages
Mathematical Analysis of Recursive Algorithms
No ratings yet
Mathematical Analysis of Recursive Algorithms
19 pages
Relational Algebra
No ratings yet
Relational Algebra
13 pages
DS Unit-5 Searching-Sorting
100% (1)
DS Unit-5 Searching-Sorting
37 pages
Chapter 13: Query Processing
No ratings yet
Chapter 13: Query Processing
25 pages
Sangida Akter (ID-11608019) - Hill Cipher
No ratings yet
Sangida Akter (ID-11608019) - Hill Cipher
129 pages
MCA Data Structures With Algorithms 01
No ratings yet
MCA Data Structures With Algorithms 01
16 pages
Unit-1 DAA Notes - Daa Unit 1 Note Unit-1 DAA Notes - Daa Unit 1 Note
No ratings yet
Unit-1 DAA Notes - Daa Unit 1 Note Unit-1 DAA Notes - Daa Unit 1 Note
26 pages
Anna University OOPS Question Bank Unit 2
No ratings yet
Anna University OOPS Question Bank Unit 2
6 pages
Daa Unit Viii Notes 1
No ratings yet
Daa Unit Viii Notes 1
7 pages
Daa Lecture Notes
No ratings yet
Daa Lecture Notes
120 pages
OOP - I GTU Study Material Presentations Unit-1 07022022102854PM
No ratings yet
OOP - I GTU Study Material Presentations Unit-1 07022022102854PM
59 pages
Vu Study M Parallel and Distributed Computer
No ratings yet
Vu Study M Parallel and Distributed Computer
29 pages
Question Bank Bsc. Ii Year Data Structures Short Answers Type Questions
No ratings yet
Question Bank Bsc. Ii Year Data Structures Short Answers Type Questions
3 pages
Data Structures Lab Manual
No ratings yet
Data Structures Lab Manual
60 pages
Unit 4 Notes
No ratings yet
Unit 4 Notes
46 pages
Basis Path Testing
No ratings yet
Basis Path Testing
4 pages
Cs6402 DAA Notes (Unit-3)
No ratings yet
Cs6402 DAA Notes (Unit-3)
25 pages
LAB MANUAL Java CS406 FORMAT
No ratings yet
LAB MANUAL Java CS406 FORMAT
9 pages
Ugcnet
100% (1)
Ugcnet
151 pages
MOTO
83% (6)
MOTO
583 pages
Project Report Format PDF
No ratings yet
Project Report Format PDF
7 pages
Question Bank: Subject: Data Structures and Algorithms
No ratings yet
Question Bank: Subject: Data Structures and Algorithms
6 pages