
Tutorial 3

1. What is the probability that the 3rd insertion encounters exactly two collisions while using linear probing in the hash table?

Solution: Let the size of the hash table be n ≥ 3. We assume that insertions take place
via linear probing, that no deletions occur during the sequence, and that the hash
function is uniform, i.e., for a random key, the likelihood of any hash value occurring
is exactly 1/n. After resolving any collisions, let the first two insertions end up at
indices i and j.
If ((i + 1) mod n) ≠ j and ((j + 1) mod n) ≠ i, then the third insertion cannot have two
collisions. The reason is that if the hash value of the third key collides with the first key,
then index (i + 1) mod n is vacant, and there is only one collision. If the hash value is
at index j, then index (j + 1) mod n is vacant, and hence there is only one collision. Any
other hash value implies no collision. We can therefore neglect this case.
We focus only on the cases where i and j are adjacent in the table. Let us re-label the
indices as a and b such that ((a + 1) mod n) = b. In this case, for any constant 0 ≤ c < n,
the (unconditional) probability P(a = c) is the probability that (i = c and j = (i + 1) mod n)
or (j = c and i = (j + 1) mod n). The first event occurs when the first hash value is c and
the second hash value is either i or (i + 1) mod n, with probability (1/n) · (2/n). The
second event occurs when the first hash value is (c + 1) mod n and the second hash
value is exactly c, with probability (1/n) · (1/n).
From the above, P(a = c) = 3/n² for every constant c. Two collisions occur if and only if
the third hash value equals a, which happens with probability 1/n. However, c, and
therefore a, can take any of the n values, so the product (3/n²) · (1/n) has to be summed
over all n values, giving us the required answer: 3/n².
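This answer is easy to check empirically. Below is a small Monte Carlo sketch (the function names, table size, and trial count are my own choices, not part of the tutorial): it inserts three uniformly random keys with linear probing and counts how often the third one sees exactly two collisions.

```python
import random

def third_insertion_collisions(n, rng):
    """Insert three uniform random hash values into a size-n table using
    linear probing; return how many occupied cells the THIRD insertion
    probes before finding an empty cell."""
    table = [False] * n
    collisions = 0
    for which in range(3):
        idx = rng.randrange(n)
        steps = 0
        while table[idx]:          # probe linearly past occupied cells
            steps += 1
            idx = (idx + 1) % n
        if which == 2:
            collisions = steps
        table[idx] = True
    return collisions

def estimate(n, trials=200_000, seed=0):
    rng = random.Random(seed)
    return sum(third_insertion_collisions(n, rng) == 2
               for _ in range(trials)) / trials
```

For n = 10 the analysis predicts 3/n² = 0.03, and the simulated frequency lands close to that value.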

2. Given that k elements have to be stored using a hash function with a target space of size n, what is the
probability of the hash function having an inherent collision? What is an estimate of the probability
of a collision during the insertion of n elements?

Solution: To compute the probability, we first count the negative case: no collisions at
all. This means that the k keys get mapped to k distinct hash values out of n. We assume
that for a random key and for any constant c, P(hash(key) = c) = 1/n.
The first key has n choices of hash values. Given each choice of the first key, the second
key has n − 1 possible hash values such that there is no collision. Given the first two
keys and hash values, the third has n − 2 choices, and so on, until the kth key has n − k + 1
choices. Overall, the ordered k-tuple has ∏_{i=0}^{k−1} (n − i) choices for no collision. All
those are mutually exclusive events with likelihood (1/n)^k each, being the product of k
probabilities of 1/n for k independent events and their respective outcomes. Thus,

P(No collisions) = n! / ((n − k)! · n^k)

One minus this expression gives the likelihood of a collision.


If k = n, the expression boils down to n!/n^n. We apply Stirling's approximation here:

√(2πn) · (n/e)^n · e^(1/(12n+1)) < n! < √(2πn) · (n/e)^n · e^(1/(12n))

Dividing through by n^n:

√(2πn) · e^(−n) · e^(1/(12n+1)) < n!/n^n < √(2πn) · e^(−n) · e^(1/(12n))

For large n, we can neglect the terms e^(1/(12n+1)) and e^(1/(12n)), as both are very close to
1. However, the term √(2πn) · e^(−n) tends to zero as n grows very large. This means that
the required probability is sandwiched between terms that exponentially decay toward
zero.
The implication of the above is that the likelihood that all n keys get uniquely mapped to
the n possible hash values decays exponentially with n, and thus collisions are almost
certain for large n. This can be linked to the birthday paradox: in a class of 365 students,
it is almost impossible for all of them to have different birthdays, and even a single
tutorial batch has a high likelihood of at least two birthdays coinciding. In the context
of hashing, although the expected number of keys sharing a hash value is one per hash
value, we expect a good number of collisions to occur during hashing.
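The closed form above is cheap to evaluate directly, which makes the birthday-paradox connection concrete. A small sketch (the function name is mine; computing the probability as a running product avoids huge factorials):

```python
import math

def p_no_collision(n, k):
    """Exact probability that k uniformly hashed keys occupy k distinct
    slots out of n, i.e. n! / ((n - k)! * n^k), computed as a product."""
    p = 1.0
    for i in range(k):
        p *= (n - i) / n
    return p

# Birthday paradox: with n = 365 slots, just 23 keys already push the
# collision probability above 1/2.
# For k = n, the value matches the Stirling estimate sqrt(2*pi*n) * e^(-n)
# up to the small correction factors e^(1/(12n+1)) and e^(1/(12n)).
```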

3. Let C(i) be the chain of array indices that are queried while looking for a key k in linear probing,
where h(k) = i.

(a) How does this chain extend by an insertion, and how does it change by a deletion?
(b) A search for a key k ends when an empty cell is encountered. What if we mark the end of C (i)
with an end marker? We stop the search when this marker is encountered. Would this work?
Would this be efficient?
(c) Is there a way of not using tombstones?

Solution: To recall, linear probing covers indices i, i + 1, ... until either an empty index is
found (then k is not in the table) or a cell storing k is found. Note that tombstones do
not count as empty cells in the hash table; counting them as empty after a deletion
would break existing chains and result in false negatives when searching. Counting
them as filled means deletion causes no change to these chains, answering part of
subproblem (a).
For insertions: if the chain already terminates at k (key found), then inserting a new
element does not change anything. If the inserted element is k itself, then it occupies
the vacant cell at the end of the chain, so again there is no change. Otherwise, the chain
extends by one if the inserted element lands on the empty cell at the end of the chain,
and it can extend much further if that insertion joins the existing chain to another run
of filled cells in the table. Note that the chain, except for its last cell, consists only of
filled or tombstone cells; thus, an insertion cannot occur in the middle of the chain. If
an element other than k is inserted anywhere other than the end of the chain for k, then
the chain is unchanged.
The answer to part (b) is that it would work but would be no more efficient than the
standard algorithm. For this to work, an empty cell, once encountered during a search,
must be marked as an 'end'; otherwise, the first search would go into an infinite loop
looking for an end marker. Given that, every search must check for both end markers
and empty cells in each iteration, making it slightly less efficient while staying at the
same asymptotic complexity.
For part (c), a slight modification should be made to the algorithm discussed above:
all cells should be marked 'end' at the start. Otherwise, a deletion without a tombstone
would cause false negatives in future searches, exactly as in the no-tombstone discussion
above. With this modification, we have essentially renamed "empty" to "end" and
"tombstone" to "empty". The algorithm is otherwise identical to the normal algorithm
with tombstones, and hence we do not really avoid tombstones, although we do not
use them by name. Note that marking freed cells as 'end' on deletion would be just as
bad as not using tombstones, resulting in false negatives.
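The chain and tombstone behaviour described above can be sketched as a minimal linear-probing table (a sketch, not a production implementation: it assumes Python's built-in hash, stores bare keys, and does no resizing; all names are mine):

```python
EMPTY, TOMBSTONE = object(), object()

class LinearProbingTable:
    """Minimal linear-probing hash table with tombstone deletion."""
    def __init__(self, m):
        self.slots = [EMPTY] * m
        self.m = m

    def _probe(self, key):
        i = hash(key) % self.m
        for _ in range(self.m):     # visit at most m cells
            yield i
            i = (i + 1) % self.m

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            s = self.slots[i]
            if s is TOMBSTONE and first_free is None:
                first_free = i      # remember, but keep scanning for key
            elif s is EMPTY:
                self.slots[first_free if first_free is not None else i] = key
                return
            elif s is not TOMBSTONE and s == key:
                return              # already present
        if first_free is not None:
            self.slots[first_free] = key
            return
        raise RuntimeError("table full")

    def search(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:      # tombstones do NOT stop the scan
                return False
            if self.slots[i] is not TOMBSTONE and self.slots[i] == key:
                return True
        return False

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return
            if self.slots[i] is not TOMBSTONE and self.slots[i] == key:
                self.slots[i] = TOMBSTONE   # keep the chain intact
                return
```

Deleting a key leaves a tombstone, so keys later in the chain stay reachable; a later insertion may reuse the tombstone slot.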

4. Let m = 11, h1(k) = (k mod 11), h2(k) = 6 − (k mod 6).


Let us use the following hash function for an open addressing scheme:

h(k, i) = (h1(k) + i · h2(k)) mod m

1. What will be the state of the table after insertions of 41, 22, 44, 59, 32, 31, and 74?
2. Let h2 (k) = p − (k mod p). What should be the relationship between p and m such that h is a
valid function for linear probing?
3. What is the average number of probes for an unsuccessful search if the table has load factor α?
4. What is the average time for a successful search?

Solution: To recall, standard linear probing uses the function h(k, i) = (h(k) + i) mod m,
where m is the size of the hash table (here 11). The way it works is that we check the
cells indexed at h(k, 0), h(k, 1), h(k, 2), ... until we find an empty cell h(k, a) for some a.
The same logic applies here, except with h(k, i) as defined in the problem statement.
Let the table before adding 41 be empty:

index: 0   1   2   3   4   5   6   7   8   9   10
       -   -   -   -   -   -   -   -   -   -   -

Now, h1(41) = 8 and h2(41) = 1. This means that h(41, 0) = 8 and cell 8 is empty. So,
we add 41 there:

index: 0   1   2   3   4   5   6   7   8   9   10
       -   -   -   -   -   -   -   -   41  -   -

h1(22) = 0 and cell 0 is empty, so we can add 22 there. [h(k, 0) = h1(k)]

index: 0   1   2   3   4   5   6   7   8   9   10
       22  -   -   -   -   -   -   -   41  -   -

h1(44) = 0, but cell 0 is filled. Now, h2(44) = 4 and h(44, 1) = 4, which is an empty cell.

index: 0   1   2   3   4   5   6   7   8   9   10
       22  -   -   -   44  -   -   -   41  -   -

h1(59) = 4, which is a filled cell. h2(59) = 1 and h(59, 1) = 5 is an empty cell.

index: 0   1   2   3   4   5   6   7   8   9   10
       22  -   -   -   44  59  -   -   41  -   -

h1(32) = 10 and h1(31) = 9 are both empty and distinct cells, so we can add them directly.

index: 0   1   2   3   4   5   6   7   8   9   10
       22  -   -   -   44  59  -   -   41  31  32

Lastly, h1(74) = 8, which is occupied. h2(74) = 4. Now, h(74, 1) = 1, which is free.

index: 0   1   2   3   4   5   6   7   8   9   10
       22  74  -   -   44  59  -   -   41  31  32

This is how the table looks after inserting the elements in the given order.
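The hand insertion above can be replayed mechanically. A short sketch with the given h1 and h2 hard-coded (the function name is mine):

```python
def insert_all(keys, m=11):
    """Insert keys by open addressing with h(k, i) = (h1(k) + i*h2(k)) mod m,
    where h1(k) = k mod 11 and h2(k) = 6 - (k mod 6), as in the problem."""
    table = [None] * m
    for k in keys:
        i = 0
        while True:
            idx = (k % 11 + i * (6 - k % 6)) % m
            if table[idx] is None:   # first free cell in the probe sequence
                table[idx] = k
                break
            i += 1
    return table

# insert_all([41, 22, 44, 59, 32, 31, 74])
# -> [22, 74, None, None, 44, 59, None, None, 41, 31, 32]
```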
Moving on to the next part: a valid probing function should eventually cover all cells
of the table as i increments. This is not possible if gcd(h2(k), m) > 1 for one or more
values of k. The reason is that h(k, i) mod gcd(h2(k), m) is then fixed, and hence h(k, i) mod m
can only take some of the values. Conversely, if gcd(h2(k), m) = 1, then the m values
h2(k) · i mod m for 0 ≤ i < m are all distinct, so all m cells of the table are covered.
Note that gcd(h2(k), m) = 1 should hold for all values of k, not just for any one k.
With h2(k) = p − (k mod p), h2 can take any value in 1, 2, ..., p depending on the value
of k. Thus, none of the first p natural numbers should share a common factor with m.
In other words, the smallest prime factor (not 1) of m, the size of the hash table, should
be larger than p. It is not necessary that m is prime – for p = 6, m = 7 · 13 = 91 also
works.
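The gcd condition is easy to verify exhaustively for small cases. A brute-force sketch (`covers_all` is my own helper, not from the tutorial):

```python
from math import gcd

def covers_all(step, m):
    """Does a probe sequence with increment `step` visit all m slots?
    This holds exactly when gcd(step, m) == 1."""
    return len({(i * step) % m for i in range(m)}) == m

# For p = 6, every possible h2 value 1..6 must be coprime with m.
# m = 91 = 7 * 13 works (its smallest prime factor 7 exceeds 6),
# whereas an increment of 7 with m = 91 visits only 13 of the 91 slots.
```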
The exact expression for the expected number of probes in a search (unsuccessful or
successful) is mathematically complex, hence we prove appropriate upper bounds instead.
Claim: The expected number of probes in an unsuccessful search is at most 1/(1 − α),
assuming α < 1.
Proof: Let X be the number of probes made in an unsuccessful search, and let A_i be
the event that an ith probe occurs and it is to an occupied slot. Then X ≥ i when the
event A_1 ∩ A_2 ∩ ... ∩ A_{i−1} occurs. Hence,

P(X ≥ i) = P(A_1 ∩ A_2 ∩ ... ∩ A_{i−1})
         = P(A_1) · P(A_2 | A_1) · P(A_3 | A_1 ∩ A_2) · ... · P(A_{i−1} | A_1 ∩ ... ∩ A_{i−2})

P(A_1) = n/m because there are n elements and m slots. For j > 1, the probability
that there is a jth probe and it is to an occupied slot, given that the first j − 1 probes
were to occupied slots, is (n − j + 1)/(m − j + 1), because we are finding one of the
remaining n − (j − 1) elements in one of the m − (j − 1) unexamined slots, and under
uniform hashing each of these slots is equally likely to be probed next.
Because we have n < m, we have (n − j)/(m − j) ≤ n/m for all 0 ≤ j < n. Thus,

P(X ≥ i) = (n/m) · ((n − 1)/(m − 1)) · ... · ((n − i + 2)/(m − i + 2)) ≤ (n/m)^(i−1) = α^(i−1)

So E[X] = Σ_{i≥1} P(ith probe occurs) = Σ_{i≥1} P(X ≥ i) ≤ Σ_{i=1}^{∞} α^(i−1) = 1/(1 − α).
Hence proved.
Claim: The expected number of probes for a successful search is at most (1/α) ln(1/(1 − α)).
Proof: A search for a key k reproduces the same probe sequence as when the element
with key k was inserted. If k was the (i + 1)th key inserted into the hash table, there
were only i keys in the table at the time, so the expected number of probes made in
a search for k is at most 1/(1 − i/m) = m/(m − i) (as an insertion is equivalent to an
unsuccessful search, here with α = i/m). Averaging over all n keys in the hash table, we get

(1/n) Σ_{i=0}^{n−1} m/(m − i) = (m/n) Σ_{i=0}^{n−1} 1/(m − i)
                              = (1/α) Σ_{k=m−n+1}^{m} 1/k
                              ≤ (1/α) ∫_{m−n}^{m} (1/x) dx
                              = (1/α) ln(m/(m − n))
                              = (1/α) ln(1/(1 − α))

Hence proved.
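The unsuccessful-search bound can be sanity-checked under the idealized uniform-hashing model used in the proof: a search examines distinct random slots until it hits an empty one. A simulation sketch (function names and parameters are mine):

```python
import random

def unsuccessful_probes(m, n, rng):
    """Probes made by one unsuccessful search under ideal uniform hashing:
    scan distinct slots in random order until an empty slot is found
    (n of the m slots are occupied)."""
    slots = [True] * n + [False] * (m - n)
    rng.shuffle(slots)
    for count, occupied in enumerate(slots, start=1):
        if not occupied:
            return count

def avg_unsuccessful(m, n, trials=20_000, seed=7):
    rng = random.Random(seed)
    return sum(unsuccessful_probes(m, n, rng) for _ in range(trials)) / trials

# With m = 100, n = 90 (alpha = 0.9), the bound 1/(1 - alpha) = 10;
# the simulated average stays below it, near (m + 1)/(m - n + 1) ~ 9.2.
```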
5. Suppose you want to store a large set of key-value pairs, for example, (name, address). The set
supports the operations of addition, deletion, and search of elements. You also have queries about
whether a particular name or address is in the set, and if so, you must count the matching entries and
delete all of them. How would you design your hash tables?

Solution: Let us recall that in classic hashing, the hash function operates on the
key, and the value is simply stored alongside the key. An intuitive idea is to treat the
pair of both the name and the address as the key (from the point of view of hashing),
with no separate value. The hash function operates on this pair and gives us the index.
Unfortunately, this gives rise to a problem: a given name can appear with multiple
different addresses, and hence multiple hash values, so searching for the existence of
an entry given only a name (but not an address – we do not know which addresses
accompany this name) is as bad as linearly iterating through the table.
One way to get around this is to use a two-level hash table: the outer table N is keyed
by the names alone, and the value corresponding to each name points to an inner hash
table. This inner table N[name] stores only the addresses paired with that name. This
makes finding elements by name easy but does not solve the problem of searching by
address. To solve that, we can use a double two-level hash table: one two-level table N
as described above, and one two-level table A done the other way round, i.e., A is keyed
by addresses, and its inner tables A[address] store the names paired with each address.
Note that this requires additional (roughly double) storage space and more maintenance.
Specifically, it is important to ensure integrity: for every (name, address) pair in the
set, N [name] should exist and contain address, and A[ address] should exist and contain
name. Insertion should be done in both tables at once. On the other hand, if any
(name, address) pair does not exist, then A[ address], if it exists, should not contain name,
and N [name], if it exists, should not contain address. This should be reflected in the
deletion of pairs.
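A compact way to realize the double two-level design, using hash sets as the inner tables (a sketch; the class and method names are my own, and Python dicts/sets stand in for the hash tables):

```python
from collections import defaultdict

class NameAddressSet:
    """Double two-level design: one table keyed by name, a mirror keyed
    by address, kept in sync on every update."""
    def __init__(self):
        self.by_name = defaultdict(set)      # name -> set of addresses
        self.by_address = defaultdict(set)   # address -> set of names

    def add(self, name, address):
        self.by_name[name].add(address)
        self.by_address[address].add(name)

    def remove(self, name, address):
        self.by_name[name].discard(address)
        if not self.by_name[name]:
            del self.by_name[name]           # drop empty inner tables
        self.by_address[address].discard(name)
        if not self.by_address[address]:
            del self.by_address[address]

    def count_name(self, name):
        return len(self.by_name.get(name, ()))

    def delete_all_with_address(self, address):
        """Count and delete every pair carrying this address."""
        names = self.by_address.pop(address, set())
        for name in names:
            self.by_name[name].discard(address)
            if not self.by_name[name]:
                del self.by_name[name]
        return len(names)
```

Every update touches both tables, which is exactly the integrity invariant described above.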
