Algorithms and Data Structures Sample 2
Problem Set 3
Notice that one problem is marked noncollaborative. As you might expect, this problem should be done without any collaboration.
Problem 1. Augment the van Emde Boas priority queue to support the following operations on integers in the range {0, 1, 2, . . . , u − 1} in O(log log u) worst-case time each and O(u) space total:
Find (x, Q): Report whether the element x is stored in the structure.
Predecessor (x, Q): Return x’s predecessor, the element of largest value less than x, or
null if x is the minimum element.
Successor (x, Q): Return x’s successor, the element of smallest value greater than x, or
null if x is the maximum element.
Problem 3. In class we considered building a depth-k, base-Δ (implicit) trie over integers in the range from 1 to C (where Δ = C^{1/k}) that supported insert in time O(k) and delete-min in time O(Δ). By choosing k and Δ appropriately we found shortest paths in O(m + n log C) time. We now improve this bound. Consider modifying the delete-min operation, where we scan forward through a trie node and reach a new bucket of items. If that bucket has more than t items in it, we expand it to multiple buckets in a node at the next trie level down as before. But if there are fewer than t items, we simply store them in a heap. During inserts or decrease-keys, new items may be added to the heap, and if the heap size grows beyond t, we expand it to a trie node of buckets as before.
(a) Let I(t), D(t), X(t) denote the times to insert, decrease a key, and extract the minimum in a heap of size at most t. Prove that the amortized times for operations in the modified structure are O(k + kΔ/t + I(t)) per insert and O(X(t)) per extract-min.
(b) Using a Fibonacci heap for the small buckets, show how to choose k, Δ, and t to compute shortest paths in O(m + n √(log C)) time.
Problem 4. Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given n items, allocate two arrays of size n^{1.5}. When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.
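To make the definition concrete, here is a minimal Python sketch of a bashing insert (an illustration we add; the class name is ours, and Python's built-in hash() over salted tuples merely stands in for the two hash functions):

    class Bash:
        """Bi-bucket hashing: each item has one candidate bucket in each
        of two arrays of size n^1.5 and goes to the emptier of the two."""
        def __init__(self, n):
            self.size = int(n ** 1.5)
            self.a = [[] for _ in range(self.size)]  # first bucket array
            self.b = [[] for _ in range(self.size)]  # second bucket array

        def insert(self, item):
            i = hash(("a", item)) % self.size  # candidate bucket in a
            j = hash(("b", item)) % self.size  # candidate bucket in b
            # place the item in the emptier of its two candidate buckets
            target = self.a[i] if len(self.a[i]) <= len(self.b[j]) else self.b[j]
            target.append(item)

        def find(self, item):
            # an item can only ever reside in one of its two candidate buckets
            i = hash(("a", item)) % self.size
            j = hash(("b", item)) % self.size
            return item in self.a[i] or item in self.b[j]

Note that a lookup inspects at most two buckets, one per array.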
(a) Suppose a random function is used to map each item to buckets. Give a good upper bound on the expected number of collisions. Hint: what is the probability that the kth inserted item collides with some previously inserted item?
(b) Argue that bashing can be implemented efficiently, with the same expected outcome, using 2-universal hash functions.
(c) Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of n items. How large is the description of the resulting function?
OPTIONAL (d) Generalize the above approach to use less space by exploiting tri-bucket hashing, or trashing.
OPTIONAL Problem 5. Our bucketing data structures (and in particular van Emde Boas queues) use arrays, and we never worried about the time taken to initialize them. Devise a way to avoid initializing large arrays. More specifically, develop a data structure that holds n items according to an index i ∈ {1, . . . , n} and supports the following operations in O(1) time (worst case) per operation:
init: Initialize the structure to hold no items.
set(i, x): Store item x at index i.
get(i): Return the item stored at index i, or null if no item has been stored there.
Your data structure should use O(n) space and should work regardless of what garbage values are stored in that space at the beginning of the execution. Hint: use extra space to remember which entries of the array have been initialized.
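The hint describes the classic constant-time initialization trick. Below is a minimal Python sketch (the class and method names are ours; Python cannot expose truly uninitialized memory, so the placeholder contents of data and when stand in for garbage, and correctness never relies on them):

    class NoInitArray:
        """O(1) init/set/get without initializing the underlying arrays."""
        def __init__(self, n):
            # In a language with raw memory, data and when would be
            # allocated but NOT initialized; these fills are stand-ins.
            self.data = [None] * n
            self.when = [0] * n
            self.stack = []  # indices genuinely written so far, in order

        def init(self):
            # Re-initialization is O(1): emptying the stack invalidates
            # every index without touching data or when.
            self.stack = []

        def _written(self, i):
            # Index i is genuinely initialized iff when[i] points into
            # the stack AND that stack entry points back at i; arbitrary
            # garbage in when[i] fails this cross-check.
            t = self.when[i]
            return isinstance(t, int) and 0 <= t < len(self.stack) and self.stack[t] == i

        def set(self, i, x):
            if not self._written(i):
                self.when[i] = len(self.stack)
                self.stack.append(i)
            self.data[i] = x

        def get(self, i):
            return self.data[i] if self._written(i) else None

The cross-check is what makes garbage harmless: a stray value in when[i] either points outside the stack or at a stack entry recording a different index.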
OPTIONAL Problem 6. Can a van Emde Boas type data structure be combined with some ideas from Fibonacci heaps to support insert/decrease-key in O(1) time and delete-min in O(log log u) time?
Massachusetts Institute of Technology Handout 8
6.854J/18.415J: Advanced Algorithms Wednesday, September 28, 2005
David Karger
Problem Set 3 Solutions
Problem 2. Let us first recall Dijkstra’s algorithm. Given a weighted graph G = (V, E) and a node s ∈ V, it finds the length of the shortest path from s to every node u ∈ V. The algorithm works by maintaining a set S of nodes for which the shortest path from s is already known. During each iteration, the algorithm chooses the node u ∈ V \ S whose estimated distance from s is minimum. Then, for each node w adjacent to u, the algorithm sets the estimated distance from s to w to be the minimum of its current estimated distance and the distance obtained by going from s to u and then using the edge (u, w).
The key property to note in the above algorithm is that the maximum difference between the estimated distances of any two nodes in the priority queue holding V \ S is at most C, where C is the maximum edge length. This can be proven by induction on the number of nodes in S.
The solution then is to use two vEB queues, each over a universe of size C. The “left” queue will store the smaller values, and the “right” queue will store the larger values. When we want to insert an estimated distance into the queue, we insert only the distance mod C. Along with each queue, we store the “base” value of that queue, which is a multiple of C. By the key property above, we never have to worry about filling a third queue, since the difference between any two values in the queues is at most C.
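The bookkeeping can be sketched as follows (our illustration, not from the handout; binary heaps stand in for the vEB queues, since only the base/wraparound logic is the point here):

    import heapq

    class MonotonePQ:
        """Two queues over {0, ..., C-1} with bases that are consecutive
        multiples of C; monotonicity keeps every key within C of the
        current minimum, so two queues always suffice."""
        def __init__(self, C):
            self.C = C
            self.q = [[], []]    # "left" and "right" queues, keys mod C
            self.base = [0, C]   # base value of each queue

        def insert(self, key):
            i = 0 if key < self.base[0] + self.C else 1
            assert self.base[i] <= key < self.base[i] + self.C
            heapq.heappush(self.q[i], key - self.base[i])  # key mod C

        def extract_min(self):
            if not self.q[0]:             # left queue exhausted:
                self.q.reverse()          # right queue becomes left,
                self.base.reverse()       # empty queue is recycled
                self.base[1] = self.base[0] + self.C
            return self.base[0] + heapq.heappop(self.q[0])

Since every inserted key lies in [base[0], base[0] + 2C), one of the two queues always accepts it, and no third queue is ever needed.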
Problem 3.
(a) We will present how to perform the operations, and how to amortize the cost.
Insert We associate with each inserted item a potential of k + kΔ/t. The additive term k will be spent on the constant-time operations performed each time an item goes one level down, and the term kΔ/t will reimburse the cost of scanning nodes. A new node is created when the number of elements in a heap exceeds t. Each of these elements donates Δ/t units of potential and goes one level down, so in total we collect Δ units, which we assign to the newly created node as a reimbursement for scanning it. When performing an insert of x, we descend down the trie, spending x’s potential, until we meet either a blob, into which we insert x in constant time, or a heap, into which we insert x in O(I(t)) time, or which is expanded down one level, with the heap’s elements paying for the expansion. Altogether, the amortized cost of the operation is O(k + kΔ/t + I(t)).
Extract-min Starting from the last entry from which we extracted a minimum, we scan for the first nonempty entry, possibly going down a level if the starting entry has meanwhile been expanded. In total, we scan each entry once, and the scanning is paid for by the potential assigned to nodes. We eventually find a heap or a blob; if the latter, we convert it into a heap, and if an expansion occurs, we continue scanning. Finally we reach a nonempty heap, from which we remove the minimum in O(X(t)) time. Everything except this final removal is charged to the potential, so the amortized cost is only O(X(t)).
(b) We perform at most n inserts, n extract-mins, and m decrease-keys, for a total time of
O(n(k + kΔ/t + I(t)) + n X(t) + m (D(t) + I(t))).
Since in the case of Fibonacci heaps I(t) = O(1), D(t) = O(1), and X(t) = O(log t), we can simplify our upper bound to
O(m + n(k + kΔ/t + log t)).
We now substitute Δ = C^{1/k} and t = 2^k, achieving
R = O(m + n(k + k·C^{1/k}/2^k)).
We transform the last term, knowing that k = √(log C):
k·C^{1/k}/2^k = √(log C) · 2^{(log C)/√(log C)} / 2^{√(log C)} = √(log C) · 2^{√(log C)} / 2^{√(log C)} = √(log C).
Eventually, we achieve
R = O(m + n·√(log C)).
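As an illustrative instance (numbers ours, not from the handout): for C = 2^{64} we get k = √(log C) = 8, Δ = C^{1/8} = 256, and t = 2^8 = 256, so the per-insert amortized cost k + kΔ/t + log t is 8 + 8 + 8 = 24, i.e. O(√(log C)) as claimed.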
Problem 4. (a) Consider the (k + 1)st item inserted. Since at most k buckets are occupied in each array, the probability that both candidate locations are occupied is at most (k/n^{1.5})^2. Thus, the expected number of times an item is inserted into an already-occupied bucket is at most
Σ_{k=0}^{n−1} (k/n^{1.5})^2 = (n − 1)·n·(2n − 1)/(6n^3) ≤ 1/3.
Now let’s consider pairwise collisions. Item k collides with item j < k only if (i) one of the candidate locations of item k is the same as a location of item j (this has probability at most 2/n^{1.5}) and (ii) the other candidate location for item k contains at least one element (probability at most k/n^{1.5}). Thus, the probability that k collides with j is at most 2k/n^3. Summing over the k possible values of j < k, we find the expected number of collisions for item k is at most 2k^2/n^3. Summing over all k, we get the same result as above: O(1) expected collisions.
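This bound is easy to check empirically; here is a short simulation we add for illustration (random.randrange models the idealized random function of part (a)):

    import random

    def avg_bash_collisions(n, trials=50):
        """Insert n items into two arrays of size n^1.5 with random
        candidate buckets, placing each in the emptier bucket; return
        the average number of insertions that land in an occupied
        bucket. The analysis above predicts at most about 1/3."""
        size = int(n ** 1.5)
        total = 0
        for _ in range(trials):
            occ_a, occ_b = [0] * size, [0] * size
            for _ in range(n):
                i, j = random.randrange(size), random.randrange(size)
                if occ_a[i] <= occ_b[j]:     # place in the emptier bucket
                    if occ_a[i] > 0:
                        total += 1           # bucket already occupied
                    occ_a[i] += 1
                else:
                    if occ_b[j] > 0:
                        total += 1
                    occ_b[j] += 1
        return total / trials

    print(avg_bash_collisions(1000))  # typically well below 1/3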
(b) Start with a 2-universal family of hash functions mapping n items to 2n^{1.5} locations, and choose a random function from the family. The probability that item k collides with any fixed earlier item is then at most 1/(2n^{1.5}), essentially as in the random-function analysis above.
Now suppose that we allocate two arrays of size 2n^{1.5} and choose a random 2-universal hash function from the family independently for each array. If an item has no collision in either array, then it will be placed in an empty bucket by the bash function. We need merely analyze the probability that this happens for every item (this would make the bash function perfect).
The probability that item k has a collision in both arrays is at most (n/(2n^{1.5}))^2 = (1/(2√n))^2 = 1/(4n). It follows that the expected number of items colliding with some other item is at most n · 1/(4n) = 1/4. By Markov’s inequality, with probability at least 3/4 every item is placed in an empty bucket by the (perfect) bash function. This in turn implies that some pair of 2-universal hash functions defines a perfect bash for our set of n items.
Since every set of items gets a perfect bash from this scheme with constant probability, it follows that the family of pairs of 2-universal functions above is a perfect bash family. Since the 2-universal family has size polynomial in the universe, so does the family of pairs of 2-universal functions.
(c) If we map our n items to k candidate locations in an array of size n^{1+1/k}, the collision odds work out as above and we get a constant expected number of collisions. A random choice of k 2-universal hash functions, mapping into arrays of total size O(n^{1+1/k}), has a constant probability of being perfect for any particular set of items, so the set of all such functions provides a perfect family (of polynomial size for any constant k). This gives a tradeoff of k probes for perfect hashing in space O(n^{1+1/k}).
Note that while we can achieve perfect hashing in O(n) space, the resulting family does not have polynomial size (since a different, subsidiary hash function must be chosen for each second-level hash table).
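For concreteness, the linear-expected-time search of part (c) can be sketched as follows (our illustration; salted calls to Python’s hash() stand in for drawing functions from a 2-universal family):

    import random

    def find_perfect_bash(items):
        """Resample pairs of hash functions until the induced bash is
        perfect, i.e. every item finds an empty candidate bucket. By the
        analysis in (b), each attempt succeeds with probability at least
        3/4, so the expected number of passes is O(1) and the expected
        total time is O(n), ignoring array initialization."""
        n = len(items)
        size = 2 * int(n ** 1.5)
        while True:
            sa, sb = random.random(), random.random()  # per-array "functions"
            a, b = [None] * size, [None] * size
            for x in items:
                i, j = hash((sa, x)) % size, hash((sb, x)) % size
                if a[i] is None:       # the emptier bucket is the empty one
                    a[i] = x
                elif b[j] is None:
                    b[j] = x
                else:
                    break              # both candidate buckets occupied
            else:
                return sa, sb          # this pair defines a perfect bash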