
Chapter 13
Randomized Algorithms

Slides by Kevin Wayne.
Copyright © 2005 Pearson-Addison Wesley. All rights reserved.

Randomization

Algorithmic design patterns.
- Greedy.
- Divide-and-conquer.
- Dynamic programming.
- Network flow.
- Randomization.

Randomization. Allow fair coin flip in unit time.
(in practice, access to a pseudo-random number generator)

Why randomize? Can lead to the simplest, fastest, or only known algorithm for a particular problem.

Ex. Symmetry-breaking protocols, graph algorithms, quicksort, hashing, load balancing, Monte Carlo integration, cryptography.

13.1 Contention Resolution

Contention Resolution in a Distributed System

Contention resolution. Given n processes P1, …, Pn, each competing for access to a shared database. If two or more processes access the database simultaneously, all processes are locked out. Devise a protocol to ensure all processes get through on a regular basis.

Restriction. Processes can't communicate.

Challenge. Need a symmetry-breaking paradigm.

[Figure: processes P1, P2, …, Pn all contending for access to one shared database.]
Contention Resolution: Randomized Protocol

Protocol. Each process requests access to the database at time t with probability p = 1/n.

Claim. Let S[i, t] = event that process i succeeds in accessing the database at time t. Then 1/(e·n) ≤ Pr[S(i, t)] ≤ 1/(2n).

Pf. By independence, Pr[S(i, t)] = p(1-p)^(n-1).
(process i requests access; none of the remaining n-1 processes request access)
- Setting p = 1/n (the value that maximizes Pr[S(i, t)]), we have Pr[S(i, t)] = (1/n)(1 - 1/n)^(n-1). ▪
  (the factor (1 - 1/n)^(n-1) lies between 1/e and 1/2)

Useful facts from calculus. As n increases from 2:
- (1 - 1/n)^n converges monotonically from 1/4 up to 1/e.
- (1 - 1/n)^(n-1) converges monotonically from 1/2 down to 1/e.

Contention Resolution: Randomized Protocol

Claim. The probability that process i fails to access the database in e·n rounds is at most 1/e. After e·n·(c ln n) rounds, the probability is at most n^(-c).

Pf. Let F[i, t] = event that process i fails to access the database in rounds 1 through t. By independence and the previous claim,
Pr[F(i, t)] ≤ (1 - 1/(en))^t.
- Choose t = ⌈e·n⌉: Pr[F(i, t)] ≤ (1 - 1/(en))^⌈en⌉ ≤ (1 - 1/(en))^(en) ≤ 1/e.
- Choose t = ⌈e·n⌉ · ⌈c ln n⌉: Pr[F(i, t)] ≤ (1/e)^(c ln n) = n^(-c). ▪
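The failure probability above is easy to check empirically. Below is a minimal Python sketch (not from the slides): simulate_contention is a hypothetical helper that estimates the probability that one fixed process fails to get access in a given number of rounds under the p = 1/n protocol.

import math
import random

def simulate_contention(n: int, rounds: int, trials: int = 1000) -> float:
    """Estimate Pr[process 0 fails in every one of `rounds` rounds] when each
    of n processes requests access independently with probability 1/n."""
    failures = 0
    for _ in range(trials):
        succeeded = False
        for _ in range(rounds):
            # Process 0 succeeds iff it requests and no other process requests.
            requests = [random.random() < 1.0 / n for _ in range(n)]
            if requests[0] and sum(requests) == 1:
                succeeded = True
                break
        if not succeeded:
            failures += 1
    return failures / trials

if __name__ == "__main__":
    n = 50
    t = math.ceil(math.e * n)
    print("empirical Pr[fail in e*n rounds]:", simulate_contention(n, t))
    print("bound 1/e:", 1 / math.e)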

Contention Resolution: Randomized Protocol

Claim. The probability that all processes succeed within 2e·n ln n rounds is at least 1 - 1/n.

Pf. Let F[t] = event that at least one of the n processes fails to access the database in any of the rounds 1 through t.

Pr[F[t]] = Pr[∪_{i=1}^{n} F[i, t]] ≤ Σ_{i=1}^{n} Pr[F[i, t]] ≤ n (1 - 1/(en))^t
(union bound; previous slide)

- Choosing t = 2⌈en⌉⌈ln n⌉ yields Pr[F[t]] ≤ n · n^(-2) = 1/n. ▪

Union bound. Given events E1, …, En, Pr[∪_{i=1}^{n} Ei] ≤ Σ_{i=1}^{n} Pr[Ei].

13.2 Global Minimum Cut
Global Minimum Cut

Global min cut. Given a connected, undirected graph G = (V, E), find a cut (A, B) of minimum cardinality.

Applications. Partitioning items in a database, identifying clusters of related documents, network reliability, network design, circuit design, TSP solvers.

Network flow solution.
- Replace every edge (u, v) with two antiparallel edges (u, v) and (v, u).
- Pick some vertex s and compute the min s-v cut separating s from each other vertex v ∈ V.

False intuition. Global min-cut is harder than min s-t cut.

Contraction Algorithm

Contraction algorithm. [Karger 1995]
- Pick an edge e = (u, v) uniformly at random.
- Contract edge e.
  – replace u and v by a single new super-node w
  – preserve edges, updating endpoints of u and v to w
  – keep parallel edges, but delete self-loops
- Repeat until the graph has just two nodes v1 and v2.
- Return the cut (all nodes that were contracted to form v1).

[Figure: contracting edge u-v merges u and v into super-node w; parallel edges are kept, self-loops deleted.]
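A minimal Python sketch of one run of the contraction algorithm (an illustration under stated assumptions, not the slides' code): karger_contract is a hypothetical helper that represents the multigraph as an edge list over nodes 0..n-1, assumes the input has no self-loops, and relabels super-nodes by brute force rather than union-find.

import random

def karger_contract(n: int, edges: list[tuple[int, int]]) -> int:
    """One run of the contraction algorithm; returns the size of the cut found."""
    super_node = list(range(n))        # super_node[v] = super-node currently containing v
    edges = list(edges)
    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)    # pick a surviving edge uniformly at random
        su, sv = super_node[u], super_node[v]
        for w in range(n):             # contract: merge super-node sv into su
            if super_node[w] == sv:
                super_node[w] = su
        # keep parallel edges, delete self-loops
        edges = [(a, b) for (a, b) in edges if super_node[a] != super_node[b]]
        remaining -= 1
    return len(edges)                  # edges crossing the final two-node cut

# Example: a 4-cycle plus a chord; the global min cut has size 2.
# A single run may return a larger cut; repeat to find the minimum.
print(karger_contract(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))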

Contraction Algorithm

Claim. The contraction algorithm returns a min cut with probability ≥ 2/n².

Pf. Consider a global min-cut (A*, B*) of G. Let F* be the edges with one endpoint in A* and the other in B*. Let k = |F*| = size of the min cut.
- In the first step, the algorithm contracts an edge in F* with probability k / |E|.
- Every node has degree ≥ k, since otherwise (A*, B*) would not be a min-cut. ⇒ |E| ≥ ½kn.
- Thus, the algorithm contracts an edge in F* with probability ≤ 2/n.

[Figure: cut (A*, B*) with crossing edges F*.]

Contraction Algorithm

Claim. The contraction algorithm returns a min cut with probability ≥ 2/n².

Pf. (continued)
- Let G' be the graph after j iterations. There are n' = n - j super-nodes.
- Suppose no edge in F* has been contracted. The min-cut in G' is still k.
- Since the value of the min-cut is k, |E'| ≥ ½kn'.
- Thus, the algorithm contracts an edge in F* with probability ≤ 2/n'.
- Let Ej = event that an edge in F* is not contracted in iteration j.

Pr[E1 ∩ E2 ∩ … ∩ E_{n-2}] = Pr[E1] · Pr[E2 | E1] · … · Pr[E_{n-2} | E1 ∩ E2 ∩ … ∩ E_{n-3}]
  ≥ (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/4)(1 - 2/3)
  = ((n-2)/n)((n-3)/(n-1)) ⋯ (2/4)(1/3)
  = 2 / (n(n-1))
  ≥ 2/n²  ▪
Contraction Algorithm

Amplification. To amplify the probability of success, run the contraction algorithm many times.

Claim. If we repeat the contraction algorithm n² ln n times with independent random choices, the probability of failing to find the global min-cut is at most 1/n².

Pf. By independence, the probability of failure is at most

(1 - 2/n²)^(n² ln n) = [ (1 - 2/n²)^(½n²) ]^(2 ln n) ≤ (e^(-1))^(2 ln n) = 1/n²
(using (1 - 1/x)^x ≤ 1/e)

Global Min Cut: Context

Remark. The overall running time is slow, since we perform Θ(n² log n) iterations and each takes Ω(m) time.

Improvement. [Karger-Stein 1996] O(n² log³ n).
- Early iterations are less risky than later ones: the probability of contracting an edge in the min cut hits 50% when n / √2 nodes remain.
- Run the contraction algorithm until n / √2 nodes remain.
- Run the contraction algorithm twice on the resulting graph, and return the best of the two cuts.

Extensions. Naturally generalizes to handle positive weights.

Best known. [Karger 2000] O(m log³ n).
(faster than the best known max flow algorithm or deterministic global min cut algorithm)
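The amplification step can be expressed as a one-line wrapper around the karger_contract sketch above (an assumption-laden illustration, not the slides' code): global_min_cut is a hypothetical helper that defaults to roughly n² ln n independent runs, matching the claim above.

import math

def global_min_cut(n, edges, runs=None):
    """Repeat the contraction algorithm and keep the best cut.
    With about n^2 ln n independent runs the failure probability is at most 1/n^2."""
    if runs is None:
        runs = int(n * n * math.log(n)) + 1
    return min(karger_contract(n, edges) for _ in range(runs))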

Expectation

13.3 Linearity of Expectation Expectation. Given a discrete random variables X, its expectation E[X]
is defined by: "
E[X ] = # j Pr[X = j]
j=0

Waiting for!a first success. Coin is heads with probability p and tails
with probability 1-p. How many independent flips X until first heads?

# # p # p 1% p 1
E[X ] = $ j " Pr[X = j] = $ j (1% p) j%1 p = $ j (1% p) j = " 2 =
j=0 j=0 1% p j=0 1% p p p
j-1 tails 1 head

16
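The E[X] = 1/p formula is easy to sanity-check by simulation. A minimal Python sketch (flips_until_heads is a hypothetical helper, not from the slides):

import random

def flips_until_heads(p: float) -> int:
    """Number of independent coin flips until the first heads (heads w.p. p)."""
    flips = 1
    while random.random() >= p:
        flips += 1
    return flips

if __name__ == "__main__":
    p, trials = 0.2, 100_000
    avg = sum(flips_until_heads(p) for _ in range(trials)) / trials
    print(avg, "vs 1/p =", 1 / p)   # empirical mean should be close to 5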
Expectation: Two Properties

Useful property. If X is a 0/1 random variable, E[X] = Pr[X = 1].

Pf. E[X] = Σ_{j=0}^{∞} j · Pr[X = j] = Σ_{j=0}^{1} j · Pr[X = j] = Pr[X = 1].

Linearity of expectation. Given two random variables X and Y defined over the same probability space, E[X + Y] = E[X] + E[Y].
(X and Y are not necessarily independent)

Decouples a complex calculation into simpler pieces.

Guessing Cards

Game. Shuffle a deck of n cards; turn them over one at a time; try to guess each card.

Memoryless guessing. No psychic abilities; can't even remember what's been turned over already. Guess a card from the full deck uniformly at random.

Claim. The expected number of correct guesses is 1.
Pf. (surprisingly effortless using linearity of expectation)
- Let Xi = 1 if the ith prediction is correct and 0 otherwise.
- Let X = number of correct guesses = X1 + … + Xn.
- E[Xi] = Pr[Xi = 1] = 1/n.
- E[X] = E[X1] + … + E[Xn] = 1/n + … + 1/n = 1. ▪
  (linearity of expectation)

Guessing Cards

Game. Shuffle a deck of n cards; turn them over one at a time; try to guess each card.

Guessing with memory. Guess a card uniformly at random from the cards not yet seen.

Claim. The expected number of correct guesses is Θ(log n).
Pf.
- Let Xi = 1 if the ith prediction is correct and 0 otherwise.
- Let X = number of correct guesses = X1 + … + Xn.
- E[Xi] = Pr[Xi = 1] = 1 / (n - i + 1).
- E[X] = E[X1] + … + E[Xn] = 1/n + … + 1/2 + 1/1 = H(n). ▪
  (linearity of expectation; ln(n+1) < H(n) < 1 + ln n)

Coupon Collector

Coupon collector. Each box of cereal contains a coupon. There are n different types of coupons. Assuming all boxes are equally likely to contain each coupon, how many boxes before you have ≥ 1 coupon of each type?

Claim. The expected number of steps is Θ(n log n).
Pf.
- Phase j = time between having j and j+1 distinct coupons.
- Let Xj = number of steps you spend in phase j.
- Let X = number of steps in total = X0 + X1 + … + X_{n-1}.

E[X] = Σ_{j=0}^{n-1} E[Xj] = Σ_{j=0}^{n-1} n/(n-j) = n Σ_{i=1}^{n} 1/i = n H(n)
(in phase j, the probability of success per box is (n-j)/n ⇒ expected waiting time is n/(n-j))
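A minimal Python check of the n·H(n) bound (coupon_collector_steps is a hypothetical helper, not from the slides); the empirical average over many trials should be close to n·H(n).

import random

def coupon_collector_steps(n: int) -> int:
    """Draw coupons uniformly at random until all n types have been seen."""
    seen: set[int] = set()
    steps = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        steps += 1
    return steps

if __name__ == "__main__":
    n, trials = 100, 2000
    avg = sum(coupon_collector_steps(n) for _ in range(trials)) / trials
    harmonic = sum(1 / i for i in range(1, n + 1))
    print(avg, "vs n*H(n) =", n * harmonic)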
13.4 MAX 3-SAT

Maximum 3-Satisfiability

MAX-3SAT. Given a 3-SAT formula, find a truth assignment that satisfies as many clauses as possible.
(exactly 3 distinct literals per clause)

Ex. (five clauses over the variables x1, …, x4)
C1 = x2 ∨ x3 ∨ x4
C2 = x2 ∨ x3 ∨ x4
C3 = x1 ∨ x2 ∨ x4
C4 = x1 ∨ x2 ∨ x3
C5 = x1 ∨ x2 ∨ x4

Remark. NP-hard search problem.

Simple idea. Flip a coin, and set each variable true with probability ½, independently for each variable.

Maximum 3-Satisfiability: Analysis

Claim. Given a 3-SAT formula with k clauses, the expected number of clauses satisfied by a random assignment is 7k/8.

Pf. Consider the random variable Zj = 1 if clause Cj is satisfied, and 0 otherwise.
- Let Z = number of clauses satisfied by the assignment = Z1 + … + Zk.

E[Z] = Σ_{j=1}^{k} E[Zj]                       (linearity of expectation)
     = Σ_{j=1}^{k} Pr[clause Cj is satisfied]
     = 7k/8

(a clause with 3 distinct literals is unsatisfied only when all 3 literals are false, which happens with probability (½)³ = 1/8)

The Probabilistic Method

Corollary. For any instance of 3-SAT, there exists a truth assignment that satisfies at least a 7/8 fraction of all clauses.

Pf. A random variable is at least its expectation some of the time. ▪

Probabilistic method. We showed the existence of a non-obvious property of 3-SAT by showing that a random construction produces it with positive probability!
Maximum 3-Satisfiability: Analysis

Q. Can we turn this idea into a 7/8-approximation algorithm? In general, a random variable can almost always be below its mean.

Lemma. The probability that a random assignment satisfies ≥ 7k/8 clauses is at least 1/(8k).

Pf. Let pj be the probability that exactly j clauses are satisfied; let p be the probability that ≥ 7k/8 clauses are satisfied.

7k/8 = E[Z] = Σ_{j ≥ 0} j pj
            = Σ_{j < 7k/8} j pj + Σ_{j ≥ 7k/8} j pj
            ≤ (7k/8 - 1/8) Σ_{j < 7k/8} pj + k Σ_{j ≥ 7k/8} pj
            ≤ (7k/8 - 1/8) · 1 + k p

Rearranging terms yields p ≥ 1 / (8k). ▪

Maximum 3-Satisfiability: Analysis

Johnson's algorithm. Repeatedly generate random truth assignments until one of them satisfies ≥ 7k/8 clauses.

Theorem. Johnson's algorithm is a 7/8-approximation algorithm.

Pf. By the previous lemma, each iteration succeeds with probability at least 1/(8k). By the waiting-time bound, the expected number of trials to find the satisfying assignment is at most 8k. ▪
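A minimal Python sketch of Johnson's algorithm as described above (an illustration, not the slides' code): johnson_max_3sat is a hypothetical helper that encodes each clause as a list of nonzero integers, with literal v meaning x_v and -v meaning its negation.

import random

def johnson_max_3sat(clauses: list[list[int]], num_vars: int) -> dict[int, bool]:
    """Draw uniform random assignments until one satisfies >= 7/8 of the clauses."""
    k = len(clauses)
    target = 7 * k / 8
    while True:
        assignment = {v: random.random() < 0.5 for v in range(1, num_vars + 1)}
        satisfied = sum(
            any((lit > 0) == assignment[abs(lit)] for lit in clause)
            for clause in clauses
        )
        if satisfied >= target:
            return assignment

# Example usage on a small (hypothetical) instance.
clauses = [[2, -3, 4], [-1, 2, -4], [1, -2, 3]]
print(johnson_max_3sat(clauses, num_vars=4))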

Maximum Satisfiability

Extensions.
- Allow one, two, or more literals per clause.
- Find a max weighted set of satisfied clauses.

Theorem. [Asano-Williamson 2000] There exists a 0.784-approximation algorithm for MAX-SAT.

Theorem. [Karloff-Zwick 1997, Zwick+computer 2002] There exists a 7/8-approximation algorithm for the version of MAX-3SAT where each clause has at most 3 literals.

Theorem. [Håstad 1997] Unless P = NP, there is no ρ-approximation algorithm for MAX-3SAT (and hence MAX-SAT) for any ρ > 7/8.
(very unlikely to improve over the simple randomized algorithm for MAX-3SAT)

Monte Carlo vs. Las Vegas Algorithms

Monte Carlo algorithm. Guaranteed to run in poly-time, likely to find the correct answer.
Ex: Contraction algorithm for global min cut.

Las Vegas algorithm. Guaranteed to find the correct answer, likely to run in poly-time.
Ex: Randomized quicksort, Johnson's MAX-3SAT algorithm.

Remark. Can always convert a Las Vegas algorithm into a Monte Carlo algorithm (stop the algorithm after a certain point), but there is no known method to convert the other way.
RP and ZPP

RP. [Monte Carlo] Decision problems solvable with one-sided error in poly-time.

One-sided error.
- If the correct answer is no, always return no.
- If the correct answer is yes, return yes with probability ≥ ½.
(can decrease the probability of a false negative to 2^(-100) by 100 independent repetitions)

ZPP. [Las Vegas] Decision problems solvable in expected poly-time.
(running time can be unbounded, but on average it is fast)

Theorem. P ⊆ ZPP ⊆ RP ⊆ NP.

Fundamental open questions. To what extent does randomization help? Does P = ZPP? Does ZPP = RP? Does RP = NP?

13.6 Universal Hashing

Dictionary Data Type

Dictionary. Given a universe U of possible elements, maintain a subset S ⊆ U so that inserting, deleting, and searching in S is efficient.

Dictionary interface.
- Create(): Initialize a dictionary with S = ∅.
- Insert(u): Add element u ∈ U to S.
- Delete(u): Delete u from S, if u is currently in S.
- Lookup(u): Determine whether u is in S.

Challenge. Universe U can be extremely large, so defining an array of size |U| is infeasible.

Applications. File systems, databases, Google, compilers, checksums, P2P networks, associative arrays, cryptography, web caching, etc.

Hashing

Hash function. h : U → { 0, 1, …, n-1 }.

Hashing. Create an array H of size n. When processing element u, access array element H[h(u)].

Collision. When h(u) = h(v) but u ≠ v.
- A collision is expected after Θ(√n) random insertions. This phenomenon is known as the "birthday paradox."
- Separate chaining: H[i] stores a linked list of the elements u with h(u) = i.

[Figure: chained hash table; e.g., H[1] → jocularly → seriously, H[2] = null, H[3] → suburban → untravelled → considerating, …, H[n] → browsing.]
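A minimal Python sketch of the dictionary interface implemented with separate chaining (ChainedDictionary is a hypothetical class, not from the slides); it uses Python's built-in hash() purely for illustration, which is not a universal family.

class ChainedDictionary:
    """Separate-chaining hash set: table[i] holds the elements u with h(u) = i."""

    def __init__(self, n: int = 1024):
        self.n = n
        self.table: list[list] = [[] for _ in range(n)]

    def _h(self, u) -> int:
        return hash(u) % self.n           # stand-in hash function

    def insert(self, u) -> None:
        chain = self.table[self._h(u)]
        if u not in chain:
            chain.append(u)

    def delete(self, u) -> None:
        chain = self.table[self._h(u)]
        if u in chain:
            chain.remove(u)

    def lookup(self, u) -> bool:
        return u in self.table[self._h(u)]

d = ChainedDictionary()
d.insert("suburban")
d.insert("browsing")
print(d.lookup("suburban"), d.lookup("jocularly"))   # True False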
Ad Hoc Hash Function

Ad hoc hash function.

int h(String s, int n) {
   int hash = 0;
   for (int i = 0; i < s.length(); i++)
      hash = (31 * hash) + s.charAt(i);   // multiply-and-add over the characters
   return hash % n;
}

(hash function à la the Java string library)

Deterministic hashing. If |U| ≥ n², then for any fixed hash function h, there is a subset S ⊆ U of n elements that all hash to the same slot. Thus, Θ(n) time per search in the worst case.

Q. But isn't an ad hoc hash function good enough in practice?

Algorithmic Complexity Attacks

When can't we live with an ad hoc hash function?
- Obvious situations: aircraft control, nuclear reactors.
- Surprising situations: denial-of-service attacks.
(a malicious adversary learns your ad hoc hash function, e.g., by reading the Java API, and causes a big pile-up in a single slot that grinds performance to a halt)

Real world exploits. [Crosby-Wallach 2003]
- Bro server: send carefully chosen packets to DOS the server, using less bandwidth than a dial-up modem.
- Perl 5.8.0: insert carefully chosen strings into an associative array.
- Linux 2.4.20 kernel: save files with carefully chosen names.

Hashing Performance

Idealistic hash function. Maps m elements uniformly at random to n hash slots.
- Running time depends on the length of the chains.
- Average length of a chain = α = m / n.
- Choose n ≈ m ⇒ on average O(1) per insert, lookup, or delete.

Challenge. Achieve idealized randomized guarantees, but with a hash function where you can easily find items where you put them.

Approach. Use randomization in the choice of h.
(the adversary knows the randomized algorithm you're using, but doesn't know the random choices that the algorithm makes)

Universal Hashing

Universal class of hash functions. [Carter-Wegman 1980s]
- For any pair of elements u, v ∈ U:  Pr_{h ∈ H} [ h(u) = h(v) ] ≤ 1/n   (h chosen uniformly at random from H)
- Can select a random h efficiently.
- Can compute h(u) efficiently.

Ex. U = { a, b, c, d, e, f }, n = 2.

H = {h1, h2}:
          a  b  c  d  e  f
   h1(x)  0  1  0  1  0  1
   h2(x)  0  0  0  1  1  1

Pr_{h ∈ H} [h(a) = h(b)] = 1/2
Pr_{h ∈ H} [h(a) = h(c)] = 1
Pr_{h ∈ H} [h(a) = h(d)] = 0
...
⇒ not universal

H = {h1, h2, h3, h4}:
          a  b  c  d  e  f
   h1(x)  0  1  0  1  0  1
   h2(x)  0  0  0  1  1  1
   h3(x)  0  0  1  0  1  1
   h4(x)  1  0  0  1  1  0

Pr_{h ∈ H} [h(a) = h(b)] = 1/2
Pr_{h ∈ H} [h(a) = h(c)] = 1/2
Pr_{h ∈ H} [h(a) = h(d)] = 1/2
Pr_{h ∈ H} [h(a) = h(e)] = 1/2
Pr_{h ∈ H} [h(a) = h(f)] = 0
...
⇒ universal
Universal Hashing

Universal hashing property. Let H be a universal class of hash functions; let h ∈ H be chosen uniformly at random from H; and let u ∈ U. For any subset S ⊆ U of size at most n, the expected number of items in S that collide with u is at most 1.

Pf. For any element s ∈ S, define the indicator random variable Xs = 1 if h(s) = h(u) and 0 otherwise. Let X be the random variable counting the total number of collisions with u.

E_{h ∈ H}[X] = E[Σ_{s ∈ S} Xs] = Σ_{s ∈ S} E[Xs] = Σ_{s ∈ S} Pr[Xs = 1] ≤ Σ_{s ∈ S} 1/n = |S| · (1/n) ≤ 1  ▪
(linearity of expectation; Xs is a 0-1 random variable; universality; assumes u ∉ S)

Designing a Universal Family of Hash Functions

Theorem. [Chebyshev 1850] There exists a prime between n and 2n.

Modulus. Choose a prime number p ≈ n. (no need for randomness here)

Integer encoding. Identify each element u ∈ U with a base-p integer of r digits: x = (x1, x2, …, xr).

Hash function. Let A = set of all r-digit, base-p integers. For each a = (a1, a2, …, ar) where 0 ≤ ai < p, define

ha(x) = ( Σ_{i=1}^{r} ai xi ) mod p

Hash function family. H = { ha : a ∈ A }.
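A minimal Python sketch of drawing one function from this family (an illustration, not the slides' code): next_prime and make_universal_hash are hypothetical helpers; the trial-division prime search is only meant for small n.

import random

def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division; fine for illustration)."""
    def is_prime(m: int) -> bool:
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True
    while not is_prime(n):
        n += 1
    return n

def make_universal_hash(n: int, r: int):
    """Draw one h_a from H = { h_a : a in A }: h_a(x) = (sum_i a_i x_i) mod p."""
    p = next_prime(n)
    a = [random.randrange(p) for _ in range(r)]    # random coefficients 0 <= a_i < p
    def h(x: list[int]) -> int:                    # x = (x_1, ..., x_r), each 0 <= x_i < p
        return sum(ai * xi for ai, xi in zip(a, x)) % p
    return h

# Example: hash 4-digit base-p encodings into about 1000 slots.
h = make_universal_hash(1000, 4)
print(h([3, 1, 4, 1]), h([2, 7, 1, 8]))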

Designing a Universal Class of Hash Functions

Theorem. H = { ha : a ∈ A } is a universal class of hash functions.

Pf. Let x = (x1, x2, …, xr) and y = (y1, y2, …, yr) be two distinct elements of U. We need to show that Pr[ha(x) = ha(y)] ≤ 1/n.
- Since x ≠ y, there exists an index j such that xj ≠ yj.
- We have ha(x) = ha(y) iff  aj (yj - xj) = Σ_{i ≠ j} ai (xi - yi)  mod p.
  (call the factor z = yj - xj and the right-hand side m)
- Can assume a was chosen uniformly at random by first selecting all coordinates ai where i ≠ j, then selecting aj at random. Thus, we can assume ai is fixed for all coordinates i ≠ j.
- Since p is prime, aj z = m mod p has at most one solution among the p possibilities. (see the lemma on the next slide)
- Thus Pr[ha(x) = ha(y)] = 1/p ≤ 1/n. ▪

Number Theory Facts

Fact. Let p be prime, and let z ≠ 0 mod p. Then αz = m mod p has at most one solution 0 ≤ α < p.

Pf.
- Suppose α and β are two different solutions.
- Then (α - β)z = 0 mod p; hence (α - β)z is divisible by p.
- Since z ≠ 0 mod p, we know that z is not divisible by p; it follows that (α - β) is divisible by p.
- This implies α = β. ▪

Bonus fact. Can replace "at most one" with "exactly one" in the fact above.
Pf idea. Euclid's algorithm.
13.9 Chernoff Bounds

Chernoff Bounds (above mean)

Theorem. Suppose X1, …, Xn are independent 0-1 random variables. Let X = X1 + … + Xn. Then for any µ ≥ E[X] and for any δ > 0, we have

Pr[ X > (1 + δ)µ ] < [ e^δ / (1 + δ)^(1+δ) ]^µ

(the sum of independent 0-1 random variables is tightly centered on the mean)

Pf. We apply a number of simple transformations.
- For any t > 0,

  Pr[X > (1+δ)µ] = Pr[ e^(tX) > e^(t(1+δ)µ) ] ≤ e^(-t(1+δ)µ) · E[e^(tX)]
  (f(x) = e^(tx) is monotone in x; Markov's inequality: Pr[X > a] ≤ E[X] / a)

- Now

  E[e^(tX)] = E[e^(t Σ_i Xi)] = Π_i E[e^(t Xi)]
  (definition of X; independence)

Chernoff Bounds (above mean)

Pf. (continued)
- Let pi = Pr[Xi = 1]. Then,

  E[e^(t Xi)] = pi e^t + (1 - pi) e^0 = 1 + pi(e^t - 1) ≤ e^(pi (e^t - 1))
  (for any α ≥ 0, 1 + α ≤ e^α)

- Combining everything:

  Pr[X > (1+δ)µ] ≤ e^(-t(1+δ)µ) Π_i E[e^(t Xi)] ≤ e^(-t(1+δ)µ) Π_i e^(pi (e^t - 1)) ≤ e^(-t(1+δ)µ) e^(µ(e^t - 1))
  (previous slide; inequality above; Σ_i pi = E[X] ≤ µ)

- Finally, choose t = ln(1 + δ). ▪

Chernoff Bounds (below mean)

Theorem. Suppose X1, …, Xn are independent 0-1 random variables. Let X = X1 + … + Xn. Then for any µ ≤ E[X] and for any 0 < δ < 1, we have

Pr[ X < (1 - δ)µ ] < e^(-δ²µ/2)

Pf idea. Similar.

Remark. Not quite symmetric, since it only makes sense to consider δ < 1.
13.10 Load Balancing

Load Balancing

Load balancing. System in which m jobs arrive in a stream and need to be processed immediately on n identical processors. Find an assignment that balances the workload across processors.

Centralized controller. Assign jobs in round-robin manner. Each processor receives at most ⌈m/n⌉ jobs.

Decentralized controller. Assign jobs to processors uniformly at random. How likely is it that some processor is assigned "too many" jobs?

Load Balancing

Analysis. (assume m = n jobs)
- Let Xi = number of jobs assigned to processor i.
- Let Yij = 1 if job j is assigned to processor i, and 0 otherwise.
- We have E[Yij] = 1/n.
- Thus, Xi = Σ_j Yij, and µ = E[Xi] = 1.
- Applying the Chernoff bound with δ = c - 1 yields  Pr[Xi > c] < e^(c-1) / c^c.
- Let γ(n) be the number x such that x^x = n, and choose c = e·γ(n). Then

  Pr[Xi > c] < e^(c-1) / c^c < (e/c)^c = (1/γ(n))^(e·γ(n)) ≤ (1/γ(n))^(2γ(n)) = 1/n²

- Union bound ⇒ with probability ≥ 1 - 1/n, no processor receives more than e·γ(n) = Θ(log n / log log n) jobs.

Fact: this bound is asymptotically tight; with high probability, some processor receives Θ(log n / log log n) jobs.

Load Balancing: Many Jobs

Theorem. Suppose the number of jobs m = 16n ln n. Then on average, each of the n processors handles µ = 16 ln n jobs. With high probability, every processor will have between half and twice the average load.

Pf.
- Let Xi, Yij be as before.
- Applying the Chernoff bounds with δ = 1 (above the mean) and δ = ½ (below the mean) yields

  Pr[Xi > 2µ] < (e/4)^(16 ln n) < (1/e)^(2 ln n) = 1/n²
  Pr[Xi < ½µ] < e^(-(½)²(16 ln n)/2) = e^(-2 ln n) = 1/n²

- Union bound ⇒ every processor has load between half and twice the average with probability ≥ 1 - 2/n. ▪
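The m = n case is the classic balls-in-bins experiment, and the Θ(log n / log log n) growth rate is easy to eyeball by simulation. A minimal Python sketch (max_load is a hypothetical helper, not from the slides); the printed theoretical value is only the growth rate, so constant factors will differ.

import math
import random
from collections import Counter

def max_load(n: int) -> int:
    """Throw n jobs onto n processors uniformly at random; return the max load."""
    loads = Counter(random.randrange(n) for _ in range(n))
    return max(loads.values())

if __name__ == "__main__":
    n = 100_000
    observed = max_load(n)
    rate = math.log(n) / math.log(math.log(n))
    print("max load:", observed, "  log n / log log n ≈", round(rate, 1))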
Extra Slides

13.5 Randomized Divide-and-Conquer

Quicksort

Sorting. Given a set of n distinct elements S, rearrange them in ascending order.

RandomizedQuicksort(S) {
   if |S| = 0 return
   choose a splitter ai ∈ S uniformly at random
   foreach (a ∈ S) {
      if (a < ai) put a in S-
      else if (a > ai) put a in S+
   }
   RandomizedQuicksort(S-)
   output ai
   RandomizedQuicksort(S+)
}

Remark. Can implement in-place. (O(log n) extra space)

Quicksort

Running time.
- [Best case.] Select the median element as the splitter: quicksort makes Θ(n log n) comparisons.
- [Worst case.] Select the smallest element as the splitter: quicksort makes Θ(n²) comparisons.

Randomize. Protect against the worst case by choosing the splitter at random.

Intuition. If we always select an element that is bigger than 25% of the elements and smaller than 25% of the elements, then quicksort makes Θ(n log n) comparisons.

Notation. Label the elements so that x1 < x2 < … < xn.
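A runnable Python rendering of the RandomizedQuicksort pseudocode above (a sketch, not the slides' code; it assumes the elements of S are distinct and returns a new sorted list rather than sorting in place).

import random

def randomized_quicksort(S: list) -> list:
    """Sort a list of distinct elements with a uniformly random splitter."""
    if not S:
        return []
    splitter = random.choice(S)                 # splitter chosen uniformly at random
    smaller = [a for a in S if a < splitter]    # S-
    larger = [a for a in S if a > splitter]     # S+
    return randomized_quicksort(smaller) + [splitter] + randomized_quicksort(larger)

print(randomized_quicksort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]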
Quicksort: BST Representation of Splitters

BST representation. Draw the recursive BST of splitters.

[Figure: an example input sequence of x1, …, x17 in scrambled order; the first splitter x10, chosen uniformly at random, becomes the root, with S- and S+ forming the left and right subtrees recursively.]

Quicksort: BST Representation of Splitters

Observation. An element is only compared with its ancestors and descendants.
- x2 and x7 are compared if their lca = x2 or x7.
- x2 and x7 are not compared if their lca = x3 or x4 or x5 or x6.

Claim. Pr[xi and xj are compared] = 2 / |j - i + 1|.

Quicksort: Expected Number of Comparisons

Theorem. The expected number of comparisons is O(n log n).

Pf.

Σ_{1 ≤ i < j ≤ n} 2/(j - i + 1) = 2 Σ_{i=1}^{n} Σ_{j=2}^{n-i+1} 1/j ≤ 2n Σ_{j=2}^{n} 1/j ≤ 2n ∫_{x=1}^{n} (1/x) dx = 2n ln n

(the summand 2/(j - i + 1) is the probability that xi and xj are compared)

Theorem. [Knuth 1973] The standard deviation of the number of comparisons is ~ 0.65n.

Ex. If n = 1 million, the probability that randomized quicksort takes less than 4n ln n comparisons is at least 99.94%.

Chebyshev's inequality. Pr[|X - µ| ≥ kδ] ≤ 1 / k²  (where δ denotes the standard deviation).
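The 2n ln n bound can be checked empirically by counting comparisons. A minimal Python sketch (quicksort_comparisons is a hypothetical helper, not from the slides); it uses the standard accounting in which each non-splitter element is charged one comparison against the splitter per partition step.

import math
import random

def quicksort_comparisons(S: list) -> int:
    """Count element comparisons made by randomized quicksort on distinct elements."""
    if len(S) <= 1:
        return 0
    splitter = random.choice(S)
    smaller = [a for a in S if a < splitter]
    larger = [a for a in S if a > splitter]
    # len(S) - 1 comparisons against the splitter, plus the recursive calls.
    return (len(S) - 1) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

if __name__ == "__main__":
    n = 10_000
    data = random.sample(range(10 * n), n)
    print("comparisons:", quicksort_comparisons(data), "  2n ln n ≈", int(2 * n * math.log(n)))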
