
Greedy Algorithms

 Make choices based on immediate reward rather than
looking ahead for the optimum
 In many cases this is effective, since the look-ahead alternative
can require exponential time as the possible choices combine
– Best way to get to a destination?
– Without other knowledge, taking the road that heads in the
direction of the destination is probably a reasonable choice
• This is the greedy approach, and it can do well in some cases
• It also makes the decision much easier than considering all possible
future alternatives
– It may not be the optimal decision, in contrast to considering all
possible alternatives, then all subsequent alternatives, etc.

CS 312 – Greedy Algorithms 1


Next Move in Chess
 What move should you make next?
 Could make the move which leaves you in the best material
position after one move – the standard greedy approach
– Greedy since it takes the best option now without considering the
ramifications of what could occur later
 Could use a 2nd order greedy approach
– Look two moves ahead
– More time consuming
– Better?
 Nth order greedy? – look ahead until the game is decided
– No longer greedy, since it considers the full situation
– Exponential time required


Coins Problem
 Given:
– An unbounded supply of coins of specific denominations
– An integer c
 Find: Minimal number of coins that add up to c
 What is your greedy algorithm?



Coins Algorithm
Repeat until sum = c:
  Add the largest coin which does not cause sum to exceed c
 Does this work, and is it optimal?
– Assume denominations are 50¢, 20¢, 3¢, 2¢
– Try it with goals of 75¢ and 60¢
– Now assume denominations are 50¢, 20¢, 3¢, 1¢
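The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `greedy_coins` is invented here. Note what happens with the first denomination set: greedy can get stuck entirely, not just return a suboptimal answer.

```python
def greedy_coins(c, denominations):
    """Greedy change-making: repeatedly take the largest coin that does
    not push the running sum past c. Returns the list of coins used,
    or None if the greedy choice hits a dead end before reaching c."""
    coins = []
    total = 0
    while total < c:
        # Largest denomination that still fits under the target.
        fitting = [d for d in sorted(denominations, reverse=True) if total + d <= c]
        if not fitting:
            return None          # greedy dead end: no coin fits
        coins.append(fitting[0])
        total += fitting[0]
    return coins

print(greedy_coins(75, [50, 20, 3, 2]))   # [50, 20, 3, 2] – happens to be optimal
print(greedy_coins(60, [50, 20, 3, 2]))   # None – stuck at 59 (2 or 3 both overshoot)
print(greedy_coins(60, [50, 20, 3, 1]))   # [50, 3, 3, 3, 1] – 5 coins, but 20+20+20 uses only 3
```

So for the slide's denominations, greedy is optimal for 75¢ but fails outright for 60¢ with {50, 20, 3, 2}, and succeeds but suboptimally with {50, 20, 3, 1}.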

Greedy Philosophy
 Build up a solution piece by piece
 Always choose the next piece that offers the most obvious
and immediate benefit, without violating the given constraints


MST – Minimum Spanning Tree

 Given a connected undirected graph, we would like to find the “cheapest”
connected version of that graph
 Remove all extra edges, leaving just enough to stay connected (|V|-1
edges) – it will be a tree
 Given G = (V, E), find the tree T = (V, E') where E' ⊆ E which
minimizes the sum of the edge lengths
 Not necessarily unique
 Many applications – cheapest phone network, etc.
MST – Minimum Spanning Tree
 What greedy algorithm might we use, assuming we start
with the entire graph?
[Figure: example weighted graph with vertices 1–7, used throughout the following slides]
MST – Minimum Spanning Tree
 What greedy algorithm might we use, assuming we start
with the entire graph?
 Iteratively remove the largest remaining edge in the graph
which does not disconnect the graph
– Is it a greedy approach?
 Complexity?
 How do we prove whether it works/is optimal or not?
– Counterexamples – natural first attempt
– If no easily found counterexamples, we then seek a more formal proof
[Figure: example weighted graph with vertices 1–7]


Kruskal's Algorithm
 Sometimes greedy algorithms can also be optimal!
 Kruskal's is a simple greedy optimal algorithm for MST
1. Start with an empty graph
2. Repeatedly add the next smallest edge from E that does
not produce a cycle

 How might we test for cycles, and what would the
complexity be? – is there a more efficient data structure?


Kruskal's Algorithm
 Represent nodes in disjoint sets
– makeset(u): create a singleton set containing just u
– find(u): to which set does u belong?
– union(u,v): merge the sets containing u and v
 find(u) = find(v) if u and v are in the same set, which means
they are in the same connected component
 Why not add edge {u,v} if both are in the same set?
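The three operations above are all Kruskal's algorithm needs. A minimal Python sketch (the function name `kruskal` and the inline union-by-rank implementation are illustrative, not the slides' exact pseudocode), run on the edge list read off the walkthrough slides that follow:

```python
def kruskal(num_vertices, edges):
    """Kruskal's MST: sort edges by weight, add each edge whose endpoints
    are in different disjoint sets (i.e. adding it creates no cycle),
    then union the two sets."""
    parent = {v: v for v in range(1, num_vertices + 1)}   # makeset for each vertex
    rank = {v: 0 for v in range(1, num_vertices + 1)}

    def find(u):                        # walk parent pointers to the root
        while parent[u] != u:
            u = parent[u]
        return u

    def union(u, v):                    # union by rank keeps trees shallow
        ru, rv = find(u), find(v)
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    mst = []
    for w, u, v in sorted(edges):       # process edges smallest first
        if find(u) != find(v):          # same root would mean a cycle
            mst.append((w, u, v))
            union(u, v)
    return mst

# (weight, u, v) edges from the walkthrough example on the next slides.
edges = [(1,1,2), (2,2,3), (3,4,5), (3,6,7), (4,1,4), (4,2,5), (4,4,7), (5,3,5)]
mst = kruskal(7, edges)
print(sum(w for w, _, _ in mst))   # 17, using |V|-1 = 6 edges
```

The rejected edge (2,5) and the never-needed edge (3,5) match the trace on the slides below.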
Kruskal’s Algorithm
 Make a disjoint set for each vertex
{1}{2}{3}{4}{5}{6}{7}


Kruskal’s Algorithm
 Sort the edges by weight
1: (1,2)
2: (2,3)
3: (4,5)
3: (6,7)
4: (1,4)
4: (2,5)
4: (4,7)
5: (3,5)
Sets: {1}{2}{3}{4}{5}{6}{7}


Kruskal’s Algorithm
 Add the first edge to X if no cycle is created
1: (1,2) – sets still {1}{2}{3}{4}{5}{6}{7}


Kruskal’s Algorithm
 Merge the vertices of each added edge
1: (1,2) → {1,2}{3}{4}{5}{6}{7}


Kruskal’s Algorithm
 Process each edge in order
2: (2,3) → {1,2,3}{4}{5}{6}{7}


Kruskal’s Algorithm
3: (4,5) → {1,2,3}{4,5}{6}{7}
Note that each set is a connected component of G


Kruskal’s Algorithm
3: (6,7) → {1,2,3}{4,5}{6,7}


Kruskal’s Algorithm
4: (1,4) → {1,2,3,4,5}{6,7}


Kruskal’s Algorithm
 An added edge must join separate components
4: (2,5) rejected – both endpoints are already in {1,2,3,4,5}


Kruskal’s Algorithm
 Done when all vertices are in one set: they are then all connected with
exactly |V| - 1 edges. The book version just goes until all edges are considered.
4: (4,7) → {1,2,3,4,5,6,7} done  (5: (3,5) is never considered)


Kruskal's Algorithm – Is it correct?
 We have created a tree, since we added |V|-1 edges with no
cycles
 But how do we know it is a minimum spanning tree (MST)?
 We have added the |V|-1 smallest possible edges that do not
create cycles
 Thus, it seems intuitive that we have the smallest sum of
edges and an MST
 But that is not a proof
 Review the proof in the book


Kruskal's Algorithm: Inductive Proof
 Theorem: Kruskal's algorithm finds a minimum spanning tree
 Basis: X0 = ∅, and G is connected, so a solution must exist
 Is this a correct partial solution?
 Assumption: at any moment, the edges Xt are part of an MST for G
 The inductive step is the Cut Property
 Cut Property: Assume edges X are part of an MST for G=(V,E). Pick any
subset S for which X does not cross between S and V-S, and let e be the
lightest edge across this partition. Then X ∪ {e} is part of some MST.
[Figure: a cut (S, V-S) with the edges X on either side and e the lightest edge crossing it]
 Cut Property: Assume edges X are part of an MST for G=(V,E). Pick
any subset S for which X does not cross between S and V-S, and let e
be the lightest edge across this partition. Then X ∪ {e} is part of some
MST. – Why?
1. Assume edges X are part of an MST T (inductive hypothesis)
2. If e is part of T then we are done, so consider the case where e is not part of T
3. Now add e to T, creating a cycle, meaning there must be another edge
e' across the cut (S, V-S) (note that weight(e') ≥ weight(e))
4. Create T' by replacing e' with e: T' = T ∪ {e} – {e'}
5. T' is a tree, since it is connected and still has |V|-1 edges
6. T' is an MST since:
  1. weight(T') = weight(T) + w(e) – w(e')
  2. both e and e' cross the cut (S, V-S)
  3. e was chosen as the lightest edge across the cut, so w(e) ≤ w(e')
  4. therefore w(e') = w(e) (T is an MST, so T' cannot weigh less) and T' is an MST
Thus, any (and only a) lightest edge across a cut will lead to an MST


Cut Property
 Which edge is e'?
[Figure: cut example omitted]
Kruskal's Algorithm Implementation
 The data structure represents the state as a collection of disjoint sets, where
each set represents a connected component (sub-tree) of G


Directed Tree Representation of Disjoint Sets
• Nodes are stored in an array (fast access) and have a parent
pointer and a rank value
• π(x) is a pointer to the parent of x
• if π(x) points to itself, x is the root/name of the disjoint set
• rank(x) is the height of the sub-tree rooted at node x
• makeset is O(1)
• find(x) returns the unique root/name of the set
• union(x,y) merges the sets to which x and y belong, and keeps the
trees balanced so that the maximum depth of the tree representing a
disjoint set is log|V|
• find and union complexity?
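A minimal Python sketch of this representation (the class name `DisjointSets` is invented here). The union sequence below is one assumed way to reproduce the two array states shown in the table that follows; the slides do not show which unions were performed.

```python
class DisjointSets:
    """Directed-tree disjoint sets: pi[x] is the parent pointer π(x),
    and rank[x] is the height of the sub-tree rooted at x."""
    def __init__(self):
        self.pi = {}
        self.rank = {}

    def makeset(self, x):              # O(1)
        self.pi[x] = x
        self.rank[x] = 0

    def find(self, x):                 # O(log n): walk pointers to the root
        while self.pi[x] != x:
            x = self.pi[x]
        return x

    def union(self, x, y):             # union by rank keeps depth O(log n)
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx            # make rx the taller tree's root
        self.pi[ry] = rx               # hang shorter tree under taller
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1         # equal heights: root grows by one

ds = DisjointSets()
for node in "ABCDEFG":
    ds.makeset(node)
ds.union("D", "A"); ds.union("E", "B"); ds.union("F", "C")   # left table state
ds.union("D", "E"); ds.union("F", "G")                       # right table state
print(ds.pi["E"], ds.rank["D"])   # D 2 – E's tree hangs under D, whose rank grew to 2
print(ds.find("B"))               # D – B's root is now D
```

Note that union("F", "G") does not increase rank(F): the shorter tree disappears under the taller one, which is exactly why depth stays logarithmic.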
        Before:                       After further unions:
Node  π(x)  Rank              Node  π(x)  Rank
A     D     0                 A     D     0
B     E     0                 B     E     0
C     F     0                 C     F     0
D     D     1                 D     D     2
E     E     1                 E     D     1
F     F     1                 F     F     1
G     G     0                 G     F     0
**Challenge Question** Kruskal's Algorithm
1. Given the top image, show the new image and array representation after union(B, G)
2. What are the time and space complexity for Kruskal's algorithm with this data
structure?
Kruskal Algorithm Complexity
 O(|E|log|V|) for initially sorting the edges
– The sort is actually O(|E|log|E|)
– Note that for a dense graph |E| ≈ |V|²
– But remember that log n² = 2log n, so they differ only by a constant
factor, and thus Θ(|E|log|V|) = Θ(|E|log|E|)
 O(|V|) for the initial makesets – since each is O(1)
 find(u): log|V| each, 2|E| times: O(|E|log|V|)
 union(u,v): log|V| each, |V|-1 times (why?) – O(|V|log|V|)
 Total complexity is O(sort + |V|·makeset + 2|E|·find + (|V|-1)·union)
 Total complexity: O(|E|log|V|)
Path Compression
 Could compress depth during find. How?
 Worth doing if we plan on using the data structure a lot
 Over time, find would then have amortized O(1) complexity
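One answer to the "How?" above, as a small sketch (illustrative, not the slides' code): after find locates the root, make a second pass that points every node on the walked path directly at the root, flattening the tree for all later finds.

```python
def find(pi, x):
    """find with path compression on a parent-pointer dict pi.
    First pass locates the root; second pass points every node on the
    walked path directly at it, so later finds on these nodes are O(1)."""
    root = x
    while pi[root] != root:            # pass 1: locate the root
        root = pi[root]
    while pi[x] != root:               # pass 2: compress the path
        pi[x], x = root, pi[x]
    return root

pi = {"A": "B", "B": "C", "C": "C"}    # chain A -> B -> C
print(find(pi, "A"))                   # C
print(pi["A"], pi["B"])                # C C – both now point straight at the root
```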
Prim's Algorithm
 Prim's algorithm grows one connected component (X and S) as a single tree
– X (edges) and S (vertices) are the sets of intermediate edges and vertices in
this partial MST
– On each iteration, X grows by one edge and S by one vertex
 The smallest edge between a vertex in S and a vertex outside S (V - S)
 All |V| vertices and |V|-1 edges must eventually move into S and X
 Cannot create a cycle, since the new vertex is not currently in S and we never add a
new edge between two nodes already in S
 What is an efficient data structure to retrieve that edge?
– The algorithm is basically Dijkstra's algorithm, except that the PQ key value for
each node outside S is its smallest edge into S


Prim's Algorithm – An Intuitive Version
 Consider each node as wanting to get into club S
 All nodes must join the club before we finish
 Each non-member (V-S) has an entry key which is the smallest edge
length from itself to any current member of the club
 At each iteration the non-member with the smallest key joins the
club
 Whenever a new member joins the club, all non-members with
edges to the new member have their key updated if their edge to the
new member is smaller than their current smallest edge into the club



Prim's Algorithm
 decreasekey does not update a path length; it updates the key with the
decreased edge cost between v and z
 Almost the same as Dijkstra's algorithm, with the same complexity values
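A Python sketch of the idea (not the slides' exact pseudocode). Python's `heapq` has no decreasekey, so this version pushes duplicate entries and skips stale ones on pop (lazy deletion) – a common substitute that keeps the O(|E|log|V|) bound. The adjacency list is read off the walkthrough slides that follow.

```python
import heapq

def prim(adj, start):
    """Prim's MST. The PQ key of a node outside S is its lightest edge
    into S; stale heap entries are skipped instead of decreasekey'd."""
    in_mst = set()
    mst_edges = []
    pq = [(0, start, None)]                # (key, vertex, prev)
    while pq:
        key, u, prev = heapq.heappop(pq)
        if u in in_mst:
            continue                       # stale entry, already dequeued
        in_mst.add(u)
        if prev is not None:
            mst_edges.append((key, prev, u))   # prev fixed once u dequeued
        for v, w in adj[u]:
            if v not in in_mst:
                heapq.heappush(pq, (w, v, u))
    return mst_edges

# (u, v, weight) edges reconstructed from the walkthrough slides below.
edges = [(1,2,1), (2,3,2), (4,5,3), (6,7,3), (1,4,4), (2,5,4), (4,7,4),
         (3,5,5), (3,6,6), (5,6,8), (5,7,8)]
adj = {v: [] for v in range(1, 8)}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))
print(sum(w for w, _, _ in prim(adj, 5)))   # 17 – same total as Kruskal's
```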
Prim’s Algorithm
 Choose an arbitrary starting vertex, set its key to 0, and deletemin
S = {5};  keys: 1:∞  2:∞  3:∞  4:∞  5:0  6:∞  7:∞


Prim’s Algorithm
 After 5 is dequeued, update the keys of its neighbors
S = {5};  keys: 1:∞  2:4  3:5  4:3  6:8  7:8


Prim’s Algorithm
 deletemin returns the node with the shortest edge into S
X: {4,5};  S = {4,5};  keys: 1:∞  2:4  3:5  6:8  7:8


Prim’s Algorithm
 We don’t actually store S or X. The final S is all of V, and we get the MST
using the prev pointers, which are fixed once a node is dequeued.
The PQ contains the nodes not yet in the MST (V – S)
X: {4,5};  S = {4,5};  keys: 1:∞  2:4  3:5  6:8  7:8


Prim’s Algorithm
 Update then decreases the keys in the PQ (shortest edge into S) where
possible, based on the new node just added to S
X: {4,5};  S = {4,5};  keys: 1:4  2:4  3:5  6:8  7:4


Prim’s Algorithm
 Repeat until the PQ is empty
X: {4,5}{1,4};  S = {1,4,5};  keys: 2:1  3:5  6:8  7:4


Prim’s Algorithm
X: {4,5}{1,4}{1,2};  S = {1,2,4,5};  keys: 3:2  6:8  7:4


Prim’s Algorithm
X: {4,5}{1,4}{1,2}{2,3};  S = {1,2,3,4,5};  keys: 6:6  7:4


Prim’s Algorithm
X: {4,5}{1,4}{1,2}{2,3}{4,7};  S = {1,2,3,4,5,7};  keys: 6:3


Prim’s Algorithm
X: {4,5}{1,4}{1,2}{2,3}{4,7}{6,7};  S = {1,2,3,4,5,6,7} – done


Complexity
 Dense graph: |E| = O(|V|²)
– Kruskal’s: Θ(|E| log |V|) = Θ(|V|² log |V|)
– Prim’s (w/ array PQ): Θ(|V|²)
– Prim’s (w/ binary heap PQ): Θ(|E| log |V|) = Θ(|V|² log |V|)
 Sparse graph: |E| = O(|V|)
– Kruskal’s: Θ(|E| log |V|) = Θ(|V| log |V|)
– Prim’s (w/ array PQ): Θ(|V|²)
– Prim’s (w/ binary heap PQ): Θ(|E| log |V|) = Θ(|V| log |V|)
 Punch-lines
– Prim’s with array PQ is best for a dense graph
– Kruskal’s with sets OR Prim’s with heap PQ are about the same for
sparser graphs
– Kruskal’s gives more control when choosing between edges with ties in
cost, and some consider Kruskal’s easier to implement


Huffman Encoding
 Commonly used compression algorithm
 Example: MP3 audio compression, which works as follows
 Digitize the analog signal by sampling at regular intervals
– Yields real numbers s1, s2, …, sT
– High fidelity is 44,100 samples per second
– Thus a 50 minute symphony would have T = 50·60·44,100 ≈ 130 million
samples
 Quantize each sample st into a finite alphabet Γ
– These are called codewords or symbols
– e.g., quantize into 16-bit numbers
– Sufficient that close codewords are indistinguishable to the human ear
 Encode the quantized values into binary numbers
– Huffman coding (compression) can give savings in this step




Huffman Encoding
 Assume an example of 130 million samples, and that the
alphabet has 4 codewords: A, B, C, D
– Thus all measurements are quantized into one of 4 values
 Encode these into binary
– A: 00
– B: 01
– C: 10
– D: 11
 Total memory would be 260 million bits (2 bits/sample)
 Can we do better?



Huffman Encoding
 Consider the frequency (count) of the symbols: A, B, C, D
– A: 70 million
– B: 3 million
– C: 20 million
– D: 37 million
 Could we use a variable-length encoding where more common
codewords are represented with fewer bits?
– A: 0
– B: 001
– C: 01
– D: 10
 Total number of bits would now be less due to frequency of A
 However, how would you distinguish AC from B?
Prefix-Free Property
 Prefix-free property: a binary encoding such that no codeword is
a prefix of another codeword
 Any full binary tree (all nodes have 0 or 2 children) with the number of
leaves equal to the number of codewords will always give a valid prefix-free encoding
– Any codeword can arbitrarily be at any leaf, but we want more frequent
codewords higher in the tree
 The encoding tree for our example (figure omitted) allows us to store it with
213 Mbits – an ≈18% improvement
 But how do we find an optimal encoding tree? – a simple algorithm?


Optimal Encoding Tree
 Assume the frequencies (counts) of the codewords are f1, f2, …, f|Γ|
 Tree cost = 70·1 + 37·2 + 3·3 + 20·3 = 213
 Rather than 2 bits per sample we have
1·(70/130) + 2·(37/130) + 3·((20 + 3)/130) ≈ 1.64 average bits per sample


Huffman Algorithm
 Greedy constructive algorithm
– Repeat until all codewords are used:
– Pull the two codewords with the smallest fi and fj off the unordered list
of frequencies
– Make them children of a new node with frequency fi + fj
– Insert fi + fj into the list of frequencies – what data structure?
– The tree must have exactly 2n-1 nodes: n leaves and n-1 internal nodes
Huffman Algorithm
 Although we insert array indexes (integers from 1 to 2n-1)
into the queue, the priority queue key value is f(index)
– Note the array f is assumed unsorted (e.g. 70, 3, 20, 37)
 Can implement the priority queue just like in Dijkstra's algorithm
– But we don't need decreasekey, so we don't need the pointer array
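The merge loop can be sketched with Python's `heapq` as the priority queue (a minimal illustration; it tracks only the tree cost, not the tree shape). A useful fact it relies on: the tree cost equals the sum of all internal-node frequencies, since each merge adds fi + fj, which is exactly the extra bit every symbol under that node pays.

```python
import heapq

def huffman_cost(frequencies):
    """Greedy Huffman: repeatedly merge the two smallest frequencies
    into a new internal node of frequency fi + fj. Returns the tree
    cost, i.e. the total number of bits needed to encode all samples."""
    heap = list(frequencies)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        fi = heapq.heappop(heap)       # two smallest remaining frequencies
        fj = heapq.heappop(heap)
        cost += fi + fj                # new internal node's frequency
        heapq.heappush(heap, fi + fj)
    return cost

# Codeword counts from the slides, in millions: A=70, B=3, C=20, D=37.
print(huffman_cost([70, 3, 20, 37]))   # 213 – the 213 Mbit figure above
```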


**Challenge Question** Huffman Algorithm
 Using Huffman, show the final tree, tree cost, and encoding given:
– A: 10
– B: 5
– C: 15
– D: 5
– E: 40
 If we use a binary heap implementation for the priority queue, then
what are Huffman's time and space complexity?
Huffman Algorithm
 If we use a binary heap implementation for the priority queue,
then Huffman's time complexity is O(n log n) and its space
complexity is O(n)


Travelling Salesman Problem
 Given a set of n cities with distances from each city to the
others:
– Visit each city exactly once, and return to the initial city
– Travel the least possible total distance
 What is the simplest algorithm to make sure we get the
optimal result?


TSP Complexity
 There are n! possible paths
 To calibrate, there are about 10⁵⁷ atoms in the solar system and 10⁸⁰
atoms in the current known universe

Approximate TSP time complexity – big O
# of Cities    Brute force O(n!)
10             10⁶
15             10¹²
20             10¹⁸
50             10⁶⁴
100            10¹⁵⁹
1000           10²⁵⁶⁷


TSP Greedy Approach
 What is a greedy algorithm to find a TSP path?
– Will it be optimal?
– What is its complexity?


TSP Greedy Approach
 What is a greedy algorithm to find a TSP path?
– Start at an arbitrary city
– Choose the shortest edge from the current city to an unvisited city, O(n)
– Repeat the process from the most recently visited city
– Finally, connect the last city to the first city
– Total: O(n²)
– You will implement this as part of Project 5 as a comparison with
intelligent search for TSP
 2nd order greedy
– Shortest combined path from the current city which reaches two unvisited
cities
– Complexity?
– Is it better?
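The first-order (nearest-neighbor) version above, as a Python sketch. The function name and the 4-city distance matrix are hypothetical, made up here to illustrate; on this particular matrix greedy happens to find the optimal tour, which is not guaranteed in general.

```python
def greedy_tsp(dist, start=0):
    """Nearest-neighbor TSP: from the current city, always take the
    shortest edge to an unvisited city, then close the tour back to
    the start. Each greedy choice is O(n), for O(n^2) total."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[current][c])  # greedy choice
        tour.append(nxt)
        unvisited.remove(nxt)
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1))
    cost += dist[tour[-1]][start]      # return edge to the initial city
    return tour, cost

# Hypothetical symmetric 4-city distance matrix.
dist = [[0,  2,  9, 10],
        [2,  0,  6,  4],
        [9,  6,  0,  3],
        [10, 4,  3,  0]]
tour, cost = greedy_tsp(dist)
print(tour, cost)   # [0, 1, 3, 2] 18
```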


Horn Formulas
 Can use rules and settings of variables to solve interesting
problems – PROLOG – an early AI language
 Assume a set of Boolean variables
– r = it is raining
– u = umbrella is open
– d = you are dry
 Horn formulas are made of implications (i.e. rules)
– (r ∧ u) ⇒ d
 and pure negative clauses (another type of rule)
– (¬r ∨ ¬u ∨ ¬d) – at least one of the variables must be false
 Can we find an assignment of our variables which satisfies
all our rules, and also infers new information?


Horn Formulas Greedy Algorithm
Modus ponens truth table for p → q:
p  q  p → q
F  F    T
F  T    T
T  F    F
T  T    T
[Algorithm figure omitted – the greedy approach: start with all variables set
to false; while some implication is unsatisfied, set its right-hand-side
variable to true; finally check the pure negative clauses]
• Example: 4 variables, 7 clauses, 17 literals
• Gives the correct solution, and you can implement the above with a
variation linear in the number of literals (see HW 5.33)
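A Python sketch of that greedy procedure (the function name and the clause encoding are invented here, and the example clause set below is an assumed illustration using the slide's r/u/d variables, not the slides' 7-clause example):

```python
def horn_sat(implications, negative_clauses):
    """Greedy Horn-SAT: start with every variable false; while some
    implication has an all-true body but a false head, set the head
    true (modus ponens). Then check the pure negative clauses.
    implications: list of (body_vars, head_var); an empty body means
    the head is asserted unconditionally.
    negative_clauses: lists of variables, at least one must stay false.
    Returns the set of true variables, or None if unsatisfiable."""
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for body, head in implications:
            if set(body) <= true_vars and head not in true_vars:
                true_vars.add(head)        # this variable is forced true
                changed = True
    for clause in negative_clauses:
        if all(v in true_vars for v in clause):
            return None                    # every literal false: unsatisfiable
    return true_vars

# Assumed example: (⇒ r), (⇒ u), (r ∧ u ⇒ d), and (¬r ∨ ¬u ∨ ¬d).
implications = [([], "r"), ([], "u"), (["r", "u"], "d")]
negatives = [["r", "u", "d"]]
print(horn_sat(implications, negatives))   # None – r, u, d all forced true
```

The greedy step only ever sets variables true when forced, which is why the final assignment is the minimal one and why checking the negative clauses at the end is sound.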
Satisfiability
 Horn-SAT is one variation of satisfiability
 2-SAT
– (x ∨ y) ∧ (z ∨ x) ∧ (w ∨ y) ∧ …
– SAT problems: does there exist a setting of the
Boolean variables which satisfies this formula (2-CNF)?
– There is a polynomial-time solution
 3-SAT
– (x ∨ y ∨ p) ∧ (z ∨ x ∨ q) ∧ (w ∨ y ∨ q) ∧ …
– There is no known polynomial-time solution; one of the original
NP-complete problems


Set Cover
 Assume a universe U of elements
 Assume a set S of subsets Si of the universe
 Set Cover is the problem of finding the minimum number
of subsets whose union includes all elements of the
universe – very common in real-world problems (e.g.
airline crew scheduling)
 U = {1, 2, 3, 4, 5}
 S = {{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}}
 What is the minimum set cover in this case? A simple algorithm?
 Greedy algorithm?
 Is it optimal?
Set Cover Example
 In which towns should we place schools, given that all towns must be within 30 miles
of a school? Each town has edges to those towns which are within 30 miles.
 Each town represents one of n subsets of towns (i.e. those to which it is directly connected)
 Greedy: until all elements in U are covered, select the Si with the largest number of uncovered elements.
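The greedy rule above in a few lines of Python (a minimal sketch; the function name is invented here), run on the U and S from the earlier slide. On this small instance greedy happens to find the optimal 2-set cover, though in general it is only ln(n)-approximate, as the next slide notes.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: until every element is covered, pick the subset
    containing the most still-uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            return None                # some element appears in no subset
        cover.append(best)
        uncovered -= best              # mark its elements as covered
    return cover

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(U, S))   # [{1, 2, 3}, {4, 5}]
```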
Set Cover
 Set Cover is another NP-complete problem, so there is
no known polynomial-time algorithm to find the optimum
 It can be proven that our simple greedy algorithm gives an
answer within a factor of ln(n) of the optimal solution
 Thus, if k is the size of the optimal solution, our simple greedy
algorithm is guaranteed to find a cover of size between k and
k·ln(n)
– ln(n) is the approximation factor
 There is provably no polynomial-time algorithm for Set
Cover with a smaller approximation factor (unless P = NP)


Machine Learning and Decision Trees
 In machine learning we want to take data sets with
examples of classifications and learn a model, so that we can
generalize good answers when we later receive new,
unseen data
 Medical diagnosis, stock market prediction, speech
recognition, self-driving cars, etc.
 Decision trees are learned using a greedy divide-and-
conquer algorithm that leads to very good results
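The greedy step in decision tree learning is choosing which attribute to split on. A minimal Python sketch using misclassification count as the purity measure (the function names, the purity measure, and the tiny pizza data set are all illustrative assumptions; real learners typically use entropy or Gini impurity):

```python
from collections import Counter, defaultdict

def impurity_after_split(examples, attribute):
    """Weighted misclassification impurity of the children produced by
    splitting on one attribute: for each child, count the examples that
    do not belong to that child's majority class."""
    children = defaultdict(list)
    for features, label in examples:
        children[features[attribute]].append(label)
    wrong = 0
    for labels in children.values():
        wrong += len(labels) - Counter(labels).most_common(1)[0][1]
    return wrong

def best_split(examples, attributes):
    """Greedy choice: the attribute whose split gives the purest children."""
    return min(attributes, key=lambda a: impurity_after_split(examples, a))

# Hypothetical pizza data: ({'A1': topping, 'A2': crust}, label).
pizzas = [({"A1": "Veggie", "A2": "Thin"},    "good"),
          ({"A1": "Veggie", "A2": "Stuffed"}, "good"),
          ({"A1": "Veggie", "A2": "Thick"},   "good"),
          ({"A1": "Meaty",  "A2": "Thin"},    "great"),
          ({"A1": "Meaty",  "A2": "Stuffed"}, "great"),
          ({"A1": "Meaty",  "A2": "Thick"},   "good")]
print(best_split(pizzas, ["A1", "A2"]))   # A1 – 1 misclassified vs 2 for A2
```

The learner then recurses on each impure child with the remaining attributes, which is the greedy divide-and-conquer the slides below walk through.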


Decision Tree Learning – Pizza Classification
 Assume A1 is a binary feature (Veggie, Meaty)
 Assume A2 is a 3-value feature (Crust: Thin/Stuffed/Thick)
 Circles are good pizzas and squares are great pizzas
 A goal is to get “pure” leaf nodes. What do we do?
[Figure: examples plotted by A1 (Veggie/Meaty) and A2 (Thin/Stuffed/Thick)]
Decision Tree Learning
 Greedily divide on the attribute which gives the purest children
[Figure: the tree grown greedily, one split at a time, starting with A1]
Conclusion on Greedy Algorithms
 Leads to optimal solutions for a number of important
problems
– MST, Huffman, Horn formulas, etc.
 Often does not lead to optimal solutions, but is a very powerful
and common approximation approach for otherwise
exponential tasks
– TSP, decision trees, Set Cover, etc.
 Usually leads to relatively simple and efficient algorithms,
avoiding expensive look-ahead
