Week 5 and 6 - Greedy Strategy Technique


Course: Analysis of Algorithms

Code: CS-14101
Branch: B.Tech IT, 4th Semester
Week 4 & 5: Greedy Strategy Technique

Faculty & Coordinator: Dr. J Sathish Kumar (JSK)

Department of Computer Science and Engineering


Motilal Nehru National Institute of Technology Allahabad,
Prayagraj-211004
The Greedy Method: Introduction
• People are usually greedy; it is rare to find non-greedy people.

• Greed works! We want to verify whether it really does.

• Take a number of computational problems and investigate the pros and cons of short-sighted greed.

• How do we define a greedy algorithm?

Basic Greedy paradigm
• Builds up a solution in small steps
• Chooses each decision myopically (lacking foresight or scope)
• To optimize some underlying criterion
• Chooses what is best for the moment
• Typically works in stages:
1. a decision made in one stage cannot be changed later (cannot be undone)
2. each choice must lead to a feasible solution
3. the result is expected to be an optimal solution
Applications
Optimal solutions:
• Huffman codes
• Minimum Spanning Tree (MST)
• Single-source shortest paths
• Optimal Storage on Tapes
• Container Loading problem
• Change making
• Simple scheduling problems

Approximations:
• Knapsack problem (0-1)
• Traveling Salesman Problem (TSP)
• Other combinatorial optimization problems
Greedy algorithms
Optimization problems
• solved through a sequence of choices that are:
• feasible
• locally optimal
• irrevocable
Not all optimization problems can be approached in
this manner!
Huffman Codes
• For compressing data (sequence of
characters)
• Widely used
• Very efficient (savings of 20-90%)
• Use a table to keep frequencies of
occurrence of characters.
• Output binary string.
Huffman Codes

Example: a file of 100,000 characters, containing only 'a' to 'f'.

Character   Frequency   Fixed-length codeword   Variable-length codeword
'a'         45,000      000                     0
'b'         13,000      001                     101
'c'         12,000      010                     100
'd'         16,000      011                     111
'e'         9,000       100                     1101
'f'         5,000       101                     1100

e.g. "abc" = "000001010" (fixed-length) vs. "abc" = "0101100" (variable-length)

Fixed-length: 3 × 100,000 = 300,000 bits
Variable-length: 1×45,000 + 3×13,000 + 3×12,000 + 3×16,000 + 4×9,000 + 4×5,000 = 224,000 bits
Huffman Codes

The coding schemes can be represented by trees (for the same 100,000-character file, with frequencies in thousands):

[Figure: two code trees, left edges labeled 0 and right edges 1. The fixed-length code is a tree of depth 3 with leaves a:45, b:13, c:12, d:16, e:9, f:5. The variable-length code is the Huffman tree: the root (100) splits into a:45 and an internal node 55; 55 splits into 25 (b:13, c:12) and 30; 30 splits into 14 (e:9, f:5) and d:16.]
Huffman Codes

How do we find the optimal prefix code? The Huffman code (1952) was invented to solve it, using a greedy approach.

Maintain a min-priority queue Q of subtrees keyed by frequency. Initially Q holds the single-character trees: f:5, e:9, c:12, b:13, d:16, a:45. Repeatedly extract the two minimum-frequency trees and merge them.

[Figure: the sequence of merges. First f:5 and e:9 merge into a node of frequency 14; then c:12 and b:13 merge into 25; then 14 and d:16 merge into 30; then 25 and 30 merge into 55; finally a:45 and 55 merge into the root, 100.]
Huffman Codes

HUFFMAN(C)
1  Build Q from C
2  for i = 1 to |C| - 1
3      Allocate a new node z
4      z.left  = x = EXTRACT-MIN(Q)
5      z.right = y = EXTRACT-MIN(Q)
6      z.freq  = x.freq + y.freq
7      Insert z into Q in its correct position
8  return EXTRACT-MIN(Q)

If Q is implemented as a binary min-heap:
• "Build Q from C" is O(n)
• each "EXTRACT-MIN(Q)" is O(lg n)
• each "Insert z into Q" is O(lg n)
• so HUFFMAN(C) runs in O(n lg n)
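The HUFFMAN(C) procedure above can be sketched in Python, using the standard-library heapq module as the min-priority queue Q. This is a minimal illustration; the tiebreak counter is an implementation detail needed to compare heap entries with equal frequencies, not part of the algorithm.

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman code from a {symbol: frequency} map.
    Returns {symbol: codeword}, with shorter codes for frequent symbols."""
    tiebreak = count()  # breaks ties so heap entries stay comparable
    # Each heap entry: (frequency, tiebreak, tree); a tree is a symbol or (left, right)
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)                      # "Build Q from C" -- O(n)
    for _ in range(len(freqs) - 1):          # |C| - 1 merges
        fx, _, x = heapq.heappop(heap)       # EXTRACT-MIN twice -- O(lg n) each
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {'a': 45000, 'b': 13000, 'c': 12000, 'd': 16000, 'e': 9000, 'f': 5000}
codes = huffman(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(total_bits)  # 224000 -- matches the variable-length cost computed above
```

The exact codewords can differ from the table (optimal trees are not unique), but the total cost of 224,000 bits is the same for every optimal prefix code on these frequencies.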
Minimum Spanning Tree
• A spanning tree of an undirected connected graph is a connected acyclic subgraph (i.e., a tree) that contains all the vertices of the graph.
Minimum Spanning Tree
• If such a graph has weights assigned to its edges, a minimum
spanning tree is its spanning tree of the smallest weight, where the
weight of a tree is defined as the sum of the weights on all its edges.
Applications of minimum spanning tree
• Minimum spanning trees can be used to design
  • water-supply networks,
  • telecommunication networks, and
  • electrical grids.
• They can also be used to find paths in maps.


Kruskal’s Algorithm

[Figures: step-by-step construction of the MST by repeatedly adding the lightest remaining edge that does not create a cycle.]
Time Complexity

Initialise the forest: O(|V|)
Sort the edges: O(|E| log |E|)
Until we've added |V|-1 edges (O(|V|) iterations), check whether an edge forms a cycle: O(|V|) each, i.e. O(|V|²) in total
Total: O(|V| + |E| log |E| + |V|²)
Since |E| = O(|V|²), this is O(|V|² log |V|)

Thus, we may refer to Kruskal's algorithm as an O(n² log n) algorithm, where n is the number of vertices. However, note that if |E| is similar to |V|, then the complexity is O(n²).
Time Complexity
• The edges are maintained as min heap.
• The next edge can be obtained in O(logE) time if graph has E edges.
• Reconstruction of heap takes O(E) time.
• So, Kruskal’s Algorithm takes O(ElogE) time.
• The value of E can be at most O(V²).
• So O(log E) = O(log V²) = O(2 log V), i.e., O(log V) and O(log E) are the same.
• If the edges are already sorted, then there is no need to construct
min heap.
• So, deletion from min heap time is saved.
• In this case, time complexity of Kruskal’s Algorithm = O(E + V)
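As a sketch of the idea, Kruskal's algorithm with a simple union-find structure for the cycle check (the example graph is illustrative, not from the slides; with path compression the per-edge test is far below the O(|V|) bound used in the analysis above):

```python
def kruskal(n, edges):
    """Kruskal's MST: sort edges by weight, add each edge that joins
    two different components. edges: list of (weight, u, v) tuples."""
    parent = list(range(n))
    def find(v):                      # path-compressing find
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    mst, total = [], 0
    for w, u, v in sorted(edges):     # the O(E log E) sort dominates
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge does not form a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
            if len(mst) == n - 1:     # spanning tree is complete
                break
    return total, mst

# Small assumed example graph on 4 vertices
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
print(kruskal(4, edges)[0])  # 6 -- total weight of the MST
```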
Prim’s Algorithm

[Figures: step-by-step growth of the MST from a start vertex by repeatedly adding the lightest edge leaving the current tree.]
Time Complexity
• If adjacency list is used to represent the graph, then using breadth first
search, all the vertices can be traversed in O(V + E) time.
• We traverse all the vertices of graph using breadth first search and use a
min heap for storing the vertices not yet included in the MST.
• To get the minimum weight edge, we use min heap as a priority queue.
• Min heap operations like extracting minimum element and decreasing key
value takes O(logV) time.
• So, overall time complexity = O(E + V) × O(log V) = O((E + V) log V) = O(E log V)
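A minimal Python sketch of Prim's algorithm with heapq as the min-heap (the example graph is assumed, not from the slides; this variant stores candidate edges and skips stale entries rather than performing decrease-key, but the O(E log V) bound still holds):

```python
import heapq

def prim(adj, start=0):
    """Prim's MST using a binary min-heap of candidate edges.
    adj: adjacency list {u: [(weight, v), ...]}. Returns total MST weight."""
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)        # lightest edge leaving the tree
        if v in visited:
            continue                      # stale entry -- skip it
        visited.add(v)
        total += w
        for edge in adj[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total

# Same assumed example graph, as an undirected adjacency list
adj = {
    0: [(1, 1), (4, 2)],
    1: [(1, 0), (2, 2), (5, 3)],
    2: [(4, 0), (2, 1), (3, 3)],
    3: [(5, 1), (3, 2)],
}
print(prim(adj))  # 6 -- the same MST weight Kruskal finds on this graph
```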
Single-source shortest-paths problem

• Single-source shortest-paths problem: for a given vertex, called the source, in a weighted connected graph, find shortest paths to all its other vertices.

• Dijkstra’s algorithm
Dijkstra’s algorithm

[Figures: step-by-step relaxation of edges from the source, settling the closest unsettled vertex at each step.]
Example: Dijkstra’s algorithm fails with negative edge weights

Dijkstra’s algorithm would visit b, then a, and leave b with a distance of 2 instead of the correct distance 1.
The problem is that when Dijkstra’s algorithm visits b, it fails to consider the possibility of there being a shorter path from a to b (which is impossible with nonnegative edge weights).
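A minimal Python sketch of Dijkstra's algorithm with a binary min-heap (the example graph is assumed for illustration; all its weights are nonnegative, as the algorithm requires):

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra's single-source shortest paths (nonnegative weights only).
    adj: {u: [(v, weight), ...]}. Returns {vertex: distance}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)        # settle the closest unsettled vertex
        if d > dist.get(u, float('inf')):
            continue                      # stale entry -- skip it
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd              # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist

# Assumed example graph; a negative weight on the a->b edge would break
# the algorithm, since b gets settled before a is processed.
adj = {
    's': [('a', 3), ('b', 2)],
    'a': [('b', 1)],
    'b': [],
}
print(dijkstra(adj, 's'))  # {'s': 0, 'a': 3, 'b': 2}
```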
Container Loading problem
• A large ship is to be loaded with containers of cargos. Different
containers, although of equal size, will have different weights.
• Let wi be the weight of the ith container, 1 ≤ i ≤ n, and the capacity of
the ship is c.
• Find out how could we load the ship with the maximum number of
containers.
• Let xi ∈ {0, 1}. If xi = 1, we load the ith container; otherwise, we do not load it.
• We wish to assign values to the xi's such that Σ (i = 1 to n) wi·xi ≤ c and xi ∈ {0, 1}.
• The optimization function is Σ (i = 1 to n) xi, which we maximize.
Example
• Given n = 8, i = [1, 2, 3, 4, 5, 6, 7, 8], wi = [100, 200, 50, 90, 150, 50, 20, 80], and c = 400:
• What is the order of loading?
• Loading in non-decreasing order of weight (20, 50, 50, 80, 90, 100, which uses 390 of the 400 units) gives
• [x1…x8] = [1, 0, 1, 1, 0, 1, 1, 1]
The Container Loading Problem
• Let xi be a boolean variable (1 = container loaded, 0 = container not loaded).
• With wi the weight of container i and C the maximum cargo-carrying capacity of the ship, the problem is to assign values xi such that Σ (i = 1 to n) xi is maximized, subject to the constraint

  Σ (i = 1 to n) wi·xi ≤ C
Algorithm Container Loading
1. for i = 1 to n
2.     do x[i] = 0
3. t ← Allocate memory for n indices
4. Indirect-Sort(w, t, n)   // t[i] is the index of the ith-lightest container
5. i = 1
6. while (i ≤ n) and (w[t[i]] ≤ C)
7.     do x[t[i]] = 1
8.        C = C − w[t[i]]
9.        i = i + 1
10. delete t
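The algorithm above can be sketched in Python, with the indirect sort expressed as a sort over indices (a minimal illustration, run on the slides' 8-container example):

```python
def container_loading(weights, capacity):
    """Greedy container loading: load containers in non-decreasing weight
    order until the next container would exceed the remaining capacity."""
    n = len(weights)
    x = [0] * n
    order = sorted(range(n), key=lambda i: weights[i])  # the "indirect sort"
    for i in order:
        if weights[i] > capacity:
            break                     # every remaining container is heavier
        x[i] = 1
        capacity -= weights[i]
    return x

# The slides' example: 8 containers, capacity 400
w = [100, 200, 50, 90, 150, 50, 20, 80]
print(container_loading(w, 400))  # [1, 0, 1, 1, 0, 1, 1, 1]
```

Sorting by weight maximizes the number of containers loaded: any solution loading k containers can be swapped, container by container, for the k lightest ones without exceeding the capacity.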
Knapsack problem

A thief breaks into a museum. Fabulous paintings, sculptures, and jewels are everywhere. The thief has a good eye for the value of these objects, and knows that each will fetch hundreds or thousands of dollars on the clandestine art collector’s market. But the thief has brought only a single knapsack to the scene of the robbery, and can take away only what he can carry. What items should the thief take to maximize the haul?
Fractional Knapsack Problem
• In a knapsack problem there is a knapsack (a container) of capacity M and n items, where each item i has weight wi and an associated profit pi.

• The problem is to fill the knapsack from the available items so that the knapsack is filled up and yields maximum profit.

• If a fraction xi of object i is placed into the knapsack, a profit pi·xi is earned.

• The constraint is that the total weight of the chosen objects must not exceed M.
Illustration
• Consider a knapsack problem of finding the optimal solution where M = 15, (p1, p2, …, p7) = (10, 5, 15, 7, 6, 18, 3) and (w1, w2, …, w7) = (2, 3, 5, 7, 1, 4, 1).
• To find a solution, one can follow three different greedy strategies.

Strategy 1: non-increasing profit values
• Let (a, b, c, d, e, f, g) represent the items with profits (10, 5, 15, 7, 6, 18, 3) and weights (2, 3, 5, 7, 1, 4, 1). The sequence of objects in non-increasing profit order is (f, c, a, d, e, b, g).
• Taking f, c, and a whole (weights 4 + 5 + 2 = 11) leaves capacity 4, so a fraction 4/7 of d is taken.

• Profit = 18 + 15 + 10 + (4/7)·7 = 47 units

• The solution set is (1, 0, 1, 4/7, 0, 1, 0).
Strategy 2: non-decreasing weights
• With the same items, the sequence of objects in non-decreasing weight order is (e, g, a, b, f, c, d).
• Taking e, g, a, b, and f whole (weights 1 + 1 + 2 + 3 + 4 = 11) leaves capacity 4, so a fraction 4/5 of c is taken.

• Profit = 6 + 3 + 10 + 5 + 18 + (4/5)·15 = 54 units

• The solution set is (1, 1, 4/5, 0, 1, 1, 1).
Strategy 3: maximum profit per unit of capacity used
• The objects are considered in decreasing order of the ratio pi/wi:
a: p1/w1 = 10/2 = 5
b: p2/w2 = 5/3 ≈ 1.66
c: p3/w3 = 15/5 = 3
d: p4/w4 = 7/7 = 1
e: p5/w5 = 6/1 = 6
f: p6/w6 = 18/4 = 4.5
g: p7/w7 = 3/1 = 3
• Hence, the sequence is (e, a, f, c, g, b, d).
Strategy 3: maximum profit per unit of capacity used (continued)
• Taking e, a, f, c, and g whole (weights 1 + 2 + 4 + 5 + 1 = 13) leaves capacity 2, so a fraction 2/3 of b is taken.
• Profit = 6 + 10 + 18 + 15 + 3 + (2/3)·5 ≈ 55.33 units
• The solution set is (1, 2/3, 1, 0, 1, 1, 1).
• In the above problem it can be observed that if the sum of all the weights is ≤ M, then xi = 1 for all i is an optimal solution.
• If the sum of all weights exceeds M, not all xi's can be one.
• Sometimes it becomes necessary to take a fraction of some item to fill the knapsack completely.
• This type of knapsack problem is the general knapsack problem, or fractional knapsack problem.
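Strategy 3, the profit-per-unit-weight rule, can be sketched in Python (a minimal illustration; the function name and structure are assumptions, and ties in the ratio are broken by original item order):

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: take items in decreasing profit/weight
    ratio, splitting the last item to fill the remaining capacity."""
    n = len(profits)
    x = [0.0] * n
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    total = 0.0
    for i in order:
        if capacity == 0:
            break
        take = min(weights[i], capacity)   # whole item, or a fraction of it
        x[i] = take / weights[i]
        total += profits[i] * x[i]
        capacity -= take
    return total, x

# The slides' instance: M = 15
p = [10, 5, 15, 7, 6, 18, 3]
w = [2, 3, 5, 7, 1, 4, 1]
total, x = fractional_knapsack(p, w, 15)
print(round(total, 2))  # 55.33 -- matches Strategy 3 above
```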
0-1 knapsack problem
• We call this the 0-1 knapsack problem because for each item,
the thief must either take it or leave it behind; he cannot take
a fractional amount of an item or take an item more than
once
0-1 knapsack problem
• To see that this greedy strategy does not work for the 0-1 knapsack problem, consider the problem instance illustrated in Figure a.
• This example has 3 items and a knapsack that can hold 50 pounds.
• Item 1 weighs 10 pounds and is worth 60 dollars.
• Item 2 weighs 20 pounds and is worth 100 dollars.
• Item 3 weighs 30 pounds and is worth 120 dollars.
• Thus, the value per pound of item 1 is 6 dollars per pound, which is greater than the value per pound of either item 2 (5 dollars per pound) or item 3 (4 dollars per pound).
• The greedy strategy, therefore, would take item 1 first.
• As you can see from the case analysis in Figure b, however, the optimal solution takes items 2 and 3, leaving item 1 behind.
• The two possible solutions that take item 1 are both suboptimal.
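The case analysis can be verified in a few lines of Python: a sketch comparing the ratio-greedy choice (whole items only) against an exhaustive search over all subsets.

```python
from itertools import combinations

# The figure's instance: capacity 50, items as (weight, value) pairs
items = [(10, 60), (20, 100), (30, 120)]
capacity = 50

# Greedy by value/weight ratio, taking whole items only
greedy_value, room = 0, capacity
for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
    if w <= room:
        greedy_value += v
        room -= w

# Exhaustive search over all subsets gives the true optimum
best = max(sum(v for _, v in s)
           for r in range(len(items) + 1)
           for s in combinations(items, r)
           if sum(w for w, _ in s) <= capacity)

print(greedy_value, best)  # 160 220 -- the greedy haul is suboptimal
```

The greedy rule grabs item 1 (ratio 6) and then item 2, reaching 160 dollars, while the optimum of 220 dollars takes items 2 and 3 and skips item 1 entirely.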
0-1 knapsack problem

• The 0-1 knapsack problem cannot be solved optimally by the greedy method.

• This is a hallmark for Dynamic Programming:

• the optimal solution is achieved by Dynamic Programming.

• The complexity of the greedy Fractional Knapsack algorithm is O(n log n).

Optimal Storage on Tapes: Introduction
• There are n programs to be stored on a computer tape of length L. Associated with each program i is a length Li, where 1 ≤ i ≤ n.
• All programs can be stored on the tape if the sum of the lengths of the programs is at most L.
• It is assumed that, whenever a program is to be retrieved, the tape is initially positioned at the front.
• If the programs are stored in the order I = i1, i2, …, in, the time tj needed to retrieve program ij is

  tj = Σ (k = 1 to j) Lik
Optimal Storage on Tapes
• If all programs are retrieved equally often, then the mean retrieval time is

  MRT = (1/n) Σ (j = 1 to n) tj

• This problem fits the ordering paradigm. Minimizing the MRT is equivalent to minimizing

  d(I) = Σ (j = 1 to n) Σ (k = 1 to j) Lik
Example
• Let n = 3 and (L1, L2, L3) = (5, 10, 3). There are 6 possible orderings; the optimal is 3, 1, 2.

Ordering I   d(I)
1,2,3        5 + (5+10) + (5+10+3) = 38
1,3,2        5 + (5+3) + (5+3+10) = 31
2,1,3        10 + (10+5) + (10+5+3) = 43
2,3,1        10 + (10+3) + (10+3+5) = 41
3,1,2        3 + (3+5) + (3+5+10) = 29
3,2,1        3 + (3+10) + (3+10+5) = 34
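The orderings can be checked by brute force. A minimal Python sketch (the `lengths` map and helper `d` are illustrative names, not from the slides):

```python
from itertools import permutations

lengths = {1: 5, 2: 10, 3: 3}   # (L1, L2, L3) from the example

def d(order):
    """d(I): sum over j of the prefix sums of the program lengths."""
    total, prefix = 0, 0
    for i in order:
        prefix += lengths[i]    # tape position after program i
        total += prefix         # retrieval time for program i
    return total

best = min(permutations(lengths), key=d)
print(best, d(best))  # (3, 1, 2) 29 -- the optimal ordering from the table
```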
Optimal Storage on Tapes

• From the above discussion, one can observe that the MRT is minimized if the programs are stored in non-decreasing order of length, i.e., l1 ≤ l2 ≤ … ≤ ln.

• Hence the ordering so defined minimizes the retrieval time.

• The solution obtained need not be a subset of the data; it may be the data set itself in a different sequence.
Pseudo code : Optimal Storage on Tapes
tapes(n, m) // here n is the numbers of programs and m is the numbers of tapes
{
j=0;
Sort (n);
for(i=1 to n do)
{
write ("append", i "to the permutation of the tape", j)
j=(j+1) mod m
}
}
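The pseudocode above can be sketched in Python: a minimal version, assuming programs are identified by their indices, that sorts by length and deals the programs round-robin across the m tapes.

```python
def tapes(lengths, m):
    """Greedy tape assignment: sort programs by length, then deal them
    round-robin across m tapes (the jth shortest goes to tape j mod m)."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    assignment = [[] for _ in range(m)]
    for j, i in enumerate(order):
        assignment[j % m].append(i)   # the j = (j + 1) mod m step
    return assignment

# Single tape: programs of lengths (5, 10, 3) end up in the order
# 3, 1, 2 of the earlier example (0-indexed: 2, 0, 1)
print(tapes([5, 10, 3], 1))  # [[2, 0, 1]]
```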
