
UNIT 1 DYNAMIC PROGRAMMING


Structure
1.0 Introduction
1.1 Objectives
1.2 The Problem of Making Change
1.3 The Principle of Optimality
1.4 Chained Matrix Multiplication
1.5 Matrix Multiplication Using Dynamic Programming
1.6 Summary
1.7 Solutions/Answers
1.8 Further Readings

1.0 INTRODUCTION
In the earlier units of the course, we have discussed some well-known techniques,
including the divide-and-conquer technique, for developing algorithms for
algorithmically solvable problems. The divide-and-conquer technique, though quite
useful in solving problems from a number of problem domains, yet in some cases, as
shown below, may give quite inefficient algorithms to solve problems.
Example 1.0.1: Consider the problem of computing the binomial coefficient C(n, k), where n and k are given non-negative integers with n ≥ k. One way of defining and calculating the binomial coefficient is by using the following recursive formula:

C(n, k) = 1                               if k = 0 or k = n
C(n, k) = C(n−1, k−1) + C(n−1, k)         if 0 < k < n          (1.0.1)
C(n, k) = 0                               otherwise
The following recursive algorithm, named Bin (n, k), implements the above formula for computing the binomial coefficient.

Function Bin (n, k)

If k = n or k = 0 then return 1
else return Bin (n−1, k−1) + Bin (n−1, k)
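As an added illustration (a minimal runnable sketch, not part of the original text), the same recursion can be written in Python, with a counter recording how many times each pair (i, j) is computed:

from collections import Counter

calls = Counter()  # how many times each pair (n, k) is computed

def bin_naive(n, k):
    # naive recursion, exactly as in Function Bin (n, k) above
    calls[(n, k)] += 1
    if k == n or k == 0:
        return 1
    return bin_naive(n - 1, k - 1) + bin_naive(n - 1, k)

print(bin_naive(7, 5))  # 21
print(calls[(5, 4)])    # 2: Bin(5, 4) is computed twice, as noted below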

For computing Bin (n, k) for some given values of n and k, a number of terms Bin (i, j), 1 ≤ i ≤ n and 1 ≤ j ≤ k, particularly for smaller values of i and j, are repeatedly calculated. For example, to calculate Bin (7, 5), we compute Bin (6, 5) and Bin (6, 4). Now, for computing Bin (6, 5), we compute Bin (5, 4) and Bin (5, 5). But for calculating Bin (6, 4) we have to calculate Bin (5, 4) again. If the above argument is further carried out for still smaller values, the number of repetitions for Bin (i, j) increases as the values of i and j decrease.

For given values of n and k, in order to compute Bin (n, k), we need to call Bin (i, j) for 1 ≤ i ≤ n − 1 and 1 ≤ j ≤ k − 1, and as the values of i and j decrease, the number of times Bin (i, j) is required to be called and executed generally increases.

The above example follows the Divide-and-Conquer technique in the sense that the task of calculating C(n, k) is replaced by the two relatively simpler tasks, viz., calculating C(n−1, k) and C(n−1, k−1). But this technique, in this particular case, makes a large number of avoidable repetitions of computations. This is not an isolated instance where the Divide-and-Conquer technique leads to inefficient solutions. In such cases, an alternative technique, viz., Dynamic Programming, may prove quite useful. This unit is devoted to developing algorithms using the Dynamic Programming technique. But before we discuss the technique in more detail, let us briefly discuss the underlying idea of the technique and the fundamental difference between Dynamic Programming and the Divide-and-Conquer technique.

The essential idea of Dynamic Programming, being quite simple, is that we should avoid calculating the same quantity more than once, usually by keeping a table of known results for simpler instances. These results, instead of being calculated repeatedly, can be retrieved from the table, as and when required, after the first computation.

Comparatively, Divide-and-Conquer is conceptually a top-down approach for solving problems or developing algorithms, in which the problem is attempted initially with the complete instance, gradually replacing the more complex instances by simpler instances.

On the other hand, Dynamic Programming is a bottom-up approach for solving problems, in which we first attempt the simplest subinstances of the problem under consideration and then gradually handle more and more complex instances, using the results of earlier computed (sub)instances.

We illustrate the bottom-up nature of Dynamic Programming by attempting a solution of the problem of computing binomial coefficients. In order to calculate C(n, k), for given numbers n and k, we consider an n × k table or matrix of the form shown below.

The (i, j)th entry of the table contains the value C(i, j). We know,
C (i, 0) = 1 for all i = 0, 1, 2,…., n and
C (0, j) = 0 for j = 1, 2, …., k

Thus, initially the table looks like

0 1 2 3 …….. k

0 1 0 0 0 …….. 0
1 1
2 1
3 1
. .
. .
. .
. .
n 1

Then row C (1, j) for j = 1, 2, …. k may be filled up by using the formula

C (i, j) = C (i – 1, j – 1) + C (i – 1, j).

After filling up the entries of the first row, the table takes the following form:

0 1 2 3 …….. k

0 1 0 0 0 0

1 1 1 0 0 0

2 1

. 1
. .
. .
. .
n 1

From the already calculated values of a given row i, adding successive pairs of
consecutive values, we get the values for (i + 1)th row. After completing the entries
for row with index 4, the table may appear as follows, where the blank entries to the
right of the main diagonal are all zeros.
0 1 2 3 4 ….. k

0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
. .
. .
. .
. .
n 1

We summarize below the process followed above for calculating C(i, j): first of all, the simplest values C(i, 0) = 1 for i = 1, 2, …, n and C(0, j) = 0 for j ≥ 1 are obtained directly from the given formula. Next, more complex values are calculated from the already available less complex values. Obviously, the above-mentioned process is a bottom-up one.

Though the purpose in the above discussion was to introduce and explain the Dynamic
Programming technique, yet we may also consider the complexity of calculating
C (n, k) using the tabular method.

Space Complexity/Requirement: In view of the following two facts, it can be easily seen that the amount of space required is not for all the n × k entries but only for the k values of one row C(i, j), j = 1, 2, …, k, independent of the value of i:

(i) The value in column 0 is always 1 and hence need not be stored.
(ii) Initially, the 0th row is given by C(0, 0) = 1 and C(0, j) = 0 for j = 1, 2, …, k. Once any value of row 1, say C(1, j), is calculated, the values C(0, j − 1) and C(0, j) are no more required, and hence C(1, j) may be written in the space currently occupied by C(0, j); thus no extra space is required to write C(1, j).

In general, when the value C(i, j) of the ith row is calculated, the value C(i−1, j) is no more required and hence the cell currently occupied by C(i−1, j) can be used to store the value C(i, j). Thus, at any time, one row's worth of space is enough to calculate C(n, k). Therefore, the space requirement is Θ(k).

Time Complexity: If we notice the process of calculating successive values of the table, we find that any value C(i, j) is obtained by adding 1's. For example, C(4, 2) = 6 is obtained by adding C(3, 1) = 3 and C(3, 2) = 3. But then C(3, 1) is obtained by adding C(2, 0) = 1 and C(2, 1) = 2. Again, C(2, 1) is obtained by adding C(1, 0) = 1 and C(1, 1) = 1. Similarly, C(3, 2), and hence C(4, 2), can be shown to have been obtained by adding, directly or indirectly, a sequence of 1's. Also, the number of additions for calculating C(n, k) cannot be more than the number of all the entries in the (n − 1) rows, viz., rows 1, 2, …, (n − 1), each row containing k elements. Thus, the number of additions ≤ nk.

Therefore, the time complexity is Θ(nk).
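Both bounds can be realized together in a short bottom-up implementation. The following Python sketch (an illustration added here; the name binomial is our own) keeps only one row of k + 1 entries, giving Θ(k) space and Θ(nk) additions:

def binomial(n, k):
    # one row of the table suffices: row[j] holds C(i, j) for the current i
    row = [0] * (k + 1)
    row[0] = 1  # C(i, 0) = 1 for every i
    for i in range(1, n + 1):
        # fill right-to-left, so row[j - 1] still holds C(i - 1, j - 1)
        for j in range(min(i, k), 0, -1):
            row[j] = row[j] + row[j - 1]
    return row[k]

print(binomial(7, 5))  # 21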

In the next sections, we shall discuss solution of some well-known problems, using
Dynamic Programming technique.

1.1 OBJECTIVES
After going through this unit, you should be able to:

• explain the dynamic programming technique for solving optimization problems;
• apply the dynamic programming technique for solving optimization problems; especially, you should be able to solve the following well-known problems using this technique:
  o Shortest paths problems
  o Knapsack Problem
  o Chained Matrix Multiplication Problem
  o Problem of making change
• understand the principle of optimality.

1.2 THE PROBLEM OF MAKING CHANGE


First of all, we state a special case of the problem of making change, and then discuss
the problem in its general form.

We, in India, have currency notes or coins of denominations of Rupees 1, 2, 5, 10, 20,
50, 100, 500 and 1000. Suppose a person has to pay an amount of Rs. 5896 after
having decided to purchase an article. Then, the problem is about how to pay the
amount using minimum number of coins/notes.

A simple, frequently and unconsciously used algorithm based on the Greedy technique is: after having collected an amount A < 5896, choose a note of denomination D such that

(i) A + D ≤ 5896, and
(ii) D is of the maximum denomination for which (i) is satisfied, i.e., if E > D then A + E > 5896.

In general, the Change Problem may be stated as follows:

Let d1, d2, …, dk, with di > 0 for i = 1, 2, …, k, be the only coin denominations available, such that each coin with denomination di is available in sufficient quantity for the purpose of making payments. Further, let A, a positive integer, be the amount to be paid using the above-mentioned coins. The problem is to use the minimum number of coins for the purpose.

The problem with the above-mentioned algorithm based on the greedy technique is that, in some cases, it may either fail or may yield suboptimal solutions. In order to establish the inadequacy of greedy-technique-based algorithms, we consider the following two examples.

Example 1.2.1: Let us assume a hypothetical situation in which we have a supply of rupee-notes of denominations 5, 3 and 2, and we are to collect an amount of Rs. 9. Using the greedy technique, first we choose a note of Rupees 5. Next, we choose a 3-Rupee note to make a total amount of Rupees 8. But then, according to the greedy technique, we cannot go ahead in the direction of collecting Rupees 9. The greedy technique fails despite the fact that a solution exists otherwise, as it is possible to make a payment of Rupees 9 using notes of denominations of Rupees 5, Rupees 3 and Rupees 2, viz., 9 = 5 + 2 + 2.

Example 1.2.2: Next, we consider another example, in which the greedy algorithm may yield a solution, but the solution may not be optimal, only suboptimal. For this purpose, we consider a hypothetical situation in which currency notes of denominations 1, 4 and 6 are available, and we have to collect an amount of 8. The greedy technique yields 8 = 6 + 1 + 1. But this solution uses three currency notes/coins, whereas another solution using only two currency notes/coins, viz., 8 = 4 + 4, is available.

Next, we discuss how the Coin Problem is solved using Dynamic Programming
technique.

As mentioned earlier, to solve the coin problem using the Dynamic Programming technique, we construct a table, some of the entries in which, corresponding to simple cases of the problem, are filled up initially from the definition of the problem. Then, recursively, entries for the more complex cases are calculated from the already known ones.

We recall the definition of the coin problem:

Pay an amount A using the minimum number of coins of denominations di, 1 ≤ i ≤ k. It is assumed that coins of each denomination are available in sufficient quantities.

Each of the denominations di, 1 ≤ i ≤ k, is made a row label and each value j for 1 ≤ j ≤ A is made a column label of the proposed table, as shown below, where A is the amount to be paid:

Amount →         1   2   3   4   …   j   …   A
Denomination ↓

1 = d1
d2
.
.
.
di                               C[i, j]
.
.
.
dk

In the table given above, 0 < d1 < d2 < … < dk, and C[i, j] denotes the minimum number of coins of denominations d1, d2, …, di (only) that are used to make up an amount j, where more than one coin of the same denomination may be used. The value C[i, j] is entered in the table with row label di and column label j.

Next, in respect of entries in the table, we make the following two observations:
(i) In order to collect an amount 0, we require zero number of coins, which is true
whether we are allowed to choose from, say, one of the successively larger sets
of denominations viz., {d1}, {d1, d2}, {d1, d2, d3}, …, {d1, d2, …, dk}. Thus,
entries in the table in the column with 0 as column label, are all 0’s.
(ii) If d1 > 1, then there may be some amounts j (including j = 1) for which, even with the dynamic programming technique, no solution may exist, i.e., there may not be any collection of coins of denominations d1, d2, …, dk for which the sum is j. Therefore, we assume d1 = 1. The case d1 > 1 may be handled similarly.

As d1 = 1, the first row of the table, in the jth column, contains j, since j coins of denomination d1 = 1 are needed to form the value j.

Thus, the table at this stage looks like

0 1 2 3 4 ….. j …. A

d1 0 1 2 3 4 … j … A
d2 0
di 0
. .
. .
. .
dk 0

Next, for i ≥ 2 and j ≥ 1, the value C[i, j], the minimum number of coins of denominations up to di (only) that sum up to j, can be obtained recursively through either of the following two ways:

(i) We choose a coin of denomination di, the largest available at this stage, to make up j (provided, of course, j ≥ di). In this case, the rest of the amount, to be made out of coins of denominations d1, d2, …, di, is (j − di). Thus, in this case, we have

C[i, j] = 1 + C[i, j − di]   if j ≥ di    (1.2.1)

(ii) We do not choose a coin of denomination di, even when such a coin is available. Then we make up the amount j out of coins of denominations d1, d2, …, di−1 (only). Thus, in this case, we have

C[i, j] = C[i−1, j]   if i > 1    (1.2.2)

If we choose to fill up the table row-wise, in increasing order of column numbers, we can easily see from (i) and (ii) above that both the values C[i, j − di] and C[i−1, j] are already known for comparison, to find the better alternative for C[i, j].

By definition, C[i, j] = min {1 + C[i, j − di], C[i−1, j]}, and it can be calculated as the two involved values, viz., C[i, j − di] and C[i−1, j], are already known.

Comment 1.2.3
If j < di, then the case (1.2.1) is impossible. Mathematically, we can say C[i, j − di] = ∞ if j < di, because then the case is automatically excluded from consideration while calculating C[i, j].

Similarly, we take

C[i−1, j] = ∞ if i < 1.

Following the above procedure, C[k, A] gives the desired number.

In order to explain the above method, let us consider the earlier example for which
greedy algorithm gave only suboptimal solution.

Example 1.2.4: Using Dynamic Programming technique, find out minimum number
of coins required to collect Rupees 8 out of coins of denominations 1, 4, 6.

From the earlier discussion we already know the following portion of the table to be
developed using Dynamic Programming technique.

        0  1  2  3  4  5  6  7  8

d1 = 1  0  1  2  3  4  5  6  7  8
d2 = 4  0
d3 = 6  0

Next, let us calculate C[2, 1], which, by definition, is given by

C[2, 1] = min {1 + C[2, 1 − 4], C[1, 1]}

By the comment above, C[2, −3] = ∞.
Therefore, C[2, 1] = C[1, 1] = 1.

Similarly, we can show that

C[3, 1] = C[2, 1] = 1

Next, we consider

C[2, 2] = min {1 + C[2, 2 − 4], C[1, 2]}

Again, C[2, −2] = ∞.
Therefore, C[2, 2] = C[1, 2] = 2.
Similarly, C[3, 2] = 2.

On similar lines, we can show that

C[2, 3] = C[1, 3] = 3
C[3, 3] = C[2, 3] = 3

The next interesting case is C[2, 4], i.e., to find the minimum number of coins to make an amount 4 out of coins of denominations 1, 4, 6. By definition,

C[2, 4] = min {1 + C[2, 4 − 4], C[1, 4]}

But C[2, 0] = 0 and C[1, 4] = 4; therefore,

C[2, 4] = min {1 + 0, 4} = 1

By following the method explained through the above steps, we finally get the table as:

        0  1  2  3  4  5  6  7  8

d1 = 1  0  1  2  3  4  5  6  7  8
d2 = 4  0  1  2  3  1  2  3  4  2
d3 = 6  0  1  2  3  1  2  1  2  2

Let us formalize the method explained above of computing C[k, A], in the general case, in the form of the following algorithm:

Function Min-Num-Coins (A, d[1..k])

{gives the minimum number of coins to add up to A, where the number k of denominations and the value A are supplied by the calling procedure/program}

array C[1..k, 0..A]
For i = 1 to k
    Read (d[i])
{reads the various denominations available, with coins of each denomination in sufficient numbers}
{assuming d[1] = 1, initialize the table C as follows}
For i = 1 to k
    C[i, 0] = 0
For j = 1 to A
    C[1, j] = j
For i = 2 to k
    For j = 1 to A
        If j < d[i] then
            C[i, j] = C[i − 1, j]
        else
            C[i, j] = min {1 + C[i, j − d[i]], C[i − 1, j]}
return C[k, A]
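A direct Python rendering of this algorithm (an added sketch; min_num_coins is our own name, and the denominations are passed as a list with denoms[0] = 1) is:

def min_num_coins(amount, denoms):
    # C[i][j]: minimum coins for amount j using denominations denoms[0..i] only
    k, INF = len(denoms), float('inf')
    C = [[0] * (amount + 1) for _ in range(k)]
    for j in range(1, amount + 1):
        C[0][j] = j  # only coins of denomination 1
    for i in range(1, k):
        for j in range(1, amount + 1):
            skip = C[i - 1][j]  # do not use a coin of denoms[i]
            take = 1 + C[i][j - denoms[i]] if j >= denoms[i] else INF
            C[i][j] = min(skip, take)
    return C[k - 1][amount]

print(min_num_coins(8, [1, 4, 6]))  # 2, i.e., 8 = 4 + 4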

Comments 1.2.5

Comment 1: Though the above algorithm explicitly gives only the minimum number of coins needed to make a pre-assigned amount A, it can also be used to determine the set of coins of various denominations that add up to A.

This can be seen through the following argument:

By definition, C[i, j] = either 1 + C[i, j − di] or C[i−1, j], which means either we choose a coin of denomination di or we do not choose a coin of denomination di, depending upon whether 1 + C[i, j − di] ≤ C[i−1, j] or not. Applying the above rule recursively for decreasing values of i and j, we know which coins are chosen for making an amount j out of the available coins.
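The argument of Comment 1 can be turned into a short backtracking routine over the filled table. The sketch below (an added illustration; chosen_coins is our own name) rebuilds the table as above and then walks it backwards, for decreasing values of i and j:

def chosen_coins(amount, denoms):
    # rebuild the table C as in min_num_coins above
    k, INF = len(denoms), float('inf')
    C = [[0] * (amount + 1) for _ in range(k)]
    for j in range(1, amount + 1):
        C[0][j] = j
    for i in range(1, k):
        for j in range(1, amount + 1):
            take = 1 + C[i][j - denoms[i]] if j >= denoms[i] else INF
            C[i][j] = min(C[i - 1][j], take)
    # walk backwards: at each cell, decide which of the two cases produced it
    coins, i, j = [], k - 1, amount
    while j > 0:
        if i > 0 and C[i][j] == C[i - 1][j]:
            i -= 1                   # no coin of denoms[i] was used
        else:
            coins.append(denoms[i])  # one coin of denoms[i] was used
            j -= denoms[i]
    return coins

print(chosen_coins(8, [1, 4, 6]))  # [4, 4]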

Comment 2: Once an algorithm is designed, it is important to know its computational complexity in order to determine its utility.

In order to determine C[k, A], the above algorithm computes k × (A + 1) values through additions and comparisons. Therefore, the complexity of the algorithm is Θ(k·A), the order of the size of the table C[1..k, 0..A].

1.3 THE PRINCIPLE OF OPTIMALITY

The Principle of Optimality states that the components of a globally optimal solution must themselves be optimal. In other words, the Optimality Principle states that subsolutions of an optimal solution must themselves be optimal.

While using the Dynamic Programming technique in solving the coin problem, we tacitly assumed the above-mentioned principle in defining C[i, j] as the minimum of 1 + C[i, j − di] and C[i−1, j]. The principle is used so frequently that we are often not even aware of having used it.

However, we must be aware that there may be situations in which the principle is not applicable. The principle of optimality may fail, especially in situations where the resources used over the (sub)components may be more than the total resources used over the whole, and where the total resources are limited and fixed. A simple example may explain the point. Suppose a 400-meter race champion takes 42 seconds to complete the race. However, covering 100 meters in 10.5 seconds may not be the best/optimal time for 100 meters: the champion may take less than 10 seconds to cover 100 meters. The reason is that the total resources (here, the concentration and energy of the athlete) are limited and fixed, whether distributed over 100 meters or 400 meters.

Similarly, the best performance of a vehicle over 100 miles cannot be thought of as 10 times its best performance over 10 miles. First of all, fuel usage, after some lower threshold, increases with speed. Therefore, as the distance to be covered increases (e.g., from 10 to 100 miles), fuel has to be used more cautiously, restraining the speed, as compared to when the distance to be covered is less (e.g., 10 miles). Even if refuelling is allowed, refuelling also takes time. The driver's concentration and energy are other fixed and limited resources, which in the case of a shorter distance can be used more liberally than over longer distances, and in the process produce better speed over short distances. The above discussion is intended to draw attention to the fact that the principle of optimality is not universal, especially when the resources are limited and fixed. Further, the Dynamic Programming technique assumes the validity of the Principle of Optimality for the problem domain. Hence, while applying the Dynamic Programming technique for solving optimisation problems, in order for the solution based on the technique to be valid, we need to ensure that the Optimality Principle holds for the problem domain.

Ex. 1) Using Dynamic Programming, solve the following problem (well known as the Knapsack Problem). Also write the algorithm that solves the problem.

We are given n objects and a knapsack. For i = 1, 2, …, n, object i has a positive weight wi and a positive value vi. The knapsack can carry a weight not exceeding W. Our aim is to fill the knapsack in a way that maximizes the value of the objects included in the knapsack. Further, it is assumed that the objects may not be broken into pieces; in other words, either a whole object is to be included or it has to be excluded.

1.4 CHAINED MATRIX MULTIPLICATION

In respect of multiplication of matrices, we recall the following facts:
(i) Matrix multiplication is a binary operation, i.e., at one time only two matrices can be multiplied.
(ii) However, not every given pair of matrices is multiplicable. If we have to find the product M1M2, then the orders of the matrices must be of the form m × n and n × p for some positive integers m, n and p. The matrices M1 and M2, in this order, are then said to be multiplication-compatible. The number of scalar (number) multiplications required is mnp. It can easily be seen that matrix multiplication is not commutative; indeed, M2M1 may not even be defined when M1M2 is defined.
(iii) Matrix multiplication is associative in the sense that if M1, M2 and M3 are three matrices of orders m × n, n × p and p × q, then the matrices (M1M2)M3 and M1(M2M3) are defined,
(M1M2)M3 = M1(M2M3),
and the product is an m × q matrix.
(iv) Though, for three or more matrices, matrix multiplication is associative, the number of scalar multiplications may vary significantly depending upon how we pair the matrices and their product matrices to get the final product.

For example, if A is a 14 × 6 matrix, B is a 6 × 90 matrix, and C is a 90 × 4 matrix, then the number of scalar multiplications required for (AB)C is

14 × 6 × 90 = 7560 (for AB, which is of order 14 × 90)
plus 14 × 90 × 4 = 5040 (for the product of AB with C),

equal to 12600 scalar multiplications.

On the other hand, the number of scalar multiplications for A(BC) is

6 × 90 × 4 = 2160 (for BC, which is of order 6 × 4)
plus 14 × 6 × 4 = 336 (for the product of A with BC),

equal to 2496 scalar multiplications.

Summarizing: the product of the matrices A (14 × 6), B (6 × 90) and C (90 × 4) takes 12600 scalar multiplications when first the product of A and B is computed and then AB is multiplied with C. On the other hand, if the product BC is calculated first and then the product of A with the matrix BC is taken, only 2496 scalar multiplications are required. The latter number is around 20% of the former. When a large number of matrices are to be multiplied, for which the product is defined, proper parenthesizing through pairing of matrices may cause dramatic savings in the number of scalar operations.



This raises the question of how to parenthesize the pairs of matrices within the expression A1A2 … An, a product of n matrices which is defined, so as to optimize the computation of the product A1A2 … An. This problem is known as the Chained Matrix Multiplication problem.

Brute-Force Method: One way of finding the optimal method (i.e., the method which uses the minimum number of scalar (numerical) multiplications) is to parenthesize the expression A1A2 … An in all possible ways, calculate the number of scalar multiplications required for each way, and then choose the way which requires the minimum number of scalar multiplications.

However, if T(n) denotes the number of ways of putting parentheses for pairing the expression A1A2 … An, then T(n) is an exponentially increasing function. The rate at which the values of T(n) increase may be seen from the following table of values of T(n) for n = 1, 2, …:

n :   1  2  3  4  5   …  10    …  15
T(n): 1  1  2  5  14  …  4862  …  2674440

These values of T(n) are the Catalan numbers.

Hence, it is practically impossible to use this method for determining how to optimize the computation of the product A1A2 … An.
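T(n) satisfies the recurrence T(n) = T(1)T(n−1) + T(2)T(n−2) + … + T(n−1)T(1) for n ≥ 2, with T(1) = 1, obtained by splitting at the top level into (A1 … Ak)(Ak+1 … An). A few lines of Python (an added illustration) reproduce the table above:

def T(n, memo={1: 1}):
    # number of ways to fully parenthesize a chain of n matrices
    if n not in memo:
        # top-level split into (A1..Ak)(Ak+1..An) for k = 1, ..., n-1
        memo[n] = sum(T(k) * T(n - k) for k in range(1, n))
    return memo[n]

print([T(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
print(T(10), T(15))                 # 4862 2674440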

Next, we discuss how to optimize the computation using the Dynamic Programming technique.

1.5 MATRIX MULTIPLICATION USING DYNAMIC PROGRAMMING

It can be seen that if one arrangement is optimal for A1A2 … An, then it is also optimal for the two parts (A1 … Ak) and (Ak+1 … An) of any top-level pairing. For, if there were a better pairing for, say, A1A2 … Ak, then we could replace that part within A1A2 … Ak Ak+1 … An to get a pairing better than the initially assumed optimal pairing, leading to a contradiction. Hence the principle of optimality is satisfied.

Thus, the Dynamic Programming technique can be applied to the problem, and is
discussed below:

Let us first define the problem. Let Ai, 1 ≤ i ≤ n, be a di−1 × di matrix. Let the vector d[0..n] store the dimensions of the matrices, where the dimension of Ai is di−1 × di for i = 1, 2, …, n. By definition, any subsequence Aj … Ak of A1A2 … An for 1 ≤ j ≤ k ≤ n is a well-defined product of matrices. Let us consider a table m[1..n, 1..n] in which the entry mij, for 1 ≤ i ≤ j ≤ n, represents the optimal (i.e., minimum) number of scalar multiplications required to compute the product matrix (Ai … Aj).

We fill up the table diagonal-wise, i.e., in one iteration we fill up one diagonal mi,i+s at a time, for some constant s ≥ 0. Initially we consider the main diagonal mii, for which s = 0; then the diagonal mi,i+1 for s = 1; and so on.

First, we fill up the entries mii, i = 1, 2, …, n.

Now mii stands for the minimum number of scalar multiplications required to compute the product of the single matrix Ai. But the number of scalar multiplications required is zero.

Hence,
mii = 0 for i = 1, 2, …, n.
Filling up the entries mi(i+1) for i = 1, 2, …, (n − 1):

mi(i+1) denotes the minimum number of scalar multiplications required to find the product AiAi+1. As Ai is a di−1 × di matrix and Ai+1 is a di × di+1 matrix, there is a unique number of scalar multiplications for computing AiAi+1, giving

mi(i+1) = di−1 di di+1

for i = 1, 2, …, (n − 1).

The above case is also subsumed by the general case of

mi(i+s) for s ≥ 1.

For the expression

Ai Ai+1 … Ai+s,

let us consider a top-level pairing

(Ai Ai+1 … Aj) (Aj+1 … Ai+s)

for some j with i ≤ j < i + s.

Assuming the optimal numbers of scalar multiplications, viz., mi,j and mj+1,i+s, are already known, we can say that

mi(i+s) = min {mi,j + mj+1,i+s + di−1 dj di+s : i ≤ j < i + s}

for i = 1, 2, …, n − s, where the term di−1 dj di+s represents the number of scalar multiplications required to multiply the resultant matrices (Ai … Aj) and (Aj+1 … Ai+s).

Summing up the discussion, we arrive at the following definition of mi,i+s:

for s = 0: mi,i = 0 for i = 1, 2, …, n
for s = 1: mi,i+1 = di−1 di di+1 for i = 1, 2, …, (n − 1)
for 1 < s < n: mi,i+s = min {mi,j + mj+1,i+s + di−1 dj di+s : i ≤ j < i + s} for i = 1, 2, …, (n − s)

Then m1,n is the final answer.

Let us illustrate the algorithm to compute m1,n, discussed above, through an example.

Let the given matrices be

A1 of order 14 × 6
A2 of order 6 × 90
A3 of order 90 × 4
A4 of order 4 × 35

Thus the dimension vector d[0..4] is given by [14, 6, 90, 4, 35].

For s = 0, we know mii = 0, so the main diagonal of the matrix consists of zeros.

Next, consider for s = 1 the entries

mi,i+1 = di−1 di di+1:

m12 = d0 d1 d2 = 14 × 6 × 90 = 7560
m23 = d1 d2 d3 = 6 × 90 × 4 = 3240
m34 = d2 d3 d4 = 90 × 4 × 35 = 12600

      1     2      3      4

1     0   7560
2           0   3240
3                  0   12600
4                         0

Next, consider for s = 2 the entries

m13 = min (m11 + m23 + 14 × 6 × 4, m12 + m33 + 14 × 90 × 4)
    = min (0 + 3240 + 336, 7560 + 0 + 5040)
    = 3576
m24 = min (m22 + m34 + 6 × 90 × 35, m23 + m44 + 6 × 4 × 35)
    = min (0 + 12600 + 18900, 3240 + 0 + 840)
    = 4080

Finally, for s = 3,

m14 = min (m11 + m24 + 14 × 6 × 35,    {when j = 1}
           m12 + m34 + 14 × 90 × 35,   {when j = 2}
           m13 + m44 + 14 × 4 × 35)    {when j = 3}

    = min (0 + 4080 + 2940, 7560 + 12600 + 44100, 3576 + 0 + 1960)
    = min (7020, 64260, 5536)
    = 5536

Hence, the optimal number of scalar multiplications is 5536.
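The whole computation can be checked with a short Python version of the diagonal-wise filling described above (an added sketch; matrix_chain_cost is our own name):

def matrix_chain_cost(d):
    # d[0..n] holds the dimensions; Ai is a d[i-1] x d[i] matrix
    n = len(d) - 1
    # m[i][j]: minimum scalar multiplications for Ai..Aj (1-based; index 0 unused)
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for s in range(1, n):                  # fill diagonal-wise, s = j - i
        for i in range(1, n - s + 1):
            m[i][i + s] = min(
                m[i][j] + m[j + 1][i + s] + d[i - 1] * d[j] * d[i + s]
                for j in range(i, i + s)
            )
    return m[1][n]

print(matrix_chain_cost([14, 6, 90, 4, 35]))  # 5536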

1.6 SUMMARY

(1) Dynamic Programming is a technique for solving optimization problems using a bottom-up approach. The underlying idea of dynamic programming is to avoid calculating the same thing twice, usually by keeping a table of known results that fills up as subinstances of the problem under consideration are solved.
(2) In order that the Dynamic Programming technique is applicable in solving an optimisation problem, it is necessary that the principle of optimality is applicable to the problem domain.
(3) The principle of optimality states that for an optimal sequence of decisions or choices, each subsequence of decisions/choices must also be optimal.
(4) The Chained Matrix Multiplication Problem: the problem of how to parenthesize the pairs of matrices within the expression A1A2 … An, a product of n matrices which is defined, so as to minimize the number of scalar multiplications in the computation of the product A1A2 … An. Such a product is known as Chained Matrix Multiplication.

(5) The Knapsack Problem: We are given n objects and a knapsack. For i = 1, 2, …, n, object i has a positive weight wi and a positive value vi. The knapsack can carry a weight not exceeding W. The problem requires that the knapsack be filled in a way that maximizes the value of the objects included in the knapsack.

Further, a special case of the knapsack problem may be obtained in which the objects may not be broken into pieces. In other words, either a whole object is to be included or it has to be excluded.

1.7 SOLUTIONS/ANSWERS
Ex. 1)

First of all, it can be easily verified that the Principle of Optimality is valid, in the sense that for an optimal solution of the overall problem, each subsolution is also optimal. This is because a non-optimal solution of a subproblem, when replaced by a better solution of the subproblem, would lead to a solution better than the optimal solution of the overall problem, a contradiction.

As usual, for solving an optimization problem using the Dynamic Programming technique, we set up a table Knap[1..n, 0..W].

In order to label the rows, we first of all order the given objects according to increasing relative values R = v/w.

Thus, the first object O1 is the one with minimum relative value R1. The object O2 is the one with the next least relative value R2, and so on. The last object in the sequence is On, with maximum relative value Rn.

The ith row of the table corresponds to object Oi, having the ith relative value when the values are arranged in increasing order. The jth column corresponds to weight j for 0 ≤ j ≤ W. The entry Knap[i, j] denotes the maximum value that can be packed in the knapsack when only the objects O1, O2, …, Oi are used and the included objects have total weight at most j.

Next, in order to fill up the entries Knap[i, j], 1 ≤ i ≤ n and 0 ≤ j ≤ W, of the table, we can check, as was done in the coin problem, that

(i) Knap[i, 0] = 0 for i = 1, 2, …, n, and

(ii) Knap[1, j] = V for j = 1, …, W,

where V is the value of O1.


Another version allows any fraction xi with 0 ≤ xi ≤ 1. However, in this problem, we assume either xi = 1 or xi = 0.

      0   1   2   3   4   …   j   …   W

1     0   V   V   V   V       V       V
2     0
.     .
.     .
i     0
.     .
n     0

where V is the value of the object O1.

Further, when calculating Knap[i, j], there are two possibilities: either
(i) the ith object Oi is taken into consideration; then
Knap[i, j] = Knap[i−1, j − wi] + vi, or
(ii) the ith object Oi is not taken into consideration; then
Knap[i, j] = Knap[i−1, j].

Thus we define

Knap[i, j] = max {Knap[i−1, j], Knap[i−1, j − wi] + vi}

The above equation is valid for i ≥ 2 and j ≥ wi. In order that the above equation may be applicable otherwise also, without violating the intended meaning, we take

(i) Knap[0, j] = 0 for j ≥ 0, and

(ii) Knap[i, j] = −∞ for j < 0.

We explain the Dynamic Programming based solution suggested above, through the
following example.

We are given six objects, whose weights are respectively 1, 2, 5, 6, 7, 10 units and
whose values respectively are 1, 6, 18, 22, 28, 43. Thus relative values in increasing
order are respectively 1.00, 3.00, 3.60, 3.67, 4.00 and 4.30. If we can carry a
maximum weight of 12, then the table below shows that we can compose a load
whose value is 49.

Weights →                      0  1  2  3  4  5   6   7   8   9   10  11  12
Relative Values ↓

w1 = 1,  v1 = 1,  R1 = 1.00    0  1  1  1  1  1   1   1   1   1   1   1   1
w2 = 2,  v2 = 6,  R2 = 3.00    0  1  6  7  7  7   7   7   7   7   7   7   7
w3 = 5,  v3 = 18, R3 = 3.60    0  1  6  7  7  18  19  24  25  25  25  25  25
w4 = 6,  v4 = 22, R4 = 3.67    0  1  6  7  7  18  22  24  28  29  29  40  41
w5 = 7,  v5 = 28, R5 = 4.00    0  1  6  7  7  18  22  28  29  34  35  40  46
w6 = 10, v6 = 43, R6 = 4.30    0  1  6  7  7  18  22  28  29  34  43  44  49

Algorithm for the solution, explained above, of the Knapsack Problem:

Function Knapsack (W, Weight[1..n], Value[1..n])

{returns the maximum value corresponding to the maximum allowed weight W, where we are given n objects Oi, 1 ≤ i ≤ n, with weights Weight[i] and values Value[i]}

array R[1..n], Knap[1..n, 0..W]

For i = 1 to n do
begin
    read (Weight[i])
    read (Value[i])
    R[i] = Value[i] / Weight[i]
end

For j = 1 to n do
{find k such that R[k] is the minimum of R[j], R[j+1], …, R[n]}
begin
    k = j
    For t = j + 1 to n do
        If R[t] < R[k] then
            k = t
    Exchange (R[j], R[k]);
    Exchange (Weight[j], Weight[k]);
    Exchange (Value[j], Value[k]);
end

{At this stage, R[1..n] is sorted in increasing order, and Weight[j] and Value[j] are respectively the weight and value of the object with the jth least relative value}
{Next, we complete the table Knap for the problem}

For i = 1 to n do
    Knap[i, 0] = 0
For j = 1 to W do
    Knap[1, j] = Value[1]
{Value[1] is the value of the object with minimum relative value}

{Values out of the range of the table Knap are taken as:
Knap[0, j] = 0 for j ≥ 0, and Knap[i, j] = −∞ for j < 0}

For i = 2 to n do
    For j = 1 to W do
        Knap[i, j] = maximum {Knap[i−1, j], Knap[i−1, j − Weight[i]] + Value[i]}

Return Knap[n, W]
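A direct Python counterpart of the table-filling part (an added sketch; knapsack_01 is our own name) confirms the value 49 for the example above. Note that the sort by relative value in the pseudocode affects only the ordering of the rows, not the final answer, so this sketch omits it:

def knapsack_01(W, weights, values):
    n = len(weights)
    # knap[i][j]: best value using the first i objects with weight limit j
    knap = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            knap[i][j] = knap[i - 1][j]  # exclude object i
            if j >= weights[i - 1]:      # include object i, if it fits
                knap[i][j] = max(knap[i][j],
                                 knap[i - 1][j - weights[i - 1]] + values[i - 1])
    return knap[n][W]

print(knapsack_01(12, [1, 2, 5, 6, 7, 10], [1, 6, 18, 22, 28, 43]))  # 49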

1.8 FURTHER READINGS
1. Foundations of Algorithms, R. Neapolitan & K. Naimipour (D.C. Heath & Company, 1996).
2. Algorithmics: The Spirit of Computing, D. Harel (Addison-Wesley Publishing Company, 1987).
3. Fundamental Algorithms (Second Edition), D.E. Knuth (Narosa Publishing House).
4. Fundamentals of Algorithmics, G. Brassard & P. Bratley (Prentice-Hall International, 1996).
5. Fundamentals of Computer Algorithms, E. Horowitz & S. Sahni (Galgotia Publications).
6. The Design and Analysis of Algorithms, Anany Levitin (Pearson Education, 2003).
7. Programming Languages (Second Edition): Concepts and Constructs, Ravi Sethi (Pearson Education, Asia, 1996).

UNIT 2 GREEDY TECHNIQUES
Structure
2.0 Introduction
2.1 Objectives
2.2 Some Examples
2.3 Formalization of Greedy Technique
    2.3.1 Function Greedy-Structure (GV: set): Set
2.4 Minimum Spanning Tree
2.5 Prim's Algorithm
2.6 Kruskal's Algorithm
2.7 Dijkstra's Algorithm
2.8 Summary
2.9 Solutions/Answers
2.10 Further Readings

2.0 INTRODUCTION
Algorithms based on the Greedy technique are used for solving optimization problems. An optimization problem is one in which some value (or set of values) of interest is required to be either minimized or maximized w.r.t. some given relation on the values. Such problems include maximizing profits or minimizing costs of, say, production of some goods. Other examples of optimization problems are about

• finding the minimum number of currency notes required for an amount, say, of Rs. 289, where an arbitrary number of currency notes of each denomination from Rs. 1 to Rs. 100 are available, and
• finding the shortest path covering a number of given cities where the distances between pairs of cities are given.

As we will study later, algorithms based on the greedy technique, if they exist, are easy to think of, implement and explain. However, for many interesting optimization problems, no algorithm based on the greedy technique may yield an optimal solution. In support of this claim, let us consider the following example:

Example 1.1: Let us suppose that we have to go from city A to city E through either
city B or city C or city D with costs of reaching between pairs of cities as shown
below:
[Figure 2.0.1: city A is joined to cities B, C and D with costs Rs. 3000, Rs. 4000 and Rs. 5000 respectively, and B, C and D are joined to city E with costs Rs. 8000, Rs. 5000 and Rs. 4000 respectively.]

Then the greedy technique suggests that we take the route from A to B, whose cost, Rs. 3000, is the minimum among the three costs (viz., Rs. 3000, Rs. 4000 and Rs. 5000) of the available routes.

However, at B there is only one route available to reach E. Thus, the greedy algorithm suggests the route from A to B to E, which costs Rs. 11000. But the route from A to C to E costs only Rs. 9000, and the route from A to D to E also costs Rs. 9000. Thus, a locally better solution, at some stage, suggested by the greedy technique may yield an overall (or globally) costlier solution.

The essence of the Greedy technique is: in the process of solving an optimization problem, initially and at subsequent stages, we evaluate the costs/benefits of the various available alternatives for the next step, and choose the alternative which is optimal in the sense that either it is the least costly or it is the maximum profit yielding. In this context, it may be noted that the overall solution, yielded by choosing locally optimal steps, may not be optimal.

2.1 OBJECTIVES
After studying this unit, you should be able to:

• explain the Greedy technique for solving optimization problems;
• apply the Greedy technique for solving optimization problems;
• apply the Greedy technique for solving well-known problems, including the shortest path problem.

2.2 SOME EXAMPLES


In order to set the subject-matter to be discussed in its proper context, we recall the general characteristics of greedy algorithms. These algorithms are:

• used to solve optimization problems,
• most straightforward to write,
• easy to invent, easy to implement, and, if they exist, efficient,
• may not yield a solution for an arbitrary solvable problem,
• short-sighted, making decisions on the basis of information immediately at hand, without worrying about the effect these decisions may have in the future, and
• such that they never reconsider their decisions.

In order to understand the salient features of algorithms based on the greedy technique, let us consider some examples in respect of the following Minimum Number of Notes Problem:

In a business transaction, we are required to make a payment of some amount A (say Rs. 289/-). We are given Indian currency notes of all denominations, viz., of 1, 2, 5, 10, 20, 50, 100, 500 and 1000. The problem is to find the minimum number of currency notes to make the required amount A for payment. Further, it is assumed that currency notes of each denomination are available in sufficient numbers, so that one may choose as many notes of the same denomination as are required for the purpose of making the amount with the minimum number of notes.

Example 2.2.1
In this example, we discuss how, intuitively, we attempt to solve the Minimum Number of Notes Problem, to be specific, to make an amount of Rs. 289/-.

Solution: Intuitively, to begin with, we pick up a note of denomination D satisfying the conditions:

i) D ≤ 289, and
ii) if D1 is another denomination of a note such that D1 ≤ 289, then D1 ≤ D.

In other words, the picked-up note's denomination D is the largest among all the denominations satisfying condition (i) above.

The above-mentioned step of picking a note of denomination D, satisfying the above two conditions, is repeated till either the amount of Rs. 289/- is formed or we are clear that we cannot make an amount of Rs. 289/- out of the given denominations.

We apply the above-mentioned intuitive solution as follows:

To deliver Rs. 289 with minimum number of currency notes, the notes of different
denominations are chosen and rejected as shown below:

Chosen-Note-Denomination    Total-Value-So-Far

100                         0 + 100 ≤ 289
100                         100 + 100 ≤ 289
100                         200 + 100 > 289  (rejected)
50                          200 + 50 ≤ 289
50                          250 + 50 > 289  (rejected)
20                          250 + 20 ≤ 289
20                          270 + 20 > 289  (rejected)
10                          270 + 10 ≤ 289
10                          280 + 10 > 289  (rejected)
5                           280 + 5 ≤ 289
5                           285 + 5 > 289  (rejected)
2                           285 + 2 < 289
2                           287 + 2 = 289

The above sequence of steps, based on the Greedy technique, constitutes an algorithm to solve the problem.

To summarize: in the above-mentioned solution, we have used the strategy of choosing, at any stage, the maximum-denomination note, subject to the condition that the sum of the denominations of the chosen notes does not exceed the required amount A = 289.

The above strategy is the essence of the greedy technique.
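As an added illustration (a sketch with our own name greedy_notes, not part of the original text), the strategy can be written in a few lines of Python:

def greedy_notes(amount, denominations):
    # repeatedly take the largest note that keeps the running total within amount;
    # the result may be incomplete or suboptimal (see Examples 2.2.2 and 2.2.3)
    chosen = []
    for d in sorted(denominations, reverse=True):
        while sum(chosen) + d <= amount:
            chosen.append(d)
    return chosen

print(greedy_notes(289, [1, 2, 5, 10, 20, 50, 100, 500, 1000]))
# [100, 100, 50, 20, 10, 5, 2, 2]
print(greedy_notes(90, [20, 30, 50]))  # [50, 30]: stuck at 80, no solution
print(greedy_notes(80, [10, 40, 60]))  # [60, 10, 10]: suboptimal, 40 + 40 is better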

Example 2.2.2
Next, we consider an example in which for a given amount A and a set of available
denominations, the greedy algorithm does not provide a solution, even when a
solution by some other method exists.

Let us consider a hypothetical country in which notes available are of only the
denominations 20, 30 and 50. We are required to collect an amount of 90.

Attempted solution through the above-mentioned strategy of the greedy technique:

i) First, pick up a note of denomination 50, because 50 ≤ 90. The amount obtained by adding the denominations of all notes picked up so far is 50.
ii) Next, we cannot pick up a note of denomination 50 again, because if we pick up another note of denomination 50, then the amount of the picked-up notes becomes 100, which is greater than 90. Therefore, we do not pick up any note of denomination 50 or above.
iii) Therefore, we pick up a note of the next denomination, viz., 30. The amount made up by the sum of the denominations 50 and 30 is 80, which is less than 90. Therefore, we accept a note of denomination 30.
iv) Again, we cannot pick up another note of denomination 30, because otherwise the sum of the denominations of the picked-up notes becomes 80 + 30 = 110, which is more than 90. Therefore, we do not pick up any more notes of denomination 30 or above.
v) Next, we attempt to pick up a note of the next denomination, viz., 20. But in that case the sum of the denominations of the picked-up notes becomes 80 + 20 = 100, which is again greater than 90. Therefore, we do not pick up any note of denomination 20 or above.
vi) Next, we attempt to pick up a note of a still lesser denomination. However, there are no lesser denominations available.

Hence the greedy algorithm fails to deliver a solution to the problem.

However, by some other technique, we have the following solution to the problem: first pick up a note of denomination 50, then two notes each of denomination 20.

Thus, we get 90, and it can be easily seen that at least 3 notes are required to make an amount of 90. Another alternative solution is to pick up 3 notes each of denomination 30.

Example 2.2.3
Next, we consider an example, in which the greedy technique, of course, leads to a
solution, but the solution yielded by greedy technique is not optimal.

Again, we consider a hypothetical country in which notes available are of the only
denominations 10, 40 and 60. We are required to collect an amount of 80.

Using the greedy technique, to make an amount of 80, first we use a note of denomination 60. For the remaining amount of 20, we can choose a note of only denomination 10. And, finally, for the remaining amount, we choose another note of denomination 10. Thus, the greedy technique suggests the following solution using 3 notes: 80 = 60 + 10 + 10.

However, the following solution uses only two notes: 80 = 40 + 40.

Thus, the solutions suggested by the Greedy technique may not be optimal.

Ex.1) Give another example in which greedy technique fails to deliver an optimal
solution.

2.3 FORMALIZATION OF GREEDY TECHNIQUE


In order to develop an algorithm based on the greedy technique to solve a general
optimization problem, we need the following data structures and functions:
(i) A set or list of given/candidate values from which choices are made, to reach a solution. For example, in the case of the Minimum Number of Notes problem, the list of candidate values (in rupees) of notes is {1, 2, 5, 10, 20, 50, 100, 500, 1000}. Further, the number of notes of each denomination should be clearly mentioned. Otherwise, it is assumed that each candidate value can be used as many times as required for the solution using the greedy technique. Let us call this set
GV: Set of Given Values
(ii) Set (rather multi-set) of considered and chosen values: This structure
contains those candidate values, which are considered and chosen by the
algorithm based on greedy technique to reach a solution. Let us call this
structure as
CV: Structure of Chosen Values
The structure is generally not a set but a multi-set in the sense that values may
be repeated. For example, in the case of Minimum Number of Notes problem,
if the amount to be collected is Rs. 289 then

CV = {100, 100, 50, 20, 10, 5, 2, 2}

(iii) Set of Considered and Rejected Values: As the name suggests, this is the set of all those values which are considered but rejected. Let us call this set
RV: Set of Considered and Rejected Values
A candidate value may belong to both CV and RV. But once a value is put in RV, this value cannot be put any more in CV. For example, to make an amount of Rs. 289, once we have chosen two notes each of denomination 100, we have
CV = {100, 100}
At this stage, we have collected Rs. 200 out of the required Rs. 289, and RV = {1000, 500}. So, we can choose a note of any denomination except those in RV, i.e., except 1000 and 500. Thus, at this stage, we can choose a note of denomination 100. However, this choice of 100 again will make the total amount collected so far Rs. 300, which exceeds Rs. 289. Hence we reject the choice of 100 a third time and put 100 in RV, so that now RV = {1000, 500, 100}. From this point onward, we cannot choose even denomination 100.

Next, we consider some of the functions which need to be defined in an algorithm using the greedy technique to solve an optimization problem.
(iv) A function, say SolF, that checks whether a solution is reached or not. However, the function does not check for the optimality of the obtained solution. In the case of the Minimum Number of Notes problem, the function SolF finds the sum of all values in the multi-set CV and compares it with the desired amount, say Rs. 289. For example, if at one stage CV = {100, 100}, then the sum of values in CV is 200, which does not equal 289; the function SolF then returns 'Solution not reached'. However, at a later stage, when CV = {100, 100, 50, 20, 10, 5, 2, 2}, then, as the sum of values in CV equals the required amount, the function SolF returns the message 'Solution reached'.
It may be noted that the function only informs about a possible solution. However, the solution provided through SolF may not be optimal. For instance, in Example 2.2.3, when we reach CV = {60, 10, 10}, then SolF returns 'Solution reached'. However, as discussed earlier, the solution 80 = 60 + 10 + 10 using three notes is not optimal, because another solution using only two notes, viz., 80 = 40 + 40, is still cheaper.
(v) Selection Function, say SelF, finds out the most promising candidate value out of the values not yet rejected, i.e., those not in RV. In the case of the Minimum Number of Notes problem, for collecting Rs. 289, at the stage when RV = {1000, 500} and CV = {100, 100}, first the function SelF attempts the denomination 100. But, through the function SolF, when it is found that by addition of 100 to the values already in CV the total value becomes 300, which exceeds 289, the value 100 is rejected and put in RV. Next, the function SelF attempts the next lower denomination 50. The value 50, when added to the sum of values in CV, gives 250, which is less than 289. Hence, the value 50 is returned by the function SelF.

(vi) The Feasibility-Test Function, say FeaF. When a new value say v is chosen
by the function SelF, then the function FeaF checks whether the new set,
obtained by adding v to the set CV of already selected values, is a possible part
of the final solution. Thus in the case of Minimum Number of Notes problem,
if amount to be collected is Rs. 289 and at some stage, CV = {100, 100}, then
the function SelF returns 50. At this stage, the function FeaF takes the control.
It adds 50 to the sum of the values in CV, and on finding that the sum 250 is
less than the required value 289 informs the main/calling program that {100,
100, 50} can be a part of some final solution, and needs to be explored further.

(vii) The Objective Function, say ObjF, gives the value of the solution. For
example, in the case of the problem of collecting Rs. 289; as CV = {100, 100,
50, 20, 10, 5, 2, 2} is such that sum of values in CV equals the required value
289, the function ObjF returns the number of notes in CV, i.e., the number 8.

After having introduced a number of sets and functions that may be required by an algorithm based on the greedy technique, we give below the outline of the greedy technique, say Greedy-Structure. For any actual algorithm based on the greedy technique, the various structures and functions discussed above have to be replaced by actual ones.

These functions depend upon the problem under consideration. The Greedy-Structure outlined below takes the set GV of given values as input parameter and returns CV, the set of chosen values. For developing any algorithm based on the greedy technique, the following function outline will be used.

2.3.1 Function Greedy-Structure (GV: set): set

CV ← ∅ {initially, the set of chosen values is empty}
While GV ≠ RV and not SolF (CV) do
begin
    v ← SelF (GV)
    If FeaF (CV ∪ {v}) then
        CV ← CV ∪ {v}
    else RV ← RV ∪ {v}
end
// the function Greedy-Structure comes out of the while-loop when either GV = RV,
// i.e., all given values are rejected, or when a solution is found
If SolF (CV) then return ObjF (CV)
else return "No solution is possible"
end function Greedy-Structure

2.4 MINIMUM SPANNING TREE

In this section and some of the next sections, we apply the greedy technique to develop algorithms for solving some well-known problems. First of all, we discuss its application to finding a minimum spanning tree for a given (undirected) graph. In order to introduce the subject matter, let us consider some of the relevant definitions.

Definitions
A Spanning tree of a connected graph, say G = (V, E) with V as set of vertices and E
as set of edges, is its connected acyclic subgraph (i.e., a tree) that contains all the
vertices of the graph.

A minimum spanning tree of a weighted connected graph is its spanning tree of the
smallest weight, where the weight of a tree is defined as the sum of the weights on all
its edges.

The minimum spanning tree problem is the problem of finding a minimum spanning tree for a given weighted connected graph.

The minimum spanning tree problem has a number of useful applications in the
following type of situations:

Suppose we are given a set of cities along with the distances between each pair of cities. In view of a shortage of funds, it is desired that, instead of connecting each pair of cities directly, we provide roads, costing the least, but allowing passage between any pair of cities along the provided roads. However, the road between some pair of cities may not be direct, but may pass through a number of other cities.

Next, we illustrate the concept of spanning tree and minimum spanning tree through
the following example.

Let us consider the connected weighted graph G given in Figure 2.4.1.

[Figure 2.4.1: a graph on the vertices a, b, c and d, with weighted edges ab = 1, ac = 5, ad = 2 and cd = 3.]

For the graph of Figure 2.4.1 given above, each of Figures 2.4.2, 2.4.3 and 2.4.4 shows a spanning tree, viz., T1, T2 and T3 respectively. Out of these, T1 is a minimal spanning tree of G, of weight 1 + 2 + 3 = 6.

[Figure 2.4.2: spanning tree T1, with edges ab = 1, ad = 2 and cd = 3, of weight 6.]

[Figure 2.4.3: spanning tree T2, with edges ab = 1, ac = 5 and cd = 3, of weight 9.]

[Figure 2.4.4: spanning tree T3, with edges ab = 1, ac = 5 and ad = 2, of weight 8.]

Remark 2.4.1:
The weight may denote (i) length of an edge between pair of vertices or (ii) the cost of
reaching from one town to the other or (iii) the cost of production incurred in reaching
from one stage of production to the immediate next stage of production or (iv) the cost
of construction of a part of a road or of laying telephone lines between a pair of towns.

Remark 2.4.2:
The weights on edges are generally positive. However, in some situations the weight of an edge may be zero or even negative. A negative weight may appear appropriate when the problem under consideration is not about 'minimizing costs' but about 'maximizing profits' and we still want to use minimum spanning tree algorithms. However, in such cases, it is not appropriate to use negative weights, because the more we traverse a negative-weight edge, the lesser the cost, whereas with repeated traversals of edges the cost should increase instead of decreasing.

Therefore, if we want to apply the minimum spanning tree technique to 'maximizing profit' problems, then instead of using negative weights, we replace the profits pij by M − pij, where M is some positive number such that

M > max {pij : pij is the profit in traversing from the ith node to the jth node}
Remark 2.4.3:
From graph theory, we know that a tree with n vertices has exactly (n − 1) edges; hence a spanning tree of a given connected graph with n vertices must have exactly (n − 1) edges.

As mentioned earlier, whenever we want to develop an algorithm based on the greedy technique, we use the function Greedy-Structure given under 2.3.1. For this purpose, we need to find appropriate values of the various sets and functions discussed in Section 2.3.
In the case of the problem of finding minimum-spanning tree for a given
connected graph, the appropriate values are as follows:
(i) GV: The set of candidate or given values is given by
GV = E, the set of edges of the given graph (V, E).
(ii) CV: The structure of chosen values is given by those edges from E, which
together will form the required minimum-weight spanning tree.
(iii) RV: set of rejected values will be given by those edges in E, which at some
stage will form a cycle with earlier selected edges.
(iv) In the case of the problem of the minimum spanning tree, the function SolF that checks whether a solution is reached or not is the function that checks that
(a) all the edges in CV form a tree,
(b) the set of vertices of the edges in CV equals V, the set of all vertices of the graph, and
(c) the sum of the weights of the edges in CV is the minimum possible among sets of edges which satisfy (a) and (b) above.
(v) Selection Function: depends upon the particular algorithm used for the purpose. There are two well-known algorithms, viz., Prim's algorithm and Kruskal's algorithm, for finding the Minimum Spanning Tree. We will discuss these algorithms in detail in subsequent sections.
(vi) FeaF: Feasibility Test Function: In this case, when the selection function SelF returns an edge depending on the algorithm, the feasibility test function FeaF will check whether the newly found edge forms a cycle with the earlier selected edges. If the new edge actually forms a cycle, then generally the newly found edge is dropped and the search for still another edge starts. However, in some of the algorithms, it may happen that some earlier chosen edge is dropped.
(vii) In the case of Minimum Spanning Tree problem, the objective function may
return
(a) the set of edges that constitute the required minimum spanning tree and
(b) the weight of the tree selected in (a) above.

Ex. 2) Find a minimal spanning tree for the following graph.


[Figure: a weighted graph on the vertices a, b, c and d, with edge weights 1, 5, 2, 3, 4 and 8.]
2.5 PRIM’S ALGORITHM
The algorithm due to Prim builds up a minimum spanning tree by adding edges to
form a sequence of expanding subtrees. The sequence of subtrees is represented by the
pair (VT, ET), where VT and ET respectively represent the set of vertices and the set of
edges of a subtree in the sequence. Initially, the subtree, in the sequence, consists of
just a single vertex which is selected arbitrarily from the set V of vertices of the given
graph. The subtree is built-up iteratively by adding an edge that has minimum weight
among the remaining edges (i.e., edge selected greedily) and, which at the same time,
does not form a cycle with the earlier selected edges.

We illustrate Prim's algorithm through an example before giving a semi-formal definition of the algorithm.

Example 2.5.1 (of Prim's Algorithm):
Let us explain through the following example how Prim's algorithm finds a minimal spanning tree of a given graph. Let us consider the following graph:
Initially:
VT = (a)
ET = ∅

[Figure 2.5.1: a graph on the vertices a, b, c, d and e, with weighted edges ab = 1, ac = 5, ad = 2, cd = 3 and de = 1.5.]
In the first iteration, the edge having the minimum weight among the edges having a as one of their vertices is chosen. In this case, the edge ab with weight 1 is chosen out of the edges ab, ac and ad, of weights respectively 1, 5 and 2. Thus, after the first iteration, we have the given graph with the chosen edge in bold and VT and ET as follows:
VT = (a, b)
ET = ((a, b))

Figure: 2.5.2 [the chosen edge ab shown in bold]
In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edge and having either a or b as one of its vertices, the edge with
minimum weight is chosen. In this case the vertex b does not have any other edge incident on it. In such cases, if required, the weight of a non-existent edge may be taken as ∞. Thus the choice is restricted to two edges, viz., ad and ac, respectively of
weights 2 and 5. Hence, in the next iteration the edge ad is chosen. Hence, after
second iteration, we have the given graph with chosen edges and VT and ET as
follows:
VT = (a, b, d)
ET = ((a, b), (a, d))

Figure: 2.5.3 [chosen edges ab and ad in bold]

In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edges and having either a, b or d as one of its vertices, the edge with
minimum weight is chosen. Thus choice is restricted to edges ac, dc and de with
weights respectively 5, 3, 1.5. The edge de with weight 1.5 is selected. Hence, after
third iteration we have the given graph with chosen edges and VT and ET as
follows:

VT = (a, b, d, e)
ET = ((a, b), (a, d), (d, e))

Figure: 2.5.4 [chosen edges ab, ad and de in bold]

In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edge and having either a, b, d or e as one of its vertices, the edge with
minimum weight is chosen. Thus, choice is restricted to edges dc and ac with weights
respectively 3 and 5. Hence the edge dc with weight 3 is chosen. Thus, after fourth
iteration, we have the given graph with chosen edges and VT and ET as follows:

VT = (a, b, d, e, c)
ET = ((a, b), (a, d), (d, e), (d, c))

Figure: 2.5.5 [chosen edges ab, ad, de and dc in bold]

At this stage, it can be easily seen that each of the vertices is on some chosen edge and the chosen edges form a tree.

Given below is the semi-formal definition of Prim’s Algorithm:
Algorithm Spanning-Prim (G)
// The algorithm constructs a minimum spanning tree,
// for which the input is a weighted connected graph G = (V, E).
// The output is the set of edges, to be denoted by ET, which together constitute
// a minimum spanning tree of the given graph G.
// Pairs of vertices that are not adjacent in the graph may be given
// the label ∞, indicating “infinite” distance between the pair of vertices.
// The set of vertices of the required tree is initialized with an arbitrary vertex v0.
VT ← {v0}
ET ← ∅ // initially ET is empty
// let n = number of vertices in V
For i = 1 to n − 1 do
    find a minimum-weight edge e = (v1, u1) among all the edges such that
    v1 is in VT and u1 is in V − VT
    VT ← VT ∪ {u1}
    ET ← ET ∪ {e}
Return ET
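To make the above steps concrete, here is a minimal executable sketch of the algorithm in Python. The choice of Python, the function name prim, and the dictionary representation of the graph are our own illustrative assumptions, not part of the text; pairs of vertices without an entry are simply non-adjacent, which plays the role of the label ∞.

def prim(graph, v0):
    # graph: dict mapping each vertex to a dict {neighbour: weight}
    vt = {v0}                   # VT, initialized with the chosen vertex v0
    et = []                     # ET, initially empty
    for _ in range(len(graph) - 1):
        best = None
        # find a minimum-weight edge e = (v1, u1) with v1 in VT, u1 in V - VT
        for v1 in vt:
            for u1, w in graph[v1].items():
                if u1 not in vt and (best is None or w < best[0]):
                    best = (w, v1, u1)
        w, v1, u1 = best        # the graph is connected, so best exists
        vt.add(u1)              # VT <- VT U {u1}
        et.append((v1, u1, w))  # ET <- ET U {e}
    return et

# The graph of Example 2.5.1 (each undirected edge listed from both ends):
g = {'a': {'b': 1, 'c': 5, 'd': 2}, 'b': {'a': 1},
     'c': {'a': 5, 'd': 3}, 'd': {'a': 2, 'c': 3, 'e': 1.5},
     'e': {'d': 1.5}}
print(prim(g, 'a'))  # [('a', 'b', 1), ('a', 'd', 2), ('d', 'e', 1.5), ('d', 'c', 3)]

Running it on the graph of Example 2.5.1 selects the edges in exactly the order derived in the four iterations above.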

Ex. 3) Using Prim's algorithm, find a minimal spanning tree for the graph given below:

[Figure: the same weighted graph as in Ex. 2, with vertices a, b, c, d, e and edge weights 1, 5, 2, 3, 4 and 8]
2.6 KRUSKAL’S ALGORITHM
Next, we discuss another method of finding a minimal spanning tree of a given weighted graph, which was suggested by Kruskal. In this method, the emphasis is on the choice of edges of minimum weight from amongst all the available edges, of course, subject to the condition that chosen edges do not form a cycle.

The connectivity of the chosen edges, at any stage, in the form of a subtree, which was
emphasized in Prim's algorithm, is not essential.

We briefly describe Kruskal's algorithm to find a minimal spanning tree of a given weighted and connected graph, as follows:
(i) First of all, order all the weights of the edges in increasing order. Then repeat
the following two steps till a set of edges is selected containing all the vertices
of the given graph.
(ii) Choose an edge having the weight which is the minimum of the weights of the
edges not selected so far.
(iii) If the new edge forms a cycle with any subset of the earlier selected edges, then
drop it, else, add the edge to the set of selected edges.

We illustrate Kruskal's algorithm through the following example:

Example 2.6.1:
Let us consider the following graph, for which the minimal spanning tree is required.

Figure: 2.6.1 [a weighted graph on vertices a, b, c, d, e; from the discussion below, its edges appear to be ab = 1, ac = 5, ad = 4.2, cd = 3 and de = 6]

Let Eg denote the set of edges of the graph that are chosen up to some stage.

According to the step (i) above, the weights of the edges are arranged in increasing
order as the set
{1, 3, 4.2, 5, 6}
In the first iteration, the edge (a,b) is chosen which is of weight 1, the minimum of
all the weights of the edges of the graph.

As a single edge does not form a cycle, the edge (a,b) is selected, so that

Eg = ((a,b))

After the first iteration, the graph with selected edges in bold is as shown below:

Figure: 2.6.2 [edge ab in bold]

Second Iteration
Next the edge (c,d) is of weight 3, minimum for the remaining edges. Also edges
(a,b) and (c,d) do not form a cycle, as shown below. Therefore, (c,d) is selected
so that,
Eg = ((a,b), (c,d))
Thus, after second iteration, the graph with selected edges in bold is as shown below:

Figure: 2.6.3 [edges ab and cd in bold]

It may be observed that the selected edges do not form a connected subgraph or
subtree of the given graph.

Third Iteration
Next, the edge (a,d) is of weight 4.2, the minimum for the remaining edges. Also the edges in Eg along with the edge (a,d) do not form a cycle. Therefore, (a,d) is selected so that the new Eg = ((a,b), (c,d), (a,d)). Thus after the third iteration, the graph with selected
edges in bold is as shown below.

Figure: 2.6.4 [edges ab, cd and ad in bold]
Fourth Iteration
Next, the edge (a,c) is of weight 5, the minimum for the remaining edges. However, the edge (a,c) forms a cycle with two edges in Eg, viz., (a,d) and (c,d). Hence (a,c) is
not selected and hence not considered as a part of the to-be-found spanning tree.
Figure: 2.6.5 [the edge ac would form a cycle with ad and cd]
At the end of the fourth iteration, the graph with selected edges in bold remains the same as at the end of the third iteration, as shown below:

Figure: 2.6.6 [edges ab, cd and ad in bold, unchanged from Figure 2.6.4]

Fifth Iteration

Next, the edge (e,d), the only remaining edge that can be considered, is considered. As (e,d) does not form a cycle with any of the edges in Eg, it is put in Eg. The graph at this stage, with selected edges in bold, is as follows:
Figure: 2.6.7 [edges ab, cd, ad and ed in bold]

At this stage we find that each of the vertices of the given graph is a vertex of some edge in Eg. Further, we observe that the edges in Eg form a tree, and hence form the required spanning tree. Also, from the choice of the edges in Eg, it is clear that the spanning tree is of minimum weight. Next, we consider a semi-formal definition of Kruskal's algorithm.

ALGORITHM Spanning-Kruskal (G)

// The algorithm constructs a minimum spanning tree by choosing successively edges
// of minimum weights out of the remaining edges.
// The input to the algorithm is a connected graph G = (V, E), in which V is the set of
// vertices and E the set of edges, and the weight of each edge is also given.
// The output is the set of edges, denoted by ET, which constitutes a minimum
// spanning tree of G.
// The variable edge-counter is used to count the number of selected edges so far.
// The variable t is used to count the number of edges considered so far.
Arrange the edges in E in nondecreasing order of the weights of edges. After the
arrangement, the edges in order are labeled as e1, e2, …, e|E|.
ET ← ∅ // initialize the set of tree edges as empty
edge-counter ← 0 // initialize the edge-counter to zero
t ← 0 // initialize the number of processed edges as zero
// let n = number of vertices in V
While edge-counter < n − 1 do
    t ← t + 1 // increment the counter for number of edges considered so far
    If the edge et does not form a cycle with any subset of edges in ET then
    begin
        // if et along with edges earlier in ET does not form a cycle
        // then add et to ET and increase the edge counter
        ET ← ET ∪ {et};
        edge-counter ← edge-counter + 1
    end if
return ET
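The cycle test in the algorithm above is usually implemented with a Union-Find (merge-find) structure: an edge closes a cycle exactly when its two endpoints already lie in the same set. The following Python sketch is our own illustration of this, under the assumption that the graph is given as an edge list; the names kruskal, find, etc. are not from the text.

def kruskal(vertices, edges):
    # edges: list of triples (weight, u, v); vertices: iterable of vertex names
    parent = {v: v for v in vertices}   # each vertex starts in its own set

    def find(x):                        # representative of the set containing x
        while parent[x] != x:
            x = parent[x]
        return x

    et = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):  # nondecreasing weights
        ru, rv = find(u), find(v)
        if ru != rv:                    # endpoints in different sets: no cycle
            parent[ru] = rv             # merge the two sets
            et.append((u, v, w))
            if len(et) == len(parent) - 1:   # n - 1 edges chosen: done
                break
    return et

# The graph of Example 2.6.1:
edges = [(1, 'a', 'b'), (3, 'c', 'd'), (4.2, 'a', 'd'),
         (5, 'a', 'c'), (6, 'd', 'e')]
print(kruskal('abcde', edges))
# [('a', 'b', 1), ('c', 'd', 3), ('a', 'd', 4.2), ('d', 'e', 6)]

The run mirrors the five iterations of Example 2.6.1: the edge (a,c) of weight 5 is rejected because a and c are already in the same set.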

Summary of Kruskal’s Algorithm

• The algorithm is always successful in finding a minimal spanning tree.

• The (sub)tree structure may not be maintained at intermediate stages; however, finally, we get a minimal spanning tree.

Computing Complexity of Kruskal's

Let a be the number of edges and n be the number of nodes, initially given. Then:

(i) O(a log a) time is required for sorting the edges in increasing order of lengths.
(ii) An efficient Union-Find operation takes O(2a) find operations and (n − 1) merge operations.

Thus the complexity of Kruskal's algorithm is O(a log a).

Ex. 4) Using Kruskal's algorithm, find a minimal spanning tree for the following graph.

[Figure: a weighted graph on vertices a, b, c, d, e; from the solution to Ex. 4, its edges appear to be ab = 5, ac = 4, ad = 2, cd = 3, ce = 1 and de = 8]
2.7 DIJKSTRA’S ALGORITHM


Directed Graph:
So far we have discussed applications of Greedy technique to solve problems
involving undirected graphs in which each edge (a, b) from a to b is also equally an
edge from b to a. In other words, the two representations (a,b) and (b,a) are for the
same edge. Undirected graphs represent symmetrical relations. For example, the
relation of 'brother' between male members of, say a city, is symmetric. However, in the same set, the relation of 'father' is not symmetric. Thus a general relation may be
symmetric or asymmetric. A general relation is represented by a directed graph, in
which the (directed) edge, also called an arc, (a, b) denotes an edge from a to b.
However, the directed edge (a,b) is not the same as the directed edge (b,a). In the
context of directed graphs, (b,a) denotes the edge from b to a. Next, we formally
define a directed graph and then solve some problems, using Greedy technique,
involving directed graphs.

Actually, the notation (a,b) in mathematics is used for ordered pair of the two
elements viz., a and b in which a comes first and then b follows. And the ordered
pair (b,a) denotes a different ordered set in which b comes first and then a follows.

However, we have misused the notation in the sense that we used the notation (a,b) to
denote an unordered set of two elements, i.e., a set in which order of occurrence of a
and b does not matter. In Mathematics the usual notation for an unordered set is
{a,b}. In this section, we use parentheses, i.e., ( and ), to denote ordered sets, and braces, i.e., { and }, to denote a general (i.e., unordered) set.

Definition:
A directed graph or digraph G = (V(G), E(G)) where V(G) denotes the set of
vertices of G and E(G) the set of directed edges, also called arcs, of G. An arc from a
to b is denoted as (a, b). Graphically it is denoted as follows:

a → b,

in which the arrow indicates the direction. In the above case, the vertex a is sometimes called the tail and the vertex b is called the head of the arc or directed edge.

Definition:
A Weighted Directed Graph is a directed graph in which each arc has an assigned
weight. A weighted directed graph may be denoted as G = (V(G), E(G)), where any
element of E(G) may be of the form (a,b,w) where w denotes the weight of the arc
(a,b). The directed graph G = ({a, b, c, d, e}, {(b, a, 3), (b, d, 2), (a, d, 7), (c, b, 4), (c, d, 5), (d, e, 4), (e, c, 6)}) is diagrammatically represented as follows:

Figure: 2.7.1 [the weighted digraph with the arcs listed above]

Single-Source Shortest Path


Next, we consider the problem of finding the shortest distances of each of the vertices of a given weighted connected graph from some fixed vertex of the given graph. All the weights between pairs of vertices are taken as positive numbers. The fixed vertex is called the source. The problem is known as the Single-Source Shortest Path Problem (SSSPP). One of the well-known algorithms for SSSPP is due to Dijkstra. The algorithm proceeds iteratively, by first considering the vertex nearest to the source. Then the algorithm considers the next nearest vertex to the source, and so on. Except for the first vertex and the source, the distances of all vertices are iteratively adjusted, taking into consideration the new minimum distances of the vertices considered earlier. If a vertex is not connected to the source by an edge, then it is considered to have distance ∞ from the source.

Algorithm Single-Source-Dijkstra (V, E, s)

// The inputs to the algorithm consist of the set of vertices V, the set of edges E, and s,
// the selected vertex, which is to serve as the source. Further, weights w(i,j) between
// every pair of vertices i and j are given. The algorithm finds and returns D(v), the
// minimum distance of each vertex v in V from s. An array D of the size of the
// number of vertices in the graph is used to store distances of the various vertices
// from the source. Initially the distance of the source from itself is taken as 0,
// and the distance D(v) of any other vertex v is taken as ∞.
// Iteratively, distances of other vertices are modified taking into consideration the
// minimum distances of the various nodes from the node with most recently modified
// distance.
D(s) ← 0
For each vertex v ≠ s do
    D(v) ← ∞
// Let Set-Remaining-Nodes be the set of all those nodes for which the final minimum
// distance is yet to be determined. Initially
Set-Remaining-Nodes ← V
while (Set-Remaining-Nodes ≠ ∅) do
begin
    choose v ∈ Set-Remaining-Nodes such that D(v) is minimum
    Set-Remaining-Nodes ← Set-Remaining-Nodes − {v}
    For each node x ∈ Set-Remaining-Nodes such that w(v, x) ≠ ∞ do
        D(x) ← min {D(x), D(v) + w(v, x)}
end
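The pseudocode above translates directly into Python. The sketch below is our own illustration; the arc weights of the sample graph follow the worked table of Example 2.7.1 below (e.g., a→b = 3, a→d = 7), and the function name dijkstra is assumed.

import math

def dijkstra(graph, s):
    # graph: dict mapping each vertex to a dict {successor: weight};
    # returns D, the minimum distance of every vertex from the source s
    D = {v: math.inf for v in graph}   # D(v) <- infinity for every v != s
    D[s] = 0                           # D(s) <- 0
    remaining = set(graph)             # Set-Remaining-Nodes
    while remaining:
        v = min(remaining, key=lambda x: D[x])   # vertex with minimum D(v)
        remaining.remove(v)
        for x, w in graph[v].items():            # adjust the other distances
            if x in remaining:
                D[x] = min(D[x], D[v] + w)
    return D

g = {'a': {'b': 3, 'd': 7}, 'b': {'c': 4, 'd': 2},
     'c': {}, 'd': {'e': 4}, 'e': {'c': 6}}
print(dijkstra(g, 'a'))   # {'a': 0, 'b': 3, 'c': 7, 'd': 5, 'e': 9}

The printed distances agree with the last row of the table in Example 2.7.1.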

Next, we consider an example to illustrate Dijkstra’s Algorithm.

Example 2.7.1

For the purpose, let us take the following graph, in which we take a as the source.

Figure: 2.7.2 [the weighted digraph of Figure 2.7.1]

Step             Additional node   Set-of-Remaining-Nodes   Distances from source of b, c, d, e
Initialization   a                 (b, c, d, e)             [3, ∞, 7, ∞]
1                b                 (c, d, e)                [3, 3+4, 3+2, ∞]
2                d                 (c, e)                   [3, 3+4, 3+2, 3+2+4]
3                c                 (e)                      [3, 7, 5, 9]

For minimum distance from a, the node b is directly accessed; the node c is accessed
through b; the node d is accessed through b; and the node e is accessed through b and
d.

Ex. 5) Using Dijkstra's algorithm, find the minimum distances of all the nodes from node b, which is taken as the source node, for the following graph.

[Figure: a weighted digraph on vertices a, b, c, d, e; from the solution to Ex. 5, its arcs include b→a = 6, b→e = 1, e→c = 1, e→d = 2 and d→a = 2]
2.8 SUMMARY
In this unit, we have discussed the greedy technique, the essence of which is: in the process of solving an optimization problem, initially and at subsequent stages, evaluate the costs/benefits of the various available alternatives for the next step. Choose the alternative which is optimal in the sense that either it is the least costly or it is the maximum profit yielding. In this context, it may be noted that the overall solution, yielded by choosing locally optimal steps, may not be optimal. Next, the well-known algorithms, viz., Prim's and Kruskal's, that use the greedy technique to find spanning trees for connected graphs, are discussed. Also, Dijkstra's algorithm for solving the Single-Source Shortest Path problem, again using the greedy technique, is discussed.

2.9 SOLUTIONS/ANSWERS
Ex.1)
Consider the following graph, in which vertices/nodes represent cities of a
country and each edge denotes a road between the cities denoted by the
vertices of the edge. The label on each edge denotes the distance in 1000
kilometers between the relevant cities. The problem is to find an optimal path
from A1 to A4.

[Figure: a graph on vertices A1, A2, A3, A4 with edges A1A2 = 3, A2A4 = 5, A1A4 = 9, A1A3 = 2 and A3A4 = 7]
Then the greedy technique suggests the route A1, A3, A4 of length 9000 kilometers, whereas the optimal path A1, A2, A4 is of length 8000 kilometers only.

Ex.2)
We will learn some systematic methods of finding a minimal spanning tree of a graph in later sections. However, by hit and trial, we get the following minimal spanning tree of the given graph.

[Figure: a minimal spanning tree of the graph of Ex. 2; by the solution to Ex. 3, its edges are ab = 1, ad = 2, cd = 3 and ce = 4]
Ex.3)

The student should include the explanation on the lines of Example 2.5.1.
However, the steps and stages in the process of solving the problem are as
follows.

Initially:
VT = (a)
ET = ∅

[Figure: the graph of Ex. 2, with no edge chosen yet]
In the following figures, the edges in bold denote the chosen edges.

After First Iteration
VT = (a, b)
ET = ((a, b))

[Figure: edge ab in bold]
After Second Iteration
VT = (a, b, d)
ET = ((a, b), (a, d))

[Figure: edges ab and ad in bold]

After Third Iteration
VT = (a, b, c, d)
ET = ((a, b), (a, d), (c, d))

[Figure: edges ab, ad and cd in bold]
After Fourth Iteration
VT = (a, b, c, d, e)
ET = ((a, b), (a, d), (c, d), (c, e))

[Figure: edges ab, ad, cd and ce in bold]
Ex. 4)

The student should include the explanation on the lines of Example 2.6.1.
However, the steps and stages in the process of solving the problem are as
follows:

The edges in bold denote the selected edges.

After First Iteration
Eg = ((c, e))

[Figure: edge ce in bold]

After Second Iteration
Eg = ((c, e), (a, d))

[Figure: edges ce and ad in bold]

After Third Iteration
Eg = ((c, e), (a, d), (c, d))

[Figure: edges ce, ad and cd in bold]
After Fourth Iteration
We cannot take the edge ac, because it forms a cycle with (a, d) and (c, d).
After Fifth Iteration
Eg = ((c, e), (a, d), (c, d), (a, b))

[Figure: edges ce, ad, cd and ab in bold]
Now, all the vertices of the graph lie on the above four edges, and these edges form a tree, which is the required minimal spanning tree.

Ex. 5)
A copy of the graph is given below.

[Figure: the weighted digraph of Ex. 5]
Step             Additional node   Set-of-Remaining-Nodes   Distances from source of a, c, d, e
Initialization   b                 (a, c, d, e)             [6, ∞, ∞, 1]
1                e                 (a, c, d)                [6, 2, 3, 1]
2                c                 (a, d)                   [6, 2, 3, 1]
3                d                 (a)                      [5, 2, 3, 1]

For minimum distance from b, node a is accessed through d and e; node c is accessed
through e; node d is accessed through e and node e is accessed directly.

2.10 FURTHER READINGS

1. Foundations of Algorithms, R. Neapolitan & K. Naimipour, (D.C. Health & Company, 1996).
2. Algorithmics: The Spirit of Computing, D. Harel, (Addison-Wesley Publishing Company, 1987).
3. Fundamental Algorithms (Second Edition), D.E. Knuth, (Narosa Publishing House).
4. Fundamentals of Algorithmics, G. Brassard & P. Brately, (Prentice-Hall International, 1996).
5. Fundamentals of Computer Algorithms, E. Horowitz & S. Sahni, (Galgotia Publications).
6. The Design and Analysis of Algorithms, Anany Levitin, (Pearson Education, 2003).
7. Programming Languages (Second Edition): Concepts and Constructs, Ravi Sethi, (Pearson Education, Asia, 1996).
UNIT 3 MODELS FOR EXECUTING ALGORITHMS-I: FA
Structure Page Nos.
3.0 Introduction 47
3.1 Objectives 47
3.2 Regular Expressions 47
3.2.1 Introduction to Defining of Languages
3.2.2 Kleene Closure Definition
3.2.3 Formal Definition of Regular Expressions
3.2.4 Algebra of Regular Expressions
3.3 Regular Languages 53
3.4 Finite Automata 54
3.4.1 Definition
3.4.2 Another Method to Describe FA
3.5 Summary 59
3.6 Solutions/Answers 59
3.7 Further Readings 60

3.0 INTRODUCTION
In the earlier two blocks and units 1 and 2 of this block, we discussed a number of issues and techniques about designing algorithms. However, there are a number of problems for each of which no algorithmic solution exists. Examples of such problems will be provided in unit 2 of block 4. Many of these examples are found in the discipline of the well-known models of computation, viz., finite automata, push-down automata and Turing machines. In this unit, we discuss the topic of Finite Automata.

3.1 OBJECTIVES
After studying this unit, you should be able to:

• define a finite automata for computation of a language;
• obtain a finite automata for a known language;
• create a grammar from language and vice versa;
• explain and create context free grammar and language;
• define the pushdown automata;
• apply the pumping lemma for non-context free languages; and
• find the equivalence of context free grammar and Pushdown Automata.

3.2 REGULAR EXPRESSIONS


In this unit, first we shall discuss the definitions of alphabet, string, and language with
some important properties.

3.2.1 Introduction to Defining of Languages


For a language, defining rules can be of two types. The rules can either tell us how to test a string of alphabet letters that we might be presented with, to see if it is a valid word, i.e., a word in the language; or the rules can tell us how to construct all the words in the language by some clear procedures.

Alphabet: A finite set of symbols/characters. We generally denote an alphabet by Σ. If we start with an alphabet having only one letter, say, the letter z, then Σ = {z}.

Letter: Each symbol of an alphabet may also be called a letter of the alphabet or
simply a letter.

Language over an alphabet: A set of words over an alphabet. Languages are denoted by the letter L, with or without a subscript.

String/word over an alphabet: Every member of any language is said to be a string or a word.

Example 1: Let L1 be the language of all possible strings obtained from the letter z:

L1 = {z, zz, zzz, zzzz, …}

This can also be written as

L1 = {z^n : n = 1, 2, 3, …}

A string of length zero is said to be the null string and is represented by ε.

The above language L1 does not include the null string. We could have defined it so as to include ε. Thus, L = {z^n : n = 0, 1, 2, 3, …} contains the null string.

In this language, as in any other, we can define the operation of concatenation, in which two strings are written down side by side to form a new longer string. Suppose u = ab and v = baa; then uv is called the concatenation of the two strings u and v, and uv = abbaa while vu = baaab. The words in this language are clearly analogous to the positive integers, and the operation of concatenation is analogous to addition:

z^n concatenated with z^m is the word z^(n+m).

Example 2: If the word zzz is called c and the word zz is called d, then the word
formed by concatenating c and d is
cd = zzzzz
When two words in our language L1 are concatenated they produce another word in
the language L1. However, this may not be true in all languages.

Example 3: If the language is L2 = {z, zzz, zzzzz, zzzzzzz, …}
= {z^odd}
= {z^(2n+1) : n = 0, 1, 2, 3, …},
then c = zzz and d = zzzzz are both words in L2, but their concatenation cd = zzzzzzzz is not a word in L2. The reason is simply that members of L2 are of odd length, while after concatenation the length is even.

Note: The alphabet for L2 is the same as the alphabet for L1.

Example 4: A language L3 may denote the language having strings of even lengths, including the string of length 0. In other words, L3 = {ε, zz, zzzz, …}.

Another interesting language over the alphabet Σ = {z} may be

Example 5: L4 = {z^p : p is a prime natural number}.

There are infinitely many possible languages, even for the single letter alphabet Σ = {z}.

In the above description of concatenation we find, very commonly, that for a single-letter alphabet, when we concatenate c with d we get the same word as when we concatenate d with c, that is, cd = dc. But this relationship does not hold for all languages. For example, in the English language, when we concatenate “Ram” and “goes” we get “Ram goes”. This is, indeed, a word, but distinct from “goes Ram”.

Now, let us define the reverse of a language L. If c is a word in L, then reverse (c) is
the same string of letters spelled backward.
reverse (L) = {reverse (w) : w ∈ L}

Example 6: Reverse (zzz) = zzz
Reverse (173) = 371

Let us define a new language called PALINDROME over the alphabet Σ = {a, b}.

PALINDROME = {ε, and all strings w such that reverse (w) = w}
= {ε, a, b, aa, bb, aaa, aba, bab, bbb, aaaa, abba, …}

Concatenating two words in PALINDROME may or may not give a word in PALINDROME, e.g., if u = abba and v = abbcbba, then uv = abbaabbcbba, which is not a palindrome.

3.2.2 Kleene Closure Definition

Suppose an alphabet Σ, and define a language in which any string of letters from Σ is a word, even the null string. We shall call this language the closure of the alphabet. We denote it by writing * after the name of the alphabet as a superscript, which is written as Σ*. This notation is sometimes also known as the Kleene Star.

For a given alphabet Σ, the language Σ* consists of all possible strings, including the null string.

For example, if Σ = {z}, then Σ* = {ε, z, zz, zzz, …}.

Example 7: If Σ = {0, 1}, then Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, …}.

So, we can say that Kleene Star is an operation that makes an infinite language of strings of letters out of an alphabet, if the alphabet Σ ≠ ∅. However, by the definition, the alphabet Σ may also be ∅; in that case, Σ* is finite (Σ* = {ε}). By “infinite language”, we mean a language with infinitely many words.

Now, we can generalise the use of the star operator to languages, i.e., to a set of
words, not just sets of alphabet letters.

Definition: If s is a set of words, then by s* we mean the set of all finite strings formed by concatenating words from s, where any word may be used as often as we like.

Example 8: If s = {cc, d}, then

s* = {ε or any word composed of factors of cc and d}
= {ε or all strings of c's and d's in which c's occur in even clumps}.

The string ccdcccd is not in s* since it has a clump of c's of length 3. Formally,

s* = {x : x = ε or x = (cc)^i1 d^j1 (cc)^i2 d^j2 … (cc)^im d^jm, where i1, j1, …, im, jm ≥ 0}.
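Since s* is infinite, we cannot list it completely, but we can enumerate its members up to a length bound. The following Python sketch (our own illustration; the function name kleene_star is assumed) generates the words of s* of length at most max_len by repeatedly appending one more factor from s:

def kleene_star(s, max_len):
    # Build words of s* of length <= max_len: start from the null string
    # and keep concatenating one more factor from s onto shorter words.
    words = {''}                 # the null string is always in s*
    frontier = {''}
    while frontier:
        frontier = {w + f for w in frontier for f in s
                    if len(w + f) <= max_len} - words
        words |= frontier
    return words

print(sorted(kleene_star({'cc', 'd'}, 4), key=lambda w: (len(w), w)))
# ['', 'd', 'cc', 'dd', 'ccd', 'dcc', 'ddd', 'cccc', 'ccdd', 'dccd', 'ddcc', 'dddd']

Note that every listed word has its c's in even clumps, and a word like ccdcccd is never produced.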

Positive Closure: If we want to modify the concept of closure to refer to only the
concatenation leading to non-null strings from a set s, we use the notation + instead of
*. This plus operation is called positive closure.

Theorem 1: For any set s of strings, prove that s* = (s*)* = s**.

Proof: We know that every word in s** is made up of factors from s*, and every factor from s* is in turn made up of factors from s. Therefore, every word in s** is made up of factors from s.

First, we show s** ⊆ s*. (i)
Let x ∈ s**. Then x = x1 … xn for some x1, …, xn ∈ s*. Since each xi is made up of factors from s, so is x; hence x ∈ s*, which implies s** ⊆ s*.

Next, we show s* ⊆ s**. (ii)
Any word x ∈ s* is a single factor from s*, and hence belongs to (s*)* = s**.

By the inclusions (i) and (ii), we conclude that

s* = s**

Now, try some exercises.

Ex.1) If u = ababb and v = baa, then find:

(i) uv (ii) vu (iii) u^2 (iv) v^2 (v) u^2 v.

Ex.2) Write the Kleene closure of the following:


(i) {aa, b}
(ii) {a, ba}

3.2.3 Formal Definition of Regular Expressions

Certain sets of strings or languages can be represented in algebraic fashion; these algebraic expressions of languages are called regular expressions. Regular expressions are written in bold face. The symbols that appear in regular expressions are the letters of the alphabet Σ, the symbol for the null string ε, parentheses, the star operator, and the plus sign.

The set of regular expressions is defined by the following rules:

1. Every letter of Σ can be made into a regular expression; ε itself is a regular expression.
2. If l and m are regular expressions, then so are:
   (i) (l)
   (ii) lm
   (iii) l + m
   (iv) l*
   (v) l+ = ll*
3. Nothing else is a regular expression.

For example, we can now build expressions from the symbols 0, 1 using the operations of union, concatenation, and Kleene closure:

(i) 01 means a zero followed by a one (concatenation),
(ii) 0+1 means either a zero or a one (union),
(iii) 0* means ε + 0 + 00 + 000 + … (Kleene closure).

With parentheses, we can build larger expressions, and we can associate meanings with our expressions. Here's how:

Expression      Set represented
(0+1)*          all strings over {0, 1}
0*10*10*        strings containing exactly two ones
(0+1)*11        strings which end with two ones
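These associations can be checked mechanically. In Python's re module the union sign + of our notation is written |, and a whole-string match is forced with fullmatch; this is only an illustrative sketch of ours, since the regular expressions of the text are abstract objects, not tied to any library.

import re

tests = ['', '11', '0110', '0101', '011011']
for pattern, meaning in [('(0|1)*',   'all strings over {0,1}'),
                         ('0*10*10*', 'exactly two ones'),
                         ('(0|1)*11', 'ending with two ones')]:
    accepted = [s for s in tests if re.fullmatch(pattern, s)]
    print(meaning, '->', accepted)
# all strings over {0,1} -> ['', '11', '0110', '0101', '011011']
# exactly two ones -> ['11', '0110', '0101']
# ending with two ones -> ['11', '011011']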

The language denoted/represented by the regular expression R is L(R).
Example 9: The language L defined by the regular expression ab*a is the set of all
strings of a’s and b’s that begin and end with a’s, and that have nothing but b’s inside.
L = {aa, aba, abba, abbba, abbbba, …}
Example 10: The language associated with the regular expression a*b* contains all
the strings of a’s and b’s in which all the a’s (if any) come before all the b’s (if any).

L = {ε, a, b, aa, ab, bb, aaa, aab, abb, bbb, …}


Note that ba and aba are not in this language. Notice also that there need not be the
same number of a’s and b’s.

Example 11: Let us consider the language L defined by the regular expression
(a+b)*a(a+b)*. The strings of the language L are obtained by concatenating a string from the language corresponding to (a+b)*, followed by an a, followed by a string from the language associated with (a+b)*. We can also say that the language is the set of all words over the alphabet Σ = {a, b} that have an a in them somewhere.

To make the association/correspondence/relation between the regular expressions and their associated languages more explicit, we need to define the operation of multiplication of sets of words.

Definition: If S and T are sets of strings of letters (they may be finite or infinite sets), we define the product set of strings of letters to be ST = {all combinations of a string from S concatenated with a string from T, in that order}.

Example 12: If S = {a, aa, aaa}, T = {bb, bbb}


Then, ST = {abb, abbb, aabb, aabbb, aaabb, aaabbb}.

Example 13: If S = {a, bb, bab}, T = {ε, bbbb}

Then, ST = {a, bb, bab, abbbb, bbbbbb, babbbbb}

Example 14: If L is any language, then Lε = εL = L.

Ex.3) Find a regular expression to describe each of the following languages:

(a) {a, b, c}
(b) {a, b, ab, ba, abb, baa, …}
(c) {ε, a, abb, abbbb, …}

Ex.4) Find a regular expression over the alphabet {0, 1} to describe the set of all binary numerals without leading zeroes (except 0 itself). So the language is the set {0, 1, 10, 11, 100, 101, 110, 111, …}.

3.2.4 Algebra of Regular Expressions

There are many general equalities for regular expressions. We will list a few simple equalities together with some that are not so simple. All the properties can be verified by using properties of languages and sets. We will assume that R, S and T denote arbitrary regular expressions.

Properties of Regular Expressions

1. (R+S)+T = R+(S+T)
2. R+R = R
3. R+∅ = ∅+R = R
4. R+S = S+R
5. R∅ = ∅R = ∅
6. Rε = εR = R
7. (RS)T = R(ST)
8. R(S+T) = RS+RT
9. (S+T)R = SR+TR
10. ∅* = ε* = ε
11. R*R* = R* = (R*)*
12. RR* = R*R; R* = ε+RR*
13. (R+S)* = (R*S*)* = (R*+S*)* = (R*S)*R* = R*(SR*)*
14. R(SR)* = (RS)*R
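Equalities like these can be spot-checked by brute force: enumerate all strings over a small alphabet up to some length and compare which of them each side matches. The following Python sketch is our own illustration (it only tests on bounded strings, it does not prove the identity):

import re
from itertools import product

def lang(pattern, alphabet='ab', max_len=6):
    # the set of strings of length <= max_len matched in full by pattern
    return {''.join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)
            if re.fullmatch(pattern, ''.join(p))}

# spot check of property 13 with R = a and S = b:
print(lang('(a|b)*') == lang('a*(ba*)*'))   # True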

Theorem 2: Prove that R+R = R.

Proof: We know the following equalities:
L(R+R) = L(R) ∪ L(R) = L(R)
So R+R = R.
Theorem 3: Prove the distributive property
R(S+T) = RS+RT

Proof: The following set of equalities will prove the property:

L(R(S+T)) = L(R)L(S+T)
= L(R)(L(S) ∪ L(T))
= (L(R)L(S)) ∪ (L(R)L(T))
= L(RS+RT)

Similarly, by using the equalities, we can prove the rest. The proofs of the remaining equalities are left as exercises.

Example 15: Show that R+RS*S = a*bS*, where R = b+aa*b and S is any regular expression.

R+RS*S = Rε+RS*S (property 6)
= R(ε+S*S) (property 8)
= R(ε+SS*) (property 12)
= RS* (property 12)
= (b+aa*b)S* (definition of R)
= (ε+aa*)bS* (properties 6 and 8)
= a*bS* (property 12)


Try an exercise now.

Ex.5) Establish the following equality of regular expressions:


b*(abb*+aabb*+aaabb*)* = (b+ab+aab+aaab)*

As we already know the concepts of language and regular expressions, we have an important type of language derived from regular expressions, called a regular language.

3.3 REGULAR LANGUAGES


Language represented by a regular expression is called a regular language. In other
words, we can say that a regular language is a language that can be represented by a
regular expression.

Definition: For a given alphabet Σ, the following rules define the regular language associated with a regular expression.

Rule 1: ∅ and {ε} are regular languages, denoted respectively by the regular expressions ∅ and ε.

Rule 2: For each a in Σ, the set {a} is a regular language denoted by the regular expression a.

Rule 3: If l is a regular expression associated with the language L and m is a regular expression associated with the language M, then:

(i) The language {xy : x ∈ L and y ∈ M} is the regular language associated with the regular expression lm.

(ii) The regular expression l+m is associated with the language formed by the union of the sets L and M:
language (l+m) = L ∪ M

(iii) The language associated with the regular expression (l)* is L*, the Kleene Closure of the set L as a set of words:
language (l*) = L*.
Now, we shall derive an important result: all finite languages are regular.

Theorem 4: If L is a finite language, then L can be defined by a regular expression. In other words, all finite languages are regular.

Proof: A language is finite if it contains only finitely many words.

To make one regular expression that defines the language L, turn all the words in L
into bold face type and insert plus signs between them. For example, the regular
expression that defines the language L = {baa, abbba, bababa} is baa + abbba +
bababa.
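The construction in this proof is mechanical enough to automate. A tiny Python sketch of it (our own illustration; the function name is assumed):

def finite_language_to_regex(words):
    # a regular expression for a finite language: join its words with +
    return ' + '.join(words)

print(finite_language_to_regex(['baa', 'abbba', 'bababa']))
# baa + abbba + bababa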

Example 16: If L = {aa, ab, ba, bb}, then the corresponding regular expression is
aa + ab +ba + bb.

Another regular expression that defines this language is (a+b) (a+b).

So, a particular regular language can be represented by more than one regular expression. Also, by definition, each regular language must have at least one regular expression corresponding to it.
Try some exercises.

Ex.6) Find a language to describe each of the following regular expressions:


(a) a+b (b) a+b* (c) a*bc*+ac

Ex.7) Find a regular expression for each of the following languages over the alphabet {a,b}.
(a) strings with even length.
(b) strings containing the substring aba.

In our day-to-day life we often use the word automatic. Automation is the process where the output is produced directly from the input without the direct involvement of mankind. The input passes through various states in the process. For the processing of a language we use a very important finite state machine called a finite automaton.

3.4 FINITE AUTOMATA


Finite automata are important in science, mathematics, and engineering. Engineers like them because they are superb models for circuits (and, since the advent of VLSI systems, finite automata sometimes represent circuits). Computer scientists adore them because they adapt very nicely to algorithm design, for example, the lexical analysis portion of compiling and translation. Mathematicians are intrigued by them too, due to the fact that there are several nifty mathematical characterizations of the sets they accept.

Can a machine recognise a language? The answer is yes for some machines, including an elementary class of machines called finite automata. Regular languages can be represented by certain kinds of algebraic expressions, by finite automata, and by certain grammars. For example, suppose we need to compute with numbers that are represented in scientific notation. Can we write an algorithm to recognise strings of symbols represented in this way? To do this, we need to discuss some basic computing machines called finite automata.

An automaton will be a finite automaton if it accepts all the words of some regular language, where a language means a set of strings. In other words, the class of regular languages is exactly the same as the class of languages accepted by FA's (deterministic finite automata).

3.4.1 Definition
A system where energy and information are transformed and used for performing
some functions without direct involvement of man is called automaton. Examples are
automatic machine tools, automatic photo printing tools, etc.

A finite automaton is similar to a finite state machine. A finite automaton consists of five parts:

(1) a finite set of states;
(2) a finite alphabet;
(3) an initial state;
(4) a subset of the set of states (whose elements are called “yes” states or accepting states); and
(5) a next-state function or transition function.

A finite automaton over a finite alphabet A can be thought of as a finite directed graph with the property that each node emits one labelled edge for each distinct element of A. The nodes are called states. There is one special state called the start (or initial) state, and there is a possibly empty set of states called final states.

For example, the labelled graph in Figure1 given below represents a DFA over the
alphabet A = {a,b} with start state 1 and final state 4.

Figure 1: Finite automata

We always indicate the start state by writing the word start with an arrow pointing to it. Final states are indicated by double circles.

The single arrow out of state 4 labelled with a,b is short hand for two arrows from
state 4, going to the same place, one labelled a and one labelled b. It is easy to check
that this digraph represents a DFA over {a,b} because there is a start state, and each
state emits exactly two arrows, one labelled with a and one labelled with b.

So, we can say that a finite automaton is a collection of three things:

1. A finite set of states, one of which is designated as the initial state, called the start state, and some (maybe none) of which are designated as final states.

2. An alphabet Σ of possible input letters from which are formed strings that are to be read one letter at a time.

3. A finite set of transitions that tell for each state and for each letter of the input
alphabet which state to go to next.

For example, the input alphabet has only two letters a and b. Let us also assume that
there are only three states, x, y and z. Let the following be the rules of transition:
1. from state x and input a go to state y;

2. from state x and input b go to state z;

3. from state y and input a go to state x;

4. from state y and input b go to state z; and

5. from state z and any input stay at state z.

Let us also designate state x as the starting state and state z as the only final state.
Let us examine what happens to various input strings when presented to this FA. Let
us start with the string aaa. We begin, as always, in state x. The first letter of the
string is an a, and it tells us to go state y (by rule 1). The next input (instruction) is
also an a, and this tells us (by rule 3) to go back to state x. The third input is another
a, and (by Rule 1) again we go to the state y. There are no more input letters in the

input string, so our trip has ended. We did not finish in the final state (state z), so we
have an unsuccessful termination of our run.

The string aaa is not in the language of all strings that leave this FA in state z. The set
of all strings that do leave us in a final state is called the language defined by the finite automaton. The input string aaa is not in the language defined by this FA. We may
say that the string aaa is not accepted by this FA because it does not lead to a final
state. We may also say “aaa is rejected by this FA.” The set of all strings accepted is
the language associated with the FA. So, we say that L is the language accepted by
this FA. FA is also called a language recogniser.

Let us examine a different input string for this same FA. Let the input be abba. As
always, we start in state x. Rule 1 tells us that the first input letter, a, takes us to state
y. Once we are in state y, we read the second input letter, which is a b. Rule 4 now tells us to move to state z. The third input letter is a b, and since we are in state z,
Rule 5 tells us to stay there. The fourth input letter is an a, and again Rule 5 says state
z. Therefore, after we have followed the instruction of each input letter we end up in
state z. State z is designated as a final state. So, the input string abba has taken us
successfully to the final state. The string abba is therefore a word in the language
associated with this FA. The word abba is accepted by this FA.

It is not difficult for us to predict which strings will be accepted by this FA. If an
input string is made up of only the letter a repeated some number of times, then the action of the FA will be to jump back and forth between state x and state y. No such word can ever be accepted.

To get into state z, it is necessary for the string to have the letter b in it. As soon as a b is encountered in the input string, the FA jumps immediately to state z, no matter what state it was in before. Once in state z, it is impossible to leave. When the input string runs out, the FA will still be in state z, leading to acceptance of the string.

So, the FA above will accept all the strings that have the letter b in them and no other
strings. Therefore, the language associated with this FA is the one defined by the
regular expression (a+b)* b(a+b)*.

The list of transition rules can grow very long. It is much simpler to summarise them
in a table format. Each row of the table is the name of one of the states in FA, and
each column of this table is a letter of the input alphabet. The entries inside the table
are the new states that the FA moves into the transition states. The transition table for
the FA we have described is:

Table 1

State        Input
             a    b
Start  x     y    z
       y     x    z
Final  z     z    z

The machine we have already defined by the transition list and the transition table can
be depicted by the state graph in Figure 2.

Figure 2: State transition graph
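The behaviour traced above for the strings aaa and abba can also be checked mechanically. Here is a small Python sketch of ours that runs the FA of Table 1, with states x, y, z and final state z; the function name accepts is assumed:

def accepts(delta, start, finals, word):
    # Run the FA: each input letter moves us to the next state
    # prescribed by the transition table delta.
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

# The transition table of Table 1:
delta = {('x', 'a'): 'y', ('x', 'b'): 'z',
         ('y', 'a'): 'x', ('y', 'b'): 'z',
         ('z', 'a'): 'z', ('z', 'b'): 'z'}

print(accepts(delta, 'x', {'z'}, 'aaa'))   # False: the run ends in state y
print(accepts(delta, 'x', {'z'}, 'abba'))  # True: the first b leads to z, where we stay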

Note: A single state can be both the start state and a final state. There will be exactly one start state and zero or more final states in a finite automaton.

3.4.2 Another Method to Describe FA


There is a traditional method to describe finite automata which is extremely intuitive.
It is a picture called a graph. The states of the finite automaton appear as vertices of
the graph while the transitions from state to state under inputs are the graph edges.
The state graph for the same machine also appears in Figure 3 given below.

Figure 3: Finite automata

The finite automata shown in Figure 3 can also be represented in Tabular form as
below:

Table 2

State        Input        Accept?
             0    1
Start  1     1    2       No
Final  2     2    3       Yes
       3     3    3       No

Before continuing, let’s examine the computation of a finite automaton. Our first
example begins in state one and reads the input symbols in turn changing states as
necessary. Thus, a computation can be characterized by a sequence of states. (Recall
that Turing machine configurations needed the state plus the tape content. Since a finite automaton never writes, we always know what is on the tape and need only look at a state as a configuration.) Here is the sequence for the input 0001001.

Input Read:     0    0    0    1    0    0    1
States:     1 → 1 → 1 → 1 → 2 → 2 → 2 → 3

Example 17 (An elevator controller): Let’s imagine an elevator that serves two
floors. Inputs are calls to a floor either from inside the elevator or from the floor
itself. This makes three distinct inputs possible, namely:

0 - no calls
1 - call to floor one
2 - call to floor two

The elevator itself can be going up, going down, or halted at a floor. If it is on a floor,
it could be waiting for a call or about to go to the other floor. This provides us with
the six states shown in Figure 4 along with the state graph for the elevator controller.

W1  Waiting on first floor
U1  About to go up
UP  Going up
DN  Going down
W2  Waiting on second floor
D2  About to go down

Figure 4: Elevator control

A transition state table for the elevator is given in Table 3:

Table 3: Elevator Control

State             Input
                  None   call to 1   call to 2
W1 (wait on 1)    W1     W1          UP
U1 (start up)     UP     U1          UP
UP                W2     D2          W2
DN                W1     W1          U1
W2 (wait on 2)    W2     DN          W2
D2 (start down)   DN     DN          D2
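The controller can be exercised directly from the transition table. The sketch below, our own illustration in Python, drives the elevator of Table 3 through a short sequence of calls; inputs are encoded as 0 (no call), 1 (call to floor one) and 2 (call to floor two):

# Transition table of Table 3, indexed by [state][input]:
elevator = {'W1': ('W1', 'W1', 'UP'), 'U1': ('UP', 'U1', 'UP'),
            'UP': ('W2', 'D2', 'W2'), 'DN': ('W1', 'W1', 'U1'),
            'W2': ('W2', 'DN', 'W2'), 'D2': ('DN', 'DN', 'D2')}

state = 'W1'                   # waiting on the first floor
for call in [2, 0, 1, 0]:      # called to floor two, then back to floor one
    state = elevator[state][call]
    print(call, '->', state)   # prints: 2 -> UP, 0 -> W2, 1 -> DN, 0 -> W1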

Accepting and rejecting states are not included in the elevator design because
acceptance is not an issue. If we were to design a more sophisticated elevator, it
might have states that indicated:

a) power failure,
b) overloading, or
c) breakdown.

In this case, acceptance and rejection might make sense.

Let us make a few small notes about the design. If the elevator is about to move (i.e., in state U1 or D2) and it is called to the floor it is presently on, it will stay. (This may be good. Try it next time you are in an elevator.) And, if it is moving (up or down)
and gets called back the other way, it remembers the call by going to the U1 or D2
state upon arrival on the next floor. Of course, the elevator does not do things like
open and close doors (these could be states too) since that would have added
complexity to the design. Speaking of complexity, imagine having 100 floors.
That is our levity for this section. Now that we know what a finite automaton is, we
must (as usual) define it precisely.

Definition: A finite automaton M is a quintuple M = (Q, Σ, δ, q0, F) where:

Q is a finite set (of states),
Σ is a finite alphabet (of input symbols),
δ : Q × Σ → Q (next-state function),
q0 ∈ Q (the starting state),
F ⊆ Q (the accepting states).

We also need some additional notation. The next state function is called the transition
function and the accepting states are often called final states. The entire machine is
usually defined by presenting a transition state table or a transition diagram. In this
way, the states, alphabet, transition function, and final states are constructively
defined. The starting state is usually the lowest numbered state. Our first example of
a finite automaton is:

M = ({q1, q2, q3}, {0,1}, δ, q1, {q2})

where the transition function δ is defined explicitly by either a state table or a state graph.

3.5 SUMMARY
In this unit we introduced several formulations for regular languages. Regular expressions are algebraic representations of regular languages, and finite automata are machines that recognise regular languages. From regular expressions, we can derive regular languages. We also made some other observations: finite automata can be used as output devices (Mealy and Moore machines).

3.6 SOLUTIONS/ANSWERS
Ex.1)
(i) ababbbaa
(ii) baaababb
(iii) ababbababb
(iv) baabaa
(v) ababbababbbaa

Ex.2)
(i) Suppose aa = x. Then
{x, b}* = {ε, x, b, xx, bb, xb, bx, xxx, bxx, xbx, xxb, bbx, bxb, xbb, bbb, …}
Substituting x = aa,
{aa, b}* = {ε, aa, b, aaaa, bb, aab, baa, aaaaaa, baaaa, aabaa, …}

(ii) {a, ba}* = {ε, a, ba, aa, baba, aba, baa, …}

Ex.3)
(a) a+b+c
(b) ab*+ba*
(c) ε+a(bb)*

Ex.4)
0+1(0+1)*

Ex.5)
Starting with the left side and using properties of regular expressions, we get

b*(abb* + aabb* + aaabb*)*
= b*((ab + aab + aaab)b*)* (property 9)
= (b + ab + aab + aaab)* (property 13).

Ex.6)
(a) {a, b}
(b) {a, ε, b, bb, …, b^n, …}
(c) {a, b, ab, bc, abb, bcc, …, ab^n, bc^n, …}

Ex.7)
(a) (aa+ab+ba+bb)*
(b) (a+b)* aba(a+b)*

3.7 FURTHER READINGS

1. Elements of the Theory of Computation, H.R. Lewis & C.H. Papadimitriou, PHI, (1981).
2. Introduction to Automata Theory, Languages, and Computation (II Ed.), J.E. Hopcroft, R. Motwani & J.D. Ullman, Pearson Education Asia (2001).
3. Introduction to Automata Theory, Language, and Computation, J.E. Hopcroft and J.D. Ullman, Narosa Publishing House (1987).
4. Introduction to Languages and Theory of Computation, J.C. Martin, Tata-McGraw-Hill (1997).
5. Computers and Intractability: A Guide to the Theory of NP-Completeness, M.R. Garey & D.S. Johnson, W.H. Freeman and Company (1979).

UNIT 4 MODELS FOR EXECUTING ALGORITHMS-II: PDFA & CFG
Structure Page Nos.
4.0 Introduction 61
4.1 Objectives 61
4.2 Formal Language & Grammar 61
4.3 Context Free Grammar (CFG) 68
4.4 Pushdown Automata (PDA) 72
4.5 Summary 74
4.6 Solutions/Answers 74
4.7 Further Readings 75

4.0 INTRODUCTION
We have mentioned earlier that not every problem can be solved algorithmically, and that good sources of examples of such problems are provided by formal models of computation, viz., FA, PDFA and TM. In the previous unit, we discussed FA. In this unit, we discuss PDFA, CFG and related topics.

4.1 OBJECTIVES
After going through this unit, you should be able to:

• explain and create context free grammar and language;
• define the pushdown automata;
• find the equivalence of context free grammar and Pushdown Automata; and
• create a grammar from language and vice versa.

4.2 FORMAL LANGUAGE & GRAMMAR


In our day-to-day life, we often use the common words such as grammar and
language. Let us discuss it through one example.
Example 1: If we talk about a sentence in the English language, “Ram reads”, this sentence is made up of “Ram” and “reads”, which can be replaced by <noun> and <verb>. We can simply say that a sentence is formed from a noun and a verb, and this is written as

<sentence> → <noun> <verb>


where <noun> can be replaced with many values such as Ram, Sam, Gita, …, and <verb> can be replaced with many other values such as reads, writes, goes, …. As noun and verb are replaced, we can easily write:

<noun> → I
<noun> → Ram
<noun> → Sam
<verb> → reads
<verb> → writes

From the above, we can collect all the values in two categories. One is with the
parameter changing its values further, and another is with termination. These

collections are called variables and terminals, respectively. In the above discussion, the variables are <sentence>, <noun> and <verb>, and the terminals are I, Ram, Sam, reads, writes. As sentence formation starts with <sentence>, this symbol is a special symbol and is called the start symbol.

Now formally, a Grammar G = (V, Σ, P, S) where:

• V is called the set of variables, e.g., {S, A, B, C};
• Σ is the set of terminals, e.g., {a, b};
• P is a set of production rules (rules of the form α → β, where α ∈ (V ∪ Σ)+ and β ∈ (V ∪ Σ)+, e.g., S → aA);
• S is a special variable called the start symbol, S ∈ V.

Structure of grammar: If L is a language over an alphabet A, then a grammar for L consists of a set of grammar rules of the form

x → y

where x and y denote strings of symbols taken from A and from a set of grammar symbols disjoint from A. The grammar rule x → y is called a production rule, and an application of a production rule (x is replaced by y) is called a derivation.

Every grammar has a special grammar symbol called the start symbol and there must
be at least one production with the left side consisting of only the start symbol. For
example, if S is the start symbol for a grammar, then there must be at least one
production of the form S → y.

Example 2: Suppose A = {a, b, c}; then a grammar for the language A* can be described by the following four productions:

S → ε    (i)
S → aS   (ii)
S → bS   (iii)
S → cS   (iv)

S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacbε = aacb,

using productions (ii), (ii), (iv), (iii) and (i) in turn.

The desired derivation of the string is aacb. Each step in a derivation corresponds to a branch of a tree, and this tree is called a parse tree, whose root is the start symbol. The completed derivation and parse trees are shown in Figures 1, 2 and 3:

Figure 1: S ⇒ aS    Figure 2: S ⇒ aS ⇒ aaS

Figure 3: S ⇒ aS ⇒ aaS ⇒ aacS

Let us derive the string aacb; its parse tree is shown in Figure 4.

S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacbε = aacb

Figure 4: Parse tree deriving aacb

Sentential form: A string made up of terminals and/or non-terminals is called a sentential form.

In Example 1, the grammar is formally rewritten as G = (V, Σ, P, S) where:

V = {<sentence>, <noun>, <verb>}
Σ = {Ram, reads, …}
P = {<sentence> → <noun> <verb>, <noun> → Ram, <verb> → reads}, and
S = <sentence>

If x and y are sentential forms and α → β is a production, then the replacement of α by β in xαy is called a derivation, and we denote it by writing

xαy ⇒ xβy

In the above, x is the left context and y is the right context. If the derivation is applied to the leftmost variable of the sentential form, then it is called a leftmost derivation, and if applied to the rightmost variable, it is called a rightmost derivation.

The language of a Grammar:

A language is generated from a grammar. If G is a grammar with start symbol S and set of terminals Σ, then the language of G is the set

L(G) = {W : W ∈ Σ* and S ⇒* W}.

Any derivation involves the application of production rules. If a production rule is applied once, then we write α ⇒ β; when it is applied more than once, we write α ⇒* β.
Recursive productions: A production is called recursive if its left side occurs on its right side. For example, the production S → aS is recursive. A production A → β is indirectly recursive if A derives a sentential form that contains A. For example, suppose we have the following grammar:

S → b/aA
A → c/bS

The productions S → aA and A → bS are both indirectly recursive because of the following derivations:

S ⇒ aA ⇒ abS,
A ⇒ bS ⇒ baA

A grammar is recursive if it contains either a recursive production or an indirectly recursive production.

A grammar for an infinite language must be recursive.

Example 3: Consider {ε, a, aa, …, a^n, …} = {a^n : n ≥ 0}.

Notice that any string in this language is either ε or of the form ax for some string x in the language. The following grammar will derive any of these strings:

S → ε/aS.

Now, we shall derive the string aaa:

S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaa.

Example 4: Consider {ε, ab, aabb, …, a^n b^n, …} = {a^n b^n : n ≥ 0}.

Notice that any string in this language is either ε or of the form axb for some string x in the language. The following grammar will derive any of these strings:

S → ε/aSb.

For example, we will derive the string aaabbb:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb.
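The recursive shape of the grammar S → ε/aSb maps directly onto a recursive recognizer: a string is derivable exactly when it is empty, or starts with a, ends with b, and has a derivable middle. The following Python sketch is our own illustration of this correspondence:

def derivable(w):
    # Does S =>* w for the grammar S -> epsilon / aSb ?
    if w == '':
        return True                  # use S -> epsilon
    if w.startswith('a') and w.endswith('b'):
        return derivable(w[1:-1])    # use S -> aSb, derive the middle
    return False

for w in ['', 'ab', 'aaabbb', 'aab', 'ba']:
    print(repr(w), derivable(w))
# '' True, 'ab' True, 'aaabbb' True, 'aab' False, 'ba' False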

Example 5: Consider the language {ε, ab, abab, …, (ab)^n, …} = {(ab)^n : n ≥ 0}.

Notice that any string in this language is either ε or of the form abx for some string x in the language. The following grammar will derive any of these strings:

S → ε/abS.

For example, we shall derive the string ababab:

S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab.

Sometimes, a language can be written in terms of simpler languages, and a grammar can be constructed for the language in terms of the grammars for the simpler languages. We will now concentrate on the operations of union, product and closure.

Suppose M and N are languages whose grammars have disjoint sets of non-terminals. Suppose also that the start symbols for the grammars of M and N are A and B, respectively. Then, we use the following rules to find the new grammars generated from M and N:

Union Rule: The language M ∪ N starts with the two productions

S → A/B.

Product Rule: The language MN starts with the production

S → AB.

Closure Rule: The language M* starts with the production

S → AS/ε.
Example 6: Using the Union Rule:

Let's write a grammar for the following language:

L = {ε, a, b, aa, bb, …, a^n, b^n, …}.

L can be written as a union

L = M ∪ N,

where M = {a^n : n ≥ 0} and N = {b^n : n ≥ 0}.

Thus, we can write the following grammar for L:

S → A/B    union rule,
A → ε/aA   grammar for M,
B → ε/bB   grammar for N.

Example 7: Using the Product Rule:

We shall write a grammar for the following language:

L = {a^m b^n : m, n ≥ 0}.

L can be written as a product L = MN, where M = {a^m : m ≥ 0} and N = {b^n : n ≥ 0}. Thus we can write the following grammar for L:

S → AB     product rule,
A → ε/aA   grammar for M,
B → ε/bB   grammar for N.
Example 8: Using the Closure Rule: For the language L of all strings with zero or more occurrences of aa or bb, L = {aa, bb}*. If we let M = {aa, bb}, then L = M*. Thus, we can write the following grammar for L:

S → AS/ε    closure rule,
A → aa/bb   grammar for M.

We can simplify the grammar by substituting for A to obtain the following grammar:

S → aaS/bbS/ε

Example 9: Let Σ = {a, b, c} and let S be the start symbol. Then, the language of palindromes over the alphabet Σ has the grammar:

S → aSa/bSb/cSc/a/b/c/ε.

For example, the palindrome abcba can be derived as follows:

S ⇒ aSa ⇒ abSba ⇒ abcba


Ambiguity: A grammar is said to be ambiguous if its language contains some string that has two different parse trees. This is equivalent to saying that some string has two distinct leftmost derivations, or that some string has two distinct rightmost derivations.

Example 10: Suppose we define a set of arithmetic expressions by the grammar:

E → a/b/E−E

Figure 5: Parse tree    Figure 6: Parse tree showing ambiguity

Figure 6 shows the parse trees for an ambiguous string.

The language of the grammar E → a/b/E−E contains strings like a, b, b−a, a−b−a, and b−b−a−b. This grammar is ambiguous because it has a string, namely a−b−a, that has two distinct parse trees.

Having two distinct parse trees means the same as having two distinct leftmost derivations:

E ⇒ E−E ⇒ a−E ⇒ a−E−E ⇒ a−b−E ⇒ a−b−a.

E ⇒ E−E ⇒ E−E−E ⇒ a−E−E ⇒ a−b−E ⇒ a−b−a.

The same is the case with rightmost derivations.

• A derivation is called a leftmost derivation if at each step the leftmost non-terminal of the sentential form is replaced using some production.

• A derivation is called a rightmost derivation if at each step the rightmost non-terminal of the sentential form is replaced using some production.

Let us try some exercises.

Ex.8) Given the following grammar

S → S[S]/Λ
For each of the following strings, construct a leftmost derivation, a rightmost
derivation and a parse tree.

(a) [] (b) [[ ]] (c) [][] (d) [[] [[]]]


Ex.9) Find a grammar for each language

(a) {a^m b^n | m, n ∈ N, n > m}.


(b) {a^m b c^n | m, n ∈ N}.

Ex.10) Find a grammar for each language:

(a) The even palindromes over {a, b}.


(b) The odd palindromes over {a, b}.

Chomsky Classification for Grammar:


As you have seen earlier, there may be many kinds of production rules. So, on the
basis of production rules we can classify a grammar. According to Chomsky
classification, grammar is classified into the following types:

Type 0: This grammar is also called unrestricted grammar. As its name suggests,
it is the grammar whose production rules are unrestricted.

All grammars are of type 0.

Type 1: This grammar is also called a context sensitive grammar. A production of
the form xAy → xαy is called a type 1 production if α ≠ Λ, which means the length of
the working string does not decrease.

In other words, |xAy| ≤ |xαy| as α ≠ Λ. Here, x is the left context and y is the right
context.

A grammar is called a type 1 grammar if all of its productions are of type 1. For such a
grammar, the production S → Λ is also allowed.

The language generated by a type 1 grammar is called a type 1 or context sensitive
language.

Type 2: This grammar is also known as a context free grammar. A grammar is called a
type 2 grammar if all the production rules are of type 2. A production is said to be of
type 2 if it is of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*. In other words, the
left hand side of the production rule has no left and right context. The language
generated by a type 2 grammar is called a context free language.

Type 3: A grammar is called a type 3 grammar if all of its production rules are of type
3. (A production rule is of type 3 if it is of the form A → Λ, A → a or A → aB where
a ∈ Σ and A, B ∈ V), i.e., if a variable derives a terminal or a terminal followed by one
variable. This type 3 grammar is also called a regular grammar. The language
generated by this grammar is called a regular language.
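
This classification can be mechanised. The following Python sketch is our own illustration and is deliberately simplified: its type 1 test only checks that no production shrinks the working string (apart from S → Λ). It reports the highest type number that applies to a grammar given as a list of (left side, right side) pairs over single-letter symbols:

def grammar_type(productions, nonterminals):
    def is_type3(l, r):
        # A -> Lambda, A -> a, or A -> aB.
        return (len(l) == 1 and l in nonterminals and
                (r == "" or
                 (len(r) == 1 and r not in nonterminals) or
                 (len(r) == 2 and r[0] not in nonterminals
                                and r[1] in nonterminals)))
    def is_type2(l, r):
        # Left side is a single non-terminal.
        return len(l) == 1 and l in nonterminals
    def is_type1(l, r):
        # Non-contracting, with S -> Lambda allowed.
        return len(r) >= len(l) or l == "S"
    if all(is_type3(l, r) for l, r in productions): return 3
    if all(is_type2(l, r) for l, r in productions): return 2
    if all(is_type1(l, r) for l, r in productions): return 1
    return 0

# Ex.11(a): S -> ASB/b, A -> aA
grammar_type([("S", "ASB"), ("S", "b"), ("A", "aA")], {"S", "A", "B"})  # 2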

Ex.11) Find the highest type number that can be applied to the following grammar:
(a) S  ASB/b, A  aA
(b) S  aSa/bSb/a/b/
(c) S  Aa, A S/Ba, B  abc.

4.3 CONTEXT FREE GRAMMAR (CFG)
We know that there are non-regular languages. For example, {a^n b^n | n ≥ 0} is a
non-regular language. Therefore, we can’t describe the language by any of the four
representations of regular languages: regular expressions, DFAs, NFAs, and regular
grammars.

The language {a^n b^n | n ≥ 0} can be easily described by the non-regular grammar:

S → Λ/aSb.
So, a context-free grammar is a grammar whose productions are of the form:

S → x,

where S is a non-terminal and x is any string over the alphabet of terminals and non-
terminals. Any regular grammar is context-free. A language is context-free language
if it is generated by a context-free grammar.

A grammar that is not context-free must contain a production whose left side is a
string of two or more symbols. For example, the production Sc → x is not part of any
context-free grammar.

Most programming languages are context-free. For example, a grammar for some
typical statements in an imperative language might look like the following, where the
words in bold face are considered to be the single terminals:

S  while E do S/ if E then S else S/{SL}/I: = E

L  SL/

E ….(description of an expression)
I ….(description of an identifier).

We can combine context-free languages by union, language product, and closure to
form new context-free languages.

Definition: A context-free grammar, called a CFG, consists of three components:

1. An alphabet Σ of letters called terminals from which we are going to make
   strings that will be the words of the language.

2. A set of symbols called non-terminals, one of which is the start symbol, S.

3. A finite set of productions of the form

   one non-terminal → finite string of terminals and/or non-terminals,

where the strings of terminals and non-terminals can consist of only terminals or of
only non-terminals, or any combination of terminals and non-terminals, or even the
empty string.

The language generated by a CFG is the set of all strings of terminals that can be
produced from the start symbol S using the productions as substitutions. A language
generated by a CFG is called a context-free language.

Example 11: Find a grammar for the language of decimal numerals by observing that
a decimal numeral is either a digit or a digit followed by a decimal numeral.

S → D/DS
D → 0/1/2/3/4/5/6/7/8/9

For example, the numeral 780 can be derived as follows:

S ⇒ DS ⇒ 7S ⇒ 7DS ⇒ 78S ⇒ 78D ⇒ 780.
Example 12: Let the alphabet be A = {a, b, c}.

Then, the language of palindromes over the alphabet A has the grammar:

S → aSa/bSb/cSc/a/b/c/Λ

For example, the palindrome abcba can be derived as follows:

S ⇒ aSa ⇒ abSba ⇒ abcba

Example 13: Let the CFG be

S → L/LA
A → LA/DA/Λ
L → a/b/…/z
D → 0/1/…/9

The language generated by the grammar consists of all the strings formed from
a, b, c, …, z, 0, 1, …, 9 that begin with a letter, i.e., identifiers.

We shall give a derivation of the string a2b to show that it is an identifier:

S ⇒ LA ⇒ aA ⇒ aDA ⇒ a2A ⇒ a2LA ⇒ a2bA ⇒ a2b


Context-Free Languages: The set of regular languages is closed under all of the
operations of union, concatenation, Kleene star, intersection and complement. The set
of context-free languages, however, is closed under union, concatenation and Kleene
star only.
Union

Theorem 1: If L1 and L2 are context-free languages, then L1 ∪ L2 is a context-free
language.

Proof: If L1 and L2 are context-free languages, then each of them has a context-free
grammar; call the grammars G1 and G2. Our proof requires that the grammars have
no non-terminals in common. So we shall subscript all of G1’s non-terminals with a 1
and subscript all of G2’s non-terminals with a 2. Now, we combine the two grammars
into one grammar that will generate the union of the two languages. To do this, we
add one new non-terminal, S, and two new productions.

S  S1
 S2
S is the starting non-terminal for the new union grammar and can be replaced either
by the starting non-terminal for G1 or for G2, thereby generating either a string from
L1 or from L2. Since the non-terminals of the two original languages are completely
different, and once we begin using one of the original grammars, we must complete
the derivation using only the rules from that original grammar. Note that there is no
need for the alphabets of the two languages to be the same.
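
The subscripting step of this proof is easy to carry out mechanically. A Python sketch of the construction (our own illustration, using the dictionary representation of grammars introduced earlier, with single-letter non-terminals):

def subscript(grammar, tag):
    # Append a subscript to every non-terminal so two grammars can be
    # combined without their non-terminals clashing.
    rename = lambda rhs: "".join(c + tag if c in grammar else c
                                 for c in rhs)
    return {nt + tag: [rename(r) for r in rhs]
            for nt, rhs in grammar.items()}

def union_cfg(g1, g2):
    # Theorem 1 construction: new start symbol S with S -> S1/S2.
    return {"S": ["S1", "S2"], **subscript(g1, "1"), **subscript(g2, "2")}

g1 = {"S": ["aSb", ""]}    # {a^n b^n | n >= 0}
g2 = {"S": ["abS", ""]}    # {(ab)^n | n >= 0}
union_cfg(g1, g2)
# {'S': ['S1', 'S2'], 'S1': ['aS1b', ''], 'S2': ['abS2', '']}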

Concatenation
Theorem 2: If L1 and L2 are context-free languages, then L1L2 is a context-free
language.

Proof : This proof is similar to the last one. We first subscript all of the non-terminals
of G1 with a 1 and all the non-terminals of G2 with a 2. Then, we add a new
nonterminal, S, and one new rule to the combined grammar:

S  S1S2

S is the starting non-terminal for the concatenation grammar and is replaced by the
concatenation of the two original starting non-terminals.

Kleene Star
Theorem 3: If L is a context-free language, then L* is a context-free language.

Proof : Subscript the non-terminals of the grammar for L with a 1. Then add a new
starting nonterminal, S, and the rules

S  S1S


The rule S  S1S is used once for each string of L that we want in the string of L*,
then the rule S   is used to kill off the S.

Intersection
Now, we will show that the set of context-free languages is not closed under
intersection. Think about the two languages L1 = {a^n b^n c^m | n, m ≥ 0} and
L2 = {a^m b^n c^n | n, m ≥ 0}. These are both context-free languages and we can give a
grammar for each one:

G1:

S → AB
A → aAb/Λ
B → cB/Λ

G2:

S → AB
A → aA/Λ
B → bBc/Λ

The strings in L1 contain the same number of a’s as b’s, while the strings in L2 contain
the same number of b’s as c’s. Strings that have to be both in L1 and in L2, i.e., strings
in the intersection, must have the same number of a’s as b’s and the same number of
b’s as c’s.

Thus, L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}. Using the pumping lemma for context-free
languages, it can be proved easily that {a^n b^n c^n | n ≥ 0} is not a context-free
language. So, the class of context-free languages is not closed under intersection.

Although the set is not closed under intersection, there are cases in which the
intersection of two context-free languages is context-free. Think about regular
languages, for instance. All regular languages are context-free, and the intersection of
two regular languages is regular. We have some other special cases in which an
intersection of two context-free languages is context-free.

Suppose that L1 and L2 are context-free languages and that L1 ⊆ L2. Then L2 ∩ L1 = L1,
which is a context-free language. An example is EQUAL ∩ {a^n b^n}. Since strings in
{a^n b^n} always have the same number of a’s as b’s, the intersection of these two
languages is the set {a^n b^n}, which is context-free.

Another special case is the intersection of a regular language with a non-regular
context-free language. In this case, the intersection will always be context-free. An
example is the intersection of L1 = a+b+a+, which is regular, with L2 = PALINDROME.
L1 ∩ L2 = {a^n b^m a^n | m, n ≥ 1}. This language is context-free.

Complement
The set of context-free languages is not closed under complement, although there are
again cases in which the complement of a context-free language is context-free.

Theorem 4: The set of context-free languages is not closed under complement.


Proof: Suppose the set is closed under complement. Then, if L1 and L2 are context-free,
so are their complements L1′ and L2′. Since the set is closed under union, L1′ ∪ L2′ is
also context-free, as is its complement (L1′ ∪ L2′)′. But, by De Morgan’s law, this last
expression is equivalent to L1 ∩ L2, which is not guaranteed to be context-free. So, our
assumption must be incorrect and the set is not closed under complement.

Here is an example of a language that is not context-free although its complement is
context-free.

The language {a^n b^n c^n | n ≥ 1} is not context-free, but it can be proved that the
complement of this language is the union of the following five context-free languages
and is thus context-free. Strings that are not in {a^n b^n c^n | n ≥ 1} must be in one of
the following languages:

1. Mpq = {a^p b^q c^r | p, q, r ≥ 1 and p > q} (more a’s than b’s)
2. Mqp = {a^p b^q c^r | p, q, r ≥ 1 and q > p} (more b’s than a’s)
3. Mpr = {a^p b^q c^r | p, q, r ≥ 1 and p > r} (more a’s than c’s)
4. Mrp = {a^p b^q c^r | p, q, r ≥ 1 and r > p} (more c’s than a’s)
5. M = the complement of a+b+c+ (letters out of order)

Using Closure Properties

Sometimes, we can use closure properties to prove that a language is not context-free.
Consider the language our author calls DOUBLEWORD = {ww | w ∈ (a+b)*}. Is this
language context-free? Assume that it is. Form the intersection of DOUBLEWORD
with the regular language a+b+a+b+; we know that the intersection of a context-free
language and a regular language is always context-free. The intersection of
DOUBLEWORD and a+b+a+b+ is {a^n b^m a^n b^m | n, m ≥ 1}. But, this language is
not context-free, so DOUBLEWORD cannot be context-free.

Think carefully when doing unions and intersections of languages if one is a superset
of the other. The union of PALINDROME and (a+b)* is (a+b)*, which is regular. So,
sometimes the union of a context-free language and a regular language is regular. The
union of PALINDROME and a* is PALINDROME, which is context-free but not
regular.

Now try some exercises:

Ex.12) Find a CFG for each language over Σ = {a, b}.

(a) All words of the form

    a^x b^y a^z, where x, y, z = 1, 2, 3, … and y = 5x + 7z.

(b) For any two positive integers p and q, the language of all words of the
    form a^x b^y a^z, where x, y, z = 1, 2, 3, … and y = px + qz.

4.4 PUSHDOWN AUTOMATA (PDA)
Informally, a pushdown automaton is a finite automaton with a stack. The
corresponding acceptor for context-free grammars is the pushdown automaton. There
is one start state and there is a possibly empty set of final states. We can imagine a
pushdown automaton as a machine with the ability to read the letters of an input
string, perform stack operations, and make state changes.

The execution of a PDA always begins with one particular symbol on the stack, which
we should always specify. A PDA will use three stack operations as follows:

(i) The pop operation reads the top symbol and removes it from the stack.
(ii) The push operation writes a designated symbol onto the top of the stack. For
     example, push(x) means put x on top of the stack.
(iii) The nop operation does nothing to the stack.

We can represent a pushdown automaton as a finite directed graph in which each state
(i.e., node) emits zero or more labelled edges. Each edge from state i to state j is
labelled with three items, as shown in Figure 7, where L is either a letter of an
alphabet or Λ, S is a stack symbol, and O is the stack operation to be performed.

        L, S / O
    i ──────────→ j

Figure 7: Directed graph

It takes five pieces of information to describe a labelled edge. We can also represent
it by the following 5-tuple, which is called a PDA instruction:

(i, L, S, O, j)

An instruction of this form is executed as follows, where w is an input string whose
letters are scanned from left to right.

If the PDA is in state i, and either L is the current letter of w being scanned or L = Λ,
and the symbol on top of the stack is S, then perform the following actions:
(1) execute the stack operation O;
(2) move to the state j; and
(3) if L ≠ Λ, then scan right to the next letter of w.

A string is accepted by a PDA if there is some path (i.e., sequence of instructions)
from the start state to a final state that consumes all the letters of the string. Otherwise,
the string is rejected by the PDA. The language of a PDA is the set of strings that it
accepts.

Nondeterminism: A PDA is deterministic if there is at most one move possible from
each state. Otherwise, the PDA is non-deterministic. There are two types of
non-determinism that may occur. One kind of non-determinism occurs exactly when a
state emits two or more edges labelled with the same input symbol and the same stack
symbol. In other words, there are two 5-tuples with the same first three components.
For example, the following two 5-tuples represent nondeterminism:
(i, b, c, pop, j)
(i, b, c, push(D), k).

The second kind of nondeterminism occurs when a state emits two edges labelled with
the same stack symbol, where one input symbol is Λ and the other input symbol is not.
For example, the following two 5-tuples represent non-determinism because the
machine has the option of consuming the input letter b or leaving it alone:

(i, , c, pop, j)
(i, b, c, push(D), k).
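
Both the instruction semantics and this nondeterminism can be captured in a small simulator. The following Python sketch (our own illustration, not from the text) tries all applicable instructions by backtracking; it assumes the machine has no cycle of Λ-moves, encodes Λ as "", represents the stack as a string with the top at index 0, and writes the operations as "pop", "nop" or ("push", Y):

def accepts(instructions, final_states, w, start_state=0, start_stack="X"):
    # Nondeterministic PDA simulation with final-state acceptance.
    def run(state, rest, stack):
        if rest == "" and state in final_states:
            return True
        for (i, L, S, op, j) in instructions:
            if i != state or not stack or stack[0] != S:
                continue                       # wrong state or stack top
            if L != "" and not rest.startswith(L):
                continue                       # wrong input letter
            new_stack = (stack[1:] if op == "pop" else
                         op[1] + stack if isinstance(op, tuple) else stack)
            if run(j, rest[1:] if L != "" else rest, new_stack):
                return True
        return False
    return run(start_state, w, start_stack)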

Example 14: The language {a^n b^n | n ≥ 0} can be accepted by a PDA. We will keep
track of the number of a’s in an input string by pushing the symbol Y onto the stack
for each a. A second state will be used to pop the stack for each b encountered. The
following PDA will do the job, where X is the initial symbol on the stack:

Figure 8: Pushdown automaton

The PDA can be represented by the following six instructions:

(0, Λ, X, nop, 2),
(0, a, X, push(Y), 0),
(0, a, Y, push(Y), 0),
(0, b, Y, pop, 1),
(1, b, Y, pop, 1),
(1, Λ, X, nop, 2).

This PDA is non-deterministic because either of the first two instructions in the list
can be executed if the first input letter is a and X is on the top of the stack. A
computation sequence for the input string aabb can be written as follows:

(0, aabb, X)   start in state 0 with X on the stack,
(0, abb, YX)   consume a and push Y,
(0, bb, YYX)   consume a and push Y,
(1, b, YX)     consume b and pop,
(1, Λ, X)      consume b and pop,
(2, Λ, X)      move to the final state.
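
Using the simulator sketched after the discussion of nondeterminism, the six instructions above can be checked directly (again with "" for Λ and ("push", "Y") for push(Y)):

anbn_pda = [
    (0, "",  "X", "nop",         2),
    (0, "a", "X", ("push", "Y"), 0),
    (0, "a", "Y", ("push", "Y"), 0),
    (0, "b", "Y", "pop",         1),
    (1, "b", "Y", "pop",         1),
    (1, "",  "X", "nop",         2),
]

accepts(anbn_pda, {2}, "aabb")   # True
accepts(anbn_pda, {2}, "aab")    # False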

Equivalent Forms of Acceptance:

Above, we defined acceptance of a string by a PDA in terms of final state acceptance.
That is, a string is accepted if it has been consumed and the PDA is in a final state.
But, there is an alternative definition of acceptance called empty stack acceptance,
which requires the input string to be consumed and the stack to be empty, with no
requirement that the machine be in any particular state. The class of languages
accepted by PDAs that use empty stack acceptance is the same as the class of
languages accepted by PDAs that use final state acceptance.

Example 15 (An empty-stack PDA): Let’s consider the language {a^n b^n | n ≥ 0}. The
PDA that follows will accept this language by empty stack, where X is the initial
symbol on the stack.

Figure 9: Pushdown automaton

The PDA shown in Figure 9 can also be represented by the following three instructions:

(0, a, X, push(X), 0),
(0, Λ, X, pop, 1),
(1, b, X, pop, 1).

This PDA is non-deterministic. Let’s see how a computation proceeds. For example,
a computation sequence for the input string aabb can be as follows:

(0, aabb, X)   start in state 0 with X on the stack,
(0, abb, XX)   consume a and push X,
(0, bb, XXX)   consume a and push X,
(1, bb, XX)    pop (Λ-move),
(1, b, X)      consume b and pop,
(1, Λ, Λ)      consume b and pop (stack is empty).
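
The earlier simulator sketch needs only its acceptance test changed to model this. A hedged variant, under the same assumptions and encoding as before:

def accepts_empty_stack(instructions, w, start_state=0, start_stack="X"):
    # Empty-stack acceptance: input consumed and stack empty,
    # with no condition on the final state.
    def run(state, rest, stack):
        if rest == "" and stack == "":
            return True
        for (i, L, S, op, j) in instructions:
            if i != state or not stack or stack[0] != S:
                continue
            if L != "" and not rest.startswith(L):
                continue
            new_stack = (stack[1:] if op == "pop" else
                         op[1] + stack if isinstance(op, tuple) else stack)
            if run(j, rest[1:] if L != "" else rest, new_stack):
                return True
        return False
    return run(start_state, w, start_stack)

figure9_pda = [(0, "a", "X", ("push", "X"), 0),
               (0, "",  "X", "pop",         1),
               (1, "b", "X", "pop",         1)]
accepts_empty_stack(figure9_pda, "aabb")   # True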

Now, try some exercises.

Ex.13) Build a PDA that accepts the language of odd palindromes.

Ex.14) Build a PDA that accepts the language of even palindromes.

4.5 SUMMARY
In this unit, we considered the recognition problem and asked whether we can solve it
for a larger class of languages. The corresponding accepters for context-free languages
are PDAs. There are some languages which are not context-free; we can prove that a
language is not context-free by using the pumping lemma. Also, in this unit we
discussed the equivalence of two approaches to specifying a context-free language:
one approach uses context-free grammars, and the other uses pushdown automata.

4.6 SOLUTIONS/ANSWERS
Ex.8)

(a) S ⇒ S[S] ⇒ [S] ⇒ [ ]

(b) S ⇒ S[S] ⇒ [S] ⇒ [S[S]] ⇒ [[S]] ⇒ [[ ]].

The remaining parts can be done similarly.

Ex.9)

(a) S → aSb/aAb
    A → bA/b

Ex.10)

(a) S → aSa/bSb/Λ

(b) S → aSa/bSb/a/b.

Ex.11)

(a) S → ASB (type 2 production)
    S → b (type 3 production)
    A → aA (type 3 production)

    So the grammar is of type 2.

(b) S → aSa (type 2 production)
    S → bSb (type 2 production)
    S → a (type 3 production)
    S → b (type 3 production)
    S → Λ (type 3 production)

    So the grammar is of type 2.

(c) Type 2.

Ex.12)

(a) S → AB
    A → aAb^5/Λ
    B → b^7Ba/Λ

(b) S → AB
    A → aAb^p/Λ
    B → b^qBa/Λ

Ex.13)
Suppose the language is {wcw^T : w ∈ {a, b}*}; then the PDA is:

(0, a, x, push(a), 0), (0, b, x, push(b), 0),
(0, a, a, push(a), 0), (0, b, a, push(b), 0),
(0, a, b, push(a), 0), (0, b, b, push(b), 0),
(0, c, a, nop, 1), (0, c, b, nop, 1),
(0, c, x, nop, 1), (1, a, a, pop, 1),
(1, b, b, pop, 1), (1, Λ, x, nop, 2).

Ex.14)
The language is {ww^T : w ∈ {a, b}*}. Proceed similarly to Ex.13.

4.7 FURTHER READINGS


1. Elements of the Theory of Computation, H.R. Lewis & C.H. Papadimitriou, PHI
   (1981).
2. Introduction to Automata Theory, Languages, and Computation (II Ed.), J.E.
   Hopcroft, R. Motwani & J.D. Ullman, Pearson Education Asia (2001).
3. Introduction to Automata Theory, Languages, and Computation, J.E. Hopcroft
   and J.D. Ullman, Narosa Publishing House (1987).
