DESIGN AND ANALYSIS OF ALGORITHMS NOTES - Fred
BY FREDRICK ONUNGA
Algorithm Design
The important aspects of algorithm design include creating an efficient algorithm that solves a problem using minimum time and space.
To solve a problem, different approaches can be followed. Some of them can be efficient
with respect to time consumption, whereas other approaches may be memory efficient.
However, one has to keep in mind that time consumption and memory usage often cannot both be optimized simultaneously. If we require an algorithm to run in less time, we may have to invest in more memory, and if we require an algorithm to run with less memory, we may need more time.
The main steps involved in designing and analyzing an algorithm are:
Problem definition
Development of a model
Specification of an Algorithm
Designing an Algorithm
Checking the correctness of an Algorithm
Analysis of an Algorithm
Implementation of an Algorithm
Program testing
Documentation
Characteristics of Algorithms
An algorithm should have the following characteristics: it takes zero or more well-defined inputs, produces at least one well-defined output, consists of unambiguous and well-ordered steps, terminates after a finite number of steps, and is feasible with the available resources.
Pseudocode
Algorithm: Insertion-Sort
Input: A list L of integers of length n
Output: A sorted list L1 containing those integers present in L
Step 1: Keep a sorted list L1 which starts off empty
Step 2: Perform Step 3 for each element in the original list L
Step 3: Insert it into the correct position in the sorted list L1.
Step 4: Return the sorted list
Step 5: Stop
Here is pseudocode describing how the high-level abstract process of the Insertion-Sort algorithm above can be expressed in a more concrete way.
for i <- 1 to length(A) - 1
   x <- A[i]
   j <- i
   while j > 0 and A[j-1] > x
      A[j] <- A[j-1]
      j <- j - 1
   A[j] <- x
In these notes, algorithms will be presented in the form of pseudocode, which is similar in many respects to C, C++, Java, Python, and other programming languages.
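To make this concrete, here is a minimal, runnable Python version of the Insertion-Sort pseudocode above (the function name insertion_sort is ours, not part of the original notes):

def insertion_sort(A):
    # Sort list A in place, following the pseudocode above.
    for i in range(1, len(A)):
        x = A[i]                          # element to be inserted
        j = i
        while j > 0 and A[j - 1] > x:     # shift larger elements right
            A[j] = A[j - 1]
            j -= 1
        A[j] = x                          # drop x into its correct slot
    return A

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]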
Analysis of Algorithms
In this chapter, we will discuss the need for analysis of algorithms and how to choose a better algorithm for a particular problem, as one computational problem can be solved by different algorithms.
By considering an algorithm for a specific problem, we can begin to develop pattern recognition, so that similar types of problems can be solved with the help of this algorithm.
Algorithms are often quite different from one another, though the objective of these algorithms is the same. For example, we know that a set of numbers can be sorted using different algorithms. The number of comparisons performed by one algorithm may differ from that of others for the same input. Hence, the time complexity of those algorithms may differ. At the same time, we need to calculate the memory space required by each algorithm.
Analysis of algorithms is the process of analyzing the problem-solving capability of an algorithm in terms of the time and space required (the amount of memory needed for storage during execution). However, the main concern of analysis of algorithms is the required time or performance. Generally, we perform the following types of analysis −
Worst-case − The maximum number of steps taken on any instance of size n.
Best-case − The minimum number of steps taken on any instance of size n.
Average case − An average number of steps taken on any instance of size n.
Amortized − A sequence of operations applied to an input of size n, averaged over time.
To solve a problem, we need to consider time as well as space complexity, as the program may run on a system where memory is limited but adequate time is available, or vice versa. In this context, consider bubble sort and merge sort: bubble sort does not require additional memory, but merge sort requires additional space. Though the time complexity of bubble sort is higher than that of merge sort, we may need to apply bubble sort if the program needs to run in an environment where memory is very limited.
2. Methodology of Analysis
To measure resource consumption of an algorithm, different strategies are used as
discussed in this chapter.
Amortized Analysis
Amortized analysis is generally used for certain algorithms where a sequence of similar operations is performed.
Amortized analysis provides a bound on the actual cost of the entire sequence, instead of bounding the cost of each operation in the sequence separately.
Amortized analysis differs from average-case analysis; probability is not involved in amortized analysis. Amortized analysis guarantees the average performance of each operation in the worst case.
It is not just a tool for analysis, it is a way of thinking about design, since design and analysis are closely related.
Aggregate Method
The aggregate method gives a global view of a problem. In this method, if n operations take worst-case time T(n) in total, then the amortized cost of each operation is T(n)/n. Though different operations may take different times, in this method the varying cost is neglected.
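As a quick illustration of the aggregate method (our own example, not from the original notes), consider n append operations on a dynamic array that doubles its capacity whenever it is full. Counting element copies shows that the total cost of the whole sequence stays below 3n, so the amortized cost per append is T(n)/n = O(1):

def total_append_cost(n):
    # Count element copies made by n appends to a doubling dynamic array.
    cost, size, capacity = 0, 0, 1
    for _ in range(n):
        if size == capacity:      # array is full: reallocate and copy
            cost += size
            capacity *= 2
        cost += 1                 # writing the appended element
        size += 1
    return cost

for n in (10, 100, 1000):
    print(n, total_append_cost(n))   # total cost is always below 3n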
Accounting Method
In this method, different charges are assigned to different operations according to their actual cost. If the amortized cost of an operation exceeds its actual cost, the difference is assigned to the object as credit. This credit helps to pay for later operations for which the amortized cost is less than the actual cost.
If the actual cost and the amortized cost of the ith operation are ci and ĉi respectively, then
∑i=1..n ĉi ⩾ ∑i=1..n ci
Potential Method
This method represents the prepaid work as potential energy, instead of considering prepaid work as credit. This energy can be released to pay for future operations.
Suppose we perform n operations, starting with an initial data structure D0. Let ci be the actual cost of the ith operation and Di the data structure after it. The potential function Φ maps each Di to a real number Φ(Di), the associated potential of Di. The amortized cost ĉi can be defined by
ĉi = ci + Φ(Di) − Φ(Di-1)
Time Complexity
It’s a function describing the amount of time required to run an algorithm in terms of the
size of the input. "Time" can mean the number of memory accesses performed, the
number of comparisons between integers, the number of times some inner loop is
executed, or some other natural unit related to the amount of real time the algorithm will
take.
Space Complexity
It’s a function describing the amount of memory an algorithm takes in terms of the size of
input to the algorithm. We often speak of "extra" memory needed, not counting the
memory needed to store the input itself. Again, we use natural (but fixed-length) units to
measure this.
Space complexity is sometimes ignored because the space used is minimal and/or obvious; however, sometimes it becomes as important an issue as time.
Asymptotic Notations
Execution time of an algorithm depends on the instruction set, processor speed, disk I/O
speed, etc. Hence, we estimate the efficiency of an algorithm asymptotically.
The time function of an algorithm is represented by T(n), where n is the input size.
Different types of asymptotic notations are used to represent the complexity of an algorithm. The following asymptotic notations are used to calculate the running-time complexity of an algorithm:
O − Big Oh
Ω − Big omega
θ − Big theta
o − Little Oh
ω − Little omega
‘O’ (Big Oh) is the most commonly used notation. A function f(n) can be represented as being of the order of g(n), that is O(g(n)), if there exist a positive integer n0 and a positive constant c such that −
f(n) ⩽ c·g(n) for all n > n0
Hence, function g(n) is an upper bound for function f(n), as g(n) grows at least as fast as f(n).
Example
Let us consider a given function, f(n) = 4n³ + 10n² + 5n + 1.
Considering g(n) = n³,
f(n) ⩽ 5·g(n) for all values of n > 10
Hence, the complexity of f(n) can be represented as O(g(n)), i.e. O(n³).
Ω − Notation
Example
Let us consider the same function, f(n) = 4n³ + 10n² + 5n + 1.
Considering g(n) = n³, f(n) ⩾ 4·g(n) for all values of n > 0.
Hence, the complexity of f(n) can be represented as Ω(g(n)), i.e. Ω(n³).
o − Notation (Little Oh)
Example
Let us consider the same function, f(n) = 4n³ + 10n² + 5n + 1.
Considering g(n) = n⁴,
lim (n→∞) (4n³ + 10n² + 5n + 1) / n⁴ = 0
Hence, f(n) can be represented as o(g(n)), i.e. o(n⁴).
ω – Notation
Example
Let us consider the same function, f(n) = 4n³ + 10n² + 5n + 1.
Considering g(n) = n²,
lim (n→∞) (4n³ + 10n² + 5n + 1) / n² = ∞
Hence, f(n) can be represented as ω(g(n)), i.e. ω(n²).
Apriori and Aposteriori Analysis
Apriori analysis means that the analysis is performed prior to running the algorithm on a specific system. In this stage, a function is defined using some theoretical model. Hence, we determine the time and space complexity of an algorithm by just looking at the algorithm, rather than running it on a particular system with a different memory, processor, and compiler.
Aposteriori analysis of an algorithm means we perform the analysis only after running it on a system. It directly depends on the system and changes from system to system.
In industry, we cannot perform aposteriori analysis, as software is generally made for anonymous users who run it on systems different from those present in the industry.
This is why we use asymptotic notations in apriori analysis to determine time and space complexity: exact running times change from computer to computer, but asymptotically they are the same.
3. Space Complexities
Space complexity shares many of the features of time complexity and serves as a further
way of classifying problems according to their computational difficulties.
Definition
Let M be a deterministic Turing machine (TM) that halts on all inputs. The space complexity of M is the function f: N → N, where f(n) is the maximum number of tape cells that M scans on any input of length n. If the space complexity of M is f(n), we can say that M runs in space f(n).
We estimate the space complexity of Turing machine by using asymptotic notation.
Let f: N → R⁺ be a function. The space complexity classes can be defined as follows −
SPACE(f(n)) = {L | L is a language decided by an O(f(n)) space deterministic TM}
NSPACE(f(n)) = {L | L is a language decided by an O(f(n)) space non-deterministic TM}
PSPACE is the class of languages that are decidable in polynomial space on a deterministic Turing machine.
In other words, PSPACE = ∪k SPACE(n^k)
Savitch’s Theorem
One of the earliest theorems related to space complexity is Savitch's theorem. According to this theorem, a deterministic machine can simulate non-deterministic machines by using a small amount of extra space.
For time complexity, such a simulation seems to require an exponential increase in time. For space complexity, this theorem shows that any non-deterministic Turing machine that uses f(n) space can be converted to a deterministic TM that uses f²(n) space.
Hence, Savitch's theorem states that, for any function f: N → R⁺ where f(n) ⩾ n,
NSPACE(f(n)) ⊆ SPACE(f²(n))
4. Divide and Conquer
Many algorithms are recursive in nature: to solve a given problem, they deal recursively with sub-problems.
In the divide and conquer approach, a problem is divided into smaller problems, then the smaller problems are solved independently, and finally the solutions of the smaller problems are combined into a solution for the large problem.
Divide the problem into a number of sub-problems that are smaller instances of
the same problem.
Conquer the sub-problems by solving them recursively. If they are small enough,
solve the sub-problems as base cases.
Combine the solutions to the sub-problems into the solution for the original
problem.
In this approach, most of the algorithms are designed using recursion, hence memory usage is high. A stack is used for the recursive calls, where the state of each function call needs to be stored.
Following are some problems which are solved using the divide and conquer approach.
Problem Statement
The Max-Min Problem in algorithm analysis is finding the maximum and minimum value in
an array.
Solution
To find the maximum and minimum numbers in a given array numbers[] of size n, the following algorithms can be used. First we present the naïve method, and then we present the divide and conquer approach.
Naïve Method
The naïve method is a basic method to solve any problem. In this method, the maximum and minimum numbers can be found separately. To find the maximum and minimum numbers, the following straightforward algorithm can be used.
Algorithm: Max-Min(numbers[], n)
max := numbers[1]
min := numbers[1]
for i = 2 to n do
   if numbers[i] > max then
      max := numbers[i]
   if numbers[i] < min then
      min := numbers[i]
return (max, min)
Analysis
In the naïve method, each iteration of the loop performs two comparisons, so the total number of comparisons is 2(n − 1). This number can be reduced using the divide and conquer approach. Following is the technique.
In this approach, the array is divided into two halves. Then, using a recursive approach, the maximum and minimum numbers in each half are found. Later, the maximum of the two maxima and the minimum of the two minima are returned.
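A minimal Python sketch of this divide and conquer technique (the function name max_min and the sample data are ours):

def max_min(numbers, low, high):
    # Return (maximum, minimum) of numbers[low..high] inclusive.
    if low == high:                          # one element: no comparison
        return numbers[low], numbers[low]
    if high == low + 1:                      # two elements: one comparison
        if numbers[low] > numbers[high]:
            return numbers[low], numbers[high]
        return numbers[high], numbers[low]
    mid = (low + high) // 2
    max1, min1 = max_min(numbers, low, mid)
    max2, min2 = max_min(numbers, mid + 1, high)
    # Two more comparisons combine the results of the two halves.
    return max(max1, max2), min(min1, min2)

data = [8, 3, 9, 1, 7, 4]
print(max_min(data, 0, len(data) - 1))   # (9, 1)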
Analysis
If T(n) represents the number of comparisons, then the recurrence relation can be represented as
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 2   for n > 2
T(n) = 1                         for n = 2
T(n) = 0                         for n = 1
Let us assume that n is a power of 2, i.e. n = 2^k, where k is the height of the recursion tree.
So,
T(n) = 2·T(n/2) + 2 = 2·(2·T(n/4) + 2) + 2 = … = 3n/2 − 2
Compared to the naïve method, the divide and conquer approach uses fewer comparisons. However, using asymptotic notation, both approaches are represented by O(n).
5. Merge Sort
Problem Statement
The problem of sorting a list of numbers lends itself immediately to the divide and conquer strategy: split the list into two halves, recursively sort each half, and then merge the two sorted halves.
Solution
In this algorithm, the numbers are stored in an array numbers[]. Here, p and q represent the start and end indices of a sub-array.
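The algorithm listing itself is not reproduced in these notes; the following Python sketch reconstructs the standard Merge-Sort using the same p (start) and q (end) index convention:

def merge_sort(numbers, p, q):
    # Sort numbers[p..q] in place using divide and conquer.
    if p < q:
        m = (p + q) // 2
        merge_sort(numbers, p, m)        # sort the left half
        merge_sort(numbers, m + 1, q)    # sort the right half
        merge(numbers, p, m, q)          # combine the two sorted halves

def merge(numbers, p, m, q):
    # Merge the sorted sub-arrays numbers[p..m] and numbers[m+1..q].
    left, right = numbers[p:m + 1], numbers[m + 1:q + 1]
    i = j = 0
    for k in range(p, q + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            numbers[k] = left[i]; i += 1
        else:
            numbers[k] = right[j]; j += 1

a = [14, 7, 3, 12, 9, 11, 6, 2]
merge_sort(a, 0, len(a) - 1)
print(a)   # [2, 3, 6, 7, 9, 11, 12, 14]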
Analysis
T(n) = c                  if n ⩽ 1
T(n) = 2·T(n/2) + d·n     otherwise
Unrolling the recurrence i times gives
T(n) = 2^i·T(n/2^i) + i·d·n
As i = log n,
T(n) = 2^(log n)·T(n/2^(log n)) + log n·d·n = c·n + d·n·log n
Therefore, T(n) = O(n log n)
Example
As Merge-Sort proceeds, in every iteration the array is divided into two sub-arrays until each sub-array contains only one element. When these sub-arrays cannot be divided further, merge operations are performed.
6. Binary Search
Problem Statement
Binary search can be performed on a sorted array. In this approach, the index of an element
x is determined if the element belongs to the list of elements. If the array is unsorted, linear
search is used to determine the position.
Solution
In this algorithm, we want to find whether element x belongs to a set of numbers stored in an array numbers[], where l and r represent the left and right indices of the sub-array in which the searching operation should be performed.
Algorithm: Binary-Search(numbers[], x, l, r)
if l = r then
   return l
else
   m := ⌊(l + r) / 2⌋
   if x ≤ numbers[m] then
      return Binary-Search(numbers[], x, l, m)
   else
      return Binary-Search(numbers[], x, m+1, r)
Analysis
Linear search runs in O(n) time. Whereas binary search produces the result in O(log n)
time
Hence,
T(n)={0T(n2)+1ifn=1otherwise
time.
Example
In this example, we are going to search element 63.
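The original step-by-step figure is not reproduced here; instead, a small Python translation of the algorithm above searches for 63 in a hypothetical sorted array (the array contents are our own):

def binary_search(numbers, x, l, r):
    # Return the index of x in the sorted sub-array numbers[l..r].
    if l == r:
        return l
    m = (l + r) // 2
    if x <= numbers[m]:
        return binary_search(numbers, x, l, m)
    return binary_search(numbers, x, m + 1, r)

nums = [7, 19, 28, 36, 45, 51, 63, 72, 88]   # hypothetical sorted input
i = binary_search(nums, 63, 0, len(nums) - 1)
print(i, nums[i])   # 6 63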
7. Strassen’s Matrix Multiplication
Problem Statement
Let us consider two matrices X and Y. We want to calculate the resultant matrix Z by
multiplying X and Y.
Naïve Method
First, we will discuss the naïve method and its complexity. Here, we are calculating Z = X × Y. Using the naïve method, two matrices (X and Y) can be multiplied if the orders of these matrices are p × q and q × r. Following is the algorithm.
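The listing is missing from these notes; a straightforward Python sketch of the triple-loop method (names are ours):

def matrix_multiply(X, Y):
    # Naive multiplication of a p x q matrix X by a q x r matrix Y.
    p, q, r = len(X), len(Y), len(Y[0])
    Z = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):           # three nested loops
                Z[i][j] += X[i][k] * Y[k][j]
    return Z

print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]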
Complexity
Here, we assume that integer operations take O(1) time. There are three for loops in this algorithm and one is nested in another. Hence, for two n × n matrices, the algorithm takes O(n³) time to execute.
In this context, using Strassen’s matrix multiplication algorithm, the time consumption can be improved a little bit.
In Strassen’s method, the matrices X, Y and Z (with Z = X × Y) are each divided into four n/2 × n/2 blocks:
X = | A B |   Y = | E F |   Z = | I J |
    | C D |       | G H |       | K L |
Seven intermediate products are computed:
M1 := (A + C) × (E + F)
M2 := (B + D) × (G + H)
M3 := (A − D) × (E + H)
M4 := A × (F − H)
M5 := (C + D) × E
M6 := (A + B) × H
M7 := D × (G − E)
Then,
I := M2 + M3 − M6 − M7
J := M4 + M6
K := M5 + M7
L := M1 − M3 − M4 − M5
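A hedged Python sketch of Strassen’s scheme, using the block partition and the products M1…M7 defined above; it assumes square matrices whose size is a power of 2, and the helper functions are ours:

def strassen(X, Y):
    # Multiply two n x n matrices (n a power of 2) with Strassen's method.
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    h = n // 2
    def block(M, r, c):   # extract the h x h block whose top-left is (r, c)
        return [row[c:c + h] for row in M[r:r + h]]
    def add(P, Q):
        return [[p + q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]
    def sub(P, Q):
        return [[p - q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]
    A, B, C, D = block(X, 0, 0), block(X, 0, h), block(X, h, 0), block(X, h, h)
    E, F, G, H = block(Y, 0, 0), block(Y, 0, h), block(Y, h, 0), block(Y, h, h)
    M1 = strassen(add(A, C), add(E, F))
    M2 = strassen(add(B, D), add(G, H))
    M3 = strassen(sub(A, D), add(E, H))
    M4 = strassen(A, sub(F, H))
    M5 = strassen(add(C, D), E)
    M6 = strassen(add(A, B), H)
    M7 = strassen(D, sub(G, E))
    I = sub(sub(add(M2, M3), M6), M7)    # I = M2 + M3 - M6 - M7
    J = add(M4, M6)                      # J = M4 + M6
    K = add(M5, M7)                      # K = M5 + M7
    L = sub(sub(sub(M1, M3), M4), M5)    # L = M1 - M3 - M4 - M5
    # Stitch the four result blocks back together.
    return [I[i] + J[i] for i in range(h)] + [K[i] + L[i] for i in range(h)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]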
Analysis
T(n) = c                  if n = 1
T(n) = 7·T(n/2) + d·n²    otherwise
Using this recurrence relation, we get T(n) = O(n^(log2 7)) ≈ O(n^2.81).
8. Greedy Method
Among all the algorithmic approaches, the simplest and most straightforward is the Greedy method. In this approach, the decision is taken on the basis of currently available information, without worrying about the effect of the current decision in the future.
Greedy algorithms build a solution part by part, choosing the next part in such a way, that it
gives an immediate benefit. This approach never reconsiders the choices taken previously.
This approach is mainly used to solve optimization problems. The Greedy method is easy to implement and quite efficient in most cases. Hence, we can say that the Greedy algorithm is an algorithmic paradigm, based on heuristics, that follows the locally optimal choice at each step with the hope of finding a globally optimal solution.
In many problems, it does not produce an optimal solution though it gives an approximate
(near optimal) solution in a reasonable time.
Areas of Application
Finding the shortest path between two vertices using Dijkstra’s algorithm.
Finding the minimal spanning tree in a graph using Prim’s /Kruskal’s algorithm, etc.
In many problems, the Greedy algorithm fails to find an optimal solution; moreover, it may produce a very poor solution. Problems like Travelling Salesman and 0-1 Knapsack cannot be solved optimally using this approach.
9. Fractional Knapsack
The Greedy algorithm can be understood very well with a well-known problem referred to as the Knapsack problem. Although the same problem can be solved by employing other algorithmic approaches, the Greedy approach solves the Fractional Knapsack problem reasonably well in a good time. Let us discuss the Knapsack problem in detail.
Knapsack Problem
Given a set of items, each with a weight and a value, determine a subset of items to include
in a collection so that the total weight is less than or equal to a given limit and the total
value is as large as possible.
Applications
In many cases of resource allocation along with some constraint, the problem can be formulated in a way similar to the Knapsack problem. Following is an example.
Problem Scenario
A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are
n items available in the store and weight of ith item is wi and its profit is pi. What items
should the thief take?
In this context, the items should be selected in such a way that the thief will carry those
items for which he will gain maximum profit. Hence, the objective of the thief is to
maximize the profit.
Fractional Knapsack
In this case, items can be broken into smaller pieces, hence the thief can select fractions of
items.
According to the problem statement, there are n items in the store, the weight of the ith item is wi > 0, its profit is pi > 0, and the capacity of the knapsack is W. The thief may take a fraction xi of the ith item, where
0 ⩽ xi ⩽ 1
The objective is to
maximize ∑i=1..n xi·pi
subject to the constraint
∑i=1..n xi·wi ⩽ W
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining items and increase the overall profit. Thus, in an optimal solution
∑i=1..n xi·wi = W
In this context, first we need to sort the items in decreasing order of the ratio pi/wi, and then select items greedily, as sketched below.
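A minimal greedy sketch in Python (items are given as (profit, weight) pairs; the function name is ours):

def fractional_knapsack(items, W):
    # items: list of (profit, weight) pairs; W: knapsack capacity.
    # Sort by profit/weight ratio in decreasing order, then fill greedily.
    items = sorted(items, key=lambda pw: pw[0] / pw[1], reverse=True)
    total, remaining = 0.0, W
    for profit, weight in items:
        if weight <= remaining:              # take the whole item
            total += profit
            remaining -= weight
        else:                                # take the fraction that fits
            total += profit * remaining / weight
            break
    return total

# Items A, B, C, D from the example below, as (profit, weight) pairs.
print(fractional_knapsack([(280, 40), (100, 10), (120, 20), (120, 24)], 60))
# 440.0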
Analysis
If the provided items are already sorted into decreasing order of pi/wi, then the while loop takes time in O(n); therefore, the total time including the sort is in O(n log n).
Example
Let us consider that the capacity of the knapsack is W = 60 and the list of provided items is shown in the following table −
Item A B C D
Profit 280 100 120 120
Weight 40 10 20 24
Ratio (pi/wi) 7 10 6 5
As the provided items are not sorted based on pi/wi, after sorting they are ordered as shown in the following table.
Item B A C D
Profit 100 280 120 120
Weight 10 40 20 24
Ratio (pi/wi) 10 7 6 5
Solution
First, all of B is chosen, as the weight of B is less than the capacity of the knapsack. Next, item A is chosen, as the available capacity of the knapsack is greater than the weight of A. Now, C is chosen as the next item. However, the whole item cannot be chosen, as the remaining capacity of the knapsack (10) is less than the weight of C (20), so the fraction 10/20 of C is chosen.
Now, the total weight of the selected items equals the capacity of the knapsack. Hence, no more items can be selected.
The total profit is 100 + 280 + 120 × (10/20) = 380 + 60 = 440.
This is the optimal solution. We cannot gain more profit selecting any different
combination of items.
10. Job Sequencing with Deadline
Problem Statement
In the job sequencing problem, the objective is to find a sequence of jobs which are completed within their deadlines and give maximum profit.
Solution
Let us consider a set of n given jobs associated with deadlines, where profit is earned if a job is completed by its deadline. These jobs need to be ordered in such a way that maximum profit is earned.
It may happen that not all of the given jobs can be completed within their deadlines.
Assume the deadline of the ith job Ji is di and the profit received from this job is pi. Hence, the optimal solution of this algorithm is a feasible solution with maximum profit.
Thus, di > 0 for 1 ⩽ i ⩽ n.
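The algorithm listing is not reproduced in these notes; a standard greedy sketch in Python sorts the jobs by profit and places each job in the latest free time slot on or before its deadline (names are ours):

def job_sequencing(jobs):
    # jobs: list of (name, deadline, profit) triples.
    jobs = sorted(jobs, key=lambda j: j[2], reverse=True)   # by profit, desc
    max_deadline = max(d for _, d, _ in jobs)
    slots = [None] * (max_deadline + 1)     # slots[1..max_deadline]
    for name, deadline, profit in jobs:
        for t in range(deadline, 0, -1):    # latest free slot first
            if slots[t] is None:
                slots[t] = name
                break
    return [s for s in slots[1:] if s is not None]

jobs = [("J1", 2, 60), ("J2", 1, 100), ("J3", 3, 20),
        ("J4", 2, 40), ("J5", 1, 20)]
print(job_sequencing(jobs))   # ['J2', 'J1', 'J3']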
Analysis
In this algorithm, we are using two loops, one within the other. Hence, the complexity of this algorithm is O(n²).
Example
Let us consider a set of given jobs as shown in the following table. We have to find a
sequence of jobs, which will be completed within their deadlines and will give maximum
profit. Each job is associated with a deadline and profit.
Job J1 J2 J3 J4 J5
Deadline 2 1 3 2 1
Profit 60 100 20 40 20
Solution
To solve this problem, the given jobs are sorted according to their profit in a descending
order. Hence, after sorting, the jobs are ordered as shown in the following table.
Job J2 J1 J4 J3 J5
Deadline 1 2 2 3 1
Profit 100 60 40 20 20
From this set of jobs, first we select J2, as it can be completed within its deadline and contributes maximum profit. Next, J1 is selected and scheduled in the second time slot. J4 cannot be scheduled, as both slots within its deadline are already occupied. J3 is then scheduled in the third time slot, and J5 is rejected, as the only slot within its deadline is occupied.
Thus, the solution is the sequence of jobs (J2, J1, J3), which are executed within their deadlines and give the maximum profit of 100 + 60 + 20 = 180.
11. Optimal Merge Pattern
Merge a set of sorted files of different lengths into a single sorted file. We need to find an optimal solution, where the resultant file will be generated in minimum time.
If a number of sorted files is given, there are many ways to merge them into a single sorted file. This merge can be performed pairwise. Hence, this type of merging is called a 2-way merge pattern.
As, different pairings require different amounts of time, in this strategy we want to
determine an optimal way of merging many files together. At each step, two shortest
sequences are merged.
Merging a p-record file and a q-record file requires p + q record moves; the obvious choice is to merge the two smallest files together at each step.
Two-way merge patterns can be represented by binary merge trees. Let us consider a set of n sorted files {f1, f2, f3, …, fn}. Initially, each of these is considered a single-node binary tree. To find the optimal solution, the following algorithm is used.
At the end of this algorithm, the weight of the root node represents the optimal cost.
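The merging algorithm itself is not reproduced here; a minimal Python sketch uses heapq, Python's standard-library min-heap, to merge the two smallest files at every step:

import heapq

def optimal_merge_cost(sizes):
    # Repeatedly merge the two smallest files; return total record moves.
    heapq.heapify(sizes)
    total = 0
    while len(sizes) > 1:
        a = heapq.heappop(sizes)        # smallest file
        b = heapq.heappop(sizes)        # second smallest file
        total += a + b                  # cost of this merge
        heapq.heappush(sizes, a + b)    # merged file rejoins the pool
    return total

print(optimal_merge_cost([20, 30, 10, 5, 30]))   # 205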
Example
Let us consider the given files f1, f2, f3, f4 and f5, with 20, 30, 10, 5 and 30 elements respectively.
If the merge operations are performed pairwise in the given order, then the total cost is
50 + 60 + 65 + 95 = 270
Sorting the files according to their size in ascending order gives the sequence f4, f3, f1, f2, f5 (5, 10, 20, 30, 30). Merging pairwise in this order, the total cost is
15 + 35 + 65 + 95 = 210
However, even this is not optimal. Following the algorithm above (always merging the two smallest files), we get −
Initial Set: {5, 10, 20, 30, 30}
Step-1: merge 5 and 10 into 15, giving {15, 20, 30, 30}
Step-2: merge 15 and 20 into 35, giving {30, 30, 35}
Step-3: merge 30 and 30 into 60, giving {35, 60}
Step-4: merge 35 and 60 into 95
Hence, the optimal cost is 15 + 35 + 60 + 95 = 205.
12. Dynamic Programming
Two main properties of a problem suggest that the given problem can be solved using
Dynamic Programming. These properties are overlapping sub-problems and optimal
substructure.
Overlapping Sub-Problems
A problem has overlapping sub-problems if it can be broken down into sub-problems which are reused several times. Dynamic Programming computes each sub-problem only once and stores its solution for reuse.
Optimal Sub-Structure
A given problem has Optimal Substructure Property, if the optimal solution of the given
problem can be obtained using optimal solutions of its sub-problems.
For example, the Shortest Path problem has the following optimal substructure property −
If a node x lies in the shortest path from a source node u to destination node v, then the
shortest path from u to v is the combination of the shortest path from u to x, and the
shortest path from x to v.
The standard shortest-path algorithms like Floyd-Warshall (all-pairs) and Bellman-Ford (single-source) are typical examples of Dynamic Programming.
13. 0-1 Knapsack
In these notes, we discussed the Fractional Knapsack problem earlier using the Greedy approach and showed that the Greedy approach gives an optimal solution for Fractional Knapsack. This section covers the 0-1 Knapsack problem and its analysis.
In 0-1 Knapsack, items cannot be broken, which means the thief should take an item as a whole or leave it. This is the reason it is called 0-1 Knapsack.
Hence, in the case of 0-1 Knapsack, the value of xi can be either 0 or 1, while the other constraints remain the same.
0-1 Knapsack cannot be solved by the Greedy approach: though the Greedy approach may happen to give an optimal solution in some instances, it does not ensure one in general.
Example-1
Let us consider that the capacity of the knapsack is W = 25 and the items are as shown in
the following table.
Item A B C D
Profit 24 18 18 10
Weight 24 10 10 7
If we apply the Greedy approach based on profit alone (without considering the profit per unit weight pi/wi), item A will be selected first, as it contributes the maximum profit among all the elements.
After selecting item A, no more items can be selected, as the remaining capacity is only 1. Hence, for this given set of items the total profit is 24, whereas the optimal solution can be achieved by selecting items B and C, where the total profit is 18 + 18 = 36.
Example-2
Instead of selecting the items based on the overall benefit, in this example the items are
selected based on ratio pi/wi. Let us consider that the capacity of the knapsack is W = 60
and the items are as shown in the following table.
Item A B C
Price 100 280 120
Weight 10 40 20
Ratio 10 7 6
Using the Greedy approach, first item A is selected. Then, the next item B is chosen. Hence,
the total profit is 100 + 280 = 380. However, the optimal solution of this instance can be
achieved by selecting items, B and C, where the total profit is 280 + 120 = 400.
Hence, it can be concluded that Greedy approach may not give an optimal solution.
Problem Statement
A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are
n items and weight of ith item is wi and the profit of selecting this item is pi. What items
should the thief take?
Dynamic-Programming Approach
Let i be the highest-numbered item in an optimal solution S for capacity W. Then S' = S − {i} is an optimal solution for capacity W − wi, and the value of the solution S is vi plus the value of the sub-problem.
We can express this fact in the following formula: define c[i, w] to be the value of the optimal solution for items 1, 2, …, i and maximum weight w. Then
c[i, w] = 0                                   if i = 0 or w = 0
c[i, w] = c[i-1, w]                           if wi > w
c[i, w] = max(vi + c[i-1, w-wi], c[i-1, w])   if i > 0 and wi ⩽ w
The input consists of two sequences v = <v1, v2, …, vn> and w = <w1, w2, …, wn>, together with the capacity W.
Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W do
   c[0, w] = 0
for i = 1 to n do
   c[i, 0] = 0
   for w = 1 to W do
      if wi ⩽ w then
         if vi + c[i-1, w-wi] > c[i-1, w] then
            c[i, w] = vi + c[i-1, w-wi]
         else
            c[i, w] = c[i-1, w]
      else
         c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing
backwards where the optimal values came from.
If c[i, w] = c[i-1, w], then item i is not part of the solution, and we continue tracing with c[i-1, w]. Otherwise, item i is part of the solution, and we continue tracing with c[i-1, w-wi].
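A hedged Python version of this table construction and traceback (indexing follows the pseudocode above; names are ours):

def knapsack_01(v, w, W):
    # v, w: profit and weight of items 1..n (index 0 unused); W: capacity.
    n = len(v) - 1
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for cap in range(1, W + 1):
            if w[i] <= cap and v[i] + c[i - 1][cap - w[i]] > c[i - 1][cap]:
                c[i][cap] = v[i] + c[i - 1][cap - w[i]]
            else:
                c[i][cap] = c[i - 1][cap]
    items, cap = [], W                    # trace back the chosen items
    for i in range(n, 0, -1):
        if c[i][cap] != c[i - 1][cap]:    # item i is part of the solution
            items.append(i)
            cap -= w[i]
    return c[n][W], sorted(items)

# Example-2 above: profits 100, 280, 120; weights 10, 40, 20; W = 60.
print(knapsack_01([0, 100, 280, 120], [0, 10, 40, 20], 60))   # (400, [2, 3])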
Analysis
This algorithm takes θ(n·w) time, as table c has (n + 1)·(w + 1) entries, where each entry requires θ(1) time to compute.
14. Longest Common Subsequence
The longest common subsequence (LCS) problem is finding the longest subsequence that exists in both of the given strings.
Subsequence
A sequence Z = <z1, z2, z3, z4, …, zm> over S is called a subsequence of S if and only if it can be derived from S by deleting some elements, without changing the order of the remaining elements.
Common Subsequence
Suppose, X and Y are two sequences over a finite set of elements. We can say that Z is a
common subsequence of X and Y, if Z is a subsequence of both X and Y.
If a set of sequences are given, the longest common subsequence problem is to find a
common subsequence of all the sequences that is of maximal length.
The longest common subsequence problem is a classic computer science problem, the basis
of data comparison programs such as the diff-utility, and has applications in bioinformatics.
It is also widely used by revision control systems, such as SVN and Git, for reconciling
multiple changes made to a revision-controlled collection of files.
Naïve Method
Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence
of X whether it is a subsequence of Y, and return the longest common subsequence found.
Dynamic Programming
Let X = <x1, x2, x3, …, xm> and Y = <y1, y2, y3, …, yn> be the sequences. To compute the length of the longest common subsequence, the following algorithm is used.
In this procedure, table C[m, n] is computed in row-major order, and another table B[m, n] is computed to construct the optimal solution.
Algorithm: LCS-Length-Table-Formulation (X, Y)
m := length(X)
n := length(Y)
for i = 1 to m do
   C[i, 0] := 0
for j = 1 to n do
   C[0, j] := 0
for i = 1 to m do
   for j = 1 to n do
      if xi = yj
         C[i, j] := C[i - 1, j - 1] + 1
         B[i, j] := ‘D’
      else
         if C[i - 1, j] ≥ C[i, j - 1]
            C[i, j] := C[i - 1, j]
            B[i, j] := ‘U’
         else
            C[i, j] := C[i, j - 1]
            B[i, j] := ‘L’
return C and B
Algorithm: Print-LCS (B, X, i, j)
if i = 0 or j = 0
   return
if B[i, j] = ‘D’
   Print-LCS(B, X, i-1, j-1)
   Print(xi)
else if B[i, j] = ‘U’
   Print-LCS(B, X, i-1, j)
else
   Print-LCS(B, X, i, j-1)
Analysis
To populate the table, the outer for loop iterates m times and the inner for loop iterates n times. Hence, the complexity of the algorithm is O(m·n), where m and n are the lengths of the two strings.
Example
In this example, we have two strings, X = BACDB and Y = BDCB, and we want to find the longest common subsequence. Following the algorithm above, the length of the LCS is 3, and one longest common subsequence is BCB.
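To make the example concrete, here is a runnable Python version of the two procedures above, collapsed into one function (names are ours):

def lcs(X, Y):
    # Build table C bottom-up, then trace back to recover one LCS.
    m, n = len(X), len(Y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    out, i, j = [], m, n                  # traceback from C[m][n]
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif C[i - 1][j] >= C[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("BACDB", "BDCB"))   # BCB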