DAA Lab
Prepared By
Vijay Prakash,
Asst. Professor
Department of Computer Science and Engineering
GCET
19 CSCSPC21    ANALYSIS OF ALGORITHMS    L T P C: 3 0 0 3
COURSE OBJECTIVES:
To introduce students to advanced methods of designing and analyzing algorithms.
To enable students to choose appropriate algorithms and use them for specific problems.
To familiarize students with basic paradigms and data structures used to solve advanced algorithmic problems.
To understand different classes of problems with respect to their computational difficulty.
To introduce students to recent developments in the area of algorithmic design.
Sorting: Review of various sorting algorithms, topological sorting. Graph: Definitions and Elementary Algorithms: Shortest path by BFS, shortest path in the edge-weighted case (Dijkstra's), depth-first search and computation of strongly connected components, emphasis on correctness proof of the algorithm and time/space analysis, example of amortized analysis.
Shortest path in Graphs: Floyd-Warshall algorithm and introduction to the dynamic programming paradigm. More examples of dynamic programming. Modulo Representation of integers/polynomials: Chinese Remainder Theorem, conversion between base representation and modulo representation. Extension to polynomials. Application: interpolation problem. Discrete Fourier Transform (DFT): DFT in the complex field, DFT in a modulo ring. Fast Fourier Transform algorithm. Schonhage-Strassen integer multiplication algorithm.
Linear Programming: Geometry of the feasibility region and the Simplex algorithm. NP-completeness: Examples, proofs of NP-hardness and NP-completeness. One or more of the following topics based on time and interest: approximation algorithms, randomized algorithms, interior point method, advanced number theoretic algorithms. Recent trends in problem-solving paradigms using recent searching and sorting techniques by applying recently proposed data structures.
REFERENCES:
1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", Second Edition, MIT Press / McGraw-Hill, 2001.
2. Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman, "The Design and Analysis of Computer Algorithms", Addison-Wesley Publishing Company, 1974.
3. Jon Kleinberg, Eva Tardos, "Algorithm Design", Pearson Addison-Wesley, 2005.
COURSE OUTCOMES:
After completion of course, students would be able to:
Analyze the complexity/performance of different algorithms.
Determine the appropriate data structure for solving a particular set of problems.
Categorize the different problems in various classes according to their complexity.
Gain insight into recent developments in the field of advanced data structures.
Page3
UNIT-1
S.No  Topics
1   Sorting
2   Review of Various Sorting Algorithms
3   Topological Sorting
4   Graph: Definitions and Elementary Algorithms
5   Shortest Path by BFS
6   Shortest Path in Edge-Weighted Case (Dijkstra's)
7   Depth-First Search
8   Computation of Strongly Connected Components
9   Emphasis on Correctness Proof of the Algorithm and Time/Space Analysis
10  Example of Amortized Analysis
SORTING:
Sorting Algorithm: It is an algorithm made up of a series of instructions that takes an array as input,
and outputs a sorted array. There are many sorting algorithms, such as: Selection sort, Bubble Sort,
Heap Sort, Quick sort, Radix sort, Counting sort, Bucket sort, Shell sort, Comb sort.
Simple sorts
Two of the simplest sorts are insertion sort and selection sort, both of which are efficient on
small data, due to low overhead, but not efficient on large data. Insertion sort is generally faster than
selection sort in practice, due to fewer comparisons and good performance on almost-sorted data, and
thus is preferred in practice, but selection sort uses fewer writes, and thus is used when write
performance is a limiting factor.
Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly
sorted lists, and is often used as part of more sophisticated algorithms. It works by taking elements from
the list one by one and inserting them in their correct position into a new sorted list. In arrays, the new
list and the remaining elements can share the array's space, but insertion is expensive, requiring shifting
all following elements over by one. Shell sort is a variant of insertion sort that is more efficient for
larger lists.
Selection sort
Selection sort is an in-place comparison sort. It has O(n²) complexity, making it inefficient on
large lists and generally performs worse than the similar insertion sort. Selection sort is noted for its
simplicity and also has performance advantages over more complicated algorithms in certain situations.
The algorithm finds the minimum value, swaps it with the value in the first position, and repeats these
steps for the remainder of the list. It does no more than n swaps, and thus is useful where swapping is
very expensive.
Efficient sorts
Practical general sorting algorithms are almost always based on an algorithm with average time
complexity (and generally worst-case complexity) O(n log n), of which the most common are heap sort,
merge sort and quick sort. Each has advantages and drawbacks, with the most significant being that
simple implementation of merge sort uses O(n) additional space and simple implementation of quick
sort has O(n²) worst-case complexity.
Merge sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It
starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first
should come after the second. It then merges each of the resulting lists of two into lists of four, then
merges those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the
algorithms described here, this is the first that scales well to very large lists, because its worst-case
running time is O(n log n). It is also easily applied to lists, not only arrays, as it only requires sequential
access, not random access. However, it has additional O(n) space complexity and involves a large
number of copies in simple implementations.
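As a hedged illustration of the idea above, here is a short Python sketch of merge sort. For brevity it is the top-down recursive variant rather than the bottom-up pairwise merging described, but the merge step is the same; the function name and example are illustrative, and this simple version uses O(n) extra space as noted.

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge the two already-sorted halves into a new sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]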
Heap sort
Heap sort is a much more efficient version of selection sort. It also works by determining the
largest (or smallest) element of the list, placing that at the end (or beginning) of the list, then continuing
with the rest of the list, but accomplishes this task efficiently by using a data structure called a heap, a
special type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to
be the largest (or smallest) element. When it is removed and placed at the end of the list, the heap is
rearranged so the largest element remaining moves to the root. Using the heap, finding the next largest
element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows
Heap sort to run in O(n log n) time and this is also the worst case complexity.
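A compact Python sketch of the heap idea, using the standard library heapq module as the heap; a real in-place heap sort would instead build a max-heap inside the array itself, so this is only illustrative, not the textbook in-place procedure.

import heapq

def heap_sort(a):
    heap = list(a)
    heapq.heapify(heap)              # O(n): turn the list into a min-heap
    # n extractions of the current minimum, O(log n) each.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([12, 11, 13, 5, 6, 7]))   # [5, 6, 7, 11, 12, 13]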
Quick sort
Quick sort is a divide and conquer algorithm which relies on a partition operation: to partition an
array, an element called a pivot is selected. All elements smaller than the pivots are moved before it and
all greater elements are moved after it. This can be done efficiently in linear time and in-place. The
lesser and greater sub-lists are then recursively sorted. This yields average time complexity of
O(n log n), with low overhead, and thus this is a popular algorithm. The most complex issue in quick
sort is thus choosing a good pivot element, as consistently poor choices of pivots can result in
drastically slower O(n²) performance, but good choice of pivots yields O(n log n) performance, which is
asymptotically optimal.
Shell sort
A Shell sort is different from bubble sort in that it moves elements over many positions at a time rather than one. It improves upon insertion sort by moving out-of-order elements more than one position at a time. The concept behind Shell sort is that insertion sort runs in O(kn) time, where k is the greatest distance between two out-of-place elements. This means that in general it performs in O(n²), but for data that is mostly sorted, with only a few elements out of place, it performs faster.
The worst-case time complexity of Shell sort is an open problem and depends on the gap sequence used, with known complexities ranging from O(n²) to O(n^(4/3)) and Θ(n log² n). This, combined
with the fact that Shell sort is in-place, only needs a relatively small amount of code and does not
require use of the call stack, makes it useful in situations where memory is at a premium, such as
in embedded systems and operating system kernels.
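A small Python sketch of Shell sort using the simple gap sequence n/2, n/4, …, 1; the gap choice is an assumption for illustration, and as noted above other sequences give different worst-case bounds.

def shell_sort(a):
    gap = len(a) // 2
    while gap > 0:
        # Gapped insertion sort: each element may move several positions at once.
        for i in range(gap, len(a)):
            key = a[i]
            j = i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2
    return a

print(shell_sort([23, 29, 15, 19, 31, 7, 9, 5, 2]))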
Bubble sort
A bubble sort is a sorting algorithm that continuously steps through a list, swapping items until
they appear in the correct order. Bubble sort is a simple sorting algorithm. The algorithm starts at the
beginning of the data set. It compares the first two elements and if the first is greater than the second, it
swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then
starts again with the first two elements, repeating until no swaps have occurred on the last pass. This
algorithm's average time and worst-case performance is O(n²), so it is rarely used to sort large,
unordered data sets. Bubble sort can be used to sort a small number of items (where its asymptotic
inefficiency is not a high penalty). Bubble sort can also be used efficiently on a list of any length that is
nearly sorted (that is, the elements are not significantly out of place). For example, if any numbers of
elements are out of place by only one position (e.g. 0123546789 and 1032547698), bubble sort's
exchange will get them in order on the first pass, the second pass will find all elements in order, so the
sort will take only 2n time.
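A minimal Python sketch of bubble sort with the early-exit check described above (stop as soon as a full pass makes no swaps); the names are illustrative.

def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # Compare each pair of adjacent elements and swap if out of order.
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:        # already sorted: nearly-sorted input exits early
            break
    return a

print(bubble_sort([1, 0, 3, 2, 5, 4]))   # nearly sorted, finishes in two passes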
Comb sort
Comb sort is a relatively simple sorting algorithm based on bubble sort. The basic idea is to
eliminate turtles, or small values near the end of the list, since in a bubble sort these slow the sorting
down tremendously. It accomplishes this by initially swapping elements that are a certain distance from
one another in the array, rather than only swapping elements if they are adjacent to one another, and
then shrinking the chosen distance until it is operating as a normal bubble sort. Thus, if Shell sort can be
thought of as a generalized version of insertion sort that swaps elements spaced a certain distance away
from one another, comb sort can be thought of as the same generalization applied to bubble sort.
Distribution sort
Distribution sort refers to any sorting algorithm where data is distributed from their input to
multiple intermediate structures which are then gathered and placed on the output. For example,
both bucket sort and flash sort are distribution based sorting algorithms. Distribution sorting algorithms
can be used on a single processor, or they can be a distributed algorithm, where individual subsets are
separately sorted on different processors, then combined. This allows external sorting of data too large
to fit into a single computer's memory.
Counting sort
Counting sort is applicable when each input is known to belong to a particular set, S, of
possibilities. The algorithm runs in O(|S| + n) time and O(|S|) memory where n is the length of the input.
It works by creating an integer array of size |S| and using the ith bin to count the occurrences of the ith
member of S in the input. Each input is then counted by incrementing the value of its corresponding bin.
Afterward, the counting array is looped through to arrange all of the inputs in order. This sorting
algorithm often cannot be used because S needs to be reasonably small for the algorithm to be efficient,
but it is extremely fast and demonstrates great asymptotic behavior as n increases. It also can be
modified to provide stable behavior.
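A short Python sketch of counting sort for non-negative integers, assuming the set S is {0, 1, …, max(a)}; the backwards output pass mirrors the stable behaviour mentioned above.

def counting_sort(a):
    if not a:
        return a
    k = max(a) + 1                 # |S|: number of possible key values
    count = [0] * k
    for x in a:                    # count occurrences of each key
        count[x] += 1
    for i in range(1, k):          # prefix sums give final positions
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):          # traverse backwards to keep the sort stable
        count[x] -= 1
        out[count[x]] = x
    return out

print(counting_sort([4, 2, 2, 8, 3, 3, 1]))   # [1, 2, 2, 3, 3, 4, 8]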
Bucket sort
Bucket sort is a divide and conquer sorting algorithm that generalizes counting sort by
partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either
using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. A bucket
sort works best when the elements of the data set are evenly distributed across all buckets.
Radix sort
Radix sort is an algorithm that sorts numbers by processing individual digits. n numbers
consisting of k digits each are sorted in O(n · k) time. Radix sort can process digits of each number
either starting from the least significant digit (LSD) or starting from the most significant digit (MSD).
The LSD algorithm first sorts the list by the least significant digit while preserving their relative order
using a stable sort. Then it sorts them by the next digit and so on from the least significant to the most
significant, ending up with a sorted list. While the LSD radix sort requires the use of a stable sort, the
MSD radix sort algorithm does not (unless stable sorting is desired). In-place MSD radix sort is not
stable. It is common for the counting sort algorithm to be used internally by the radix sort.
A hybrid sorting approach, such as using insertion sort for small bins improves performance of radix
sort significantly.
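A hedged Python sketch of LSD radix sort for non-negative integers. The text notes that counting sort is commonly used for each digit pass; the sketch below uses per-digit buckets instead, which is an equivalent stable pass and is only an illustrative choice.

def radix_sort(a):
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:           # one stable pass per decimal digit
        buckets = [[] for _ in range(10)]
        for x in a:
            buckets[(x // exp) % 10].append(x)
        a = [x for bucket in buckets for x in bucket]   # stable gather
        exp *= 10
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))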
Explanation of Sorting Algorithms:
INSERTION SORT:
Insertion sort is an in-place sorting algorithm.
It uses no auxiliary data structures while sorting.
It is inspired from the way in which we sort playing cards.
Insertion Sort Algorithm:
Let A be an array with n elements. The insertion sort algorithm used for sorting is as follows-
Insertion Sort Example:
Consider the following elements to be sorted in ascending order: 6, 2, 11, 7, 5.
The outer loop runs for i = 1 to 4; when i becomes 5 the loop terminates and the array is in sorted order: 2, 5, 6, 7, 11.
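The original algorithm figure is not reproduced in these pages; the following is a minimal Python sketch of insertion sort (the function name is illustrative) applied to the example array above.

def insertion_sort(a):
    # Grow a sorted prefix a[0..i-1]; insert a[i] into its correct place.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([6, 2, 11, 7, 5]))   # [2, 5, 6, 7, 11]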
SELECTION SORT:
Selection sort is one of the easiest approaches to sorting.
It is inspired from the way in which we sort things out in day to day life.
It is an in-place sorting algorithm because it uses no auxiliary data structures while sorting.
The above Selection sort algorithm works as illustrated below
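Since the illustration is not reproduced here, below is a small Python sketch of selection sort as described above (the function name and example values are illustrative).

def selection_sort(a):
    n = len(a)
    for i in range(n - 1):
        # Find the index of the minimum element in a[i..n-1].
        min_idx = i
        for j in range(i + 1, n):
            if a[j] < a[min_idx]:
                min_idx = j
        # One swap per pass, so at most n-1 swaps in total.
        a[i], a[min_idx] = a[min_idx], a[i]
    return a

print(selection_sort([64, 25, 12, 22, 11]))   # [11, 12, 22, 25, 64]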
QUICK SORT:
Quick Sort is a famous sorting algorithm.
It sorts the given data items in ascending order.
It uses the idea of divide and conquer approach.
It follows a recursive algorithm.
Quick sort algorithm
Partition_Array(a, beg, end, loc)
1. Begin
2. Set left = beg, right = end, loc = beg
3. Set done = false
4. While (not done) do
5.     While ((a[loc] <= a[right]) and (loc ≠ right)) do
6.         Set right = right - 1
7.     end while
8.     if (loc = right) then
9.         Set done = true
10.    else if (a[loc] > a[right]) then
11.        Interchange a[loc] and a[right]
12.        Set loc = right
13.    end if
14.    if (not done) then
15.        While ((a[loc] >= a[left]) and (loc ≠ left)) do
16.            Set left = left + 1
17.        end while
18.        if (loc = left) then
19.            Set done = true
20.        else if (a[loc] < a[left]) then
21.            Interchange a[loc] and a[left]
22.            Set loc = left
23.        end if
24.    end if
25. end while
26. End
Then, Quick Sort Algorithm is as follows-
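The scanned figure with the recursive procedure is not reproduced here, so the following Python sketch mirrors the Partition_Array pseudocode above (the variables loc, left and right follow the pseudocode; returning loc and the quick_sort driver are assumptions about how it is invoked).

def partition_array(a, beg, end):
    left, right, loc = beg, end, beg
    done = False
    while not done:
        # Scan from the right end towards loc.
        while a[loc] <= a[right] and loc != right:
            right -= 1
        if loc == right:
            done = True
        elif a[loc] > a[right]:
            a[loc], a[right] = a[right], a[loc]
            loc = right
        if not done:
            # Scan from the left end towards loc.
            while a[loc] >= a[left] and loc != left:
                left += 1
            if loc == left:
                done = True
            elif a[loc] < a[left]:
                a[loc], a[left] = a[left], a[loc]
                loc = left
    return loc          # pivot is now in its final position

def quick_sort(a, beg, end):
    if beg < end:
        loc = partition_array(a, beg, end)
        quick_sort(a, beg, loc - 1)
        quick_sort(a, loc + 1, end)

data = [25, 10, 30, 15, 20, 5]
quick_sort(data, 0, len(data) - 1)
print(data)              # [5, 10, 15, 20, 25, 30]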
Quick Sort Algorithm works in the following steps-
Initially, left and loc (the pivot) point to the first element of the array and right points to the last element. So to begin with, we set loc = 0, left = 0 and right = 5.
Step-01: Since loc points at left, the algorithm starts from right and moves towards left. As a[loc] < a[right], the algorithm moves right one position towards left, so that loc = 0, left = 0 and right = 4.
The intermediate steps (Step-02 to Step-07) proceed in the same way, alternating the right-to-left and left-to-right scans of the partition algorithm.
Step-08: Since loc points at left, the algorithm starts from right and moves towards left. As a[loc] > a[right], the algorithm swaps a[loc] and a[right] and loc points at right. Now, loc = 3, left = 2 and right = 3.
Now loc, left and right all point at the same element. This indicates the termination of the procedure. The pivot element 25 is placed in its final position: all elements to the right side of element 25 are greater than it, and all elements to the left side of element 25 are smaller than it.
Now, the quick sort algorithm is applied on the left and right sub-arrays separately in a similar manner.
Quick Sort Analysis:
To find the location of an element that splits the array into two parts, O(n) operations are
required.
This is because every element in the array is compared to the partitioning element.
After the division, each section is examined separately.
If the array is split approximately in half (which is not always the case), then there will be log₂n splits.
Therefore, total comparisons required are f(n) = n × log₂n = O(n log₂n).
Order of Quick Sort = O(n log₂n)
Worst Case:
Quick Sort is sensitive to the order of input data.
It gives the worst performance when elements are already in ascending order.
It then divides the array into sections of 1 and (n-1) elements in each call.
Then, there are (n-1) divisions in all.
Therefore, here total comparisons required are f(n) = n × (n-1) = O(n²).
Order of Quick Sort in worst case = O(n²)
Advantages of Quick Sort:
The advantages of quick sort algorithm are-
Quick Sort is an in-place sort, so it requires no temporary memory.
Quick Sort is typically faster than other algorithms (because its inner loop can be efficiently implemented on most architectures).
Quick Sort tends to make excellent usage of the memory hierarchy like virtual memory or
caches.
Quick Sort can be easily parallelized due to its divide and conquer nature.
Slow Algorithms: Include Bubble Sort, Insertion Sort and Selection Sort. These are all simple Θ(n²)
in-place sorting algorithms. Bubble Sort and Insertion Sort can be implemented as stable algorithms,
but Selection Sort cannot (without significant modifications).
Merge sort:
Merge sort is a stable Θ(n log n) sorting algorithm. The downside is that Merge Sort is the only
algorithm of the three that requires additional array storage, implying that it is not an in-place
algorithm.
Quick sort:
Widely regarded as the fastest of the fast algorithms. This algorithm is O(n log n) in the
expected case and O(n²) in the worst case. The probability that the algorithm takes asymptotically
longer (assuming that the pivot is chosen randomly) is extremely small for large n. It is an (almost) in-
place sorting algorithm but is not stable.
Heap sort:
Heap sort is based on a nice data structure, called a heap, which is a fast priority queue.
Elements can be inserted into a heap in O(log n) time and the largest item can be extracted in O(log n)
time. (It is also easy to set up a heap for extracting the smallest item.) If we only want to extract the k largest values, a heap allows us to do this in O(n + k log n) time. It is an in-place algorithm, but it is
not stable.
Merge sort:
Best, average and worst case time complexity: O(n log n), which is independent of the distribution of data.
Heap sort:
Best, average and worst case time complexity: O(n log n), which is independent of the distribution of data.
Quick sort:
It is a divide and conquer approach with recurrence relation:
T(n) = T(k) + T(n-k-1) + cn
Worst case: when the array is sorted or reverse sorted, the partition algorithm divides the array into two sub-arrays with 0 and n-1 elements. Therefore,
T(n) = T(0) + T(n-1) + cn
Solving this we get T(n) = O(n²).
Best case and average case: on average, the partition algorithm divides the array into two sub-arrays of equal size. Therefore,
T(n) = 2T(n/2) + cn
Solving this we get T(n) = O(n log n).
Non-comparison based sorting:
In non-comparison based sorting, elements of the array are not compared with each other to find the sorted array.
Radix sort:
Best, average and worst case time complexity: O(nk), where k is the maximum number of digits in the elements of the array.
Count sort:
Best, average and worst case time complexity: O(n + k), where k is the size of the count array.
Bucket sort:
Best and average case time complexity: O(n + k), where k is the number of buckets.
Worst case time complexity: O(n²), if all elements belong to the same bucket.
In-place/Outplace technique:
A sorting technique is in-place if it does not use any extra memory to sort the array.
Among the comparison based techniques discussed, only merge sort is an out-of-place technique, as it requires an extra array to merge the sorted sub-arrays.
Among the non-comparison based techniques discussed, all are out-of-place techniques. Counting sort uses a counting array and bucket sort uses a hash table for sorting the array.
Online/Offline technique:
A sorting technique is considered online if it can accept new data while the procedure is ongoing
i.e. complete data is not required to start the sorting operation.
Among the comparison based techniques discussed, only Insertion Sort qualifies for this because of the underlying algorithm it uses, i.e. it processes the array (not just elements) from left to right, and if new elements are added to the right, it doesn't impact the ongoing operation.
Stable/Unstable technique:
A sorting technique is stable if it does not change the order of elements with the same value.
Out of comparison based techniques, bubble sort, insertion sort and merge sort are stable techniques.
Selection sort is unstable as it may change the order of elements with the same value. For example,
consider the array 4, 4, 1, 3.
In the first iteration, the minimum element found is 1 and it is swapped with 4 at 0th position.
Therefore, the order of 4 with respect to 4 at the 1st position will change. Similarly, quick sort and heap
sort are also unstable.
Out of non-comparison based techniques, Counting sort and Bucket sort are stable sorting techniques
whereas radix sort stability depends on the underlying algorithm used for sorting.
TOPOLOGICAL SORTING:
Topological Sorting or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.
Topological Sorting for a graph is not possible if the graph is not a DAG.
It is important to note that
Topological Sorting is possible if and only if the graph is a Directed Acyclic Graph.
There may exist multiple different topological orderings for a given directed acyclic graph.
Topological Sort Example:
Consider the following directed acyclic graph. For this graph, the following 4 different topological orderings are possible-
1 2 3 4 5 6
1 2 3 4 6 5
1 3 2 4 5 6
1 3 2 4 6 5
As another example, a topological sorting of the graph shown alongside is "5 4 2 3 1 0". There can be more than one topological sorting for a graph; for example, another topological sorting of the same graph is "4 5 2 3 1 0". The first vertex in a topological sorting is always a vertex with in-degree 0 (a vertex with no incoming edges).
Topological Sorting Vs Depth First Search Traversal (DFS):
In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In topological sorting, we need to print a vertex before its adjacent vertices. For example, in the given graph, the vertex '5' should be printed before vertex '0', but unlike DFS, the vertex '4' should also be printed before vertex '0'. So topological sorting is different from DFS. For example, a DFS of the shown graph is "5 2 3 1 0 4", but it is not a topological sorting.
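A small Python sketch of DFS-based topological sorting: a vertex is appended to the order only after all of its outgoing neighbours are finished, and the reversed finish order is a topological order. The sample graph below (edges 5→2, 5→0, 4→0, 4→1, 2→3, 3→1) is an assumption about the figure referred to above.

def topological_sort(vertices, adj):
    visited, order = set(), []

    def dfs(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)        # u is appended only after all its descendants

    for u in vertices:
        if u not in visited:
            dfs(u)
    return order[::-1]         # reverse of finish order is a topological order

adj = {5: [2, 0], 4: [0, 1], 2: [3], 3: [1]}
print(topological_sort([0, 1, 2, 3, 4, 5], adj))   # [5, 4, 2, 3, 1, 0]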
Applications of Topological Sort:
Few important applications of topological sort are
Scheduling jobs from the given dependencies among jobs
Instruction Scheduling
Determining the order of compilation tasks to perform in make files
Data Serialization
Self Loop: A self-loop is an edge whose end points are a single vertex. Multiple edges are two or more
edges that join the same two vertices.
Multi Graph: A graph is called simple if it has no self-loops and no multiple edges, and a multi
graph if it does have multiple edges.
Degree: The degree of a vertex v is the number of edges that connect to v.
Path: Path in a graph G = (V, E) is a sequence of vertices v1, v2, …, vk, with the property that there are
edges between vi and vi+1. We say that the path goes from v1 to vk. The sequence 6, 4, 5, 1, 2 is a
path from 6 to 2 in the graph above. A path is simple if its vertices are all different.
Cycle: A cycle is a path v1, v2, …, vk for which k > 2, the first k - 1 vertices are all different and v1 = vk.
The sequence 4, 5, 2, 3, 4 is a cycle in the graph above. A graph is connected if for every pair of vertices u and v, there is a path from u to v. If there is a path connecting u and v, the distance between these vertices is defined as the minimal number of edges on a path from u to v.
Connected Component: A connected component is a sub-graph of maximum size, in which every
pair of vertices are connected by a path. Here is a graph with three connected components.
Trees: A tree is a connected simple acyclic graph. A vertex with degree 1 in a tree is called a leaf.
Directed graphs:
A directed graph or digraph G = (V, E) consists of a vertex set V and an edge set of ordered
pairs E of elements in the vertex set.
Here is a simple acyclic digraph (often called a DAG, ―directed acyclic graph‖) with seven vertices and
eight edges.
Adjacency matrix:
An adjacency matrix is a |V|x|V|-matrix of integers, representing a graph G = (V, E).
The vertices are numbered from 1 to |V|.
The number at position (i, j) indicates the number of edges from i to j.
Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let the 2D
array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j. Adjacency
matrix for undirected graph is always symmetric. Adjacency Matrix is also used to represent weighted
graphs. If adj[i][j] = w, then there is an edge from vertex i to vertex j with weight w. Here is an
undirected graph and its symmetric adjacency matrix.
The adjacency matrix representation is best suited for dense graphs, graphs in which the number of
edges is close to the maximal.
In a sparse graph, an adjacency matrix will have a large memory overhead and finding all
neighbors of a vertex will be costly.
Pros: The representation is easy to implement and follow. Removing an edge takes O(1) time. Queries such as whether there is an edge from vertex u to vertex v are efficient and can be done in O(1).
Cons: Consumes more space, O(V²). Even if the graph is sparse (contains fewer edges), it consumes the same space. Adding a vertex takes O(V²) time.
Adjacency list:
The adjacency list graph data structure is well suited for sparse graphs. It consists of an array of size |V|,
where position k in the array contains a list of all neighbors to k.
Note that the "neighbor list" doesn't have to be an actual list. It could be any data structure representing a set of vertices. Hash tables, arrays or linked lists are common choices.
An array of lists is used. Size of the array is equal to the number of vertices. Let the array be
array [ ]. An entry array[i] represents the list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
represented as lists of pairs. Following is adjacency list representation of the above graph.
There are other representations also like, Incidence Matrix and Incidence List. The choice of the graph
representation is situation specific. It totally depends on the type of operations to be performed and ease
of use.
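A brief Python sketch contrasting the two representations for a small undirected graph; the vertex count and example edges are illustrative, not taken from the figures above.

n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(V^2) space, O(1) edge queries.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1        # undirected, so the matrix is symmetric

# Adjacency list: O(V + E) space, well suited to sparse graphs.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[1][2])          # 1: edge (1, 2) exists
print(adj[2])                # [0, 1, 3]: neighbours of vertex 2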
Graph Coloring
Graph coloring is a method to assign colors to the vertices of a graph so that no two adjacent vertices
have the same color. Some graph coloring problems are −
Vertex coloring − A way of coloring the vertices of a graph so that no two adjacent vertices
share the same color.
Edge Coloring − It is the method of assigning a color to each edge so that no two adjacent
edges have the same color.
Face coloring − It assigns a color to each face or region of a planar graph so that no two faces
that share a common boundary have the same color.
Chromatic Number
Chromatic number is the minimum number of colors required to color a graph. For example, the
chromatic number of the following graph is 3.
The concept of graph coloring is applied in preparing timetables, mobile radio frequency assignment, the Sudoku game, register allocation and the coloring of maps.
Steps for graph coloring
Set the initial value of each processor in the n-dimensional array to 1.
Now to assign a particular color to a vertex, determine whether that color is already assigned to
the adjacent vertices or not.
If a processor detects the same color in the adjacent vertices, it sets its value in the array to 0.
After making n² comparisons, if any element of the array is 1, then it is a valid coloring.
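The steps above describe a parallel checking scheme. As a simpler, purely sequential contrast, here is a greedy vertex-colouring sketch in Python; this is a different, standard technique shown only for intuition, and the example graph is an assumption.

def greedy_coloring(adj):
    # Assign each vertex the smallest colour not used by its already-coloured neighbours.
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Triangle plus a pendant vertex: chromatic number 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(greedy_coloring(adj))     # {0: 0, 1: 1, 2: 2, 3: 0}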
Minimal Spanning Tree
A spanning tree whose sum of edge weights (or lengths) is the smallest among all possible spanning trees of graph G is known as a minimal spanning tree or minimum cost spanning tree. The following figure shows a weighted connected graph.
Some possible spanning trees of the above graph are shown below:
Among all the above spanning trees, figure (g) is the minimum spanning tree. The concept of
minimum cost spanning tree is applied in travelling salesman problem, designing electronic circuits,
designing efficient networks and designing efficient routing algorithms. To implement the minimum
cost-spanning tree, the following two methods are used:
Prim‘s Algorithm
Kruskal‘s Algorithm
Prim's Algorithm
Prim‘s algorithm is a greedy algorithm, which helps us find the minimum spanning tree for a weighted
undirected graph. It selects a vertex first and finds an edge with the lowest weight incident on that
vertex.
Steps for Prim’s Algorithm:
1. Select any vertex, say v1 of Graph G.
2. Select an edge, say e1 of G such that e1 = v1 v2 and v1 ≠ v2 and e1 has minimum weight among
the edges incident on v1 in graph G.
3. Now, following step 2, select the minimum weighted edge incident on the vertices chosen so far that does not form a cycle.
4. Continue this till n–1 edges have been chosen. Here n is the number of vertices.
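A heap-based Python sketch of Prim's algorithm following the steps above, using lazy deletion from the priority queue; the adjacency-list encoding and example graph are assumptions.

import heapq

def prim_mst(adj, start):
    in_tree = {start}
    mst_edges = []
    # Heap of candidate edges (weight, u, v) leaving the tree.
    pq = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                       # would form a cycle, skip
        in_tree.add(v)
        mst_edges.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(pq, (wx, v, x))
    return mst_edges

adj = {
    'a': [('b', 2), ('c', 3)],
    'b': [('a', 2), ('c', 1), ('d', 4)],
    'c': [('a', 3), ('b', 1), ('d', 5)],
    'd': [('b', 4), ('c', 5)],
}
print(prim_mst(adj, 'a'))    # [('a', 'b', 2), ('b', 'c', 1), ('b', 'd', 4)]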
Kruskal's Algorithm
Kruskal‘s algorithm is a greedy algorithm, which helps us find the minimum spanning tree for a
connected weighted graph, adding increasing cost arcs at each step. It is a minimum-spanning-tree
algorithm that finds an edge of the least possible weight that connects any two trees in the forest.
Steps of Kruskal’s Algorithm
Select an edge of minimum weight; say e1 of Graph G and e1 is not a loop.
Select the next minimum weighted edge that does not create a cycle with the already selected edges.
Continue this till (n-1) edges have been chosen. Here n is the number of vertices.
The minimum spanning tree of the given graph is shown in the accompanying figure.

SHORTEST PATH BY BFS:
Breadth-first search (BFS) visits the vertices of a graph level by level starting from a source vertex v; in an unweighted graph it therefore finds shortest paths from the source.

Algorithm BFS(G, v)
    Q ← new empty FIFO queue
    Mark v as visited.
    Q.enqueue(v)
    while Q is not empty
        a ← Q.dequeue()
        // Perform some operation on a.
        for all unvisited neighbors x of a
            Mark x as visited.
            Q.enqueue(x)
Before running the algorithm, all |V| vertices must be marked as not visited.
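A minimal Python version of the BFS pseudocode above, using collections.deque as the FIFO queue; the adjacency map and starting vertex are illustrative.

from collections import deque

def bfs(adj, start):
    visited = {start}
    order = []
    q = deque([start])
    while q:
        a = q.popleft()
        order.append(a)                 # "perform some operation on a"
        for x in adj.get(a, []):
            if x not in visited:        # enqueue each vertex at most once
                visited.add(x)
                q.append(x)
    return order

adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: [3]}
print(bfs(adj, 2))                      # [2, 0, 3, 1]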
Time complexity
The time complexity of BFS can be computed as the total number of iterations performed by the for loop. Let E' be the set of all edges in the connected component visited by the algorithm. For each edge {u, v} in E' the algorithm makes two for-loop iteration steps: one when the algorithm visits the neighbors of u, and one when it visits the neighbors of v. Hence, the time complexity is Θ(|V| + |E'|).
Applications of Breadth First Traversal:
3. Crawlers in Search Engines: Crawlers build their index using Breadth First Traversal. Depth First Traversal can also be used for crawlers, but the advantage with Breadth First Traversal is that the depth or levels of the built tree can be limited.
4. Social Networking Websites: In social networks, we can find people within a given distance 'k' from a person using Breadth First Search up to 'k' levels.
5. GPS Navigation Systems: Breadth First Search is used to find all neighboring locations.
6. Broadcasting in Networks: In networks, a broadcasted packet follows Breadth First Search to reach all nodes.
7. Garbage Collection: Breadth First Search is used in copying garbage collection using Cheney's algorithm. Breadth First Search is preferred over Depth First Search because of better locality of reference.
8. Cycle Detection in Undirected Graphs: In undirected graphs, either Breadth First Search or Depth First Search can be used to detect a cycle. We can use BFS to detect a cycle in a directed graph also.
9. Ford-Fulkerson Algorithm: In the Ford-Fulkerson algorithm, we can use either Breadth First or Depth First Traversal to find the maximum flow. Breadth First Traversal is preferred as it reduces the worst-case time complexity to O(VE²).
10. To Test if a Graph is Bipartite: We can use either Breadth First or Depth First Traversal.
11. Path Finding: We can use either Breadth First or Depth First Traversal to find whether there is a path between two vertices.
12. Finding All Nodes Within One Connected Component: We can use either Breadth First or Depth First Traversal to find all nodes reachable from a given node. Many algorithms like Prim's Minimum Spanning Tree and Dijkstra's Single Source Shortest Path use a structure similar to Breadth First Search.
DIJKSTRA’S ALGORITHM
Dijkstra's algorithm computes the shortest path from a vertex s, the source, to all other vertices. The graph must have non-negative edge costs.
Dijkstra's algorithm is applicable for:
Both directed and undirected graphs
All edges must have nonnegative weights.
Graph must be connected
Algorithm Dijkstra(G, s)
    for each vertex v in G
        dist[v] ← ∞
        prev[v] ← undefined
    dist[s] ← 0
    Q ← the set of all nodes in G
    while Q is not empty
        u ← vertex in Q with smallest distance in dist[]
        Remove u from Q.
        if dist[u] = ∞
            break
        for each neighbor v of u
            alt ← dist[u] + dist_between(u, v)
            if alt < dist[v]
                dist[v] ← alt
                prev[v] ← u
    return dist[], prev[]
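A heap-based Python sketch of the pseudocode above, where a priority queue replaces the "vertex in Q with smallest distance" scan. The graph encoding as a dict of weighted adjacency lists is an assumption; the edge weights chosen here reproduce the distances used in the worked example further below (y = 5, z = 7, t = 8, x = 9), assuming the standard textbook graph.

import heapq

def dijkstra(adj, s):
    dist = {v: float('inf') for v in adj}
    prev = {v: None for v in adj}
    dist[s] = 0
    pq = [(0, s)]                        # (distance, vertex) pairs
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                  # stale entry, already improved
            continue
        for v, w in adj[u]:
            alt = dist[u] + w            # relax edge (u, v)
            if alt < dist[v]:
                dist[v] = alt
                prev[v] = u
                heapq.heappush(pq, (alt, v))
    return dist, prev

adj = {
    's': [('t', 10), ('y', 5)],
    't': [('x', 1), ('y', 2)],
    'x': [('z', 4)],
    'y': [('t', 3), ('x', 9), ('z', 2)],
    'z': [('s', 7), ('x', 6)],
}
dist, _ = dijkstra(adj, 's')
print(dist)    # {'s': 0, 't': 8, 'x': 9, 'y': 5, 'z': 7}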
Given a graph and a source vertex in the graph, find the shortest paths from the source to all vertices in the given graph. Dijkstra's algorithm is very similar to Prim's algorithm for minimum spanning trees. Like Prim's MST, we generate an SPT (shortest path tree) with the given source as the root. We maintain two sets: one set contains vertices included in the shortest path tree, and the other set contains vertices not yet included in the shortest path tree. At every step of the algorithm, we find a vertex which is in the other set (not yet included) and has the minimum distance from the source. Below are the detailed steps used in Dijkstra's algorithm to find the shortest path from a single source vertex to all other vertices in the given graph.
Algorithm
Create a set sptSet (shortest path tree set) that keeps track of vertices included in shortest path tree, i.e.,
whose minimum distance from source is calculated and finalized. Initially, this set is empty.
1. Assign a distance value to all vertices in the input graph. Initialize all distance values as INFINITE.
Assign distance value as 0 for the source vertex so that it is picked first.
2. While sptSet doesn't include all vertices
a) Pick a vertex u which is not there in sptSet and has minimum distance value.
b) Include u to sptSet.
c) Update distance value of all adjacent vertices of u. To update the distance values, iterate through
all adjacent vertices. For every adjacent vertex v, if sum of distance value of u (from source) and
weight of edge u-v, is less than the distance value of v, then update the distance value of v.
Let us understand with the following example:
The set sptSet is initially empty and distances assigned to vertices are {0, INF, INF, INF, INF, INF,
INF, INF} where INF indicates infinite. Now pick the vertex with minimum distance value. The vertex
0 is picked, include it in sptSet. So sptSet becomes {0}. After including 0 to sptSet, update distance
values of its adjacent vertices. Adjacent vertices of 0 are 1 and 7. The distance values of 1 and 7 are
updated as 4 and 8.
Following subgraph shows vertices and their
distance values, only the vertices with finite
distance values are shown. The vertices included
in SPT are shown in green colour.
We repeat the above steps until sptSet includes all vertices of the given graph. Finally, we get the following Shortest Path Tree (SPT).
Time complexity
To compute the time complexity we can use the same type of argument as for BFS. The main difference is
that we need to account for the cost of adding, updating and finding the minimum distances in the queue. If
we implement the queue with a heap, all of these operations can be performed in O(log |V|) time. This gives
the time complexity O((|E| + |V|)log|V|).
Disadvantages of Dijkstra's Algorithm:
1. It does a blind search, so it wastes a lot of time while processing.
2. It cannot handle negative edges.
3. As a consequence, on graphs that contain negative edge weights it most often cannot obtain the right shortest path.
4. We need to keep track of vertices that have been visited.
Solution:
Step 1: Q = [s, t, x, y, z]. We scan the vertices one by one and find their adjacent vertices, calculating the distance of each adjacent vertex from the source. We keep a set M of vertices whose shortest distance has already been computed and finalized.
M = [s], Q = [t, x, y, z]
Comparing the tentative distances from s, Adj[s] → t = 10, y = 5; y is the shortest, so y is selected and M = [s, y].
The remaining steps proceed in the same way: at each step the unfinalized vertex with the smallest tentative distance is selected and its outgoing edges are relaxed.
Step 4: Finally we find adj[z], which are s and x.
The resulting distances are:
Weight from s to y is 5
Weight from s to z is 7
Weight from s to t is 8
Weight from s to x is 9
These are the shortest distances from the source s in the given graph.
DEPTH-FIRST SEARCH:
Depth-first search (DFS) is an algorithm that visits all edges in a graph G that belong to the same
connected component as a vertex v.
Algorithm DFS(G, v)
if v is already visited
return
Mark v as visited.
// Perform some operation on v.
for all neighbors x of v
DFS(G, x)
Before running the algorithm, all |V| vertices must be marked as not visited.
For a graph Depth First Traversal (or Search) is similar to Depth First Traversal of a tree. The
only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node again. To
avoid processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex
0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don't mark visited
vertices, then 2 will be processed again and it will become a non-terminating process. A Depth First
Traversal of the following graph is 2, 0, 1, 3.
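A recursive Python sketch of the DFS pseudocode above, run on a graph with edges 0→1, 0→2, 1→2, 2→0, 2→3, 3→3 (the usual graph for this example; assumed here, since the figure is not reproduced).

def dfs(adj, v, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    if v in visited:
        return order
    visited.add(v)
    order.append(v)                 # "perform some operation on v"
    for x in adj.get(v, []):
        dfs(adj, x, visited, order)
    return order

adj = {0: [1, 2], 1: [2], 2: [0, 3], 3: [3]}
print(dfs(adj, 2))                  # [2, 0, 1, 3]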
Time complexity
Let E' be the set of all edges in the connected component visited by the algorithm. The algorithm makes
two calls to DFS for each edge {u, v} in E': one time when the algorithm visits the neighbors of u, and
one time when it visits the neighbors of v. Hence, the time complexity of the algorithm is Θ(|V| + |E'|).
Applications of Depth First Search:
2. Detecting a cycle in a graph: A graph has a cycle if and only if we see a back edge during DFS. So we can run DFS for the graph and check for back edges.
3. Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z.
i) Call DFS(G, u) with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path as the contents of the stack
4. Topological Sorting
Topological Sorting is mainly used for scheduling jobs from the given dependencies among jobs. In
computer science, applications of this type arise in instruction scheduling, ordering of formula cell
evaluation when re-computing formula values in spreadsheets, logic synthesis, determining the order
of compilation tasks to perform in make files, data serialization and resolving symbol dependencies
in linkers.
5. To test if a graph is bipartite
We can augment either BFS or DFS: when we first discover a new vertex, color it opposite to its parent, and for each other edge, check that it does not link two vertices of the same color. The first vertex in any connected component can be red or black.
6. Finding Strongly Connected Components of a graph A directed graph is called strongly
connected if there is a path from each vertex in the graph to every other vertex.
7. Solving puzzles with only one solution, such as mazes. (DFS can be adapted to find all solutions to
a maze by only including nodes on the current path in the visited set.)
All strongly connected components can be found in O(V+E) time using Kosaraju's algorithm.
EMPHASIS ON CORRECTNESS PROOF OF THE ALGORITHM
AND TIME/SPACE ANALYSIS:
When designing a completely new algorithm, a very thorough analysis of
its correctness and efficiency is needed. The last thing we would want is our solution not being
adequate for a problem it was designed to solve in the first place.
Mathematical Induction
Mathematical induction (MI) is an essential tool for proving statements about an algorithm's correctness. The general idea of MI is to prove that a statement is true for every natural number n.
Basic Example:
Problem: If we define S(n) as the sum of the first n natural numbers, for example S(3) = 3 + 2 + 1, prove that the following formula can be applied to any n:
S(n) = (n + 1) · n / 2
Let's trace our steps:
Induction Hypothesis: S(n) is defined by the formula above.
Induction Base: In this step we have to prove that S(1) = 1:
S(1) = (1 + 1) · 1 / 2 = 2 / 2 = 1
Induction Step: In this step we need to prove that if the formula applies to S(n), it also applies to S(n + 1) as follows:
S(n + 1) = (n + 1 + 1) · (n + 1) / 2 = (n + 2) · (n + 1) / 2
This is known as an implication (a ⇒ b), which just means that we have to prove b is correct provided we know a is correct.
S(n + 1) = S(n) + (n + 1) = (n + 1) · n / 2 + (n + 1) = (n² + n + 2n + 2) / 2 = (n² + 3n + 2) / 2 = (n + 2) · (n + 1) / 2
Note that S(n + 1) = S(n) + (n + 1) just means we are recursively calculating the sum. Example with literals: S(3) = S(2) + 3 = S(1) + 2 + 3 = 1 + 2 + 3 = 6.
Steps for proving by induction
The proof consists of two steps:
1. The basis (base case): prove that the statement holds for the first natural number n. Usually, n = 0 or n = 1.
2. The inductive step: prove that, if the statement holds for some natural number n, then the statement holds for n + 1.
Correctness –Aim: Proving the correctness of algorithms
Loop Invariants
Mathematical Induction
Time Complexity – Aim: Determining the cost of recursive algorithms
Recursion reminder
Expressing recursive algorithms as recurrences
Applying the Master Theorem
Correctness of an Algorithm:
Simply, an algorithm is correct if for any valid input it produces the result required by the
algorithm‘s specification. For example, a sorting function: sort (int[ ] in)
We specify that for a valid integer array as input
The sort function will sort the input integer array into ascending numerical order
The result that is returned is a correctly sorted integer array, for any valid array of integers
Testing:
• Testing is a critical skill for a developer
• Provides no guarantees of correctness
– Depends on generation of good test-cases
– Generally these follow sensible heuristics
• Test correct values, e.g. sort ({1, 2, 3})
• Test incorrect values, e.g. sort ({'a', 'b', 'c'})
• Test boundaries, e.g. sort ({0, 1, ..., MAX_VALUE})
TIME COMPLEXITY
Progress and Running Time
Invariants not only tell us something about the progress of an algorithm: – e.g. For a sorting
algorithm
So far, all items are sorted up to some n [progress]
They can tell us about running time or cost – e.g. For a sorting algorithm
The worst case performance will be O(n2) [running time]
Complexity for iterative algorithms is mostly an exercise in counting operations, then stating the
complexity.
AMORTIZED ANALYSIS:
Amortized analysis is the analysis of a sequence of operations on a data structure. An amortized analysis is any strategy for analyzing a sequence of operations to show that the average cost per operation is small, even though a single operation within the sequence might be expensive.
Goal: Show that although some individual operations may be expensive, on average the cost per
operation is small.
Average in this context does not mean that we're averaging over a distribution of inputs.
No probability is involved.
An amortized analysis guarantees the average performance of each operation in the worst case.
Stack operations
PUSH(S, x) – Θ(1) – pushes object x onto the stack
POP(S) – Θ(1) – pops and returns the top object of S
MULTIPOP(S,k) – Θ(min{|S|,k}) – pops the top k items from the stack (or until empty) and
returns the last item:
while S is not empty and k > 0 do
x ← POP(S)
k ← k −1
return x
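A tiny Python sketch of the three operations with an actual-cost counter; it illustrates (but does not prove) the aggregate-analysis claim that n mixed operations cost O(n) in total. All names are illustrative.

class CountingStack:
    def __init__(self):
        self.items = []
        self.actual_cost = 0          # total actual cost of all operations so far

    def push(self, x):                # Θ(1)
        self.items.append(x)
        self.actual_cost += 1

    def pop(self):                    # Θ(1)
        self.actual_cost += 1
        return self.items.pop()

    def multipop(self, k):            # Θ(min(|S|, k))
        x = None
        while self.items and k > 0:
            x = self.items.pop()
            self.actual_cost += 1
            k -= 1
        return x

s = CountingStack()
for i in range(100):
    s.push(i)
s.multipop(60)
s.multipop(60)                        # pops only the remaining 40 items
print(s.actual_cost)                  # 200: about 2 per operation on average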
Suppose we wish to analyze the running time for a sequence of n PUSH, POP and MULTIPOP operations, starting with an empty stack. Considering individual operations without amortization, we would say that a MULTIPOP operation could take Θ(|S|) time and |S| could be as large as n - 1. So in the hypothetical worst case, a single operation could take Θ(n) time, and n such operations strung together would take Θ(n²) time.
However, a little more thought reveals that such a string of operations is not possible. While a single MULTIPOP could take Θ(n) time, it would have to be preceded by Θ(n) PUSH operations, which are cheap. Taken together, the Θ(n) PUSH operations and the one MULTIPOP operation take Θ(n)·Θ(1) + 1·Θ(n) = Θ(n) time; thus, each operation in the sequence takes Θ(1) time on average.
In general, if there occur r MULTIPOP operations with arguments k1,...,kr, then there must also
occur at least k1+···+kr PUSH operations, so that there are enough items in the stack to pop. (To
simplify notation, we assume that k is never chosen larger than |S|.) Thus, in a string of n operations,
the total cost of all non-O(1) operations is bounded above by O(n), so the total cost of all operations
is O(n)·O(1)+ O(n) = O(n). Thus, the amortized running time per operation is O(1).
Observation
Each object can be popped only once per time that it's pushed.
Average over the n operations ⇒ O(1) per operation on average. Again, notice no probability.
Therefore, O(1) per operation on average. This technique is called aggregate analysis.
The accounting method: each operation is charged an amortized cost, and any surplus is stored as credit in the data structure. The bank balance must not go negative! We must ensure that, for all n, the total amortized cost of the first n operations is at least their total actual cost.
As long as this condition is met, we know that the amortized cost provides an upper bound on the actual cost of any sequence of operations. The difference between the above sums is the total surplus or "credit" stored in the data structure and must at all times be nonnegative. In this way, the accounting model is like a debit account.
Example:
Perhaps you have bought pre-stamped envelopes at the post office before. In doing so, you pay up-front
for both the envelopes and the postage. Then, when it comes time to send a letter, no additional charge
is required. This accounting can be seen as an amortization of the cost of sending a letter:
Obviously, for any valid sequence of operations, the amortized cost is at least as high as the actual cost.
However, the amortized cost is easier to keep track of—it‘s one fewer item on your balance sheet.
For the stack the amortized costs as follows:
• Therefore, the total amortized cost, which is O(n), is an upper bound on the total actual cost.
The potential method: here the stored credit is expressed as a potential Φ of the data structure. This rule can be seen as the conservation of energy: it says that the surplus (or deficit) cost ĉi − ci caused by a given operation must be equal to the change in potential of the data structure caused by that operation.
An operation which increases the potential of the system is assigned a positive heuristic cost, whereas an operation which decreases the potential of the system is assigned a negative heuristic cost.
Summing the heuristic costs of all n operations, we find
Σ ĉi = Σ (ci + Φ(Di) − Φ(Di−1)) = Σ ci + Φ(Dn) − Φ(D0).
• Can release potential to pay for future operations.
• Most flexible of the amortized analysis methods.
Stack example:
We define the potential of a stack S to be Φ(S) = |S|, the number of elements in the stack. An
empty stack has zero potential, and clearly Φ is always nonnegative, so Φ is an admissible potential
function. Then the heuristic costs of the stack operations are as follows:
A PUSH operation increases the size of S by 1, and has actual cost cPUSH = 1. Thus, the amortized
cost of a PUSH operation is 2.
A POP operation decreases the size of S by 1, and has actual cost cPOP = 1. Thus, the amortized cost
of a POP operation is 0.
The operation MULTIPOP(S,k) decreases the size of S by min{|S|,k}, and has actual cost
cMULTIPOP = min{|S|,k}. Thus, the amortized cost of a MULTIPOP operation is 0.
Thus, the amortized costs for this application of the potential method are the same as those we came up
with using the accounting method
Operation Actual cost ci Amortized cost ĉi
PUSH 1 2
POP 1 0
MULTIPOP min{|S|,k} 0
UNIT-2
S.No  Topics
1  Matroids
2  Introduction to Greedy Paradigm
3  Algorithm to Compute a Maximum Weight Maximal Independent Set
4  Application to MST
5  Graph Matching
6  Algorithm to Compute Maximum Matching
7  Characterization of Maximum Matching by Augmenting Paths
8  Edmond's Blossom Algorithm to Compute Augmenting Path
MATROIDS:
A matroid is a structure that abstracts and generalizes the notion of linear independence in vector spaces. A matroid is a pair ⟨X, I⟩ where X is called the ground set and I is the set of all independent subsets of X. In other words, the matroid ⟨X, I⟩ gives a classification for each subset of X as either independent or dependent (included in I or not included in I). Of course, we are not speaking about arbitrary classifications. These 3 properties must hold for any matroid:
1. The empty set is independent.
2. Any subset of an independent set is independent.
3. If independent set A has smaller size than independent set B, there exists at least one element in B that can be added to A without loss of independence.
These are the axiomatic properties of a matroid. To prove that we are dealing with a matroid, we generally have to prove these three properties. For example, consider the explicit presentation of a matroid on the ground set {x, y, z} which considers {y, z} and {x, y, z} to be dependent (marked red); all other sets are independent and marked green in the diagram below.
Examples
There are matroids of different types. There are some examples:
Uniform matroid. A matroid that considers a subset S independent if the size of S is not greater than some constant k (|S| ≤ k). The simplest one; this matroid does not really distinguish elements of the ground set in any form, it only cares about the number of taken elements. All subsets of size k are bases for this matroid, and all subsets of size (k + 1) are circuits for this matroid. We can also define some specific cases of the uniform matroid.
Trivial matroid. k = 0. Only the empty set is considered independent; any element of the ground set is considered dependent (any combination of ground set elements is also considered dependent as a consequence).
Complete matroid. k = |X|. All subsets are considered independent, including the complete ground set itself.
Linear (algebra) matroid. The ground set consists of vectors of some vector space. A set of vectors is considered independent if it is linearly independent (no vector can be expressed as a linear combination of the other vectors from that set). This is the matroid from which the whole matroid theory originates. Linear bases of the vector set are bases of the matroid. Any circuit of this matroid is a set of vectors where each vector can be expressed as a combination of all the other vectors, and this combination involves all the other vectors in the circuit.
Colorful matroid. The ground set consists of colored elements. Each element has exactly one color. A set of elements is independent if no pair of included elements share a color. The rank of a set is the number of different colors included in the set. Bases of this matroid are sets that have exactly one element of each color. Circuits of this matroid are all possible pairs of elements of the same color.
Graphic matroid. This matroid is defined on the edges of some undirected graph. A set of edges is independent if it does not contain a cycle. This type of matroid is the best one for showing visual examples, because it can include dependent subsets of a large size and can still be represented in a picture. If the graph is connected then any basis of this matroid is just a spanning tree of the graph. If the graph is not connected then a basis is a forest of spanning trees that includes one spanning tree for each connected component. Circuits are the simple cycles of the graph. An independence oracle for this matroid type can be implemented with DFS or BFS (start from each vertex in the graph and check that no edge connects a vertex with an already visited one) or DSU (keep connected components, start with disjoint vertices, join by all edges and ensure that each edge connects different components upon addition). Here is an example of the circuit combination property in the graphic matroid:
Truncated matroid. We can limit the rank of any matroid by some number k without breaking the matroid properties. For example, a basis of the truncated colorful matroid is a set of elements that includes no more than k different colors, all colors being unique. A basis of the truncated graphic matroid is an acyclic set of edges that leaves at least (n − k) connected components in the graph (where n is the number of vertices in the graph). This is possible because the third matroid property does not only refer to bases of the matroid but to any independent set in the matroid; when all independent sets with sizes greater than k are simultaneously removed, independent sets of size k become the new bases, and any smaller independent set can still find elements from each basis that can be added.
Matroid on a subset of the ground set. We can limit the ground set of a matroid to a subset without breaking the matroid properties. This is possible because the rules of dependence do not rely on specific elements being in the ground set. If we remove an edge from a graph, we will still have a valid graph. If we remove an element from a set (of vectors or colored elements) we will still get a valid set of elements of the same type, and the rules are preserved. Now, we can also define the rank of a subset in a matroid as the rank of the matroid on the ground set limited to this subset.
Expanded matroid. Direct matroid sum. We can consider two matroids as one big matroid without any difficulties if elements of the ground set of the first matroid neither affect the independence of, nor intersect with, elements of the ground set of the second matroid, and vice versa. If we consider two graphic matroids on two connected graphs, we can unite their graphs together, resulting in a graph with two connected components; it is clear that including some edges in one component has no effect on the other component. This is called the direct matroid sum. Formally,
M1 = ⟨X1, I1⟩, M2 = ⟨X2, I2⟩,
M1 + M2 = ⟨X1 ∪ X2, I1 × I2⟩,
where × means the Cartesian product of two sets. We can unite as many matroids of as many different types as we like, without restrictions.
Greedy Algorithm:
In a greedy algorithm, a set of resources is recursively divided based on the maximum, immediate availability of that resource at any given stage of execution. To solve a problem based on the greedy approach, there are two stages:
1. Scanning the list of items
2. Optimization.
These stages are covered in parallel in the course of the division of the array.
To understand the greedy approach, we need to have a working knowledge of recursion and context
switching. This helps us to understand how to trace the code.
Two conditions define the greedy paradigm
Each stepwise solution must structure a problem towards its best-accepted solution.
It is sufficient if the structuring of the problem can halt in a finite number of greedy steps.
The greedy paradigm was registered as a different type of optimization strategy in the NIST records
in 2005.
Till date, protocols that run the web, such as the open-shortest-path-first (OSPF) and many other
network packet switching protocols use the greedy strategy to minimize time spent on a network.
In the greedy scan shown here as a tree (higher value, higher greed), an algorithm state at value 40 is likely to take 29 as the next value, and its quest then ends at 12; this amounts to a value of 41 below the root. However, if the algorithm took a sub-optimal-looking path or adopted a conquering strategy, then 25 would be followed by 40, and the overall value would be 65, which is 24 points higher than the greedy decision.
ALGORITHM TO COMPUTE A MAXIMUM WEIGHT MAXIMAL
INDEPENDENT SET:
The Greedy Approach and Divide and conquer algorithms are used to compute a maximum
weight maximal independent set.
Greedy Algorithm: The maximum (weighted) independent set (MIS(MWIS)) is one of the most
important optimization problems. In several heuristic methods for optimization problems,
the greedy strategy is the most natural and simplest one. For MIS, two simple greedy algorithms have
been investigated. One is called GMIN, which selects a vertex of minimum degree, removes it and its
neighbors from the graph and iterates this process on the remaining graph until no vertex remains. (the
set of selected vertices is an independent set). The other is called GMAX, which deletes a vertex of
maximum degree until no edge remains (the set of remaining vertices is an independent set).
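The following is a small Python sketch of the GMIN strategy just described: repeatedly pick a vertex of minimum degree, add it to the independent set, and delete it together with its neighbours. The adjacency-dictionary input format and the function name greedy_min_degree_mis are our own choices, not part of the original text.

# GMIN: repeatedly select a minimum-degree vertex, keep it, and remove it and its neighbours.
def greedy_min_degree_mis(adj):
    # adj: dict mapping each vertex to the set of its neighbours
    graph = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    independent = []
    while graph:
        v = min(graph, key=lambda u: len(graph[u]))      # vertex of minimum degree
        independent.append(v)
        removed = {v} | graph[v]                          # v together with its neighbours
        for u in removed:
            graph.pop(u, None)
        for u in graph:                                   # drop edges pointing to removed vertices
            graph[u] -= removed
    return independent

# Example: the path a-b-c-d; GMIN returns an independent set such as ['a', 'c'].
print(greedy_min_degree_mis({'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}))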
Divide/Break: This step involves breaking the problem into smaller sub-problems. Sub-problems
should represent a part of the original problem. This step generally takes a recursive approach to divide
the problem until no sub-problem is further divisible. At this stage, sub-problems become atomic in
nature but still represent some part of the actual problem.
Conquer/Solve: This step receives a lot of smaller sub-problems to be solved. Generally, at this level,
the problems are considered 'solved' on their own.
Merge/Combine: When the smaller sub-problems are solved, this stage recursively combines them
until they formulate a solution of the original problem. This algorithmic approach works recursively; the conquer and merge steps work so closely together that they appear as one. The diagram below indicates this:
Page54
APPLICATION TO MST:
Tree: A tree is a graph with the following properties:
1. The graph is connected (can go from anywhere to anywhere)
2. There are no cycles (it is acyclic)
Spanning Tree:
Given a connected undirected graph, a spanning tree of that graph is a sub-graph that is a tree and includes all the vertices. A single graph can have many spanning trees.
For Example:
Page55
A spanning tree does not contain cycles.
A spanning tree has (n-1) edges, where n is the number of vertices.
Addition of even one single edge results in the spanning tree losing its property of acyclicity, and elimination of one single edge results in it losing the property of connectivity.
Page56
Methods of Minimum Spanning Tree
There are two methods to find Minimum Spanning Tree
1. Kruskal's Algorithm
2. Prim's Algorithm
Kruskal's Algorithm:
This is an algorithm to construct a Minimum Spanning Tree for a connected weighted graph. It is a Greedy Algorithm. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component).
Analysis:
Where E is the number of edges in the graph and V is the number of vertices, Kruskal's Algorithm
can be shown to run in O(E log E) time, or simply, O(E log V) time, all with simple data structures.
These running times are equivalent because:
o E is at most V², and log V² = 2 × log V, which is O(log V).
o If we ignore isolated vertices, V ≤ 2E, so log V is O(log E).
Thus the total time is O(E log E) = O(E log V).
For Example: Find the Minimum Spanning Tree of the following graph using Kruskal's algorithm.
Solution: First we initialize the set A to the empty set and create |v| trees, one containing each vertex
with MAKE-SET procedure. Then sort the edges in E into order by non-decreasing weight.
There are 9 vertices and 12 edges, so the MST formed will have (9-1) = 8 edges.
Now, check for each edge (u, v) whether the endpoints u and v belong to the same tree. If they do, then the edge (u, v) cannot be added (it would create a cycle). Otherwise, the two vertices belong to different trees, the edge (u, v) is added to A, and the vertices in the two trees are merged by the UNION procedure.
Page58
Steps to find Minimum Spanning Tree using Kruskal's algorithm
Step 1: First take the (h, g) edge.
Step 2: Then the (g, f) edge.
Step 3: Then the (a, b) and (i, g) edges are considered, and the forest grows.
Step 4: Now, edge (h, i). Both h and i vertices are in the same set, so it creates a cycle and this edge is discarded. Then edges (c, d), (b, c), (a, h), (d, e), (e, f) are considered, and the forest grows further.
Step 5: In the (e, f) edge both endpoints e and f exist in the same tree, so this edge is discarded. Then the (b, h) edge; it also creates a cycle.
Step 6: After that, edge (d, f), and the final spanning tree is shown in dark lines.
Step 7: This gives the required Minimum Spanning Tree because it contains all the 9 vertices and (9 - 1) = 8 edges.
e → f, b → h, d → f [cycle will be formed]
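A compact Python sketch of Kruskal's algorithm corresponding to the steps above, using a simple union-find structure. The edge-list input format and the identifiers are our own choices.

def kruskal(n, edges):
    # n: number of vertices labelled 0..n-1; edges: list of (weight, u, v)
    parent = list(range(n))

    def find(x):                       # find the set representative, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # consider edges in non-decreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # endpoints lie in different trees, so the edge is safe
            parent[ru] = rv            # union the two trees
            mst.append((u, v, w))
            total += w
    return mst, total

# Example: 4 vertices and 5 weighted edges
print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))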
Page59
Prim's Algorithm
It is a greedy algorithm. It starts with an empty spanning tree. The idea is to maintain two sets of
vertices:
o Contain vertices already included in MST.
o Contain vertices not yet included.
At every step, it considers all the edges that connect the two sets and picks the minimum weight edge among them. After picking the edge, it moves the other endpoint of the edge to the set containing the MST.
Example: Generate minimum cost spanning tree for the following graph using Prim's algorithm.
Page60
Solution: In Prim's algorithm, first we initialize the priority queue Q to contain all the vertices, and set the key of each vertex to ∞ except for the root, whose key is set to 0. Suppose vertex 0 is the root, i.e., r. By the EXTRACT-MIN (Q) procedure, now u = r and Adj [u] = {5, 1}. Remove u from set Q and add it to the set V - Q of vertices in the tree. Now, update the key and π fields of every vertex v adjacent to u but not in the tree.
Vertex      0    1    2    3    4    5    6
Key Value   0    ∞    ∞    ∞    ∞    ∞    ∞
Parent      NIL  NIL  NIL  NIL  NIL  NIL  NIL
Page61
w (5, 4) < key [4], so update the key value and parent of 4.
Vertex      0    1    2    3    4    5    6
Key Value   0    28   ∞    ∞    25   10   ∞
Parent      NIL  0    NIL  NIL  5    0    NIL
π[3]= 4 π[6]= 4
Vertex      0    1    2    3    4    5    6
Key Value   0    28   ∞    22   25   10   ∞
Parent      NIL  0    NIL  4    5    0    NIL
Page62
Adj [3] = {4, 6, 2}
4 is already in the tree (4 ∉ Q), so it is skipped.
key [2] = ∞, key [6] = 24
w (3, 2) = 12, w (3, 6) = 18
w (3, 2) < key [2] and w (3, 6) < key [6], so key [6] = 24 now becomes key [6] = 18.
Now in Q, key [2] = 12, key [6] = 18, key [1] = 28, and the parent of 2 and 6 is 3.
π [2] = 3, π [6] = 3
Vertex      0    1    2    3    4    5    6
Key Value   0    28   12   22   25   10   18
Parent      NIL  0    3    4    5    0    3
u = EXTRACT_MIN (Q)
u = 2 [since key [2] = 12 < key [6] = 18]
Now we process vertex 2.
Adj [2] = {3, 1}
3 is already in a heap
Taking 1, key [1] = 28
w (2,1) = 16
w (2,1) < key [1]
π[1]= 2
Page63
Vertex      0    1    2    3    4    5    6
Key Value   0    16   12   22   25   10   18
Parent      NIL  2    3    4    5    0    3
Now all the vertices have been spanned. Using the above table we get the Minimum Spanning Tree.
0→5→4→3→2→1→6
[Because Π [5] = 0, Π [4] = 5, Π [3] = 4, Π [2] = 3, Π [1] =2, Π [6] =1]
Thus the final spanning Tree is
Total Cost = 10 + 25 + 22 + 12 + 16 + 14 = 99
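The table-driven procedure above can be written compactly in Python using a binary heap; this is only a sketch, with an adjacency-list input format and identifiers of our own choosing.

import heapq

def prim(adj, root=0):
    # adj: dict vertex -> list of (neighbour, weight)
    in_mst = {root}
    edges = [(w, root, v) for v, w in adj[root]]
    heapq.heapify(edges)
    mst, total = [], 0
    while edges and len(in_mst) < len(adj):
        w, u, v = heapq.heappop(edges)        # cheapest edge leaving the current tree
        if v in in_mst:
            continue
        in_mst.add(v)
        mst.append((u, v, w))
        total += w
        for x, wx in adj[v]:                   # add new candidate edges
            if x not in in_mst:
                heapq.heappush(edges, (wx, v, x))
    return mst, total

# Example: a small 4-vertex graph
adj = {0: [(1, 10), (3, 5)], 1: [(0, 10), (2, 1)], 2: [(1, 1), (3, 4)], 3: [(0, 5), (2, 4)]}
print(prim(adj))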
Page64
GRAPH MATCHING:
Graph matching is the problem of finding a similarity between graphs. Graphs are commonly used to
encode structural information in many fields, including computer vision and pattern recognition, and
graph matching is an important tool in these areas. In these areas it is commonly assumed that the
comparison is between the data graph and the model graph.
The case of exact graph matching is known as the graph isomorphism problem. The problem of exact
matching of a graph to a part of another graph is called subgraph isomorphism problem.
The inexact graph matching refers to matching problems when exact matching is impossible, e.g.,
when the numbers of vertices in the two graphs are different. In this case it is required to find the best
possible match. For example, in image recognition applications, the results of image
segmentation in image processing typically produces data graphs with the numbers of vertices much
larger than in the model graphs data expected to match against. In the case of attributed graphs, even if
the numbers of vertices and edges are the same, the matching still may be only inexact.
Two categories of search methods are the ones based on identification of possible and impossible
pairings of vertices between the two graphs and methods which formulate graph matching as
an optimization problem. Graph edit distance is one of similarity measures suggested for graph
matching. The class of algorithms is called error-tolerant graph matching.
Definition
A matching graph is a sub-graph of a graph where there are no edges adjacent to each other. Simply,
there should not be any common vertex between any two edges.
Matching
Let G = (V, E) be a graph. A subgraph is called a matching M(G) if each vertex of G is incident with at most one edge in M, i.e., deg(V) ≤ 1 ∀ V ∈ G. This means that in the matching graph M(G) every vertex has a degree of 1 or 0, where the edges are taken from the graph G.
Page65
Notation − M(G). Example:
In a matching,
if deg(V) = 1, then V is said to be matched;
if deg(V) = 0, then V is not matched.
In a matching, no two edges are adjacent. It is because if any two edges are adjacent, then the degree of
the vertex which is joining those two edges will have a degree of 2 which violates the matching rule.
Maximal Matching
A matching M of graph ‗G‘ is said to be maximal if no other edges of ‘G’ can be added to M.
Example
M1, M2, M3 from the above graph are the maximal matching of G.
Maximum Matching
It is also known as largest maximal matching. Maximum matching is defined as the maximal matching
with maximum number of edges.
The number of edges in the maximum matching of ‗G‘ is called its matching number.
Page66
For a graph given in the above example, M1 and M2 are the maximum matching of ‗G‘ and its
matching number is 2. Hence by using the graph G, we can form only the sub-graphs with only 2
edges maximum. Hence we have the matching number as two.
Perfect Matching
A matching M of graph G is said to be a perfect matching if every vertex of the graph G is incident to exactly one edge of the matching M, i.e., deg(V) = 1 ∀ V.
The degree of each and every vertex in the subgraph should have a degree of 1.
Note − Every perfect matching of graph is also a maximum matching of graph, because there is no
chance of adding one more edge in a perfect matching graph.
A maximum matching of a graph need not be perfect. If a graph G has a perfect matching, then the number of vertices |V(G)| is even. If it is odd, then after pairing up vertices there finally remains a single vertex which cannot be paired with any other vertex, and its degree stays zero. This clearly violates the perfect matching principle.
Example
Page67
Note − The converse of the above statement need not be true. If G has even number of vertices, then
M1 need not be perfect.
Example
It is matching, but it is not a perfect match, even though it has even number of vertices.
A matching in a Bipartite Graph is a set of the edges chosen in such a way that no two edges
share an endpoint. A maximum matching is a matching of maximum size (maximum number of
edges). In a maximum matching, if any edge is added to it, it is no longer a matching. There can be
more than one maximum matching for a given Bipartite Graph.
Hopcroft Karp algorithm is an improvement that runs in O(√V x E) time. Let us define few
terms before we discuss the algorithm.
Free Node or Vertex: Given a matching M, a node that is not part of the matching is called a free node. Initially all vertices are free (as in the first graph of the below diagram). In the second graph, u2 and v2 are free. In the third graph, no vertex is free.
Matching and Not-Matching edges: Given a matching M, edges that are part of matching are called
Matching edges and edges that are not part of M (or connect free nodes) are called Not-Matching edges.
In first graph, all edges are non-matching. In second graph, (u0, v1), (u1, v0) and (u3, v3) are matching
and others not-matching.
Page68
Alternating Paths: Given a matching M, an alternating path is a path in which the edges belong
alternatively to the matching and not matching. All single edges paths are alternating paths. Examples
of alternating paths in middle graph are u0-v1-u2 and u2-v1-u0-v2.
Augmenting path: Given a matching M, an augmenting path is an alternating path that starts from and
ends on free vertices. All single edge paths that start and end with free vertices are augmenting paths. In
below diagram, augmenting paths are highlighted with blue color. Note that the augmenting path
always has one extra matching edge.
The Hopcroft Karp algorithm is based on below concept.
A matching M is not maximum if there exists an augmenting path. The converse is also true, i.e., a matching is maximum if no augmenting path exists. So the idea is to look for augmenting paths one by one and add each found path to the current matching.
In the initial graph all single edges are augmenting paths and we can pick in any order. In the middle
stage, there is only one augmenting path. We remove matching edges of this path from M and add not-
matching edges. In final matching, there are no augmenting paths so the matching is maximum.
Ford-Fulkerson algorithm
The Ford–Fulkerson method or Ford–Fulkerson algorithm (FFA) is a greedy algorithm that computes
the maximum flow in a flow network. It is sometimes called a "method" instead of an "algorithm" as
the approach to finding augmenting paths in a residual graph is not fully specified or it is specified in
several implementations with different running times.
Time Complexity: Time complexity of the above algorithm is O(max_flow * E). We run a loop while
there is an augmenting path. In worst case, we may add 1 unit flow in every iteration. Therefore the
time complexity becomes O(max_flow * E).
Page70
Given a graph which represents a flow network where every edge has a capacity. Also given two
vertices source ‗s‘ and sink ‗t‘ in the graph, find the maximum possible flow from s to t with following
constraints:
a) Flow on an edge doesn‘t exceed the given capacity of the edge.
b) Incoming flow is equal to outgoing flow for every vertex except s and t.
For example, consider the following graph.
Why do we care?
There are many real world problems that can be formed as Bipartite Matching.
For example, consider the following problem:
There are M job applicants and N jobs. Each applicant has a subset of jobs that he/she is interested in.
Each job opening can only accept one applicant and a job applicant can be appointed for only one job.
Find an assignment of jobs to applicants in such a way that as many applicants as possible get jobs.
Page71
Example: Ford-Fulkerson Algorithm for Maximum Flow Problem
Page72
How to implement the above approach?
Let us first define input and output forms. Input is in the form of Edmonds matrix which is a 2D
array ‗bpGraph[M][N]‘ with M rows (for M job applicants) and N columns (for N jobs). The value
bpGraph[i][j] is 1 if i‘th applicant is interested in j‘th job, otherwise 0.
Output is the maximum number of people that can get jobs.
A simple way to implement this is to create a matrix that represents adjacency matrix
representation of a directed graph with M+N+2 vertices. Call the fordFulkerson() for the matrix. This
implementation requires O((M+N)*(M+N)) extra space.
Extra space can be reduced and the code can be simplified using the fact that the graph is bipartite and the capacity of every edge is either 0 or 1. The idea is to use DFS traversal to find a job for an applicant (similar to an augmenting path in Ford-Fulkerson). We call bpm() for every applicant; bpm() is the DFS-based function that tries all possibilities to assign a job to the applicant.
In bpm(), we one by one try all jobs that an applicant ‗u‘ is interested in until we find a job, or all
jobs are tried without luck. For every job we try, we do following.
If a job is not assigned to anybody,
We simply assign it to the applicant and return true. If a job is assigned to somebody else say x, then
we recursively check whether x can be assigned some other job. To make sure that x doesn‘t get the
same job again, we mark the job ‗v‘ as seen before we make recursive call for x. If x can get other
job, we change the applicant for job ‗v‘ and return true. We use an array maxR[0..N-1] that stores the
applicants assigned to different jobs.
If bpm() returns true, then it means that there is an augmenting path in the flow network and 1 unit of flow is added to the result in maxBPM().
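A Python sketch of the DFS-based approach just described. The Edmonds-matrix convention for bpGraph follows the text; matchR plays the role of the maxR array, and the concrete identifiers are our own.

def bpm(bpGraph, u, seen, matchR):
    # Try to find a job for applicant u, possibly re-assigning other applicants.
    for v in range(len(bpGraph[0])):
        if bpGraph[u][v] and not seen[v]:
            seen[v] = True                      # mark job v so it is not offered back to its holder
            # job v is free, or its current holder can be moved to another job
            if matchR[v] == -1 or bpm(bpGraph, matchR[v], seen, matchR):
                matchR[v] = u
                return True
    return False

def maxBPM(bpGraph):
    matchR = [-1] * len(bpGraph[0])             # matchR[v] = applicant currently assigned to job v
    result = 0
    for u in range(len(bpGraph)):
        seen = [False] * len(bpGraph[0])
        if bpm(bpGraph, u, seen, matchR):
            result += 1                          # one more augmenting path / one more match
    return result

# Example: 3 applicants and 3 jobs
print(maxBPM([[1, 1, 0], [0, 1, 0], [0, 0, 1]]))   # prints 3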
Each Directed Edge is labeled with capacity. Use the Ford-Fulkerson algorithm to find the maximum
flow.
Page73
The left side of each part shows the residual network Gf with a shaded augmenting path p,and the right
side of each part shows the net flow f.
Page74
CHARACTERIZATION OF MAXIMUM MATCHING BY
AUGMENTING PATHS:
Augmenting Paths for Matchings:
Definitions:
o Given a matching M in a graph G, a vertex that is not incident to any edge of M is called a free vertex w.r.t. M.
o For a matching M, a path P in G is called an alternating path if edges in M alternate with edges not in M.
o An alternating path is called an augmenting path for matching M if it starts and ends at distinct free vertices.
Suppose there is a matching M′ with larger cardinality. Consider the graph H with edge-set M′ ⊕ M (i.e., only edges that are in either M or M′ but not in both).
Each vertex can be incident to at most two edges of H (one from M and one from M′). Hence, the connected components are alternating cycles or alternating paths.
As |M′| > |M| there is one connected component that is a path P for which both endpoints are incident to edges from M′. P is an augmenting path for M.
Algorithmic idea:
As long as we find an augmenting path augment our matching using this path. When we arrive at a
matching for which no augmenting path exists we have a maximum matching.
Page75
Introduction
The blossom algorithm, created by Jack Edmonds in 1961, was the first polynomial time algorithm that could produce a maximum matching on any graph. Previous to the discovery of this algorithm, there were no fast algorithms that were able to find a maximum matching on a graph with odd length cycles. The Hungarian Algorithm came the closest, but it could only produce a maximum matching on a graph under the condition that there were no odd length cycles. A matching is a subgraph in which every vertex is connected to at most one other vertex through a single edge, and therefore the edges of the matching do not share any common vertices. A maximum matching occurs when the number of edges in the matching graph is maximized.
An example of an augmenting path where matched vertices are orange and unmatched vertices are
black. The edges that make up the matching are blue.
Page76
Finding an augmenting path happens as follows. Starting with a free (not part of the matching) vertex
V, explore all of its neighbors. Either V will discover another free vertex (and thus find an augmenting
path) or it will find a vertex W that is part of the matching. If the latter occurs, mark the edge from V to
W and then extend the path to W's mate X. This will always produce an alternating path starting from
the original vertex to some other matched vertex in the graph. Finally, recurse with vertex X. This process will continue as long as there is a vertex X that can be recursed on.
Page77
The search for an augmenting path has already occurred as described in steps 1-5 for finding an augmenting path. The starting point for this potential augmenting path is vertex A.
1. The yellow edge connects two nodes that could be part of an alternating path back towards A.
So a blossom is present.
2. All nodes that are part of the cycle are then contracted into the supernode (red).
3. The supernode is then enqueued into Q.
After the nodes are contracted, the search for an augmenting path continues as if the supernode were a single node. Once the search for an augmenting path is completed (regardless of whether one was found or not), the supernode is then lifted. This lifting means that the supernode once again becomes all of the original nodes. If an augmenting path was found, then there is an augmenting path from the original free node to another free node with some vertices of the odd length cycle in the path. This augmenting path is then inverted (just like before) in order to increase the size of the matching by one.
The search for an augmenting path is continued from supernode V as previously described, and it
is found that the augmenting path passes through the supernode.
The Algorithm
Intuitively the algorithm works by repeatedly trying to find augmenting paths (specifically paths
starting and ending with free vertices) in the graph until no more augmenting paths can be found. This
occurs when either all vertices are matched with a mate or when a free vertex is only connected to
matched vertices which recursively only match to other matched vertices.
(Left: all vertices matched, a maximum matching. Right: not all vertices matched, yet still a maximum matching.)
Furthermore, the process of finding augmented paths can be slightly modified so that it
simultaneously can check to see if there is a blossom in the graph. This can be accomplished by labeling
each vertex in an alternating manner O (for odd) and E (for even) depending on how far away a vertex
is from the root. To start, label the original free vertex E, and then alternate labeling from there. The algorithm that finds augmenting paths will always search for neighboring vertices from a vertex labeled E
(the matched vertex‘s mate always is two edges further away from the root) and thus if an adjacent
vertex also has the label E, there is a blossom that contains both of those vertices. This blossom is then
contracted into one supernode with a label E and then the search for augmenting paths continues.
Altogether, this algorithm starts by finding augmenting paths within any graph while labeling vertices
E or O and then inverting these paths to increase the size of the matching by one. If a blossom is found,
it is contracted into a single node and the augmenting path algorithm continues as normal. This repeats
until all vertices of the graph have been either searched or are already part of the matching set.
Blossoms can be nested within each other, and therefore, supernodes may end up containing
other supernodes.
Time Complexity
Overall, this algorithm takes O(E · V²) time. Every attempt to find an augmenting path takes O(E) time in order to iterate over all of the edges. Assuming that an attempt to find an augmenting path succeeds, it will take O(V) time to flip all of the incident edges. Finally, there are O(V) blossoms in any graph and for each blossom the smaller graph with the supernode may need to be recursed on. This means that the total time is O(E · V²), or simply O(V⁴) for a dense graph.
Page79
UNIT-3
S.No TOPICS PAGE NO
1 Flow-Networks 81
2 Maxflow-mincut theorem 82
3 Ford-Fulkerson Method to Compute Maximum Flow 85
4 Edmond-Karp Maximum-Flow Algorithm 88
5 Matrix Computations 92
6 Strassen‘s Algorithm 93
7 Introduction to Divide and Conquer Paradigm 94
8 Inverse of a Triangular Matrix 97
9 Relation between the Time Complexities of Basic Matrix Operations 100
10 LUP-Decomposition 102
Page80
FLOW-NETWORKS:
A network is a directed graph G with vertices V and edges E combined with a function c, which assigns each edge e a non-negative integer value, the capacity of e. Such a network is called a flow network if we additionally label two vertices, one as source and one as sink.
A flow in a flow network is a function f that again assigns each edge e a non-negative integer value, namely the flow. The function has to fulfill the following two conditions:
The flow of an edge cannot exceed the capacity:
f(e) ≤ c(e)
And the sum of the incoming flow of a vertex u has to be equal to the sum of the outgoing flow of u, except in the source and sink vertices.
The value of a flow of a network is the sum of all flows that get produced in the source s, or equivalently of the flows that are consumed in the sink t. A maximal flow is a flow with the maximal possible value. Finding this maximal flow of a flow network is the problem that we want to solve.
In the visualization with water pipes, the problem can be formulated in the following way: how much
water can we push through the pipes from the source to the sink. The following image shows the
maximal flow in the flow network.
In graph theory, a flow network is defined as a directed graph involving a source(S) and a sink(T) and
several other nodes connected with edges. Each edge has an individual capacity which is the maximum
limit of flow that edge could allow.
Flow in the network should follow the following conditions:
For any non-source and non-sink node, the input flow is equal to output flow.
For any edge (Ei) in the network, 0≤flow (Ei) ≤Capacity (Ei).
Total flow out of the source node = total flow in to the sink node.
Net flow in the edges follows skew symmetry i.e. F(u,v)=−F(v,u) where F(u,v) is flow from
node u to node v. This leads to a conclusion where you have to sum up all the flows between
two nodes (either directions) to find net flow between the nodes initially.
Maximum Flow:
It is defined as the maximum amount of flow that the network would allow to flow from source to sink.
Multiple algorithms exist in solving the maximum flow problem. Two major algorithms to solve these
kind of problems are Ford-Fulkerson algorithm and Dinic's Algorithm.
MAXFLOW-MINCUT THEOREM:
In computer science and optimization theory, the max-flow min-cut theorem states that in a flow
network, the maximum amount of flow passing from the source to the sink is equal to the total weight
of the edges in the minimum cut, i.e the smallest total weight of the edges which if removed would
disconnect the source from the sink.
In a flow network, an s-t cut is a cut that requires the source ‗s‘ and the sink ‗t‘ to be in different subsets
and it consists of edges going from the source‘s side to the sink‘s side. The capacity of an s-t cut is
defined by the sum of the capacity of each edge in the cut-set. The problem discussed here is to find
minimum capacity s-t cut of the given network. Expected output is all edges of the minimum cut.
For example, in the following flow network, example s-t cuts are {{0, 1}, {0, 2}}, {{0, 2}, {1, 2}, {1, 3}}, etc. The minimum s-t cut is {{1, 3}, {4, 3}, {4, 5}}, which has capacity 12 + 7 + 4 = 23.
Page82
Minimum Cut and Maximum Flow
Maximum Bipartite Matching is another problem which can be solved using the Ford-Fulkerson Algorithm. This is based on the max-flow min-cut theorem, which states that in a flow network the amount of maximum flow is equal to the capacity of the minimum cut. From Ford-Fulkerson, we get the capacity of the minimum cut. How do we print all edges that form the minimum cut? The idea is to use the residual graph.
Following are steps to print all edges of the minimum cut.
1. Run Ford-Fulkerson algorithm and consider the final residual graph.
2. Find the set of vertices that are reachable from the source in the residual graph.
3. All edges which are from a reachable vertex to non-reachable vertex are minimum cut edges.
Print all such edges.
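A small Python sketch of these three steps, assuming graph is the original capacity matrix and rGraph is the residual capacity matrix left over after running Ford-Fulkerson (both plain 2-D lists; the function name is ours).

from collections import deque

def min_cut_edges(graph, rGraph, s):
    # Step 2: find all vertices reachable from s in the residual graph.
    n = len(graph)
    visited = [False] * n
    visited[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n):
            if rGraph[u][v] > 0 and not visited[v]:
                visited[v] = True
                q.append(v)
    # Step 3: original edges going from a reachable vertex to a non-reachable vertex.
    return [(u, v) for u in range(n) for v in range(n)
            if visited[u] and not visited[v] and graph[u][v] > 0]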
The final residual graph is used to find the minimum cut. First, you find all nodes reachable from the
source in the residual graph. This is one set of nodes, which we color purple below:
Now, in the original graph, we divide our nodes into two sets: the set determined above, and all of the
remaining nodes. They are drawn purple and yellow in the original graph below:
Page83
The minimum cut is composed of all the edges that go from the source set to the sink set. These are
edges AD, CD and EG, which I've drawn in red above. The sum of their capacities equals the maximum
flow of six.
Instead of using a depth-first search, you can use a modification of Dijkstra's algorithm to find the path through the residual that has the maximum flow. When processing a node, Dijkstra's algorithm traverses the node's edges, and if shorter paths to any other nodes are discovered, those nodes are updated. At each step, the node with the shortest known path is processed next.
The modification works as follows. When processing a node, again the algorithm traverses the node's
edges, and if paths with more flow to any other nodes are discovered, then the nodes are updated. At
each step, the node with the maximum flow is processed next.
Flow Residual
Page84
The next maximum flow path is A->B->C->D->F->G with a flow of 2:
(The corresponding flow and residual graphs are shown in the figures.)
We can initialize the residual graph as original graph as there is no initial flow and initially residual
capacity is equal to original capacity. To find an augmenting path, we can either do a BFS or DFS of
the residual graph. We have used BFS in below implementation.
Using BFS, we can find out if there is a path from source to sink. BFS also builds parent [ ] array.
Using the parent [ ] array, we traverse through the found path and find possible flow through this
path by finding minimum residual capacity along the path. We later add the found path flow to
overall flow.
The important thing is, we need to update residual capacities in the residual graph. We subtract path
flow from all edges along the path and we add path flow along the reverse edges. We need to add
path flow along reverse edges because we may later need to send flow in the reverse direction.
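One possible Python version of this BFS-based implementation (adjacency-matrix input; a sketch with identifiers of our own choosing):

from collections import deque

def bfs(rGraph, s, t, parent):
    # Return True if there is an augmenting path from s to t in the residual graph.
    visited = [False] * len(rGraph)
    visited[s] = True
    q = deque([s])
    while q:
        u = q.popleft()
        for v, cap in enumerate(rGraph[u]):
            if cap > 0 and not visited[v]:
                visited[v] = True
                parent[v] = u
                q.append(v)
    return visited[t]

def ford_fulkerson(graph, s, t):
    rGraph = [row[:] for row in graph]           # residual capacities start as the original capacities
    parent = [-1] * len(graph)
    max_flow = 0
    while bfs(rGraph, s, t, parent):
        # Find the bottleneck (minimum residual capacity) along the path found by BFS.
        path_flow, v = float('inf'), t
        while v != s:
            u = parent[v]
            path_flow = min(path_flow, rGraph[u][v])
            v = u
        # Update residual capacities: subtract along the path, add along the reverse edges.
        v = t
        while v != s:
            u = parent[v]
            rGraph[u][v] -= path_flow
            rGraph[v][u] += path_flow
            v = u
        max_flow += path_flow
    return max_flow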
A demonstration of working of Ford-Fulkerson algorithm is shown below with the help of diagrams.
Page86
No more augmenting path left. Max Flow: 15
Implementation:
An augmenting path in residual graph can be found using DFS or BFS.
Updating residual graph includes following steps: (refer the diagrams for better understanding)
o For every edge in the augmenting path, a value of minimum capacity in the path is
subtracted from all the edges of that path.
o A reverse edge of equal amount is added between every pair of successive nodes in the augmenting path.
The complexity of the Ford-Fulkerson algorithm cannot be accurately computed as it all depends on the paths from source to sink. For example, considering the network shown below, if each time the paths chosen are S−A−B−T and S−B−A−T alternately, then it can take a very long time. Instead, if the paths chosen are only S−A−T and S−B−T, the algorithm would generate the maximum flow much faster.
Each Directed Edge is labeled with capacity. Use the Ford-Fulkerson algorithm to find the maximum
flow.
The left side of each part shows the residual network Gf with a shaded augmenting path p,and the right
side of each part shows the net flow f.
Page87
EDMOND-KARP MAXIMUM-FLOW ALGORITHM:
The Edmonds–Karp algorithm is an implementation of the Ford–Fulkerson method for computing the maximum flow in a flow network in a more optimized way. Edmonds-Karp is identical to Ford-Fulkerson except for one very important trait: the search order of augmenting paths is well defined. Here augmenting paths, along with residual graphs, are the two important concepts to understand when finding the max flow of a network.
Page88
Edmonds–Karp algorithm
Algorithm EdmondsKarp(G)
1: f ← 0; Gf ← G
2: while Gf contains an s − t path P do
3: Let P be an s − t path in Gf with the minimum number of edges.
4: Augment f using P.
5: Update Gf
6: end while
7: return f
Augmenting paths are simply any path from the source to the sink that can currently take more flow. Over the course of the algorithm, flow is monotonically increased. So, there are times when a path from the source to the sink can take on more flow, and that is an augmenting path. This can be found by a breadth-first search, where we let every edge have unit length. The running time of O(V · E²) is found by showing that each augmenting path can be found in O(E) time, that every time at least one of the E edges becomes saturated (an edge which has the maximum possible flow), that the distance from the saturated edge to the source along augmenting paths must be longer than the last time it was saturated, and that this length is at most V. Another property of this algorithm is that the length of the shortest augmenting path increases monotonically.
Page89
Notice how the length of the augmenting path found by the algorithm (in red) never decreases. The paths found are the shortest possible. The flow found is equal to the capacity across the minimum cut in the graph separating the source and the sink. There is only one minimal cut in this graph, partitioning the nodes into the sets { A, B, C, E } and { D, F, G }, with capacity c(A,D) + c(C,D) + c(E,G) = 3 + 1 + 1 = 5.
Complexity
Worst case time complexity: Θ(V * E * E)
Average case time complexity: Θ(V * E * E)
Best case time complexity: Θ(V * E * E)
Space complexity: Θ(E + V)
Finally, the Edmonds-Karp algorithm uses a straightforward breadth-first search to find the minimum hop path through the residual graph. This is equivalent to treating the residual graph as unweighted and performing a shortest path search on it.
There are two minimum hop paths: A->D->E->G and A->D->F->G. Suppose we process the former of
these, with a flow of one:
Page90
Flow Residual
Now, there is only one minimum hop path through the residual: A->D->F->G, with a flow of two:
Flow Residual
At this point, there are only two paths through the residual: A->B->C->D->F->G and A->B->C->E-
>D->F->G. The first of these has fewer hops, so we process it. It has a flow of two:
Flow Residual
The final path through the residual is A->B->C->E->D->F->G with a flow of one. When we process
it, we get the same flow and residual graphs as the other two algorithms:
Flow Residual
Page91
MATRIX COMPUTATIONS:
A matrix is a rectangular array of numbers arranged in rows and columns. The array of numbers below
is an example of a matrix. The number of rows and columns that a matrix has is called its dimension
or its order. By convention, rows are listed first; and columns, second
Standard algorithm:
Page92
STRASSEN’S ALGORITHM:
1. Divide: Partition A and B into (n/2)x(n/2) submatrices. Form terms to be multiplied using + and –.
2. Conquer: Perform 7 multiplications of (n/2)x(n/2) submatrices recursively.
3. Combine: Perform +and – on (n/2)x(n/2) submatrices. Combine the result of two matrices to find
the final product or final matrix.
In this context, using Strassen's matrix multiplication algorithm, the time consumption can be improved a little bit. Strassen's matrix multiplication can be performed only on square matrices where n is a power of 2. The order of both of the matrices is n × n.
Divide X, Y and Z into four (n/2)×(n/2) matrices as represented below −
Z = | I  J |    X = | A  B |    Y = | E  F |
    | K  L |        | C  D |        | G  H |
Using Strassen‘s Algorithm compute the following:
M1:=(A+C)×(E+F)
M2:=(B+D)×(G+H)
M3:=(A−D)×(E+H)
M4:=A×(F−H)
M5:=(C+D)×(E)
M6:=(A+B)×(H)
M7:=D×(G−E)
Then,
I:=M2+M3−M6−M7
J:=M4+M6
K:=M5+M7
L:=M1−M3−M4−M5
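A recursive NumPy sketch that uses exactly the M1-M7 combinations above; it assumes n is a power of 2 and the function names are our own.

import numpy as np

def strassen(X, Y):
    n = X.shape[0]
    if n == 1:                        # base case: 1x1 matrices
        return X * Y
    h = n // 2
    A, B, C, D = X[:h, :h], X[:h, h:], X[h:, :h], X[h:, h:]
    E, F, G, H = Y[:h, :h], Y[:h, h:], Y[h:, :h], Y[h:, h:]
    M1 = strassen(A + C, E + F)
    M2 = strassen(B + D, G + H)
    M3 = strassen(A - D, E + H)
    M4 = strassen(A, F - H)
    M5 = strassen(C + D, E)
    M6 = strassen(A + B, H)
    M7 = strassen(D, G - E)
    I = M2 + M3 - M6 - M7
    J = M4 + M6
    K = M5 + M7
    L = M1 - M3 - M4 - M5
    return np.vstack((np.hstack((I, J)), np.hstack((K, L))))

# Sanity check against ordinary matrix multiplication
X = np.random.randint(0, 10, (4, 4)); Y = np.random.randint(0, 10, (4, 4))
assert (strassen(X, Y) == X @ Y).all()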
Analysis
T(n) = c                        if n = 1
T(n) = 7 · T(n/2) + d · n²      otherwise, where c and d are constants
Using this recurrence relation, we get T(n) = O(n^log₂ 7).
Addition and subtraction of two matrices takes O(N²) time. So the time complexity can be written as T(N) = 7T(N/2) + O(N²).
Hence, the complexity of Strassen's matrix multiplication algorithm is O(n^log₂ 7) ≈ O(n^2.81).
Page93
The number 2.81 may not seem much smaller than 3, but because the difference is in the exponent, the impact on running time is significant. In fact, Strassen's algorithm beats the ordinary algorithm on today's machines for n ≥ 30 or so.
Generally Strassen’s Method is not preferred for practical applications for following reasons.
1. The constants used in Strassen‘s method are high and for a typical application Naive method works
better.
2. For Sparse matrices, there are better methods especially designed for them.
3. The submatrices in recursion take extra space.
4. Because of the limited precision of computer arithmetic on noninteger values, larger errors
accumulate in Strassen‘s algorithm than in Naive Method
Divide and Conquer algorithm consists of a dispute using the following three steps.
1. Divide: The original problem into a set of sub-problems.
2. Conquer: Solve every sub-problem individually, recursively.
3. Combine: Put together the solutions of the sub-problems to get the solution to the whole
problem.
Page94
Fundamental of Divide & Conquer Strategy:
There are two fundamentals of Divide & Conquer Strategy:
1. Relational Formula
2. Stopping Condition
1. Relational Formula: It is the formula that we generate from the given technique. After generation of
Formula we apply D&C Strategy, i.e. we break the problem recursively & solve the broken sub-
problems.
2. Stopping Condition: When we break the problem using Divide & Conquer Strategy, then we need
to know that for how much time, we need to apply divide & Conquer. So the condition where we need
to stop our recursion steps of D&C is called as Stopping Condition.
In the above method, we do 8 multiplications for matrices of size N/2 x N/2 and 4 additions. Addition of two matrices takes O(N²) time. So the time complexity can be written as T(N) = 8T(N/2) + O(N²).
From the Master Theorem, the time complexity of the above method is O(N³).
Simple Divide and Conquer also leads to O(N³); can there be a better way?
In the above divide and conquer method, the main component for high time complexity is 8 recursive
calls. The idea of Strassen’s method is to reduce the number of recursive calls to 7. Strassen‘s method
is similar to the above simple divide and conquer method in the sense that this method also divides matrices into sub-matrices of size N/2 x N/2 as shown in the below diagram, but in Strassen's method, the four
sub-matrices of result are calculated using following formulae.
Page96
7. It also uses memory caches effectively. When the sub-problems become simple enough, they
can be solved within a cache, without having to access the slower main memory.
Matrix Inverse
Inverse of a matrix can only be defined for square matrices.
Inverse of a square matrix exists only if the determinant of that matrix is non-zero.
The inverse matrix of A is denoted A⁻¹.
A · A⁻¹ = A⁻¹ · A = I
Page97
Inverse of a 3 x 3 matrix (using cofactor matrix):
Calculating the inverse of a 3 × 3 matrix is:
Compute the matrix of minors for A.
Compute the cofactor matrix by alternating + and – signs.
Compute the adjugate matrix by taking a transpose of cofactor matrix.
Divide all elements in the adjugate matrix by the determinant of the matrix (a small sketch of this procedure follows below).
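The cofactor procedure above can be written directly in Python for a 3 × 3 matrix. This is a small illustrative sketch with names of our own choosing; in practice one would simply call numpy.linalg.inv.

def inverse_3x3(A):
    # minor(i, j): determinant of the 2x2 matrix left after deleting row i and column j
    def minor(i, j):
        rows = [r for r in range(3) if r != i]
        cols = [c for c in range(3) if c != j]
        return (A[rows[0]][cols[0]] * A[rows[1]][cols[1]]
                - A[rows[0]][cols[1]] * A[rows[1]][cols[0]])
    # cofactor matrix: minors with alternating + and - signs
    cof = [[(-1) ** (i + j) * minor(i, j) for j in range(3)] for i in range(3)]
    # determinant by expansion along the first row
    det = sum(A[0][j] * cof[0][j] for j in range(3))
    if det == 0:
        raise ValueError("matrix is singular")
    # adjugate = transpose of the cofactor matrix; divide every entry by the determinant
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]

print(inverse_3x3([[2, 0, 0], [0, 4, 0], [0, 0, 5]]))   # a diagonal example: the reciprocals appear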
Page98
Inverse of Diagonal matrices
• The determinant of a diagonal matrix is the product of its diagonal elements.
• If they all are non-zero, then determinant is non-zero and the matrix is invertible.
• The inverse of a diagonal matrix A is another diagonal matrix B whose diagonal elements are the
reciprocals of the diagonal elements of A.
• Example:
Page99
Inverse of Upper/Lower Triangular Matrices
The four "basic operations" on numbers are addition, subtraction, multiplication, and division.
For matrices, there are three basic row operations; that is, there are three procedures that you can do
with the rows of a matrix.
Basic Arithmetic Operations
Addition (two n-digit numbers → one (n+1)-digit number):
  Basic addition with carry: Θ(n), Θ(log N)
Subtraction (two n-digit numbers → one (n+1)-digit number):
  Basic subtraction with borrow: Θ(n), Θ(log N)
Multiplication (two n-digit numbers → one 2n-digit number):
  Basic long multiplication: O(n²)
  Karatsuba algorithm: O(n^1.585)
  3-way Toom–Cook multiplication: O(n^1.465)
  k-way Toom–Cook multiplication: O(n^(log(2k−1)/log k))
  Mixed-level Toom–Cook (Knuth 4.3.3-T): O(n · 2^√(2 log n) · log n)
  Schönhage–Strassen algorithm: O(n log n log log n)
  Fürer's algorithm: O(n log n · 2^(2 log* n))
  Harvey–Hoeven algorithm: O(n log n)
Division (two n-digit numbers → one n-digit number):
  Basic long division: O(n²)
  Burnikel–Ziegler divide-and-conquer division: O(M(n) log n)
  Newton–Raphson division: O(M(n))
Square root (one n-digit number → one n-digit number):
  Newton's method: O(M(n))
Modular exponentiation (two n-digit integers and a k-bit exponent → one n-digit integer):
  Repeated multiplication and reduction: O(M(n) · 2^k)
  Exponentiation by squaring: O(M(n) · k)
  Exponentiation with Montgomery reduction: O(M(n) · k)
Matrix Algebraic Functions
Matrix multiplication (two n×n matrices → one n×n matrix):
  Basic matrix multiplication: O(n³)
  Strassen algorithm: O(n^2.807)
  Coppersmith–Winograd algorithm: O(n^2.376)
  Optimized CW-like algorithms: O(n^2.373)
Matrix multiplication (one n×m matrix and one m×p matrix → one n×p matrix):
  Schoolbook matrix multiplication: O(nmp)
Matrix multiplication (one n×⌈n^k⌉ matrix and one ⌈n^k⌉×n matrix → one n×n matrix):
  Algorithms based on upper bounds on ω(k): O(n^(ω(k)+ε))
Matrix inversion (one n×n matrix → one n×n matrix):
  Gauss–Jordan elimination: O(n³)
  Strassen algorithm: O(n^2.807)
  Coppersmith–Winograd algorithm: O(n^2.376)
  Optimized CW-like algorithms: O(n^2.373)
Singular value decomposition (one m×n matrix → one m×m matrix, one m×n matrix, and one n×n matrix):
  Bidiagonalization and QR algorithm: O(mn² + m²n) (m ≥ n)
Singular value decomposition (one m×n matrix → one m×n matrix, one n×n matrix, and one n×n matrix):
  Bidiagonalization and QR algorithm: O(mn²) (m ≥ n)
Determinant (one n×n matrix → one number):
  Laplace expansion: O(n!)
  Division-free algorithm: O(n⁴)
  LU decomposition: O(n³)
  Bareiss algorithm: O(n³)
  Fast matrix multiplication: O(n^2.373)
Back substitution (triangular matrix → n solutions):
  Back substitution: O(n²)
LUP-DECOMPOSITION:
Definition:
An m×n matrix A is said to have an LU-decomposition if there exist matrices L and U with the following properties:
(i) L is an m×m lower triangular matrix with all diagonal entries being 1.
(ii) U is an m×n matrix in some echelon form.
(iii) A = LU.
Advantages of LU-Decomposition:
Suppose we want to solve an m×n system AX = b.
If we can find an LU-decomposition for A, then to solve AX = b it is enough to solve the two triangular systems LY = b and UX = Y.
The system LY = b can be solved by the method of forward substitution and the system UX = Y can be solved by the method of backward substitution. The following examples will illustrate this.
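A small Python sketch of this two-stage solve, assuming square L and U have already been computed (plain nested lists; the function name lu_solve is ours):

def lu_solve(L, U, b):
    n = len(b)
    # Forward substitution: solve L y = b (L is lower triangular with unit diagonal)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Backward substitution: solve U x = y (U is upper triangular)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Example: A = L U with L = [[1,0],[2,1]] and U = [[2,1],[0,3]], so A = [[2,1],[4,5]].
print(lu_solve([[1, 0], [2, 1]], [[2, 1], [0, 3]], [3, 9]))   # solves A x = [3, 9], giving [1.0, 1.0]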
Example:
Consider a given system AX = b. Factor A as A = LU, solve LY = b to get Y, and then solve UX = Y to get X.
The above discussion raises the following question. Which matrices have LU-decomposition, and how
to find them? The answer is given by the following:
Theorem:
Let A be an m×n matrix and E1, E2, ..., Ek be elementary matrices such that
U = Ek ... E2 E1 A
is in row echelon form. If none of the Ei's corresponds to the operation of row interchange, then
C = Ek ... E2 E1 is a lower triangular invertible matrix. Further, L = C⁻¹ is also a lower triangular
matrix with A = LU.
Example:
Let
Thus
Page104
Note that the elements below the main diagonal in L, in the pivotal columns, are the negatives of the scalars used in the respective column to make all the entries in that column below the pivot zero. For example, in the first column we added −2 times the first row to the second row and 3 times the first row to the third row. Thus, in the first column of L, the second entry is 2 and the third entry is −3.
Example:
Let
To reduce A to row echelon form, we operate with multipliers ½ and ½ on the first pivotal column, namely the second column, to get
Next we operate with multiplier −3 on the next pivotal column to get
This has only two non-zero rows. Thus L is given by replacing the entries below the diagonal in the pivotal columns by the negatives of the multiples: in column 1 these are −½, −½, and in the second column this is 3. Thus
Let us check
Page105
The method followed in examples can in fact be proved mathematically as follows:
Theorem:
Let E be the m×m elementary matrix which corresponds to the elementary row operation of multiplying the ith row by a scalar and adding it to the jth row. Then
Page106
is in row echelon form, then
Till now we had assumed that no row interchanges were required to transform A to upper triangular
form. If such row operations are needed we can proceed by the following:
Theorem
Let A be an m n matrix which requires row interchanges , in addition to the
other elementary row operations to transform it to row echelon form U : Then , the matrix
PA requires no row interchanges to reduce it to row echelon form and hence can be written
as PA = LU .
The system AX = b is equivalent to a system PAX = Pb , Where PA has LU-decomposition .
Definition
An m×m matrix which arises from the identity matrix by a finite number of row interchanges is called a permutation matrix.
Example:
Page107
and now U is in r.e.f..
Thus, we consider PA and transform it to r.e.f
Hence,
Page108
UNIT-4
Page109
SHORTEST PATH IN GRAPHS:
The shortest path problem is about finding a path between 2 vertices in a graph such that the total sum
of the edges weights is minimum.
Shortest paths:
An edge-weighted digraph is a digraph where we associate weights or costs with each edge. A shortest
path from vertex s to vertex t is a directed path from s to t with the property that no other such path has
a lower weight.
Properties:
Paths are directed. A shortest path must respect the direction of its edges.
The weights are not necessarily distances. Geometric intuition can be helpful, but the edge weights might represent time or cost.
Not all vertices need be reachable. If t is not reachable from s, there is no path at all, and
therefore there is no shortest path from s to t.
Negative weights introduce complications. For the moment, we assume that edge weights are
positive (or zero).
Shortest paths are normally simple. Our algorithms ignore zero-weight edges that form
cycles, so that the shortest paths they find have no cycles.
Shortest paths are not necessarily unique. There may be multiple paths of the lowest weight
from one vertex to another; we are content to find any one of them.
Parallel edges and self-loops may be present. In the text, we assume that parallel edges are not
present and use the notation v->w to refer to the edge from v to w, but our code handles them
without difficulty.
A weighted graph is a graph in which each edge has a numerical value associated with it.
This algorithm follows the dynamic programming approach to find the shortest paths.
Page110
Follow the steps below to find the shortest path between all the pairs of vertices.
1. Create a matrix A0 of dimension n*n where n is the number of vertices. The row and the column are indexed as i and j respectively; i and j are the vertices of the graph.
Each cell A[i][j] is filled with the distance from the ith vertex to the jth vertex. If there is no path from the ith vertex to the jth vertex, the cell is left as infinity.
2. Now, create a matrix A1 using matrix A0. The elements in the first column and the first row are left as they are. The remaining cells are filled in the following way.
Let k be the intermediate vertex in the shortest path from source to destination. In this step, k is the first vertex. A[i][j] is filled with (A[i][k] + A[k][j]) if (A[i][j] > A[i][k] + A[k][j]).
That is, if the direct distance from the source to the destination is greater than the path through the vertex k, then the cell is filled with A[i][k] + A[k][j].
In this step, k is vertex 1. We calculate the distance from source vertex to destination vertex through this vertex k.
For example: For A1[2, 4], the direct distance from vertex 2 to 4 is 4, and the sum of the distances from vertex 2 to 4 through vertex 1 (i.e. from vertex 2 to 1 and from vertex 1 to 4) is 7. Since 4 < 7, A1[2, 4] is filled with 4.
3. In a similar way, A2 is created using A1. The elements in the second column and the second row are left as they are. In this step, k is the second vertex (i.e. vertex 2). The remaining steps are the same as in step 2.
Page111
4. Similarly, A3 and A4 are also created.
n = no of vertices
A = matrix of dimension n*n
for k = 1 to n
for i = 1 to n
for j = 1 to n
Ak[i, j] = min (Ak-1[i, j], Ak-1[i, k] + Ak-1[k, j])
return A
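The same procedure in runnable Python (INF marks the absence of an edge; identifiers are our own):

INF = float('inf')

def floyd_warshall(dist):
    # dist: n x n matrix with edge weights, 0 on the diagonal and INF where there is no edge
    n = len(dist)
    A = [row[:] for row in dist]            # A0: the initial distance matrix
    for k in range(n):                      # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

# Example with 4 vertices
graph = [[0, 3, INF, 5], [2, 0, INF, 4], [INF, 1, 0, INF], [INF, INF, 2, 0]]
for row in floyd_warshall(graph):
    print(row)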
Time Complexity
There are three nested loops, each running n times, and each iteration does a constant amount of work. So, the time complexity of the Floyd-Warshall algorithm is O(n³).
Space Complexity
The space complexity of the Floyd-Warshall algorithm is O(n²).
Page112
Using Floyd Warshall Algorithm, find the shortest path distance between every pair of vertices.
Solution:
Step-01:
Remove all the self loops and parallel edges (keeping the lowest weight edge) from the graph.
In the given graph, there are neither self edges nor parallel edges.
Step-02:
Write the initial distance matrix.
It represents the distance between every pair of vertices in the form of given weights.
For diagonal elements (representing self-loops), distance value = 0.
For vertices having a direct edge between them, distance value = weight of that edge.
For vertices having no direct edge between them, distance value = ∞.
Initial distance matrix for the given graph is-
Step-03:
Using Floyd Warshall Algorithm, write the following 4 matrices-
The last matrix D4 represents the shortest path distance between every pair of vertices.
In the above problem, there are 4 vertices in the given graph.
Page113
So, there will be total 4 matrices of order 4 x 4 in the solution excluding the initial distance matrix.
Diagonal elements of each matrix will always be 0.
This algorithm is also used for constructing the shortest paths. Show the matrices D(k) and π(k) computed by the Floyd-Warshall algorithm for the graph.
Solution:
Page114
Step (ii) When k =1
Page115
Step (iv) When k = 3
Page116
Step (vi) When k = 5
Example 2:
Page118
INTRODUCTION TO DYNAMIC PROGRAMMING PARADIGM:
Dynamic Programming is the most powerful design technique for solving optimization problems.
Divide & Conquer algorithms partition the problem into disjoint sub-problems, solve the sub-problems recursively and then combine their solutions to solve the original problem.
It is used when the sub-problems are not independent, e.g. when they share the same sub-problems.
In this case, divide and conquer may do more work than necessary, because it solves the same sub
problem multiple times.
It solves each sub-problem just once and stores the result in a table so that it can be repeatedly
retrieved if needed again.
It is a Bottom-up approach. We solve all possible small problems and then combine to obtain
solutions for bigger problems.
It is a paradigm of algorithm design in which an optimization problem is solved by combining sub-problem solutions and appealing to the "principle of optimality".
o Optimal Substructure: If an optimal solution contains optimal sub solutions then a problem
exhibits optimal substructure.
o Overlapping sub-problems: When a recursive algorithm would visit the same sub-problems
repeatedly, then a problem has overlapping sub-problems.
If a problem has optimal substructure, then we can recursively define an optimal solution. If a
problem has overlapping sub-problems, then we can improve on a recursive implementation by
computing each sub-problem only once.
If a problem doesn't have optimal substructure, there is no basis for defining a recursive algorithm to
find the optimal solutions. If a problem doesn't have overlapping sub problems, we don't have
anything to gain by using dynamic programming.
If the space of sub-problems is enough (i.e. polynomial in the size of the input), dynamic
programming can be much more efficient than recursion.
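A tiny illustration of overlapping sub-problems: the plain recursive Fibonacci recomputes the same values over and over, while a memoized (dynamic programming) version computes each value only once. The example below is ours, not from the original text.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each sub-problem fib(k) is computed once and then looked up from the cache table.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(40))   # 102334155, computed in linear time; the plain recursion needs exponentially many calls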
Page119
Elements of Dynamic Programming:
There are basically three elements that characterize a dynamic programming algorithm:-
1. Substructure: Decompose the given problem into smaller sub-problems. Express the solution
of the original problem in terms of the solution for smaller problems.
2. Table Structure: After solving the sub-problems, store the results to the sub problems in a
table. This is done because sub-problem solutions are reused many times and we do not want to
repeatedly solve the same problem over and over again.
3. Bottom-up Computation: Using the table, combine the solutions of smaller sub-problems to solve larger sub-problems, and eventually arrive at a solution to the complete problem.
Bottom-up means:
i. Start with the smallest sub-problems.
ii. Combining their solutions, obtain the solutions to sub-problems of increasing size.
iii. Continue until arriving at the solution of the original problem.
1. Stages: The problem can be divided into several sub-problems, which are called stages. A stage
is a small portion of a given problem. For example, in the shortest path problem, they were
defined by the structure of the graph.
2. States: Each stage has several states associated with it. The states for the shortest path problem
were the node reached.
3. Decision: At each stage, there can be multiple choices out of which one of the best decisions
should be taken. The decision taken at every stage should be optimal; this is called a stage
decision.
Page120
4. Optimal policy: It is a rule which determines the decision at each stage; a policy is called an
optimal policy if it is globally optimal. This is known as Bellman principle of optimality.
5. Given the current state, the optimal choice for each of the remaining states does not depend on
the previous states or decisions. In the shortest path problem, it was not necessary to know how
we got a node only that we did.
6. There exist recursive relationships that identify the optimal decisions for stage j, given that stage
j+1, has already been solved.
7. The final stage must be solved by itself.
In 0-1 Knapsack, items cannot be broken, which means the thief should take an item as a whole or leave it. This is the reason behind calling it 0-1 Knapsack.
Page121
Hence, in case of 0-1 Knapsack, the value of xi can be either 0 or 1, where other constraints remain the
same.
0-1 Knapsack cannot be solved by the Greedy approach. The Greedy approach does not ensure an optimal solution here; in some instances it may happen to give one, but this is not guaranteed.
Example 1
Let us consider that the capacity of the knapsack is W = 25 and the items are as shown in the following
table.
Item A B C D
Profit 24 18 18 10
Weight 24 10 10 7
Without considering the profit per unit weight (pi/wi), if we apply Greedy approach to solve this
problem, first item A will be selected as it will contribute maximum profit among all the elements.
After selecting item A, no more item will be selected. Hence, for this given set of items total profit
is 24. Whereas, the optimal solution can be achieved by selecting items B and C, where the total profit
is 18 + 18 = 36.
Example 2
Instead of selecting the items based on the overall benefit, in this example the items are selected based
on ratio pi/wi. Let us consider that the capacity of the knapsack is W = 60 and the items are as shown in
the following table.
Item A B C
Price 100 280 120
Weight 10 40 20
Ratio 10 7 6
Using the Greedy approach, first item A is selected. Then, the next item B is chosen. Hence, the total
profit is 100 + 280 = 380. However, the optimal solution of this instance can be achieved by selecting
items, B and C, where the total profit is 280 + 120 = 400.
Hence, it can be concluded that Greedy approach may not give an optimal solution.
To solve 0-1 Knapsack, Dynamic Programming approach is required.
Problem Statement
A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are n items
and weight of ith item is wi and the profit of selecting this item is pi. What items should the thief take?
Dynamic-Programming Approach
Let i be the highest-numbered item in an optimal solution S for W dollars. Then S' = S - {i} is an
optimal solution for W - wi dollars and the value to the solution S is Vi plus the value of the sub-
problem.
We can express this fact in the following formula: define c[i, w] to be the solution for items 1,2, … ,
i and the maximum weight w.
The algorithm takes the following inputs
The maximum weight W
The number of items n
The two sequences v = <v1, v2, …, vn> and w = <w1, w2, …, wn>
Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W do
c[0, w] = 0
for i = 1 to n do
c[i, 0] = 0
for w = 1 to W do
if wi ≤ w then
if vi + c[i-1, w-wi] > c[i-1, w] then
c[i, w] = vi + c[i-1, w-wi]
else c[i, w] = c[i-1, w]
else
c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing backwards where
the optimal values came from.
If c[i, w] = c[i-1, w], then item i is not part of the solution and we continue tracing with c[i-1, w].
Otherwise, item i is part of the solution and we continue tracing with c[i-1, w-wi].
Analysis
This algorithm takes θ(n·W) time, as table c has (n + 1)·(W + 1) entries, where each entry requires θ(1) time to compute.
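The table computation described above translates directly into Python (a sketch; identifiers are our own):

def knapsack_01(values, weights, W):
    n = len(values)
    # c[i][w] = best value achievable using the first i items with capacity w
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            if weights[i - 1] <= w:
                c[i][w] = max(c[i - 1][w], values[i - 1] + c[i - 1][w - weights[i - 1]])
            else:
                c[i][w] = c[i - 1][w]
    return c[n][W]

# Example 1 from the text: capacity 25 and items A, B, C, D
print(knapsack_01([24, 18, 18, 10], [24, 10, 10, 7], 25))   # prints 36 (items B and C)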
Longest Common Subsequence (LCS)
The longest common subsequence problem is finding the longest sequence which exists in both the
given strings.
Page123
Subsequence
Common Subsequence
Suppose, X and Y are two sequences over a finite set of elements. We can say that Z is a common
subsequence of X and Y, if Z is a subsequence of both X and Y.
If a set of sequences are given, the longest common subsequence problem is to find a common
subsequence of all the sequences that is of maximal length.
Naïve Method
Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence of X whether it is a subsequence of Y, and return the longest common subsequence found.
There are 2^m subsequences of X. Testing whether a sequence is a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n·2^m) time.
Dynamic Programming
Let X = < x1, x2, x3, …, xm > and Y = < y1, y2, y3, …, yn > be the sequences. To compute the length of the LCS, the following algorithm is used. In this procedure, table C[m, n] is computed in row-major order and another table B[m, n] is computed to construct the optimal solution.
Algorithm: LCS-Length-Table-Formulation (X, Y)
m := length(X)
n := length(Y)
for i = 0 to m do
    C[i, 0] := 0
for j = 0 to n do
    C[0, j] := 0
for i = 1 to m do
    for j = 1 to n do
        if xi = yj
            C[i, j] := C[i - 1, j - 1] + 1
            B[i, j] := 'D'
        else
            if C[i - 1, j] ≥ C[i, j - 1]
                C[i, j] := C[i - 1, j]
                B[i, j] := 'U'
            else
                C[i, j] := C[i, j - 1]
                B[i, j] := 'L'
return C and B
Example
In this example, we have two strings X = BACDB and Y = BDCB to find the longest common
subsequence. Following the algorithm LCS-Length-Table-Formulation (as stated above), we have
calculated table C (shown on the left hand side) and table B (shown on the right hand side). In table B,
instead of 'D', 'L' and 'U', we use the diagonal arrow, left arrow and up arrow, respectively.
After generating table B, the LCS is determined by the function LCS-Print. The result is BCB.
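A compact Python rendering of the same table computation and trace-back is given below; the helper name lcs is mine, and it derives the trace-back moves directly from the C table instead of storing a separate B table.

def lcs(X, Y):
    m, n = len(X), len(Y)
    # C[i][j] = length of an LCS of X[:i] and Y[:j]
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                C[i][j] = C[i - 1][j - 1] + 1
            else:
                C[i][j] = max(C[i - 1][j], C[i][j - 1])
    # walk back through the table to print one LCS (the role of LCS-Print)
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif C[i - 1][j] >= C[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("BACDB", "BDCB"))   # expected: BCB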
MODULO REPRESENTATION OF POLYNOMIALS/INTEGERS:
Addition:
Suppose we want to find the sum of 23 and 3. Adding the triplets for 23 and 3 gives the triplet for the
sum: 23 + 3 = (1, 2, 3) + (1, 0, 3) = (2, 2, 6)
- just add the corresponding remainders when divided by 2, 3 and 5.
In this triplet, the 1st number represents the remainder when divided by 2, the second the remainder
when divided by 3 and the third, the remainder when divided by5.
However, the remainder when divided by 2 can never be greater than 1, that when divided by 3 can
never be greater than 2 & that when divided by 5 can never be greater than 4.
So, the solution we have obtained above has to be reduced further. For that, take each number in the
representation of the sum. If it is greater than the allowed limit, take the modulo of that number again.
So, for (2, 2, 6),
i. 2 > 1, which is the maximum possible remainder when divided by 2. Hence 2 mod 2 = 0.
ii. 2 = 2, which is the maximum possible remainder when divided by 3, so let's leave it as it is.
iii. 6 > 4, which is the maximum possible remainder when divided by 5. Hence 6 mod 5 = 1.
Hence the representation for 23 + 3 = 26 is (0, 2, 1).
Subtraction:
If we wish to subtract 3 from 23,23 -3 = (1, 2, 3) - (1, 0, 3) = (0, 2, 0)
- just subtract the respective remainders. Thus, 20 = (0,2,0)
Now, let us see what to do if we get a negative number in the remainder.
Take the example 17 - 9.
17 = (1, 2, 2)
9 = (1, 0, 4)
17 - 9 = (1, 2, 2) - (1, 0, 4) = (0, 2, -2)
Now, we have -2 as the remainder when divided by 5. As in addition, the modulo has to be found for -2.
But, how do you find the modulo of a negative number ? Well, it's very simple !
Add the base to the remainder. In this case, the base is 5 and the remainder is -2. Adding, we get 3.
Now, take the modulo of 3 with respect to the base 5 i.e.[3] 5. The result is 3.
Hence 17 - 9 = 8 = (0, 2, 3).
Multiplication:
Multiplication is as simple as addition and subtraction. Just multiply the respective remainders!
Take the numbers 4 and 7.
4 = (0, 1, 4)
7 = (1, 1, 2)
4 × 7 = (0, 1, 4) × (1, 1, 2) = (0, 1, 8).
As before, the remainders must be less than the allowable limit. Here 8 > 4, which is the maximum
possible remainder when divided by 5. Taking [8] 5, we get 3.Hence 4 × 7 = 28 = (0, 1, 3)
This representation will hold for any number of bases relatively prime to each other. For example, if we
wish to represent a number with respect to 4 bases, we will have 4 numbers in the representation.
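The following short Python sketch mirrors the hand calculations above for the bases 2, 3 and 5; the helper names to_residues and combine are my own.

BASES = (2, 3, 5)

def to_residues(x):
    # triplet of remainders of x modulo 2, 3 and 5
    return tuple(x % b for b in BASES)

def combine(u, v, op):
    # apply op component-wise, then reduce each component by its base
    return tuple(op(a, b) % m for a, b, m in zip(u, v, BASES))

print(combine(to_residues(23), to_residues(3), lambda a, b: a + b))   # (0, 2, 1), residues of 26
print(combine(to_residues(17), to_residues(9), lambda a, b: a - b))   # (0, 2, 3), residues of 8
print(combine(to_residues(4),  to_residues(7), lambda a, b: a * b))   # (0, 1, 3), residues of 28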
Polynomials modulo m:
Let us look at m = 17 as an example. Then mod-m-polynomials are expressions like
10x^4 + 14x + 2 or x^3 + 2x or x^10 + 7.
This means there are powers
x^0 = 1, x^1 = x, x^2, x^3, x^4, . . . of a symbol x (the variable).
Among these powers there is the expression x^0, to be read as "1" and to be omitted when it appears as a
factor: 2 · x^0 = 2. Instead of x^1 we write x. Such powers of x we can multiply with numbers c between
0 and m − 1 to obtain terms c·x^j. The factor c here is called a coefficient. We get polynomials by
adding arbitrary terms with different powers of x. In order not to get confused we normally order the
terms according to falling powers of x, but we may also write 5x + 3x^2 + 1 or 5x + 2x^2 + 1 + x^2:
these are just different ways of writing 3x^2 + 5x + 1.
If we are given an expression like
2x^4 − 3x^3 + 30x^2 + 3
that is not really a polynomial modulo 17, because coefficients that are negative or larger than 16 are
illegal, we may just take all coefficients modulo 17 to obtain a proper mod-17-polynomial:
2x^4 + 14x^3 + 13x^2 + 3.
The general format for a mod-m-polynomial is the following:
(2) cn·x^n + cn−1·x^(n−1) + · · · + c2·x^2 + c1·x + c0.
Here cn, . . . , c0 are numbers between 0 and m − 1 (endpoints included).
It is interesting to note that if one wants to calculate with polynomials there is no need to deal with the
―x‖ and its powers at all. One just stores the numbers c0, c1, . . . ,cn, e. g. in an array C[0..n], and has
the complete information.
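As a small illustration of the coefficient-array convention just described (with C[j] holding the coefficient of x^j; the helper name reduce_poly_mod is mine), the reduction of the earlier example polynomial modulo 17 can be written as:

def reduce_poly_mod(coeffs, m):
    # coeffs[j] is the coefficient of x**j; reduce every coefficient into the range 0..m-1
    return [c % m for c in coeffs]

# 2x^4 - 3x^3 + 30x^2 + 3, stored lowest power first
print(reduce_poly_mod([3, 0, 30, -3, 2], 17))   # [3, 0, 13, 14, 2], i.e. 2x^4 + 14x^3 + 13x^2 + 3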
Lastly define:
w1 ≡ y1z1 (mod m)
w2 ≡ y2z2 (mod m)
w3 ≡ y3z3 (mod m)
w4 ≡ y4z4 (mod m)
Then w1, w2, w3 and w4 have the property that wi ≡ 1 (mod mi) and wi ≡ 0 (mod mj) for every j ≠ i.
Example:
Solve the simultaneous congruences
x ≡ 6 (mod 11),
x ≡ 13 (mod 16),
x ≡ 9 (mod 21),
x ≡ 19 (mod 25).
Solution:
Since 11, 16, 21, and 25 are pairwise relatively prime, the Chinese Remainder Theorem tells us that
there is a unique solution modulo m, where m = 11⋅16⋅21⋅25 = 92400.
We apply the technique of the Chinese Remainder Theorem with
k = 4, m1 = 11, m2 = 16, m3 = 21, m4 = 25,
a1 = 6, a2 = 13, a3 = 9, a4 = 19, to obtain the solution.
We compute
z1 = m / m1 = m2m3m4 = 16⋅21⋅25 = 8400
z2 = m / m2 = m1m3m4 = 11⋅21⋅25 = 5775
z3 = m / m3 = m1m2m4 = 11⋅16⋅25 = 4400
z4 = m / m4 = m1m2m3 = 11⋅16⋅21 = 3696
y1 ≡ z1^(−1) (mod m1) ≡ 8400^(−1) (mod 11) ≡ 7^(−1) (mod 11) ≡ 8 (mod 11)
y2 ≡ z2^(−1) (mod m2) ≡ 5775^(−1) (mod 16) ≡ 15^(−1) (mod 16) ≡ 15 (mod 16)
y3 ≡ z3^(−1) (mod m3) ≡ 4400^(−1) (mod 21) ≡ 11^(−1) (mod 21) ≡ 2 (mod 21)
y4 ≡ z4^(−1) (mod m4) ≡ 3696^(−1) (mod 25) ≡ 21^(−1) (mod 25) ≡ 6 (mod 25)
The solution, which is unique modulo 92400, is
x ≡ a1w1 + a2w2 + a3w3 + a4w4 (mod 92400)
≡ 6⋅67200 + 13⋅86625 + 9⋅8800 + 19⋅22176 (mod 92400)
≡ 2029869 (mod 92400) ≡ 89469 (mod 92400).
We are given two arrays num[0..k-1] and rem[0..k-1]. In num[0..k-1], every pair is co-prime (gcd for
every pair is 1). We need to find minimum positive number x such that:
x % num[0] = rem[0],
x % num[1] = rem[1],
.......................
x % num[k-1] = rem[k-1]
Basically, we are given k numbers which are pairwise co-prime and given remainders of these numbers
when an unknown number x is divided by them. We need to find the minimum possible value of x that
produces given remainders.
Examples:
Input: num[] = {5, 7}, rem[] = {1, 3}
Output: 31
Output: 11
Explanation:
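A direct Python sketch of this computation, following the z/y/w construction used in the worked example, is shown below; the function name crt is mine, and Python's built-in pow(z, -1, n) (available from Python 3.8) is used for the modular inverses.

def crt(num, rem):
    # num: pairwise co-prime moduli, rem: the required remainders
    m = 1
    for n in num:
        m *= n
    x = 0
    for ni, ri in zip(num, rem):
        z = m // ni                 # z_i = m / m_i
        y = pow(z, -1, ni)          # y_i = z_i^(-1) (mod m_i)
        x += ri * y * z             # a_i * w_i with w_i = y_i * z_i
    return x % m                    # smallest non-negative solution

print(crt([5, 7], [1, 3]))                    # 31
print(crt([11, 16, 21, 25], [6, 13, 9, 19]))  # 89469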
CONVERSION BETWEEN BASE REPRESENTATION AND MODULO REPRESENTATION:
Representing a number in a base different from the customary 10 may have a great advantage.
For example, binary representation is instrumental in solving Nim, Scoring, Turning Turtles and
other puzzles. Along with binary, the science of computers employs bases 8 and 16, for it is very
easy to convert between the three, while using bases 8 and 16 shortens number representations
considerably.
To represent the first 8 digits in the binary system we need 3 bits. Thus we have 0 = 000, 1 = 001, 2 =
010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, 7 = 111. Assume M = (2065)8.
In order to obtain its binary representation, replace each of the four digits with the corresponding
triple of bits: 010 000 110 101. After removing the leading zeros, the binary representation is
immediate: M = (10000110101)2. (For the hexadecimal system, the conversion is quite similar, except
that now one should use the 4-bit representation of numbers below 16.)
This fact follows from the general conversion algorithm and the observation that 8 = 2^3 (and, of
course, 16 = 2^4). Thus it appears that the shortest way to convert numbers into the binary system is
to first convert them into either octal or hexadecimal representation. Now let us see how to
implement the general algorithm programmatically.
Representation of a number in a system with base (radix) N may only consist of digits that are less than
N. More accurately, if
(1) M = ak·N^k + ak−1·N^(k−1) + ... + a1·N + a0
with 0 ≤ ai < N, we have a representation of M in the base-N system and write
M = (ak ak−1 ... a0)N
If we rewrite (1) as
(2) M = a0 + N·(a1 + N·(a2 + N·...)),
the algorithm for obtaining the coefficients ai becomes more obvious. For example, a0 =
M mod N, a1 = (M div N) mod N, and so on.
Recursive implementation
Let's represent the algorithm mnemonically: (result is a string or character variable where I shall
accumulate the digits of the result one at a time)
1. result = ""
2. if M < N, result = 'M' + result. Stop.
3. S = M mod N, result = 'S' + result
M = M/N
4. goto 2
A few words of explanation.
1. result starts as an empty string. You may remember that the empty string is the zero element for string concatenation.
2. Here we check whether the conversion procedure is over. It's over if M is less than N in which
case M is a digit (with some qualification for N>10) and no additional action is necessary. Just
prepend it in front of all other digits obtained previously. The '+' plus sign stands for the string
concatenation.
3. If we got this far, M is not less than N. First we extract its remainder of division by N, prepend
this digit to the result as described previously and reassign M to be M/N.
4. This says that the whole process should be repeated starting with step 2.
I would like to have a function, say Conversion, that takes two arguments M and N and
returns the representation of the number M in base N. The function might look like this:

String Conversion(int M, int N) {              // return a string, accept two integers
    if (M < N)                                 // see if it's time to return
        return "" + M;                         // "" + M makes a string out of a digit
    else                                       // the time is not yet ripe
        return Conversion(M / N, N) + (M % N); // continue with the quotient, then append this digit
}
At some point the function calls itself with a different first argument. One may say that the function is
defined in terms of itself. Such functions are called recursive. (The best known recursive function
is factorial: n! = n·(n-1)!.) The function calls (applies) itself to its arguments and then (naturally)
applies itself to its new arguments and then ... and so on. We can be sure that the process will eventually
stop because the sequence of arguments (the first ones) is decreasing. Thus sooner or later the first
argument will be less than the second and the process will start emerging from the recursion, still a step
at a time.
Iterative implementation
The iterative implementation of the Conversion function might look like the following, using a
java.util.Stack to collect the digits:

String Conversion(int M, int N) {              // return a string, accept two integers
    Stack<Integer> stack = new Stack<Integer>();
    while (M >= N) {
        stack.push(M % N);                     // store a digit
        M = M / N;                             // find new M
    }
    // now it's time to collect the digits together
    String str = "" + M;                       // create a string with a single digit M
    while (!stack.empty())
        str = str + stack.pop();               // get the next digit from the stack
    return str;
}
The function is by far longer than its recursive counterpart; but, as I said, sometimes it is the one we
want to use and sometimes it is the only one we may actually use.
Introduction to Polynomials:
Definition: A polynomial is an expression of the form
an·x^n + an−1·x^(n−1) + … + a2·x^2 + a1·x + a0,
where x is a variable, n is a non-negative integer
and a0, a1, …, an−1, an are constants. The highest power of x that occurs is called the degree of the
polynomial. The terms are usually written in order from the highest power of x to the lowest.
Examples:
Any quadratic is a polynomial of degree 2.
x^4 − 8x^2 is a polynomial of degree 4.
x^5 − 8x^3 + 10x + 6 is a polynomial of degree 5.
A polynomial is an expression consisting of variables and coefficients that involves only the operations
of addition, subtraction, multiplication and non-negative integer exponents:
an·x^n + an−1·x^(n−1) + … + a2·x^2 + a1·x + a0
We can then call the output of the function "y" and make a graph of y versus x; the result is a smooth
curve. The graph of a polynomial function oscillates smoothly up
and down several times before finally "taking off for good" in either the up or down direction.
The degree of the polynomial gives the maximum number of "ups and downs" that the graph of the
polynomial can have. It also gives the maximum number of crossings of the x axis that the polynomial
can have.
Polynomial equation
If we set the polynomial equal to zero, or if we set y = 0 or f(x) = 0, then we get a so-called polynomial
equation:
an·x^n + an−1·x^(n−1) + … + a2·x^2 + a1·x + a0 = 0.
(Note that setting y = 0 in the polynomial's graph means that we are looking at points where the graph
crosses the x axis, and setting f(x) = 0 in the polynomial function means that we are looking for values
of x for which the output of the polynomial function is zero.)
There is a close connection between:
The values of x that cause a polynomial to equal zero.
The places where a polynomial function‘s graph crosses the x axis.
The solutions of a polynomial equation.
The factors of a polynomial.
This connection is made formal by the Factor theorem and the Fundamental theorem of algebra.
Notes on the Factor theorem and the Fundamental theorem of algebra:
The factor theorem essentially says that finding the roots of a polynomial equation amounts to
finding the factors of the polynomial and vice versa.
If a is a real root, then x − a is a factor of the polynomial and the graph of the polynomial
crosses the x axis at x = a.
If a is a double root, then (x − a)^2 is a factor of the polynomial and the graph of the polynomial
just touches the x axis at x = a rather than crossing it.
Over the real numbers, complex roots are not allowed and must be excluded from the count of
roots. That is why there may be less than n solutions over the real numbers.
If a + b i is a complex root then so is a − b i and the expression (x − a + b i) (x − a − b i) is a
factor of the polynomial. Over the real numbers this complex expression is not allowed and is
replaced by the real quadratic expression x^2 − 2ax + (a^2 + b^2) (which is the previous
expression, but in unfactored form). A complex root a + b i does not correspond to a graph
crossing of the x axis.
Extension of Polynomials:
A polynomial extension operator is an extension operator with the additional property that whenever
the function on ∂K to be extended is the trace of a polynomial on K, the extended function is also
a polynomial. u is a polynomial of degree at most p on K.
INTERPOLATION PROBLEM:
The process of fitting a function through given data is called interpolation. Usually when we
have data, we don‘t know the function f(x) that generated the data. So we fit a certain class of functions.
The most usual class of functions fitted through data are polynomials. Polynomials are fitted through
data when we don‘t know f(x).The process of fitting a polynomial through given data is called
polynomial interpolation. Polynomials are often used because they have the property of approximating
any continuous function.
Given:
f(x) continuous on [a,b], ε>0 (called tolerance), then there is a polynomial P(x) of appropriate
degree which approximates the function within the given tolerance. Interpolation,
in mathematics, the determination or estimation of the value of f(x), or a function of x, from certain
known values of the function.
If x0 < … < xn and y0 = f(x0),…, yn = f(xn) are known and if x0 < x < xn, then the estimated value
of f(x) is said to be an interpolation. If x < x0 or x > xn, the estimated value of f(x) is said to be an
extrapolation.If x0, …, xn are given, along with corresponding values y0, …, yn (see the figure),
interpolation may be regarded as the determination of a function y = f(x) whose graph passes through
the n + 1 points, (xi, yi) for i = 0, 1, …, n. There are infinitely many such functions, but the simplest is
a polynomial interpolation function
y = p(x) = a0 + a1x + … + an·x^n
with constant ai‘s such that p(xi) = yi for i = 0, …, n.
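To make the idea concrete, here is a small Python sketch of Lagrange's form of the interpolating polynomial, one of several equivalent ways to build p(x); the function name lagrange_interpolate is mine.

def lagrange_interpolate(xs, ys, x):
    # evaluate the unique polynomial of degree <= n through the points (xs[i], ys[i]) at x
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# three points on y = x^2; interpolating at x = 4 reproduces 16
print(lagrange_interpolate([1, 2, 3], [1, 4, 9], 4))   # 16.0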
DISCRETE FOURIER TRANSFORM:
Definition:
Let x0, …, xN−1 be complex numbers. The DFT is defined by the formula
X_k = Σ_{n=0}^{N−1} x_n · e^(−i2πkn/N),   k = 0, …, N − 1
Any function that periodically repeats itself can be expressed as the sum of sines and/or cosines of
different frequencies, each multiplied by a different coefficient (Fourier series).
Even functions that are not periodic (but whose area under the curve is finite) can be expressed as the
integral of sines and/or cosines multiplied by a weighting function (Fourier transform).
The frequency domain refers to the plane of the two dimensional discrete Fourier transform of an
image. The purpose of the Fourier transform is to represent a signal as a linear combination of
sinusoidal signals of various frequencies. The one-dimensional Fourier transform and its inverse
Fourier transform (continuous case):
F(u) = ∫ f(x) e^(−j2πux) dx,   where j = √(−1)
Inverse Fourier transform:
f(x) = ∫ F(u) e^(j2πux) du
Since e^(jθ) = cos θ + j sin θ and cos(−θ) = cos θ, the discrete Fourier transform can be redefined as
F(u) = (1/M) Σ_{x=0}^{M−1} f(x) [cos(2πux/M) − j sin(2πux/M)],   for u = 0, 1, 2, …, M − 1
– Frequency (time) domain: the domain (values of u) over which the values of F(u) range,
because u determines the frequency of the components of the transform.
– Frequency (time) component: each of the M terms of F(u).
– F(u) can be expressed in polar coordinates:
F(u) = |F(u)| e^(jφ(u))
where |F(u)| = [R^2(u) + I^2(u)]^(1/2) (magnitude or spectrum)
φ(u) = tan^(−1)[I(u)/R(u)] (phase angle or phase spectrum)
– R(u): the real part of F(u)
– I(u): the imaginary part of F(u)
– Power spectrum:
P(u) = |F(u)|^2 = R^2(u) + I^2(u)
• The two-dimensional Fourier transform and its inverse
– Fourier transform (discrete case, DFT):
F(u, v) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) e^(−j2π(ux/M + vy/N)),
for u = 0, 1, 2, …, M − 1 and v = 0, 1, 2, …, N − 1
– Inverse Fourier transform:
f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) e^(j2π(ux/M + vy/N))
|F(u, v)| = [R^2(u, v) + I^2(u, v)]^(1/2) (spectrum)
φ(u, v) = tan^(−1)[I(u, v)/R(u, v)] (phase angle)
P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v) (power spectrum)
– R(u, v): the real part of F(u, v)
– I(u, v): the imaginary part of F(u, v)
• Some properties of Fourier transform:
f(x, y)·(−1)^(x+y) ⇔ F(u − M/2, v − N/2)   (shift)
F(0, 0) = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y)   (average)
In mathematics, the discrete Fourier transform (DFT) converts a finite sequence of equally-
spaced samples of a function into a same-length sequence of equally-spaced samples of the discrete-
time Fourier transform (DTFT), which is a complex-valued function of frequency.
The interval at which the DTFT is sampled is the reciprocal of the duration of the input sequence.
Discrete Fourier Transform (DFT) is the discrete version of the Fourier Transform (FT)
that transforms a signal (or discrete sequence) from the time domain representation to its
representation in the frequency domain. Whereas, Fast Fourier Transform (FFT) is any efficient
algorithm for calculating the DFT.
An inverse DFT is a Fourier series, using the DTFT samples as coefficients of complex sinusoids at
the corresponding DTFT frequencies. It has the same sample-values as the original input sequence.
The DFT is a frequency domain representation of the original input sequence.
If the original sequence spans all the non-zero values of a function, its DTFT is continuous (and
periodic) and the DFT provides discrete samples of one cycle. If the original sequence is one cycle
of a periodic function, the DFT provides all the non-zero values of one DTFT cycle. The DFT is the
most important discrete transform, used to perform Fourier analysis in many practical applications.
In digital signal processing, the function is any quantity or signal that varies over time, such as the
pressure of a sound wave, a radio signal, or daily temperature readings, sampled over a finite time
interval (often defined by a window function).
The samples can be the values of pixels along a row or column of a raster image. The DFT is also
used to efficiently solve partial differential equations and to perform other operations such
as convolutions or multiplying large integers. Since it deals with a finite amount of data, it can be
implemented in computers by numerical algorithms or even dedicated hardware.
These implementations usually employ efficient fast Fourier transform (FFT) algorithms so much so
that the terms "FFT" and "DFT" are often used interchangeably. Prior to its current usage, the
"FFT" initialisation may have also been used for the ambiguous term "finite Fourier transform".
Definition:
X_k = Σ_{n=0}^{N−1} x_n · e^(−i2πkn/N) = Σ_{n=0}^{N−1} x_n · [cos(2πkn/N) − i·sin(2πkn/N)],   k = 0, …, N − 1
Where the last expression follows from the first one by Euler's formula.
The transform is sometimes denoted by the script symbol F, as in X = F{x}, F(x) or Fx.
The discrete analog of the formula for the coefficients of a Fourier series is
x_n = (1/N) Σ_{k=0}^{N−1} X_k · e^(i2πkn/N),   n ∈ Z
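For reference, a direct Python transcription of the analysis and synthesis formulas above (O(N^2) work, no FFT; the function names are mine) might look like this:

import cmath

def dft(x):
    # X_k = sum over n of x_n * exp(-i*2*pi*k*n/N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # x_n = (1/N) * sum over k of X_k * exp(+i*2*pi*k*n/N)
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1, 2, 3, 4]
print([round(abs(v), 6) for v in idft(dft(x))])   # recovers [1, 2, 3, 4] up to rounding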
Consider the case of an N-point complex DFT: it takes in N samples of a complex-valued time-domain
waveform and produces an array of N frequency-domain samples. The first entry is the DC (zero-frequency)
component; the next entries are the positive frequency components, with the component at index N/2
corresponding to the Nyquist frequency (which is equal to half of the sampling frequency). The remaining
entries are the negative frequency components (note: negative frequency components are phasors rotating
in the opposite direction; they can be optionally omitted depending on the application). The corresponding
synthesis equation reconstructs the time-domain samples from the frequency-domain samples.
From these equations we can see that the real DFT is computed by projecting the signal on cosine and
sine basis functions, whereas the complex DFT projects the input signal on exponential basis
functions (Euler's formula connects these two concepts). When the input signal in the time domain is
real valued, the complex DFT zero-fills the imaginary part during computation (that is its flexibility, and
it avoids the caveats needed for a real DFT). Tools such as Matlab compute this complex DFT when
producing raw FFT results.
Evaluating this definition directly requires O(N^2) operations: there are N outputs X_k, and each output
requires a sum of N terms.
An FFT is any method to compute the same results in O(N log N) operations. All known FFT
algorithms require Θ(N log N) operations, although there is no known proof that a lower complexity
is impossible.
To illustrate the savings of an FFT, consider the count of complex multiplications and additions for
N=4096 data points.
Evaluating the DFT's sums directly involves N2 complex multiplications and N(N − 1) complex
additions, of which O(N) operations can be saved by eliminating trivial operations such as
multiplications by 1, leaving about 30 million operations.
On the other hand, the radix-2 Cooley–Tukey algorithm, for N a power of 2, can compute the same
result with only (N/2)log2(N) complex multiplications (again, ignoring simplifications of
multiplications by 1 and similar) and N log2(N) complex additions, in total about 30,000 operations
- a thousand times less than with direct evaluation. In practice, actual performance on modern
computers is usually dominated by factors other than the speed of arithmetic operations and the
analysis is a complicated subject, but the overall improvement from O(N^2) to O(N log N) remains.
Cooley–Tukey algorithm
By far the most commonly used FFT is the Cooley–Tukey algorithm. This is a divide and conquer
algorithm that recursively breaks down a DFT of any composite size N = N1·N2 into many smaller
DFTs of sizes N1 and N2, along with O(N) multiplications by complex roots of unity traditionally
called twiddle factors.
This method (and the general idea of an FFT) was popularized by a publication of Cooley and
Tukey in 1965, but it was later discovered that those two authors had independently re-invented an
algorithm known to Carl Friedrich Gauss around 1805 (and subsequently rediscovered several times
in limited forms).
The best known use of the Cooley–Tukey algorithm is to divide the transform into two pieces of
size N/2 at each step and is therefore limited to power-of-two sizes, but any factorization can be
used in general (as was known to both Gauss and Cooley/Tukey).
These are called the radix-2 and mixed-radix cases, respectively (and other variants such as
the split-radix FFT have their own names as well).
Although the basic idea is recursive, most traditional implementations rearrange the algorithm to
avoid explicit recursion.
Also, because the Cooley–Tukey algorithm breaks the DFT into smaller DFTs, it can be combined
arbitrarily with any other algorithm for the DFT.
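The following Python sketch shows the radix-2 divide-and-conquer idea for power-of-two N, using plain recursion rather than the iterative rearrangement most production libraries use; the function name fft is mine.

import cmath

def fft(x):
    # recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])                 # DFT of the even-indexed samples
    odd = fft(x[1::2])                  # DFT of the odd-indexed samples
    out = [0] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

print([round(abs(v), 6) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])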
Other FFT algorithms:
Main articles: Prime-factor FFT algorithm, Bruun's FFT algorithm, Rader's FFT
algorithm, Bluestein's FFT algorithm and Hexagonal fast Fourier transform.
There are FFT algorithms other than Cooley–Tukey. Cornelius Lanczos did pioneering work on the
FFT and FFS (fast Fourier sampling method) with G. C. Danielson (1940).
For N = N1N2 with co-prime N1 and N2, one can use the prime-factor (Good–Thomas) algorithm
(PFA), based on the Chinese remainder theorem, to factorize the DFT similarly to Cooley–Tukey
but without the twiddle factors.
The Rader–Brenner algorithm (1976) is a Cooley–Tukey-like factorization but with purely
imaginary twiddle factors, reducing multiplications at the cost of increased additions and
reduced numerical stability.
It was later superseded by the split-radix variant of Cooley–Tukey (which achieves the same
multiplication count but with fewer additions and without sacrificing accuracy).
Algorithms that recursively factorize the DFT into smaller operations other than DFTs include the
Bruun and QFT algorithms.
The Rader–Brenner and QFT algorithms were proposed for power-of-two sizes, but it is possible
that they could be adapted to general composite N. Bruun's algorithm applies to arbitrary even
composite sizes.
Bruun's algorithm, in particular, is based on interpreting the FFT as a recursive factorization of
the polynomial z^N − 1, here into real-coefficient polynomials of the form z^M − 1 and z^(2M) + a·z^M + 1.
Schonhage-Strassen’s Algorithm:
The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for
large integers. It was developed by Arnold Schönhage and Volker Strassen in 1971.
The run-time bit complexity is, in Big O notation, O(n · log n · log log n) for two n-digit numbers. The
algorithm uses recursive Fast Fourier transforms in rings with 2^n + 1 elements, a specific type of
number-theoretic transform.
Basic Idea:
First algorithm to use the FFT for integer multiplication (Schönhage and Strassen, 1971).
Uses the ring R_n = Z/(2^n + 1)Z for the transform, with l = 2^k | n.
Then 2^n ≡ −1 (mod 2^n + 1), so 2^(n/l) ∈ R_n is a 2l-th root of unity, and multiplication by powers of
2 is fast (O(n)).
This allows a length-l weighted transform for a negacyclic convolution.
Write the inputs as a = A(2^M), b = B(2^M), compute C(x) = A(x)·B(x) (mod x^l + 1). Then c = C(2^M) =
ab (mod 2^(Ml) + 1).
Point-wise products modulo 2^n + 1 use SSA recursively: choose the next level's l' and M' so that
M'·l' = n.
Schonhage-Strassen's Algorithm:
SSA reduces multiplication of two S-bit integers to l multiplications of approximately 4S/l-bit integers
(a toy sketch of this reduction follows the list below).
Example: multiply two numbers a, b of 2^20 bits each ⇒ the product has at most 2^21 bits.
1. Choose N = 2^21 and a good l, for this example l = 512. We compute ab mod (2^N + 1).
2. Write a as a polynomial of degree l with coefficients ai < 2^M, where M = N/l, so that a = a(2^M). Same for b.
3. ab = a(2^M)·b(2^M) mod (2^N + 1); compute c(x) = a(x)·b(x) mod (x^l + 1).
4. Convolution theorem: Fourier transform and point-wise multiplication.
5. The FFT needs an l-th root of unity: map to Z/(2^n + 1)Z[x] with l | n. Then 2^(2n/l) has order l.
6. We need 2^n + 1 > ci: choose n ≥ 2M + log2(l) + 1.
7. Compute c(x) = a(x)·b(x) mod (x^l + 1), evaluate ab = c(2^M), and we are done!
8. Benefits: the root of unity is a power of 2, reduction mod (2^n + 1) is fast, and point-wise products
can use SSA recursively without padding.
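To see only the reduction described in steps 2, 3 and 7 (split into base-2^M digits, multiply the coefficient polynomials, evaluate back at 2^M), here is a toy Python sketch. It uses a plain schoolbook convolution in place of the FFT and negacyclic machinery, so it illustrates the data flow of SSA, not its speed; the function name and the choice M = 16 are mine.

def multiply_via_polynomials(a, b, M=16):
    # write a and b in base 2^M: a = A(2^M), b = B(2^M)
    base = 1 << M
    A, B = [], []
    while a:
        A.append(a % base); a //= base
    while b:
        B.append(b % base); b //= base
    A, B = A or [0], B or [0]
    # C(x) = A(x) * B(x), here by schoolbook convolution (SSA would use an FFT instead)
    C = [0] * (len(A) + len(B) - 1)
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            C[i + j] += ai * bj
    # evaluate c = C(2^M) to recover the integer product
    return sum(c << (M * k) for k, c in enumerate(C))

x, y = 123456789123456789, 987654321987654321
print(multiply_via_polynomials(x, y) == x * y)   # True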
Mersenne Transform:
The convolution theorem implies reduction mod (x^l − 1).
A convolution mod (x^l + 1) needs weights θ^i with ord(θ) = 2l, and needs l | n to get a 2l-th root of
unity in R_n.
Computing ab mod (2^N + 1) allows recursive use of SSA, but this is not required at the top level.
Map a and b to Z/(2^N − 1)Z instead: compute c(x) = a(x)·b(x) mod (x^l − 1).
The condition relaxes to l | 2n: twice the transform length, smaller n.
There is no need to apply/unapply weights.
CRT Reconstruction:
At least one of the pairs (2^(rN) − 1, 2^(sN) + 1) and (2^(rN) + 1, 2^(sN) − 1) is co-prime.
Our implementation uses (2^(rN) + 1, 2^N − 1): always co-prime, good speed.
Smaller convolution, finer-grained parameter selection.
The √2 Trick
If 4 | n, then 2 is a quadratic residue in Z/(2^n + 1)Z.
In that case, √2 ≡ 2^(3n/4) − 2^(n/4): a simple form, so multiplication by √2 takes only 2 shifts and 1
subtraction modulo 2^n + 1.
This offers a root of unity of order 4n, allowing l | 2n for the Fermat transform and l | 4n for the
Mersenne transform.
Sadly, higher roots of 2 are usually not available in Z/(2^n + 1)Z, or have no simple form.
Arithmetic modulo 2^n + 1:
Residues are stored semi-normalized (< 2^(n+1)), each with m = n/w full words plus one extra
word ≤ 1.
Adding two semi-normalized values: c = a[m] + b[m] + mpn_add_n(r, a, b, m); r[m] = (r[0] <
c); MPN_DECR_U(r, m + 1, c - r[m]); This assures r[m] = 0 or 1, and c − r[m] > r[0] ⇒ r[m] = 1, so
carry propagation must terminate.
There is a conditional branch only in the mpn_add_n loop and (almost always non-taken) in the carry
propagation of MPN_DECR_U. Subtraction and multiplication by 2^k are handled similarly.
UNIT-5
LINEAR PROGRAMMING
Linear programming (LP) is a method to achieve the optimum outcome under some requirements
represented by linear relationships. LP is an optimization technique for a system of linear constraints
and a linear objective function. An objective function defines the quantity to be optimized and the goal
of linear programming is to find the values of the variables that maximize or minimize the objective
function subject to some linear constraints. In general, the standard form of an LP consists of a linear
objective function to be maximized or minimized, a set of linear constraints, and non-negativity
restrictions on the decision variables. Formulating a problem as an LP rests on the following assumptions:
1. Proportionality: The contribution of each decision variable to the objective function and to each
constraint is directly proportional to the value of that variable.
2. Additivity: Additivity means that the total value of the objective function and each constraint
function is obtained by adding up the individual contributions from each variable.
3. Divisibility. The decision variables are allowed to take on any real numerical values within some
range specified by the constraints. That is, the variables are not restricted to integer values. When
fractional values do not make a sensible solution, such as the number of flights an airline should have
each day between two cities, the problem should be formulated and solved as an integer program.
4. Certainty: We assume that the parameter values in the model are known with certainty or are at least
treated that way. The optimal solution obtained is optimal for the specific problem formulated. If the
parameter values are wrong, then the resulting solution is of little value. In practice, the assumptions
of proportionality and additivity need the greatest care and are most likely to be violated by the
modeler. With experience, we recognize when integer solutions are needed and the variables must be
modeled explicitly.
Solution
The quantities that the manager controls are the amounts of each grain to feed each sheep daily. We
define xj = number of pounds of grain j (j = 1, 2, 3) to feed each sheep daily. Note that the units of
measure are completely specified. In addition, the variables are expressed on a per-sheep basis. If we
minimize the cost per sheep, we minimize the cost for any group of sheep. The daily feed cost per sheep
will be
Σj (cost per lb of grain j) × (lb of grain j fed to each sheep daily)
That is, the objective function is to
Minimize z = 41x1 + 36x2 + 96x3
Why can't the manager simply make all the variables equal to zero? This keeps costs at zero, but the
manager would have a flock of dead sheep, because there are minimum nutrient constraints that must be
satisfied. The values of the variables must be chosen so that the number of units of nutrient A consumed
daily by each sheep is equal to or greater than 110. Expressing this in terms of the variables yields
20x1 + 30x2 + 70x3 ≥ 110
The constraints for the other nutrients are
10x1 + 10x2 ≥ 18
50x1 + 30x2 ≥ 90
6x1 + 2.5x2 + 10x3 ≥ 14
and finally all xj ≥ 0.
The optimal solution to this problem (obtained using a computer software package) is x1 = 0.595,
x2 = 2.008, x3 = 0.541 and z ≈ 148.6 cents.
THE GEOMETRY OF LINEAR PROGRAMS
The characteristic that makes linear programs easy to solve is their simple geometric structure. Let‘s
define some terminology. A solution for a linear program is any set of numerical values for the
variables. These values need not be the best values and do not even have to satisfy the constraints or
make sense. For example, in the Healthy Pet Food problem, M = 25 and Y = -800 is a solution, but it
does not satisfy the constraints, nor does it make physical sense. A feasible solution is a solution that
satisfies all of the constraints. The feasible set or feasible region is the set of all feasible solutions.
Finally, an optimal solution is the feasible solution that produces the best objective function value
possible.
(Figure: Optimal Solutions ⊆ Feasible Solutions ⊆ All Solutions)
The above figure shows the relationships among these types of solutions. Let‘s use the Healthy Pet
Food example to show the geometry of linear programs and to show how two-variable problems can be
solved graphically. The linear programming formulation for the Healthy Pet Food problem is:
Maximize z = 0.65M + 0.45Y
Subject to 2M + 3Y ≤ 400,000
3M + 1.5Y ≤ 300,000
M ≤ 90,000
M, Y ≥0
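Assuming the SciPy library is available, the Healthy Pet Food model above can be checked numerically with scipy.optimize.linprog, as in the sketch below; linprog minimizes, so the objective is negated, and the variable ordering [M, Y] is my own choice, not part of the original formulation.

from scipy.optimize import linprog

# maximize 0.65*M + 0.45*Y  <=>  minimize -(0.65*M + 0.45*Y)
c = [-0.65, -0.45]
A_ub = [[2, 3],      # 2M + 3Y   <= 400,000
        [3, 1.5],    # 3M + 1.5Y <= 300,000
        [1, 0]]      # M         <= 90,000
b_ub = [400_000, 300_000, 90_000]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # expected: roughly M = 50,000, Y = 100,000, z = 77,500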
The fundamental theorem of linear programming reduces to a finite value the number of feasible
solutions that need to be evaluated. One solution strategy might be to identify the coordinates of
every extreme point and then evaluate the objective function at each. The one that produces the best
objective function value is the optimum. In practice, this approach is not efficient because the
number of extreme points can be very large for real problems with hundreds or thousands of
variables and constraints.
The simplex algorithm begins by identifying an initial extreme point of the feasible set. The
algorithm then looks along each edge intersecting at the extreme point and computes the net effect
on the objective function if we were to move along the edge.
If the objective function value does not improve by moving along at least one of these edges, it can
be proved that the extreme point is optimal. If movement along one or more of the edges improves
the objective function value, we move along one of these edges until we reach a new extreme point.
We repeat the previous steps: checking along each edge intersecting at the extreme point and then
either stopping or sliding along another edge that improves the objective function value.
If an NP-hard problem can be solved in polynomial time, then all NP-complete problems can be
solved in polynomial time.
All NP-complete problems are NP-hard, but not all NP-hard problems are NP-complete.
The class of NP-hard problems is very rich in the sense that it contains many problems from a wide
variety of disciplines
The machines executing such operations are allowed to choose any one of these outcomes, subject to a
termination condition.
This leads to the concept of non-deterministic algorithms.
To specify such algorithms in SPARKS, we introduce three statements:
Choice(S) - arbitrarily chooses one of the elements of the set S.
Failure - signals an unsuccessful completion.
Success - signals a successful completion.
Whenever there is a set of choices that leads to a successful completion then one such set of
choices is always made and the algorithm terminates.
A non-deterministic algorithm terminates unsuccessfully if and only if there exists no set of
choices leading to a successful signal.
Nondeterministic algorithms
A non deterministic algorithm consists of
Phase 1: guessing
Phase 2: checking
If the checking stage of a non-deterministic algorithm is of polynomial time complexity, then
this algorithm is called an NP (non-deterministic polynomial) algorithm.
NP problems (must be decision problems):
e.g. searching, MST, sorting, the satisfiability problem (SAT), the travelling salesperson problem
(TSP)
Example of NP-Complete and NP-Hard Problem
Chromatic Number Decision Problem (CNP)
i. A coloring of a graph G = (V, E) is a function f : V → {1, 2, …, k} defined for every vertex i ∈ V.
ii. If (u, v) ∈ E then f(u) ≠ f(v).
iii. The CNP is to determine whether G has such a coloring for a given k.
iv. Satisfiability with at most three literals per clause reduces to the chromatic number problem, so CNP is NP-hard.
CNF-satisfiability with at most three literals per clause is NP-hard. If each clause is restricted to
have at most two literals, then CNF-satisfiability is polynomially solvable.
Generating optimal code for a parallel assignment statement is NP-hard; however, if the expressions
are restricted to be simple variables, then optimal code can be generated in polynomial time.
Generating optimal code for level-one directed acyclic graphs is NP-hard, but optimal code for trees
can be generated in polynomial time. Determining whether a planar graph is three-colorable is NP-hard;
determining whether it is two-colorable is a polynomial-complexity problem (one only has to check
whether it is bipartite).
General Definitions
P, NP, NP-hard, NP-easy and NP-complete... - Polynomial-time reduction Examples of NP-complete
problems
P - Decision problems that can be solved in polynomial time - they can be solved
"efficiently".
NP - Decision problems whose "YES" answer can be verified in polynomial time, if we already
have the proof (or witness).
Co-NP - Decision problems whose "NO" answer can be verified in polynomial time, if we
already have the proof (or witness). E.g. the satisfiability problem (SAT): given a Boolean
formula, is it possible to assign the inputs x1, …, xn so that the formula evaluates to TRUE?
If the answer is YES with a proof (i.e. an assignment of input value), then we can check the
proof in polynomial time (SAT is in NP). We may not be able to check the NO answer in
polynomial time. (Nobody really knows.)
NP-hard - A problem is NP-hard if and only if a polynomial-time algorithm for it implies a
polynomial-time algorithm for every problem in NP. NP-hard problems are at least as hard as NP
problems.
NP-complete - A problem is NP-complete if it is NP-hard and is an element of NP.
If a polynomial time algorithm exists for any of these problems, all problems in NP would be
polynomial time solvable. These problems are called NP-complete. The phenomenon of NP-
completeness is important for both theoretical and practical reasons.
Definition of NP-Completeness
A language B is NP-complete if it satisfies two conditions
B is in NP
Every A in NP is polynomial time reducible to B.
If a language satisfies the second property, but not necessarily the first one, the language B is known
as NP-Hard. Informally, a search problem B is NP-Hard if there exists some NP-
Complete problem A that Turing reduces to B.
A problem that is NP-Hard cannot be solved in polynomial time unless P = NP. If a problem is proved to
be NPC, there is no need to waste time trying to find an efficient exact algorithm for it. Instead, we can
focus on designing approximation algorithms.
NP-Complete Problems
Following are some NP-Complete problems, for which no polynomial time algorithm is known.
Determining whether a graph has a Hamiltonian cycle
Determining whether a Boolean formula is satisfiable, etc.
NP-Hard Problems
The following problems are NP-Hard
The circuit-satisfiability problem
Set Cover
Vertex Cover
Travelling Salesman Problem
In this context, we will now show that TSP is NP-Complete.
TSP is NP-Complete
The traveling salesman problem consists of a salesman and a set of cities. The salesman has to visit
each one of the cities starting from a certain one and returning to the same city. The challenge of the
problem is that the traveling salesman wants to minimize the total length of the trip
Proof
To prove TSP is NP-Complete, first we have to prove that TSP belongs to NP. Given an instance of the
decision version of the problem, we use the sequence of cities in the tour as a certificate: we check that the
tour contains each vertex exactly once, sum the costs of its edges, and check whether the total cost is at
most the given bound. This can be completed in polynomial time. Thus TSP belongs to NP.
Secondly, we have to prove that TSP is NP-hard. To prove this, one way is to show that Hamiltonian
cycle ≤p TSP (as we know that the Hamiltonian cycle problem is NP-complete).
Assume G = (V, E) to be an instance of Hamiltonian cycle.
Hence, an instance of TSP is constructed. We create the complete graph G' = (V, E'), where
E′ = {(i, j) : i, j ∈ V and i ≠ j}
Thus, the cost function is defined as follows −
t(i, j) = 0 if (i, j) ∈ E, and t(i, j) = 1 otherwise.
Now, suppose that a Hamiltonian cycle h exists in G. It is clear that the cost of each edge
in h is 0 in G' as each edge belongs to E. Therefore, h has a cost of 0 in G'. Thus, if graph G has a
Hamiltonian cycle, then graph G' has a tour of 0 cost.
Conversely, we assume that G' has a tour h' of cost at most 0. The cost of edges in E' are 0 and 1 by
definition. Hence, each edge must have a cost of 0 as the cost of h' is 0. We therefore conclude
that h' contains only edges in E.We have thus proven that G has a Hamiltonian cycle, if and only
if G' has a tour of cost at most 0. TSP is NP-complete.
APPROXIMATION ALGORITHM:
Introduction:
An approximation algorithm is a way of dealing with NP-completeness for optimization
problems. The goal of an approximation algorithm is to come as close as possible to the optimum value
in a reasonable amount of time, which is at most polynomial time. Such algorithms are called
approximation algorithms or heuristic algorithms.
o For the traveling salesperson problem, the optimization problem is to find the shortest cycle, and
the approximation problem is to find a short cycle.
o For the vertex cover problem, the optimization problem is to find the vertex cover with fewest
vertices and the approximation problem is to find the vertex cover with few vertices.
Performance Ratios:
Suppose we work on an optimization problem where every solution carries a cost. An Approximate
Algorithm returns a legal solution, but the cost of that legal solution may not be optimal.
For Example, suppose we are considering for a minimum size vertex-cover (VC). An approximate
algorithm returns a VC for us, but the size (cost) may not be minimized.
Another Example is we are considering for a maximum size Independent set (IS). An approximate
algorithm returns an IS for us, but the size (cost) may not be maximum. Let C be the cost of the
solution returned by an approximate algorithm and C* be the cost of the optimal solution. We say the
approximate algorithm has an approximation ratio P(n) for an input of size n if
max(C/C*, C*/C) ≤ P(n).
Intuitively, the approximation ratio measures how bad the approximate solution is compared with
the optimal solution. A large (small) approximation ratio means the solution is much worse than
(more or less the same as) an optimal solution.
Observe that P (n) is always ≥ 1, if the ratio does not depend on n, we may write P. Therefore, a 1-
approximation algorithm gives an optimal solution. Some problems have polynomial-time
approximation algorithm with small constant approximate ratios, while others have best-known
polynomial time approximation algorithms whose approximate ratios grow with n.
Vertex Cover
A Vertex Cover of a graph G is a set of vertices such that each edge in G is incident to at least one of
these vertices. The decision vertex-cover problem was proven NPC. Now, we want to solve the optimal
version of the vertex cover problem, i.e., we want to find a minimum size vertex cover of a given graph.
We call such vertex cover an optimal vertex cover C*.
An approximate algorithm for vertex cover:
Approx-Vertex-Cover (G = (V, E))
{
    C = ∅;
    E' = E;
    while (E' is not empty)
    {
        Let (u, v) be an arbitrary edge of E';
        Add u and v to C;
        Remove from E' all edges incident to u or v;
    }
    Return C;
}
The idea is to take an edge (u, v) one by one, put both vertices to C, and remove all the edges incident
to u or v. We carry on until all edges have been removed. C is a VC. But how good is C?
For the example graph (figure omitted), the algorithm returns VC = {b, c, d, e, f, g}.
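A short Python sketch of this procedure is given below; the edge-list input, the function name approx_vertex_cover and the sample graph are my own choices and do not reproduce the figure used in these notes.

def approx_vertex_cover(edges):
    # repeatedly take an uncovered edge (u, v) and put both endpoints in the cover
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")]
print(approx_vertex_cover(edges))   # one possible cover; the exact set depends on the edge order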
Traveling-salesman Problem
In the traveling salesman problem, a salesman must visit n cities. We can say that the salesman wishes
to make a tour or Hamiltonian cycle, visiting each city exactly once and finishing at the city he starts
from. There is a non-negative cost c(i, j) to travel from city i to city j.
The goal is to find a tour of minimum cost. We assume that every two cities are connected. Such
problems are called Traveling-salesman problem (TSP).
We can model the cities as a complete graph of n vertices, where each vertex represents a city.It can
be shown that TSP is NPC.
If we assume the cost function c satisfies the triangle inequality, then we can use the following
approximate algorithm.
Triangle inequality
Let u, v, w be any three vertices; then c(u, w) ≤ c(u, v) + c(v, w).
One important observation to develop an approximate solution is if we remove an edge from H*, then
the tour becomes a spanning tree.
Algorithm:
Approx-TSP (G = (V, E))
{
    1. Compute an MST T of G;
    2. Select any vertex r to be the root of the tree;
    3. Let L be the list of vertices visited in a preorder tree walk of T;
    4. Return the Hamiltonian cycle H that visits the vertices in the order L;
}
Intuitively, Approx-TSP first makes a full walk of MST T, which visits each edge exactly two times. To
create a Hamiltonian cycle from the full walk, it bypasses some vertices (which corresponds to making
a shortcut)
RANDOMIZED ALGORITHM
What is a Randomized Algorithm?
An algorithm that uses random numbers to decide what to do next anywhere in its logic is called
Randomized Algorithm.
For example, in Randomized Quick Sort, we use random number to pick the next pivot (or we
randomly shuffle the array) and in Karger‘s algorithm, we randomly pick an edge.
The average of all evaluated running times is the expected worst-case time complexity. The facts below
are generally helpful in the analysis of such algorithms: linearity of expectation, and the expected
number of trials until success.
How many times does the while loop run before finding a central pivot?
The probability that the randomly chosen element is a central pivot is 1/2.
Therefore, the expected number of times the while loop runs is 2.
Thus, the expected time complexity of the pivot-selection step is O(n).
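To make the discussion concrete, here is a small Python sketch of randomized quicksort with the central-pivot retry described above; the 3n/4 threshold follows the analysis, and the function and variable names are my own.

import random

def randomized_quicksort(a):
    if len(a) <= 1:
        return a
    # keep choosing a random pivot until it is a central pivot
    while True:
        pivot = random.choice(a)
        smaller = [x for x in a if x < pivot]
        larger = [x for x in a if x > pivot]
        if len(smaller) <= 3 * len(a) // 4 and len(larger) <= 3 * len(a) // 4:
            break    # expected to happen within about 2 tries
    equal = [x for x in a if x == pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

print(randomized_quicksort([7, 2, 9, 4, 4, 1, 8]))   # [1, 2, 4, 4, 7, 8, 9]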
INTERIOR POINT METHOD
Interior-point methods (also referred to as barrier methods or IPMs) are a certain class
of algorithms that solve linear and nonlinear convex optimization problems.
Example: John von Neumann suggested an interior-point method of linear programming, which was
neither a polynomial-time method nor an efficient method in practice. In fact, it turned out to be slower
than the commonly used simplex method.
An interior point method was discovered by Soviet mathematician I. I. Dikin in 1967 and reinvented in
the U.S. in the mid-1980s. In 1984, Narendra Karmarkar developed a method for linear
programming called Karmarkar's algorithm, which runs in provably polynomial time and is also very
efficient in practice.
It enabled solutions of linear programming problems that were beyond the capabilities of the simplex
method. Contrary to the simplex method, it reaches a best solution by traversing the interior of
the feasible region. The method can be generalized to convex programming based on a self-
concordant barrier function used to encode the convex set.
Any convex optimization problem can be transformed into minimizing (or maximizing) a linear
function over a convex set by converting to the epigraph form.
Self-concordant barrier functions are a special class of such barriers that can be used to encode any
convex set. They guarantee that the number of iterations of the algorithm is bounded by a polynomial
in the dimension and the accuracy of the solution.
Karmarkar's breakthrough revitalized the study of interior-point methods and barrier problems,
showing that it was possible to create an algorithm for linear programming characterized
by polynomial complexity and, moreover, that was competitive with the simplex method.
Already Khachiyan's ellipsoid method was a polynomial-time algorithm; however, it was too slow to
be of practical interest.
The class of primal-dual path-following interior-point methods is considered the most
successful. Mehrotra's predictor–corrector algorithm provides the basis for most implementations of
this class of methods.
Primal dual interior Point Method for non-linear Optimization
The primal-dual method's idea is easy to demonstrate for constrained non-linear optimization. For
simplicity, consider the all-inequality version of a non-linear optimization problem:
(1) minimize f(x) subject to c_i(x) ≥ 0 for i = 1, …, m, where x ∈ R^n.
The logarithmic barrier function associated with (1) is
B(x, µ) = f(x) − µ Σ_{i=1}^{m} log(c_i(x)).
Here µ is a small positive scalar, sometimes called the "barrier parameter". As µ converges to
zero, the minimum of B(x, µ) should converge to a solution of (1).
The barrier function gradient is
g_b = g − µ Σ_{i=1}^{m} (1/c_i(x)) ∇c_i(x),
where g is the gradient of the original function f(x).
1. For a product of N numbers, if we have to subtract a constant K such that the product gets its
maximum value, then subtract K from the largest value, provided that largest value − K is greater
than 0. If we have to subtract a constant K such that the product gets its minimum value, then
subtract K from the smallest value, where smallest value − K should be greater than 0.
2. Goldbach's conjecture: Every even integer greater than 2 can be expressed as the sum of 2 primes.
3. Perfect numbers and amicable numbers: Perfect numbers are those numbers which are equal to the
sum of their proper divisors, for example 6 = 1 + 2 + 3. Amicable numbers are pairs of numbers in
which each number equals the sum of the proper divisors of the other.
4. Lychrel numbers: These are numbers that cannot form a palindrome when repeatedly reversed and
added to themselves. For example, 47 is not a Lychrel number, as 47 + 74 = 121, a palindrome.
5. Lemoine’s Conjecture : Any odd integer greater than 5 can be expressed as a sum of an odd prime
(all primes other than 2 are odd) and an even semi-prime. A semi-prime number is a product of two
prime numbers. This is called Lemoine‘s conjecture.
6. Fermat's Last Theorem: According to the theorem, no three positive integers a, b, c satisfy the
equation a^n + b^n = c^n for any integer value of n greater than 2. For n = 1 and n = 2, the
equation has infinitely many solutions.
RECENT TRENDS IN PROBLEM SOLVING USING SEARCHING
AND SORTING TECHNIQUES IN DATA STRUCTURES:
Searching
Searching is the algorithmic process of finding a particular item in a collection of items. A search
typically answers either True or False as to whether the item is present. It turns out that there are many
different ways to search for the item. What we are interested in here is how these algorithms work and
how they compare to one another.
Analysis of Sequential Search
To analyze searching algorithms, we need to decide on a basic unit of computation. Recall that this is
typically the common step that must be repeated in order to solve the problem. For searching, it makes
sense to count the number of comparisons performed. Each comparison may or may not discover the
item we are looking for. In addition, we make another assumption here.
The list of items is not ordered in any way; the items have been placed randomly into the list. In other
words, the probability that the item we are looking for is in any particular position is exactly the same
for each position of the list. If the item is not in the list, the only way to know it is to compare it against
every item present. If there are n items, then the sequential search requires n comparisons to discover
that the item is not there. In the case where the item is in the list, the analysis is not so straightforward.
There are actually three different scenarios that can occur.
In the best case we will find the item in the first place we look, at the beginning of the list. We
will need only one comparison. In the worst case, we will not discover the item until the very last
comparison, the nth comparison.
What about the average case? On average, we will find the item about halfway into the list;
that is, we will compare against n/2 items. Recall, however, that as n gets large, the coefficients, no
matter what they are, become insignificant in our approximation, so the complexity of the sequential
search is O(n).
We assumed earlier that the items in our collection had been randomly
placed so that there is no relative order between the items. What would happen to the sequential search
if the items were ordered in some way? Would we be able to gain any efficiency in our search
technique? Assume that the list of items was constructed so that the items were in ascending order, from
low to high.
If the item we are looking for is present in the list, the chance of it being in any one of the 𝑛 positions is
still the same as before. We will still have the same number of comparisons to find the item. However,
if the item is not present there is a slight advantage. Figure shows this process as the algorithm looks for
the item 50.Notice that items are still compared in sequence until 54. At this point, however, we know
something extra. Not only is 54 not the item we are looking for, but no other elements beyond 54 can
work either since the list is sorted. In this case, the algorithm does not have to continue looking through
all of the items to report that the item was not found. It can stop immediately.
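A Python version of this ordered sequential search, stopping as soon as a larger item is seen, might look as follows; the function name and the sample list are my own choices.

def ordered_sequential_search(alist, item):
    # alist is assumed to be sorted in ascending order
    for value in alist:
        if value == item:
            return True
        if value > item:        # everything after this point is larger, so stop early
            return False
    return False

testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42, 54, 77]
print(ordered_sequential_search(testlist, 50))   # False, and the scan stops once 54 is reached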
The Binary Search
It is possible to take greater advantage of the ordered list if we are clever with our comparisons. In the
sequential search, when we compare against the first item, there are at most 𝑛 − 1 more items to look
through if the first item is not what we are looking for. Instead of searching the list in sequence, a
binary search will start by examining the middle item. If that item is the one we are searching for, we
are done.
If it is not the correct item, we can use the ordered nature of the list to eliminate half of the remaining
items. If the item we are searching for is greater than the middle item, we know that the entire lower
half of the list as well as the middle item can be eliminated from further consideration. The item, if it is
in the list, must be in the upper half. We can then repeat the process with the upper half. Start at the
middle item and compare it against what we are looking for. Again, we either find it or split the list in
half, therefore eliminating another large part of our possible search space. Figure shows how this
algorithm can quickly find the value 54.
Before we move on to the analysis, we should note that this algorithm is a great example of a divide and
conquer strategy. Divide and conquer means that we divide the problem into smaller pieces, solve the
smaller pieces in some way and then reassemble the whole problem to get the result. When we perform
a binary search of a list, we first check the middle item. If the item we are searching for is less than the
middle item, we can simply perform a binary search of the left half of the original list. Likewise, if the
item is greater, we can perform a binary search of the right half. Either way, this is a recursive call to
the binary search function passing a smaller list.
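A recursive Python sketch of this binary search (slicing the list for clarity, at the cost of extra copying; the names are mine) is:

def binary_search(alist, item):
    if not alist:
        return False
    mid = len(alist) // 2
    if alist[mid] == item:
        return True
    if item < alist[mid]:
        return binary_search(alist[:mid], item)        # search the lower half
    return binary_search(alist[mid + 1:], item)        # search the upper half

testlist = [17, 20, 26, 31, 44, 54, 55, 65, 77, 93]
print(binary_search(testlist, 54))   # True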
Sorting
Sorting is the process of placing elements from a collection in some kind of order. For example, a list of
words could be sorted alphabetically or by length. A list of cities could be sorted by population, by area,
or by zip code. There are many, many sorting algorithms that have been developed and analyzed. This
suggests that sorting is an important area of study in computer science. Sorting a large number of items
can take a substantial amount of computing resources. Like searching, the efficiency of a sorting
algorithm is related to the number of items being processed. For small collections, a complex sorting
method may be more trouble than it is worth. The overhead may be too high. On the other hand, for
larger collections, we want to take advantage of as many improvements as possible. In this section we
will discuss several sorting techniques and compare them with respect to their running time. Before
getting into specific algorithms, we should think about the operations that can be used to analyze a
sorting process. First, it will be necessary to compare two values to see which is smaller (or larger). In
order to sort a collection, it will be necessary to have some systematic way to compare values to see if
they are out of order. The total number of comparisons will be the most common way to measure a sort
procedure. Second, when values are not in the correct position with respect to one another, it may be
necessary to exchange them. This exchange is a costly operation and the total number of exchanges will
also be important for evaluating the overall efficiency of the algorithm.
1. Bubble Sort:
The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those
that are out of order. Each pass through the list places the next largest value in its proper place. In
essence, each item "bubbles" up to the location where it belongs. Figure shows the first pass of a bubble
sort. The shaded items are being compared to see if they are out of order. If there are n items in the list,
then there are n − 1 pairs of items that need to be compared on the first pass. It is important to note that
once the largest value in the list is part of a pair, it will continually be moved along until the pass is
complete. At the start of the second pass, the largest value is now in place. There are n − 1 items left to
sort, meaning that there will be n − 2 pairs. Since each pass places the next largest value in place, the
total number of passes necessary will be n − 1. After completing the n − 1 passes, the smallest item
must be in the correct position with no further processing required. The code below shows the complete
bubble_sort function. It takes the list as a parameter and modifies it by exchanging items as necessary.
The exchange operation, sometimes called a "swap," is slightly different in Python than in most other
programming languages. Typically, swapping two elements in a list requires a temporary storage
location (an additional memory location): a fragment such as temp = a_list[i]; a_list[i] = a_list[j];
a_list[j] = temp will exchange the ith and jth items in the list. Without the temporary storage, one of
the values would be overwritten. In Python, however, the simultaneous assignment
a_list[i], a_list[j] = a_list[j], a_list[i] performs the exchange in a single statement.
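Since the referenced bubble_sort listing does not appear in these notes, the following is a minimal Python sketch consistent with the description above; the parameter name a_list is chosen for illustration.

def bubble_sort(a_list):
    """Repeatedly pass through the list, bubbling the largest remaining value to the end."""
    for pass_num in range(len(a_list) - 1, 0, -1):
        for i in range(pass_num):
            if a_list[i] > a_list[i + 1]:
                # Simultaneous assignment swaps the pair without a temporary variable.
                a_list[i], a_list[i + 1] = a_list[i + 1], a_list[i]

Calling bubble_sort on a list sorts it in place by exchanging adjacent out-of-order items.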
2. Selection Sort:
The selection sort improves on the bubble sort by making only one exchange for every pass through the
list. In order to do this, a selection sort looks for the largest value as it makes a pass and after
completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the
largest item is in the correct place. After the second pass, the next largest is in place. This process
continues and requires n − 1 passes to sort n items, since the final item must be in place after the (n −
1)st pass. Figure shows the entire sorting process. On each pass, the largest remaining item is selected
and then placed in its proper location. The first pass places 93, the second pass places 77, the third
places 55, and so on. The function is shown below.
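The referenced listing is not reproduced here; below is a minimal Python sketch that selects the largest remaining item on each pass, as described above. The parameter name a_list is an assumption for illustration.

def selection_sort(a_list):
    """On each pass, find the largest remaining item and exchange it into its final slot."""
    for fill_slot in range(len(a_list) - 1, 0, -1):
        max_pos = 0
        for i in range(1, fill_slot + 1):
            if a_list[i] > a_list[max_pos]:
                max_pos = i
        # One exchange per pass places the largest remaining value at fill_slot.
        a_list[fill_slot], a_list[max_pos] = a_list[max_pos], a_list[fill_slot]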
The selection sort makes the same number of comparisons as the bubble sort and is therefore also O(n²).
However, due to the reduction in the number of exchanges, the selection sort typically executes faster in
benchmark studies. In fact, for our list, the bubble sort makes 20 exchanges, while the selection sort
makes only 8.
3. Insertion Sort:
The insertion sort, although still O(n²), works in a slightly different way. It always maintains a sorted sub-list
in the lower positions of the list. Each new item is then "inserted" back into the previous sub-list such
that the sorted sub-list is one item larger. Figure shows the insertion sorting process. The shaded items
represent the ordered sub-lists as the algorithm makes each pass. We begin by assuming that a list with
one item (position 0) is already sorted. On each pass, one for each item from position 1 through n − 1, the current item is
checked against those in the already sorted sub-list. As we look back into the already sorted sub-list, we
shift those items that are greater to the right. When we reach a smaller item or the end of the sub-list,
the current item can be inserted. Figure below shows the fifth pass in detail. At this point in the
algorithm, a sorted sub-list of five items consisting of 17, 26, 54, 77, and 93 exists. We want to insert 31
back into the already sorted items. The first comparison against 93 causes 93 to be shifted to the right.
77 and 54 are also shifted. When the item 26 is encountered, the shifting process stops and 31 is placed
in the open position. Now we have a sorted sub-list of six items. The implementation of insertion sort
shows that there are again n − 1 passes to sort n items. The iteration starts at position 1 and moves
through position n − 1, as these are the items that need to be inserted back into the sorted sub-lists. The
shift operation moves a value up one position in the list, making room behind it for
the insertion. Remember that this is not a complete exchange as was performed in the previous
algorithms. The maximum number of comparisons for an insertion sort is the sum of the first n − 1
integers. Again, this is O(n²). However, in the best case, only one comparison needs to be done on each
pass. This would be the case for an already sorted list. One note about shifting versus exchanging is also
important. In general, a shift operation requires approximately a third of the processing work of an
exchange since only one assignment is performed. In benchmark studies, insertion sort will show very
good performance.
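As the listing itself is not included in these notes, the following Python sketch follows the shift-based description above; the parameter name a_list is an assumption for illustration.

def insertion_sort(a_list):
    """Grow a sorted sub-list at the front, shifting larger items right to insert each new value."""
    for index in range(1, len(a_list)):
        current_value = a_list[index]
        position = index
        # Shift greater items one slot to the right (a shift, not a full exchange).
        while position > 0 and a_list[position - 1] > current_value:
            a_list[position] = a_list[position - 1]
            position -= 1
        a_list[position] = current_value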
4. Shell Sort:
The shell sort, sometimes called the "diminishing increment sort," improves on the insertion sort by
breaking the original list into a number of smaller sub-lists, each of which is sorted using an insertion
sort. The unique way that these sub-lists are chosen is the key to the shell sort. Instead of breaking the
list into sub-lists of contiguous items, the shell sort uses an increment i, sometimes called the gap, to
create a sub-list by choosing all items that are i items apart. This can be seen in figure. This list has nine
items. If we use an increment of three, there are three sub-lists, each of which can be sorted by an
insertion sort. After completing these sorts, we get the list shown in figure. Although this list is not
completely sorted, something very interesting has happened. By sorting the sub-lists, we have moved
the items closer to where they actually belong. Figure shows a final insertion sort using an increment of
one; in other words, a standard insertion sort. Note that by performing the earlier sub-list sorts, we have
now reduced the total number of shifting operations necessary to put the list in its final order. For this
case, we need only four more shifts to complete the process. The way in which the increments are
chosen is the unique feature of the shell sort. The shell_sort function shown below uses a different set of
increments. In this case, we begin with n/2 sub-lists. On the next pass, n/4 sub-lists are sorted. Eventually, a
single list is sorted with the basic insertion sort. Figure shows the first sub-lists for our example using
this increment.
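Since the shell_sort listing is not reproduced here, the following Python sketch follows the description above, starting with a gap of n/2 and halving it on each pass. The helper name gap_insertion_sort and the parameter names are assumptions for illustration.

def shell_sort(a_list):
    """Sort gap-separated sub-lists with insertion sort, halving the gap until it reaches one."""
    gap = len(a_list) // 2
    while gap > 0:
        for start in range(gap):
            gap_insertion_sort(a_list, start, gap)
        gap = gap // 2

def gap_insertion_sort(a_list, start, gap):
    """Insertion sort restricted to the items at positions start, start + gap, start + 2*gap, ..."""
    for i in range(start + gap, len(a_list), gap):
        current_value = a_list[i]
        position = i
        while position >= gap and a_list[position - gap] > current_value:
            a_list[position] = a_list[position - gap]
            position -= gap
        a_list[position] = current_value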
5. The Merge Sort:
Merge sort uses a divide and conquer strategy as a way to improve the performance of sorting
algorithms. Merge sort is a recursive algorithm that continually splits a list in half. If the list is empty or
has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list
and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental
operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and
combining them together into a single, sorted, new list. Figure below shows our familiar example list as
it is being split by merge sort. Figure below also shows the simple lists, now sorted, as they are merged
back together. The merge sort function shown below begins by asking the base case question. If the
length of the list is less than or equal to one, then we already have a sorted list and no more processing
is necessary. If, on the other hand, the length is greater than one, then we use the Python slice operation
to extract the left and right halves. It is important to note that the list may not have an even number of
items. That does not matter, as the lengths will differ by at most one.
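The merge_sort function described above is not reproduced in these notes; the following is a minimal Python sketch that splits the list with slicing and merges the sorted halves back into place. The parameter name a_list is an assumption for illustration.

def merge_sort(a_list):
    """Recursively split the list in half, sort each half, then merge the sorted halves."""
    if len(a_list) > 1:
        mid = len(a_list) // 2
        left_half = a_list[:mid]
        right_half = a_list[mid:]
        merge_sort(left_half)
        merge_sort(right_half)
        # Merge the two sorted halves back into a_list.
        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] <= right_half[j]:
                a_list[k] = left_half[i]
                i += 1
            else:
                a_list[k] = right_half[j]
                j += 1
            k += 1
        while i < len(left_half):
            a_list[k] = left_half[i]
            i += 1
            k += 1
        while j < len(right_half):
            a_list[k] = right_half[j]
            j += 1
            k += 1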
6. Quick Sort:
The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using
additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When
this happens, we will see that performance is diminished. A quick sort first selects a value, which is
called the pivot value. Although there are many different ways to choose the pivot value, we will simply
use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual
position where the pivot value belongs in the final sorted list, commonly called the split point, will be
used to divide the list for subsequent calls to the quick sort. Figure below shows that 54 will serve as
our first pivot value. Since we have looked at this example a few times already, we know that 54 will
eventually end up in the position currently holding 31. The partition process will happen next. It will
find the split point and at the same time move other items to the appropriate side of the list, either less
than or greater than the pivot value.
Partitioning begins by locating two position markers – let's call them left_mark and right_mark – at the
beginning and end of the remaining items in the list (positions 1 and 8 in Figure). The goal of the
partition process is to move items that are on the wrong side with respect to the pivot value while also
converging on the split point. Figure shows this process as we locate the position of 54. We begin by
incrementing left_mark until we locate a value that is greater than the pivot value. We then decrement
right_mark until we find a value that is less than the pivot value. At this point we have discovered two
items that are out of place with respect to the eventual split point. For our example, this occurs at 93 and
20. Now we can exchange these two items and then repeat the process again. At the point where
right_mark becomes less than left_mark, we stop. The position of right_mark is now the split point. The
pivot value can be exchanged with the contents of the split point and the pivot value is now in place. In
addition, all the items to the left of the split point are less than the pivot value and all the items to the
right of the split point are greater than the pivot value. The list can now be divided at the split point and
the quick sort can be invoked recursively on the two halves. The quick_sort function shown below
invokes a recursive function, quick_sort_helper. quick_sort_helper begins with the same base case as
the merge sort. If the length of the list is less than or equal to one, it is already sorted. If it is greater,
then it can be partitioned and recursively sorted. The partition function implements the process
described earlier.
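Since the quick_sort, quick_sort_helper, and partition listings are not included here, the following Python sketch follows the partition process described above, using the first item as the pivot; the parameter names are assumptions for illustration.

def quick_sort(a_list):
    """Sort the list in place by recursively partitioning around a pivot value."""
    quick_sort_helper(a_list, 0, len(a_list) - 1)

def quick_sort_helper(a_list, first, last):
    if first < last:
        split_point = partition(a_list, first, last)
        quick_sort_helper(a_list, first, split_point - 1)
        quick_sort_helper(a_list, split_point + 1, last)

def partition(a_list, first, last):
    """Use the first item as the pivot; move left_mark and right_mark toward the split point."""
    pivot_value = a_list[first]
    left_mark = first + 1
    right_mark = last
    done = False
    while not done:
        while left_mark <= right_mark and a_list[left_mark] <= pivot_value:
            left_mark += 1
        while right_mark >= left_mark and a_list[right_mark] >= pivot_value:
            right_mark -= 1
        if right_mark < left_mark:
            done = True
        else:
            # Exchange the two out-of-place items and continue converging.
            a_list[left_mark], a_list[right_mark] = a_list[right_mark], a_list[left_mark]
    # right_mark is the split point; exchange the pivot into its final position.
    a_list[first], a_list[right_mark] = a_list[right_mark], a_list[first]
    return right_mark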
Web Resources:
Unit I
https://www.slideshare.net/mohamedloey/algorithms-lecture-4-sorting-algorithms-i
http://vssut.ac.in/lecture_notes/lecture1428551222.pdf
https://www.tutorialspoint.com/design_and_analysis_of_algorithms/design_and_analysis_of_algorithms_selection_sort.htm
https://yourbasic.org/algorithms/graph/
https://www.tutorialspoint.com/index.htm
https://www.geeksforgeeks.org/breadth-first-search-or-bfs-for-a-graph/
https://www.geeksforgeeks.org/topological-sorting/
https://www.geeksforgeeks.org/strongly-connected-components/
http://www.csc.kth.se/utbildning/kth/kurser/DD1339/inda14/lecture/PDF/lecture01a-correctness.pdf
https://www.cs.cornell.edu/courses/cs3110/2013sp/supplemental/recitations/rec21.html
Unit II
https://codeforces.com/blog/entry/69287
https://inf.ethz.ch/personal/ladickyl/Matroids.pdf
https://www.guru99.com/greedy-algorithm.html
https://www.geeksforgeeks.org/divide-and-conquer/
https://www.tutorialspoint.com/design_and_analysis_of_algorithms/design_and_analysis_of_algorithms_greedy_method.htm
https://www.javatpoint.com/minimum-spanning-tree-introduction
https://www.javatpoint.com/applications-of-minimum-spanning-tree
https://www.javatpoint.com/kruskals-minimum-spanning-tree-algorithm
https://www.javatpoint.com/prims-minimum-spanning-tree-algorithm
https://en.wikipedia.org/wiki/Graph_matching
https://www.tutorialspoint.com/graph_theory/graph_theory_matchings.htm
https://www.geeksforgeeks.org/hopcroft-karp-algorithm-for-maximum-matching-set-1-introduction/
https://www.geeksforgeeks.org/maximum-bipartite-matching/
https://www.ti.inf.ethz.ch/ew/courses/GT03/lectures/PDF/lecture5f.pdf
http://www14.in.tum.de/lehre/2014WS/ea/split/sec-Augmenting-Paths-for-Matchings-single.pdf
https://www.eecs.tufts.edu/~gdicks02/Blossom/Blossom_Algorithm.html
Unit III
https://tutorialspoint.dev/data-structure/graph-data-structure/minimum-cut-in-a-directed-graph
https://www.hackerearth.com/practice/algorithms/graphs/maximum-flow/tutorial/
https://cp-algorithms.com/graph/edmonds_karp.html#toc-tgt-5
https://sahebg.github.io/computersceince/maximum-flow-edmond-karp-algorithm-c-program-example/
https://www.geeksforgeeks.org/strassens-matrix-multiplication/
https://shivathudi.github.io/jekyll/update/2017/06/15/matr-mult.html
http://www.facweb.iitkgp.ac.in/~sourav/Lecture-03.pdf
https://developerinsider.co/introduction-to-divide-and-conquer-algorithm-design-paradigm/
https://www.javatpoint.com/divide-and-conquer-introduction
https://www.includehelp.com/algorithms/divide-and-conquer-paradigm.aspx
http://graphics.ics.uci.edu/ICS6N/NewLectures/Lecture5.pdf
https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations
https://en.wikipedia.org/wiki/LU_decomposition
Unit IV
https://www.javatpoint.com/floyd-warshall-algorithm
https://www.javatpoint.com/dynamic-programming-introduction
http://eiche.theoinf.tu-ilmenau.de/fingerprint/ueber_polynome_modulo-englisch.pdf
http://sabarirb.org/insight/Reading%20Materials/maths/algebra/indet/modrep.htm
http://homepages.math.uic.edu/~leon/mcs425-s08/handouts/chinese_remainder.pdf
http://algo.inria.fr/seminars/sem08-09/kruppa-slides.pdf
Unit V
http://www.uky.edu/~dsianita/300/online/LP.pdf
https://www.techopedia.com/definition/20403/linear-programming-lp
https://optimization.mccormick.northwestern.edu/index.php/Interior-point_method_for_LP
https://www.tutorialspoint.com/design_and_analysis_of_algorithms/design_and_analysis_of_algorithms_np_hard_complete_classes.htm
https://www.slideshare.net/JyotsnaSuryadevara/9-chapter-8-np-hard-and-np-complete-problems
https://www.javatpoint.com/daa-approximate-algorithms
https://www.cs.auckland.ac.nz/compsci105s1c/resources/ProblemSolvingwithAlgorithmsandDataStructures.pdf