Computer Science III
2016/08/05 21:48:04
Version 1.2.0
These are lecture notes used in CSCE 310 (Data Structures & Algorithms) at the
University of Nebraska—Lincoln.
Contents
1 Introduction
2 Algorithm Analysis
   2.1 Introduction
      2.1.1 Example: Computing a Sum
      2.1.2 Example: Computing a Mode
   2.2 Pseudocode
   2.3 Analysis
   2.4 Asymptotics
      2.4.1 Big-O Analysis
      2.4.2 Other Notations
      2.4.3 Observations
      2.4.4 Limit Method
   2.5 Examples
      2.5.1 Linear Search
      2.5.2 Set Operation: Symmetric Difference
      2.5.3 Euclid's GCD Algorithm
      2.5.4 Selection Sort
   2.6 Other Considerations
      2.6.1 Importance of Input Size
      2.6.2 Control Structures are Not Elementary Operations
      2.6.3 Average Case Analysis
      2.6.4 Amortized Analysis
   2.7 Analysis of Recursive Algorithms
      2.7.1 The Master Theorem
3 Storing Things
   3.1 Lists
   3.2 Sets
   3.3 Hash-Tables
      3.3.1 Hash Functions
      3.3.2 Collisions
      3.3.3 Efficiency & Rehashing
      3.3.4 Other Applications
      3.3.5 Java Implementations
6 Linear Systems
   6.1 Introduction
   6.2 Solving Linear Systems
      6.2.1 LU Decomposition
      6.2.2 Matrix Inverse
      6.2.3 Determinants
   6.3 Linear Programming
      6.3.1 Formulations
   6.4 Exercises
7 Trees
   7.1 Introduction
   7.2 Definitions & Terminology
   7.3 Tree Traversal
      7.3.1 Preorder Traversal
      7.3.2 Inorder Traversal
      7.3.3 Postorder Traversal
      7.3.4 Breadth-First Search Traversal
      7.3.5 Implementations & Data Structures
      7.3.6 Operations
   7.4 Binary Search Trees
      7.4.1 Basic Operations
   7.5 Balanced Binary Search Trees
      7.5.1 AVL Trees
      7.5.2 B-Trees
      7.5.3 Red-Black Trees
   7.6 Heaps
      7.6.1 Operations
      7.6.2 Implementations
      7.6.3 Java Collections Framework
      7.6.4 Other Operations
      7.6.5 Variations
   7.7 Applications
      7.7.1 Heap Sort
      7.7.2 Huffman Coding
Glossary
Acronyms
Index
References
List of Algorithms
1 Computing the Mean
2 Computing the Mode
3 Trivial Sorting (Bad Pseudocode)
4 Trivially Finding the Minimal Element
5 Finding the Minimal Element
6 Linear Search
7 Symmetric Difference of Two Sets
8 Euclid's GCD Algorithm
9 Selection Sort
10 Sieve of Eratosthenes
11 Fibonacci(n)
12 Binary Search – Recursive
13 Merge Sort
14 Next k-Combination
15 Next Lexicographic Permutation
16 Next Repeated Permutation Generator
17 Base Conversion Algorithm
18 Set Partition Generator
19 Brute Force Iterative Algorithm for Satisfiability
20 Brute Force Recursive Algorithm for Satisfiability
21 Brute Force Iterative Algorithm for Hamiltonian Cycle
22 Hamiltonian DFS Cycle Walk
23 Hamiltonian DFS Path Walk – Main Algorithm
24 Walk(G, p) – Hamiltonian DFS Path Walk
25 Knapsack(K, S) – Backtracking Brute Force 0-1 Knapsack
26 Brute Force Convex Hull
27 Brute Force Subset Sum
28 Binary Exponentiation
29 Euclid's Simple GCD Algorithm
30 Extended Euclidean Algorithm
List of Code Samples
2.1 Summation Algorithm 1
2.2 Summation Algorithm 2
2.3 Summation Algorithm 3
2.4 Mode Finding Algorithm 1
2.5 Mode Finding Algorithm 2
2.6 Mode Finding Algorithm 3
2.7 Naive Exponentiation
2.8 Computing an Average
List of Figures
2.1 Plot of two functions.
2.2 Expected number of comparisons for various success probabilities p.
8.3 DFS Forest with initial vertex i. Dashed edges indicate back edges.
8.4 BFS Tree with initial vertex i. Dashed edges indicate cross edges.
8.5 BFS Tree with cross edges (dashed) involved in a cycle.
8.6 A Directed Graph.
8.7 Condensation Graph
8.8 Minimum Spanning Tree example.
8.9 Illustration of Tree, Fringe, and Unseen vertex sets.
8.10 Prim's Algorithm after the second iteration.
8.11 Weighted directed graph.
8.12 Result of Dijkstra's Algorithm with source vertex e.
8.13 Basic idea behind Floyd-Warshall: Supposing that a path from v_i to v_j has already been found with a distance of d_{i,j}^{(k−1)}, the consideration of v_k as a new intermediate node may have shorter distance, d_{i,k}^{(k−1)} + d_{k,j}^{(k−1)}.
8.14 Floyd's Algorithm Demonstration.
8.15 Another Demonstration of Floyd-Warshall's Algorithm.
1 Introduction
These lecture notes assume that you are already familiar with the following topics:
• Mastery over at least one high-level programming language
• Working knowledge of algorithm design and analysis
• Familiarity with design and analysis of recursive algorithms
• Working knowledge of Big-O notation and proof techniques involving asymptotics
• Familiarity with basic data structures such as lists, stacks, queues, binary search
trees
Nevertheless, this section serves as a high-level review of these topics.
2 Algorithm Analysis
2.1 Introduction
Deterministic An algorithm is deterministic if, when given a particular input, it will always
go through the exact same computational process and produce the same output.
Most of the algorithms you’ve used up to this point are deterministic.
Randomized A randomized algorithm is an algorithm that involves some form of
random input. The random source can be used to make decisions such as random
selections or to generate random state in a program as candidate solutions. There
are many types of randomized algorithms including Monte-Carlo algorithms (that
may have some error with low probability), Las Vegas algorithms (whose results
are always correct, but may fail with a certain probability to produce any results),
etc.
Optimization Many algorithms seek not only to find a solution to a problem, but to
find the best, optimal solution. Many of these types of algorithms are heuristics:
rather than finding the actual best solution (which may be infeasible), they can
approximate a solution (Approximation algorithms). Other algorithms simulate
Time The most obvious resource an algorithm takes is time: how long the algorithm
takes to finish its computation (measured in seconds, minutes, etc.). Alternatively,
time can be measured in how many CPU cycles or floating-point operations a
particular piece of hardware takes to execute the algorithm.
Memory The second major resource in a computer is memory. An algorithm requires
memory to store the input, output, and possibly extra memory during its execution.
How much memory an algorithm uses in its execution may be an even more
important consideration than time in certain environments or systems where
memory is extremely limited, such as embedded systems.
Power The amount of power a device consumes is an important consideration when you
have limited capacity such as a battery in a mobile device. From a consumer’s
perspective, a slower phone that offered twice the battery life may be preferable. In
certain applications such as wireless sensor networks or autonomous systems power
may be more of a concern than either time or memory.
Bandwidth In computer networks, efficiency is measured by how much data you can
transmit from one computer to another, called throughput. Throughput is generally
limited by a network’s bandwidth: how much a network connection can transmit
under ideal circumstances (no data loss, no retransmission, etc.)
Circuitry When designing hardware, resources are typically measured in the number
of gates or wires required to implement the hardware. Fewer gates and wires
means you can fit more chips on a silicon die which results in cheaper hardware.
Fewer wires and gates also means faster processing.
Idleness Even when a computer isn’t computing anything, it can still be “costing” you
something. Consider purchasing hardware that runs a web server for a small user
base. There is a substantial investment in the hardware which requires maintenance
and eventually must be replaced. However, since the user base is small, most of
the time it sits idle, consuming power. A better solution may be to use the same
hardware to serve multiple virtual machines (VMs). Now several small web servers
can be served with the same hardware, increasing our utilization of the hardware.
In scenarios like this, the lack of work being performed is the resource.
Load Somewhat the opposite of idleness, sometimes an application or service may have
occasional periods of high demand. The ability of a system to service such high
loads may be considered a resource, even if the capacity to handle them goes unused
most of the time.
These are all very important engineering and business considerations when designing
systems, code, and algorithms. However, we’ll want to consider the complexity of
algorithms in a more abstract manner.
Suppose we have two different programs (or algorithms) A and B. Both of those algorithms
are correct, but A uses fewer of the above resources than B. Clearly, algorithm A is the
better, more efficient solution. However, how can we better quantify this efficiency?
List Operations
To give a concrete example, consider a typical list ADT. The list could be implemented
as an array-based list (where the class owns a static array that is resized/copied when
full) or a linked list (with nodes containing elements and linking to the next node in the
list). Some operations are “cheap” on one type of list while other operations may be
more “expensive.”
Consider the problem of inserting a new element into the list at the beginning (index 0).
For a linked list this involves creating a new node and shuffling a couple of references.
The number of operations in this case is not contingent on the size of the list. In
contrast, for an array-based list, if the list contains n elements, each element will need to
be shifted over one position in the array in order to make room for the element to be
inserted. The number of shifts is proportional to the number of elements in the array, n.
Clearly for this operation, a linked list is better (more efficient).
Now consider a different operation: given an index i, retrieve the i-th element in the list.
For an array-based list we have the advantage of random access to the array. When we
index an element, arr[i], it only takes one memory address computation to “jump” to
the memory location containing the i-th element. In contrast here, a linked list would
require us to start at the head, and traverse the list until we reach the i-th node. This
requires i traversal operations. In the worst case, retrieving the last element, the n-th
element, would require n such operations. A summary of these operations can be found
in Table 2.1.
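To make this contrast concrete, the following Java sketch (the class and method names are ours, purely for illustration) shows why inserting at the front is a constant-time relink for a linked list but a linear-time shift for an array-based list.

class LinkedListSketch<T> {
    private static class Node<T> {
        T value;
        Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }
    private Node<T> head;

    // A constant number of operations regardless of how many elements the list holds.
    public void insertAtFront(T value) {
        head = new Node<>(value, head);
    }
}

class ArrayListSketch<T> {
    private Object[] elements = new Object[10];
    private int size;

    // Every existing element must shift one position to the right: about n operations.
    public void insertAtFront(T value) {
        if (size == elements.length) {
            elements = java.util.Arrays.copyOf(elements, 2 * size);  // grow when full
        }
        for (int i = size; i > 0; i--) {
            elements[i] = elements[i - 1];
        }
        elements[0] = value;
        size++;
    }
}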
Already we have a good understanding of the relative performance of algorithms based on
the type of data structure we choose. In this example we saw constant time algorithms
and linear algorithms. Constant time algorithms execute a constant number of operations
regardless of the size of the input (in this case, the size of the list n). Linear algorithms
perform a number of operations linearly related to the size of the input.
In the following examples, we’ll begin to be a little bit more formal about this type of
analysis.
2.1.1 Example: Computing a Sum
The following is a toy example, but it's easy to understand and straightforward. Consider
the following problem: given an integer n ≥ 0, we want to compute the arithmetic series,
∑_{i=1}^{n} i = 1 + 2 + 3 + · · · + (n − 1) + n
As a naive approach, consider the algorithm in Code Sample 2.1. In this algorithm, we
iterate over each possible number i in the series. For each number i, we count 1 through
i and add one to a result variable.
int result = 0;
for(int i=1; i<=n; i++) {
  for(int j=1; j<=i; j++) {
    result = result + 1;
  }
}
As an improvement, consider the algorithm in Code Sample 2.2. Instead of just adding
one on each iteration of the inner loop, we omit the inner loop entirely and simply add
the index variable i to the result.
Can we do even better? Yes. The arithmetic series actually has a closed-form solution:
∑_{i=1}^{n} i = n(n + 1)/2
int result = 0;
for(int i=1; i<=n; i++) {
  result = result + i;
}
Code Sample 2.3 uses this formula to directly compute the sum without any loops.
int result = n * (n + 1) / 2;
All three of these algorithms were run on a laptop computer for various values of n from
10 up to 1,000,000. Table 2.2 contains the resulting run times (in milliseconds) for each
of these three algorithms on the various input sizes.
With small input sizes, there is almost no difference between the three algorithms.
However, that would be a naive way of analyzing them. We are more interested in how
each algorithm performs as the input size, n, increases. In this case, as n gets larger, the
differences become very stark. The first algorithm has two nested for loops. On average,
the inner loop will run about n/2 times while the outer loop runs n times. Since the loops
are nested, the inner loop executes about n/2 times for each iteration of the outer loop.
Thus, the total number of iterations, and consequently the total number of additions, is
about
n × n/2 ≈ n²/2
The second algorithm saves the inner for loop and thus only makes n additions. The
final algorithm only performs a constant number of operations.
Observe how the running time grows as the input size grows. For Algorithm 1, increasing
n from 100,000 to 1,000,000 (10 times as large) results in a running time that is about
100 times as slow. This is because it is performing n² operations. To see this, consider
the following. Let t(n) be the time that Algorithm 1 takes for an input size of n. From
before we know that
t(n) ≈ n²
Observe what happens when we increase the input size from n to 10n:
t(10n) ≈ (10n)² = 100n²
which is 100 times as large as t(n). The running time of Algorithm 1 will grow quadratically
with respect to the input size n.
Similarly, Algorithm 2 grows linearly:
t(n) ≈ n
Increasing the input size from n to 10n,
t(10n) ≈ 10n
leads to a 10-fold increase in the running time. Algorithm 3's runtime does not depend
on the input size, and so its runtime does not grow as the input size grows. It essentially
remains flat: constant.
Of course, the numbers in Table 2.2 don’t follow this trend exactly, but they are pretty
close. The actual experiment involves a lot more variables than just the algorithms: the
laptop may have been performing other operations, the compiler and language may have
optimizations that change the algorithms, etc. Empirical results only provide general
evidence as to the runtime of an algorithm. If we moved the code to a different, faster
machine or used a different language, etc. we would get different numbers. However, the
general trends in the rate of growth would hold. Those rates of growth will be what we
want to analyze.
1
In general there may be more than one mode, for example in the set {10, 20, 10, 20, 50}, 10 and 20 are
both modes. The problem will simply focus on finding a mode, not all modes.
public static int mode(int[] arr) {   // method signature assumed; the original listing
                                      // (Code Sample 2.4) begins inside the method body
  int maxCount = 0;
  int modeIndex = 0;
  for(int i=0; i<arr.length; i++) {
    int count = 0;
    int candidate = arr[i];
    for(int j=0; j<arr.length; j++) {
      if(arr[j] == candidate) {
        count++;
      }
    }
    if(count > maxCount) {
      modeIndex = i;
      maxCount = count;
    }
  }
  return arr[modeIndex];
}
Now consider the following variation in Code Sample 2.5. In this algorithm, the first
thing we do is sort the array. This means that all equal elements will be contiguous. We
can exploit this to do less work. Rather than going through the list a second time for
each possible mode, we can count up contiguous runs of the same element. This means
that we need only examine each element exactly once, giving us n comparison operations
(line 8).
We can't, however, ignore the fact that to exploit the ordering, we needed to first "invest"
some work upfront by sorting the array. Using a typical sorting algorithm, we would
expect that it would take about n log (n) comparisons. Since the sorting phase and mode
finding phase were separate, the total number of comparisons is about
n log (n) + n
The highest order term here is the n log (n) term for sorting. However, this is still lower
than the n² algorithm. In this case, the investment to sort the array pays off! To compare
with our previous analysis, what happens when we increase the input size 10 fold? For
simplicity, let's only consider the highest order term, t(n) = n log (n). Then
t(10n) = 10n log (10n) = 10n log (n) + 10n log (10)
The second term is a lower-order additive term, so the increase in running time is essentially
linear! We cannot discount the additive term in general, but it is so close to linear that
terms like n log (n) are sometimes referred to as quasilinear.
Yet another solution, presented in Code Sample 2.6, utilizes a map data structure to
compute the mode. A map is a data structure that allows you to store key-value pairs.
In this case, we map elements in the array to a counter that represents the element’s
multiplicity. The algorithm works by iterating over the array and entering/updating the
elements and counters.
There is some cost associated with inserting and retrieving elements from the map,
but this particular implementation offers amortized constant running time for these
operations. That is, some particular entries/retrievals may be more expensive (say
linear), but when averaged over the life of the algorithm/data structure, each operation
only takes a constant amount of time.
Once built, we need only go through the elements in the map (at most n) and find the
one with the largest counter. This algorithm, too, offers essentially linear runtime for all
inputs. Similar experimental results can be found in Table 2.3.
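As a concrete illustration of this approach, a sketch using Java's HashMap might look like the following; it is only in the spirit of Code Sample 2.6 (the class, method name, and structure here are ours).

import java.util.HashMap;
import java.util.Map;

public class ModeFinder {
    // Computes a mode of arr (assumed non-empty) by mapping elements to their counts.
    public static int mode(int[] arr) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : arr) {
            // put/getOrDefault are amortized constant-time operations
            counts.put(value, counts.getOrDefault(value, 0) + 1);
        }
        int mode = arr[0];
        int maxCount = 0;
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > maxCount) {   // scan the at-most-n entries once
                maxCount = entry.getValue();
                mode = entry.getKey();
            }
        }
        return mode;
    }
}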
The difference in performance is even more dramatic than in the previous example. For an
input size of 1,000,000 elements, the n² algorithm took nearly 8 minutes! This is certainly
unacceptable performance for most applications. If we were to extend the experiment to
n = 10,000,000, we would expect the running time to increase to about 13 hours! For
perspective, input sizes in the millions are small by today’s standards. Algorithms whose
runtime is quadratic are not considered feasible for today’s applications.
2.2 Pseudocode
Pseudocode (“fake” code) is similar to some programming languages that you’re familiar
with, but does not have any particular syntax rules. Instead, it is a higher-level description
of a process. You may use familiar control structures such as loops and conditionals, but
you can also utilize natural language descriptions of operations.
There are no established rules for pseudocode, but in general, good pseudocode:
• Clearly labels the algorithm
• Identifies the input and output at the top of the algorithm
• Does not involve any language or framework-specific syntax: no semicolons, declaration
of variables or their types, etc.
• Makes liberal use of mathematical notation and natural language for clarity
Good pseudocode abstracts the algorithm by giving enough details necessary to under-
stand the algorithm and subsequently implement it in an actual programming language.
Let’s look at some examples.
2
To review, ai ∈ A is a predicate meaning the element ai is in the set A.
We use the left-arrow notation, ←, to denote the assignment of a value to a variable.3
Consider another example of computing the mode, similar to the second approach in a
previous example.
3
Not all languages use the familiar single equals sign = for the assignment operator. The statistical
programming language R uses the left-arrow operator, <- and Maple uses := for example.
2.3 Analysis
Given two competing algorithms, we could empirically analyze them like we did in
previous examples. However, it may be infeasible to implement both just to determine
which is better. Moreover, by analyzing them from a more abstract, theoretical approach,
we have a better, more mathematically based understanding of the relative complexity of the
two algorithms.
Given an algorithm, we can analyze it by following this step-by-step process.
1. Identify the input
2. Identify the input size, n
3. Identify the elementary operation
4. Analyze how many times the elementary operation is executed with respect to the
input size n
5. Characterize the algorithm’s complexity by providing an asymptotic (Big-O, or
Theta) analysis
This step is pretty straightforward. If the algorithm is described with good pseudocode,
then the input will already be identified. Common types of inputs are single numbers,
collections of elements (lists, arrays, sets, etc.), data structures such as graphs, matrices,
etc.
However, there may be some algorithms that have multiple inputs: two numbers or a
collection and a key, etc. In such cases, it simplifies the process if you can, without loss
of generality, restrict attention to a single input value, usually the one that has the most
relevance to the elementary operation you choose.
Once the input has been identified, we need to identify its size. We'll eventually want to
characterize the algorithm as a function f (n): given an input size, how many resources
does it take. Thus, it is important to identify the number corresponding to the domain
of this function.
This step is also pretty straightforward, but may be dependent on the type of input or
even its representation. Examples:
• For collections (sets, lists, arrays), the most natural is to use the number of elements
in the collection (cardinality, size, etc.). The size of individual elements is not as
important as number of elements since the size of the collection is likely to grow
more than individual elements do.
• An n × m matrix input could be measured by one or both of its dimensions, n and m.
• For graphs, you could count either the number of vertices or the number of edges
15
2 Algorithm Analysis
in the graph (or both!). How the graph is represented may also affect its input size
(an adjacency matrix vs. an adjacency list).
• If the input is a number x, the input size is typically the number of bits required
to represent x. That is,
n = ⌈log₂ (x)⌉
To see why, recall that if you have n bits, the maximum number you can represent
is 2ⁿ − 1. Inverting this expression gives us ⌈log₂ (x)⌉. For example, representing
x = 1,000,000 requires ⌈log₂ (1,000,000)⌉ = 20 bits.
Some algorithms may have multiple inputs. For example, a collection and a number (for
searching) or two integers as in Euclid’s algorithm. The general approach to analyzing
such algorithms is to simplify things by considering only one input. If one of the inputs is
larger, such as a collection vs. a single element, the larger one is used in the analysis.
Even if it is not clear which one is larger, it may be possible to assume, without loss of
generality, that one is larger than the other (and if not, the inputs may be switched).
The input size can then be limited to one variable to simplify the analysis.
We also need to identify what part of the algorithm does the actual work (where the most
resources will be expended). Again, we want to keep the analysis simple, so we generally
only identify one elementary operation. There may be several reasonable candidates
for the elementary operation, but in general it should be the most common or most
expensive operation performed in the algorithm. For example:
• When performing numeric computations, arithmetic operations such as additions,
divisions, etc.
• When sorting or searching, comparisons are the most natural elementary operations.
Swaps may also be a reasonable choice depending on how you want to analyze the
algorithm.
• When traversing a data structure such as a linked list, tree, or graph a node traversal
(visiting or processing a node) may be considered the elementary operation.
In general, operations that are necessary for control structures (such as loops, assignment
operators, etc.) are not considered good candidates for the elementary operation. An
extended discussion of this can be found in Section 2.6.2.
Analysis
Once the elementary operation has been identified, the algorithm must be analyzed to
count the number of times it is executed with respect to the input size. That is, we
analyze the algorithm to find a function f (n) where n is the input size and f (n) gives
the number of times the elementary operation is executed.
The analysis may involve deriving and solving a summation. For example, this is the case
if the elementary operation is performed within a for loop and the loop runs a number of
times that depends on the input size n.
If there are multiple loops in which the elementary operation is performed, it may be
necessary to setup multiple summations. If two loops are separate and independent (one
executes after the other), then the sum rule applies. The total number of operations is
the sum of the operations of each loop.
If two loops are nested, then the product rule applies. The inner loop will execute fully
for each iteration of the outer loop. Thus, the numbers of operations are multiplied with
each other.
Sometimes the analysis will not be so clear cut. For example, a while loop may execute
until some condition is satisfied that depends not only on the input size but also
on the nature of the input. In such cases, we can simplify our analysis by considering
the worst-case scenario. In the while loop, what is the maximum possible number of
iterations for any input?
Asymptotic Characterization
As computers get faster and faster and resources become cheaper, they can process more
and more information in the same amount of time. However, the characterization of
an algorithm should be invariant with respect to the underlying hardware. If we run
an algorithm on a machine that is twice as fast, that doesn’t mean that the algorithm
has improved. It still takes the same number of operations to execute. Faster hardware
simply means that the time it takes to execute those operations is half as much as it was
before.
To put it in another perspective, performing Euclid's algorithm to find the GCD of two
integers took the same number of steps 2,300 years ago when he performed them on
paper as it does today when they are executed on a digital computer. A computer is
obviously faster than Euclid would have been, but both Euclid and the computer are
performing the same number of steps when executing the same algorithm.
For this reason, we characterize the number of operations performed by an algorithm
using asymptotic analysis. Improving the hardware by a factor of two only affects the
“hidden constant” sitting outside of the function produced by the analysis in the previous
step. We want our characterization to be invariant of those constants.
Moreover, we are really more interested in how our algorithm performs for larger and
larger input sizes. To illustrate, suppose that we have two algorithms, one that performs
f (n) = 100n² + 5n
operations and another that performs
g(n) = n³
operations. These functions are graphed in Figure 2.1.
[Figure 2.1: Plot of the two functions, f (n) = 100n² + 5n and g(n) = n³.]
For inputs of size less than 100,
the first algorithm performs worse than the second (the graph is higher indicating "more"
resources). However, for inputs of size greater than 100, the first algorithm is better. For
small inputs, the second algorithm may be better, but small inputs are not the norm
for any “real” problems.4 In any case, on modern computers, we would expect small
inputs to execute fast anyway as they did in our empirical experiments in Section 2.1.1
and 2.1.2. There was essentially no discernible difference in the three algorithms for
sufficiently small inputs.
We can rigorously quantify this by providing an asymptotic characterization of these
functions. An asymptotic characterization essentially characterizes the rate of growth of
a function or the relative rate of growth of functions. In this case, n³ grows much faster
than 100n² + 5n as n grows (tends toward infinity). We formally define these concepts
in the next section.
2.4 Asymptotics
We want to capture the notion that one function grows faster than (or at least as fast as)
another. Categorizing functions according to their growth rate has been done for a long time.
4
There are problems where we can apply a “hybrid” approach: we can check for the input size and
choose one algorithm for small inputs and another for larger inputs. This is typically done in hybrid
sorting algorithms such as when merge sort is performed for “large” inputs but switches over to
insertion sort for smaller arrays.
2.4.1 Big-O Analysis
Definition 1. Let f and g be two functions f, g : N → R+ . We say that
f (n) ∈ O(g(n))
read as “f is big-O of g,” if there exist constants c ∈ R+ and n0 ∈ N such that for every
integer n ≥ n0 ,
f (n) ≤ cg(n)
It is also common to see this written as f (n) = O(g(n)).
5
The original notation and definition are attributed to Paul Bachmann in 1894 [4]. Definitions and
notation have been refined and introduced/reintroduced over the years. Their use in algorithm
analysis was first suggested by Donald Knuth in 1976 [10].
Example
Let's revisit the example from before where f (n) = 100n² + 5n and g(n) = n³. We want
to show that f (n) ∈ O(g(n)). By the definition, we need to show that there exists a c
and n0 such that
f (n) ≤ cg(n)
As we observed in the graph in Figure 2.1, the functions "crossed over" somewhere
around n = 100. Let's be more precise about that. The two functions cross over when
they are equal, so we set up an equality,
100n² + 5n = n³
Collecting terms and factoring out an n (that is, the functions have one crossover point
at n = 0), we have
n² − 100n − 5 = 0
The values of n satisfying this equation can be found by applying the quadratic formula,
and so
n = (100 ± √(10000 + 20)) / 2
which is −0.049975 . . . and 100.0499 . . .. The first root is negative and so irrelevant.
The second is our crossover point. The next largest integer is 101. Thus, for c = 1 and
n0 = 101, the inequality is satisfied.
In this example, it was easy to find the intersection because we could employ the quadratic
formula to find roots. This is much more difficult with higher degree polynomials. Throw
in some logarithmic functions, exponential functions, etc. and this approach can be
difficult.
Revisit the definition of big-O: the inequality doesn't have to be tight or precise. In the
previous example we essentially fixed c and tried to find n0 such that the inequality held.
Alternatively, we could fix n0 to be small and then find the c (essentially compressing
the function) such that the inequality holds. Observe:
100n² + 5n ≤ 100n² + 5n³ ≤ 100n³ + 5n³ = 105n³
By adding positive values, we make the expression larger until it looks like what we want,
in this case g(n) = n³. By the end we've got our constants: for c = 105 and n0 = 0, the
inequality holds. There is nothing special about this c; c = 1000000 would work too.
The point is we need only find at least one c, n0 pair for which the inequality holds (there
are an infinite number of possibilities).
2.4.2 Other Notations
Big-O provides an asymptotic upper bound characterization of two functions. There are
several other notations that provide similar characterizations.
Big-Omega
Big-Omega provides an asymptotic lower bound on a function. The only difference is the
inequality has been reversed. Intuitively, f has a growth rate that is bounded below by g.
Definition 2. Let f and g be two functions f, g : N → R+ . We say that
f (n) ∈ Ω(g(n))
read as "f is big-Omega of g," if there exist constants c ∈ R+ and n0 ∈ N such that for
every integer n ≥ n0 ,
cg(n) ≤ f (n)
Big-Theta
Yet another characterization can be used to show that two functions have the same order
of growth.
Definition 3. Let f and g be two functions f, g : N → R+ . We say that
f (n) ∈ Θ(g(n))
read as “f is Big-Theta of g,” if there exist constants c1 , c2 ∈ R+ and n0 ∈ N such that
for every integer n ≥ n0 ,
c1 g(n) ≤ f (n) ≤ c2 g(n)
Big-Θ essentially provides an asymptotic equivalence between two functions. The function
f is bounded above and below by g. As such, both functions have the same rate of
growth.
Soft-O Notation
Logarithmic factors contribute very little to a function’s rate of growth especially com-
pared to larger order terms. For example, we called n log (n) quasilinear since it was
nearly linear. Soft-O notation allows us to simplify terms by removing logarithmic factors.
Definition 4. Let f, g be functions such that f (n) ∈ O(g(n) · log^k (n)). Then we say
that f (n) is soft-O of g(n) and write
f (n) ∈ Õ(g(n))
For example,
n log (n) ∈ Õ(n)
Little Asymptotics
Related to big-O and big-Ω are their corresponding “little” asymptotic notations, little-o
and little-ω.
Definition 5. Let f and g be two functions f, g : N → R+ . We say that
f (n) ∈ o(g(n))
read as “f is little-o of g,” if
lim_{n→∞} f (n)/g(n) = 0
The little-o is sometimes defined as: for every ε > 0 there exists a constant N such that
|f (n)| ≤ ε|g(n)| ∀n ≥ N
but given the restriction that g(n) is positive, the two definitions are essentially equivalent.
Little-o is a much stronger characterization of the relation of two functions. If f (n) ∈
o(g(n)) then not only is g an asymptotic upper bound on f , but they are not asymptoti-
cally equivalent. Intuitively, this is similar to the difference between saying that a ≤ b
and a < b. The second is a stronger statement as it implies the first, but the first does
not imply the second. Analogous to this example, little-o provides a “strict” asymptotic
upper bound. The growth rate of g is strictly greater than the growth rate of f .
Similarly, a little-ω notation can be used to provide a strict lower bound characterization.
Definition 6. Let f and g be two functions f, g : N → R+ . We say that
f (n) ∈ ω(g(n))
read as “f is little-omega of g,” if
lim_{n→∞} f (n)/g(n) = ∞
2.4.3 Observations
As you might have surmised, big-O and big-Ω are duals of each other, thus we have the
following.
Lemma 1. Let f, g be functions. Then
f (n) ∈ O(g(n)) ⇐⇒ g(n) ∈ Ω(f (n))
Because big-Θ provides an asymptotic equivalence, both functions are big-O and big-Ω
of each other:
f (n) ∈ Θ(g(n)) ⇐⇒ f (n) ∈ O(g(n)) and f (n) ∈ Ω(g(n))
Equivalently, each is big-O of the other:
f (n) ∈ Θ(g(n)) ⇐⇒ f (n) ∈ O(g(n)) and g(n) ∈ O(f (n))
With respect to the relationship of little-o and little-ω to big-O and big-Ω, as
previously mentioned, little asymptotics provide a stronger characterization of the growth
rate of functions. We have the following as a consequence:
f (n) ∈ o(g(n)) ⇒ f (n) ∈ O(g(n))
and
f (n) ∈ ω(g(n)) ⇒ f (n) ∈ Ω(g(n))
Common Identities
Lemma 5. Let f1 (n), f2 (n) be functions such that f1 (n) ∈ O(f2 (n)). Then
f1 (n) + f2 (n) ∈ O(f2 (n))
Logarithms
Classes of Functions
Table 2.4 summarizes some of the complexity functions that are common when doing
algorithm analysis. Note that these classes of functions form a hierarchy. For example,
linear and quasilinear functions are also O(n^k) and so are polynomial.
2.4.4 Limit Method
The method used in previous examples directly used the definition to find constants c, n0
that satisfied an inequality to show that one function was big-O of another. This can get
quite tedious when there are many terms involved. A much more elegant proof technique
borrows concepts from calculus.
Let f (n), g(n) be functions. Suppose we examine the limit, as n → ∞ of the ratio of
these two functions.
lim_{n→∞} f (n)/g(n)
The value of this limit tells us how the two growth rates compare: if the limit is 0, then
f (n) ∈ O(g(n)); if the limit is a constant c > 0, then f (n) ∈ Θ(g(n)); and if the limit is
∞, then f (n) ∈ Ω(g(n)).
Examples
Let's reuse the example from before where f (n) = 100n² + 5n and g(n) = n³. Setting up
our limit,
lim_{n→∞} f (n)/g(n) = lim_{n→∞} (100n² + 5n)/n³
                     = lim_{n→∞} (100n + 5)/n²
                     = lim_{n→∞} 100n/n² + lim_{n→∞} 5/n²
                     = lim_{n→∞} 100/n + 0
                     = 0
and so we conclude that 100n² + 5n ∈ O(n³).
Consider the following example: let f (n) = log₂ (n) and g(n) = log₃ (n²). Setting up our
limit we have
lim_{n→∞} f (n)/g(n) = lim_{n→∞} log₂ (n) / log₃ (n²)
                     = lim_{n→∞} log₂ (n) / (2 log₂ (n) / log₂ (3))
                     = log₂ (3) / 2
                     = 0.7924 . . . > 0
And so we conclude that log₂ (n) ∈ Θ(log₃ (n²)).
As another example, let f (n) = log (n) and g(n) = n. Setting up the limit gives us
lim_{n→∞} log (n)/n
The rate of growth might seem obvious here, but we still need to be mathematically
rigorous. Both the denominator and numerator are monotone increasing functions. To
solve this problem, we can apply l’Hôpital’s Rule:
Theorem 2 (l'Hôpital's Rule). Let f and g be functions. If the limit of the quotient f (n)/g(n)
exists, it is equal to the limit of the quotient of the derivatives of the numerator and the
denominator. That is,
lim_{n→∞} f (n)/g(n) = lim_{n→∞} f ′(n)/g ′(n)
Applying this to our limit, the denominator drops out, but what about the numerator?
Recall that log (n) is the logarithm base-2. The derivative of the natural logarithm is well
known, ln′(n) = 1/n. We can use the change of base formula to transform log (n) = ln (n)/ln (2)
and then take the derivative. That is,
log′(n) = 1/(ln (2) n)
Thus,
lim_{n→∞} log (n)/n = lim_{n→∞} 1/(ln (2) n) = 0
and we conclude that log (n) ∈ O(n).
Pitfalls
l'Hôpital's Rule is not always the most appropriate tool to use. Consider the following
example: let f (n) = 2ⁿ and g(n) = 3ⁿ. Setting up our limit and applying l'Hôpital's
Rule we have
lim_{n→∞} 2ⁿ/3ⁿ = lim_{n→∞} (2ⁿ)′/(3ⁿ)′
               = lim_{n→∞} ((ln 2) 2ⁿ)/((ln 3) 3ⁿ)
which doesn't get us anywhere. In general, we should look for algebraic simplifications
first. Doing so we would have realized that
lim_{n→∞} 2ⁿ/3ⁿ = lim_{n→∞} (2/3)ⁿ
Since 2/3 < 1, this limit converges to zero and we have that 2ⁿ ∈ O(3ⁿ).
2.5 Examples
2.5.1 Linear Search
The number of comparisons is clearly linear, which is why the algorithm is called Linear Search:
Θ(n)
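A minimal Java rendering of linear search (the method name is ours, not taken from Algorithm 6) makes the count of comparisons apparent:

// Returns the index of key in arr, or -1 if key is not present.
public static int linearSearch(int[] arr, int key) {
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] == key) {   // the elementary operation: at most n comparisons
            return i;
        }
    }
    return -1;
}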
2.5.2 Set Operation: Symmetric Difference
Recall that the symmetric difference of two sets, A ⊕ B, consists of all elements in A or
B but not both. Algorithm 7 computes the symmetric difference.
2.5.3 Euclid's GCD Algorithm
The greatest common divisor (or GCD) of two integers a, b is the largest positive integer
that divides both a and b. Finding a GCD has many useful applications and the problem
has one of the oldest known algorithmic solutions: Euclid’s Algorithm (due to the Greek
mathematician Euclid c. 300 BCE).
The algorithm relies on the following observation: any number that divides both a and b
must also divide the remainder when a is divided by b, since we can write a = b · k + r where
r is the remainder. This suggests the following strategy: iteratively divide a by b and retain the remainder
r, then consider the GCD of b and r. Progress is made by observing that b and r are
necessarily smaller than a, b. Repeating this process until we have a remainder of zero
gives us the GCD because once we have that one evenly divides the other, the larger
must be the GCD. Pseudocode for Euclid’s Algorithm is provided in Algorithm 8.
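As a concrete illustration of the divide-and-retain-the-remainder strategy described above, a Java sketch might look like the following (names are ours, and it assumes nonnegative inputs that are not both zero):

// Computes the greatest common divisor of a and b.
public static int gcd(int a, int b) {
    while (b != 0) {
        int r = a % b;   // remainder of dividing a by b
        a = b;           // the problem reduces to the GCD of b and r
        b = r;
    }
    return a;
}

For example, gcd(48, 36) reduces 48, 36 → 36, 12 → 12, 0 and returns 12.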
Recall that for a numeric input, the input size is the number of digits or bits required to
represent it; for a value n this is roughly ⌈log (n)⌉.
But in the end it doesn’t really matter if we think of computers as speaking in base-10,
base-2, or any other integer base greater than or equal to 2 because, as we've observed,
all logarithms are equivalent to within a constant factor using the change of base formula.
In fact this demonstrates again that algorithms are an abstraction independent
of any particular platform: that the same algorithm will have the same (asymptotic)
performance whether it is performed on paper in base-10 or in a computer using binary!
Back to the analysis of Euclid's Algorithm: how many times does the while loop get
executed? We can first observe that each iteration reduces the value of b, but by how
much? The exact number depends on the input: some inputs would only require a single
division, other inputs reduce b by a different amount on each iteration. The important
thing to realize is that we want a general characterization of this algorithm: it suffices
to consider the worst case. That is, at maximum, how many iterations are performed?
The number of iterations is maximized when the reduction in the value of b is minimized
at each iteration. We further observe that b is reduced by at least half at each iteration.
Thus, the number of iterations is maximized if we reduce b by exactly half on each
iteration. So how many iterations i are required to reduce n down to 1 (ignoring the
last iteration when it is reduced to zero for the moment)? This can be expressed by the
equation:
n · (1/2)^i = 1
Solving for i (taking the log on either side), we get that
i = log (n)
Recall that the size of the input is log (n). Thus, Euclid’s Algorithm is linear with respect
to the input size.
2.5.4 Selection Sort
Recall that Selection Sort is a sorting algorithm that sorts a collection of elements by
first finding the smallest element and placing it at the beginning of the collection. It
continues by finding the smallest among the remaining n − 1 and placing it second in
the collection. It repeats until the "first" n − 1 elements are sorted, which by definition
means that the last element is where it needs to be.
The pseudocode is presented as Algorithm 9.
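A Java sketch of this process (names ours, not taken from Algorithm 9) looks like the following, with the comparison in the inner loop serving as the elementary operation.

// Sorts arr in place by repeatedly selecting the minimum of the unsorted suffix.
public static void selectionSort(int[] arr) {
    int n = arr.length;
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIndex]) {   // elementary operation: one comparison
                minIndex = j;
            }
        }
        int temp = arr[i];                  // place the minimum at position i
        arr[i] = arr[minIndex];
        arr[minIndex] = temp;
    }
}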
• line 3 (and subsequent blocks of code) are executed multiple times for each
iteration of the for loop in line 1 (for i running from 1 up to n − 1)
This gives us the following summation:
∑_{i=1}^{n−1} ∑_{j=i+1}^{n} 1
where the outer sum corresponds to the for loop in line 1, the inner sum to the for loop
in line 3, and the 1 to the comparison in line 4. Evaluating,
∑_{i=1}^{n−1} ∑_{j=i+1}^{n} 1 = ∑_{i=1}^{n−1} (n − i) = (n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2
comparisons, so Selection Sort is Θ(n²).
2.6 Other Considerations
2.6.1 Importance of Input Size
The second step in our algorithm analysis outline is to identify the input size. Usually this
is pretty straightforward. If the input is an array or collection of elements, a reasonable
input size is the cardinality (the number of elements in the collection). This is usually
the case when one is analyzing a sorting algorithm operating on a list of elements.
Though seemingly simple, sometimes identifying the appropriate input size depends on
the nature of the input. For example, if the input is a data structure such as a graph,
the input size could be either the number of vertices or number of edges. The most
appropriate measure then depends on the algorithm and the details of its analysis. It
may even depend on how the input is represented. Some graph algorithms have different
efficiency measures if they are represented as adjacency lists or adjacency matrices.
Yet another subtle difficulty arises when the input is a single numerical value, n. In such
instances, the input size is not n itself, but instead the number of symbols that would be
needed to represent n. That is, the number of digits in n or the number of bits required
to represent n. We’ll illustrate this case with a few examples.
Sieve of Eratosthenes
A common beginner mistake is made when analyzing the Sieve of Eratosthenes (named
for Eratosthenes of Cyrene, 276 BCE – 195 BCE). The Sieve is an ancient method for
prime number factorization. A brute-force algorithm, it simply tries every integer up to
a point to see if it is a factor of a given number.
Input : An integer n
Output : Whether n is prime or composite
1 for i = 2, . . . , n do
2 if i divides n then
3 Output composite
4 end
5 end
6 Output prime
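A direct Java rendering of this brute-force check might look like the following sketch (the method name is ours; it checks the integers 2 through n − 1 and stops at the first divisor found):

// Returns true if n (assumed n >= 2) has no divisor strictly between 1 and n.
public static boolean isPrime(int n) {
    for (int i = 2; i < n; i++) {
        if (n % i == 0) {   // i divides n, so n is composite
            return false;
        }
    }
    return true;
}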
With respect to the value n, the for loop executes n times; but the input size is the number
of bits needed to represent n, roughly log₂ (n), so as a function of the input size the loop
runs about 2^(input size) times: exponential, not polynomial.
This distinction is subtle but crucial: the difference between a polynomial-time algorithm
and an exponential algorithm is huge even for modestly sized inputs. What may take
a few milliseconds using a polynomial time algorithm may take billions and billions of
years with an exponential time algorithm as we'll see with our next example.
Computing an Exponent
As a final example, consider the problem of computing a modular exponent. That is,
given integers a, n, and m, we want to compute
aⁿ mod m
A naive (but common!) solution might be similar to the Java code snippet in Code
Sample 2.7.
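Code Sample 2.7 itself is not reproduced here, but a naive loop in the same spirit might look like this sketch (names ours; it assumes m is small enough that the products fit in a long):

// Naively computes a^n mod m using n multiplications.
public static long naiveModularExponent(long a, long n, long m) {
    long result = 1;
    for (long i = 0; i < n; i++) {
        result = (result * a) % m;   // one multiplication and reduction per iteration
    }
    return result;
}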
Whether one chooses to treat multiplication or integer division as the elementary operation,
the for-loop executes exactly n times. For “small” values of n this may not present a
problem. However, for even moderately large values of n, say n ≈ 2²⁵⁶, the performance
of this code will be terrible.
To illustrate, suppose that we run this code on a 14.561 petaFLOP (14 quadrillion floating
point operations per second) supercomputer cluster (this throughput was achieved by
the Folding@Home distributed computing project in 2013). Even with this power, to
make 2²⁵⁶ floating point operations would take
2²⁵⁶ / (14.561 × 10¹⁵ · 60 · 60 · 24 · 365.25) ≈ 2.5199 × 10⁵³
or 252 sexdecillion years to compute!
For context, this sort of operation is performed by millions of computers around the
world every second of the day. A 256-bit number is not really all that “large”. The
problem again lies in the failure to recognize the difference between an input, n, and
the input’s size. We will explore the better solution to this problem when we examine
Repeated Squaring in Section 5.3.
2.6.2 Control Structures are Not Elementary Operations
A proper analysis would use the addition in line 4 as the elementary operation, leading
to a Θ(n) algorithm. However, for the sake of argument, let’s perform a detailed analysis
with respect to each of these operations. Assume that there are n values in the array,
thus the for loop executes n times. This gives us
• n + 2 total assignment operations
• n increment operations
• n comparisons
• n additions and
• 1 division
Now, suppose that each of these operations take time t1 , t2 , t3 , t4 , and t5 milliseconds
each (or whatever time scale you like). In total, we have a running time of
t1 (n + 2) + t2 n + t3 n + t4 n + t5 = (t1 + t2 + t3 + t4 )n + (2t1 + t5 ) = cn + d
Which doesn’t change the asymptotic complexity of the algorithm: considering the
additional operations necessary for the control structures only changed the constant
sitting out front (as well as some additive terms).
The amount of resources (in this case time) that are expended for the assignment,
increment and comparison for the loop control structure are proportional to the true
elementary operation (addition). Thus, it is sufficient to simply consider the most
common or most expensive operation in an algorithm. The extra resources for the control
structures end up only contributing constants which are ultimately ignored when an
asymptotic analysis is performed.
2.6.3 Average Case Analysis
The behavior of some algorithms may depend on the nature of the input rather than
simply the input size. From this perspective we can analyze an algorithm with respect
to its best-, average-, and worst-case running time.
In general, we prefer to consider the worst-case running time when comparing algorithms.
Consider the problem of searching an array for a particular element (the array is unsorted,
contains n elements). You could get lucky (best-case) and find it immediately in the first
index, requiring only a single comparison. On the other hand, you could be unlucky and
find the element in the last position or not at all. In either case n comparisons would be
required (worst-case).
What about the average case? A naive approach would be to average the worst and best
case to get an average of (n + 1)/2 comparisons. A more rigorous approach would be to
define a probability of a successful search, p, and the probability of an unsuccessful search, 1 − p.
For a successful search, we further define a uniform probability distribution on finding
the element in each of the n indices. That is, we will find the element at index i with
probability p/n. Finding the element at index i requires i comparisons. Thus the total
number of expected comparisons in a successful search is
∑_{i=1}^{n} i · (p/n) = p(n + 1)/2
For an unsuccessful search, we would require n comparisons, thus the number of expected
comparisons would be
n(1 − p)
Since these are mutually exclusive events, we sum the probabilities:
p(n + 1)/2 + n(1 − p) = (p − pn + 2n)/2
p      C
0      n
1/4    (7/8)n + 1/8
1/2    (3/4)n + 1/4
3/4    (5/8)n + 3/8
1      (n + 1)/2

Table 2.5: Expected number of comparisons C for various values of the probability of a
successful search p.
We cannot remove the p terms, but we can make some observations for various values
(see Table 2.5). When p = 1 for example, we have the same conclusion as the naive
approach. As p decreases, the expected number of comparisons grows to n.
Sometimes it is useful to analyze an algorithm not based on its worst-case running time,
but on its expected running time. That is, on average, how many resources does an
algorithm perform? This is a more practical approach to analysis. Worst-case analysis
assumes that all inputs will be dicult, but in practice, dicult inputs may be rare. The
average input may require fewer resources.
2.7 Analysis of Recursive Algorithms
Recursive algorithms can be analyzed with the same basic 5 step process. However,
because they are recursive, the analysis (step 4) may involve setting up a recurrence
relation in order to characterize how many times the elementary operation is executed.
Though cliché, as a simple example consider a naive recursive algorithm to compute the
n-th Fibonacci number. Recall that the Fibonacci sequence is defined as
Fn = Fn−1 + Fn−2
with initial conditions F0 = F1 = 1. That is, the n-th Fibonacci number is defined as the
sum of the two previous numbers in the sequence. This defines the sequence
1, 1, 2, 3, 5, 8, 13, 21, . . .
Input : An integer n
Output : The n-th Fibonacci Number, Fn
1 if n = 0 or n = 1 then
2 output 1
3 else
4 output Fibonacci(n − 1) + Fibonacci(n − 2)
5 end
To count the additions performed by this algorithm, we can set up a recurrence relation.
Let A(n) be a function that counts the number of additions
performed by Fibonacci(n). If n is zero or one, the number of additions is zero (the base
case of our recursion performs no additions). That is, A(0) = A(1) = 0. But what about
n > 1?
When n > 1, line 4 executes. Line 4 contains one addition. However, it also contains
two recursive calls. How many additions are performed by a call to Fibonacci(n − 1)
and Fibonacci(n − 2)? We dened A(n) to be the number of additions on a call to
Fibonacci(n), so we can reuse this function: the number of additions is A(n − 1) and
A(n − 2) respectively. Thus, the total number of additions is
2.7.1 The Master Theorem
Suppose that we have a recursive algorithm that takes an input of size n. The recursion
may work by dividing the problem into a subproblems, each of size n/b (where a, b are
constants). The algorithm may also perform some amount of work before or after the
recursion. Suppose that we can characterize this amount of work by a polynomial
function, f (n) ∈ Θ(n^d).
This kind of recursive algorithm is common among "divide-and-conquer" style algorithms
that divide a problem into subproblems and conquer each subproblem. Depending on
6
Also called difference equations, which are sort of discrete analogs of differential equations.
the values of a, b, d we can categorize the runtime of this algorithm using the Master
Theorem.
Theorem 3 (Master Theorem). Let T (n) be a monotonically increasing function that
satisfies
T (n) = aT (n/b) + f (n)
T (1) = c
where a ≥ 1, b ≥ 2, c > 0. If f (n) ∈ Θ(n^d) where d ≥ 0, then
T (n) = Θ(n^d)           if a < b^d
T (n) = Θ(n^d log n)     if a = b^d
T (n) = Θ(n^{log_b a})   if a > b^d
“Master” is a bit of a misnomer. The Master Theorem can only be applied to recurrence
relations of the particular form described. It can’t, for example, be applied to the previous
recurrence relation that we derived for the Fibonacci algorithm. However, it can be used
for several common recursive algorithms.
As an example, consider the Binary Search algorithm, presented as Algorithm 12. Recall
that binary search takes a sorted array (random access is required) and searches for an
element by checking the middle element m. If the element being searched for is larger
than m, a recursive search on the upper half of the list is performed, otherwise a recursive
search on the lower half is performed.
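A recursive Java sketch mirroring this description (names ours, not taken from Algorithm 12):

// Searches for key in the sorted range arr[low..high]; returns an index of key or -1.
public static int binarySearch(int[] arr, int key, int low, int high) {
    if (low > high) {
        return -1;                                       // empty range: key not present
    }
    int mid = low + (high - low) / 2;                    // the middle element m
    if (arr[mid] == key) {
        return mid;
    } else if (arr[mid] < key) {
        return binarySearch(arr, key, mid + 1, high);    // recurse on the upper half
    } else {
        return binarySearch(arr, key, low, mid - 1);     // recurse on the lower half
    }
}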
Let’s analyze this algorithm with respect to the number of comparisons it performs. To
simplify, though we technically make two comparisons in the pseudocode (lines 5 and 7),
let’s count it as a single comparison. In practice a comparator pattern would be used and
logic would branch based on the result. However, there would still only be one invocation
of the comparator function/object. Let C(n) be a function that equals the number of
comparisons made by Algorithm 12 while searching an array of size n in the worst case
(that is, we never find the element or it ends up being the last element we check).
As mentioned, we make one comparison (line 5, 7) and then make one recursive call. The
recursion roughly cuts the size of the array in half, resulting in an array of size n/2. Thus,

    C(n) = C(n/2) + 1
Applying the Master Theorem, we find that a = 1, b = 2. In this case, f(n) = 1, which is
bounded by a polynomial: a polynomial of degree d = 0. Since

    1 = a = b^d = 2^0

case 2 of the Master Theorem applies and we have

    C(n) ∈ Θ(log n)
Recall that Merge Sort is an algorithm that works by recursively splitting an array of
size n into two equal parts of size roughly n/2. The recursion continues until the array
is trivially sorted (size 0 or 1). Following the recursion back up, each subarray half is
sorted and they need to be merged. This basic divide and conquer approach is presented
in Algorithm 13. We omit the details of the merge operation on line 4, but observe that
it can be achieved with roughly n − 1 comparisons in the worst case.
    C(n) = 2C(n/2) + (n − 1)
Again, applying the Master Theorem we have that a = 2, b = 2 and that f(n) = (n − 1) ∈
Θ(n) and so d = 1. Thus,

    2 = a = b^d = 2^1

and by case 2,

    C(n) ∈ Θ(n log n)
We will apply the Master Theorem to several other recursive algorithms later on.
3 Storing Things
3.1 Lists
3.2 Sets
3.3 Hash-Tables
Motivation:
• Linked lists provide O(n), O(1) access/insert efficiency
• Array-based lists provide O(1), O(n) access/insert efficiency
• Balanced binary search trees provide O(log n) operations
• Hash tables provide amortized O(1) operations
Basics:
• Elements are stored in a fixed-size array
• The location of an element (its index) is computed by computing a hash value of
the element
• We can restrict attention to integer keys (elements/objects can be mapped to an
integer key based on their state)
• Computing the hash value of a key provides quick, random access to the associated
bucket containing the element (value)
• Insertion and retrieval are achieved by computing the hash value and accessing the
array
• Simply put, a hash function is a function that maps a large domain to a small
co-domain
Index 0 1 2 3 4 5 6 7
Value 13 4 10 16 86
But now suppose we attempt to insert the key 100: h(100) = 1. The array cell is already
occupied by 4. This is known as a collision.
3.3.2 Collisions
• By definition, hash functions map large sets to small sets; they cannot be one-to-one
• Some elements k1 , k2 will necessarily map to the same hash value h(k1 ) = h(k2 )
• When two elements map to the same hash value, it is called a collision
• Collisions can be resolved using various strategies; we examine two:
1. Open addressing
2. Chaining
Open Addressing
Open addressing involves probing the hash table to nd an open cell in the table. Several
strategies are available.
• Linear probing involves probing the table using a linear function (a sketch follows this
list):

    h(k, i) = (h(k) + c1 · i) mod m

for i = 1, . . . , m − 1 and c1 ∈ Zm
• Quadratic probing involves probing the table using a quadratic function; the standard
form is:

    h(k, i) = (h(k) + c1 · i + c2 · i^2) mod m

for i = 0, 1, . . . , m
• In general, c1 , c2 ∈ Zm
• Probing wraps around the table (modulo m)
• Double Hashing involves using a second hash function s(k) to probe:

    h(k, i) = (h(k) + i · s(k)) mod m

for i = 0, 1, . . . , m
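The following minimal Java sketch illustrates insertion with linear probing using c1 = 1.
The table size m and the base hash h(k) = k mod m are assumptions made for this
example, not values taken from these notes:

    public class LinearProbingTable {
        private final Integer[] table;
        private final int m;

        public LinearProbingTable(int m) {
            this.m = m;
            this.table = new Integer[m];
        }

        private int h(int key) {
            return Math.floorMod(key, m);       // assumed base hash function
        }

        // probe h(k), h(k)+1, h(k)+2, ... (mod m) until an open cell is found
        public boolean insert(int key) {
            for (int i = 0; i < m; i++) {
                int index = (h(key) + i) % m;
                if (table[index] == null) {
                    table[index] = key;
                    return true;
                }
            }
            return false;                        // every cell was occupied: table is full
        }
    }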
Chaining
• No probing, instead each bucket in the hash table represents a linked list
• Alternatively: another efficient collection (BST, etc.)
• Each element is added to the list associated with the hash value
• Retrieval now may involve traversing another data structure
• Retrieval may now also involve another mechanism to identify each object in the
collection
• No more issues with deletion
• Rehashing may still be necessary to prevent each list/collection from getting too
large
Efficiency of a hash table depends on how full it is. Let n be the number of elements in
the table and m be the size of the table (the number of available cells); then the load
factor of the table is

    α = n/m
• As a hash table lls up, more collisions slow down access of elements
• If deletion is allowed, more cells become dirty and slow down access of elements
• To accommodate more elements and to ensure that insert/retrieval remains fast, a
rehashing of the table is performed:
1. A new, larger table is created with a new hash function using a larger m
2. Each element is reinserted into the new hash table, an O(n) operation.
• Rehashing is done infrequently (when a load factor threshold is exceeded): when averaged
over the lifecycle of the data structure, it leads to amortized constant time.
The Java Collections library provides several data structures that are backed by hash
tables that utilize an object’s hashCode() method and rely on several assumptions. For
this reason, it is best practice to always override the hashCode() and equals() methods
for every user-defined object so that they depend on an object’s entire state.
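For instance, for a hypothetical user-defined class (not one from these notes), overriding
both methods so that they depend on the object's entire state might look like this sketch:

    import java.util.Objects;

    public class Employee {
        private final String name;
        private final int id;

        public Employee(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            Employee other = (Employee) o;
            // equality depends on the entire state
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id);   // combines all state into one hash value
        }
    }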
• java.util.HashSet<E>
– Implements the Set interface
3.4 Bloom Filters
3.6 Exercises
(a) Insert the above elements into a hash table, using linear probing to resolve collisions.
Show what the hash table would look like after inserting the elements in the order
above.
(b) Insert the above elements into a hash table using chaining to resolve collisions. Show
what the hash table would look like after inserting the elements in the order above.
4 Brute Force Style Algorithms
4.1 Introduction
4.1.1 Examples
Shuffling Cards
• There are 52! ≈ 8.065817 × 10^67 possible shuffles
• If 5 billion people shuffled once per second for the last 1,000 years, only a tiny fraction
of these would have been generated; at that rate it would take roughly

    5.1118 × 10^50

years to produce every possible shuffle
gcd(a, b)
• If a = p1^k1 · p2^k2 · · · pn^kn and b = p1^ℓ1 · p2^ℓ2 · · · pn^ℓn are the prime
factorizations of a and b, then

    gcd(a, b) = p1^min{k1,ℓ1} · p2^min{k2,ℓ2} · · · pn^min{kn,ℓn}
4.1.2 Backtracking
• We iteratively (or recursively) build partial solutions such that solutions can be
“rolled back” to a prior state
• Once a full solution is generated, we process/test it then roll it back to a (previous
partial) solution
• Advantage: depending on the nature/structure of the problem, entire branches of
the computation tree may be “pruned” out
• Avoiding infeasible solutions can greatly speed up computation in practice
• Heuristic speed up only: in the worst case and in general, the computation may
still be exponential
Recall that combinations are simply all possible subsets of size k. For our purposes, we
will consider generating subsets of
{1, 2, 3, . . . , n}
(a) Sequence for k = 3 (subsets of {1, . . . , 5}):

    Iteration   Combination
    1           123
    2           124
    3           125
    4           134
    5           135
    6           145
    7           234
    8           235
    9           245
    10          345

(b) Sequence for k = 4 (subsets of {1, . . . , 6}):

    Iteration   Combination
    1           1234
    2           1235
    3           1236
    4           1245
    5           1246
    6           1256
    7           1345
    8           1346
    9           1356
    10          1456
    11          2345
    12          2346
    13          2356
    14          2456
    15          3456
• We can adapt the two previous algorithms to generate these permutations: use
the combinations algorithm to generate all distinct subsets of size k; then run the
permutation algorithm on each subset to generate ordered permutations.
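The combinations algorithm itself is not reproduced here; as a stand-in, the following
Java sketch generates the k-subsets of {1, . . . , n} in the same lexicographic order as the
tables above (this is one standard approach, not necessarily the exact algorithm from
these notes):

    public class Combinations {
        // prints all k-subsets of {1, ..., n} in lexicographic order
        public static void generate(int n, int k) {
            int[] c = new int[k];
            for (int i = 0; i < k; i++) c[i] = i + 1;      // start with {1, 2, ..., k}
            while (true) {
                System.out.println(java.util.Arrays.toString(c));
                // find the right-most position that can still be incremented
                int i = k - 1;
                while (i >= 0 && c[i] == n - (k - 1 - i)) i--;
                if (i < 0) return;                          // last combination reached
                c[i]++;
                for (int j = i + 1; j < k; j++) c[j] = c[j - 1] + 1;
            }
        }

        public static void main(String[] args) {
            generate(5, 3);   // reproduces table (a) above
        }
    }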
The Bell numbers Bn count the number of ways to partition a set of n elements; they
satisfy the recurrence

    Bn+1 = Σ_{k=0}^{n} C(n, k) · Bk

with B0 = 1.
4.3.1 Satisfiability

P(x⃗) = P(x1 , . . . , xn )
• Example:

    (x1 ∨ ¬x2 ) ∧ (¬x1 ∨ x2 )

is satisfiable (there are two satisfying assignments: x1 = x2 = 0 and x1 = x2 = 1)
• Example:

    (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ ¬x2 ) ∧ (x2 ∨ ¬x3 ) ∧ (x3 ∨ ¬x1 ) ∧ (¬x1 ∨ ¬x2 ∨ ¬x3 )

is not satisfiable: the middle three clauses force x1 = x2 = x3 , but the first clause
requires at least one variable to be true and the last requires at least one to be false
Problem 1 (Satisfiability).
Given: A boolean predicate P on n variables
Output: True if P is satisfiable, false otherwise
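A brute force solution simply tries all 2^n truth assignments. A minimal Java sketch
(the predicate interface and method names are our own assumptions, not taken from
these notes):

    import java.util.function.Predicate;

    public class BruteForceSat {
        // tries all 2^n truth assignments of the n variables
        public static boolean isSatisfiable(int n, Predicate<boolean[]> p) {
            boolean[] x = new boolean[n];
            for (long mask = 0; mask < (1L << n); mask++) {
                for (int i = 0; i < n; i++) {
                    x[i] = ((mask >> i) & 1) == 1;
                }
                if (p.test(x)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            // (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2): satisfiable exactly when x1 = x2
            boolean sat = isSatisfiable(2, x -> (x[0] || !x[1]) && (!x[0] || x[1]));
            System.out.println(sat);   // prints true
        }
    }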
N(v) = {u | (u, v) ∈ E}
Variations:
[Figure: the computation tree over all truth assignments — branching on x1 , then
x2 , . . . , xn ; the tree has depth d = n and 2^n leaves.]
is minimized.
• To see how the walk approach can provide a substantial speed up, observe the graph
in Figure 4.3
• Any permutation of vertices that involves edges that are not present need not be
processed
• Backtracking computation is illustrated in Figure 4.4
[Figure 4.3: an example graph on the vertices a, b, c, d, e, f.]
and

    val(S) = Σ_{s ∈ S} val(s)

is maximized.
An example input can be found in Figure 4.5. The optimal solution is to take items
a1 , a3 , a4 for a value of 29. Another example can be found in Table 9.7 in a later section
where we look at a dynamic programming solution for the same problem.
• We wish to maximize the value of items (in general, an “objective function”) taken
subject to some constraint
• 0-1 refers to the fact that we can either take an item or leave it
• Variation: the dual of the problem would be to minimize some loss or cost
• Variation: fractional knapsack
• A greedy strategy fails in general; a small example: three items with weights 1, 2, 3
and values 6, 10, 12 respectively; and a total weight capacity of 4. The ratios would
be 6, 5, 4 respectively, meaning that a greedy strategy would have us select the first
two items for a total weight of 3 and value of 16. However, the optimal solution
would be to select the first and third item for a total weight of 4 and value of 18.
• A brute-force backtracking algorithm is presented as Algorithm 14 (a sketch of the
same idea follows below); note that you would invoke this algorithm with j = 0 and
S = ∅ initially.
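Algorithm 14 itself is not reproduced here; the following Java sketch is one possible
realization of the same brute-force idea (item-by-item recursion with a capacity check),
with our own method and field names:

    public class KnapsackBruteForce {
        static int[] w, v;   // item weights and values

        // best value achievable considering items j..n-1 with the given remaining capacity
        static int best(int j, int remaining) {
            if (j == w.length) return 0;
            int skip = best(j + 1, remaining);         // leave item j
            if (w[j] > remaining) return skip;         // prune: item j no longer fits
            int take = v[j] + best(j + 1, remaining - w[j]);
            return Math.max(take, skip);
        }

        public static void main(String[] args) {
            w = new int[]{1, 2, 3};
            v = new int[]{6, 10, 12};
            System.out.println(best(0, 4));   // prints 18, matching the example above
        }
    }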
Figure 4.4: Brute Force Backtracking Hamiltonian Path Traversal. (a) The computation
tree starting from b; (b) the computation tree starting from a. The first traversal
starts from b and finds no Hamiltonian Path. The second traversal finds several and
would terminate prior to exploring the entire computation tree. The entire tree is
presented for completeness.
• A set of points P in the plane is convex if for any pair of points p, q ∈ P every point
on the line segment pq is contained in P
{a1 , a2 , a3 , a4 }
• The idea is to choose extreme points: points that when connected form the border
of the convex hull
• A pair of points p1 = (x1 , y1 ), p2 = (x2 , y2 ) are extreme points if all other p ∈
P \ {p1 , p2 } lie on the same side of the line defined by p1 , p2
Problem 7 (Assignment).
Given: A collection of n tasks T = {t1 , . . . , tn } and n persons P = {p1 , . . . , pn } and an
n × n matrix C with entries ci,j representing the cost of assigning person pi to task tj .
Output: A bijection f : P → T such that

    Σ_{pi ∈ P} C[i, f(pi )]

is minimized.
4.4 Exercises
Exercise 4.1. Implement the brute-force solution for the Hamiltonian Path problem in
the high-level programming language of your choice.
Exercise 4.2. Implement the brute-force solution for the 0-1 Knapsack problem in the
high-level programming language of your choice.
Exercise 4.3. Consider the following frequent itemset problem: we are given a set of
items A = {a1 , a2 , . . . , an } and a set of baskets B1 , B2 , . . . , Bm such that each basket is
a subset containing a certain number of items, Bi ⊆ A. Given a support s, frequent
itemsets are subsets S ⊂ A that are subsets of at least s baskets (that is, the items in S
all appear together in at least s baskets).
For example, suppose that we have the following baskets:
B1 = {beer, screws, hammer}
B2 = {saw, lugnuts}
B3 = {beer, screws, hammer, lugnuts}
B4 = {beer, screws, hammer, router}
If our support s = 2 then we’re looking for all subsets of items that appear in at least 2
of the baskets. In particular:
• {lugnuts} as it appears in B2 , B3
• {beer} as it appears in B1 , B3 , B4
• {hammer} as it appears in B1 , B3 , B4
• {screws} as it appears in B1 , B3 , B4
• {beer, screws} as it appears in B1 , B3 , B4
• {hammer, screws} as it appears in B1 , B3 , B4
• {hammer, beer} as it appears in B1 , B3 , B4
This problem is widely seen in commerce, language analysis, search, and financial
applications to learn association rules. For example, a customer analysis may show that
people often buy nails when they buy a hammer (the pair has high support), so run a
sale on hammers and jack up the price of nails!
For this exercise, you will write a program that takes a brute force approach (with
pruning) and finds the support for each possible subset S ⊆ A. If a subset has zero support,
you will omit it from your output. Essentially you are solving the problem for s = 1, but
we’re not only interested in the actual sets, but the sets and the number of baskets that
they appear in.
Write a solution to this problem in the high-level programming language of your choice.
5 Divide & Conquer Style Algorithms
5.1 Introduction
Divide & conquer approach already seen in other algorithms: Merge Sort, Quick Sort,
Binary Search, etc.
Google’s MapReduce
Example:

    a^77 = a^64 · a^13
         = a^64 · a^8 · a^5
         = a^64 · a^8 · a^4 · a^1

The required powers of a can be computed by repeated squaring:

    a^2  = a · a
    a^4  = a^2 · a^2
    a^8  = a^4 · a^4
    a^16 = a^8 · a^8
    a^32 = a^16 · a^16
    a^64 = a^32 · a^32

which requires only 6 multiplications; a^77 can then be computed with an additional 3
multiplications (instead of 2^77 = 1.511 × 10^23).
71
5 Divide & Conquer Style Algorithms
We still evaluate each term independently however, since we will need it in the next term
(though the accumulated value is only multiplied by 1).
Example: compute 12^26 mod 17, where 26 = (11010)2 :

    i                       4    3    2    1    0
    binary digit of 26      1    1    0    1    0
    term 12^(2^i) mod 17    1    16   13   8    12
    running product         9    9    8    8    1
Thus,

    12^26 mod 17 = 9
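A minimal Java sketch of this repeated-squaring method for modular exponentiation
(the iterative structure and names are our own):

    public class ModularExponentiation {
        // computes a^n mod m by repeated squaring, scanning the bits of n
        static long power(long a, long n, long m) {
            long product = 1;
            long term = a % m;                        // a^(2^i) mod m for the current bit i
            while (n > 0) {
                if ((n & 1) == 1) {
                    product = (product * term) % m;   // bit is 1: multiply the term in
                }
                term = (term * term) % m;             // square for the next bit
                n >>= 1;
            }
            return product;
        }

        public static void main(String[] args) {
            System.out.println(power(12, 26, 17));    // prints 9, as computed above
        }
    }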
5.4 Euclid’s GCD Algorithm

Using algebra, we can reason that any divisor of 184 and 1768 must also be a divisor of
the remainder, 1768 mod 184 = 112. Thus, in general,

    gcd(a, b) = gcd(b, r)

where r = a mod b.
Input : Integers, a, b ∈ Z+
Output : gcd(a, b)
1 while b ≠ 0 do
2 t←b
3 b ← a mod t
4 a←t
5 end
6 output a
Theorem 4. If a and b are positive integers, then there exist integers s, t such that
gcd(a, b) = sa + tb
74
5.4 Euclid’s GCD Algorithm
a0 b0 t0 t s0 s q r
27 58 0 1 1 0 0 27
58 27 1 0 0 1 2 4
27 4 0 1 1 -2 6 3
4 3 1 -6 -2 13 1 1
3 1 -6 7 13 -15 3 0
Therefore,
gcd(27, 58) = 1 = (−15)27 + (7)58
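The tables above can be produced by an iterative extended Euclidean algorithm. A
minimal Java sketch (one standard formulation, with our own variable names):

    public class ExtendedEuclid {
        // returns {g, s, t} with g = gcd(a, b) = s*a + t*b
        static long[] extendedGcd(long a, long b) {
            long s0 = 1, s = 0;   // coefficients of the original a
            long t0 = 0, t = 1;   // coefficients of the original b
            while (b != 0) {
                long q = a / b;
                long r = a % b;
                a = b; b = r;
                long tmp = s0 - q * s; s0 = s; s = tmp;
                tmp = t0 - q * t; t0 = t; t = tmp;
            }
            return new long[]{a, s0, t0};
        }

        public static void main(String[] args) {
            long[] res = extendedGcd(27, 58);
            // prints: gcd = 1, s = -15, t = 7  (so 1 = (-15)*27 + (7)*58)
            System.out.printf("gcd = %d, s = %d, t = %d%n", res[0], res[1], res[2]);
        }
    }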
Example:
Compute gcd(25480, 26775) and find s, t such that

    gcd(25480, 26775) = s · 25480 + t · 26775
a0 b0 t0 t s0 s q r
25480 26775 0 1 1 0 0 25480
26775 25480 1 0 0 1 1 1295
25480 1295 0 1 1 -1 19 875
1295 875 1 -19 -1 20 1 420
875 420 -19 20 20 -21 2 35
420 35 20 -59 -21 62 12 0
Therefore,
gcd(25480, 26775) = 35 = (62)25480 + (−59)26775
(4 · 1 + 7 · 5) = (4 + 7) · (1 + 5) − (4 · 5) − (7 · 1)
(a + b)(c + d) = ac + ad + bc + bd
ad + bc = (a + b)(c + d) − ac − bd
    a = a1 · 10^(n/2) + a0
    b = b1 · 10^(n/2) + b0

That is, a1 holds the higher-order digits and a0 the lower-order digits of a (and similarly
for b). Then

    a · b = (a1 · 10^(n/2) + a0 ) · (b1 · 10^(n/2) + b0 )
          = a1 · b1 · 10^n + (a1 · b0 + a0 · b1 ) · 10^(n/2) + a0 · b0
          = a1 · b1 · 10^n + [(a1 + a0 )(b1 + b0 ) − a1 · b1 − a0 · b0 ] · 10^(n/2) + a0 · b0
Observations:
• Due to Karatsuba (1960)
• Powers of 10 are a simple shift
• If the number of digits is not a power of two (or a and b differ in length), we can “pad”
the numbers out with leading zeros
• Three recursive calls to compute the multiplication
• Overhead in recursion/book keeping only makes this better for large numbers
(thousands of digits)
The number of multiplications satisfies

    M(n) = 3 · M(n/2) + 0

By the Master Theorem:

    M(n) ∈ Θ(n^(log2 3))
As log2 (3) = 1.5849625 . . ., this is an asymptotic improvement over the naive multiplica-
tion.
Example:

    a · b = 72,893,424 · 33,219,492

    a1 = 7,289
    a0 = 3,424
    b1 = 3,321
    b0 = 9,492

Then (after recursion):

    a1 · b1 = 24,206,769
    a0 · b0 = 32,500,608
    x = a1 + a0 = 10,713
    y = b1 + b0 = 12,813
    x · y = 137,265,669

Altogether:

    a · b = 24,206,769 · 10^8 + (137,265,669 − 24,206,769 − 32,500,608) · 10^4 + 32,500,608
          = 2,421,482,515,420,608
    ci,j = Σ_{k=0}^{n−1} ai,k · bk,j

where A = (ai,k ) and B = (bk,j ) are the n × n input matrices and C = A · B.
The number of multiplications and additions for a 2 × 2 matrix using Strassen’s formulas
are 7 and 18 respectively. Contrast with 8 and 4 operations for the brute force method.
We’re doing a lot more in terms of additions, and only slightly less in terms of multiplications–
but it turns out that this is still asymptotically better than the brute force approach.
Strassen’s formulas can be generalized to matrices rather than just scalars
    [ C00  C01 ]   [ A00  A01 ] [ B00  B01 ]
    [ C10  C11 ] = [ A10  A11 ] [ B10  B11 ]
    [ C00  C01 ]   [ M1 + M4 − M5 + M7    M3 + M5            ]
    [ C10  C11 ] = [ M2 + M4              M1 + M3 − M2 + M6  ]
For additions, the same 7 matrices of size n/2 must be recursively dealt with while 18
additions must be made on matrices of size n/2, so
                   Multiplications     Additions
    Brute-Force    Θ(n^3)              Θ(n^3)
    Strassen’s     Θ(n^(log2 7))       Θ(n^(log2 7))
Many other algorithms have been discovered, with one of the best being the Coppersmith–
Winograd algorithm at O(n^2.375477 ).
More recently this was improved (analytically) to O(n^2.3727 ) (Williams 2012 [15]).
However, the overhead and multiplicative constants for these algorithms make them
infeasible for even moderately sized matrices.
Without loss of generality, we can assume that the given point set, P = {p1 =
(x1 , y1 ), . . . , pn = (xn , yn )} are sorted in ascending order with respect to x coordinates,
x1 ≤ x2 ≤ . . . ≤ xn .
Then we have the following divide and conquer strategy:
• Conquer: Find the two closest pair of points in each set P1 , P2 recursively
• Combine: We take the minimum distance between the two subproblems, d1 , d2 ;
however, we must also check points with distances that cross our dividing line
From the dividing line, we need only check points in two regions: [y − d, y + d] where
d = min{d1 , d2 }. For each point, one needs only check at most 6 points (see figure 4.6,
p149).
Such a check takes linear time: there are n/2 points (we need only check C1 ) and up to 6
distance calculations each.
This suggests the usual recurrence,

    T(n) = 2T(n/2) + Θ(n)

which by the Master Theorem is Θ(n log n).
Levitin describes a “Quick Hull” algorithm. Here we present a different divide & conquer
approach. Yet another approach is the Graham Scan (a “walk” approach).
Basic Idea:
1. Divide: partition into two sets A, B according to a lower/upper half with respect
to the x coordinate
2. Conquer: Compute the convex hull of A and B
3. Combine: Combine into a larger convex hull by computing the upper/lower
tangents of each and throwing out all internal points (O(n) operation)
Combine procedure:
• Start with the right-most point a of the lower hull and the left-most point b of
the upper hull
• Iterate through the hull points clockwise in the lower hull and counter clockwise on
the upper partition, alternating between the two
• Repeat until the tangent ab has been found
• Tangent has been found when the “next” point in each hull is contained in the new
hull
• Only required to look at the immediate next point (if the next point is above or
below, stop)
• Repeat the process for the upper tangent
5.10 Fast Fourier Transform

• Article: http://blog.revolutionanalytics.com/2014/01/the-fourier-transform-explained-i
html
• Nice Visualization: http://bbrennan.info/fourier-work.html
5.11 Exercises
Exercise 5.2. Consider the following problem: Given an array of sorted integers and
two values l, u (lower and upper bounds), find the indices i, j such that l ≤ A[k] ≤ u for
all k, i ≤ k ≤ j. Solve this problem by developing an O(log n) algorithm. Give good
pseudocode and analyze your algorithm.
Exercise 5.3. Adaptive Quadrature is a numerical method for approximating the definite
integral of a function f(x):

    ∫_a^b f(x) dx

It works as follows: given a, b and a tolerance τ, it approximates the integral using some
method. It then computes an error bound for that method; if the error is less than τ,
then the estimation is used. If the error is greater than τ, then the midpoint m = (a + b)/2
is used to recursively integrate on the intervals [a, m] and [m, b], updating τ to τ/2.
Write good pseudocode for this algorithm and analyze its complexity.
Exercise 5.4. Let A be a zero-indexed array that is sorted. Now suppose that someone
comes along and cyclically “shifts” the contents of A so that the contents are moved some
number of positions, with those at the end being shifted back to the beginning. For
example, if the array contained the elements 2, 6, 10, 20, 30, 40 and it were shifted by
3 positions, the resulting array would be 20, 30, 40, 2, 6, 10 (in general, you will not be
given how many positions the array has been shifted).
Design an efficient (that is, an O(log n)) algorithm that can compute the median element
of A. Give good pseudocode and a short explanation. Analyze your algorithm.
6 Linear Systems
6.1 Introduction
TODO
3x + 2y = 35
x + 3y = 27
Ax = b → A′ x = b′
An upper-triangular system is easier to solve because we can easily back-track (from the
bottom right) and solve each unknown. This is known as backward substitution (not to
be confused with the method for solving recurrence relations).
Now that we know how to back-solve, how do we take a given system and transform it
to an upper-triangular system? We use several elementary row operations:
• We can exchange any two rows of a system
• We can exchange any two columns of a system¹
• We can multiply an equation by a non-zero scalar
• We can add or subtract one equation with another (or a multiple thereof)
Such operations do not change the solution to the system.
Example

    3x1 − x2 + 2x3 = 1
    4x1 + 2x2 − x3 = 4
    x1 + 3x2 + 3x3 = 5

The corresponding augmented matrix is

    [ 3  −1    2     | 1    ]
    [ 4   2   −1     | 4    ]
    [ 1   3    3     | 5    ]

Subtract 4/3 times row 1 from row 2 and 1/3 times row 1 from row 3:

    [ 3  −1     2    | 1    ]
    [ 0  10/3  −11/3 | 8/3  ]
    [ 0  10/3   7/3  | 14/3 ]

Subtract 1 times row 2 from row 3:

    [ 3  −1     2    | 1    ]
    [ 0  10/3  −11/3 | 8/3  ]
    [ 0   0     6    | 2    ]
¹ If we do, however, we must make note that the variables have also been exchanged. For instance, if we
exchange column i and column j, then xi and xj have been exchanged.
Using elementary row operations, we can proceed as follows. Using a11 as a pivot, we
subtract a21 /a11 times the first equation from the second equation, and continue with
a31 /a11 from the third, etc. In general, we subtract ai1 /a11 times row 1 from the i-th row
for 2 ≤ i ≤ n. We repeat for a22 as the next pivot, etc.
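A minimal Java sketch of forward elimination followed by backward substitution (no
pivoting or error handling; a sketch of the idea only, not production numerical code):

    public class GaussianElimination {
        // solves Ax = b; A is n x n, b has length n; A and b are modified in place
        static double[] solve(double[][] A, double[] b) {
            int n = b.length;
            // forward elimination: make A upper triangular
            for (int i = 0; i < n; i++) {
                for (int r = i + 1; r < n; r++) {
                    double factor = A[r][i] / A[i][i];   // assumes the pivot is non-zero
                    for (int c = i; c < n; c++) {
                        A[r][c] -= factor * A[i][c];
                    }
                    b[r] -= factor * b[i];
                }
            }
            // backward substitution
            double[] x = new double[n];
            for (int i = n - 1; i >= 0; i--) {
                double sum = b[i];
                for (int c = i + 1; c < n; c++) sum -= A[i][c] * x[c];
                x[i] = sum / A[i][i];
            }
            return x;
        }

        public static void main(String[] args) {
            double[][] A = {{3, -1, 2}, {4, 2, -1}, {1, 3, 3}};
            double[] b = {1, 4, 5};
            // solves the example system above
            System.out.println(java.util.Arrays.toString(solve(A, b)));
        }
    }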
Issues:
• It may be the case that we attempt to divide by zero, which is invalid. To solve
this problem, we can exchange a row!
• Though a divisor may not be zero, it could be so small that our quotient is too
large. We can solve this one of two ways: either scale the current row so that the pivot
becomes 1, or we can again exchange rows; specifically, exchange with the row that has
the largest absolute value in the i-th column.
• The system may be indeterminate: there may be one or more free variables (an
infinite number of solutions). Visually, both equations correspond to the same line
(intersecting at an infinite number of points); example:
x + 7y = 8
2x + 14y = 16
• The system may be inconsistent (there are no solutions). Visually the equations
correspond to two parallel lines that never intersect. Example:
x + 7y = 8
2x + 14y = 20
Analysis
The multiplication is our elementary operation. We thus have the following summation:
    C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{k=i}^{n+1} 1 ≈ (1/3) n^3 ∈ Θ(n^3)

Backward substitution will work on the remaining entries, making about n(n − 1)/2
multiplications and n divisions, so the asymptotic classification doesn’t change.
6.2.1 LU Decomposition
    A A^(−1) = I

where I is the identity matrix (1’s on the diagonal, 0’s everywhere else). The inverse of a
matrix is always unique. However, not every matrix is invertible; such matrices are called
singular.
Recall that when using Gaussian elimination, the system may be inconsistent. That
is, no valid solution exists. One can prove that this is the case if and only if A is not
invertible, which in turn is the case if and only if one of the rows is a linear combination
of the other rows.
Such a linear combination can be detected by running Gaussian Elimination on A. If
one of the diagonal entries is 0 then A is singular. Otherwise A is invertible.
Finding an inverse is important because it allows us an implicit division operation among
matrices²:

    A^(−1) A x = A^(−1) b
    x = A^(−1) b

One way to compute the inverse X = A^(−1) is to solve AX = I, which amounts to
solving the n systems

    A xj = ej

for 1 ≤ j ≤ n, where ej is the j-th column of the identity matrix.
Alternatively, you can adapt Gaussian Elimination to compute an inverse as follows.
Given A, you create a “super augmented” matrix,
    [ A | I ]

where I is the n × n identity matrix, resulting in an n × 2n matrix. Then you perform
Gaussian Elimination to make the left part upper triangular, then perform a Gaussian
Elimination for the upper part as well, then normalize each row so that each entry on the
diagonal is 1 (this is actually Gauss-Jordan elimination). The resulting augmented matrix
represents

    [ I | A^(−1) ]
² Be careful: remember, matrix multiplication is not commutative in general.
6.2.3 Determinants
    det A = Σ_{j=1}^{n} sj · a1j · det Aj

where

    sj = +1 if j is odd, −1 if j is even
• a1j is the element in row 1 and column j
• Aj is the (n − 1) × (n − 1) matrix obtained by deleting row 1 and column j from A
• If n = 1 then det A = a11
Determinants are easy to compute for 2 × 2 matrices and even 3 × 3 matrices. There is
also an easy to remember visual way to do it.
If you can easily compute a determinant you can easily tell if a matrix is invertible or
not: A is invertible if and only if det A ≠ 0.
If you were to use the definition to compute a determinant, you would be summing n!
terms! Instead we can again use Gaussian elimination.
The determinant of an upper-triangular matrix is simply the product of the elements on its
diagonal. Thus we can use Gaussian elimination to transform A into an upper-triangular
matrix A′ and simply compute
    det A′ = Π_{i=1}^{n} a′ii
Some things to keep in mind about determinants when using Gaussian elimination:
Though not algorithmically sound (it is far too complex, cumbersome to program and
certainly not stable3 ) Cramer’s Rule does give us an explicit solution for a system of
linear equations:
    x1 = det A1 / det A,   x2 = det A2 / det A,   . . . ,   xn = det An / det A
where Aj is the matrix obtained by replacing the j-th column of A by the vector b.
    maximize    c^T · x
    subject to  Ax ≤ b
                x ≥ 0
The simplex method (Dantzig, 1947) is an old way of solving linear programs
• Start at an arbitrary point
³ Stable in the sense that floating point operations are prone to error and that error propagates.
• Travel along the feasible region to the next vertex (in n-space)
• In general, the optimal solution lies at an extreme point of the polytope; there may be
up to C(m, n) such extreme points (n variables, m constraints)
• May have an exponential number of such critical points
• Choice of direction: that which improves the solution the most (the direction in
which the variable has the largest partial derivative)
• In general, pivot choices may lead to aberrant (exponential) behavior or could even
cycle
• Very fast in practice
• Polynomial-time algorithm (Khachiyan, 1979) and better interior-point method
(Karmarkar, 1984) among other improvements
maximize 4x + 3y
subject to x ≥ 0
y ≥ 0
x ≤ 9
2y + x ≤ 12
2y − x ≤ 8
Figure 6.1: Visualization of a Linear Program with two sub-optimal solution lines and
the optimal one. The feasible region is bounded by y ≥ 0, x ≤ 9, y ≤ (1/2)x + 4, and
y ≤ −(1/2)x + 6; the objective lines have the form y = −(4/3)x + c.
Point Value
(0, 0) 0
(0, 4) 12
(2, 5) 23
(9, 1.5) 40.5
(9, 0) 36
6.3.1 Formulations
We can transform and conquer several previously examined problems as follows: we can
reformulate the problem as an optimization problem (a linear program) and solve it with
the simplex method or other algorithm for LP.
Knapsack
    maximize    Σ_{i=1}^{n} vi · xi
    subject to  Σ_{i=1}^{n} wi · xi ≤ W
                xi ∈ {0, 1},  1 ≤ i ≤ n
Relaxing the integer constraint gives the fractional version:

    maximize    Σ_{i=1}^{n} vi · xi
    subject to  Σ_{i=1}^{n} wi · xi ≤ W
                0 ≤ xi ≤ 1,  1 ≤ i ≤ n
• This formulation doesn’t solve the original 0-1 Knapsack problem, it could come
up with a solution where a fraction of an item is taken
Assignment

    minimize    Σ_i Σ_j ci,j · xi,j
    subject to  Σ_i xi,j = 1   ∀j
                Σ_j xi,j = 1   ∀i
                xi,j ∈ {0, 1}  ∀i, j

Where xi,j is an indicator variable for assigning task i to person j (with associated cost
ci,j ).
This program has integer constraints (ILP) and is, in general, NP-complete. We could
relax it:
    minimize    Σ_i Σ_j ci,j · xi,j
    subject to  Σ_i xi,j = 1   ∀j
                Σ_j xi,j = 1   ∀i
                0 ≤ xi,j ≤ 1   ∀i, j
• It doesn’t make much sense to assign half a task to a quarter of a person (or maybe
it does!)
• However, it turns out that certain problems are unimodular
• A matrix is unimodular if it is:
– square
– integer entries
– has a determinant that is −1, 1
• Further it is totally unimodular if every square non-singular submatrix is unimodular
(thus determinant is 0, −1, 1)
• Fact: if an ILP’s constraint matrix is totally unimodular, then its optimal solution
is integral
• If an optimum exists, it is composed of integers: ~x = (x1 , . . . , xn ), xi ∈ Z
• Fact: The assignment problem is totally unimodular
• Consequence: the relaxation of the assignment problem’s ILP will have the same
optimal solution as the original ILP!
6.4 Exercises
Exercise 6.1. Let T be a binary search tree and let v1 , v2 , . . . , vn be a preorder traversal
of its nodes. Design an algorithm that reconstructs the tree T given v1 , . . . , vn .
Exercise 6.2. Diagram each step of inserting the following keys into an initially empty
AVL tree. Clearly indicate which type of rotations occur.
Exercise 6.3. Diagram each step of inserting the following keys into an initially empty
2-3 tree. Clearly indicate which type of promotions and node transformations occur.
Exercise 6.5. Suppose we insert 1, 2, . . . , 25 into an initially empty AVL tree. What
will be the depth of the resulting tree?
Exercise 6.6. Suppose we insert 1, 2, . . . , 25 into an initially empty 2-3 tree. What will
be the depth of the resulting tree?
Exercise 6.7. Write an algorithm (give good pseudocode and a complete analysis) that
inverts a binary search tree so that for each node with key k, all keys in the left-sub-tree
are greater than k and all nodes in the right-sub-tree are less than k. Thus, an in-order
traversal of an inverted tree will produce a key ordering in non-increasing order instead.
An example of a binary search tree and its inversion is presented in Figure 6.2. Your
algorithm must be efficient and should not create a new tree.
Figure 6.2: (a) A Binary Search Tree with root 20 and (b) its inversion, in which every
left/right subtree has been swapped.
7 Trees
7.1 Introduction
Motivation: we want a data structure to store elements that offers efficient, arbitrary
retrieval (search), insertion, and deletion.
• Array-based Lists
– O(n) insertion and deletion
– Fast index-based retrieval
– Efficient binary search if sorted
• Linked Lists
– Efficient, O(1) insert/delete for head/tail
– Inefficient, O(n) arbitrary search/insert/delete
– Efficient binary search not possible without random access
• Stacks and queues are efficient, but are restricted access data structures
• Possible alternative: Trees
• Trees have the potential to provide O(log n) efficiency for all operations
[Figure 7.1: an example tree rooted at a, with children b, c and further descendants
d through r arranged over several levels.]
Properties:
• In a tree, all nodes are connected by exactly one unique path
• The maximum number of nodes at any level k is 2^k
• Thus, the maximum number of nodes n for any binary tree of depth d is:

    n = 2^0 + 2^1 + 2^2 + · · · + 2^(d−1) + 2^d = Σ_{k=0}^{d} 2^k = 2^(d+1) − 1

• Consequently, for a full binary tree,

    d = log2 (n + 1) − 1
• A preorder traversal strategy visits nodes in the following order: root; left-sub-tree;
right-sub-tree
• An example traversal on the tree in Figure 7.1:
a, b, d, g, l, m, r, h, n, e, i, o, c, f, j, p, q, k
• Applications:
– Building a tree, traversing a tree, copying a tree, etc.
– Ecient stack-based implementation
– Used in prefix notation (Polish notation); used in languages such as Lisp/Scheme
• An inorder traversal strategy visits nodes in the following order: left-sub-tree; root;
right-sub-tree
• An example traversal on the tree in Figure 7.1:
l, g, r, m, d, h, n, b, e, o, i, a, c, p, j, q, f, k
• Applications:
– Enumerating elements in order in a binary search tree
– Expression trees
• Applications:
– Topological sorting
– Destroying a tree when manual memory management is necessary (roots are
the last thing that get cleaned up)
– Reverse polish notation (operand-operand-operator, unambiguous, used in old
HP calculators)
Preorder Implementations
Inorder Implementation
Stack-based implementation (a sketch follows this list):
• The same basic idea: push nodes onto the stack as you visit them
• However, we want to delay processing the node until we’ve explored the left-sub-tree
• We need a way to tell if we are visiting the node for the first time or returning
from the left-tree exploration
• To achieve this, we allow the node to be null
• If null, then we are returning from a left-tree exploration, pop the top of the stack
and process (then push the right child)
• If not null, then we push it to the stack for later processing, explore the left child
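A minimal Java sketch of this idea (the node class and method names are our own):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class InorderTraversal {
        static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        // stack-based inorder traversal: a null current node signals that a
        // left subtree has been fully explored
        static void inorder(Node root) {
            Deque<Node> stack = new ArrayDeque<>();
            Node u = root;
            while (u != null || !stack.isEmpty()) {
                if (u != null) {
                    stack.push(u);       // delay processing until the left subtree is done
                    u = u.left;
                } else {
                    Node top = stack.pop();
                    System.out.println(top.key);   // process the node
                    u = top.right;                 // then explore the right subtree
                }
            }
        }

        public static void main(String[] args) {
            Node root = new Node(2);
            root.left = new Node(1);
            root.right = new Node(3);
            inorder(root);   // prints 1, 2, 3
        }
    }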
Postorder Implementation
Stack-based implementation:
• Same basic ideas, except that we need to distinguish if we’re visiting the node for
the first time, second time, or last (so that we can process it)
• To achieve this, we keep track of where we came from: a parent, left, or right node
• We keep track of a previous and a current node
BFS Implementation
• Simple rules based on local information: where you are and where you came from
• No additional data structures required
• Traversal is a “walk” around the perimeter of the tree
• Can use similar rules to determine when the current node should be processed to
achieve pre, in, and postorder traversals
• Need to take care with corner cases (when current node is the root or children are
missing)
• Pseudocode presented Algorithm 38
7.3.6 Operations
Basic Operations:
• Search for a particular element/key
• Adding an element
– Add at the shallowest available spot
– Add at a random leaf
– Add internally, shift nodes down by some criteria
• Removing elements
– Removing leaves
– Removing elements with one child
– Removing elements with two children
Other Operations:
• Compute the total number of nodes in a tree
• Compute the total number of leaves in a tree
• Given an item or node, compute its depth
• Compute the depth of a tree
7.4 Binary Search Trees

Regular binary trees have little structure to their elements; search, insert, and delete
operations are still linear with respect to the number of tree nodes, O(n). We want a
data structure with operations proportional to its depth, O(d). To this end, we add
structure and order to tree nodes.
• Each node has an associated key
• Binary Search Tree Property: For every node u with key uk in T
1. Every node in the left-sub-tree of u has keys less than uk
2. Every node in the right-sub-tree of u has keys greater than uk
• Duplicate keys can be handled, but you must be consistent, and duplicates are not
guaranteed to be contiguous
• Alternatively: do not allow duplicate keys or dene a key scheme that ensures a
total order
• Inductive property: all sub-trees are also binary search trees
• A full example can be found in Figure 7.2
Observation: a binary search tree has more structure: the key in each node provides
information on where a node is not located. We will exploit this structure to achieve
O(log n) operations.
Search/retrieve
[Figure 7.2: a binary search tree rooted at 50 containing the keys 2, 10, 12, 15, 20, 24,
28, 40, 45, 48, 49, 50, 60, 75, 80, 85, 90, 95.]
• Goal: find a node (and its data) that matches a given key k (a sketch follows this list)
• Start at the root node
• At each node u, compare k to u’s key, uk :
– If equal, element found, stop and return
– If k < uk , traverse to u’s left-child
– If k > uk , traverse to u’s right-child
• Traverse until the sub-tree is empty (element not found)
• Analysis: number of comparisons is bounded by the depth of the tree, O(d)
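A minimal Java sketch of this search (using a simple node class of our own):

    public class BstSearch {
        static class Node {
            int key;
            Node left, right;
        }

        // returns the node whose key matches k, or null if no such node exists
        static Node search(Node root, int k) {
            Node u = root;
            while (u != null) {
                if (k == u.key) return u;            // found
                u = (k < u.key) ? u.left : u.right;  // otherwise go left or right
            }
            return null;                             // subtree exhausted: not found
        }
    }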
Insert
• Insert new nodes as leaves
• To determine where it should be inserted: traverse the tree as above
• Insert at the first available spot (first missing child node)
• Analysis: finding the available location is O(d); inserting is just reference juggling,
O(1)
Delete
• Need to first find the node u to delete; traverse as above, O(d)
• If u is a leaf (no children): it’s safe to simply delete it
• If u has one child, then we can “promote” it to u’s spot (u’s parent will now point
to u’s child)
• If u has two children, we need to find a way to preserve the BST property
– Want to minimally change the tree’s structure
– Need the operation to be efficient
– Find the minimum element of the greater nodes (right sub-tree) or the maximal
element of the lesser nodes (left sub-tree)
– Such an element will have at most one child (which we know how to delete)
– Delete it and store off the key/data
– Replace u’s key/data with the contents of the minimum/maximum element
• Analysis:
– Search/Find: O(d)
– Finding the min/max: O(d)
– Swapping: O(1)
– In total: O(d)
• Examples illustrated in Figure 7.3
In a Binary Search Tree (BST), insert, delete, and search operations are proportional to
the depth of the tree.
• The depth d of a tree is the length of the maximal path from the root to any leaf
Figure 7.3: Binary Search Tree Operations. Figure 7.3(b) depicts the insertion of (16)
into the tree in Figure 7.3(a). Figures 7.3(c) and 7.3(d) depict the deletion of a node (15)
with two children: first the maximum node in the left subtree (the lesser elements) is
found, then node 15 is replaced with that extremal node, preserving the BST property.
Figures 7.3(e) and 7.3(f) depict the deletion of a node with only one child (7), achieved
by simply promoting the single child/subtree.
Let T be a binary tree with u a node in T . The height of u is the length of the longest
path from u to any descendant leaves in its left or right subtree. For example, in the
tree in Figure 7.4, the node containing 6 has height 2 (as the longest path is from 6 to
4 which has length 2). The node containing 8 has height 3 and the root has height 4.
Whereas the depth of a node is defined with respect to the root, the height of a node is
defined with respect to leaves. The height of a tree (or subtree) is the height of its root
node. By convention, the height of a single node tree is 0 and the height of an empty
tree is -1.
The balance factor of a node u is equal to the height of u’s left subtree minus the height
of its right subtree.
balance factor(u) = h(TL ) − h(TR )
A balance factor is a measure of how skewed or unbalanced a tree or subtree is. A
balance factor of 0 indicates that a node is balanced between its left/right subtree. A
“large” positive balance factor indicates a tree that is unbalanced to the left while a large
negative balance factor indicates a tree that is unbalanced to the right. A node’s balance
factor only quantifies how well balanced a tree is at that particular node, not overall.
That is, it’s a local property. Figure 7.4 depicts a BST with the balance factor indicated
for each node. Observe that for the node containing 4, the left and right subtree both
have a height of −1, so the balance factor is −1 − (−1) = 0. The balance factor of 8 is 3
since its left subtree has a height of 2 and its right subtree has a height of −1.
Definition 8 (AVL Tree). An AVL Tree is a binary search tree in which the balance factor
of every node’s left and right subtrees is 0, 1, or −1.
[Figure 7.4: a binary search tree containing the keys 4, 5, 6, 7, 8, 10, 30, 40, 50, 80, with
the balance factor of each node indicated above it.]
We need to re-balance the tree to ensure the AVL Tree properties are preserved after
insertion. To do this, we use one of several rotations depending on the type of imbalance.
There are four basic rotations:
• R – A right rotation is used when three nodes are skewed all the way to the left.
We rotate such that the middle one becomes the new root.
• L – A left rotation is used when three nodes are skewed all the way to the right. We
rotate such that the middle node becomes the new root.
• LR – A double left-right rotation is done when the middle node is skewed to the
left and its child is skewed to the right. We left-rotate at the middle node and then
right rotate at the top node.
• RL – A double right-left rotation is done when the middle node is skewed to the
right and its child is skewed to the left. We right-rotate at the middle node and
then left rotate at the top node.
These rotations are depicted in simple trees in Figure 7.5 through 7.8 and the generalized
rotations with subtrees present are presented in Figures 7.9 through 7.12.
To illustrate some of these operations, consider the example depicted Figure 7.13 where
we insert the keys, 8, 4, 7, 3, 2, 5, 1, 10, 6 into an initially empty tree.
Deleting from an AVL tree is the same as a normal BST. Leaves can be straightforwardly
deleted, nodes with a single child can have the child promoted, and nodes with both
children can have their key replaced with either the maximal element in the left subtree
(containing the lesser elements) or the minimal element in the right subtree (containing
the greater elements). However, doing so may unbalance the tree. Deleting a node may
reduce the height of some subtree, which will affect the balance factor of its ancestors.
The same rotations may be necessary to rebalance ancestor nodes. Moreover, rebalancing
may also reduce the height of subtrees. Thus, the rebalancing may propagate all the way
back to the root node. An example of a worst case deletion is depicted in Figure 7.14.
Figure 7.5: Simple AVL L rotation. Upon insertion of a new node c (a < b < c), the tree
becomes unbalanced at the root a with a balance factor of −2; after the rotation, b
becomes the new root and the BST property is preserved.

Figure 7.6: Simple AVL R rotation. Upon insertion of a new node a (a < b < c), the tree
becomes unbalanced at the root c with a balance factor of +2; after the rotation, b
becomes the new root and the BST property is preserved.
Figure 7.7: Simple AVL LR rotation. Upon insertion of a new node b (a < b < c), the
tree becomes unbalanced at the root c with a balance factor of +2, but its left subtree is
skewed to the right. The tree is rebalanced after a left rotation about a followed by a
right rotation about c.

Figure 7.8: Simple AVL RL rotation. Upon insertion of a new node b (a < b < c), the
tree becomes unbalanced at the root a with a balance factor of −2, but its right subtree
is skewed to the left. The tree is rebalanced after a right rotation about c followed by a
left rotation about a.
Figure 7.9: Generalized AVL L Rotation. Upon insertion of a new node a, the AVL tree
becomes unbalanced at r (which may be a subtree) with a balance factor of −2. After
the rotation, the node c becomes the new root of the (sub)tree, r becomes c’s left-child,
and subtree T2 “swings” over to become r’s new right subtree.
Figure 7.10: Generalized AVL R Rotation. Upon insertion of a new node a, the AVL tree
becomes unbalanced at r (which may be a subtree) with a balance factor of +2. After
the rotation, the node c becomes the new root of the (sub)tree, r becomes c’s right-child,
and subtree T2 “swings” over to become r’s new left subtree.
Figure 7.11: Generalized AVL LR Rotation. Upon insertion of a new node a in either
middle subtree, the AVL tree becomes unbalanced at r (which may be a subtree) with a
balance factor of +2, but the left subtree rooted at c is skewed to the right. The tree is
rebalanced after a left rotation about c followed by a right rotation about r, making g
the new root (c becomes g’s left child and r its right child); subtree T3 “swings” over to
become r’s new left subtree.
Figure 7.12: Generalized AVL RL Rotation. Upon insertion of a new node a in either
middle subtree, the AVL tree becomes unbalanced at r (which may be a subtree) with a
balance factor of −2, but the right subtree rooted at c is skewed to the left. The tree is
rebalanced after a right rotation about c followed by a left rotation about r, making g
the new root (r becomes g’s left child and c its right child); subtree T2 “swings” over to
become r’s new right subtree.
Analysis
Thus, no matter what the specific value of h is for a given tree, we can conclude that
h ∈ Θ(log n)
where n is the number of nodes in the tree. Furthermore, rotations are simply a
matter of switching out pointers (O(1)) thus searching, insertion and deletion are all
O(h) = Θ(log n) for AVL Trees.
You can get some practice with AVL trees by trying out one of the many online simulations
and animation applications. For example:
• https://www.cs.usfca.edu/~galles/visualization/AVLtree.html
• http://www.qmatica.com/DataStructures/Trees/AVL/AVLTree.html
7.5.2 B-Trees
Rather than a single key per node, a 2-3 tree may have 1 or 2 keys per node.
• 2-node – A node containing a single key k with two children, i.e. the usual type
of node in a regular BST.
• 3-node – A node containing two keys, k1 , k2 such that k1 < k2 . A 3-node has three
children:
– Left-Most-Child represents all keys less than k1
– Middle-Child represents all keys between k1 , k2
– Right-Most-Child represents all keys greater than k2
Analysis
Let T be a 2-3 tree of height h. The smallest number of keys T could have is when all
nodes are 2-nodes, which is essentially a regular binary search tree. As the tree is full,
there would be at least

    n ≥ Σ_{i=0}^{h} 2^i = 2^(h+1) − 1

keys. Conversely, the largest number of keys would be when all nodes are 3-nodes and
every node has 2 keys and three children. Thus,

    n ≤ 2 · Σ_{i=0}^{h} 3^i = 3^(h+1) − 1

Together these bounds give log3 (n + 1) − 1 ≤ h ≤ log2 (n + 1) − 1, so h ∈ Θ(log n).
Definition 9. A Red-Black Tree is a binary search tree in which every node has a single
(unique) key and a color, either red or black.
• The root is black and every leaf is black (this can be made trivially true by adding
sentinels).
• Every red node has only black children.
• Every path from the root to a leaf has the same number of black nodes in it.
The black height of a node x is the number of black nodes in any path from x down to a
leaf (not counting x), denoted bh(x).
Lemma 8. A Red-Black Tree with n internal nodes satisfies the height property

    h ≤ 2 log2 (n + 1)

You can prove this by induction on the height h with the base case being a leaf.
As with AVL Trees, insertion and deletion of nodes may violate the Red-Black Tree
properties. Thus, we insert just like a regular BST, but we also perform rotations if
necessary.
In fact, we use the exact same Left and Right rotations as are used in AVL Trees.
However, LR and RL rotations are unnecessary.
The rst step to inserting a new node is to insert it as a regular binary search tree and
color it red.
There are three cases for insertion.
• Case 1 – The inserted node z has a red parent and a red uncle (the sibling of z’s
parent). In this case we recolor z’s parent, uncle, and grandparent. We then
recurse back up and see if any properties are violated at z’s grandparent.
• Case 2 – z’s parent is red but z’s uncle is black, and z is the right child of its
parent. We perform a Left rotation about z’s parent; the old parent (now the left
child of z) becomes the new z, reducing the situation to Case 3.
• Case 3 – z’s parent is red but z’s uncle is black, and z is the left child of its parent.
We recolor and perform a Right rotation about z’s grandparent.
• http://mathpost.la.asu.edu/~daniel/redblack.html
• http://www.ececs.uc.edu/~franco/C321/html/RedBlack/redblack.html
7.6 Heaps
Definition 10. A heap is a binary tree that satisfies the following properties.
1. It is a full or complete binary tree: all nodes are present except possibly the last
row
2. If the last row is not full, all nodes are full-to-the-left
3. It satises the Heap Property: every node has a key that is greater than both of its
children (max-heap)
• As a consequence of the Heap Property, the maximal element is always at the root
• Alternatively, we can dene a min-heap
• Variations: 2-3 heaps, Fibonacci heaps, etc.
• A min-heap example can be found in Figure 7.18
Applications
• Heaps are an optimal implementation of a priority queue
• Used extensively in other algorithms (Heap Sort, Prim’s, Dijkstra’s, Huffman
Coding, etc.) to ensure efficient operation
7.6.1 Operations
Insert
• Want to preserve the fullness property and the Heap Property
• Preserve fullness: add the new element at the end of the heap (last row, first free spot
on the left)
• Insertion at the end may violate the Heap Property
• Heapify/fix: bubble up the inserted element until the Heap Property is satisfied
• Analysis: insert is O(1); heapify: O(d)
Remove from top
• Want to preserve the fullness property and the Heap Property
• Preserve fullness: swap the root element with the last element in the heap (lowest row,
right-most element)
• Heap property may be violated
• Heapify/fix: bubble the new root element down until the Heap Property is satisfied
7.6.2 Implementations
Array-Based
• Root is located at index 1
• If a node u is at index i, u’s left-child is at 2i, its right-child is at 2i + 1
• If node u is at index j, its parent is at index ⌊j/2⌋
• Alternatively, with a 0-indexed array: the left/right children of the node at index i are
at 2i + 1 and 2i + 2, and its parent is at ⌊(i − 1)/2⌋
• Advantage: easy implementation, all items are contiguous in the array (in fact, a
BFS ordering!)
• Disadvantage: Insert operation may force a reallocation, but this can be done in
amortized-constant time (though may still have wasted space)
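A minimal Java sketch of a 0-indexed array-based max-heap insert with the bubble-up
fix (class and method names are our own):

    import java.util.ArrayList;
    import java.util.List;

    public class ArrayMaxHeap {
        private final List<Integer> heap = new ArrayList<>();   // 0-indexed

        public void insert(int key) {
            heap.add(key);                   // preserve fullness: append at the end
            int i = heap.size() - 1;
            // bubble up until the heap property is restored
            while (i > 0) {
                int parent = (i - 1) / 2;
                if (heap.get(i) <= heap.get(parent)) break;
                java.util.Collections.swap(heap, i, parent);
                i = parent;
            }
        }

        public int getMax() {
            return heap.get(0);              // the maximal element is always at the root
        }
    }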
Tree-Based
• Reference-based tree (nodes which contain references to children/parent)
• Parent reference is now required for efficiency
• For efficiency, we need to keep track of the last element in the tree
• For deletes/inserts: we need a way to find the last element and the first “open spot”
• We’ll focus on finding the first available open spot as the same technique can be
used to find the last element with minor modifications
finding it is O(1)
• Otherwise, we’ll need to traverse around the perimeter of the tree until we reach
the next open slot
Java provides several data structures that are backed by underlying tree structures.
7.6.5 Variations
7.7 Applications
• Heap Sort: since the min/max element is always at the top, simply insert all elements
into a heap, then remove them all!
• Perfect illustration of “Smart data structures and dumb code are a lot better than
the other way around”
Overview
• Coding Theory is the study and theory of codes—schemes for transmitting data
• Coding theory involves efficiently padding out data with redundant information to
increase reliability (detect or even correct errors) over a noisy channel
• An encoding of an alphabet Σ is a mapping Σ → {0, 1}*
• A block encoding is a fixed-length encoding scheme where all codewords have the
same length (example: ASCII); this requires codewords of length ⌈log2 n⌉
• Not all symbols have the same frequency, alternative: variable length encoding
• Intuitively: assign shorter codewords to more frequent symbols, longer to less
frequent symbols
• Reduction in the overall average codeword length
• Variable length encodings must be unambiguous
• Solution: prefix-free codes: a code in which no whole codeword is the prefix of
another (other than itself of course).
• Examples:
– {0, 01, 101, 010} is not a prefix-free code.
– {10, 010, 110, 0110} is a prefix-free code.
• A simple way of building a prefix-free code is to associate codewords with the leaves
of a binary tree (not necessarily full).
• Each edge corresponds to a bit: 0 if it is to the left sub-child and 1 to the right
sub-child.
• Since no simple path from the root to any leaf can continue to another leaf, we
are guaranteed a prefix-free coding.
• Using this idea along with a greedy encoding forms the basis of Huffman Coding
Steps
• Consider a precomputed relative frequency function:
freq : Σ → [0, 1]
130
7.7 Applications
• Build a collection of weighted trees Tx for each symbol x ∈ Σ with wt(Tx ) =
freq(x)
• Combine the two least weighted trees via a new node; associate a new weight (the
sum of the weights of the two subtrees)
• Keep combining until only one tree remains (a sketch follows)
• The tree constructed in Huffman’s algorithm is known as a Huffman Tree and it
defines a Huffman Code
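A minimal Java sketch of this greedy construction using a priority queue keyed on tree
weight (class and field names are our own; codewords are read off by walking the final
tree, left edge = 0, right edge = 1):

    import java.util.Map;
    import java.util.PriorityQueue;

    public class HuffmanTree {
        static class Node {
            double weight;
            Character symbol;          // null for internal nodes
            Node left, right;
            Node(double w, Character s) { weight = w; symbol = s; }
        }

        static Node build(Map<Character, Double> freq) {
            PriorityQueue<Node> pq =
                new PriorityQueue<>((a, b) -> Double.compare(a.weight, b.weight));
            for (Map.Entry<Character, Double> e : freq.entrySet()) {
                pq.add(new Node(e.getValue(), e.getKey()));    // one leaf per symbol
            }
            while (pq.size() > 1) {
                Node x = pq.poll();                            // two least-weighted trees
                Node y = pq.poll();
                Node parent = new Node(x.weight + y.weight, null);
                parent.left = x;
                parent.right = y;
                pq.add(parent);
            }
            return pq.poll();   // the single remaining tree is the Huffman tree
        }
    }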
Example
Construct the Huffman Tree and Huffman Code for a file with the following content.

    character    A     B     C     D     E     F     G
    frequency    0.10  0.15  0.34  0.05  0.12  0.21  0.03
• Compression ratio:

    (3 − 2.53) / 3 = 15.67%
• In general, for text files, pack (Huffman Coding) claims an average compression
ratio of 25-40%.
• Degenerative cases:
– When the probability distribution is uniform: p(x) = p(y) for all x, y ∈ Σ
– When the probability distribution follows a Fibonacci sequence (the sum of
each of the two smallest probabilities is less than or equal to the next highest
probability for all probabilities)
Figure 7.13: AVL Tree insertion of the keys 8, 4, 7, 3, 2, 5, 1, 10, 6 into an initially empty
tree. (a) Insertion of 8, 4, 7 causes the tree to become unbalanced; (b) the tree is
rebalanced after an LR rotation. (c) Insertion of 3, 2 unbalances the tree at node 4; (d)
the tree is rebalanced after an R rotation. (e) Insertion of 5 unbalances the tree at 7, so
an LR rotation is performed at node 7: (f) step 1, a left rotation at node 3; (g) step 2, a
right rotation at node 7, with node 5 swinging over to become 7’s left-child. (h) Insertion
of 1 unbalances the tree at 3, so a right rotation about 3 is performed; (i) the rotation
results in a complete binary tree in which the balance factor of all nodes is zero. (j)
Insertion of 10, 6 does not unbalance the AVL tree.
Figure 7.14: A worst-case AVL deletion in which rebalancing propagates to the root.
(a) Deletion of 3 causes node 10 to become unbalanced, prompting a left rotation about
10. (b) The rotation reduces the height of the subtree at node 12, causing an imbalance
at 20; a left rotation is performed about 20 and the subtree rooted at 25 swings over to
become the right subtree of 20. (c) That rotation reduces the height of the subtree at
node 30, causing an imbalance at the root, 50; a left rotation is performed about 50 and
the subtree T1 swings over to become the right subtree of 50. (d) The rotation at the
root rebalances the tree.
Figure 7.15: 2-3 Tree insertion operations. (a) Insertion of 8, 4, 7 causes the root node to
become overfull; (b) the tree splits into two children and creates a new root node. (c)
Insertion of 3, 2 causes the left node to become overfull; (d) the middle key 3 is promoted
to the parent (root) node while 2, 4 get split into 2 subtrees. (e) Insertion of 5, 1, 10 does
not cause any disruption, but all nodes are full at this point. (f) Insertion of 6 causes the
middle node to become overfull; (g) 5 is promoted while 2 children are created; (h)
however, the promotion of 5 causes the root to become overfull, and the split propagates
up the tree.
Figure 7.16: 2-3 Tree Deletion Operations Part I. Different configurations result in
redistributions or merging of nodes; symmetric operations are omitted. (a) The left
sibling node can spare a key: b, c are rotated. (b) The right sibling can spare a key: b, a
are rotated. (c) The immediate right sibling does not have a key to spare, but the far
right one does; a rotation involves all three children. (d) The immediate right sibling has
a key to spare: e is promoted and d fills the empty child. (e) The left sibling does not
have a key to spare; instead the parent is merged down. The resulting parent is now
empty and has only one child, which is dealt with recursively. (f) The right sibling does
not have a key to spare; instead one of the parent keys is merged down, reducing the
number of children nodes. (g) Neither sibling has a key to spare, so the two left children
are merged and b is demoted. (h) A generalized rotation: b is promoted, f is demoted,
and e swings over to become f ’s new left child.
Figure 7.17: 2-3 Tree Deletion Operations Part II. Different configurations result in
redistributions or merging of nodes; symmetric operations are omitted. (a) Another
generalized rotation: b is demoted, d is promoted, and c swings over to become b’s right
child. (b) A generalized merge procedure: d is merged down and e swings over to become
the right-most child. (c) Another generalized merge procedure: b is merged with d, and a
becomes their left-most child. (d) Deleting an empty root: the root’s child becomes the
new root.
[Figure: an example heap stored as a tree.]

[Figure 7.19 content: a complete tree of depth d − 1 contains
    ∑_{i=0}^{d−1} 2^i = 2^d − 1
nodes, so the last level contains m = n − (2^d − 1) nodes; each of the left and right subtrees holds at most 2^d / 2 of these, and comparing m with 2^d / 2 determines whether the "open" slot lies in the left or the right subtree.]
Figure 7.19: Tree-based Heap Analysis. Because of the fullness property, we can determine which subtree (left or right) the "open" spot in a heap's tree is in by keeping track of the number of nodes, n. This can be inductively extended to each subtree until the open spot is found.
8 Graph Algorithms
8.1 Introduction
Graph variations:
• Directed: each edge is oriented and (u, v) is an ordered pair
• Weighted: there is a weight function defined on the edge set. It usually suffices to consider
wt : E → ℤ
Representations:
• Adjacency List – An adjacency list representation of a graph G = (V, E) main-
tains |V | linked lists. For each vertex v ∈ V , the head of the list is v and subsequent
entries correspond to adjacent vertices v ′ ∈ V .
Libraries:
• JGraphT (Java) – http://jgrapht.org/
• Boost (C++ header only library) – http://www.boost.org/doc/libs/1_54_0/
libs/graph/doc/index.html
• LEMON (C++ Graph library) – http://lemon.cs.elte.hu/trac/lemon
• Python: NetworkX (http://networkx.github.io/), igraph (http://igraph.org/)
• Ruby: https://github.com/bruce/graphy
• JavaScript: https://github.com/devenbhooshan/graph.js
8.2 Depth First Search
Depth First Search (DFS) traverses a graph by visiting "deeper" vertices first, before processing the vertices. That is, it explores a graph as deeply as possible before backtracking and visiting other vertices.
At each iteration of DFS we can choose to visit an unvisited vertex according to several
criteria:
• Lexicographic order – visit the next vertex according to the label ordering
• Weight – in a weighted graph, we can visit the vertex whose edge is the least (or
greatest) weight
In general, we can keep track of the state of nodes by two measures, a vertex’s color and
the discovery/processing “time”.
At various points in the algorithm, a vertex is colored:
• White – indicating that the vertex is unvisited (all vertices are initially white)
• Gray – indicating that the vertex has been discovered, but not yet processed (we
go deeper in the graph if possible before processing gray vertices)
• Black – indicating that the vertex has been visited and processed. By the end of
the algorithm, all vertices are black
Discovery and processing time stamps can be associated with each vertex by keeping a
global counter that increments each time a vertex is initially visited and again when it is
processed. The sequence of time stamps indicates visit/process order.
A straightforward implementation would be a recursive algorithm (Algorithms 44 and
45). The main algorithm is necessary to accommodate:
• Directed graphs - we may need to restart the search if some vertices are unreachable
from the initial vertex
A stack-based, iterative implementation is also possible (Algorithm 46).
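Since Algorithms 44 through 46 are not reproduced here, the following is a minimal recursive DFS sketch in Java. It assumes an adjacency-list representation stored in a HashMap (the class name DFSDemo and the example edges are illustrative, not from the original notes); it records discovery and processing time stamps using the color idea implicitly (a vertex with a discovery stamp but no processing stamp is "gray").

import java.util.*;

public class DFSDemo {
    // adjacency list: vertex -> list of neighbors (assumed representation)
    static Map<String, List<String>> adj = new HashMap<>();
    static Map<String, Integer> discovered = new HashMap<>(); // discovery time stamps
    static Map<String, Integer> processed = new HashMap<>();  // processing (finish) time stamps
    static int time = 0;

    static void dfs(String u) {
        discovered.put(u, ++time);                 // u becomes "gray"
        for (String v : adj.getOrDefault(u, List.of()))
            if (!discovered.containsKey(v))        // v is still "white"
                dfs(v);
        processed.put(u, ++time);                  // u becomes "black"
    }

    public static void main(String[] args) {
        adj.put("a", List.of("b", "c"));
        adj.put("b", List.of("d"));
        adj.put("c", List.of("d"));
        adj.put("d", List.of());
        // restart from every vertex so unreachable vertices are still visited
        for (String u : adj.keySet())
            if (!discovered.containsKey(u)) dfs(u);
        System.out.println("discovered: " + discovered);
        System.out.println("processed:  " + processed);
    }
}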
Consider the small graph in Figure 8.1; the discovery and processing (finishing) time stamps are presented in Table 8.1.
[Figure 8.1: a small example graph on vertices a, b, c, d.]
Depth-First Search can produce a depth-first forest (or tree if the graph is connected). Edges in a DFS forest can be classified as:
• Tree Edges – Edges that are part of the search path, whenever an unvisited vertex
v ′ is discovered from the current vertex, v, (v, v ′ ) is a tree edge.
• Back Edge – From the current vertex, v, if an edge is found pointing to a vertex
that has already been discovered, the edge (v, v ′ ) is a back edge. Back edges connect
vertices to their ancestors in the DFS forest
• Forward Edge – If, after backtracking, the current vertex v points to a visited
vertex v ′ , (v, v ′ ) is a forward edge.
• Cross Edge – All other edges, they can be edges between trees in the forest or
edges between vertices in the same tree with a common ancestor
Note that:
• For undirected graphs forward and back edges are the same (no orientation)
• For undirected graphs, cross edges do not exist
• For directed graphs, cross edges may connect between components (created by
restarting the DFS subroutine) or within components
Edges can be classified as follows: from the current vertex, the color of the adjacent vertex determines the type of edge. If the adjacent vertex is white, the edge is a tree edge; if it is gray, the edge is a back edge; if it is black, the edge is a forward or cross edge (distinguished, for example, by comparing discovery times).
[Figure 8.2: an example undirected graph on vertices a through i.]
Traversal (discovery) order: i, a, b, c, d, e, f, g, h
Processing order: f, e, h, g, d, c, b, a, i
The DFS Tree produced by this traversal can be found in Figure 8.3. Note that only tree and back edges are present, as this is a connected, undirected graph.
8.2.3 Analysis
• Each vertex is examined once: when it is first visited (pushed) and when it is finished (popped)
• Each vertex is processed exactly once
• Each edge is examined once (twice for undirected graphs): each time it appears in a neighborhood
• For adjacency matrices, this is O(n²) as the traversal will examine every entry in the matrix (to determine the neighborhood of each vertex)
[Figure 8.3: DFS forest with initial vertex i; dashed edges indicate back edges.]
• For adjacency lists, this is O(n + m): for each vertex, its entire adjacency list will have to be examined, but need not be computed
8.3 Breadth First Search
Breadth First Search (BFS) is a "shallow" search: BFS explores the nearest vertices first. At any given vertex v, the entire neighborhood N(v) is explored before progressing further in the graph.
As with DFS, a BFS may need to be restarted (disconnected graphs or unreachable
vertices in a directed graph), see Algorithms 47 and 48.
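Since Algorithms 47 and 48 are not reproduced here, the following is a minimal BFS sketch in Java for a single source (the class name BFSDemo and the example edges are illustrative assumptions). It returns the number of edges on a shortest path to each reachable vertex, which is how BFS is used for unweighted shortest paths later in this chapter.

import java.util.*;

public class BFSDemo {
    static Map<Integer, List<Integer>> adj = new HashMap<>();

    static Map<Integer, Integer> bfs(int source) {
        Map<Integer, Integer> dist = new HashMap<>();  // also serves as the "visited" marker
        Deque<Integer> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.getOrDefault(u, List.of()))
                if (!dist.containsKey(v)) {            // first time v is seen
                    dist.put(v, dist.get(u) + 1);
                    queue.add(v);
                }
        }
        return dist;   // unweighted shortest-path distances from the source
    }

    public static void main(String[] args) {
        adj.put(1, List.of(2, 3));
        adj.put(2, List.of(4));
        adj.put(3, List.of(4));
        System.out.println(bfs(1));   // {1=0, 2=1, 3=1, 4=2}
    }
}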
[Figure 8.4: BFS tree with initial vertex i; dashed edges indicate cross edges.]
An example BFS run on the same small example can be found in Table 8.2.
BFS can also produce a BFS Forest (or tree) with similar types of edges (Tree, Cross,
Forward, Back). The BFS Tree of the graph in Figure 8.2 can be seen in Figure 8.4.
• For undirected graphs only tree and cross edges are possible (no back/forward
edges: why?)
• For directed graphs, back and cross edges are possible; forward edges are not
Some problems can be solved by either DFS or BFS; others are more appropriately solved by one than the other.
8.4 DFS/BFS Applications
Problem 9 (Connectivity).
Given: A graph G = (V, E), two vertices s, t ∈ V
Output: true if there is a path p : s ⇝ t
Variations:
• Directed or undirected paths
• A functional version: not only do we want to know if a path exists, but if one does,
we want the algorithm to output one
• Does there exist a path involving a particular vertex (or edge)?
Recall that a partial ordering is a relation on a set that is reflexive, antisymmetric and transitive. Posets can model:
• Tasks to be performed with certain prerequisites
• Entities with constrained relations, etc.
In general, consider a directed acyclic graph (DAG). A topological ordering (or sorting) is an arrangement of vertices that is consistent with the connectivity defined by the underlying graph. That is, if (u, v) ∈ E then u precedes v in the topological ordering. There may be many valid topological orders.
DFS Solution: run DFS on the DAG and order the vertices in descending order of their finishing timestamps.
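As a concrete illustration of that DFS solution, here is a minimal sketch in Java (names such as TopoSort and the example edges are assumptions for illustration). Pushing each vertex onto a stack as it finishes yields the vertices in descending finish order.

import java.util.*;

public class TopoSort {
    static Map<String, List<String>> adj = new HashMap<>();
    static Set<String> visited = new HashSet<>();
    static Deque<String> order = new ArrayDeque<>();   // finished vertices are pushed on top

    static void dfs(String u) {
        visited.add(u);
        for (String v : adj.getOrDefault(u, List.of()))
            if (!visited.contains(v)) dfs(v);
        order.push(u);   // u finishes after all of its descendants
    }

    public static void main(String[] args) {
        adj.put("shirt", List.of("tie"));
        adj.put("tie", List.of("jacket"));
        adj.put("pants", List.of("shoes", "jacket"));
        adj.put("jacket", List.of());
        adj.put("shoes", List.of());
        for (String u : adj.keySet())
            if (!visited.contains(u)) dfs(u);
        System.out.println(order);   // one valid topological order
    }
}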
Variations (of the Shortest Path problem, Problem 10):
• Directed or undirected versions
• Output the shortest weighted path p
[Figure 8.5: BFS tree with cross edges (dashed) involved in a cycle.]
Alternatives: make two runs of DFS or BFS to answer the connectivity question:
∃ u, v ∈ V [p₁ : u ⇝ v ∧ p₂ : v ⇝ u]
For undirected graphs, this does not necessarily work: the path could be the same. Instead we could take the following strategy: use a functional version of DFS/BFS to actually find a path p from u to v, then remove each of the vertices (and corresponding edges) in the path (except for u, v) and run DFS/BFS again to see if there is a different path p′, which would constitute a cycle.
Recall that a bipartite graph G = (R, L, E) is a graph with two disjoint vertex sets, R
and L and edges between vertices in R and L. That is, for v, v ′ ∈ R, (v, v ′ ) is never an
edge (L likewise).
Problem 12 (Bipartite Testing).
Given: A graph G = (V, E)
Output: true if G is a bipartite graph, false otherwise
Theorem 6. An undirected graph G = (V, E) is bipartite if and only if G contains no
odd-length cycles.
We already have an algorithm for cycle detection; how can we modify it to detect odd cycles?
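One standard approach, consistent with Theorem 6, is to 2-color the graph during a BFS: if any edge ever joins two vertices of the same color, an odd cycle exists and the graph is not bipartite. The following Java sketch illustrates the idea (the class name, vertex numbering, and example graphs are assumptions for illustration).

import java.util.*;

public class BipartiteTest {
    // 2-color the graph with BFS; an edge between equal colors exposes an odd cycle
    static boolean isBipartite(List<List<Integer>> adj) {
        int n = adj.size();
        int[] color = new int[n];
        Arrays.fill(color, -1);                 // -1 = uncolored
        for (int s = 0; s < n; s++) {           // restart for disconnected graphs
            if (color[s] != -1) continue;
            color[s] = 0;
            Deque<Integer> q = new ArrayDeque<>(List.of(s));
            while (!q.isEmpty()) {
                int u = q.poll();
                for (int v : adj.get(u)) {
                    if (color[v] == -1) { color[v] = 1 - color[u]; q.add(v); }
                    else if (color[v] == color[u]) return false;   // odd cycle found
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // a 4-cycle (bipartite) and a triangle (odd cycle, not bipartite)
        List<List<Integer>> square = List.of(List.of(1,3), List.of(0,2), List.of(1,3), List.of(0,2));
        List<List<Integer>> triangle = List.of(List.of(1,2), List.of(0,2), List.of(0,1));
        System.out.println(isBipartite(square));    // true
        System.out.println(isBipartite(triangle));  // false
    }
}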
A directed graph G = (V, E) is called strongly connected if for any pair of vertices,
u, v ∈ V there exists a directed path p : u ⇝ v. That is, every vertex is reachable from
every other vertex by some path.
More generally, a directed graph’s vertices can be partitioned into maximal, disjoint
subsets, V1 , V2 , . . . , Vk such that each induced subgraph Gi = (Vi , Ei ) is a strongly
connected graph. These are known as the strongly connected components of G.
A condensation graph Gc = (V′, E′) of a directed graph G is defined as
• V′ = {V₁, . . . , Vₖ}
• E′ = {(Vᵢ, Vⱼ) | for some u ∈ Vᵢ, v ∈ Vⱼ, (u, v) ∈ E}
That is, the vertices are the strongly connected components of G, and there is an edge between two components if some vertex in one is adjacent (in G) to some vertex in the other.
A condensation graph is always a directed, acyclic graph that is never strongly connected
(unless G was strongly connected–what would Gc look like then?). The following is the
Kosaraju-Sharir Algorithm.
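The original algorithm listing is not reproduced here; the following Java sketch shows the standard two-pass Kosaraju-Sharir procedure (class and method names are illustrative assumptions): run DFS on G recording vertices by finish time, then run DFS on the reverse graph in decreasing finish order, and each resulting DFS tree is one strongly connected component.

import java.util.*;

public class KosarajuSharir {
    static void dfs(Map<Integer, List<Integer>> g, int u, Set<Integer> seen, List<Integer> out) {
        seen.add(u);
        for (int v : g.getOrDefault(u, List.of()))
            if (!seen.contains(v)) dfs(g, v, seen, out);
        out.add(u);   // post-order: u is finished
    }

    static List<List<Integer>> sccs(Map<Integer, List<Integer>> g, Set<Integer> vertices) {
        Map<Integer, List<Integer>> rev = new HashMap<>();        // reverse all edges
        for (int u : vertices)
            for (int v : g.getOrDefault(u, List.of()))
                rev.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
        List<Integer> finish = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (int u : vertices) if (!seen.contains(u)) dfs(g, u, seen, finish);
        List<List<Integer>> components = new ArrayList<>();
        seen.clear();
        for (int i = finish.size() - 1; i >= 0; i--) {            // decreasing finish time
            int u = finish.get(i);
            if (!seen.contains(u)) {
                List<Integer> comp = new ArrayList<>();
                dfs(rev, u, seen, comp);                          // one DFS tree = one SCC
                components.add(comp);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(1, List.of(2)); g.put(2, List.of(3)); g.put(3, List.of(1, 4)); g.put(4, List.of());
        System.out.println(sccs(g, Set.of(1, 2, 3, 4)));  // the SCCs {1,2,3} and {4}, in some order
    }
}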
[Figure: an example directed graph on vertices a through g and its condensation graph; the strongly connected components shown include {a}, {d, g}, and {b, c, e}.]
8.5 Minimum Spanning Tree Algorithms
Given a weighted graph G = (V, E), a minimum spanning tree is a spanning tree whose total edge weight is minimal.
In general, there may be many distinct minimum spanning trees. In fact, the number of possible spanning trees is exponential, and generating them all is very difficult. Rather, the minimum spanning tree problem can be solved exactly by a greedy strategy.
Kruskal's algorithm considers the edges in nondecreasing order of weight:
• if the inclusion of an edge would induce a cycle in the tree created so far, it is ignored; otherwise the edge is taken
• the algorithm terminates when it has taken n − 1 edges
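A minimal Kruskal sketch in Java follows, using a union-find (disjoint-set) structure for the cycle check; the class name, edge encoding as {u, v, weight} triples, and the small example graph are assumptions for illustration, not the graph of Figure 8.8.

import java.util.*;

public class Kruskal {
    // union-find with path compression; used to test whether an edge closes a cycle
    static int[] parent;
    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

    static int mst(int n, int[][] edges) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        Arrays.sort(edges, Comparator.comparingInt(e -> e[2]));  // nondecreasing weight
        int total = 0, taken = 0;
        for (int[] e : edges) {
            int ru = find(e[0]), rv = find(e[1]);
            if (ru == rv) continue;           // would induce a cycle: ignore
            parent[ru] = rv;                  // otherwise take the edge
            total += e[2];
            if (++taken == n - 1) break;      // a spanning tree has n - 1 edges
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] edges = { {0,1,5}, {0,2,7}, {1,2,8}, {1,3,7}, {2,3,9}, {3,4,5} };
        System.out.println(mst(5, edges));    // 24 for this small example
    }
}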
[Figure 8.8: a weighted, undirected graph on vertices a through g (left) and its minimum spanning tree (right).]
(a) A weighted, undirected graph. (b) Minimum Spanning Tree with total weight
39.
Figure 8.8: Minimum Spanning Tree example. For this particular example, Kruskal’s
and Prim’s algorithms result in the same tree. In fact, for this example there
is only one unique MST.
which can be improved to Θ(m log m) using an appropriate data structure (a disjoint-set/union-find structure, which removes the need to perform a DFS/BFS to detect cycles).
Prim's algorithm is also greedy but works differently. It may result in a different, but equivalent (equal total weight), minimum spanning tree.
Starting at an arbitrary vertex, it builds a subtree by adding a single vertex on each iteration. The vertex it chooses is based on a greedy choice: add the fringe vertex whose connecting edge to the current tree has minimal weight.
Figure 8.10: Prim’s Algorithm after the second iteration. Tree vertices (green), Fringe
vertices (yellow), and unseen vertices (grey) are highlighted. Fringe edges
(dark green) connect tree vertices and fringe vertices. The next edge to be
added, (a, b) is highlighted in red.
Table 8.4: Sequence of edges considered by Prim’s Algorithm starting at vertex a and
results.
8.6 Minimum Distance Algorithms
Recall the Shortest Path problem (Problem 10). We have seen a BFS solution that works for several special cases. We now consider two solutions that work for general weighted graphs (either directed or undirected).
Dijkstra's algorithm works by making the following greedy choice: from a source vertex s, it chooses a path (edge) to its nearest neighbor. Then it chooses its second nearest neighbor, and so on. By the i-th iteration, it has chosen the shortest paths to i − 1 other vertices.
vertices.
The idea is similar to Prim’s in that we make our selection from a set of fringe vertices.
Each vertex is given two labels: the tree vertex by which the current shortest path can
be reached and the length of the shortest path. To choose the next vertex to be added
to the tree, we simply choose the minimal length among all fringe vertices, breaking ties
arbitrarily.
Once a vertex u has been moved from the fringe into the tree, we must update each fringe vertex that is adjacent to u: if, via u, a vertex v can be reached by a shorter path, then we update v's labels to the new, shorter path length and to the new predecessor u.

[Figure 8.11: the weighted graph used as the running example for Dijkstra's algorithm.]
[Figure 8.12: the resulting shortest path tree from source vertex e.]
An example run of Dijkstra’s Algorithm on the graph in Figure 8.11 can be found in
Table 8.5. The resulting shortest path tree is in Figure 8.12.
If you wanted to actually build a shortest path from the source vertex to any other
vertex v, you could “back track” from v using the previous vertex values, pv computed
by Dijkstra’s algorithm. Starting at v, you look up pv = u, then you look up pu and so
on until you arrive at the source vertex s.
As presented, using a min-heap implementation (or even a balanced binary search tree implementation) for the priority queue will lead to a loose O(n² log n) analysis. However, using a more advanced heap implementation such as a Fibonacci heap can improve the running time to O(m + n log n).
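A minimal Dijkstra sketch in Java using the standard library PriorityQueue follows; the class name, the encoding of neighbors as {vertex, weight} pairs, and the small example graph are illustrative assumptions. Stale queue entries are simply skipped, which is a common substitute for a decrease-key operation.

import java.util.*;

public class Dijkstra {
    static int[] dijkstra(List<List<int[]>> adj, int s) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[s] = 0;
        // priority queue of {vertex, tentative distance}, ordered by distance
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.add(new int[]{s, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int u = cur[0];
            if (cur[1] > dist[u]) continue;            // stale entry, skip
            for (int[] e : adj.get(u)) {               // relax every edge out of u
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < 4; i++) adj.add(new ArrayList<>());
        adj.get(0).add(new int[]{1, 5});  adj.get(0).add(new int[]{2, 9});
        adj.get(1).add(new int[]{2, 2});  adj.get(2).add(new int[]{3, 4});
        System.out.println(Arrays.toString(dijkstra(adj, 0)));  // [0, 5, 7, 11]
    }
}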
Recall that by using BFS, we could determine the shortest path to all vertices from a
given source vertex. By running BFS for each vertex we can get all pairs shortest paths.
Floyd’s algorithm works equally well on directed and undirected graphs, but is particularly
interesting for weighted graphs of either type.
Pitfall: Floyd’s algorithm does not work on a graph with a negatively weighted cycle.
Why?
Floyd's algorithm starts with the usual adjacency matrix, with entries that are the weights of the edges. Much like Warshall's algorithm, Floyd's algorithm computes intermediate distance matrices
      D^(0), . . . , D^(k−1), D^(k), . . . , D^(n)
Each intermediate distance matrix D^(k) contains entries d^(k)_{i,j} which correspond to the total weight of the shortest path from v_i to v_j not using any vertex numbered higher than k.
(a) Prior to the first iteration: a: ∞/φ, b: ∞/φ, c: ∞/φ, d: ∞/φ, e: 0/φ, f: ∞/φ, g: ∞/φ, h: ∞/φ, i: ∞/φ, j: ∞/φ
(b) First iteration, N(e) is explored: a: ∞/φ, b: 5/e, c: ∞/φ, d: 5/e, f: ∞/φ, g: 20/e, h: ∞/φ, i: ∞/φ, j: ∞/φ
(c) Second iteration, N(b) is explored: a: 12/b, c: ∞/φ, d: 5/e, f: ∞/φ, g: 20/e, h: ∞/φ, i: ∞/φ, j: ∞/φ
(d) Iteration 3, N(d) is explored: a: 10/d, c: 25/d, f: 25/d, g: 15/d, h: ∞/φ, i: ∞/φ, j: ∞/φ
(e) Iteration 4, N(a) is explored: c: 25/d, f: 25/d, g: 15/d, h: ∞/φ, i: ∞/φ, j: ∞/φ
(f) Iteration 5, N(g) is explored: c: 25/d, f: 25/d, h: ∞/φ, i: 20/g, j: ∞/φ
(g) Iteration 6, N(i) is explored: c: 25/d, f: 25/d, h: ∞/φ, j: ∞/φ
(h) Final shortest paths from e and predecessors: a: 10/d, b: 5/e, c: 25/d, d: 5/e, e: 0/φ, f: 25/d, g: 15/d, h: ∞/φ, i: 20/g, j: ∞/φ
(Entries are given as dv/pv: the current shortest path length to the vertex and its predecessor.)
Table 8.5: Walkthrough of Dijkstra's Algorithm for source vertex e. Iterations 7 through 9 are omitted as they do not result in any further changes.
Figure 8.13: Basic idea behind Floyd-Warshall: supposing that a path from v_i ⇝ v_j has already been found with a distance of d^(k−1)_{i,j}, the consideration of v_k as a new intermediate node may yield a shorter distance, d^(k−1)_{i,k} + d^(k−1)_{k,j}.
Also as before, we can compute each intermediate distance matrix by using its immediate
predecessor. The same reasoning as before yields a similar recurrence:
      d^(k)_{i,j} ← min{ d^(k−1)_{i,j},  d^(k−1)_{i,k} + d^(k−1)_{k,j} },      d^(0)_{i,j} = w_{i,j}
Observations:
• We can keep track of which k corresponds to an update to reconstruct the minimum
paths
• Clearly the algorithm is O(n³).
• An example of a full run of Floyd’s can be found in Figure 8.14
As mentioned, we can keep track of which value(s) of k caused the matrix entries to change. This results in a successor matrix S, as depicted in Figure 8.14(h). We now describe how we can use this successor matrix to construct the shortest path for any pair. Suppose that we wanted to construct the shortest path v_i ⇝ v_j. We start at v_i; to find the next vertex, we reference the successor matrix S by looking at the i-th row and j-th column, which gives us the next vertex in the shortest path. Suppose this is v_ℓ. Then to find the next vertex, we want the entry in the ℓ-th row, j-th column. We continue until the entry is equal to v_j. The complete pseudocode is found in Algorithm 53. For the example, the shortest path a ⇝ d is
example, the shortest path a d is
p:a→b→e→d
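The recurrence and the successor bookkeeping translate directly into a triple loop. The following Java sketch is a minimal illustration (the class name, the INF sentinel, and the small example graph are assumptions, not the example of Figure 8.14); next[i][j] plays the role of the successor matrix S.

import java.util.*;

public class FloydWarshall {
    static final int INF = 1_000_000;   // stand-in for "no edge" (kept small to avoid overflow)

    static void floyd(int[][] dist, int[][] next) {
        int n = dist.length;
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                        next[i][j] = next[i][k];      // first step toward k on the new path
                    }
    }

    public static void main(String[] args) {
        int n = 4;
        int[][] dist = new int[n][n], next = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                dist[i][j] = (i == j) ? 0 : INF;
                next[i][j] = j;
            }
        dist[0][1] = 1; dist[1][2] = 2; dist[0][2] = 5; dist[2][3] = 1;
        floyd(dist, next);
        System.out.println(dist[0][3]);   // 4 (via 0 -> 1 -> 2 -> 3)
    }
}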
[Figure 8.14(a): a weighted directed graph on vertices a, b, c, d, e.]

(b) Initial distance matrix D^(0):
  0  1  2  ∞  ∞
  ∞  0  ∞  8  2
  ∞  ∞  0  5  ∞
  ∞  4  ∞  0  ∞
  ∞  ∞  ∞  3  0
(c) After iteration k = 1 (no changes):
  0  1  2  ∞  ∞
  ∞  0  ∞  8  2
  ∞  ∞  0  5  ∞
  ∞  4  ∞  0  ∞
  ∞  ∞  ∞  3  0
(d) After iteration k = 2:
  0  1  2  9  3
  ∞  0  ∞  8  2
  ∞  ∞  0  5  ∞
  ∞  4  ∞  0  6
  ∞  ∞  ∞  3  0
(e) After iteration k = 3:
  0  1  2  7  3
  ∞  0  ∞  8  2
  ∞  ∞  0  5  ∞
  ∞  4  ∞  0  6
  ∞  ∞  ∞  3  0
(f) After iteration k = 4:
  0  1  2  7  3
  ∞  0  ∞  8  2
  ∞  9  0  5  11
  ∞  4  ∞  0  6
  ∞  7  ∞  3  0
(g) After iteration k = 5:
  0  1  2  6  3
  ∞  0  ∞  5  2
  ∞  9  0  5  11
  ∞  4  ∞  0  6
  ∞  7  ∞  3  0
(h) Successor matrix produced by Floyd's Algorithm (rows and columns a through e):
  −  b  c  b  b
  −  −  −  e  e
  −  d  −  d  d
  −  b  −  −  b
  −  d  −  d  −
[Figure: a second example run of Floyd's algorithm on a weighted directed graph over vertices a, b, c, d, e (a chain of weight-1 edges a → b → c → d → e together with heavier "shortcut" edges of weights 5, 10, and 15). The figure shows the initial distance and successor matrices and the matrices after each iteration k = 1, . . . , 5; with each iteration, entries in the first rows shrink (for example, the distance from a to e improves from 15 to 4) as shorter multi-hop paths are discovered and the successor matrix is updated accordingly.]
8.7 Exercises
Exercise 8.1. Consider a weighted, complete graph Kn (where all pairs of vertices are connected by edges). Further, assume that we are given G as an adjacency list where the list for each vertex is ordered in increasing order with respect to the edge weights. Describe an O(|V|)-time algorithm to compute a minimum spanning tree for such an input.
Exercise 8.2. The Food Bucket, a regional chain of restaurants wants you to develop a
program that will generate mazes for its children’s menu. Rather than create one maze,
they want a program that will generate a random maze on an n × n grid such that there
is only one solution (that is, one path from the lower left to the upper right corners).
The algorithm should produce a random maze with various paths connecting grid points
(representing a traversable path) with only a single path leading from grid point (0, 0) to
grid point (n − 1, n − 1). There should also not be any cycles. The algorithm should be
correct and efficient.
Exercise 8.3. Let G = (V, E) be an undirected, unweighted graph. Let δ(u, v) be the minimum distance between vertices u, v ∈ V. The eccentricity of a vertex v ∈ V is defined as the maximal minimum distance between v and any other vertex in G:
      ε(v) = max_{u ∈ V} δ(v, u)
The diameter and radius of G are then defined, respectively, as
      d = max_{v ∈ V} ε(v)
      r = min_{v ∈ V} ε(v)
(a) Design an algorithm that utilizes either DFS, BFS, or some variation/application of
DFS/BFS to compute the radius of a graph G. Provide good pseudocode and give a
brief analysis of your algorithm.
(b) Design an algorithm that utilizes either DFS, BFS, or some variation/application
of DFS/BFS to compute the diameter of a graph G. Provide good pseudocode and
give a brief analysis of your algorithm.
Exercise 8.4. Give an example of a weighted graph (directed or undirected) in which the BFS tree does not provide the shortest distance path.
Exercise 8.6. Give an algorithm to solve the following problem: given a graph G = (V, E) and vertices u, v, x, determine if there exists a path from u to v that involves x.
Exercise 8.7. Provide a small example of a weighted undirected graph whose MST produced from Kruskal's algorithm would be different from that produced by Prim's algorithm.
Exercise 8.8. Suppose that we restrict the MST problem to weighted graphs whose
weights are all positive integers; that is wt : E → Z+ . Show that each of the following
variations are equivalent to this formulation by showing a transformation between them.
That is, describe a transformation from the variation below to the positive weighted
version and describe how a solution to the positive weighted version can be transformed
back to a solution for the variation.
1. Let wt : E → ℤ (that is, we allow negative and zero weighted edges)
2. Rather than finding the minimum spanning tree, suppose that we wish to find the maximum spanning tree
Exercise 8.11. Each iteration of Kruskal's algorithm considers adding an edge e = (u, v) to the current minimum spanning tree T by checking whether or not its inclusion in T would induce a cycle. Gomer thinks he's found a better way: rather than checking for a cycle, just check if the end points of e are already in T; if they are, then do not include the edge, as it would not add any vertex to the minimum spanning tree. If either end point (u or v) is outside the current tree, then do add the edge, as it would connect the tree further. Show that Gomer is wrong by providing an example graph where using this criterion instead would fail. Briefly explain why Gomer's criterion is wrong.
Exercise 8.12. Recall that a graph is a tree if it is connected and contains no cycles. Adapt either DFS or BFS to develop an algorithm that determines if a given graph G = (V, E) is a tree or not. Provide good pseudocode and fully analyze your algorithm.
Exercise 8.14. Implement DFS in the high-level programming language of your choice.
Exercise 8.15. Implement BFS in the high-level programming language of your choice.
Exercise 8.17. Implement Prim’s in the high-level programming language of your choice.
9 Dynamic Programming
9.1 Introduction
Fn = Fn−1 + Fn−2 , F0 = 0, F1 = 1
In general, for a problem to have a dynamic programming solution, it must possess the optimal substructure property. This is a formalization of the idea that the optimal solution to a problem can be formulated by (efficiently) finding the optimal solution to each of its sub-problems. This property is also integral to greedy algorithmic solutions. Obviously, not all problems possess this property.
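As a first, very small illustration of reusing sub-problem solutions, the following Java sketch computes Fibonacci numbers top-down with memoization; each F(n) is computed once and cached, turning an exponential recursion into a linear number of additions (the class name and the example value 50 are assumptions for illustration).

import java.util.*;

public class Fib {
    static Map<Integer, Long> memo = new HashMap<>();

    // top-down dynamic programming: each subproblem F(n) is solved once and cached
    static long fib(int n) {
        if (n <= 1) return n;
        Long cached = memo.get(n);
        if (cached != null) return cached;
        long val = fib(n - 1) + fib(n - 2);
        memo.put(n, val);
        return val;
    }

    public static void main(String[] args) {
        System.out.println(fib(50));   // 12586269025, computed with O(n) additions
    }
}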
9.2 Binomial Coefficients
       j:    0    1    2    3    4   ···  k−1           k
 i = 0:      1
 i = 1:      1    1
 i = 2:      1    2    1
 i = 3:      1    3    3    1
 i = 4:      1    4    6    4    1
   ⋮
 i = k:      1   ···                                    1
   ⋮
 i = n−1:    1   ···                      C(n−1, k−1)   C(n−1, k)
 i = n:      1   ···                                    C(n, k)
Table 9.1: Tableau for Binomial Coefficients. Astute readers will recognize the tableau as Pascal's Triangle.
The tableau is filled out top-to-bottom, left-to-right, with the trivial base cases pre-filled.
Input: Integers n, k with 0 ≤ k ≤ n
Output: The binomial coefficient C(n, k)

for i = 0, . . . , n do
    for j = 0, . . . , min(i, k) do
        if j = 0 ∨ j = i then
            C_{i,j} ← 1
        else
            C_{i,j} ← C_{i−1,j−1} + C_{i−1,j}
        end
    end
end
output C_{n,k}
Analysis: clearly the elementary operation is addition and in total, the algorithm is
Θ(nk). This matches the complexity of a memoization approach, but avoids any additional
function calls.
9.3 Optimal Binary Search Trees
We've considered Binary Search Trees and balanced BSTs. Consider the following variation: suppose we know in advance the keys to be searched along with a known (or estimated) probability distribution of searches on these keys.
We don’t necessarily want a balanced binary search tree, rather we want a BST that
will minimize the overall expected (average) number of key comparisons. Intuitively,
keys with higher probabilities should be shallower in the tree while those with lower
probabilities should be deeper. Specically, average number of comparisons made would
be: n
∑
h(ki ) · p(ki )
i=1
where h(ki ) is the level in which the key ki lies and p(ki ) is the probability of searching
for ki . Our goal will be to compute the Optimal Binary Search Tree (OBST).
As an example, consider the key distribution in Table 9.2 and the trees in Figure 9.1.
There are several valid BSTs, but the expected number of comparisons given the key
probabilities are different.
k p(k)
a 0.6
b 0.3
c 0.1
The number of possible BSTs with n keys corresponds to the Catalan numbers.
Definition 13. The Catalan numbers are a sequence of natural numbers defined by
      C_n = 1/(n+1) · C(2n, n) = (2n)! / ((n+1)! · n!)
• The first few numbers in the sequence: 1, 1, 2, 5, 14, 42, 132, 429, . . .
• Corresponds to the number of valid, balanced parenthesization of operations:
[Figure 9.1: three BSTs on keys a, b, c. (a) A balanced tree, but the expected number of comparisons is 1.7. (b) Another tree, but the expected number of comparisons is still 1.7. (c) The optimal configuration; the expected number of comparisons is 1.5.]
• Corresponds to the number of "full" (every node has 0 or 2 children) binary trees with n + 1 leaves. This is the same as a binary tree with n key nodes (which will necessarily have n + 1 sentinel leaves).
• Many other interpretations
• They have exponential growth:
      C_n ∈ O(4^n / n^{1.5})
OBSTs are not ideal for applications that perform a lot of insertion and deletion operations.
The introduction of a new key or the elimination of a key will necessarily change the
probability distribution and a new OBST may have to be generated from scratch. The
use cases are for more static, indexed collections that do not change frequently.
For a dynamic programming solution (due to Knuth [9]), we need a recurrence. Let C_{i,j} be defined as the smallest average number of comparisons made in a successful search in a binary tree T_{i,j} involving keys k_i through k_j, for 1 ≤ i ≤ j ≤ n. Ultimately, we are interested in finding C_{1,n}—the optimal BST involving all keys.
To define a recurrence, we need to consider which key should be the root of the intermediate tree T_{i,j}, say k_ℓ. Such a tree will have k_ℓ as the root, a left subtree T_{i,ℓ−1} containing keys k_i, . . . , k_{ℓ−1} optimally arranged, and a right subtree T_{ℓ+1,j} containing keys k_{ℓ+1}, . . . , k_j.
      C_{i,j} = min_{i ≤ k ≤ j} { C_{i,k−1} + C_{k+1,j} } + ∑_{s=i}^{j} p(k_s)
for 1 ≤ i ≤ j ≤ n. Note that the sum is invariant over all values of k: it represents the
fact that building the tree necessarily adds 1 comparison to the depth of all the keys. A
visualization of the split can be found in Figure 9.2.
Some obvious corner cases: C_{i,i} = p(k_i) (a single-key tree requires exactly one comparison) and C_{i,i−1} = 0 (an empty range of keys contributes nothing).
[Figure 9.2: splitting the keys k_i, . . . , k_j at the root k_ℓ; the left subtree contains k_i, . . . , k_{ℓ−1} and the right subtree contains k_{ℓ+1}, . . . , k_j.]
[Figure 9.3: computing C_{i,j} uses the entries C_{i,i−1}, . . . , C_{i,j−1} in the same row to its left and C_{i+1,j}, . . . , C_{j+1,j} in the same column below it.]
Figure 9.3: Visualization of the OBST tableau. Computing the minimal value for Ci,j
involves examining potential splits of keys, giving pairs of sub-solutions. This
indicates that we need elements in the same row to the left and in the same
column below. Thus, the algorithm needs to complete the tableau top-left to
bottom-right diagonally.
9.3.1 Example
      C_{1,2} = min over 1 ≤ k ≤ 2 of:
          k = 1: C_{1,0} + C_{2,2} + ∑_{s=1}^{2} p(k_s) = 0 + .02 + (.213 + .02) = .253
          k = 2: C_{1,1} + C_{3,2} + ∑_{s=1}^{2} p(k_s) = .213 + 0 + (.213 + .02) = .446
These two options correspond to splitting the sub-tree involving keys A, B at A, B
respectively. That is, A as the root with B as its right-child or B as the root with A as
its left-child. The values indicate that k = 1 is the better option.
Key Probability
A .213
B .020
C .547
D .100
E .120
Table 9.5: Tableaus resulting from the Optimal Binary Search Tree example
      C_{2,5} = min over 2 ≤ k ≤ 5 of:
          k = 2: C_{2,1} + C_{3,5} + ∑_{s=2}^{5} p(k_s) = 1.874
          k = 3: C_{2,2} + C_{4,5} + ∑_{s=2}^{5} p(k_s) = 1.127
          k = 4: C_{2,3} + C_{5,5} + ∑_{s=2}^{5} p(k_s) = 1.494
          k = 5: C_{2,4} + C_{6,5} + ∑_{s=2}^{5} p(k_s) = 1.573
The final result can be seen in Figure 9.4. The expected number of key comparisons for any search is 1.573. Though this particular example resulted in a balanced BST, in general OBSTs need not be balanced.
9.4 Dynamic Knapsack
Recall the 0-1 Knapsack Problem (see Problem 4). This problem lends itself to an "efficient" dynamic programming solution.
Let V_{i,j} be the value of an optimal solution (a subset of items) involving the first i objects subject to an intermediate weight constraint j. Clearly, 1 ≤ i ≤ n and 1 ≤ j ≤ W.
For a given V_{i,j} we can divide all the possible subsets of the first i items that fit a knapsack capacity of j into two groups: those that include item i and those that do not.
[Figure 9.4: the resulting optimal binary search tree for the example.]
• Among subsets that do not include the i-th element, the value of the optimal subset
is Vi−1,j .
• Among the subsets that do include the i-th element, the optimal subset will be
made up of the i-th item and the optimal subset of the first i − 1 items that fit into a knapsack of capacity j − w_i. This is exactly v_i + V_{i−1, j−w_i}.
We are interested in maximizing our total value, so we take the max of these two solutions if feasible:
      V_{i,j} = max{ V_{i−1,j},  v_i + V_{i−1,j−w_i} }   if j − w_i ≥ 0
      V_{i,j} = V_{i−1,j}                                 if j − w_i < 0
       j:   0   ···   j − w_i          ···   ···        W
 i = 0:     0   ···     0              ···    0         0
 i = 1:     0   ···
   ⋮
 i − 1:     0   ···   V_{i−1, j−w_i}   ···   V_{i−1,j}  ···
 i:         0   ···                    ···   V_{i,j}    ···
   ⋮
 n:         0   ···                    ···              V_{n,W}
Table 9.6: Tableau for Dynamic Knapsack. The value of an entry V_{i,j} depends on the values in the previous row and columns.
Visually, we start with entry V_{n,W}. We then scan upwards in this column until the table value changes. A change from row i to row i − 1 corresponds to taking item i with weight w_i. We then jump left in the table on row i − 1 to column j − w_i (where j is the column we started at) and repeat the process until we have reached the border of the tableau.
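The following Java sketch builds the tableau and performs exactly this backtracking. The item weights and values are read off the example tableau of this section (they are a reconstruction and should be treated as an assumption), and the class name is illustrative.

public class Knapsack {
    public static void main(String[] args) {
        // values v[i] and weights w[i] for items 1..n (index 0 unused), capacity W
        int[] v = {0, 10, 5, 8, 7, 7};
        int[] w = {0, 5, 2, 4, 2, 3};
        int W = 7, n = 5;
        int[][] V = new int[n + 1][W + 1];          // row 0 and column 0 stay 0
        for (int i = 1; i <= n; i++)
            for (int j = 0; j <= W; j++)
                V[i][j] = (j - w[i] >= 0)
                        ? Math.max(V[i - 1][j], v[i] + V[i - 1][j - w[i]])
                        : V[i - 1][j];
        System.out.println("optimal value: " + V[n][W]);   // 19 for this data
        // backtrack: a change from row i-1 to row i means item i was taken
        int j = W;
        for (int i = n; i >= 1; i--)
            if (V[i][j] != V[i - 1][j]) {
                System.out.println("take item a" + i);      // prints a5, a4, a2
                j -= w[i];
            }
    }
}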
9.4.1 Example
The resulting tableau can be found in Figure 9.8. The corresponding optimal knapsack would be {a₂, a₄, a₅}; the backtracking is illustrated in Figure 9.5.
9.4.2 Analysis
Clearly, the running time is equivalent to the number of entries that we compute, Θ(nW ).
Unfortunately, this fails to give a complete picture. Recall that W is part of the input, thus the input size is log W. If W ∈ O(2^n), for example, this is clearly not polynomial. Algorithms with analyses like this are called pseudopolynomial (or, more specifically, pseudolinear in this case). This is because another parameter of the input "hides" the true complexity of the algorithm. If we considered W to be constant, or even W ∈ O(n^k), then the algorithm runs in polynomial time. In general, such restrictions are not reasonable and the problem remains NP-complete.
9.5 Coin Change Problem
The Coin Change problem is the problem of giving the minimum number of coins adding up to a total L in change using a given set of denominations {c₁, . . . , cₙ}. For certain denomination systems a simple greedy strategy happens to give the minimum number of coins, but in general it does not; dynamic programming gives a correct solution for any set of denominations.
0 1 2 3 4 5 6 7
0 0 0 0 0 0 0 0 0
a1 0 0 0 0 0 10 10 10
a2 0 0 5 5 5 10 10 15
a3 0 0 5 5 8 10 13 15
a4 0 0 7 7 12 12 15 17
a5 0 0 7 7 12 14 15 19
9.5.1 Example
Suppose that we have a set of four coin denominations (see Table 9.10).
The resulting tableau can be found in Table 9.11.
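Since the recurrence and pseudocode for this problem are not reproduced here, the following Java sketch shows a compact, one-dimensional variant of the tableau (the class name is an assumption); best[j] holds the fewest coins needed to total j using the denominations from Table 9.10, and the result for amount 8 matches the last row of the example tableau.

import java.util.*;

public class CoinChange {
    public static void main(String[] args) {
        int[] coins = {5, 2, 4, 1};         // the denominations from Table 9.10
        int C = 8;                          // target amount
        int INF = Integer.MAX_VALUE - 1;    // "unreachable" sentinel
        int[] best = new int[C + 1];        // best[j] = fewest coins totaling j
        Arrays.fill(best, INF);
        best[0] = 0;
        for (int c : coins)
            for (int j = c; j <= C; j++)
                best[j] = Math.min(best[j], best[j - c] + 1);
        System.out.println(best[C]);        // 2 (for example, 4 + 4)
    }
}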
       j:   0   1   2   ···   C
 i = 0:     ∞   ∞   ∞   ···   ∞
 i = 1:     0
 i = 2:     0
   ⋮
 i = n:     0
(Layout of the Coin Change tableau: rows correspond to denominations c₁, . . . , cₙ, columns to amounts 0, . . . , C.)
Denomination Value
c1 5
c2 2
c3 4
c4 1
9.6 Matrix Chain Multiplication
Suppose we have three matrices, A, B, C of dimensions (5 × 10), (10 × 15), (15 × 10) respectively. If we were to perform the multiplication ABC, we could do it in one of two ways: multiply AB first, then by C, or BC first, then by A. That is, either
(AB)C
or
A(BC)
Using straightforward matrix multiplication, the rst way would result in
(5 · 10 · 15) + (5 · 15 · 10) = 1, 500
multiplications while the second would result in
(5 · 10 · 10) + (10 · 15 · 10) = 2, 000
multiplications.
In general, suppose that we are given a chain of matrices A₁, . . . , Aₙ and we want to multiply them all while minimizing the total number of multiplications. This is the matrix chain multiplication problem. More generally, what parenthesization of the associative operations minimizes the overall number of multiplications made to compute the product?
Clearly, each product Aᵢ Aᵢ₊₁ has to be a valid operation. Further, we will assume that the dimensions are given as (d₀, d₁, . . . , dₙ), where Aᵢ has dimensions d_{i−1} × d_i; if the matrices were all square there would be no point in finding the optimal order (all orders would cost the same).
  c_i   i \ j:   0   1   2   3   4   5   6   7   8
   −    0:       −   ∞   ∞   ∞   ∞   ∞   ∞   ∞   ∞
   5    1:       0   ∞   ∞   ∞   ∞   1   ∞   ∞   ∞
   2    2:       0   ∞   1   ∞   2   1   3   2   4
   4    3:       0   ∞   1   ∞   1   1   2   2   2
   1    4:       0   1   1   2   1   1   2   2   2
Table 9.11: Coin Change tableau for the example denominations and amount 8.
For this problem we define a (triangular) table of size n × n. Each entry represents the minimum number of multiplications it takes to evaluate the matrices from A_i . . . A_j.
The tableau can be lled in using the following recurrence relation:
0 if i = j
C(i, j) = min {C(i, k) + C(k + 1, j) + di−1 dk dj } otherwise
i≤k<j
The order in which the tableau is filled in begins with the base cases, down the diagonal where i = j. Depending on how you orient your tableau, you will then work downward in a diagonal fashion; see Table 9.12.
In addition, at each calculation, we fill in an S-array that keeps track of which value of k was chosen for each entry. This value can be used to construct the optimal solution. The S-array ranges over 2 ≤ i ≤ n and 1 ≤ j ≤ n − 1; see Table 9.13.
      i \ j:   1    2   ···   n−1
      2:       0
      3:       φ    0
      ⋮                 ⋱
      n:       φ    φ   ···    0
Matrix Dimensions
A1 30 × 35
A2 35 × 15
A3 15 × 5
A4 5 × 10
A5 10 × 20
A6 20 × 25
9.6.1 Example
      i \ j:   1        2        3        4        5        6
1 0 15,750 7,875 9,375 11,875 15,125
2 φ 0 2,625 4,375 7,125 10,500
3 φ φ 0 750 2,500 5,375
4 φ φ φ 0 1,000 3,500
5 φ φ φ φ 0 5,000
6 φ φ φ φ φ 0
      C_{2,5} = min over 2 ≤ k ≤ 4 of:
          k = 2: C_{2,2} + C_{3,5} + d₁ d₂ d₅ = 0 + 2,500 + 35 · 15 · 20 = 13,000
          k = 3: C_{2,3} + C_{4,5} + d₁ d₃ d₅ = 2,625 + 1,000 + 35 · 5 · 20 = 7,125
          k = 4: C_{2,4} + C_{5,5} + d₁ d₄ d₅ = 4,375 + 0 + 35 · 10 · 20 = 11,375
Clearly, the minimum among these is 7,125. The k value that gave this result was 3,
consequently we place this value into our S-array, S2,5 = 3. The full S-array can be
found in Table 9.16.
Each entry in the S-array records the value of k such that the optimal parenthesization of A_i A_{i+1} A_{i+2} . . . A_j splits the product between A_k and A_{k+1}. Thus we know that the final matrix multiplication in computing A_{1..n} optimally is A_{1..S[1,n]} · A_{S[1,n]+1..n}. Multiplications in each sub-chain can be calculated recursively; the intuitive base case is when you only have two matrices to multiply.
From the S-array above, the initial split would be at S1,6 = 3. The full parenthesization
would be:
((A1 )(A2 A3 )) ((A4 A5 )(A6 ))
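The following Java sketch fills the C tableau and S-array for the dimensions used in this example and then reconstructs the parenthesization recursively from S (the class name and the printing of singleton matrices without extra parentheses are illustrative choices).

public class MatrixChain {
    public static void main(String[] args) {
        // dims[i-1] x dims[i] are the dimensions of A_i (the example from this section)
        int[] dims = {30, 35, 15, 5, 10, 20, 25};
        int n = dims.length - 1;                    // number of matrices
        long[][] C = new long[n + 1][n + 1];        // C[i][j]: min multiplications for A_i..A_j
        int[][] S = new int[n + 1][n + 1];          // S[i][j]: best split point k
        for (int len = 2; len <= n; len++)          // chain length
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                C[i][j] = Long.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    long cost = C[i][k] + C[k + 1][j] + (long) dims[i - 1] * dims[k] * dims[j];
                    if (cost < C[i][j]) { C[i][j] = cost; S[i][j] = k; }
                }
            }
        System.out.println(C[1][n]);                // 15125, matching the tableau above
        System.out.println(paren(S, 1, n));         // ((A1(A2A3))((A4A5)A6))
    }

    static String paren(int[][] S, int i, int j) {
        if (i == j) return "A" + i;
        return "(" + paren(S, i, S[i][j]) + paren(S, S[i][j] + 1, j) + ")";
    }
}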
9.7 Exercises
Exercise 9.1. Construct the C tableau, S-array and nal parenthesization for the
following matrices:
Matrix Dimensions
A1 5 × 10
A2 10 × 3
A3 3 × 12
A4 12 × 5
A5 5 × 50
A6 50 × 6
Exercise 9.2. Gomer thinks he’s found a better solution to constructing an Optimal
Binary Search Tree. He thinks a greedy strategy would work just as well: Sort the keys
and choose the one with the highest probability as the root of the tree. Then, repeat this
process with the left/right keys. Show that Gomer’s strategy will not work by providing
an example input that will not be optimal with his strategy.
Exercise 9.3. Implement the dynamic programming solution to the Optimal Binary
Search Tree problem in the high-level programming language of your choice.
Exercise 9.4. Implement the dynamic programming solution to the Matrix Chain
Multiplication problem in the high-level programming language of your choice.
Exercise 9.5. Implement the dynamic programming solution to the 0-1 knapsack problem
in the high-level programming language of your choice.
Exercise 9.6. Implement the dynamic programming solution to the Coin Change
problem in the high-level programming language of your choice.
Exercise 9.7. Consider the following “game”: n items of values v1 , v2 , . . . , vn are placed
in a line. You are allowed to take an item only at the beginning or the end of the line.
Once you take an item, the second player is allowed to take an item at the beginning or
the end of the line. You take turns like this until all items have been taken. The winner
is the player whose item values sum to the greater amount.
Prove or disprove: a greedy-choice strategy will work; that is, always take the item with
the higher value.
Exercise 9.8. Recall that a palindrome is a string that is the same backwards and forwards ("abba" or "deified").
(a) Write pseudocode that, given a string s, determines the longest contiguous palindromic subsequence (i.e., substring) in s. Analyze your algorithm.
BBABCBCAB
10 Computational Models
10.1 Introduction
• Language Theory
• Computational Models
• Turing Machines
• Computability Theory
• Complexity Classes P & NP
• NP-Completeness
• Reductions
Each part of the intuitive description of what an "algorithm" is turns out to be imprecise from a mathematical point of view; each has an intuitive and dictionary meaning, but we will want to give a more rigorous, quantifiable mathematical equivalent. In particular,
10.1.1 Languages
Problems come in many varieties:
• Numerical
• Graph
• Sorting
• Logic
We need a unified model that captures all of these different types of problems; for this we use languages.
Language Operations
L1 L2 = {xy | x ∈ L1 , y ∈ L2 }
192
10.1 Introduction
The Kleene Star is a recursively defined operation. For a given language L, L⁰ = {λ}, and for n > 0 we define
      Lⁿ = L^(n−1) L
The Kleene Star operation is then defined as the union of all such concatenations:
      L* = ⋃_{n ≥ 0} Lⁿ
Regular Languages
• 0∗ ∪ 1∗ is the language consisting of all strings with either all 1s or all 0s (plus the
empty string).
• 0∗ 10∗ is the language consisting of all strings with a single 1 in them.
• (ΣΣ)∗ the set of all even length strings
• 1Σ*0: the set of all canonical representations of even integers.
Exercise: Give a regular expression for the set of all strings where every 0 appears before
any occurrence of a 1.
The language model is robust. Any problem P can be equivalently stated as a language
L where
• (Encodings) x of yes instances are members of the language: x ∈ L.
• (Encodings) x of no instances are not members of the language: x ∉ L.
The key is that we establish a proper encoding scheme.
A proper encoding of graphs, for example, may be a string that consists of a binary
representation of n, the number of vertices.
Using some delimiter (which can also be in binary), we can specify connectivity by listing
pairs of connected vertices.
〈G〉 = 11:00:01:01:10
10.2 Computational Models
There are many different computational models corresponding to many classes of languages.
Some are provably more powerful than others. Here, we give a brief introduction to
• Finite State Automata
• Grammars
• Turing Machines
As an example, consider the finite-state automaton defined by:
• Q = {q₀, q₁, q₂}
• Σ = {0, 1}
• q₀ is our initial state
• F = {q₂}
[Figure: the state diagram of this automaton, with states q₀, q₁, q₂ and transitions on 0 and 1 as listed below.]
δ(q0 , 0) = q1
δ(q0 , 1) = q0
δ(q1 , 0) = q1
δ(q1 , 1) = q2
δ(q2 , 0) = q1
δ(q2 , 1) = q0
This FSM accepts any string that ends in 01. An equivalent regular expression is simply Σ*01.
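Simulating such an automaton is a matter of following the transition table symbol by symbol, as in the following Java sketch (the class name is an assumption; the table encodes exactly the δ listed above, with states numbered 0, 1, 2 for q₀, q₁, q₂).

public class EndsIn01 {
    // delta[state][symbol]: the transition function of the automaton above
    static final int[][] delta = {
        {1, 0},   // q0: on 0 -> q1, on 1 -> q0
        {1, 2},   // q1: on 0 -> q1, on 1 -> q2
        {1, 0}    // q2: on 0 -> q1, on 1 -> q0
    };

    static boolean accepts(String input) {
        int state = 0;                               // start in q0
        for (char c : input.toCharArray())
            state = delta[state][c - '0'];
        return state == 2;                           // accept iff we end in q2
    }

    public static void main(String[] args) {
        System.out.println(accepts("1101"));   // true  (ends in 01)
        System.out.println(accepts("0110"));   // false
    }
}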
The set of strings that a finite-state automaton M accepts is its language:
      L(M) = {x ∈ Σ* | M accepts x}
Conversely, any string that ends in a non-accept state is rejected. This also defines a language, the complement language:
      L̄(M) = Σ* − L(M)
Finite-state automata are a simple computational model, but a very restrictive one. The only types of languages they can recognize are those that can be defined by regular expressions. Recognition here means that a machine, given a finite string x ∈ Σ*, can tell whether x ∈ L(M).
• The language containing all strings, Σ∗
• The language consisting of all strings that are all 0s or all 1s
• The language consisting of all even parity strings (an even number of 1s)
Not all languages are regular. Even simple languages such as L = {0ⁿ1ⁿ | n ≥ 0}, or the language of all strings with an equal number of 0s and 1s, are not regular: no finite-state automaton can recognize them.
A Turing Machine is a basic computational model which has an input tape that can be read from, an output (work) tape that can be written to, and a set of states.
• A tape head moves left and right along the input/output tape and performs reads and writes according to what symbols it encounters. A special blank symbol, ⊔, is used to indicate the end of the input on the input tape and unwritten/uninitialized cells in the work tape.
• A definition of a given Turing Machine can be made precise by enumerating every possible transition on every possible input and output symbol for every state.
• A state diagram similar to that of finite automata can visualize these transitions. However, it is much easier to simply describe a Turing Machine in high-level English.
• A visualization can be seen in Figure 10.2
• In our definition:
  – There are separate input and work tapes; the input tape is read-only and cannot be changed/written to
  – We allow left/right transitions for both tapes as well as a "no transition", −
  Other definitions (such as those in [14]) may omit these conveniences, but they can be shown to be equivalent.
[Figure 10.2: a Turing Machine's input tape holding x₀ x₁ x₂ . . . xₙ followed by the blank symbol ⊔.]
An example of a language that a Turing Machine can decide (but no finite-state automaton can recognize) is L = {x#x | x ∈ Σ*}.
Extended Example
Designing Turing Machines can be a very complex process (and more arduous than programming with 1950s-era punchcards); we're essentially programming in machine language.
As an example, let's design a full Turing Machine to decide the non-regular language {0ⁿ1ⁿ | n ≥ 1}. Let's start with a high-level description:
1. Start by writing a special end symbol, #, to the work tape. Moreover, if we see a 1 on the input tape, or the input is empty (⊔), immediately halt and reject.
2. Start reading 0s on the input tape; for each 0, write a 0 to the work tape (the work tape will act as a stack that we push symbols onto).
3. When the first 1 is encountered, continue reading 1s; for each 1, erase a work tape symbol.
• If we ever encounter another 0, reject
• If we encounter # with 1s still in the input tape, reject
• If we encounter the end of the input but there are still 0s on the work tape,
reject
• Otherwise, accept.
The transition function is defined in Table 10.1, and a visual depiction of the finite state control is given in Figure 10.3. This Turing Machine construction is actually equivalent to a push-down automaton (essentially a finite state machine with access to an infinite-capacity stack); push-down automata define the Context-Free Languages.
The Church-Turing Thesis gives a formal (though debatably not rigorous) definition of what an algorithm is. It states that the intuitive notion of an algorithmic process is equivalent to the computational model of Turing Machines.
This means that any rigorous computational model can be simulated by a Turing Machine.
Moreover, no other computational model is more powerful (in terms of the types of
languages it can accept) than a Turing Machine.
A programming language is Turing complete if it can do anything that a Turing machine
can do.
As a consequence, any two Turing complete programming languages are equivalent.
Intuitively, anything that you can do in Java, you can do in C++ (algorithmically, we’re
not talking about specic libraries), Perl, Python, PHP, etc.
Table 10.1: Turing Machine Transitions. Some state/symbol combinations are not possible (NA). The rows are grouped by state, then by the symbol under the input-tape head, then by the symbol under the work-tape head; the final column gives the action (next state, work-tape write, input-head move, work-head move). The blank symbol ⊔ is rendered as "t" below.
State   Input   Work    Transition
0 NA
0 1 NA
# NA
t q1 , #, −, R
q0 0 NA
1 1 NA
# NA
t reject
0 NA
t 1 NA
# NA
t reject
0 NA
0 1 NA
# NA
t q1 , 0, R, R
q1 0 NA
1 1 NA
# NA
t q2 , t, −, −
0 NA
t 1 NA
# NA
t reject
0 NA
0 1 NA
# reject
t reject
q2 0 NA
1 1 NA
# reject
t q1 , t, R, L
0 reject
t 1 NA
# accept
t NA
[Figure 10.3: the finite state control, with states q₀, q₁, q₂, q_accept, and q_reject.]
Figure 10.3: Turing Machine Finite State Transitions. Invalid or unnecessary transitions have been omitted. The transition (a, b) → (x, y, z) indicates a is the symbol on the input tape, b is the symbol on the work tape, x is the symbol we write to the work tape, and y, z are the tape head transitions for the input tape and work tape respectively. We have used the notation · as in (0, ·) to indicate that the symbol on the work tape is irrelevant. Transitions to a final halting state omit the tape write/transitions as they are final states.
Definition 17. Let M be a Turing Machine and let L be a language. We say that M decides L if for all x ∈ Σ*, M(x) halts and:
• accepts if and only if x ∈ L
• rejects if and only if x ∉ L
As previously noted, a machine M on an input x may not halt. A less restrictive property
is language recognition.
We say that a Turing machine M recognizes a language L if for every x ∈ L, M (x) halts
and accepts.
A language L is in RE if some Turing machine recognizes it.
RE is the class of recursively enumerable languages (also called computably enumerable).
For a language in L ∈ RE, if x ∈ L, then some machine will eventually halt and accept it.
If x ∉ L then the machine may or may not halt.
A language L is in R if some Turing machine decides it.
R is the class of recursive languages (also called computable).
Here, if L ∈ R, then there is some machine that will halt on all inputs and is guaranteed
to accept or reject.
It's not hard to see that if a language is decidable, it is also recognizable by definition; thus
      R ⊆ RE
There are problems (languages) that are not Turing-decidable: languages L ∈ RE with L ∉ R.
We take as our rst example the halting problem.
Problem 15 (Halting Problem). Given: A Turing Machine M and an input x.
Question: does M (x) halt?
This indeed would be a very useful program—once you've compiled a program, you may want to determine if you've screwed up and caused an infinite loop somewhere.
We will show that the halting problem is undecidable.
That is, no algorithm, program or Turing Machine exists that could ever tell if another
Turing Machine halts on a given input or not.
Proof. By way of contradiction, assume that there exists a Turing Machine H that decides the halting problem:
      H(⟨M, x⟩) = halt and output 1,   if M halts on x
      H(⟨M, x⟩) = halt and output 0,   if M does not halt on x
Now construct a new machine Q that takes the encoding ⟨M⟩ of a machine M as input. It is easy to construct Q: we run H(⟨M, M⟩); if it outputs 0, then we halt and accept; if it outputs 1, then we go into a trivial loop state, never halting. Now that Q is constructed, we can run Q on itself:
      Q(⟨Q⟩) halts,            if H(⟨Q, Q⟩) = 0
      Q(⟨Q⟩) does not halt,    if H(⟨Q, Q⟩) = 1
This is a contradiction, because Q(⟨Q⟩) halts if and only if Q(⟨Q⟩) doesn't halt, and vice versa.
Therefore, no such H can exist.
Many other problems, some of even practical interest have been shown to be undecidable.
This means that no matter how hard you try, you can never solve these problems with
any algorithm.
203
10 Computational Models
"I am lying."
The key to this seeming paradox is self-reference. This is where we get the terms recursive and recursively enumerable.
Now that we have a concrete model to work from: Problems as languages and Algorithms
as Turing Machines, we can further delineate complexity classes within R (all decidable
problems) by considering Turing Machines with respect to resource bounds.
In the computation of a Turing Machine M , the amount of memory M uses can be
quantied by how many tape cells are required in the computation of an input x. The
amount of time M uses can be quantied by the number of transitions M makes in the
computation of x.
Of course, just as before, we are interested in how much time and memory are used as a
function of the input size. In this case,
T (|x|)
and
M (|x|)
respectively, where x ∈ Σ*. Again, the restriction to decisional versions of problems is perfectly fine—we could just consider languages and Turing Machines themselves.
10.3 Complexity Classes
Denition 18. The complexity class P consists of all languages that are decidable by a
Turing Machine running in polynomial time with respect to the input |x|. Alternatively,
P is the class of all decision problems that are solvable by a polynomial time running
algorithm.
10.3.2 Nondeterminism
That is, each stage, guessing and verification, can be done in polynomial time. HamiltonianCycle ∈ NP, since a random permutation can be guessed in O(n) time and the verification process can be done in O(n²) time.
It is not hard to see that
P ⊆ NP
since any problem that can be deterministically solved in polynomial time can certainly
be solved in nondeterministic polynomial time.
The most famous unanswered question so far, then, is whether
      P = NP
If the answer is yes (very unlikely), then every problem in NP could be solved in
polynomial time. If the answer is no, then the hardest problems in NP could never be
solved by a polynomial time algorithm. Such problems will forever remain intractable.
To understand this more fully, we need to explore the notion of NP-Completeness.
10.4 Reductions & NP-Completeness
[Figure 10.4: deciding A by reduction to B: the algorithm A′(x) computes f(x) and runs a decider for B, returning that decider's yes/no answer; this is correct because x ∈ A ⟺ f(x) ∈ B.]
[Figure 10.5: the landscape of P, NP, and the NP-Complete problems (NPC) under polynomial-time reductions ≤P.]
Intuitively, NP-Complete problems are the hardest (most difficult, computationally speaking) problems in NP (of course there are provably harder problems in classes such as EXP). The "landscape" is depicted in Figure 10.5.
There are 5 basic steps to show that a given problem P is NP-Complete.
1. Prove that P ∈ NP by giving an algorithm that guesses a certificate and an algorithm that verifies a solution in polynomial time.
2. Select a known NP-Complete problem P′ that we will reduce to P (P′ ≤P P)
3. Give an algorithm that computes a mapping f : P′_yes ↦ P_yes for every instance x ∈ {0, 1}*
4. Prove that f satisfies x ∈ P′ if and only if f(x) ∈ P
5. Prove that the algorithm in step 3 runs in polynomial time
10.4.1 Satisfiability
L_NP, however, is unnatural from a "problem" point of view. The key is that polynomial-time reductions compose (are transitive):
      P₁ ≤P P₂ ≤P P₃ ⇒ P₁ ≤P P₃
This conjunction is satisfiable and is therefore a yes instance of SAT, since if we set x₁ = x₄ = 0 and x₂ = x₃ = 1, C = 1.
Let n = 3 and consider the following conjunction:
This conjunction is not satisfiable since none of the 2ⁿ = 8 possible boolean assignments
will ever make C = 1.
Problem 17 (Clique).
Given: An undirected graph G = (V, E)
Output: A subset C ⊆ V of maximal cardinality such that every pair of vertices in C is connected by an edge
Proof. (⇒) : Suppose that φ has a satisfying assignment. This implies that each clause
Ci contains at least one true literal. Remember that each literal corresponds to some
vertex in G. Choosing a true literal from each clause yields a set V ′ of size k. To see
that V ′ is a clique we look at our two requirements from before: vαi and vβj are consistent
and both are true, thus (vαi , vβj ) ∈ E.
(⇐): Suppose that G has a clique of size k. No edges in G connect vertices in the same
triple corresponding to a clause so V ′ contains exactly one vertex per triple. Without fear
of causing inconsistencies, we can assign a 1 to each literal (0 if a negation) corresponding
to some vertex in each triple thus each clause is satised and so φ is satised.
Problem 18 (VertexCover).
Given: An undirected graph G = (V, E)
Output: A subset V′ ⊆ V of minimal cardinality such that each edge e ∈ E is incident on some v ∈ V′
[Figure: the graph constructed from a 3-clause formula in the SAT-to-Clique reduction; each clause C₁, C₂, C₃ contributes a group of vertices, one per literal (x₁, ¬x₂, x₃, and so on).]
The vertices “cover” all the edges. Clearly V is a trivial vertex cover, but we are interested
in vertex covers of minimal size.
Theorem 9. Vertex Cover is NP-complete.
Proof. We first observe that the verification of the Vertex Cover problem can be achieved with an NP algorithm, establishing that VertexCover ∈ NP. We first convert this problem to a decision version by adding a parameter k and asking the question: does there exist a subset V′ ⊆ V of cardinality k that represents a vertex cover? We then nondeterministically guess a subset V′ ⊆ V of size k and verify in deterministic O(km) time that
istically guess a subset V ′ ⊆ V of size k and verify in deterministic O(km) time that
it constitutes a vertex cover by iterating over vertices in the cover and removing edges
incident on them. If all edges have been removed, it is a vertex cover (accept) otherwise
there are uncovered edges and we reject.
We complete the proof by showing a reduction from the clique problem to vertex cover.
The fact that the clique problem is NP-complete was established previously. Given an
encoding of the (decision version of the) clique problem, ⟨G, k⟩, we transform it to an instance of the vertex cover problem by taking the complement graph Ḡ and substituting |V| − k for our cardinality parameter. That is:
      ⟨G, k⟩ → ⟨Ḡ, |V| − k⟩
We now make the following claim:
C ⊆ V is a clique of size k in G if and only if V \ C is a vertex cover of size |V| − k in Ḡ.
The proof is below; the idea is visualized in Figure 10.7.
[Figure 10.7 panels: (a) Suppose that a clique C of size k existed in G with u, v unconnected. (b) In Ḡ, u and v are connected and C̄ = V \ C is a vertex cover of size n − k.]
Figure 10.7: Reduction Idea for Clique/Vertex Cover. The complement of a clique in G acts as a vertex cover in Ḡ.
(⇒) Let C be a clique in G such that |C| = k. Let e = (u, v) be an edge in Ḡ. We need to show that this edge is covered either by u or by v. Observe:
      e = (u, v) ∈ Ē ⇒ (u, v) ∉ E
                     ⇒ u ∉ C ∨ v ∉ C (or both)
                     ⇒ u ∈ V \ C ∨ v ∈ V \ C
                     ⇒ (u, v) is covered by V \ C
(⇐) For the converse, it suffices to observe that
      ∀ u, v ∈ V [u ∉ C̄ ∧ v ∉ C̄ ⇒ (u, v) ∈ E]
10.5 Beyond P and NP
There are many more complexity classes based on various computational resource bounds:
memory (space); randomization, quantum, etc. A large list of complexity classes is
maintained at the Complexity Zoo: http://www.complexityzoo.com/ [2].
In summary, with respect to the complexity classes examined here,
[Figure 10.8 panels: (a) Graph G with a clique of size 4 highlighted in purple. (b) The complement graph Ḡ; the corresponding vertex cover is highlighted in yellow.]
Figure 10.8: Graph and Complement: Clique and Vertex Cover. Graph G with a clique of size 4 and its complement Ḡ with a vertex cover of size 3.
• coRE is the complement class of RE; it consists of all decision problems whose no instances can be verified (the machine halts and rejects) by a Turing Machine in a finite amount of time. On yes instances the machine is not guaranteed to halt.
• coNP is the complement class of NP. Rather than producing certificates, acceptance is defined as being able to produce a disqualifier (i.e., if some computation path produces a no answer, we accept). This is still a nondeterministic class. A good example: Tautology.
• NP ∩ coNP is the intersection of NP and coNP; P is contained in this class.
• NPI: NP-intermediate, problems in NP that are neither in P nor NP-complete. If one were able to prove a language L ∈ NPI, then it would show that P ≠ NP. Thus, no problems have been shown to be in NPI; however, there are many candidates: Graph Isomorphism, Group Isomorphism, Integer Factorization, Discrete Logarithm, etc. We do have a partial result: Ladner's Theorem states that if P ≠ NP, then there are languages in NPI [11].
10.6 Misc
Terminology
• An alternating path: given a partial matching, an alternating path is a path p in G
such that edges in p alternate: e ∈ M, e ∉ M, etc.
[Figure: nested complexity classes: P inside NP and coNP (whose hardest members are NPC and coNPC respectively), all inside RE and coRE.]
TODO
Other Algorithms:
• Edmonds's algorithm (for general graphs): paths, trees, and flowers (graph decomposition)
• The Hungarian algorithm (for the assignment problem, which is equivalent to maximal bipartite matching)
Related: counting the number of perfect matchings in a bipartite graph is #P-complete
(a much higher complexity class). This is equivalent to computing the permanent of a 0-1 matrix. Here we have a difficult counting problem whose corresponding search problem (finding a matching) is easy!
10.7 Exercises
Exercise 10.1. Design a finite-state automaton that accepts the language consisting of all strings which contain no two contiguous 0s.
Exercise 10.2. Can you show that polynomial time reductions, ≤P, define an equivalence relation on all languages in NP? What would the implications of such a result be?
Exercise 10.3. Let G = (V, E) be an undirected graph. Given two vertices, u, v ∈ V
it is easy to determine the length of the shortest path between them using Dijkstra’s
algorithm. However, determining the longest path between u, v is NP-complete. Prove
this as follows: assume that a "free" algorithm A exists for the longest path problem (A takes as input G, u, v and outputs the length k of the longest path between u and v)
and use it to design a polynomial time algorithm to solve the Hamiltonian Path problem,
a known NP-complete problem.
Exercise 10.4. Show that the Hamiltonian Cycle problem is NP-complete by showing a
reduction from Hamiltonian Path (hint: try adding a vertex and connecting it somehow).
Exercise 10.5. Say that we had an oracle (a subroutine; a “free” function) that answers yes
or no as to whether a given graph G has a Hamiltonian Path. Design an algorithm that uses
this oracle as a subroutine to actually compute a Hamiltonian Path, and that runs in
polynomial time (assuming that each oracle call is free).
Exercise 10.6. Consider the following definition and problem:
Definition 23. An independent set of an undirected graph G = (V, E) is a subset of
vertices V′ ⊆ V such that for any two vertices v, v′ ∈ V′, (v, v′) ∉ E.
Problem 19 (IndependentSet). Given: an undirected graph G = (V, E) and an integer k.
Question: Does there exist an independent set of size k?
Prove that IndependentSet is NP-complete by showing a reduction from the Clique problem.
Exercise 10.7. Show that the Clique problem is NP-complete by showing a reduction
from the Independent set problem.
Exercise 10.8. Recall the Traveling Salesperson Problem: prove that this problem is
NP-complete by showing a reduction from the Hamiltonian Cycle problem.
Exercise 10.9. Answer the following questions regarding P and NP. Here, ≤P refers to
a polynomial time reduction and P1, P2, P3 are problems. Provide a brief justification for
your answers.
11 More Algorithms
11.1 A∗ Search
http://www.aaai.org/ocs/index.php/SOCS/SOCS12/paper/viewFile/5396/5212
https://en.wikipedia.org/wiki/Jump_point_search
11.3 Minimax
Glossary
algorithm a process or method that consists of a specified step-by-step set of operations.
Acronyms
BFS Breadth First Search. 128
Bibliography
[1] An annotated list of selected NP-complete problems. http://cgi.csc.liv.ac.uk/~ped/teachadmin/COMP202/annotated_np.html, 2014. Accessed: 2014-08-12.
[2] Complexity zoo. https://complexityzoo.uwaterloo.ca/Complexity_Zoo, 2014.
Accessed: 2014-08-05.
[3] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. PRIMES is in P. Annals of
Mathematics, 160(2):781–793, 2004.
[4] Paul Bachmann. Die Analytische Zahlentheorie. 1894.
[5] R. Bayer and E. McCreight. Organization and maintenance of large ordered indices.
In Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data
Description, Access and Control, SIGFIDET ’70, pages 107–141, New York, NY,
USA, 1970. ACM.
[6] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings
of the Third Annual ACM Symposium on Theory of Computing, STOC ’71, pages
151–158, New York, NY, USA, 1971. ACM.
[7] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to
the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.
[8] G. M. Adelson-Velsky and E. M. Landis. An algorithm for the organization of
information. In Proceedings of the USSR Academy of Sciences (in Russian), number
146, pages 263–266, 1962.
[9] D.E. Knuth. Optimum binary search trees. Acta Informatica, 1(1):14–25, 1971.
[10] Donald E. Knuth. Big omicron and big omega and big theta. SIGACT News,
8(2):18–24, April 1976.
[11] Richard E. Ladner. On the structure of polynomial time reducibility. J. ACM,
22(1):155–171, January 1975.
[12] Leonid A Levin. Universal sequential search problems. Problemy Peredachi Infor-
matsii, 9(3):115–116, 1973.
[13] Anany V. Levitin. Introduction to the Design and Analysis of Algorithms. Addison-
Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
[14] Michael Sipser. Introduction to the Theory of Computation. International Thomson Publishing.