Algorithmics
GILLES BRASSARD
PAUL BRATLEY
ALGORITHMICS
Theory and Practice
Brassard, Gilles
Algorithmics : theory and practice.
1. Recursion theory. 2. Algorithms. I. Bratley,
Paul. II. Title.
QA9.6.B73 1987 511'.3 88-2326
ISBN 0-13-023243-2
Preface xiii
1 Preliminaries 1
1.7.1. Sorting, 13
1.7.2. Multiplication of Large Integers, 13
1.7.3. Evaluating Determinants, 14
1.7.4. Calculating the Greatest Common Divisor, 15
1.7.5. Calculating the Fibonacci Sequence, 16
1.7.6. Fourier Transforms, 19
1.9.2. Graphs, 21
1.9.3. Rooted Trees, 23
1.9.4. Heaps, 25
1.9.5. Disjoint Set Structures, 30
1.10. References and Further Reading 35
3 Greedy Algorithms 79
3.1. Introduction 79
3.2. Greedy Algorithms and Graphs 81
Bibliography 341
Index 353
Preface
computing", adopting the wider perspective that it is "the area of human study,
knowledge and expertise that concerns algorithms".
Our book is neither a programming manual nor an account of the proper use of
data structures. Still less is it a "cookbook" containing a long catalogue of programs
ready to be used directly on a machine to solve certain specific problems, but giving at
best a vague idea of the principles involved in their design. On the contrary, the aim of
our book is to give the reader some basic tools needed to develop his or her own algo-
rithms, in whatever field of application they may be required.
Thus we concentrate on the techniques used to design and analyse efficient algo-
rithms. Each technique is first presented in full generality. Thereafter it is illustrated by
concrete examples of algorithms taken from such different applications as optimization,
linear algebra, cryptography, operations research, symbolic computation, artificial intel-
ligence, numerical analysis, computing in the humanities, and so on. Although our
approach is rigorous and theoretical, we do not neglect the needs of practitioners:
besides illustrating the design techniques employed, most of the algorithms presented
also have real-life applications.
To profit fully from this book, you should have some previous programming
experience. However, we use no particular programming language, nor are the exam-
ples for any particular machine. This and the general, fundamental treatment of the
material ensure that the ideas presented here will not lose their relevance. On the other
hand, you should not expect to be able to use the algorithms we give directly: you will
always be obliged to make the necessary effort to transcribe them into some
appropriate programming language. The use of Pascal or a similarly structured language
will help reduce this effort to the minimum necessary.
Some basic mathematical knowledge is required to understand this book. Gen-
erally speaking, an introductory undergraduate course in algebra and another in cal-
culus should provide sufficient background. A certain mathematical maturity is more
important still. We take it for granted that the reader is familiar with such notions as
mathematical induction, set notation, and the concept of a graph. From time to time a
passage requires more advanced mathematical knowledge, but such passages can be
skipped on the first reading with no loss of continuity.
Our book is intended as a textbook for an upper-level undergraduate or a lower-
level graduate course in algorithmics. We have used preliminary versions at both the
Université de Montréal and the University of California, Berkeley. If used as the basis
for a course at the graduate level, we suggest that the material be supplemented by
attacking some subjects in greater depth, perhaps using the excellent texts by Garey
and Johnson (1979) or Tarjan (1983). Our book can also be used for independent
study: anyone who needs to write better, more efficient algorithms can benefit from it.
Some of the chapters, in particular the one concerned with probabilistic algorithms,
contain original material.
It is unrealistic to hope to cover all the material in this book in an undergraduate
course with 45 hours or so of classes. In making a choice of subjects, the teacher
should bear in mind that the first two chapters are essential to understanding the rest of
the book, although most of Chapter 1 can probably be assigned as independent reading.
The other chapters are to a great extent independent of one another. An elementary
course should certainly cover the first five chapters, without necessarily going over
each and every example given there of how the techniques can be applied. The choice
of the remaining material to be studied depends on the teacher's preferences and incli-
nations. The last three chapters, however, deal with more advanced topics; the teacher
may find it interesting to discuss these briefly in an undergraduate class, perhaps to lay
the ground before going into detail in a subsequent graduate class.
Each chapter ends with suggestions for further reading. The references from each
chapter are combined at the end of the book in an extensive bibliography including
well over 200 items. Although we give the origin of a number of algorithms and ideas,
our primary aim is not historical. You should therefore not be surprised if information
of this kind is sometimes omitted. Our goal is to suggest supplementary reading that
can help you deepen your understanding of the ideas we introduce.
Almost 500 exercises are dispersed throughout the text. It is crucial to read the
problems: their statements form an integral part of the text. Their level of difficulty is
indicated as usual either by the absence of an asterisk (immediate to easy), or by the
presence of one asterisk (takes a little thought) or two asterisks (difficult, maybe even a
research project). The solutions to many of the difficult problems can be found in the
references. No solutions are provided for the other problems, nor do we think it advis-
able to provide a solutions manual. We hope the serious teacher will be pleased to have
available this extensive collection of unsolved problems from which homework assign-
ments can be chosen. Several problems call for an algorithm to be implemented on a
computer so that its efficiency may be measured experimentally and compared to the
efficiency of alternative solutions. It would be a pity to study this material without
carrying out at least one such experiment.
The first printing of this book by Prentice Hall is already in a sense a second edi-
tion. We originally wrote our book in French. In this form it was published by Masson,
Paris. Although less than a year separates the first French and English printings, the
experience gained in using the French version, in particular at an international summer
school in Bayonne, was crucial in improving the presentation of some topics, and in
spotting occasional errors. The numbering of problems and sections, however, is not
always consistent between the French and English versions.
Writing this book would have been impossible without the help of many people.
Our thanks go first to the students who have followed our courses in algorithmics over
the years since 1979, both at the undergraduate and graduate levels. Particular thanks
are due to those who kindly allowed us to copy their course notes: Denis Fortin,
Laurent Langlois, and Sophie Monet in Montreal, and Luis Miguel and Dan Philip in
Berkeley. We are also grateful to those people who used the preliminary versions of
our book, whether they were our own students, or colleagues and students at other
universities. The comments and suggestions we received were most valuable. Our war-
mest thanks, however, must go to those who carefully read and reread several chapters
of the book and who suggested many improvements and corrections: Pierre
Beauchemin, André Chartier, Claude Crépeau, Bennett Fox, Claude Goutier, Pierre
L'Écuyer, Pierre McKenzie, Santiago Miro, Jean-Marc Robert, and Alan Sherman.
We are also grateful to those who made it possible for us to work intensively on
our book during long periods spent away from Montreal. Paul Bratley thanks Georges
Stamon and the Université de Franche-Comté. Gilles Brassard thanks Manuel Blum
and the University of California, Berkeley, David Chaum and the CWI, Amsterdam,
and Jean-Jacques Quisquater and Philips Research Laboratory, Bruxelles. He also
thanks John Hopcroft, who taught him so much of the material included in this book,
and Lise DuPlessis who so many times made her country house available; its sylvan
serenity provided the setting and the inspiration for writing a number of chapters.
Denise St.-Michel deserves our special thanks. It was her misfortune to help us
struggle with the text editing system through one translation and countless revisions.
Annette Hall, of Editing, Design, and Production, Inc., was no less unfortunate in helping
us struggle with the last stages of production. The heads of the laboratories at the
Université de Montréal's Département d'informatique et de recherche opérationnelle,
Michel Maksud and Robert Gerin-Lajoie, provided unstinting support. We thank the
entire team at Prentice Hall for their exemplary efficiency and friendliness; we particu-
larly appreciate the help we received from James Fegen. We also thank Eugene L.
Lawler for mentioning our French manuscript to Prentice Hall's representative in
northern California, Dan Joraanstad, even before we plucked up the courage to work on
an English version. The Natural Sciences and Engineering Research Council of Canada
provided generous support.
Last but not least, we owe a considerable debt of gratitude to our wives, Isabelle
and Pat, for their encouragement, understanding, and exemplary patience; in short,
for putting up with us while we were working on the French and English versions of
this book.
Gilles Brassard
Paul Bratley
1
Preliminaries
The Concise Oxford Dictionary defines an algorithm as a "process or rules for (esp.
machine) calculation". The execution of an algorithm must not include any subjective
decisions, nor must it require the use of intuition or creativity (although we shall see an
important exception to this rule in Chapter 8). When we talk about algorithms, we
shall mostly be thinking in terms of computers. Nonetheless, other systematic methods
for solving problems could be included. For example, the methods we learn at school
for multiplying and dividing integers are also algorithms. The most famous algorithm
in history dates from the time of the Greeks : this is Euclid's algorithm for calculating
the greatest common divisor of two integers. It is even possible to consider certain
cooking recipes as algorithms, provided they do not include instructions like "Add salt
to taste".
When we set out to solve a problem, it is important to decide which algorithm
for its solution should be used. The answer can depend on many factors : the size of
the instance to be solved, the way in which the problem is presented, the speed and
memory size of the available computing equipment, and so on. Take elementary arith-
metic as an example. Suppose you have to multiply two positive integers using only
pencil and paper. If you were raised in North America, the chances are that you will
multiply the multiplicand successively by each figure of the multiplier, taken from
right to left, that you will write these intermediate results one beneath the other shifting
each line one place left, and that finally you will add all these rows to obtain your
answer. This is the "classic" multiplication algorithm.
However, here is quite a different algorithm for doing the same thing, sometimes
called "multiplication a la russe ". Write the multiplier and the multiplicand side by
side. Make two columns, one under each operand, by repeating the following rule
until the number under the multiplier is 1 : divide the number under the multiplier by
2, ignoring any fractions, and double the number under the multiplicand by adding it to
itself. Finally, cross out each row in which the number under the multiplier is even,
and then add up the numbers that remain in the column under the multiplicand. For
example, multiplying 19 by 45 proceeds as in Figure 1.1.1. In this example we get
19+76+152+608 = 855. Although this algorithm may seem funny at first, it is essen-
tially the method used in the hardware of many computers. To use it, there is no need
to memorize any multiplication tables : all we need to know is how to add up, and
how to double a number or divide it by 2.
     45      19        19
     22      38
     11      76        76
      5     152       152
      2     304
      1     608       608

Figure 1.1.1.  Multiplication a la russe.
We shall see in Section 4.7 that there exist more efficient algorithms when the
integers to be multiplied are very large. However, these more sophisticated algorithms
are in fact slower than the simple ones when the operands are not sufficiently large.
At this point it is important to decide how we are going to represent our algo-
rithms. If we try to describe them in English, we rapidly discover that natural
languages are not at all suited to this kind of thing. Even our description of an algo-
rithm as simple as multiplication a la russe is not completely clear. We did not so
much as try to describe the classic multiplication algorithm in any detail. To avoid
confusion, we shall in future specify our algorithms by giving a corresponding pro-
gram. However, we shall not confine ourselves to the use of one particular program-
ming language : in this way, the essential points of an algorithm will not be obscured
by the relatively unimportant programming details.
We shall use phrases in English in our programs whenever this seems to make
for simplicity and clarity. These phrases should not be confused with comments on the
program, which will always be enclosed within braces. Declarations of scalar quanti-
ties (integer, real, or Boolean) are usually omitted. Scalar parameters of functions and
procedures are passed by value unless a different specification is given explicitly, and
arrays are passed by reference.
The notation used to specify that a function or a procedure has an array param-
eter varies from case to case. Sometimes we write, for instance
procedure proc1(T : array)
or even
procedure proc2(T)
if the type and the dimensions of the array T are unimportant or if they are evident
from the context. In such a case #T denotes the number of elements in the array T. If
the bounds or the type of T are important, we write
procedure proc3(T [ 1 .. n ])
or more generally
procedure proc4(T [a .. b ] : integers)
In such cases n, a, and b should be considered as formal parameters, and their values
are determined by the bounds of the actual parameter corresponding to T when the pro-
cedure is called. These bounds can be specified explicitly, or changed, by a procedure
call of the form
proc3(T [ l .. m ]) .
To avoid proliferation of begin and end statements, the range of a statement such
as if, while, or for, as well as that of a declaration such as procedure, function, or
record, is shown by indenting the statements affected. The statement return marks
the dynamic end of a procedure or a function, and in the latter case it also supplies the
value of the function. The operators div and mod represent integer division (dis-
carding any fractional result) and the remainder of a division, respectively. We assume
that the reader is familiar with the concepts of recursion and of pointers. The latter are
denoted by the symbol "↑". A reader who has some familiarity with Pascal, for
example, will have no difficulty understanding the notation used to describe our algo-
rithms. For instance, here is a formal description of multiplication a la russe.
function russe (A, B)
  arrays X, Y
  { initialization }
  X[1] ← A; Y[1] ← B
  i ← 1
  { make the two columns }
  while X[i] > 1 do
    X[i+1] ← X[i] div 2
    Y[i+1] ← Y[i] + Y[i]
    i ← i + 1
  { add the appropriate entries }
  prod ← 0
  while i > 0 do
    if X[i] is odd then prod ← prod + Y[i]
    i ← i - 1
  return prod
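As an illustration, the same algorithm transcribes almost word for word into a modern language. Here is a Python sketch that deliberately keeps the two arrays so as to mirror the pseudocode above.

    def russe(a, b):
        # build the two columns, as in the pencil-and-paper description
        x = [a]          # column under the multiplier
        y = [b]          # column under the multiplicand
        i = 0
        while x[i] > 1:
            x.append(x[i] // 2)     # halve, ignoring fractions
            y.append(y[i] + y[i])   # double
            i += 1
        # add the entries of rows whose multiplier is odd
        prod = 0
        while i >= 0:
            if x[i] % 2 == 1:
                prod += y[i]
            i -= 1
        return prod

    print(russe(45, 19))   # 855, as in Figure 1.1.1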
If you are an experienced programmer, you will probably have noticed that the
arrays X and Y are not really necessary, and that this program could easily be
simplified. However, we preferred to follow blindly the preceding description of the
algorithm, even if this is more suited to a calculation using pencil and paper than to
computation on a machine. The following APL program describes exactly the same
algorithm (although you might reasonably object to a program using logarithms,
exponentiation, and multiplication by powers of 2 to describe an algorithm for multi-
plying two integers ...) .
∇ R←A RUSAPL B;T
[1]   R←+/(2|⌊A÷T)/B×T←1,2*⍳⌊2⍟A ∇
On the other hand, the following program, despite a superficial resemblance to the one
given previously, describes quite a different algorithm.
function not-russe (A, B)
  arrays X, Y
  { initialization }
  X[1] ← A; Y[1] ← B
  i ← 1
  { make the two columns }
  while X[i] > 1 do
    X[i+1] ← X[i] - 1
    Y[i+1] ← B
    i ← i + 1
  { add the appropriate entries }
  prod ← 0
  while i > 0 do
    if X[i] > 0 then prod ← prod + Y[i]
    i ← i - 1
  return prod
We see that different algorithms can be used to solve the same problem, and that
different programs can be used to describe the same algorithm. It is important not to
lose sight of the fact that in this book we are interested in algorithms, not in the pro-
grams used to describe them.
other hand, it is usually more difficult to prove the correctness of an algorithm. When
we specify a problem, it is important to define its domain of definition, that is, the set
of instances to be considered. Although multiplication a la russe will not work if the
first operand is negative, this does not invalidate the algorithm since (-45, 19) is not an
instance of the problem being considered.
Any real computing device has a limit on the size of the instances it can handle.
However, this limit cannot be attributed to the algorithm we choose to use. Once again
we see that there is an essential difference between programs and algorithms.
program and machine, usually by some form of regression. This approach allows pred-
ictions to be made about the time an actual implementation will take to solve an
instance much larger than those used in the tests. If such an extrapolation is made
solely on the basis of empirical tests, ignoring all theoretical considerations, it is likely
to be less precise, if not plain wrong.
It is natural to ask at this point what unit should be used to express the theoret-
ical efficiency of an algorithm. There can be no question of expressing this efficiency
in seconds, say, since we do not have a standard computer to which all measurements
might refer. An answer to this problem is given by the principle of invariance,
according to which two different implementations of the same algorithm will not differ
in efficiency by more than some multiplicative constant. More precisely, if two imple-
mentations take t1(n) and t2(n) seconds, respectively, to solve an instance of size n,
then there always exists a positive constant c such that t1(n) ≤ c t2(n) whenever n is
sufficiently large. This principle remains true whatever the computer used (provided it
is of a conventional design), regardless of the programming language employed and
regardless of the skill of the programmer (provided that he or she does not actually
modify the algorithm!). Thus, a change of machine may allow us to solve a problem
10 or 100 times faster, but only a change of algorithm will give us an improvement
that gets more and more marked as the size of the instances being solved increases.
Coming back to the question of the unit to be used to express the theoretical
efficiency of an algorithm, there will be no such unit : we shall only express this
efficiency to within a multiplicative constant. We say that an algorithm takes a time in
the order of t(n), for a given function t, if there exist a positive constant c and an
implementation of the algorithm capable of solving every instance of the problem in a
time bounded above by ct(n) seconds, where n is the size (or occasionally the value,
for numerical problems) of the instance considered. The use of seconds in this
definition is obviously quite arbitrary, since we only need change the constant to bound
the time by at(n) years or bt(n) microseconds. By the principle of invariance any
other implementation of the algorithm will have the same property, although the multi-
plicative constant may change from one implementation to another. In the next chapter
we give a more rigorous treatment of this important concept known as the asymptotic
notation. It will be clear from the formal definition why we say "in the order of"
rather than the more usual "of the order of ".
Certain orders occur so frequently that it is worth giving them a name. For
example, if an algorithm takes a time in the order of n, where n is the size of the
instance to be solved, we say that it takes linear time. In this case we also talk about a
linear algorithm. Similarly, an algorithm is quadratic, cubic, polynomial, or exponen-
tial if it takes a time in the order of n^2, n^3, n^k, or c^n, respectively, where k and c are
appropriate constants. Sections 1.6 and 1.7 illustrate the important differences between
these orders of magnitude.
The hidden multiplicative constant used in these definitions gives rise to a certain
danger of misinterpretation. Consider, for example, two algorithms whose im-
plementations on a given machine take respectively n^2 days and n^3 seconds to solve an instance of size n.
1.4 Average and Worst-Case Analysis
The time taken by an algorithm can vary considerably between two different instances
of the same size. To illustrate this, consider two elementary sorting algorithms : insertion and selection.
procedure insert (T[1 .. n])
  for i ← 2 to n do
    x ← T[i]; j ← i - 1
    while j > 0 and x < T[j] do
      T[j+1] ← T[j]
      j ← j - 1
    T[j+1] ← x
and
procedure select (T[1 .. n])
  for i ← 1 to n-1 do
    minj ← i; minx ← T[i]
    for j ← i+1 to n do
      if T[j] < minx then
        minj ← j
        minx ← T[j]
    T[minj] ← T[i]
    T[i] ← minx
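Readers who want to experiment can transcribe the two procedures directly; the following Python sketch keeps their structure, apart from the switch to 0-based indexing.

    def insert_sort(t):
        # insertion sorting: grow a sorted prefix one element at a time
        for i in range(1, len(t)):
            x = t[i]
            j = i - 1
            while j >= 0 and x < t[j]:
                t[j + 1] = t[j]
                j -= 1
            t[j + 1] = x

    def select_sort(t):
        # selection sorting: move the minimum of the remaining suffix to the front
        for i in range(len(t) - 1):
            minj, minx = i, t[i]
            for j in range(i + 1, len(t)):
                if t[j] < minx:
                    minj, minx = j, t[j]
            t[minj] = t[i]
            t[i] = minx

    data = [3, 1, 4, 1, 5, 9, 2, 6]
    insert_sort(data)        # data is now [1, 1, 2, 3, 4, 5, 6, 9]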
for these two algorithms : no array of n elements requires more work. Nonetheless,
the time required by the selection sorting algorithm is not very sensitive to the original
order of the array to be sorted : the test "if T [ j ] < minx " is executed exactly the
same number of times in every case. The variation in execution time is only due to the
number of times the assignments in the then part of this test are executed. To verify
this, we programmed this algorithm in Pascal on a DEC VAX 780. We found that the
time required to sort a given number of elements using selection sort does not vary by
more than 15% whatever the initial order of the elements to be sorted. As Example
2.2.1 will show, the time required by select (T) is quadratic, regardless of the initial
order of the elements.
The situation is quite different if we compare the times taken by the insertion
sort algorithm on the arrays U and V. On the one hand, insert (U) is very fast, because
the condition controlling the while loop is always false at the outset. The algorithm
therefore performs in linear time. On the other hand, insert (V) takes quadratic time,
because the while loop is executed i -1 times for each value of i (see Example 2.2.3).
The variation in time is therefore considerable, and moreover, it increases with the
number of elements to be sorted. An implementation in Pascal on the DEC VAX 780
shows that insert (U) takes less than one-fifth of a second if U is an array of 5,000 ele-
ments already in ascending order, whereas insert (V) takes three and a half minutes
when V is an array of 5,000 elements in descending order.
If such large variations can occur, how can we talk about the time taken by an
algorithm solely in terms of the size of the instance to be solved? We usually consider
the worst case of the algorithm, that is, for each size we only consider those instances
of that size on which the algorithm requires the most time. Thus we say that insertion
sorting takes quadratic time in the worst case.
Worst-case analysis is appropriate for an algorithm whose response time is crit-
ical. For example, if it is a question of controlling a nuclear power plant, it is crucial
to know an upper limit on the system's response time, regardless of the particular
instance to be solved. On the other hand, in a situation where an algorithm is to be
used many times on many different instances, it may be more important to know the
average execution time on instances of size n. We saw that the time taken by the
insertion sort algorithm varies between the order of n and the order of n 2. If we can
calculate the average time taken by the algorithm on the n! different ways of initially
ordering n elements (assuming they are all distinct), we shall have an idea of the likely
time taken to sort an array initially in random order. We shall see in Example 2.2.3
that this average time is also in the order of n^2. The insertion sorting algorithm thus
takes quadratic time both on the average and in the worst case, although in certain
cases it can be much faster. In Section 4.5 we shall see another sorting algorithm that
also takes quadratic time in the worst case, but that requires only a time in the order of
n log n on the average. Even though this algorithm has a bad worst case, it is among
the fastest algorithms known on the average.
It is usually harder to analyse the average behaviour of an algorithm than to
analyse its behaviour in the worst case. Also, such an analysis of average behaviour
can be misleading if in fact the instances to be solved are not chosen randomly when
the algorithm is used in practice. For example, it could happen that a sorting algorithm
might be used as an internal procedure in some more complex algorithm, and that for
some reason it might mostly be asked to sort arrays whose elements are already nearly
ordered. In this case, the hypothesis that each of the n! ways of initially ordering n
elements is equally likely fails. A useful analysis of the average behaviour of an algo-
rithm therefore requires some a priori knowledge of the distribution of the instances to
be solved, and this is normally an unrealistic requirement. In Chapter 8 we shall see
how this difficulty can be circumvented for certain algorithms, and their behaviour
made independent of the specific instances to be solved.
In what follows we shall only be concerned with worst-case analyses unless
stated otherwise.
increases with the length of the operands. In practice, however, it may be sensible to
consider them as elementary operations so long as the operands concerned are of a rea-
sonable size in the instances we expect to encounter. Two examples will illustrate
what we mean.
function Not-Gauss (n)
  { calculates the sum of the integers from 1 to n }
  sum ← 0
  for i ← 1 to n do sum ← sum + i
  return sum
and
may assume that additions, multiplications, and tests of divisibility by an integer (but
not calculations of factorials or exponentials) can be carried out in unit time, regardless
of the size of the operands involved.
A similar problem can arise when we analyse algorithms involving real numbers
if the required precision increases with the size of the instances to be solved. One typ-
ical example of this phenomenon is the use of De Moivre's formula to calculate values
in the Fibonacci sequence (see Section 1.7.5). In most practical situations, however,
the use of single precision floating point arithmetic proves satisfactory despite the inev-
itable loss of precision. When this is so, it is reasonable to count such arithmetic
operations at unit cost.
To sum up, even deciding whether an instruction as apparently innocent as
"j F i + j " can be considered as elementary or not calls for the use of judgement. In
what follows we count additions, subtractions, multiplications, divisions, modulo
operations, Boolean operations, comparisons, and assignments at unit cost unless expli-
citly stated otherwise.
As computing equipment gets faster and faster, it may seem hardly worthwhile to
spend our time trying to design more efficient algorithms. Would it not be easier
simply to wait for the next generation of computers? The remarks made in the
preceding sections show that this is not true. Suppose, to illustrate the argument, that
to solve a particular problem you have available an exponential algorithm and a com-
puter capable of running this algorithm on instances of size n in 10^-4 × 2^n seconds.
Your program can thus solve an instance of size 10 in one-tenth of a second. Solving
an instance of size 20 will take nearly two minutes. To solve an instance of size 30,
even a whole day's computing will not be sufficient. Supposing you were able to run
your computer without interruption for a year, you would only just be able to solve an
instance of size 38.
Since you need to solve bigger instances than this, you buy a new computer one
hundred times faster than the first. With the same algorithm you can now solve an
instance of size n in only 10^-6 × 2^n seconds. You may feel you have wasted your
money, however, when you figure out that now, when you run your new machine for a
whole year, you cannot even solve an example of size 45. In general, if you were pre-
viously able to solve an instance of size n in some given time, your new machine will
solve instances of size at best n + 7 in the same time.
Suppose you decide instead to invest in algorithmics. You find a cubic algorithm
that can solve your problem. Imagine, for example, that using the original machine
this new algorithm can solve an instance of size n in 10^-2 × n^3 seconds. In one day
you can now solve instances whose size is greater than 200; with one year's computa-
tion you can almost reach size 1,500. This is illustrated by Figure 1.6.1.
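The figures quoted in this section are easy to check. The short Python computation below (a sketch using the constants assumed in the text) recovers the largest instance sizes solvable in a day or a year for each combination of machine and algorithm.

    from math import log2

    DAY, YEAR = 86_400, 365 * 86_400

    def max_size_exponential(c, t):
        # largest n with c * 2**n <= t seconds
        return int(log2(t / c))

    def max_size_cubic(c, t):
        # largest n with c * n**3 <= t seconds
        return int((t / c) ** (1 / 3))

    print(max_size_exponential(1e-4, DAY))    # about 29: a day is not enough for size 30
    print(max_size_exponential(1e-4, YEAR))   # about 38
    print(max_size_exponential(1e-6, YEAR))   # about 44: only six or seven sizes gained
    print(max_size_cubic(1e-2, DAY))          # about 205
    print(max_size_cubic(1e-2, YEAR))         # about 1465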
Not only does the new algorithm offer a much greater improvement than the pur-
chase of new machinery, it will also, supposing you are able to afford both, make such
a purchase much more profitable. In fact, thanks to your new algorithm, a machine
one hundred times faster than the old one will allow you to solve instances four or five
times bigger in the same length of time. Nevertheless, the new algorithm should not
be used uncritically on all instances of the problem, in particular on the rather small
ones. On the original machine the new algorithm takes 10 seconds to solve an instance
of size 10, which is one hundred times slower than the old algorithm. The new algo-
rithm is faster only for instances of size 20 or greater. Naturally, it is possible to com-
bine the two algorithms into a third one that looks at the size of the instance to be
solved before deciding which method to use.
1.7 Some Practical Examples

Maybe you are wondering whether it is really possible in practice to accelerate an algo-
rithm to the extent suggested in the previous section. In fact, there have been cases
where even more spectacular improvements have been made, even for well-established
algorithms. Some of the following examples use large integers or real arithmetic.
Unless we explicitly state the contrary, we shall simplify our presentation by ignoring
the problems that may arise because of arithmetic overflow or loss of precision on a
particular machine. Such problems can always be solved by using multiple-precision
arithmetic (see Sections 1.7.2 and 4.7). Additions and multiplications are therefore
generally taken to be elementary operations in the following paragraphs (except, of
course, for Section 1.7.2).
1.7.1 Sorting
The sorting problem is of major importance in computer science, and in particular in
algorithmics. We are required to arrange in ascending order a collection of n objects
on which a total ordering is defined. Sorting problems are often found inside more
complex algorithms. We have already seen two classic sorting algorithms in Section
1.4: insertion sorting and selection sorting. Both these algorithms, as we saw, take
quadratic time both in the worst case and on the average.
Although these algorithms are excellent when n is small, other sorting algorithms
are more efficient when n is large. Among others, we might use Williams's heapsort
algorithm (see Example 2.2.4 and Problem 2.2.3), mergesort (see Section 4.4), or
Hoare's quicksort algorithm (see Section 4.5). All these algorithms take a time in the
order of n log n on the average ; the first two take this same amount of time even in
the worst case.
To have a clearer idea of the practical difference between a time in the order of
n 2 and a time in the order of n log n , we programmed insertion sort and quicksort in
Pascal on a DEC VAX 780. The difference in efficiency between the two algorithms is
marginal when the number of elements to be sorted is small. Quicksort is already
almost twice as fast as insertion when sorting 50 elements, and three times as fast
when sorting 100 elements. To sort 1,000 elements, insertion takes more than three
seconds, whereas quicksort requires less than one-fifth of a second. When we have
5,000 elements to sort, the inefficiency of insertion sorting becomes still more pro-
nounced : one and a half minutes are needed on average, compared to little more than
one second for quicksort. In 30 seconds, quicksort can handle 100,000 elements ; our
estimate is that it would take nine and a half hours to carry out the same task using
insertion sorting.
and the larger as the multiplicand. Thus, there is no reason for preferring it to the
classic algorithm, particularly as the hidden constant is likely to be larger.
Problem 1.7.1. How much time does multiplication a la russe take if the mul-
tiplier is longer than the multiplicand ?
the size of the smaller. If both operands are of size n, the algorithm thus takes a time
in the order of n^1.59, which is preferable to the quadratic time taken by both the classic
algorithm and multiplication a la russe.
The difference between the order of n^2 and the order of n^1.59 is less spectacular
than that between the order of n^2 and the order of n log n, which we saw in the case
of sorting algorithms. To verify this, we programmed the classic algorithm and the
algorithm of Section 4.7 in Pascal on a CDC CYBER 835 and tested them on operands of
different sizes. To take account of the architecture of the machine, we carried out the
calculations in base 2^20 rather than in base 10. Integers of 20 bits are thus multiplied
directly by the hardware of the machine, yet at the same time space is used quite
efficiently (the machine has 60-bit words). Accordingly, the size of an operand is
measured in terms of the number of 20-bit segments in its binary representation. The
theoretically better algorithm of Section 4.7 gives little real improvement on operands
of size 100 (equivalent to about 602 decimal digits) : it takes about 300 milliseconds,
whereas the classic algorithm takes about 400 milliseconds. For operands ten times
this length, however, the fast algorithm is some three times more efficient than the
classic algorithm : they take about 15 seconds and 40 seconds, respectively. The gain
in efficiency continues to increase as the size of the operands goes up. As we shall see
in Chapter 9, even more sophisticated algorithms exist for much larger operands.
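The algorithm of Section 4.7 rests on a divide-and-conquer idea usually credited to Karatsuba: splitting each operand into two halves costs three, rather than four, half-length multiplications, whence the exponent lg 3 ≈ 1.59. The following Python sketch of that idea is ours; it works in base 10 and ignores the base 2^20 representation used in the experiment above.

    def karatsuba(a, b):
        # divide-and-conquer multiplication of non-negative integers:
        # three recursive products instead of four give the n**1.59 behaviour
        if a < 10 or b < 10:
            return a * b
        half = max(len(str(a)), len(str(b))) // 2
        p = 10 ** half
        a1, a0 = divmod(a, p)
        b1, b0 = divmod(b, p)
        high = karatsuba(a1, b1)
        low = karatsuba(a0, b0)
        mid = karatsuba(a0 + a1, b0 + b1) - high - low
        return high * p * p + mid * p + low

    assert karatsuba(981, 1234) == 981 * 1234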
1.7.3 Evaluating Determinants

Let

    M = ( a_{1,1}  a_{1,2}  ...  a_{1,n} )
        ( a_{2,1}  a_{2,2}  ...  a_{2,n} )
        (   ...      ...    ...    ...   )
        ( a_{n,1}  a_{n,2}  ...  a_{n,n} )
be an n x n matrix. The determinant of the matrix M, denoted by det(M), is often
defined recursively : if M [i, j ] denotes the (n - 1) x (n - 1) submatrix obtained from M
by deleting the i th row and the j th column, then
    det(M) = Σ_{j=1}^{n} (-1)^{j+1} a_{1,j} det(M[1, j]) .

If n = 1, the determinant is defined by det(M) = a_{1,1}. Determinants are important in
linear algebra, and we need to know how to calculate them efficiently.
If we use the recursive definition directly, we obtain an algorithm that takes a
time in the order of n! to calculate the determinant of an n x n matrix (see Example
2.2.5). This is even worse than exponential. On the other hand, another classic algo-
rithm, Gauss-Jordan elimination, does the computation in cubic time. We programmed
the two algorithms in Pascal on a CDC CYBER 835. The Gauss-Jordan algorithm finds
the determinant of a 10 x 10 matrix in one-hundredth of a second ; it takes about five
and a half seconds on a 100 x 100 matrix. On the other hand, the recursive algorithm
takes more than 20 seconds on a 5 x 5 matrix and 10 minutes on a 10 x 10 matrix ; we
estimate that it would take more than 10 million years to calculate the determinant of a
20 x 20 matrix, a task accomplished by the Gauss-Jordan algorithm in about one-
twentieth of a second !
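To see where the cubic time comes from, here is a small Python sketch (ours, not the book's program) of determinant computation by Gaussian elimination with partial pivoting: the matrix is reduced to triangular form and the diagonal entries are multiplied, keeping track of row exchanges.

    def determinant(m):
        # Gaussian elimination with partial pivoting; O(n**3) arithmetic operations
        a = [row[:] for row in m]        # work on a copy
        n = len(a)
        det = 1.0
        for k in range(n):
            pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
            if a[pivot][k] == 0:
                return 0.0               # singular matrix
            if pivot != k:
                a[k], a[pivot] = a[pivot], a[k]
                det = -det               # a row exchange changes the sign
            det *= a[k][k]
            for i in range(k + 1, n):
                factor = a[i][k] / a[k][k]
                for j in range(k, n):
                    a[i][j] -= factor * a[k][j]
        return det

    print(determinant([[1, 2], [3, 4]]))   # -2.0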
You should not conclude from this example that recursive algorithms are neces-
sarily bad. On the contrary, Chapter 4 describes a technique where recursion plays a
fundamental role in the design of efficient algorithms. In particular, Strassen
discovered in 1969 a recursive algorithm that can calculate the determinant of an n x n
matrix in a time in the order of n^(lg 7), or about n^2.81, thus proving that Gauss-Jordan
elimination is not optimal.
1.7.4 Calculating the Greatest Common Divisor

Let m and n be two positive integers. The greatest common divisor of m and n,
denoted by gcd(m, n), is the largest integer that divides both m and n exactly. When
gcd(m, n) =1, we say that m and n are coprime. For example, gcd(6,15) = 3 and
gcd(10, 21) = 1. The obvious algorithm for calculating gcd(m , n) is obtained directly
from the definition.
function gcd (m, n)
  i ← min(m, n) + 1
  repeat i ← i - 1 until i divides both m and n exactly
  return i
The time taken by this algorithm is in the order of the difference between the
smaller of the two arguments and their greatest common divisor. When m and n are of
similar size and coprime, it therefore takes a time in the order of n.
A classic algorithm for calculating gcd(m, n) consists of first factorizing m and
n, and then taking the product of the prime factors common to m and n, each prime
factor being raised to the lower of its powers in the two arguments. For example, to
calculate gcd(120, 700) we first factorize 120 = 2^3 × 3 × 5 and 700 = 2^2 × 5^2 × 7. The
common factors of 120 and 700 are therefore 2 and 5, and their lower powers are 2 and
1, respectively. The greatest common divisor of 120 and 700 is therefore 2^2 × 5^1 = 20.
Even though this algorithm is better than the one given previously, it requires us to
factorize m and n, an operation we do not know how to do efficiently.
Nevertheless, there exists a much more efficient algorithm for calculating greatest
common divisors. This is Euclid's famous algorithm.
function Euclid (m, n)
  while m > 0 do
    t ← n mod m
    n ← m
    m ← t
  return n
Considering the arithmetic operations to have unit cost, this algorithm takes a time in
the order of the logarithm of its arguments, even in the worst case (see Example 2.2.6),
which is much faster than the preceding algorithms. To be historically exact, Euclid's
original algorithm works using successive subtractions rather than by calculating a
modulo.
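Transcribed into Python (a sketch, not the book's program), Euclid's algorithm and the historical subtraction-only variant read as follows.

    def euclid(m, n):
        # Euclid's algorithm with the modulo operation
        while m > 0:
            m, n = n % m, m
        return n

    def euclid_by_subtraction(m, n):
        # the historical variant: repeated subtraction instead of mod
        while m != n:
            if m > n:
                m -= n
            else:
                n -= m
        return m

    print(euclid(120, 700), euclid_by_subtraction(120, 700))   # 20 20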
1.7.5 Calculating the Fibonacci Sequence

The Fibonacci sequence is defined by the recurrence f_0 = 0, f_1 = 1, and f_n = f_{n-1} + f_{n-2} for n ≥ 2. De Moivre obtained the formula

    f_n = (1/√5) [φ^n - (-φ)^(-n)] ,

where φ = (1 + √5)/2 is the golden ratio. Since φ^(-1) < 1, the term (-φ)^(-n) can be
neglected when n is large, which means that the value of f_n is in the order of φ^n.
However, De Moivre's formula is of little immediate help in calculating f_n exactly,
since the larger n becomes, the greater is the degree of precision required in the values
of √5 and φ. On the CDC CYBER 835, a single-precision computation programmed in
Pascal produces an error for the first time when calculating f_66.
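The experiment is easy to repeat. The following Python sketch (ours) compares the rounded formula, with the negligible term dropped as above, against an exact iterative computation and reports the first n at which they disagree; in double precision this happens somewhat later than the f_66 observed in single precision.

    from math import sqrt

    def fib_exact(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    PHI = (1 + sqrt(5)) / 2

    def fib_de_moivre(n):
        # De Moivre's formula with the (-phi)**(-n) term neglected
        return round(PHI ** n / sqrt(5))

    n = 0
    while fib_de_moivre(n) == fib_exact(n):
        n += 1
    print("first error at n =", n)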
The algorithm obtained directly from the definition of the Fibonacci sequence is
the following.
function fib1(n)
  if n < 2 then return n
  else return fib1(n-1) + fib1(n-2)
This algorithm is very inefficient because it recalculates the same values many times.
For instance, to calculate fib1(5) we need the values of fib1(4) and fib1(3); but
fib1(4) also calls for the calculation of fib1(3). We see that fib1(3) will be calculated
twice, fib1(2) three times, fib1(1) five times, and fib1(0) three times. In fact, the time
required to calculate f_n using this algorithm is in the order of the value of f_n itself,
that is to say, in the order of φ^n (see Example 2.2.7).
To avoid wastefully recalculating the same values over and over, it is natural to
proceed as in Section 1.5.
function fib2(n)
  i ← 1; j ← 0
  for k ← 1 to n do
    j ← i + j
    i ← j - i
  return j
This second algorithm takes a time in the order of n, assuming we count each addition
as an elementary operation (see Example 2.2.8). This is much better than the first
algorithm. However, there exists a third algorithm that gives as great an improvement
over the second algorithm as the second does over the first. This third algorithm,
which at first sight appears quite mysterious, takes a time in the order of the logarithm
of n (see Example 2.2.9). It will be explained in Chapter 4.
function fib3(n)
  i ← 1; j ← 0; k ← 0; h ← 1
  while n > 0 do
    if n is odd then
      t ← jh
      j ← ih + jk + t
      i ← ik + t
    t ← h^2
    h ← 2kh + t
    k ← k^2 + t
    n ← n div 2
  return j
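Here is a Python transcription of the three functions (a sketch; Python's unbounded integers sidestep the overflow question raised below), convenient for checking that they compute the same values.

    def fib1(n):
        return n if n < 2 else fib1(n - 1) + fib1(n - 2)

    def fib2(n):
        i, j = 1, 0
        for _ in range(n):
            j = i + j
            i = j - i
        return j

    def fib3(n):
        i, j, k, h = 1, 0, 0, 1
        while n > 0:
            if n % 2 == 1:
                t = j * h
                j = i * h + j * k + t
                i = i * k + t
            t = h * h
            h = 2 * k * h + t
            k = k * k + t
            n //= 2
        return j

    assert all(fib1(n) == fib2(n) == fib3(n) for n in range(20))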
Once again, we programmed the three algorithms in Pascal on a CDC CYBER 835
in order to compare their execution times empirically. To avoid problems caused by
arithmetic overflow (the Fibonacci sequence grows very rapidly : f_100 is a number with
21 decimal digits), we carried out all the computations modulo 10^7, which is to say
that we only obtained the seven least significant figures of the answer. Table 1.7.1 elo-
quently illustrates the difference that the choice of an algorithm can make. (All these
times are approximate. Times greater than two minutes were estimated using the hybrid
approach.) The time required by fib1 for n > 50 is so long that we did not bother to
estimate it, with the exception of the case n = 100 on which fib1 would take well over
10^9 years ! Note that fib2 is more efficient than fib3 on small instances.
[Table 1.7.1: observed times of fib1, fib2, and fib3 for n = 10, 20, 30, 50.]
Using the hybrid approach, we can estimate approximately the time taken by our
implementations of these three algorithms. Writing ti(n) for the time taken by fibi on
the instance n, we find
t1(n) ≈ φ^(n-20) seconds.
1.7.6 Fourier Transforms

The Fast Fourier Transform algorithm is perhaps the one algorithmic discovery that
had the greatest practical impact in history. We shall come back to this subject in
Chapter 9. For the moment let us only mention that Fourier transforms are of funda-
mental importance in such disparate applications as optics, acoustics, quantum physics,
telecommunications, systems theory, and signal processing including speech recogni-
tion. For years progress in these areas was limited by the fact that the known algo-
rithms for calculating Fourier transforms all took far too long.
The "discovery" by Cooley and Tukey in 1965 of a fast algorithm revolutionized
the situation : problems previously considered to be infeasible could now at last be
tackled. In one early test of the "new" algorithm the Fourier transform was used to
analyse data from an earthquake that had taken place in Alaska in 1964. Although the
classic algorithm took more than 26 minutes of computation, the "new" algorithm was
able to perform the same task in less than two and a half seconds.
Ironically it turned out that an efficient algorithm had already been published in
1942 by Danielson and Lanczos. Thus the development of numerous applications had
been hindered for no good reason for almost a quarter of a century. And if that were
not sufficient, all the necessary theoretical groundwork for Danielson and Lanczos's
algorithm had already been published by Runge and Konig in 1924!
At the beginning of this book we said that "the execution of an algorithm must not
include any subjective decisions, nor must it require the use of intuition or creativity".
In this case, can we reasonably maintain that fib3 of Section 1.7.5 describes an algo-
rithm ? The problem arises because it is not realistic to consider that the multiplica-
tions in fib3 are elementary operations. Any practical implementation must take this
into account, probably by using a program package allowing arithmetic operations on
very large integers. Since the exact way in which these multiplications are to be car-
ried out is not specified in fib3, the choice may be considered a subjective decision,
and hence fib3 is not formally speaking an algorithm. That this distinction is not
merely academic is illustrated by Problems 2.2.11 and 4.7.6, which show that indeed
the order of time taken by fib3 depends on the multiplication algorithm used. And
what should we say about De Moivre's formula used as an algorithm ?
Calculation of a determinant by the recursive method of Section 1.7.3 is another
example of an incompletely presented algorithm. How are the recursive calls to be set
up? The obvious approach requires a time in the order of n^2 to be used before each
recursive call. We shall see in Problem 2.2.5 that it is possible to get by with a time in
the order of n to set up not just one, but all the n recursive calls. However, this added
subtlety does not alter the fact that the algorithm takes a time in the order of n! to cal-
culate the determinant of an n x n matrix.
To make life simple, we shall continue to use the word algorithm for certain
incomplete descriptions of this kind. The details will be filled in later should our ana-
lyses require them.
1.9 Data Structures

The use of well-chosen data structures is often a crucial factor in the design of efficient
algorithms. Nevertheless, this book is not intended to be a manual on data structures.
We suppose that the reader already has a good working knowledge of such basic
notions as arrays, structures, pointers, and lists. We also suppose that he or she has
already come across the mathematical concepts of directed and undirected graphs, and
knows how to represent these objects efficiently on a computer. After a brief review of
some important points, this section concentrates on the less elementary notions of
heaps and disjoint sets. Chosen because they will be used in subsequent chapters,
these two structures also offer interesting examples of the analysis of algorithms (see
Example 2.2.4, Problem 2.2.3, and Example 2.2.10).
1.9.1 Lists
the elements of a list occupy the slots value [ 1 ] to value [counter ], and the order of the
elements is given by the order of their indices in the array. Using this implementation,
we can find the first and the last elements of the list rapidly, as we can the predecessor
and the successor of a given node. On the other hand, inserting a new element or
deleting one of the existing elements requires a worst-case number of operations in the
order of the current size of the list.
This implementation is particularly efficient for the important structure known as
the stack, which we obtain by restricting the permitted operations on a list : addition
and deletion of elements are allowed only at one particular end of the list. However, it
presents the major disadvantage of requiring that all the memory space potentially
required be reserved from the outset of a program.
On the other hand, if pointers are used to implement a list structure, the nodes
are usually represented by some such structure as
type node = record
      value : information
      next : ↑ node ,
where each node includes an explicit pointer to its successor. In this case, provided a
suitably powerful programming language is used, the space needed to represent the list
can be allocated and recovered dynamically as the program proceeds.
Even if additional pointers are used to ensure rapid access to the first and last
elements of the list, it is difficult when this representation is used to examine the k th
element, for arbitrary k, without having to follow k pointers and thus to take a time in
the order of k. However, once an element has been found, inserting new nodes or
deleting an existing node can be done rapidly. In our example, a single pointer is used
in each node to designate its successor : it is therefore easy to traverse the list in one
direction, but not in the other. If a higher memory overhead is acceptable, it suffices to
add a second pointer to each node to allow the list to be traversed rapidly in either
direction.
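In a language without explicit pointer types the same structure is built from object references; here is a minimal Python sketch of such a node and of a traversal in the forward direction.

    class Node:
        def __init__(self, value, next=None):
            self.value = value        # the information carried by the node
            self.next = next          # reference to the successor, or None

    # build the list 1 -> 2 -> 3 and traverse it
    head = Node(1, Node(2, Node(3)))
    node = head
    while node is not None:
        print(node.value)
        node = node.next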
1.9.2 Graphs
Intuitively speaking, a graph is a set of nodes joined by a set of lines or arrows. Con-
sider Figure 1.9.2 for instance. We distinguish directed and undirected graphs. In the
case of a directed graph the nodes are joined by arrows called edges. In the example
of Figure 1.9.2 there exists an edge from alpha to gamma and another from gamma to
alpha; beta and delta, however, are joined only in the direction indicated. In the case
of an undirected graph, the nodes are joined by lines with no direction indicated, also
called edges. In every case, the edges may form paths and cycles.
There are never more than two arrows joining any two given nodes of a directed
graph (and if there are two arrows, then they must go in opposite directions), and there
is never more than one line joining any two given nodes of an undirected graph. For-
mally speaking, a graph is therefore a pair G = < N, A > where N is a set of nodes
and A ⊆ N × N is a set of edges. An edge from node a to node b of a directed graph
is denoted by the ordered pair (a, b), whereas an edge joining nodes a and b in an
undirected graph is denoted by the set { a, b } .
Figure 1.9.2. A directed graph.
There are at least two obvious ways to represent a graph on a computer. The
first is illustrated by
If there exists an edge from node i of the graph to node j, then adjacent [i , j ] = true ;
otherwise adjacent [i, j ] = false. In the case of an undirected graph, the matrix is
necessarily symmetric.
With this representation it is easy to see whether or not two nodes are connected.
On the other hand, should we wish to examine all the nodes connected to some given
node, we have to scan a complete row in the matrix. This takes a time in the order of
nbnodes, the number of nodes in the graph, independently of the number of edges that
exist involving this particular node. The memory space required is quadratic in the
number of nodes.
A second possible representation is as follows :
Here we attach to each node i a list of its neighbours, that is to say of those nodes j
such that an edge from i to j (in the case of a directed graph) or between i and j
(in the case of an undirected graph) exists. If the number of edges in the graph is
small, this representation is preferable from the point of view of the memory space
used. It may also be possible in this case to examine all the neighbours of a given
node in less than nbnodes operations on the average. On the other hand, to determine
whether or not two given nodes i and j are connected directly, we have to scan the list
of neighbours of node i (and possibly of node j, too), which is less efficient than
looking up a Boolean value in an array.
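To make the comparison concrete, here is a small Python sketch (ours) of both representations for a directed graph like that of Figure 1.9.2; since the figure is only partially described in the text, the direction of the edge between beta and delta is chosen arbitrarily.

    nodes = ["alpha", "beta", "gamma", "delta"]
    edges = [("alpha", "gamma"), ("gamma", "alpha"),
             ("beta", "delta")]          # direction of this edge chosen arbitrarily

    # adjacency matrix: space quadratic in the number of nodes
    index = {name: i for i, name in enumerate(nodes)}
    adjacent = [[False] * len(nodes) for _ in nodes]
    for a, b in edges:
        adjacent[index[a]][index[b]] = True

    # adjacency lists: one list of neighbours per node
    neighbours = {name: [] for name in nodes}
    for a, b in edges:
        neighbours[a].append(b)

    print(adjacent[index["alpha"]][index["gamma"]])   # True: constant-time query
    print(neighbours["gamma"])                        # ['alpha']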
A tree is an acyclic, connected, undirected graph. Equivalently, a tree may be
defined as an undirected graph in which there exists exactly one path between any
given pair of nodes. The same representations used to implement graphs can be used
to implement trees.
1.9.3 Rooted Trees
Let G be a directed graph. If there exists in G a vertex r such that every other vertex
can be reached from r by a unique path, then G is a rooted tree and r is its root. Any
rooted tree with n nodes contains exactly n -1 edges. It is usual to represent a rooted
tree with the root at the top, like a family tree, as in Figure 1.9.3. In this example
alpha is at the root of the tree. (When there is no danger of confusion, we shall use
the simple term "tree" instead of the more correct "rooted tree".) Extending the
analogy with a family tree, we say that beta is the parent of delta and the child of
alpha, that epsilon and zeta are the siblings of delta, that alpha is an ancestor of
epsilon, and so on.
A leaf of a rooted tree is a node with no children; the other nodes are called
internal nodes. Although nothing in the definition indicates this, the branches of a
rooted tree are often considered to be ordered : in the previous example beta is
situated to the left of gamma, and (by analogy with a family tree once again) delta is
the eldest sibling of epsilon and zeta. The two trees in Figure 1.9.4 may therefore be
considered as different.
On a computer, any rooted tree may be represented using nodes of the following
type :
type treenode = record
      value : information
      eldest-child, next-sibling : ↑ treenode
The rooted tree shown in Figure 1.9.3 would be represented as in Figure 1.9.5, where
now the arrows no longer represent the edges of the rooted tree, but rather the pointers
used in the computer representation. As in the case of lists, the use of additional
pointers (for example, to the parent or the eldest sibling of a given node) may speed up
certain operations at the price of an increase in the memory space needed.
Figure 1.9.5. Possible computer representation of a rooted tree.
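The same eldest-child and next-sibling links are easily expressed with object references. The following Python sketch (ours) rebuilds the tree of Figure 1.9.3, alpha at the root with children beta and gamma, and beta with children delta, epsilon, and zeta, and prints it with one level of indentation per generation.

    class TreeNode:
        def __init__(self, value):
            self.value = value
            self.eldest_child = None
            self.next_sibling = None

    def add_children(parent, children):
        # link the children through eldest_child / next_sibling references
        previous = None
        for child in children:
            if previous is None:
                parent.eldest_child = child
            else:
                previous.next_sibling = child
            previous = child

    alpha, beta, gamma = TreeNode("alpha"), TreeNode("beta"), TreeNode("gamma")
    delta, epsilon, zeta = TreeNode("delta"), TreeNode("epsilon"), TreeNode("zeta")
    add_children(alpha, [beta, gamma])
    add_children(beta, [delta, epsilon, zeta])

    def show(node, depth=0):
        while node is not None:
            print("  " * depth + node.value)
            show(node.eldest_child, depth + 1)
            node = node.next_sibling

    show(alpha)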
The depth of a node in a rooted tree is the number of edges that need to be
traversed to arrive at the node starting from the root. The height of a node is the
number of edges in the longest path from the node in question to a leaf. The height of
a rooted tree is the height of its root, and thus also the depth of its deepest leaf.
Finally, the level of a node is equal to the height of the tree minus the depth of the
node concerned. For example, gamma has depth 1, height 0, and level 1 in the tree of
Figure 1.9.3.
If each node of a rooted tree can have up to n children, we say it is an n-ary tree.
In this case, the positions occupied by the children are significant. For instance, the
binary trees of Figure 1.9.6 are not the same : in the first case b is the elder child of a
and the younger child is missing, whereas in the second case b is the younger child of
a and the elder child is missing. In the important case of a binary tree, although the
metaphor becomes somewhat strained, we naturally tend to talk about the left-hand
child and the right-hand child.
There are several ways of representing an n-ary tree on a computer. One obvious
representation uses nodes of the type
type n-ary-node = record
      value : information
      child[1 .. n] : ↑ n-ary-node
It is possible to update a search tree, that is, to delete nodes or to add new values,
without destroying the search tree property. However, if this is done in an uncon-
sidered fashion, it can happen that the resulting tree becomes badly unbalanced, in the
sense that the height of the tree is in the order of the number of nodes it contains.
More sophisticated methods, such as the use of AVL trees or 2-3 trees, allow such
operations as searches and the addition or deletion of nodes in a time in the order of
the logarithm of the number of nodes in the tree in the worst case. These structures
also allow the efficient implementation of several additional operations. Since these
concepts are not used in the rest of this book, here we only mention their existence.
1.9.4 Heaps
A heap is a special kind of rooted tree that can be implemented efficiently in an array
without any explicit pointers. This interesting structure lends itself to numerous appli-
cations, including a remarkable sorting technique, called heapsort (see Problem 2.2.3),
as well as the efficient implementation of certain dynamic priority lists.
A binary tree is essentially complete if each of its internal nodes possesses
exactly two children, one on the left and one on the right, with the possible exception
of a unique special node situated on level 1, which possesses only a left-hand child and
no right-hand child. Moreover, all the leaves are either on level 0, or else they are on
levels 0 and 1, and no leaf is found on level 1 to the left of an internal node at the
same level. The unique special node, if it exists, is to the right of all the other level 1
internal nodes. This kind of tree can be represented using an array T by putting the
nodes of depth k, from left to right, in the positions T[2^k], T[2^k + 1], ..., T[2^(k+1) - 1]
(with the possible exception of level 0, which may be incomplete). For instance,
Figure 1.9.7 shows how to represent an essentially complete binary tree containing 10
nodes. The parent of the node represented in T [i] is found in T [i div 2] for i > 1, and
the children of the node represented in T[i] are found in T[2i] and T [2i + 1 ], whenever
they exist. The subtree whose root is in T [i] is also easy to identify.
A heap is an essentially complete binary tree, each of whose nodes includes an
element of information called the value of the node. The heap property is that the
value of each internal node is greater than or equal to the values of its children. Figure
1.9.8 gives an example of a heap. This same heap can be represented by the following
array :
10 7 9 4 7 5 2 2 1 6
The fundamental characteristic of this data structure is that the heap property can
be restored efficiently after modification of the value of a node. If the value of the
node increases, so that it becomes greater than the value of its parent, it suffices to
exchange these two values and then to continue the same process upwards in the tree
until the heap property is restored. We say that the modified value has been percolated
up to its new position (one often encounters the rather strange term sift-up for this pro-
cess). If, on the contrary, the value of a node is decreased so that it becomes less than
the value of at least one of its children, it suffices to exchange the modified value with
the larger of the values in the children, and then to continue this process downwards in
the tree until the heap property is restored. We say that the modified value has been
sifted down to its new position. The following procedures describe more formally the
basic heap manipulation process. For the purpose of clarity, they are written so as to
reflect as closely as possible the preceding discussion. If the reader wishes to make
use of heaps for a "real" application, we encourage him or her to figure out how to
avoid the inefficiency resulting from our use of the "exchange" instruction.
procedure alter-heap (T[1 .. n], i, v)
    { T[1 .. n] is a heap ; the value of T[i] is set to v and the
      heap property is re-established ; we suppose that 1 ≤ i ≤ n }
    x ← T[i]
    T[i] ← v
    if v < x then sift-down (T, i)
             else percolate (T, i)

procedure sift-down (T[1 .. n], i)
    { this procedure sifts node i down so as to re-establish the heap
      property in T[1 .. n] ; we suppose that T would be a heap if T[i]
      were sufficiently large ; we also suppose that 1 ≤ i ≤ n }
    k ← i
    repeat
        j ← k
        { find the larger child of node j }
        if 2j ≤ n and T[2j] > T[k] then k ← 2j
        if 2j < n and T[2j+1] > T[k] then k ← 2j+1
        exchange T[j] and T[k]
        { if j = k, then the node has arrived at its final position }
    until j = k

procedure percolate (T[1 .. n], i)
    { this procedure percolates node i so as to re-establish the
      heap property in T[1 .. n] ; we suppose that T would be a heap
      if T[i] were sufficiently small ; we also suppose that
      1 ≤ i ≤ n ; the parameter n is not used here }
    k ← i
    repeat
        j ← k
        if j > 1 and T[j div 2] < T[k] then k ← j div 2
        exchange T[j] and T[k]
    until j = k
The heap is an ideal data structure for finding the largest element of a set, removing it,
adding a new node, or modifying a node. These are exactly the operations we need to
implement dynamic priority lists efficiently : the value of a node gives the priority of
the corresponding event. The event with highest priority is always found at the root of
the heap, and the priority of an event can be changed dynamically at all times. This is
particularly useful in computer simulations.
function find-max (T[1 .. n])
    { returns the largest element of the heap T[1 .. n] }
    return T[1]

procedure delete-max (T[1 .. n])
    { removes the largest element of the heap T[1 .. n]
      and restores the heap property in T[1 .. n−1] }
    T[1] ← T[n]
    sift-down (T[1 .. n−1], 1)

procedure insert-node (T[1 .. n], v)
    { adds an element whose value is v to the heap T[1 .. n]
      and restores the heap property in T[1 .. n+1] }
    T[n+1] ← v
    percolate (T[1 .. n+1], n+1)
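A direct Python transcription of these procedures may be helpful. It keeps the 1-based indexing by leaving T[0] unused and retains the deliberately exchange-based formulation of the text; this is a minimal sketch, not an optimized implementation.

def sift_down(T, n, i):
    # re-establish the heap property in T[1..n], assuming T would be a heap
    # if T[i] were sufficiently large (T[0] is unused padding)
    k = i
    while True:
        j = k
        if 2 * j <= n and T[2 * j] > T[k]:
            k = 2 * j
        if 2 * j < n and T[2 * j + 1] > T[k]:
            k = 2 * j + 1
        T[j], T[k] = T[k], T[j]
        if j == k:                      # the node has arrived at its final position
            return

def percolate(T, i):
    # re-establish the heap property, assuming T would be a heap if T[i] were small enough
    k = i
    while True:
        j = k
        if j > 1 and T[j // 2] < T[k]:
            k = j // 2
        T[j], T[k] = T[k], T[j]
        if j == k:
            return

def find_max(T, n):
    return T[1]

def delete_max(T, n):
    # removes the largest element and restores the heap property in T[1..n-1]
    T[1] = T[n]
    sift_down(T, n - 1, 1)

def insert_node(T, n, v):
    # adds v to the heap T[1..n]; the list must have room at index n+1
    T[n + 1] = v
    percolate(T, n + 1)

# the heap of Figure 1.9.8 (index 0 is padding)
T = [None, 10, 7, 9, 4, 7, 5, 2, 2, 1, 6]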
Consider, for example, the array

1 6 9 2 7 5 2 7 4 10

represented by the tree of Figure 1.9.9a. We begin by making each of the subtrees
whose roots are at level 1 into a heap, by sifting down those roots, as illustrated in
Figure 1.9.9b. The subtrees at the next higher level are then transformed into heaps,
also by sifting down their roots. Figure 1.9.9c shows the process for the left-hand sub-
tree. The other subtree at level 2 is already a heap. This results in an essentially com-
plete binary tree corresponding to the array :
1 10 9 7 7 5 2 2 4 6
It only remains to sift down its root in order to obtain the desired heap. The final pro-
cess thus goes as follows :
10 1 9 7 7 5 2 2 4 6
10 7 9 1 7 5 2 2 4 6
10 7 9 4 7 5 2 2 1 6
Figure 1.9.9(c). One level 2 subtree is made into a heap (the other already is a heap).
Problem 1.9.2. Let T[1 .. 12] be an array such that T[i] = i for each i ≤ 12.
Exhibit the state of the array after each of the following procedure calls:
make-heap (T)
alter-heap (T, 12, 10)
alter-heap (T, 1, 6)
alter-heap (T, 5, 8).
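The make-heap procedure itself falls outside the excerpt reproduced above, but the construction described earlier (make the lowest subtrees into heaps by sifting down their roots, then work upwards, finishing with the root) can be sketched in Python as follows; sift_down is the same routine as in the previous sketch, and running through the internal nodes from the last one back to the root is one common way of treating the levels from the bottom up.

def sift_down(T, n, i):
    # re-establish the heap property below node i in T[1..n] (T[0] unused)
    k = i
    while True:
        j = k
        if 2 * j <= n and T[2 * j] > T[k]:
            k = 2 * j
        if 2 * j < n and T[2 * j + 1] > T[k]:
            k = 2 * j + 1
        T[j], T[k] = T[k], T[j]
        if j == k:
            return

def make_heap(T, n):
    # sift down every internal node, deepest subtrees first, the root last
    for i in range(n // 2, 0, -1):
        sift_down(T, n, i)

T = [None, 1, 6, 9, 2, 7, 5, 2, 7, 4, 10]   # the example array above
make_heap(T, 10)
print(T[1:])   # [10, 7, 9, 4, 7, 5, 2, 2, 1, 6], the heap of Figure 1.9.8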
we choose a canonical object, which will serve as a label for the set. Initially, the N
objects are in N different sets, each containing exactly one object, which is necessarily
the label for its set. Thereafter, we execute a series of operations of two kinds :
for a given object, find which set contains it and return the label of this set ; and
given two distinct labels, merge the two corresponding sets.
1 2 3 2 1 3 4 3 4
therefore represents the trees given in Figure 1.9.10, which in turn represent the sets
{1, 5}, {2, 4, 7, 10} and {3, 6, 8, 9}. To merge two sets, we need now only change a
single value in the array ; on the other hand, it is harder to find the set to which an
object belongs.
function find2 (x)
    { finds the label of the set containing object x }
    i ← x
    while set[i] ≠ i do i ← set[i]
    return i

procedure merge2 (a, b)
    { merges the sets labelled a and b }
    if a < b then set[b] ← a
             else set[a] ← b
Problem 1.9.6. Prove that the time needed to execute an arbitrary sequence of
n operations find2 and merge3 starting from the initial situation is in the order of
n log n in the worst case.
By modifying find2, we can make our operations faster still. When we are
trying to determine the set that contains a certain object x, we first traverse the edges
of the tree leading up from x to the root. Once we know the root, we can now traverse
the same edges again, modifying each node encountered on the way to set its pointer
directly to the root. This technique is called path compression. For example, when we
execute the operation find(20) on the tree of Figure 1.9.11a, the result is the tree of
Figure 1.9.11b: nodes 20, 10, and 9, which lay on the path from node 20 to the root,
now point directly to the root. The pointers of the remaining nodes have not changed.
This technique obviously tends to diminish the height of a tree and thus to accelerate
subsequent find operations. On the other hand, the new find operation takes about
twice as long as before. Is path compression a good idea? The answer is given when
we analyse it in Example 2.2.10.
Using path compression, it is no longer true that the height of a tree whose root
is a is given by height [a]. However, this remains an upper bound on the height. We
call this value the rank of the tree, and change the name of the array accordingly. Our
function becomes
function find3 (x)
    { finds the label of the set containing object x }
    r ← x
    while set[r] ≠ r do r ← set[r]
    { r is the root of the tree }
    i ← x
    while i ≠ r do
        j ← set[i]
        set[i] ← r
        i ← j
    return r
From now on, when we use this combination of an array and of the procedures find3
and merge3 to deal with disjoint sets of objects, we say we are using a disjoint set
structure.
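A compact Python rendering of the disjoint set structure may help. find3 follows the function above; merge3 is not reproduced in this excerpt, so the version below applies the usual union-by-rank rule, which matches the role described for the rank array but is only a plausible reconstruction of the book's procedure.

N = 10
set_ = list(range(N + 1))   # set_[i] is the parent of i; a root satisfies set_[i] == i
rank = [0] * (N + 1)        # rank[i] is an upper bound on the height of the tree rooted at i

def find3(x):
    # find the label of the set containing object x, with path compression
    r = x
    while set_[r] != r:         # first traversal: locate the root
        r = set_[r]
    while x != r:               # second traversal: hook every node met directly onto the root
        set_[x], x = r, set_[x]
    return r

def merge3(a, b):
    # merge the sets whose labels (roots) are a and b, by rank (assumed rule)
    if rank[a] == rank[b]:
        rank[a] += 1
        set_[b] = a
    elif rank[a] > rank[b]:
        set_[b] = a
    else:
        set_[a] = b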
Problem 1.9.7. A second possible tactic for merging two sets is to ensure that
the root of the tree containing the smaller number of nodes becomes the child of the
other root. Path compression does not change the number of nodes in a tree, so that it
is easy to store this value exactly (whereas we could not efficiently keep track of the
exact height of a tree after path compression).
Write a procedure merge4 to implement this tactic, and give a result cor-
responding to the one in Problem 1.9.5.
** Problem 1.9.8. Analyse the combined efficiency of find3 together with your
merge4 from the previous problem.
Problem 1.9.9. A canonical object has no parent, and we make no use of the
rank of any object that is not canonical. Use this remark to implement a disjoint set
structure that uses only one length N array rather than the two set and rank. (Hint :
use negative values for the ranks.)
1.10 References and Further Reading
Williams (1964). The improvements suggested at the end of the sub-section on heaps
are described in Johnson (1975), Fredman and Tarjan (1984), Gonnet and Munro
(1986), and Carlsson (1986, 1987). Carlsson (1986) also describes a data structure,
which he calls the double-ended heap, or deap, that allows finding efficiently the
largest and the smallest elements of a set. For ideas on building heaps faster, consult
McDiarmid and Reed (1987). In this book, we give only some of the possible uses of
disjoint set structures ; for more applications see Hopcroft and Karp (1971) and Aho,
Hopcroft, and Ullman (1974, 1976).
2 Analysing the Efficiency of Algorithms
Let ℕ and ℝ represent the set of natural numbers (positive or zero) and the set of real
numbers, respectively. We denote the set of strictly positive natural numbers by ℕ⁺,
the set of strictly positive real numbers by ℝ⁺, and the set of nonnegative real numbers
by ℝ* (the latter being a nonstandard notation). The set { true, false } of Boolean
constants is denoted by 𝔹.
Let f : ℕ → ℝ* be an arbitrary function. We define

O(f(n)) = { t : ℕ → ℝ* | (∃c ∈ ℝ⁺) (∃n₀ ∈ ℕ) (∀n ≥ n₀) [ t(n) ≤ c f(n) ] }.

In other words, O(f(n)) (read as "the order of f(n)") is the set of all functions t(n)
bounded above by a positive real multiple of f(n), provided that n is sufficiently large
(greater than some threshold n₀).
For convenience, we allow ourselves to misuse the notation from time to time.
For instance, we say that t(n) is in the order of f(n) even if t(n) is negative or
undefined for some values n < n₀. Similarly, we talk about the order of f(n) even
when f(n) is negative or undefined for a finite number of values of n ; in this case we
must choose n₀ sufficiently large to be sure that such behaviour does not happen for
n ≥ n₀. For example, it is allowable to talk about the order of n/log n, even though
this function is not defined when n = 0 or n = 1, and it is correct to write

n³ − 3n² − n − 8 ∈ O(n³).
The principle of invariance mentioned in the previous chapter assures us that if
some implementation of a given algorithm never takes more than t(n) seconds to solve
an instance of size n, then any other implementation of the same algorithm takes a
time in the order of t(n) seconds. We say that such an algorithm takes a time in the
order of f(n) for any function f : ℕ → ℝ* such that t(n) ∈ O(f(n)). In particular,
since t(n) ∈ O(t(n)), it takes a time in the order of t(n) itself. In general, however, we
try to express the order of the algorithm's running time using the simplest possible
function f such that t(n) ∈ O(f(n)).
Problem 2.1.2. Which of the following statements are true? Prove your
answers.
i. n² ∈ O(n³)
ii. n³ ∈ O(n²)
Problem 2.1.3. Prove that

O(f(n)) = { t : ℕ → ℝ* | (∃c ∈ ℝ⁺) (∀n ∈ ℕ) [ t(n) ≤ c f(n) ] }.
In other words, the threshold no is not necessary in principle, even though it is often
useful in practice.
Problem 2.1.4. Prove that the relation "∈ O" is transitive: if f(n) ∈ O(g(n))
and g(n) ∈ O(h(n)), then f(n) ∈ O(h(n)). Conclude that if g(n) ∈ O(h(n)), then
O(g(n)) ⊆ O(h(n)).
This asymptotic notation provides a way to define a partial order on functions
and consequently on the relative efficiency of different algorithms to solve a given
problem, as suggested by the following exercises.
Problem 2.1.5. For arbitrary functions f and g : ℕ → ℝ*, prove that
i. O(f(n)) = O(g(n)) if and only if f(n) ∈ O(g(n)) and g(n) ∈ O(f(n)), and
ii. O(f(n)) ⊂ O(g(n)) if and only if f(n) ∈ O(g(n)) but g(n) ∉ O(f(n)).
The result of the preceding problem is useful for simplifying asymptotic calcula-
tions. For instance,
n³ + 3n² + n + 8 ∈ O(n³ + (3n² + n + 8)) = O(max(n³, 3n² + n + 8)) = O(n³).

The last equality holds despite the fact that max(n³, 3n² + n + 8) ≠ n³ when
0 ≤ n ≤ 3, because the asymptotic notation only applies when n is sufficiently large.
However, we do have to ensure that f(n) and g(n) only take nonnegative values (possibly
with a finite number of exceptions) to avoid false arguments like the following:

O(n²) = O(n³ + (n² − n³)) = O(max(n³, n² − n³)) = O(n³).

A little manipulation is sufficient, however, to allow us to conclude that

n³ − 3n² − n − 8 ∈ O(n³)

because 0 ≤ n³ − 3n² − n − 8 ≤ n³ for every n ≥ 4.
Problem 2.1.9. The notion of a limit is a powerful and versatile tool for comparing
functions. Given f and g : ℕ → ℝ⁺, prove that
i. lim_{n→∞} f(n)/g(n) ∈ ℝ⁺ ⇒ O(f(n)) = O(g(n)), and
iv. it can happen that O(f(n)) ⊂ O(g(n)) when the limit of f(n)/g(n) does not exist
as n tends to infinity and when it is also not true that O(g(n)) = O(g(n) − f(n)).
Problem 2.1.10. Use de l'Hopital's rule and Problems 2.1.5 and 2.1.9 to
prove that log n e 0 but that e O (log n).
Problem 2.1.11. Let ε be an arbitrary real constant, 0 < ε < 1. Use the relations
"⊂" and "=" to put the orders of the following functions into a sequence:
Prove that
The notation we have just seen is useful for estimating an upper limit on the time that
some algorithm will take on a given instance. It is also sometimes interesting to esti-
mate a lower limit on this time. The following notation is proposed to this end :
df(n) to solve the worst instance of size n, for each sufficiently large n. This in no
way rules out the possibility that a much shorter time might suffice to solve some other
instances of size n. Thus there can exist an infinity of instances for which the
algorithm takes a time less than df(n). Insertion sort, which we saw in Section 1.4,
provides a typical example of such behaviour: it takes a time in Ω(n²) in the worst case,
despite the fact that a time in the order of n is sufficient to solve arbitrarily large
instances in which the items are already sorted.
We shall be happiest if, when we analyse the asymptotic behaviour of an algo-
rithm, its execution time is bounded simultaneously both above and below by positive
real multiples (possibly different) of the same function. For this reason we introduce a
final notation, the exact order of f(n), defined by Θ(f(n)) = O(f(n)) ∩ Ω(f(n)).
Problem 2.1.15. For arbitrary functions f and g : ℕ → ℝ*, prove that the following
statements are equivalent:
i. O(f(n)) = O(g(n)),

Problem 2.1.17. Prove that
i. log_a n ∈ Θ(log_b n) whatever the values of a, b > 1 (so that we generally do not
bother to specify the base of a logarithm in an asymptotic expression), but
ii. 2^(log_a n) ∉ Θ(2^(log_b n)) if a ≠ b,
iii. Σ_{i=1}^{n} i^k ∈ Θ(n^{k+1}) for any given integer k ≥ 0 (this works even for real k > −1;
the hidden constant in the Θ notation may depend on the value of k),
iv. log(n!) ∈ Θ(n log n), and
v. Σ_{i=1}^{n} 1/i ∈ Θ(log n).
It may happen when we analyse an algorithm that its execution time depends simul-
taneously on more than one parameter of the instance in question. This situation is
typical of certain algorithms for problems involving graphs, where the time depends on
both the number of vertices and the number of edges. In such cases the notion of the
"size of the instance" that we have used so far may lose much of its meaning. For this
reason the asymptotic notation is generalized in a natural way to functions of several
variables.
Let f : ℕ×ℕ → ℝ* be an arbitrary function. We define

O(f(m, n)) = { t : ℕ×ℕ → ℝ* | (∃c ∈ ℝ⁺) (∃m₀, n₀ ∈ ℕ) (∀m ≥ m₀) (∀n ≥ n₀) [ t(m, n) ≤ c f(m, n) ] }.

Other generalizations are defined similarly.
There is nevertheless an essential difference between an asymptotic notation with
only one parameter and one with several: unlike the result obtained in Problem 2.1.3,
it can happen that the thresholds m₀ and n₀ are indispensable. This is explained by the
fact that while there are never more than a finite number of values of n ≥ 0 such that
n ≥ n₀ is not true, there are in general an infinite number of pairs <m, n> such that
m ≥ 0 and n ≥ 0 yet such that m ≥ m₀ and n ≥ n₀ are not both true.
To simplify some calculations, we can manipulate the asymptotic notation using arithmetic
operators. For instance, O(f(n)) + O(g(n)) represents the set of functions obtained by
adding pointwise any function in O(f(n)) to any function in O(g(n)) (see Problem
2.1.19(ii)). To belong to [O(f(n))]², a function g(n) must be the pointwise square of
some function in O(f(n)); to belong to O(f(n)) × O(f(n)), however, it suffices for
g(n) to be the pointwise product of two possibly different functions, each a member of
O(f(n)). To understand the first notation, think of it as O(f(n)) exp { Id₂ }, where
"exp" denotes the binary exponentiation operator and "Id₂" is the constant function
Id₂(n) = 2 for all n. Similarly, n × O(f(n)) denotes

{ t : ℕ → ℝ | (∃g(n) ∈ O(f(n))) (∃n₀ ∈ ℕ) (∀n ≥ n₀) [ t(n) = n × g(n) ] },

which is not at all the same as Σ_{i=1}^{n} O(f(n)) = O(f(n)) + O(f(n)) + ··· + O(f(n)).
Problem 2.1.19. Let f and g be arbitrary functions from IN into IR*. Prove
the following identities :
ii. O([f(n)]²) = [O(f(n))]² = O(f(n)) × O(f(n));
Example 2.1.1. Although this expression can be simplified, the natural way to
express the execution time required by Dixon's integer factorization algorithm (Section
8.5.3) is
0(eo(mi ))
where n is the value of the integer to be factorized.
Many algorithms are easier to analyse if initially we only consider instances whose size
satisfies a certain condition, such as being a power of 2. Conditional asymptotic nota-
tion handles this situation. Let f : ℕ → ℝ* be any function and let P : ℕ → 𝔹 be a
predicate. We define

O(f(n) | P(n)) = { t : ℕ → ℝ* | (∃c ∈ ℝ⁺) (∃n₀ ∈ ℕ) (∀n ≥ n₀) [ P(n) ⇒ t(n) ≤ c f(n) ] }.
In other words, 0 (f (n) I P(n)), which we read as the order of f (n) when P(n), is the
set of all functions t(n) bounded above by a real positive multiple of f (n) whenever n
is sufficiently large and provided the condition P(n) holds. The notation 0 (f (n))
defined previously is thus equivalent to 0 (f (n) I P(n)) where P(n) is the predicate
whose value is always true. The notation Ω(f(n) | P(n)) and Θ(f(n) | P(n)) is defined
similarly, as is the notation with several parameters.
The principal reason for using this conditional notation is that it can generally be
eliminated once it has been used to make the analysis of an algorithm easier. You
probably used this idea for solving Problem 2.1.12. A function f : ℕ → ℝ* is
eventually nondecreasing if (∃n₀ ∈ ℕ) (∀n ≥ n₀) [f(n) ≤ f(n+1)], which implies by
mathematical induction that (∃n₀ ∈ ℕ) (∀n ≥ n₀) (∀m ≥ n) [f(n) ≤ f(m)]. Let b ≥ 2
be any integer. Such a function is b-smooth if, as well as being eventually nondecreasing,
it satisfies the condition f(bn) ∈ O(f(n)). It turns out that any function that
is b-smooth for some integer b ≥ 2 is also c-smooth for every integer c ≥ 2 (prove
it!); we shall therefore in future simply refer to such functions as being smooth. The
following problem assembles these ideas.
t(n) = a                                  if n = 1
       t(⌊n/2⌋) + t(⌈n/2⌉) + bn           otherwise,

where a and b are arbitrary real positive constants. The presence of floors and ceilings
makes this equation hard to analyse exactly. However, if we only consider the cases
when n is a power of 2, the equation becomes

t(n) = a                 if n = 1
       2t(n/2) + bn      if n > 1 is a power of 2.

The techniques discussed in Section 2.3, in particular Problem 2.3.6, allow us to infer
immediately that t(n) ∈ Θ(n log n | n is a power of 2). In order to apply the result of
the previous problem to conclude that t(n) ∈ Θ(n log n), we need only show that t(n) is
an eventually nondecreasing function and that n log n is smooth.
The proof that (∀n ≥ 1) [t(n) ≤ t(n+1)] is by mathematical induction. First,
note that t(1) = a ≤ 2(a+b) = t(2). Let n be greater than 1. By the induction
hypothesis, assume that (∀m < n) [t(m) ≤ t(m+1)]. In particular, t(⌊n/2⌋) ≤ t(⌊(n+1)/2⌋)
and t(⌈n/2⌉) ≤ t(⌈(n+1)/2⌉). Therefore

t(n) = t(⌊n/2⌋) + t(⌈n/2⌉) + bn ≤ t(⌊(n+1)/2⌋) + t(⌈(n+1)/2⌉) + b(n+1) = t(n+1).

A word of caution is important here. One might be tempted to claim that t(n) is
eventually nondecreasing simply because this is obviously the case for n log n. Such an
argument is fallacious, because the relation between t(n) and n log n has so far been
demonstrated only when n is a power of 2. The proof that t(n) is nondecreasing
must use its recursive definition.
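As a quick empirical check, the following Python sketch (with the arbitrary choice a = b = 1) computes t(n) directly from the recurrence with its floors and ceilings; one can observe both that t is nondecreasing and that t(n)/(n lg n) stays confined between two positive constants, as Θ(n log n) predicts. It is only an illustration, not part of the formal argument.

from functools import lru_cache
from math import log2

a, b = 1.0, 1.0     # arbitrary positive constants

@lru_cache(maxsize=None)
def t(n):
    # t(n) = a if n = 1, and t(floor(n/2)) + t(ceil(n/2)) + b*n otherwise
    if n == 1:
        return a
    return t(n // 2) + t((n + 1) // 2) + b * n

values = [t(n) for n in range(1, 2001)]
assert all(x <= y for x, y in zip(values, values[1:]))   # nondecreasing over this range
for n in (64, 1000, 2000):
    print(n, t(n) / (n * log2(n)))                       # the ratio settles between fixed bounds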
When analysing algorithms, we do not always find ourselves faced with equations as
precise as those in Example 2.1.2 in the preceding section. More often we have to deal
with inequalities such as

t(n) ≤ t₁(n)                              if n ≤ n₀
       t(⌊n/2⌋) + t(⌈n/2⌉) + cn           otherwise

and simultaneously

t(n) ≥ t₂(n)                              if n ≤ n₀
       t(⌊n/2⌋) + t(⌈n/2⌉) + dn           otherwise

for some constants c, d ∈ ℝ⁺, n₀ ∈ ℕ, and for appropriate initial functions
t₁, t₂ : ℕ → ℝ⁺. Our asymptotic notation allows these constraints to be expressed
succinctly as

t(n) ∈ t(⌊n/2⌋) + t(⌈n/2⌉) + Θ(n).

To solve such inequalities, it is convenient to convert them first to equalities. To
this end, define f : ℕ → ℝ by

f(n) = 1                                  if n = 1
       f(⌊n/2⌋) + f(⌈n/2⌉) + n            otherwise.

We saw in the previous section that f(n) ∈ Θ(n log n).
Coming back now to the function t(n) satisfying the preceding inequalities, let
u = max(c, max{ t₁(n)/f(n) | n ≤ n₀ }) and v = min(d, min{ t₂(n)/f(n) | n ≤ n₀ }). It is easy
to prove by mathematical induction that v ≤ t(n)/f(n) ≤ u for every integer n. We
immediately conclude that t(n) ∈ Θ(f(n)) = Θ(n log n).
This change from the original inequalities to a parametrized equation is useful
from two points of view. Obviously it saves having to prove independently both
t(n) ∈ O(n log n) and t(n) ∈ Ω(n log n). More importantly, however, it allows us to
confine our analysis in the initial stages to the easier case where n is a power of 2.
It is then possible, using the conditional asymptotic notation and the technique
explained in Problem 2.1.20, to generalize our results automatically to the case where n
is an arbitrary integer. This could not have been done directly with the original in-
equalities, since they do not allow us to conclude that t(n) is eventually nondecreasing,
which in turn prevents us from applying Problem 2.1.20.
In the case of the preceding example it would have been simpler to determine the
values of a, b, and c by constructing three linear equations using the values of f(n)
for n = 0, 1, and 2:

c = 0
a + b + c = 1
4a + 2b + c = 3

Solving this system gives us immediately a = ½, b = ½, and c = 0. However, using
this approach does not prove that f (n) = n 2/ 2 + n / 2, since nothing allows us to assert
a priori that f (n) is in fact given by a quadratic polynomial. Thus once the constants
are determined we must in any case follow this with a proof by mathematical induc-
tion.
Some recurrences are more difficult to solve than the one given in Example
2.1.3. Even the techniques we shall see in Section 2.3 will prove insufficient on occa-
sion. However, in the context of asymptotic notation, an exact solution of the
recurrence equations is generally unnecessary, since we are only interested in estab-
lishing an upper bound on the quantity of interest. In this setting constructive induc-
tion can be exploited to the hilt.
Example 2.1.4. Let the function t : ℕ⁺ → ℝ⁺ be given by the recurrence

t(n) = a                      if n = 1
       bn² + n t(n−1)         otherwise,
where a and b are arbitrary real positive constants. Although this equation is not easy
to solve exactly, it is sufficiently similar to the recurrence that characterizes the fac-
torial (n! = n × (n−1)!) that it is natural to conjecture that t(n) ∈ Θ(n!). To establish
this, we shall prove independently that t(n) ∈ O(n!) and that t(n) ∈ Ω(n!). The technique
of constructive induction is useful in both cases. For simplicity, we begin by
proving that t(n) ∈ Ω(n!), that is, there exists a real positive constant u such that
t(n) ≥ un! for every positive integer n. Suppose by the partially specified induction
hypothesis that t(n−1) ≥ u(n−1)! for some n > 1. By definition of t(n), we know
that t(n) = bn² + n t(n−1) ≥ bn² + nu(n−1)! = bn² + un! ≥ un!. Thus we see that
t(n) ≥ un! is always true, regardless of the value of u, provided that
t(n−1) ≥ u(n−1)!. In order to conclude that t(n) ≥ un! for every positive integer n,
it suffices to show that this is true for n = 1; that is, t(1) ≥ u. Since t(1) = a, this is
the same as saying that u ≤ a. Taking u = a, we have established that t(n) ≥ an! for
every positive integer n, and thus that t(n) ∈ Ω(n!).
Encouraged by this success, we now try to show that t(n) ∈ O(n!) by proving the
existence of a real positive constant v such that t(n) ≤ vn! for every positive integer n.
Suppose by the partially specified induction hypothesis that t(n−1) ≤ v(n−1)! for
some n > 1. As usual, this allows us to affirm that t(n) = bn² + n t(n−1) ≤ bn² + vn!.
However our aim is to show that t(n) ≤ vn!. Unfortunately, no positive value of v
allows us to conclude that t(n) ≤ vn! given only that t(n) ≤ bn² + vn!. It seems then that
constructive induction has nothing to offer in this context, or perhaps even that the
hypothesis to the effect that t(n) ∈ O(n!) is false.
In fact, it is possible to obtain the result we hoped for. Rather than trying to
prove directly that t(n) ≤ vn!, we use constructive induction to determine real positive
constants v and w such that t(n) ≤ vn! − wn for any positive integer n. This idea may
seem odd, since t(n) ≤ vn! − wn is a stronger statement than t(n) ≤ vn!, which we
were unable to prove. We may hope for success, however, on the grounds that if the
statement to be proved is stronger, then so too is the induction hypothesis it allows us
to use.
Suppose then by the partially specified induction hypothesis that
t(n−1) ≤ v(n−1)! − w(n−1) for some n > 1. Using the definition of t(n), we conclude
that

t(n) = bn² + n t(n−1) ≤ bn² + n(v(n−1)! − w(n−1)) = vn! + ((b−w)n + w)n.

To conclude that t(n) ≤ vn! − wn, it is necessary and sufficient that
(b−w)n + w ≤ −w. This inequality holds if and only if n ≥ 3 and w ≥ bn/(n−2).
Since n/(n−2) ≤ 3 for every n ≥ 3, we may in particular choose w = 3b to ensure that
t(n) ≤ vn! − wn is a consequence of the hypothesis t(n−1) ≤ v(n−1)! − w(n−1),
independently of the value of v, provided that n ≥ 3.
All that remains is to adjust the constant v to take care of the cases n ≤ 2. When
n = 1, we know that t(1) = a. If we are to conclude that t(1) ≤ v − 3b, it is necessary
and sufficient that v ≥ a + 3b. When n = 2, we can apply the recurrence definition of
t(n) to find t(2) = 4b + 2t(1) = 4b + 2a. If we are to conclude that t(2) ≤ 2v − 6b, it
is necessary and sufficient that v ≥ a + 5b, which is stronger than the previous condition.
In particular, we may choose v = a + 5b.
The conclusion from all this is that t(n) ∈ Θ(n!) since

an! ≤ t(n) ≤ (a + 5b)n! − 3bn

for every positive integer n. If you got lost in the preceding argument, you may wish
to prove this assertion, which is now completely specified, by straightforward
mathematical induction.
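The completely specified assertion can also be checked numerically. The short Python sketch below does so for small n, with the arbitrary choice a = b = 1; it is merely a sanity check of the bounds just derived, not a proof.

from math import factorial

a, b = 1.0, 1.0     # arbitrary positive constants

def t(n):
    # t(n) = a if n = 1, and b*n^2 + n*t(n-1) otherwise
    return a if n == 1 else b * n * n + n * t(n - 1)

for n in range(1, 15):
    lower = a * factorial(n)
    upper = (a + 5 * b) * factorial(n) - 3 * b * n
    assert lower <= t(n) <= upper      # an! <= t(n) <= (a+5b)n! - 3bn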
The following problem is not so easy ; it illustrates well the advantage obtained
by using constructive induction, thanks to which we were able to prove that
t(n) E O(n!) without ever finding an exact expression for t(n).
g(n) = a                      if n = 1
       bn^k + n g(n−1)        otherwise.
The notation used in this chapter is not universally accepted. You may encounter three
major differences in other books. The most striking is the widespread use of state-
ments such as n 2 = O (n 3) where we would write n 2 E O (n 3). Use of such "one-way
equalities" (for one would not write O (n3) = n2) is hard to defend except on historical
grounds. With this definition we say that the execution time of some algorithm is of
the order of f (n) (or is 0 (f (n))) rather than saying it is in the order of f (n).
The second difference is less striking but more important, since it can lead an
incautious reader astray. Some authors define

O(f(n)) = { t : ℕ → ℝ | (∃c ∈ ℝ⁺) (∃n₀ ∈ ℕ) (∀n ≥ n₀) [ |t(n)| ≤ c f(n) ] },

where |t(n)| denotes (here only) the absolute value of t(n). Using this definition, one
would write n³ − n² ∈ n³ + O(n²). Of course, the meaning of "such-and-such an algo-
rithm takes a time in 0 (n 2)" does not change since algorithms cannot take negative
time. On the other hand, a statement such as O(f (n)) + O(g(n)) = O(max(f (n), g(n)))
is no longer true.
Example 2.2.1. Selection sort. Consider the selection sorting algorithm given
in Section 1.4. Most of the execution time is spent carrying out the instructions in the
inner loop, including the implicit control statements for this loop. The time taken by
each trip round the inner loop can be bounded above by a constant a. The complete
execution of the inner loop for a given value of i therefore takes at most a time
b + a (n -i ), where b is a second constant introduced to take account of the time spent
initializing the loop. One trip round the outer loop is therefore bounded above by
c + b + a(n−i), where c is a third constant, and finally, the complete algorithm takes a
time not greater than d + Σ_{i=1}^{n−1} [c + b + a(n−i)], for a fourth constant d. We can
simplify this expression to (a/2)n² + (b + c − a/2)n + (d − c − b), from which we conclude
that the algorithm takes a time in O(n²). A similar analysis for the lower bound
shows that in fact it takes a time in Θ(n²).
In this first example we gave all the details of our argument. Details like the ini-
tialization of the loops are rarely considered explicitly. It is often sufficient to choose
some instruction in the algorithm as a barometer and to count how many times this
instruction is executed. This figure gives us the exact order of the execution time of
the complete algorithm, provided that the time taken to execute the chosen instruction
can itself be bounded above by a constant. In the selection sort example, one possible
barometer is the test in the inner loop, which is executed exactly n (n - 1)/ 2 times
when n items are sorted. The following example shows, however, that such
simplifications should not be made incautiously.
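For instance, a Python sketch of selection sort that counts executions of its barometer instruction (the comparison in the inner loop) confirms the count n(n−1)/2 on every input. The function name and the use of Python lists are of course incidental to the argument.

import random

def selection_sort_comparisons(T):
    # sort T in place and count executions of the comparison in the inner loop
    n = len(T)
    count = 0
    for i in range(n - 1):
        min_j = i
        for j in range(i + 1, n):
            count += 1                  # the barometer instruction
            if T[j] < T[min_j]:
                min_j = j
        T[i], T[min_j] = T[min_j], T[i]
    return count

for n in (5, 10, 100):
    T = random.sample(range(1000), n)
    assert selection_sort_comparisons(T) == n * (n - 1) // 2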
every value of i from 2 to n when the array is initially sorted in descending order. The
total number of comparisons is therefore Σ_{i=2}^{n} (i−1) = n(n−1)/2 ∈ Θ(n²). In the
worst case insertion sorting thus takes a time in Θ(n²). Notice that selection sorting
systematically makes the same number of comparisons between elements that insertion
sorting makes in the worst case.
To determine the time needed by the insertion sorting algorithm on the average,
suppose that the n items to be sorted are all distinct and that each permutation of these
items has the same probability of occurrence. If i and k are such that 1 ≤ k ≤ i, the
probability that T[i] is the k-th largest element among T[1], T[2], ..., T[i] is 1/i
because this happens for (n choose i)·(i−1)!·(n−i)! = n!/i of the n! possible permutations of n
elements. For a given value of i, T[i] can therefore be situated with equal probability
in any position relative to the items T [I], T [2], ... , T [i -1]. With probability 1/i ,
T[i] < T [i -1 ] is false at the outset, and the first comparison x < T [j ] gets us out of
the while loop. The same probability applies to any given number of comparisons up
to i-2 included. On the other hand, the probability is 2/i that i - 1 comparisons will
be carried out, since this happens both when x < T [ 1 ] and when T [ 1 ] S x < T [2].
The average number of comparisons made for a given value of i is therefore

c_i = (1/i) [ 2(i−1) + Σ_{k=1}^{i−2} k ] = (i−1)(i+2)/(2i) = (i+1)/2 − 1/i.

These events are independent for different values of i. The average number of comparisons
made by the algorithm when sorting n items is therefore

Σ_{i=2}^{n} c_i = Σ_{i=2}^{n} [ (i+1)/2 − 1/i ] = (n² + 3n)/4 − H_n ∈ Θ(n²).

Here H_n = Σ_{i=1}^{n} 1/i, the n-th term of the harmonic series, is negligible compared to the
dominant term n²/4 because H_n ∈ Θ(log n), as shown in Problem 2.1.17.
The insertion sorting algorithm makes on the average about half the number of
comparisons that it makes in the worst case, but this number is still in Θ(n²).
Although the algorithm takes a time in Ω(n²) both on the average and in the worst
case, a time in O(n) is sufficient for an infinite number of instances.
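The average-case formula can be checked empirically. The following Python sketch counts the comparisons "x < T[j]" made by insertion sort on random permutations of distinct items and compares the observed average with (n² + 3n)/4 − H_n; the sample size and the choice n = 20 are arbitrary.

import random
from math import fsum

def insertion_sort_comparisons(T):
    # count the comparisons "x < T[j]" made while sifting each element into place
    T = list(T)
    count = 0
    for i in range(1, len(T)):
        x, j = T[i], i - 1
        while j >= 0:
            count += 1
            if x < T[j]:
                T[j + 1] = T[j]
                j -= 1
            else:
                break
        T[j + 1] = x
    return count

n, trials = 20, 20000
avg = fsum(insertion_sort_comparisons(random.sample(range(10**6), n))
           for _ in range(trials)) / trials
H_n = fsum(1 / i for i in range(1, n + 1))
print(avg, (n * n + 3 * n) / 4 - H_n)    # the two values should be close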
Rather than simply proving this formula by mathematical induction, try to see how you
might have discovered it for yourself.
Example 2.2.4. Making a heap. Consider the "make-heap " algorithm given
at the end of Section 1.9.4: this algorithm constructs a heap starting from an array T
of n items. As a barometer we use the instructions in the repeat loop of the algorithm
used to sift down a node. Let m be the largest number of trips round the loop that can
be caused by calling sift-down(T, n, i). Denote by jₜ the value of j after execution of
the assignment "j ← k" on the t-th trip round the loop. Obviously j₁ = i. Moreover,
if 1 < t ≤ m, then at the end of the (t−1)st trip round the loop we had j ≠ k; therefore
k ≥ 2j. This shows that jₜ ≥ 2jₜ₋₁ for 1 < t ≤ m. But it is impossible for k (and thus
j) to exceed n. Consequently n ≥ jₘ ≥ 2^(m−1) j₁ = 2^(m−1) i, so that m ≤ 1 + lg(n/i).
Summing over the internal nodes, the total number of trips round the loop needed to
build the heap is therefore at most

⌊n/2⌋ + Σ_{i=1}^{⌊n/2⌋} lg(n/i).   (*)

Moreover, for any k,

Σ_{i=2^k}^{2^{k+1}−1} lg(n/i) ≤ 2^k lg(n/2^k).

The interesting part of the sum (*) can therefore be decomposed into sections
corresponding to powers of 2. Let d = ⌊lg(n/2)⌋. Then

Σ_{i=1}^{⌊n/2⌋} lg(n/i) ≤ Σ_{k=0}^{d} 2^k lg(n/2^k) ≤ 2^{d+1} lg(n/2^{d−1})

(by Problem 2.2.1). But d = ⌊lg(n/2)⌋ implies that d + 1 ≤ lg n and d − 1 > lg(n/8).
Hence

Σ_{i=1}^{⌊n/2⌋} lg(n/i) ≤ 3n.
From (*) we thus conclude that ⌊n/2⌋ + 3n trips round the repeat loop are enough to
construct a heap, so that this can be done in a time in O(n). Since any algorithm for
constructing a heap must look at each element of the array at least once, we obtain our
final result that the construction of a heap of size n can be carried out in a time in
Θ(n).
A different approach yields the same result. Let t(k) stand for the time needed to
build a heap of height at most k in the worst case. Assume k ≥ 2. In order to construct
the heap, the algorithm first transforms each of the two subtrees attached to the
root into heaps of height at most k−1 (the right-hand subtree could be of height k−2).
The algorithm then sifts the root down a path whose length is at most k, which takes a
time in the order of k in the worst case. We thus obtain the asymptotic recurrence
t(k) ∈ 2t(k−1) + O(k). The techniques of Section 2.3, in particular Example 2.3.5,
can be used to conclude that t(k) ∈ O(2ᵏ). But a heap containing n elements is of
height ⌊lg n⌋, hence it can be built in at most t(⌊lg n⌋) steps, which is in O(n) since
2^⌊lg n⌋ ≤ n.
Problem 2.2.2. In Section 1.9.4 we saw another algorithm for making a heap
(slow-make-heap ). Analyse the worst case for this algorithm and compare it to the
algorithm analysed in Example 2.2.4.
What is the order of the execution time required by this algorithm in the worst case ?
* Problem 2.2.4. Find the exact order of the execution time for Williams's
heapsort, both in the worst case and on the average. For a given number of elements,
what are the best and the worst ways to arrange the elements initially insofar as the
execution time of the algorithm is concerned?
Besides this, the matrices for the recursive calls have to be set up and some other
housekeeping done, which takes a time in O(n2) for each of the n recursive calls if we
do this without thinking too much about it (but see Problem 2.2.5). This gives us the
following asymptotic recurrence : t (n) E nt(n - 1) + O(n 3 ). By Problem 2.1.22 the
algorithm therefore takes a time in O(n!) to calculate the determinant of an n x n
matrix.
Problem 2.2.5. Example 2.2.5 supposes that the time needed to compute a
determinant, excluding the time taken by the recursive calls, is in O(n3). Show that
this time can be reduced to O(n). By Problem 2.1.22, however, this does not affect the
fact that the complete algorithm takes a time in O(n!).
* Problem 2.2.6. Analyse the algorithm again, taking account this time of the
fact that the operands may become very large during execution of the algorithm.
Assume that you know how to add two integers of size n in a time in O(n) and that
you can multiply an integer of size m by an integer of size n in a time in O(mn).
We first show that for any two integers m and n such that n ≥ m, it is always true that
n mod m < n/2.
If m > n/2, then 1 ≤ n/m < 2, and so ⌊n/m⌋ = 1, which means that
n mod m = n − m < n − n/2 = n/2.
If m ≤ n/2, then (n mod m) < m ≤ n/2.
Let k be the number of trips round the loop made by the algorithm working on the
instance <m, n>. For each integer i ≤ k, let nᵢ and mᵢ be the values of n and m at
the end of the i-th trip round the loop. In particular, mₖ = 0 causes the algorithm to
terminate and mᵢ ≥ 1 for every i < k. The values of mᵢ and nᵢ are defined by the
following equations for 1 ≤ i ≤ k, where m₀ and n₀ are the initial values of m and n:

nᵢ = mᵢ₋₁
mᵢ = nᵢ₋₁ mod mᵢ₋₁.
* Problem 2.2.7. Prove that the worst case for Euclid's algorithm arises when
we calculate the greatest common divisor of two consecutive numbers from the
Fibonacci sequence.
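Euclid's algorithm itself is given in Section 1.7.4 and is not reproduced in this excerpt; the Python sketch below assumes its usual while-loop form and simply counts the trips round the loop, which by the halving argument above is in O(log n). The test values are illustrative only.

def gcd_iterations(m, n):
    # Euclid's algorithm on the instance <m, n> (m <= n), counting trips round the loop
    count = 0
    while m > 0:
        m, n = n % m, m
        count += 1
    return n, count

# consecutive Fibonacci numbers (Problem 2.2.7) make the loop run longest
fib = [0, 1]
while len(fib) < 31:
    fib.append(fib[-1] + fib[-2])
print(gcd_iterations(75, 1000))            # (25, 2): an ordinary instance
print(gcd_iterations(fib[29], fib[30]))    # gcd 1, roughly one trip per Fibonacci index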
Example 2.2.7. Analysis of the algorithm fibl. We now analyse the algo-
rithm fib l of Section 1.7.5, still not taking account of the large size of the operands
involved. Let t(n) be the time taken by some implementation of this algorithm
working on the integer n. We give without explanation the corresponding asymptotic
recurrence: t(n) E t(n-1)+t(n-2)+O(1).
Once again, the recurrence looks so like the one used to define the Fibonacci
sequence that it is tempting to suppose that t(n) must be in Θ(fₙ). However, as in
Example 2.1.4, constructive induction cannot be used directly to find a constant d such
that t(n) ≤ d fₙ. On the other hand, it is easy to use this technique to find three
real positive constants a, b, and c such that a fₙ ≤ t(n) ≤ b fₙ − c for any positive
integer n. The algorithm fib1 therefore takes a time in Θ(φⁿ) to calculate the
n-th term of the Fibonacci sequence, where φ = (1+√5)/2.
Problem 2.2.8. Using constructive induction, prove that a fₙ ≤ t(n) ≤ b fₙ − c
for appropriate constants a, b, and c, and give values for these constants.
Problem 2.2.9. Prove that the algorithm fib1 takes a time in Θ(φⁿ) even if we
take into account that we need a time in O(n) to add two integers of size n. (Since the
value of fₙ is in Θ(φⁿ), its size is in Θ(n lg φ) = Θ(n).)
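To see the exponential behaviour concretely, the Python sketch below counts the calls made by the naive recursive algorithm (fib1 itself appears in Section 1.7.5 and is not reproduced here; the base case n < 2 is assumed). The ratio of the count to φⁿ settles down to a constant, as Θ(φⁿ) predicts.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib1_calls(n):
    # number of calls fib1 makes on input n, assuming the usual base case n < 2
    return 1 if n < 2 else 1 + fib1_calls(n - 1) + fib1_calls(n - 2)

phi = (1 + 5 ** 0.5) / 2
for n in (10, 20, 40, 60):
    print(n, fib1_calls(n), fib1_calls(n) / phi ** n)   # the ratio approaches a constant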
Example 2.2.8. Analysis of the algorithm fib2. It is clear that the algorithm
fib2 takes a time equal to a + bn on any instance n , for appropriate constants a and b.
This time is therefore in Θ(n).
What happens, however, if we take account of the size of the operands involved?
Let a be a constant such that the time to add two numbers of size n is bounded above
by an, and let b be a constant such that the size of fₙ is bounded above by bn for
every integer n ≥ 2. Notice first that the values of i and j at the beginning of the k-th
trip round the for loop are respectively fₖ₋₂ and fₖ₋₁ (where we take f₋₁ = 1). The
k-th trip round the loop therefore consists of calculating fₖ₋₂ + fₖ₋₁ and fₖ − fₖ₋₂,
which takes a time bounded above by ab(2k − 1) for k ≥ 3, plus some constant time c
to carry out the assignments and the loop control. For each of the first two trips round
the loop, the time is bounded above by c + 2a. Let d be an appropriate constant to
account for necessary initializations. Then the time taken by fib2 on an integer n > 2
is bounded above by

d + 2(c + 2a) + Σ_{k=3}^{n} ab(2k − 1) = abn² + (d + 2c + 4a − 4ab),

which is in O(n²). It is easy to see by symmetry that the algorithm takes a time in
Θ(n²).
Example 2.2.9. Analysis of the algorithm fib3. The analysis of fib3 is rela-
tively easy if we do not take account of the size of the operands. To see this, take the
instructions in the while loop as our barometer. To evaluate the number of trips round
the loop, let nₜ be the value of n at the end of the t-th trip; in particular n₁ = ⌊n/2⌋. It
is obvious that nₜ = ⌊nₜ₋₁/2⌋ ≤ nₜ₋₁/2 for every 2 ≤ t ≤ m. Consequently

nₜ ≤ nₜ₋₁/2 ≤ nₜ₋₂/4 ≤ ··· ≤ n₁/2ᵗ⁻¹ ≤ n/2ᵗ.

Let m = 1 + ⌊lg n⌋. The preceding equation shows that nₘ ≤ n/2ᵐ < 1. But nₘ is a
nonnegative integer, and so nₘ = 0, which is the condition for ending the loop. We
conclude that the loop is executed at most m times, which implies that the algorithm
fib3 takes a time in O(log n).
Problem 2.2.10. Prove that the execution time of the algorithm fib3 on an
integer n is in Θ(log n) if no account is taken of the size of the operands.
** Problem 2.2.11. Determine the exact order of the execution time of the algo-
rithm fib3 used on an integer n. Assume that addition of two integers of size n takes a
time in O(n) and that multiplication of an integer of size n by an integer of size m
takes a time in O(mn). Compare your result to that obtained in Example 2.2.8.
If you find the result disappointing, look back at the table at the end of Section 1.7.5
and remember that the hidden constants can have practical importance! In Section 4.7
we shall see a multiplication algorithm that can be used to improve the performance of
the algorithm fib3 (Problem 4.7.5), but not, of course, that of fib2 (why not?).
for i ← 0 to n do
    j ← i
    while j ≠ 0 do j ← j div 2

Supposing that integer division by 2, assignments, and loop control can all be carried
out at unit cost, it is clear that this algorithm takes a time in Ω(n) ∩ O(n log n). Find
the exact order of its execution time. Prove your answer.
* Problem 2.2.13. Answer the same question as in the preceding problem, this
time for the algorithm
for i ← 0 to n do
    j ← i
    while j is odd do j ← j div 2
Show a relationship between this algorithm and the act of counting from 0 to n + 1 in
binary.
Example 2.2.10. Analysis of disjoint set structures. It can happen that the
analysis of an algorithm is facilitated by the addition of extra instructions and counters
that have nothing to do with the execution of the algorithm proper. For instance, this
is so when we look at the algorithms find3 and merge3 used to handle the disjoint set
structures introduced in Section 1.9.5. The analysis of these algorithms is the most
complicated case we shall see in this book. We begin by introducing a counter called
global and a new array cost[1 .. N]. Their purpose will be explained later. The array
set[1 .. N] keeps the meaning given to it in algorithms find3 and merge3: set[i] gives
the parent of node i in its tree, except when set[i] = i, which indicates that i is the root
of its tree. The array rank[1 .. N] plays the role of height[1 .. N] in algorithm merge3:
rank[i] denotes the rank of node i (see Section 1.9.5). We also introduce a strictly
increasing function F : ℕ → ℕ (specified later) and its "inverse" G : ℕ → ℕ defined
by G(n) = min{ m ∈ ℕ | F(m) ≥ n }. Finally, define the group of an element of rank r
as G(r). The algorithms become

procedure init
    { initializes the trees }
    global ← 0
    for i ← 1 to N do set[i] ← i
                      rank[i] ← 0
                      cost[i] ← 0

With these modifications the time taken by a call on the procedure find can be
reckoned to be in the order of 1 plus the increase of global + Σ_{i=1}^{N} cost[i] occasioned
by the call. The time required for a call on the procedure merge can be bounded
above by a constant. Therefore the total time required to execute an arbitrary sequence
of n calls on find and merge, including initialization, is in

O(N + n + global + Σ_{i=1}^{N} cost[i]),
where global and cost [i] refer to the final values of these variables after execution of
the sequence. In order to obtain an upper bound on these values, the following
remarks are relevant :
1. once an element ceases to be the root of a tree, it never becomes a root thereafter
and its rank no longer changes ;
2. the rank of a node that is not a root is always strictly less than the rank of its
parent ;
3. the rank of an element never exceeds the logarithm (to the base 2) of the number
of elements in the corresponding tree ;
4. at every moment and for every value of k, there are not more than N/2^k elements
of rank k ; and
5. at no time does the rank of an element exceed ⌊lg N⌋, nor does its group ever
exceed G(⌊lg N⌋).
Remarks (1) and (2) are obvious if one simply looks at the algorithms. Remark
(3) has a simple proof by mathematical induction, which we leave to the reader.
Remark (5) derives directly from remark (4). To prove the latter, define sub_k(i) for
each element i and rank k: if node i never attains rank k, sub_k(i) is the empty set; otherwise
sub_k(i) is the set of nodes that are in the tree whose root is i at that precise
moment when the rank of i becomes k. (Note that i is necessarily a root at that
moment, by remark (1).) By remark (3), sub_k(i) ≠ ∅ implies #sub_k(i) ≥ 2^k. By remark
(2), i ≠ j implies sub_k(i) ∩ sub_k(j) = ∅. Hence, if there were more than N/2^k elements i
such that sub_k(i) ≠ ∅, there would have to be more than N elements in all, which
proves remark (4).
The fact that G is nondecreasing allows us to conclude, using remarks (2) and
(5), that the increase in the value of global caused by a call on the procedure find
cannot exceed 1 + G(⌊lg N⌋). Consequently, after the execution of a sequence of n
operations, the final value of this variable is in O(n(1 + G(⌊lg N⌋))). It only remains to
find an upper bound on the final value of cost[i] for each element i in terms of its final
rank.
Note first that cost[i] remains at zero while i is a root. What is more, the value
of cost[i] only increases when a path compression causes the parent of node i to be
changed. In this case the rank of the new parent is necessarily greater than the rank of
the old parent by remark (2). But the increase in cost[i] stops as soon as i becomes
the child of a node whose group is greater than its own. Let r be the rank of i at the
instant when i stops being a root, should this occur. By remark (1) this rank does not
change subsequently. Using all the preceding observations, we see that cost[i] cannot
increase more than F(G(r)) − F(G(r)−1) − 1 times. We conclude from this that the
final value of cost[i] is less than F(G(r)) for every node i ∈ final(r), where final(r)
denotes the set of elements that cease to be a root when they have rank r ≥ 1 (while,
on the other hand, cost[i] remains at zero for those elements that never cease to be a
root or that do so when they have rank zero). Let K = G(⌊lg N⌋) − 1. The rest is
merely manipulation.
Σ_{i=1}^{N} cost[i] = Σ_{g=0}^{K} Σ_{r=F(g)+1}^{F(g+1)} Σ_{i ∈ final(r)} cost[i]
                 ≤ Σ_{g=0}^{K} Σ_{r=F(g)+1}^{F(g+1)} Σ_{i ∈ final(r)} F(G(r))
                 ≤ Σ_{g=0}^{K} Σ_{r=F(g)+1}^{F(g+1)} (N/2^r) F(g+1)
                 ≤ Σ_{g=0}^{K} N F(g+1) / 2^F(g).

It suffices therefore to put F(g+1) = 2^F(g) to balance global and Σ_{i=1}^{N} cost[i], and so
to obtain Σ_{i=1}^{N} cost[i] ≤ N G(⌊lg N⌋). The time taken by the sequence of n calls on
find and merge with a universe of N elements, including the initialization time, is
therefore in

O(N + n + global + Σ_{i=1}^{N} cost[i]) ⊆ O(N + n + nG(⌊lg N⌋) + NG(⌊lg N⌋))
                                      = O(max(N, n)(1 + G(⌊lg N⌋))).
Now that we have decided that F(g+1) = 2^F(g), with the initial condition
F(0) = 0, what can we say about the function G? This function, which is often
denoted by lg*, can be defined by

G(N) = lg* N = min{ k | lg lg ··· lg N ≤ 0 },

where lg is applied k times. The function lg* increases very slowly: lg* N ≤ 5 for
every N ≤ 65,536 and lg* N ≤ 6 for every N ≤ 2^65,536. Notice also that
lg* N − lg*(⌊lg N⌋) ≤ 2, so that lg*(⌊lg N⌋) ∈ Θ(lg* N). The algorithms that we have just
analysed can therefore execute a sequence of n calls on find and merge with a universe
of N elements in a time in O(n lg* N), provided n ≥ N, which is to most intents and
purposes linear.
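A few lines of Python make the definitions of F and G = lg* concrete; the function names are ours, and the floating-point iteration is accurate enough for the moderate values tested here.

from math import log2

def F(m):
    # F(0) = 0 and F(g+1) = 2^F(g): 0, 1, 2, 4, 16, 65536, ...
    return 0 if m == 0 else 2 ** F(m - 1)

def lg_star(N):
    # G(N) = lg* N: how many times lg must be applied, starting from N,
    # before the result drops to 0 or below
    k, x = 0, float(N)
    while x > 0:
        x = log2(x)
        k += 1
    return k

assert [lg_star(F(m)) for m in range(1, 6)] == [1, 2, 3, 4, 5]
assert lg_star(65536) == 5 and lg_star(65537) == 6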
This bound can be improved by refining the argument in a way too complex to
give here. We content ourselves with mentioning that the exact analysis involves the
use of Ackermann's function (Problem 5.8.7) and that the time taken by the algorithm
is not linear in the worst case.
Figure 2.2.1. The towers of Hanoi.
Example 2.2.11. The towers of Hanoi. It is said that after creating the
world, God set on Earth three rods made of diamond and 64 rings of gold. These rings
are all different in size. At the creation they were threaded on one of the rods in order
of size, the largest at the bottom and the smallest at the top. God also created a
monastery close by the rods. The monks' task in life is to transfer all the rings onto
another rod. The only operation permitted consists of moving a single ring from one
rod to another, in such a way that no ring is ever placed on top of another smaller one.
When the monks have finished their task, according to the legend, the world will come
to an end. This is probably the most reassuring prophecy ever made concerning the
end of the world, for if the monks manage to move one ring per second, working night
and day without ever resting nor ever making a mistake, their work will still not be
finished 500,000 million years after they began!
The problem can obviously be generalized to an arbitrary number of rings. For
example, with n = 3, we obtain the solution given in Figure 2.2.1. To solve the general
problem, we need only realize that to transfer the m smallest rings from rod i to
rod j (where 1 ≤ i ≤ 3, 1 ≤ j ≤ 3, i ≠ j, and m ≥ 1), we can first transfer the smallest
m−1 rings from rod i to rod 6−i−j, next transfer the m-th ring from rod i to rod j,
and finally retransfer the m−1 smallest rings from rod 6−i−j to rod j. Here is a
formal description of this algorithm ; to solve the original instance, all you have to do
(!) is to call it with the arguments (64, 1, 2).

procedure Hanoi (m, i, j)
    { moves the m smallest rings from rod i to rod j }
    if m > 0 then Hanoi (m−1, i, 6−i−j)
                  write i "→" j
                  Hanoi (m−1, 6−i−j, j)
To analyse the execution time of this algorithm, let us see how often the instruc-
tion write, which we use as a barometer, is executed. The answer is a function of m,
which we denote e(m). We obtain the following recurrence:

e(m) = 1                  if m = 1
       2e(m−1) + 1        if m > 1,

from which we find that e(m) = 2^m − 1 (see Example 2.3.4). The algorithm therefore
takes a time in the exact order of 2^n to solve the problem with n rings.
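A direct Python transcription of the procedure, recording the moves instead of writing them, confirms that e(m) = 2^m − 1; it is only a small check, and the rod numbering follows the convention above.

def hanoi(m, i, j, moves):
    # moves the m smallest rings from rod i to rod j, recording each move
    if m > 0:
        hanoi(m - 1, i, 6 - i - j, moves)
        moves.append((i, j))
        hanoi(m - 1, 6 - i - j, j, moves)

for n in range(1, 11):
    moves = []
    hanoi(n, 1, 2, moves)
    assert len(moves) == 2 ** n - 1    # e(n) = 2^n - 1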
Problem 2.2.14. Prove that the algorithm of Example 2.2.11 is optimal in the
sense that it is impossible with the given constraints to move n rings from one rod to
another in less than 2^n − 1 operations.
We have seen that the indispensable last step when analysing an algorithm is often to
solve a system of recurrences. With a little experience and intuition such recurrences
can often be solved by intelligent guesswork. This approach, which we do not illus-
trate here, generally proceeds in four stages : calculate the first few values of the
recurrence, look for regularity, guess a suitable general form, and finally, prove by
mathematical induction that this form is correct. Fortunately there exists a technique
that can be used to solve certain classes of recurrence almost automatically.
Our starting point is the resolution of homogeneous linear recurrences with constant
coefficients, that is, recurrences of the form
a₀tₙ + a₁tₙ₋₁ + ··· + aₖtₙ₋ₖ = 0   (*)

where
i. the tᵢ are the values we are looking for. The recurrence is linear because it does
not contain terms of the form tᵢtᵢ₊ⱼ, tᵢ², and so on ;
ii. the coefficients aᵢ are constants ; and
iii. the recurrence is homogeneous because the linear combination of the tᵢ is equal
to zero.
After a while intuition may suggest we look for a solution of the form

tₙ = xⁿ

where x is a constant as yet unknown. If we try this solution in (*), we obtain

a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₖxⁿ⁻ᵏ = 0.

This equation is satisfied if x = 0, a trivial solution of no interest, or else if

a₀xᵏ + a₁xᵏ⁻¹ + ··· + aₖ = 0.

This equation of degree k is called the characteristic equation of the recurrence (*).
When its k roots r₁, r₂, ..., rₖ are all distinct, any linear combination

tₙ = Σ_{i=1}^{k} cᵢ rᵢⁿ

of terms rᵢⁿ is a solution of the recurrence (*), where the k constants c₁, c₂, ..., cₖ
are determined by the initial conditions. (We need exactly k initial conditions to determine
the values of these k constants.) The remarkable fact, which we do not prove
here, is that (*) has only solutions of this form.
tₙ = tₙ₋₁ + tₙ₋₂        n ≥ 2

subject to t₀ = 0, t₁ = 1.
(This is the definition of the Fibonacci sequence ; see Section 1.7.5.)
The recurrence can be rewritten in the form tₙ − tₙ₋₁ − tₙ₋₂ = 0, so the characteristic
equation is

x² − x − 1 = 0

whose roots are

r₁ = (1 + √5)/2   and   r₂ = (1 − √5)/2.

The general solution is therefore of the form

tₙ = c₁r₁ⁿ + c₂r₂ⁿ.

The initial conditions give

c₁ + c₂ = 0          (n = 0)
r₁c₁ + r₂c₂ = 1      (n = 1)

from which it is easy to obtain

c₁ = 1/√5,   c₂ = −1/√5.

Thus tₙ = (1/√5)(r₁ⁿ − r₂ⁿ). To show that this is the same as the result obtained by de
Moivre mentioned in Section 1.7.5, we need only note that r₁ = φ and r₂ = −φ⁻¹.
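The closed form is easy to check numerically; the Python sketch below compares it with the values produced by the recurrence itself (floating-point arithmetic is accurate enough here for moderate n).

def fib(n):
    # t_n defined by the recurrence t_n = t_{n-1} + t_{n-2}, t_0 = 0, t_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

r1 = (1 + 5 ** 0.5) / 2
r2 = (1 - 5 ** 0.5) / 2
for n in range(0, 40):
    closed = (r1 ** n - r2 ** n) / 5 ** 0.5
    assert round(closed) == fib(n)      # the two agree, up to rounding error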
Now suppose that the roots of the characteristic equation are not all distinct. Let

p(x) = a₀xᵏ + a₁xᵏ⁻¹ + ··· + aₖ

be the polynomial in the characteristic equation, and let r be a multiple root. For every
n ≥ k, consider the n-th degree polynomial defined by

h(x) = x [xⁿ⁻ᵏ p(x)]′ = a₀n xⁿ + a₁(n−1)xⁿ⁻¹ + ··· + aₖ(n−k)xⁿ⁻ᵏ.

Let q(x) be the polynomial such that p(x) = (x−r)² q(x). We have that

h(x) = x [(x−r)² xⁿ⁻ᵏ q(x)]′ = x [ 2(x−r) xⁿ⁻ᵏ q(x) + (x−r)² [xⁿ⁻ᵏ q(x)]′ ].

In particular, h(r) = 0. This shows that

a₀n rⁿ + a₁(n−1)rⁿ⁻¹ + ··· + aₖ(n−k)rⁿ⁻ᵏ = 0,

that is, tₙ = nrⁿ is also a solution of (*). More generally, if m is the multiplicity of the
root r, then tₙ = rⁿ, tₙ = nrⁿ, tₙ = n²rⁿ, ..., tₙ = n^(m−1)rⁿ are all possible solutions
of (*). The general solution is a linear combination of these terms and of the terms
contributed by the other roots of the characteristic equation. Once again there are k
constants to be determined by the initial conditions.
Consider now recurrences of the form

a₀tₙ + a₁tₙ₋₁ + ··· + aₖtₙ₋ₖ = bⁿ p(n).   (**)

The left-hand side is the same as (*), but on the right-hand side we have bⁿp(n), where
i. b is a constant ; and
ii. p(n) is a polynomial in n of degree d.
obtaining respectively

9tₙ − 18tₙ₋₁ = (n+5) 3ⁿ⁺²
Example 2.3.4. Consider the recurrence

tₙ = 2tₙ₋₁ + 1        n ≥ 1

subject to t₀ = 0.
The recurrence can be written

tₙ − 2tₙ₋₁ = 1,

which is of the form (**) with b = 1 and p(n) = 1, a polynomial of degree 0. The
characteristic equation is therefore

(x − 2)(x − 1) = 0

where the factor (x−2) comes from the left-hand side and the factor (x−1) comes from
the right-hand side. The roots of this equation are 1 and 2, so the general solution of
the recurrence is

tₙ = c₁1ⁿ + c₂2ⁿ.

We need two initial conditions. We know that t₀ = 0; to find a second initial condition
we use the recurrence itself to calculate

t₁ = 2t₀ + 1 = 1.

We finally have

c₁ + c₂ = 0         (n = 0)
c₁ + 2c₂ = 1        (n = 1)

from which we obtain the solution

tₙ = 2ⁿ − 1.
If all we want is the order of tₙ, there is no need to calculate the constants in the general
solution. In the previous example, once we know that

tₙ = c₁1ⁿ + c₂2ⁿ

we can already conclude that tₙ ∈ Θ(2ⁿ). For this it is sufficient to notice that tₙ, the
number of movements of a ring required, is certainly neither negative nor a constant,
since clearly tₙ ≥ n. Therefore c₂ > 0, and the conclusion follows.
In fact we can obtain a little more. Substituting the general solution back into
the original recurrence, we find

1 = tₙ − 2tₙ₋₁ = c₁ + c₂2ⁿ − 2(c₁ + c₂2ⁿ⁻¹) = −c₁.

Whatever the initial condition, it is therefore always the case that c₁ must be equal
to −1.
Problem 2.3.2. There is nothing surprising in the fact that we can determine
one of the constants in the general solution without looking at the initial condition ; on
the contrary! Why?
Problem 2.3.3. By substituting the general solution back into the recurrence,
prove that in the preceding example c₂ = −2 and c₃ = −1 whatever the initial condition.
Conclude that all the interesting solutions of the recurrence must have c₁ > 0, and
hence that they are all in Θ(2ⁿ).
More generally, the recurrence may take the form

a₀tₙ + a₁tₙ₋₁ + ··· + aₖtₙ₋ₖ = b₁ⁿ p₁(n) + b₂ⁿ p₂(n) + ··· ,

where the bᵢ are distinct constants and the pᵢ(n) are polynomials in n respectively of
degree dᵢ. It suffices to write the characteristic equation

(a₀xᵏ + a₁xᵏ⁻¹ + ··· + aₖ)(x − b₁)^(d₁+1)(x − b₂)^(d₂+1) ··· = 0,

which contains one factor corresponding to the left-hand side and one factor
corresponding to each term on the right-hand side, and to solve the problem as before.
Consider, for example, the recurrence

tₙ = 2tₙ₋₁ + n + 2ⁿ        n ≥ 1

subject to t₀ = 0. The recurrence can be written

tₙ − 2tₙ₋₁ = n + 2ⁿ,
Problem 2.3.4. Prove that all the solutions of this recurrence are in fact in
Θ(n2ⁿ), regardless of the initial condition.
straints on these constants can be obtained without using the initial conditions ? (See
Problems 2.3.3 and 2.3.4.)
Example 2.3.7. Here is how we can find the order of T (n) if n is a power of 2
and if
T(n) = 4T(n/2) + n        n > 1.
Replace n by 2^k (so that k = lg n) to obtain T(2^k) = 4T(2^{k-1}) + 2^k. This can be
written
t_k = 4t_{k-1} + 2^k
if t_k = T(2^k) = T(n). We know how to solve this new recurrence: the characteristic
equation is
(x - 4)(x - 2) = 0
and hence t_k = c_1 4^k + c_2 2^k.
Putting n back instead of k, we find
T(n) = c_1 n^2 + c_2 n.
T(n) is therefore in Θ(n^2 | n is a power of 2).
Example 2.3.8. Here is how to find the order of T (n) if n is a power of 2 and
if
T(n) = 4T(n/2) + n^2        n > 1.
Proceeding in the same way, we obtain successively
T(2^k) = 4T(2^{k-1}) + 4^k
t_k = 4t_{k-1} + 4^k.
The characteristic equation is (x - 4)^2 = 0, and so
t_k = c_1 4^k + c_2 k 4^k
T(n) = c_1 n^2 + c_2 n^2 lg n.
Thus T(n) ∈ Θ(n^2 log n | n is a power of 2).
Example 2.3.9. Here is how to find the order of T (n) if n is a power of 2 and
if
T(n) = 2T(n/2) + n lg n        n > 1.
As before, we obtain
T(2^k) = 2T(2^{k-1}) + k 2^k
t_k = 2t_{k-1} + k 2^k.
The characteristic equation is (x - 2)^3 = 0, and so
t_k = c_1 2^k + c_2 k 2^k + c_3 k^2 2^k
T(n) = c_1 n + c_2 n lg n + c_3 n lg^2 n,
and hence T(n) ∈ Θ(n log^2 n | n is a power of 2).
Example 2.3.10 is handled in the same way: for T(n) = 3T(n/2) + cn, where c is a
constant and n > 1 is a power of 2, we obtain successively
T(2^k) = 3T(2^{k-1}) + c 2^k
t_k = 3t_{k-1} + c 2^k.
The characteristic equation is (x - 3)(x - 2) = 0, and so
t_k = c_1 3^k + c_2 2^k
T(n) = c_1 3^{lg n} + c_2 n
and hence, since a^{lg b} = b^{lg a},
T(n) = c_1 n^{lg 3} + c_2 n.
Finally, T(n) ∈ Θ(n^{lg 3} | n is a power of 2).
Remark. In Examples 2.3.7 to 2.3.10 the recurrence given for T (n) only applies
when n is a power of 2. It is therefore inevitable that the solution obtained should be
in conditional asymptotic notation. In each of these four cases, however, it is sufficient
to add the condition that T (n) is eventually nondecreasing to be able to conclude that
the asymptotic results obtained apply unconditionally for all values of n. This follows
from Problem 2.1.20 since the functions n^2, n^2 log n, n log^2 n, and n^{lg 3} are smooth.
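Readers who wish to experiment can transcribe these recurrences directly. The short Python sketch below is our own; the initial condition T(1) = 1 is an arbitrary choice. It computes T(n) for n a power of 2 from the recurrences of Examples 2.3.7 to 2.3.9 and divides by the orders obtained above; the three ratios settle down to constants.

# Sketch: compute the recurrences of Examples 2.3.7-2.3.9 for n a power of 2
# and compare with the predicted orders n^2, n^2 lg n and n lg^2 n.
from math import log2

def solve(combine, k_max):
    t = {1: 1}                                  # arbitrary initial condition T(1) = 1
    for k in range(1, k_max + 1):
        n = 2 ** k
        t[n] = combine(t[n // 2], n)
    return t

ex_237 = solve(lambda half, n: 4 * half + n, 20)            # T(n) = 4T(n/2) + n
ex_238 = solve(lambda half, n: 4 * half + n * n, 20)        # T(n) = 4T(n/2) + n^2
ex_239 = solve(lambda half, n: 2 * half + n * log2(n), 20)  # T(n) = 2T(n/2) + n lg n

n = 2 ** 20
print(ex_237[n] / n ** 2)                   # tends to a constant
print(ex_238[n] / (n ** 2 * log2(n)))       # tends to a constant
print(ex_239[n] / (n * log2(n) ** 2))       # tends to a constant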
* Problem 2.3.6. The constants n_0 ≥ 1, b ≥ 2 and k ≥ 0 are integers, whereas a
and c are positive real numbers. Let T : IN → IR^+ be an eventually nondecreasing
function such that
T(n) = aT(n/b) + cn^k        n > n_0
when n / n_0 is a power of b. Show that the exact order of T(n) is given by
T(n) ∈ Θ(n^k)            if a < b^k
T(n) ∈ Θ(n^k log n)      if a = b^k
T(n) ∈ Θ(n^{log_b a})    if a > b^k.
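Stated as code rather than as a formula, the three cases of this result can be summarized by the following Python sketch (our own illustration; it merely reports the order, it does not prove it).

# Sketch: report the order of T(n) = a T(n/b) + c n^k in each of the three cases.
from math import log

def master_order(a, b, k):
    if a < b ** k:
        return "Theta(n^%d)" % k
    if a == b ** k:
        return "Theta(n^%d log n)" % k
    return "Theta(n^%.3f)" % (log(a) / log(b))    # exponent log_b a

print(master_order(4, 2, 1))   # Example 2.3.7: a > b^k, order n^2
print(master_order(4, 2, 2))   # Example 2.3.8: a = b^k, order n^2 log n
print(master_order(3, 2, 1))   # Example 2.3.10: order n^1.585, that is n^(lg 3)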
Problem 2.3.8. Solve the following recurrence exactly for n of the form
) + lg n        n ≥ 4
subject to T(2) = 1.
Express your solution as simply as possible using the 0 notation.
subject to t_n = n for 0 ≤ n < 3. Express your answer as simply as possible using the
O notation.
Let p = log_b a. It turns out that the simplest way to express T(n) in asymptotic nota-
tion depends on how f(n) compares to n^p. In what follows, all asymptotic notation is
implicitly conditional on n ∈ X. Prove that
Note that the third alternative includes f(n) ∈ O(n^p) by choosing q = 1.
iii. As a special case of the first alternative, T(n) ∈ O(n^p) whenever f(n) ∈ O(n^r) for
some real constant r < p.
iv. The last alternative can be generalized to include cases such as
f(n) ∈ Θ(n^{p+q} log n) or f(n) ∈ Θ(n^{p+q} / log n); we also get T(n) ∈ O(f(n)) if
there exist a function g : X → IR^* and a real constant α > a such that
f(n) ∈ O(g(n)) and g(bn) ≥ α g(n) for all n ∈ X.
v. Prove or disprove that the third alternative can be generalized as follows:
T(n) ∈ O(f(n) log n) whenever there exist two strictly positive real constants
q_1 ≤ q_2 such that f(n) ∈ O(n^p (log n)^{q_2}) and f(n) ∈ Ω(n^p (log n)^{q_1}). If you
disprove it, find the simplest but most general additional constraint on f(n) that
suffices to imply T(n) ∈ O(f(n) log n).
The asymptotic notation has existed for some while in mathematics: see Bachmann
(1894) and de Bruijn (1961). Knuth (1976) gives an account of its history and pro-
poses a standard form for it. Conditional asymptotic notation and its use in Problem
2.1.20 are introduced by Brassard (1985), who also suggests that "one-way inequali-
ties" should be abandoned in favour of a notation based on sets. For information on
calculating limits and on de l'Hopital's rule, consult any book on mathematical
analysis, Rudin (1953), for instance.
The book by Purdom and Brown (1985) presents a number of techniques for ana-
lysing algorithms. The main mathematical aspects of the analysis of algorithms can
also be found in Greene and Knuth (1981).
Example 2.1.1 corresponds to the algorithm of Dixon (1981). Problem 2.2.3
comes from Williams (1964). The analysis of disjoint set structures given in Example
2.2.10 is adapted from Hopcroft and Ullman (1973). The more precise analysis
making use of Ackermann's function can be found in Tarjan (1975, 1983). Buneman
and Levy (1980) and Dewdney (1984) give a solution to Problem 2.2.15.
Several techniques for solving recurrences, including the characteristic equation
and change of variable, are explained in Lueker (1980). For a more rigorous
mathematical treatment see Knuth (1968) or Purdom and Brown (1985). The paper by
Bentley, Haken, and Saxe (1980) is particularly relevant for recurrences occurring from
the analysis of divide-and-conquer algorithms (see Chapter 4).
3
Greedy Algorithms
3.1 INTRODUCTION
Greedy algorithms are usually quite simple. They are typically used to solve optimiza-
tion problems : find the best order to execute a certain set of jobs on a computer, find
the shortest route in a graph, and so on. In the most common situation we have
a set (or a list) of candidates : the jobs to be executed, the nodes of the graph, or
whatever ;
the set of candidates that have already been used ;
a function that checks whether a particular set of candidates provides a solution
to our problem, ignoring questions of optimality for the time being ;
a function that checks whether a set of candidates is feasible, that is, whether or
not it is possible to complete the set in such a way as to obtain at least one solu-
tion (not necessarily optimal) to our problem (we usually expect that the problem
has at least one solution making use of candidates from the set initially avail-
able);
a selection function that indicates at any time which is the most promising of the
candidates not yet used ; and
an objective function that gives the value of a solution (the time needed to exe-
cute all the jobs in the given order, the length of the path we have found, and so
on); this is the function we are trying to optimize.
To solve our problem, we look for a set of candidates that constitutes a solution and
that optimizes the value of the objective function. A greedy algorithm proceeds step by step. Initially, the set of
chosen candidates is empty. Then at each step, we try to add to this set the best
remaining candidate, our choice being guided by the selection function. If the enlarged
set of chosen candidates is no longer feasible, we remove the candidate we just added ;
the candidate we tried and removed is never considered again. However, if the
enlarged set is still feasible, then the candidate we just added stays in the set of chosen
candidates from now on. Each time we enlarge the set of chosen candidates, we check
whether the set now constitutes a solution to our problem. When a greedy algorithm
works correctly, the first solution found in this way is always optimal.
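This general schema can be transcribed, for instance, as the following Python sketch; the parameters solution, feasible and select are placeholders standing for the ingredients listed above, and the function returns None when it fails to find a solution.

# Sketch of the general greedy schema described above.
def greedy(candidates, solution, feasible, select):
    chosen = set()                       # candidates already chosen
    remaining = set(candidates)
    while remaining and not solution(chosen):
        x = select(remaining)            # most promising candidate not yet used
        remaining.remove(x)              # never considered again
        if feasible(chosen | {x}):
            chosen.add(x)                # x stays in the solution for good
    return chosen if solution(chosen) else None    # None: no solution was found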
It is easy to see why such algorithms are called "greedy": at every step, the pro-
cedure chooses the best morsel it can swallow, without worrying about the future.
It never changes its mind : once a candidate is included in the solution, it is there for
good ; once a candidate is excluded from the solution, it is never reconsidered.
The selection function is usually based on the objective function ; they may even
be identical. However, we shall see in the following examples that at times there may
be several plausible selection functions, so that we have to choose the right one if we
want our algorithm to work properly.
Example 3.1.1. We want to give change to a customer using the smallest pos-
sible number of coins. The elements of the problem are
the candidates: a finite set of coins, representing for instance 1, 5, 10, and 25
units, and containing at least one coin of each type ;
a solution : the total value of the chosen set of coins is exactly the amount we
have to pay ;
a feasible set : the total value of the chosen set does not exceed the amount to be
paid
the selection function : choose the highest-valued coin remaining in the set of
candidates; and
the objective function : the number of coins used in the solution.
* Problem 3.1.1. Prove that with the values suggested for the coins in the
preceding example the greedy algorithm will always find an optimal solution provided
one exists.
Prove, on the other hand, by giving specific counterexamples, that the greedy
algorithm no longer gives an optimal solution in every case if there also exist 12-unit
coins, or if one type of coin is missing from the initial set. Show that it can even
happen that the greedy algorithm fails to find a solution at all despite the fact that one
exists.
It is obviously more efficient to reject all the remaining 25-unit coins (say) at
once when the remaining amount to be represented falls below this value. Using
integer division is also more efficient than proceeding by successive subtractions.
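By way of illustration, here is a rough Python transcription of this greedy algorithm using integer division; the amounts and coin sets in the examples are our own and reproduce the situations evoked in Problem 3.1.1.

# Sketch: greedy change-making using integer division rather than repeated subtraction.
def greedy_change(amount, denominations):
    coins = {}
    for d in sorted(denominations, reverse=True):    # highest-valued coin first
        coins[d], amount = divmod(amount, d)
    return coins if amount == 0 else None            # None: greedy found no solution

print(greedy_change(28, [1, 5, 10, 25]))   # one 25 and three 1s: optimal here
print(greedy_change(15, [1, 5, 10, 12]))   # 12 + 1 + 1 + 1 = 4 coins, yet 10 + 5 uses only 2
print(greedy_change(8, [4, 5]))            # None, although 8 = 4 + 4: greedy misses a solution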
Let G = < N, A > be a connected undirected graph where N is the set of nodes and A
is the set of edges. Each edge has a given non-negative length. The problem is to find
a subset T of the edges of G such that all the nodes remain connected when only the
edges in T are used, and the sum of the lengths of the edges in T is as small as pos-
sible. (Instead of talking about length, we can associate a cost to each edge. In this
case the problem is to find a subset T whose total cost is as small as possible. Obvi-
ously, this change of terminology does not affect the way we solve the problem.)
Problem 3.2.1. Prove that the partial graph < N, T > formed by the nodes of
G and the edges in T is a tree.
The graph < N, T > is called a minimal spanning tree for the graph G. This
problem has many applications. For instance, if the nodes of G represent towns, and
the cost of an edge {a, b} is the cost of building a road from a to b, then a minimal
spanning tree of G shows us how to construct at the lowest possible cost a road system
linking all the towns in question.
We give two greedy algorithms to solve this problem. In the terminology we
have used for greedy algorithms, a set of edges is a solution if it constitutes a spanning
tree, and it is feasible if it does not include a cycle. Moreover, a feasible set of edges
is promising if it can be completed so as to form an optimal solution. In particular, the
empty set is always promising since G is connected. Finally, an edge touches a given
set of nodes if exactly one end of the edge is in the set. The following lemma is cru-
cial for proving the correctness of the forthcoming algorithms.
Lemma 3.2.1. Let G = < N, A > be a connected undirected graph where the
length of each edge is given. Let B ⊂ N be a strict subset of the nodes of G, let T ⊆ A
be a promising set of edges such that no edge in T touches B, and let e be a shortest
edge that touches B. Then T ∪ {e} is promising.
The initial set of candidates is the set of all the edges. A greedy algorithm
selects the edges one by one in some given order. Each edge is either included in the
set that will eventually form the solution or eliminated from further consideration. The
main difference between the various greedy algorithms to solve this problem lies in the
order in which the edges are selected.
[Figure: an edge touching the set B of nodes, one end in N \ B and the other in B.]
T contains the chosen edges {1, 2}, {2, 3}, {4, 5}, {6, 7}, {1, 4}, and {4, 7}. This
minimal spanning tree is shown by the heavy lines in Figure 3.2.2; its total length
is 17.
Problem 3.2.2. Prove that Kruskal's algorithm works correctly. The proof,
which uses Lemma 3.2.1, is by induction on the number of edges selected until now.
Problem 3.2.3. A graph may have several different minimal spanning trees.
Is this the case in our example, and if so, where is this possibility reflected in the algo-
rithm ? 0
To implement the algorithm, we have to handle a certain number of sets : the
nodes in each connected component. We have to carry out rapidly the two operations
find (x), which tells us in which component the node x is to be found, and
merge (A , B) to merge two disjoint sets. We therefore use disjoint set structures (Sec-
tion 1.9.5). For this algorithm it is preferable to represent the graph as a vector of
edges with their associated lengths rather than as a matrix of distances. Here is the
algorithm.
function Kruskal (G = < N, A > : graph ; length : A → IR*) : set of edges
   { initialization }
   Sort A by increasing length
   n ← #N
   T ← ∅   { will contain the edges of the minimal spanning tree }
   initialize n sets, each containing one distinct element of N
   { greedy loop }
   repeat
      {u, v} ← shortest edge not yet considered
      ucomp ← find (u)
      vcomp ← find (v)
      if ucomp ≠ vcomp then
         merge (ucomp, vcomp)
         T ← T ∪ {{u, v}}
   until #T = n - 1
   return T
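Readers who wish to run the algorithm can transcribe it, for example, as follows in Python. The disjoint set structure is a simple version of the one in Section 1.9.5 (path compression and union by rank); the little graph at the end is our own, since Figure 3.2.2 is not reproduced here.

# Sketch: a direct Python transcription of Kruskal's algorithm.
def kruskal(n, edges):
    """n: number of nodes (1..n); edges: list of (length, u, v)."""
    parent = list(range(n + 1))
    rank = [0] * (n + 1)

    def find(x):                                # root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path compression (halving)
            x = parent[x]
        return x

    def merge(a, b):                            # union by rank of two roots
        if rank[a] < rank[b]:
            a, b = b, a
        parent[b] = a
        if rank[a] == rank[b]:
            rank[a] += 1

    T = []
    for length, u, v in sorted(edges):          # edges by increasing length
        ucomp, vcomp = find(u), find(v)
        if ucomp != vcomp:
            merge(ucomp, vcomp)
            T.append((u, v, length))
            if len(T) == n - 1:
                break
    return T

# A small example graph of our own (Figure 3.2.2 is not reproduced here).
edges = [(1, 1, 2), (2, 2, 3), (4, 1, 4), (3, 4, 5), (6, 2, 5), (5, 3, 5)]
print(kruskal(5, edges))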
Problem 3.2.4. What happens if, by mistake, we run the algorithm on a graph
that is not connected?
We can estimate the execution time of the algorithm as follows. On a graph with
n nodes and a edges the number of operations is in
O(a log a) to sort the edges, which is equivalent to O(a log n) since
   n - 1 ≤ a ≤ n(n-1)/2;
O(n) to initialize the n disjoint sets;
in the worst case O((2a + n - 1) lg* n) for all the find and merge operations, by
the analysis given in Example 2.2.10, since there are at most 2a find operations
and n - 1 merge operations on a universe containing n elements; and
at worst, O(a) for the remaining operations.
For a connected graph we know that a ≥ n - 1. We conclude that the total time
for the algorithm is in O(a log n) because O(lg* n) ⊆ O(log n). Although this does not
change the worst-case analysis, it is preferable to keep the edges in a heap (Section
1.9.4; here the heap property should be inverted so that the value of each internal
node is less than or equal to the values of its children). This allows the initialization to
be carried out in a time in O(a), although each search for a minimum in the repeat
loop will now take a time in O(log a) = O(log n). This is particularly advantageous
in cases when the minimal spanning tree is found at a moment when a considerable
number of edges remain to be tried. In such cases, the original algorithm wastes time
sorting all these useless edges.
Problem 3.2.5. What can you say about the time required by Kruskal's algo-
rithm if, instead of providing a list of edges, the user supplies a matrix of distances,
leaving to the algorithm the job of working out which edges exist ?
function Prim (G = < N, A > : graph ; length : A → IR*) : set of edges
   { initialization }
   T ← ∅   { will contain the edges of the minimal spanning tree }
   B ← { an arbitrary member of N }
   while B ≠ N do
      find {u, v} of minimum length such that u ∈ N \ B and v ∈ B
      T ← T ∪ {{u, v}}
      B ← B ∪ {u}
   return T
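Here is a possible transcription in Python. It scans the candidate edges at each step exactly as the pseudocode does, without the heaps discussed further on, and it assumes the graph is connected; the example graph is our own.

# Sketch: a direct transcription of Prim's algorithm (no heap yet).
import math

def prim(nodes, length):
    """nodes: a set; length: dict mapping frozenset({u, v}) to the edge length.
       Assumes the graph is connected."""
    nodes = set(nodes)
    B = {next(iter(nodes))}             # an arbitrary starting node
    T = []                              # chosen edges
    while B != nodes:
        best, best_len = None, math.inf
        for u in nodes - B:             # u not yet in the tree
            for v in B:
                l = length.get(frozenset({u, v}), math.inf)
                if l < best_len:
                    best, best_len = (u, v), l
        u, v = best
        T.append((u, v, best_len))
        B.add(u)
    return T

length = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({1, 4}): 4,
          frozenset({4, 5}): 3, frozenset({3, 5}): 5, frozenset({2, 5}): 6}
print(prim({1, 2, 3, 4, 5}, length))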
Problem 3.2.6. Prove that Prim's algorithm works correctly. The proof,
which again uses Lemma 3.2.1, is by induction on the number of nodes in B.
To illustrate how the algorithm works, consider once again the graph in Figure
3.2.2. We arbitrarily choose node 1 as the starting node.
gives the length of each directed edge : L[i, j] ≥ 0 if the edge (i, j) exists and
L[i, j] = ∞ otherwise. Here is the algorithm.
function Dijkstra (L[1 .. n, 1 .. n]) : array [2 .. n]
   { initialization }
   C ← { 2, 3, ..., n }   { S = N \ C exists only by implication }
   for i ← 2 to n do D[i] ← L[1, i]
   { greedy loop }
   repeat n - 2 times
      v ← some element of C minimizing D[v]
      C ← C \ {v}   { and implicitly S ← S ∪ {v} }
      for each w ∈ C do
         D[w] ← min(D[w], D[v] + L[v, w])
   return D
The algorithm proceeds as follows on the graph in Figure 3.2.3.
Step              v     C               D
Initialization    -     {2, 3, 4, 5}    [50, 30, 100, 10]
1                 5     {2, 3, 4}       [50, 30, 20, 10]
2                 4     {2, 3}          [40, 30, 20, 10]
3                 3     {2}             [35, 30, 20, 10]
Clearly, D would not change if we did one more iteration to remove the last element of
C, which is why the main loop is only repeated n - 2 times.
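A direct Python transcription of the algorithm follows. The matrix L below is our own choice, made consistent with the trace just given (Figure 3.2.3 itself is not reproduced); INF marks a missing edge.

# Sketch: Dijkstra's algorithm as given above, with node 1 as the source.
INF = float("inf")

def dijkstra(L):
    n = len(L) - 1                              # nodes are numbered 1..n
    C = set(range(2, n + 1))                    # S = N \ C exists only by implication
    D = [INF] * (n + 1)
    for i in range(2, n + 1):
        D[i] = L[1][i]
    for _ in range(n - 2):                      # repeat n - 2 times
        v = min(C, key=lambda x: D[x])          # element of C minimizing D[v]
        C.remove(v)
        for w in C:
            D[w] = min(D[w], D[v] + L[v][w])
    return D[2:]

L = [[INF] * 6,                                 # row 0 unused
     [INF, INF, 50, 30, 100, 10],
     [INF, INF, INF, INF, INF, INF],
     [INF, INF, 5, INF, INF, INF],
     [INF, INF, 20, INF, INF, INF],
     [INF, INF, INF, INF, 10, INF]]
print(dijkstra(L))                              # [35, 30, 20, 10], as in the trace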
If we want not only to know the length of the shortest paths but also where they
pass, it suffices to add a second array P [2.. n ], where P [v ] contains the number of the
node that precedes v in the shortest path. To find the complete path, simply follow the
pointers P backwards from a destination to the source. The modifications to the algo-
rithm are simple :
i. if a node i is in S, then D [i] gives the length of the shortest path from the source
to i
ii. if a node i is not in S, then D[i] gives the length of the shortest special path
from the source to i.
Look at the initialization of D and S to convince yourself that these two condi-
tions hold at the outset ; the base for our induction is thus obtained. Next, consider the
inductive step, and suppose by the induction hypothesis that these two conditions hold
just before we add a new node v to S.
i. This follows immediately from the induction hypothesis for each node i that was
already in S before the addition of v. As for node v, it will now belong to S.
We must therefore check that D [v ] gives the length of the shortest path from the
source to v. By the induction hypothesis D [v ] certainly gives the length of the
shortest special path. We therefore have to verify that the shortest path from the
source to v does not pass through a node that does not belong to S. Suppose the
contrary : when we follow the shortest path from the source to v, the first node
encountered that does not belong to S is some node x distinct from v (see Figure
3.2.4).
The initial section of the path, as far as x, is a special path. Consequently,
the total distance to v via x is
   ≥ distance to x      (since edge lengths are non-negative)
   ≥ D[x]               (by part (ii) of the induction)
   ≥ D[v]               (because the algorithm chose v before x)
and the path via x cannot be shorter than the special path leading to v.
We have thus verified that when v is added to S, part (i) of the induction
remains true.
ii. Consider now a node w ∉ S different from v. When v is added to S, there are
two possibilities for the shortest special path from the source to w : either it does
not change, or else it now passes through v. In the latter case it seems at first
Figure 3.2.4. The shortest path from the source to v cannot go through node x.
glance that there are again two possibilities : either v is the last node in S visited
before arriving at w or it is not. We have to compare explicitly the length of the
old special path leading to w and the length of the special path that visits v just
before arriving at w ; the algorithm does this. However, we can ignore the possi-
bility (see Figure 3.2.5) that v is visited, but not just before arriving at w : a path
of this type cannot be shorter than the path of length D [x] + L [x, w] that we
examined at a previous step when x was added to S, because D[x] ≤ D[v].
Thus the algorithm ensures that part (ii) of the induction also remains true when
a new node v is added to S.
To complete the proof that the algorithm works, we need only note that when its
execution stops all the nodes but one are in S (even though the set S is not constructed
explicitly). At this point it is clear that the shortest path from the source to the
remaining node is a special path.
Problem 3.2.11. Show by giving an explicit example that if the edge lengths
can be negative, then Dijkstra's algorithm does not always work correctly. Is it still
sensible to talk about shortest paths if negative distances are allowed?
Figure 3.2.5 The shortest path from the source to w cannot visit x between v and w.
O(a log_k n). Problem 2.1.17(i) does not apply here since k is not a constant.
Note that this gives O(n^2) if a = n^2 and O(a log n) if a = n; it therefore gives
the best of both worlds. (Still faster algorithms exist.)
Problem 3.2.13. Show that Prim's algorithm to find minimal spanning trees
can also be implemented through the use of heaps. Show that it then takes a time in
O(a log n), just as Kruskal's algorithm would. Finally, show that the modification
suggested in the previous problem applies just as well to Prim's algorithm.
A single server (a processor, a petrol pump, a cashier in a bank, and so on) has n cus-
tomers to serve. The service time required by each customer is known in advance :
customer i will take time t_i, 1 ≤ i ≤ n. We want to minimize
T = Σ_{i=1}^{n} (time in system for customer i).
Since the number of customers is fixed, minimizing the total time in the system is
equivalent to minimizing the average time. For example, if we have three customers
with
t_1 = 5, t_2 = 10, t_3 = 3,
then six orders of service are possible.
Order          T
1 2 3 :    5 + (5+10) + (5+10+3)  = 38
1 3 2 :    5 + (5+3)  + (5+3+10)  = 31
2 1 3 :   10 + (10+5) + (10+5+3)  = 43
2 3 1 :   10 + (10+3) + (10+3+5)  = 41
3 1 2 :    3 + (3+5)  + (3+5+10)  = 29   ← optimal
3 2 1 :    3 + (3+10) + (3+10+5)  = 34
In the first case, customer 1 is served immediately, customer 2 waits while customer 1
is served and then gets his turn, and customer 3 waits while both 1 and 2 are served
and then is served himself: the total time passed in the system by the three customers
is 38.
Imagine an algorithm that builds the optimal schedule step by step. Suppose that
after scheduling customers i_1, i_2, ..., i_m, we add customer j. The increase in T at
this stage is
Suppose now that I is such that we can find two integers a and b with a < b and
t_{i_a} > t_{i_b} : in other words, the ath customer is served before the bth customer even
though the former needs more service time than the latter (see Figure 3.3.1). If we
exchange the positions of these two customers, we obtain a new order of service I'
obtained from I by interchanging the items i_a and i_b. This new order is preferable
because
T(I') = (n - a + 1) t_{i_b} + (n - b + 1) t_{i_a} + Σ_{k=1, k≠a,b}^{n} (n - k + 1) t_{i_k}
[Figure 3.3.1. The order of service I before, and I' after, the exchange of customers i_a and i_b.]
and thus
T(I) - T(I') = (b - a)(t_{i_a} - t_{i_b}) > 0.
We can therefore improve any schedule in which a customer is served before someone
else who requires less service. The only schedules that remain are those obtained by
putting the customers in nondecreasing order of service time. All such schedules are
clearly equivalent, and therefore they are all optimal.
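A transcription of this greedy rule in Python might look as follows; the function returns both the order of service and the corresponding total time in the system.

# Sketch: serve customers in nondecreasing order of service time.
def optimal_schedule(t):
    """t: list of service times; t[i] is the time needed by customer i + 1."""
    order = sorted(range(len(t)), key=lambda i: t[i])   # shortest service time first
    total, elapsed = 0, 0
    for i in order:
        elapsed += t[i]          # the moment customer i + 1 leaves the system
        total += elapsed
    return [i + 1 for i in order], total

print(optimal_schedule([5, 10, 3]))    # ([3, 1, 2], 29), as in the table above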
Problem 3.3.1. How much time (use the 0 notation) is required by a greedy
algorithm that accepts n and t [I .. n ] as data and produces an optimal schedule ?
The problem can be generalized to a system with s servers, as can the algorithm.
Without loss of generality, suppose the customers are numbered so that
t_1 ≤ t_2 ≤ ... ≤ t_n. In this context, server i, 1 ≤ i ≤ s, must serve customers number
i, i + s, i + 2s, ... in that order.
Problem 3.3.2. Prove that this algorithm always yields an optimal schedule.
where the constant c depends on the recording density and the speed of the drive. We
want to minimize T.
i. Prove by giving an explicit example that it is not necessarily optimal to hold the
programs in order of increasing values of l_i.
ii. Prove by giving an explicit example that it is not necessarily optimal to hold the
programs in order of decreasing values of p_i.
iii. Prove that T is minimized if the programs are held in order of decreasing p_i / l_i.
We have a set of n jobs to execute, each of which takes unit time. At any instant
t = 1, 2, ..., we can execute exactly one job. Job i, 1 ≤ i ≤ n, earns us a profit g_i if
and only if it is executed no later than time d_i.
i       1    2    3    4
g_i    50   10   15   30
d_i     2    1    2    1
Sequence     Profit
1                50
2                10
3                15
4                30
1, 3             65
2, 1             60
2, 3             25
3, 1             65
4, 1             80   ← optimum
4, 3             45
The sequence 3, 2, for instance, is not considered because job 2 would be executed at
time t = 2, after its deadline d2 = 1. To maximize our profit in this example, we
should execute the schedule 4, 1.
A set of jobs is feasible if there exists at least one sequence (also called feasible)
that allows all the jobs in the set to be executed in time for their respective deadlines.
An obvious greedy algorithm consists of constructing the schedule step by step, adding
at each step the job with the highest value of g_i among those not yet considered, pro-
vided that the chosen set of jobs remains feasible.
In the preceding example we first choose job 1. Next, we choose job 4: the set
{1, 4} is feasible because it can be executed in the order 4, 1. Next, we try the set
{1, 3, 4}, which turns out not to be feasible; job 3 is therefore rejected. Finally we try
{1, 2, 4}, which is also infeasible; so job 2 is also rejected. Our solution - optimal in
this case - is therefore to execute the set of jobs {1, 4}, which in fact can only be
schedule and to find an efficient way of implementing it.
Let J be a set of k jobs. At first glance it seems we might have to try all the k !
possible permutations of these jobs to see whether J is feasible. Happily this is not the
case.
Let σ = (s_1, s_2, ..., s_k) be a permutation of these jobs such that d_{s_1} ≤ d_{s_2} ≤ ... ≤ d_{s_k}.
Then the set J is feasible if and only if the sequence σ is feasible.
If some task a is scheduled in S_I whereas there is a gap in S_J (and therefore task
a does not belong to J), the set J ∪ {a} is feasible and would be more profitable
than J. This is not possible since J is optimal by assumption.
If some task b is scheduled in S_J whereas there is a gap in S_I, the set I ∪ {b} is
feasible, hence the greedy algorithm should have included b in I. This is also
impossible since it did not do so.
The only remaining possibility is that some task a is scheduled in S_I whereas a
different task b is scheduled in S_J. Again, this implies that a does not appear in
J and that b does not appear in I.
   - If g_a > g_b, one could substitute a for b in J and improve it. This goes
     against the optimality of J.
   - If g_a < g_b, the greedy algorithm should have chosen b before even con-
     sidering a since (I \ {a}) ∪ {b} would be feasible. This is not possible
     either since it did not include b in I.
   - The only remaining possibility is therefore that g_a = g_b.
In conclusion, for each time slot, sequences S_I and S_J either schedule no tasks,
the same task, or two distinct tasks yielding the same profit. This implies that the total
worth of I is identical with that of the optimal set J, and thus I is optimal as well.
For our first implementation of the algorithm suppose without loss of generality
that the jobs are numbered so that g_1 ≥ g_2 ≥ ... ≥ g_n. To allow us to use sentinels,
suppose further that n > 0 and that d_i > 0, 1 ≤ i ≤ n.
function sequence (d[0 .. n]) : k, array [1 .. k]
   array j[0 .. n]
   d[0], j[0] ← 0   { sentinels }
   k, j[1] ← 1   { task 1 is always chosen }
   { greedy loop }
   for i ← 2 to n do   { in decreasing order of g }
      r ← k
      while d[j[r]] > max(d[i], r) do r ← r - 1
      if d[j[r]] ≤ d[i] and d[i] > r then
         for l ← k step -1 to r + 1 do j[l+1] ← j[l]
         j[r+1] ← i
         k ← k + 1
   return k, j[1 .. k]
[Figure: the schedules S_I and S_J before and after reorganization, so that tasks common to I and J occupy the same time slots.]
The exact values of the g_i are unnecessary provided the jobs are correctly numbered in
order of decreasing profit.
Problem 3.3.4. Verify that the algorithm works, and show that it requires
quadratic time in the worst case.
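For readers who wish to run it, here is a rough Python transcription of the function sequence; arrays are kept 1-based by ignoring index 0, as in the pseudocode. On the example above, with the jobs renumbered 1, 4, 3, 2 in order of decreasing profit, it retains two jobs and schedules them in the order 4, 1.

# Sketch: the function sequence, jobs numbered 1..n by decreasing profit.
def sequence(d):
    """d[1..n]: deadlines of the jobs in decreasing order of profit; d[0] is a sentinel."""
    n = len(d) - 1
    d = list(d)
    d[0] = 0                       # sentinel
    j = [0] * (n + 1)              # j[0] = 0 is the other sentinel
    k = 1
    j[1] = 1                       # task 1 is always chosen
    for i in range(2, n + 1):      # in decreasing order of profit
        r = k
        while d[j[r]] > max(d[i], r):
            r -= 1
        if d[j[r]] <= d[i] and d[i] > r:
            for l in range(k, r, -1):      # make room at position r + 1
                j[l + 1] = j[l]
            j[r + 1] = i
            k += 1
    return k, j[1:k + 1]

# Jobs of the example, renumbered 1 (g=50, d=2), 2 (g=30, d=1), 3 (g=15, d=2), 4 (g=10, d=1):
print(sequence([0, 2, 1, 2, 1]))   # (2, [2, 1]): jobs 4 then 1 in the original numbering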
The lemma suggests that we should consider an algorithm that tries to fill one by
one the positions in a sequence of length l = min(n, max{d_i | 1 ≤ i ≤ n}). For any
position t, define n_t = max{k ≤ t | position k is free}. Also define certain sets of
positions : two positions i and j are in the same set if n_i = n_j (see Figure 3.3.3). For a
given set K of positions, let F(K) be the smallest member of K. Finally, define a ficti-
tious position 0, which is always free.
Clearly, as we assign new jobs to vacant positions, these sets will merge to form
larger sets ; disjoint set structures are intended for just this purpose. We obtain an
algorithm whose essential steps are the following :
[Figure 3.3.3. Free and occupied positions; positions i and j belong to the same set when n_i = n_j.]
If the instance is given to us with the jobs already ordered by decreasing profit,
so that an optimal sequence can be obtained merely by calling the preceding algorithm,
most of the time will be spent manipulating disjoint sets. Since there are at most n + 1
find operations and l merge operations to execute, and since l ≤ n, the required time is
in O(n lg* l), which is essentially linear. If, on the other hand, the jobs are given to us
in arbitrary order, so that we have to begin by sorting them, we need a time in
0 (n log n) to obtain the initial sequence.
Because they are so simple, greedy algorithms are often used as heuristics in situations
where we can (or must) accept an approximate solution instead of an exact optimal
solution. We content ourselves with giving two examples of this technique. These
examples also serve to illustrate that the greedy approach does not always yield an
optimal solution.
Let G = < N, A > be an undirected graph whose nodes are to be coloured. If two
nodes are joined by an edge, then they must be of different colours. Our aim is to use
as few different colours as possible. For instance, the graph in Figure 3.4.1 can be
coloured using only two colours: red for nodes 1, 3 and 4, and blue for nodes 2 and 5.
An obvious greedy algorithm consists of choosing a colour and an arbitrary
starting node, and then considering each other node in turn, painting it with this colour
if possible. When no further nodes can be painted, we choose a new colour and a new
starting node that has not yet been painted, we paint as many nodes as we can with this
second colour, and so on.
In our example if node 1 is painted red, we are not allowed to paint node 2 with
the same colour, nodes 3 and 4 can be red, and lastly node 5 may not be painted. If we
start again at node 2 using blue paint, we can colour nodes 2 and 5 and finish the job
using only two colours ; this is an optimal solution. However, if we systematically
consider the nodes in the order 1, 5, 2, 3, 4, we get a different answer: nodes 1 and 5
are painted red, then node 2 is painted blue, but now nodes 3 and 4 require us to use a
third colour; in this case the result is not optimal.
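The heuristic is easily transcribed; the sketch below uses the usual "first-fit" formulation, in which each node receives the smallest colour not already used by one of its coloured neighbours, and an adjacency structure of our own that is consistent with the example (Figure 3.4.1 is not reproduced here).

# Sketch: the greedy colouring heuristic; colours are numbered 0, 1, 2, ...
def greedy_colouring(adj, order):
    """adj: dict node -> set of neighbours; order: order in which nodes are considered."""
    colour = {}
    for v in order:
        used = {colour[w] for w in adj[v] if w in colour}
        c = 0
        while c in used:          # smallest colour not used by a coloured neighbour
            c += 1
        colour[v] = c
    return colour

# An adjacency structure consistent with the example discussed above.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2, 5}, 4: {2, 5}, 5: {3, 4}}
print(greedy_colouring(adj, [1, 2, 3, 4, 5]))   # two colours: optimal
print(greedy_colouring(adj, [1, 5, 2, 3, 4]))   # three colours for the same graph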
The algorithm is therefore no more than a heuristic that may possibly, but not
certainly, find a "good" solution. Why should we be interested by such algorithms?
For the colouring problem and many others the answer is that all the exact algorithms
known require exponential computation time. This is an example of the NP-complete
problems that we shall study in Chapter 10. For a large-scale instance these algorithms
cannot be used in practice, and we are forced to make do with an approximate method
(of which this greedy heuristic is among the least effective).
Problem 3.4.1. For a graph G and an ordering σ of the nodes of G, let c_σ(G)
be the number of colours used by the greedy algorithm. Let c(G) be the optimal
(smallest) number of colours. Prove the following assertions :
i. (∀G)(∃σ)[c_σ(G) = c(G)];
ii. (∀α ∈ IR^+)(∃G)(∃σ)[c_σ(G) / c(G) > α].
In other words, the greedy heuristic may find the optimal solution, but it may also give
an arbitrarily bad answer.
Problem 3.4.2. Find two or three practical problems that can be expressed in
terms of the graph colouring problem.
We know the distances between a certain number of towns. The travelling salesperson
wants to leave one of these towns, to visit each other town exactly once, and to arrive
back at the starting point, having travelled the shortest total distance possible. We
assume that the distance between two towns is never negative. As for the previous
problem, all the known exact algorithms for this problem require exponential time (it is
also NP-complete). Hence they are impractical for large instances.
The problem can be represented using a complete undirected graph with n nodes.
(The graph can also be directed if the distance matrix is not symmetric: see Section
5.6.) One obvious greedy algorithm consists of choosing at each step the shortest
remaining edge provided that
i. it does not form a cycle with the edges already chosen (except for the very last
edge chosen, which completes the salesperson's tour) ;
ii. if chosen, it will not be the third chosen edge incident on some node.
For example, if our problem concerns six towns with the following distance matrix :
From   To :    2     3     4     5     6
  1            3    10    11     7    25
  2                  6    12     8    26
  3                        9     4    20
  4                              5    15
  5                                   18
edges are chosen in the order (1,2), (3,5), (4,5), (2,3), (4,6), (1,6) to make the
circuit (1, 2, 3, 5, 4, 6, 1) whose total length is 58. Edge (1, 5 ), for example, was
not kept when we looked at it because it would have completed a circuit (1, 2, 3, 5,
1), and also because it would have been the third edge incident on node 5. In this
instance the greedy algorithm does not find an optimal tour since the tour (1, 2, 3, 6,
4, 5, 1) has a total length of only 56.
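A possible Python transcription of this heuristic, using the distance matrix above, is the following; it reproduces the tour of length 58 found by the greedy algorithm.

# Sketch: the greedy edge-selection heuristic for the travelling salesperson.
def greedy_tour_edges(n, dist):
    """dist: dict mapping frozenset({i, j}) to the distance between towns i and j."""
    degree = {i: 0 for i in range(1, n + 1)}
    comp = {i: i for i in range(1, n + 1)}          # crude connected-component labels
    chosen = []
    for e in sorted(dist, key=dist.get):            # shortest remaining edge first
        u, v = tuple(e)
        if degree[u] == 2 or degree[v] == 2:        # would be a third edge at a node
            continue
        if comp[u] == comp[v] and len(chosen) != n - 1:
            continue                                # would close a premature cycle
        chosen.append((u, v))
        degree[u] += 1
        degree[v] += 1
        old, new = comp[v], comp[u]
        for x in comp:                              # merge the two components
            if comp[x] == old:
                comp[x] = new
        if len(chosen) == n:
            break
    return chosen, sum(dist[frozenset(e)] for e in chosen)

d = {frozenset(e): w for e, w in [((1, 2), 3), ((1, 3), 10), ((1, 4), 11), ((1, 5), 7),
     ((1, 6), 25), ((2, 3), 6), ((2, 4), 12), ((2, 5), 8), ((2, 6), 26), ((3, 4), 9),
     ((3, 5), 4), ((3, 6), 20), ((4, 5), 5), ((4, 6), 15), ((5, 6), 18)]}
print(greedy_tour_edges(6, d))      # total length 58; the tour 1, 2, 3, 6, 4, 5, 1 costs 56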
Problem 3.4.3. What happens to this greedy algorithm if the graph is not
complete, that is, if it is not possible to travel directly between certain pairs of towns ?
Problem 3.4.6. Invent a heuristic greedy algorithm for the case when the dis-
tance matrix is not symmetric.
Problem 3.4.7. There exists a greedy algorithm (which perhaps would better
be called "abstinent") for solving the problem of the knight's tour on a chessboard : at
each step move the knight to the square that threatens the least possible number of
squares not yet visited. Try it !
Problem 3.4.8. In a directed graph a path is said to be Hamiltonian if it
passes exactly once through each node of the graph, but without coming back to the
starting node. Prove that if a directed graph is complete (that is, if each pair of nodes
is joined in at least one direction) then it has a Hamiltonian path, and give an algorithm
for finding such a path in this case.
A discussion of topics connected with Problem 3.1.1 can be found in Wright (1975)
and Chang and Korsh (1976) ; see also Problem 5.8.5 of this book.
The problem of minimal spanning trees has a long history, which is discussed in
Graham and Hell (1985). The first algorithm proposed (which we have not described)
is due to Boruvka (1926). The algorithm to which Prim's name is attached was
invented by Jarnik (1930) and rediscovered by Prim (1957) and Dijkstra (1959).
Kruskal's algorithm comes from Kruskal (1956). Other more sophisticated algorithms
are described in Yao (1975), Cheriton and Tarjan (1976), and Tarjan (1983).
The implementation of Dijkstra's algorithm that takes a time in 0 (n 2) is from
Dijkstra (1959). The details of the improvement suggested in Problem 3.2.12 can be
found in Johnson (1977). Similar improvement for the minimal spanning tree problem
(Problem 3.2.13) is from Johnson (1975). Faster algorithms for both these problems
are given in Fredman and Tarjan (1984); in particular, use of the Fibonacci heap
allows them to implement Dijkstra's algorithm in a time in 0 (a + n log n ). Other
ideas concerning shortest paths can be found in Tarjan (1983).
The solution to Problem 3.4.5 is given in Christofides (1976) ; the same reference
gives an efficient heuristic for finding a solution to the travelling salesperson problem
with a Euclidean distance matrix that is not more than 50% longer than the optimal
tour.
An important greedy algorithm that we have not discussed is used to derive
optimal Huffman codes; see Schwartz (1964). Other greedy algorithms for a variety of
problems are described in Horowitz and Sahni (1978).
4
Divide-and-Conquer
4.1 INTRODUCTION
t_B(n) = 3 t_A(⌈n/2⌉) + t(n) ≤ 3c((n+1)/2)^2 + dn = (3/4)cn^2 + ((3/2)c + d)n + (3/4)c
The term (3/4)cn^2 dominates the others when n is sufficiently large, which means that
algorithm B is essentially 25% faster than algorithm A. Although this improvement is
not to be sneezed at, nevertheless, you have not managed to change the order of the
time required : algorithm B still takes quadratic time.
To do better than this, we come back to the question posed in the opening para-
graph : how should the subinstances be solved? If they are small, it is possible that
algorithm A may still be the best way to proceed. However, when the subinstances are
sufficiently large, might it not be better to use our new algorithm recursively? The
idea is analogous to profiting from a bank account that compounds interest payments !
We thus obtain a third algorithm C whose implementation runs in time
t_C(n) = t_A(n)                     if n ≤ n_0
       = 3 t_C(⌈n/2⌉) + t(n)        otherwise,
where n_0 is the threshold above which the algorithm is called recursively. This equa-
tion, which is similar to the one in Example 2.3.10, gives us a time in the order of n^{lg 3},
which is approximately n^1.59. The improvement compared to the order of n^2 is there-
fore quite substantial, and the bigger n is, the more this improvement is worth having.
We shall see in the following section how to choose no in practice. Although this
choice does not affect the order of the execution time of our algorithm, we are also
concerned to make the hidden constant that multiplies n^{lg 3} as small as possible.
Here then is the general outline of the divide-and-conquer method:
function DQ (x)
   { returns a solution to instance x }
   if x is sufficiently small or simple then return ADHOC (x)
   decompose x into smaller subinstances x_1, x_2, ..., x_k
   for i ← 1 to k do y_i ← DQ (x_i)
   recombine the y_i 's to obtain a solution y for x
   return y ,
where ADHOC, the basic subalgorithm, is used to solve small instances of the
problem in question.
The number of subinstances, k, is usually both small and also independent of the
particular instance to be solved. When k = 1, it is hard to justify calling the technique
divide-and-conquer, and in this case it goes by the name of simplification (see sections
4.3, 4.8, and 4.10). We should also mention that some divide-and-conquer algorithms
do not follow the preceding outline exactly, but instead, they require that the first
subinstance be solved even before the second subinstance is formulated (Section 4.6).
For this approach to be worthwhile a number of conditions are usually required :
it must be possible to decompose an instance into subinstances and to recombine the
subsolutions fairly efficiently, the decision when to use the basic subalgorithm rather
than to make recursive calls must be taken judiciously, and the subinstances should be
as far as possible of about the same size.
After looking at the question of how to choose the optimal threshold, this chapter
shows how divide-and-conquer is used to solve a variety of important problems and
how the resulting algorithms can be analysed. We shall see that it is sometimes pos-
sible to replace the recursivity inherent in divide-and-conquer by an iterative loop.
When implemented in a conventional language such as Pascal on a conventional
machine, an iterative algorithm is likely to be somewhat faster than the recursive ver-
sion, although only by a constant multiplicative factor. On the other hand, it may be
possible to save a substantial amount of memory space in this way : for an instance of
size n, the recursive algorithm uses a stack whose depth is often in Ω(log n) and in
bad cases even in Ω(n).
Problem 4.2.1. Prove that if we set n_0 = 2^k for some given integer k ≥ 0,
then for all l ≥ k the implementation considered previously takes
2^k 3^{l-k} (32 + 2^k) - 2^{l+5} milliseconds
to solve an instance of size 2^l.
Problem 4.2.2. Find all the values of the threshold that allow an instance of
size 1024 to be solved in less than 8 minutes. 0
This example shows that the choice of threshold can have a considerable
influence on the efficiency of a divide-and-conquer algorithm. Choosing the threshold
is complicated by the fact that the best value does not generally depend only on the
algorithm concerned, but also on the particular implementation. Moreover, the
preceding problem shows that, over a certain range, changes in the value of the thres-
hold may have no effect on the efficiency of the algorithm when only instances of
some specific size are considered. Finally, there is in general no uniformly best value
of the threshold : in our example, a threshold larger than 66 is optimal for instances of
size 67, whereas it is best to use a threshold between 33 and 65 for instances of size
66. We shall in future abuse the term "optimal threshold" to mean nearly optimal.
So how shall we choose no? One easy condition is that we must have no > 1 to
avoid the infinite recursion that results if the solution of an instance of size 1 requires
us first to solve a few other instances of the same size. This remark may appear trivial,
but Section 4.6 describes an algorithm for which the ultimate threshold is less obvious,
as Problem 4.6.8 makes clear.
Given a particular implementation, the optimal threshold can be determined
empirically. We vary the value of the threshold and the size of the instances used for
our tests and time the implementation on a number of cases. Obviously, we must
avoid thresholds below the ultimate threshold. It is often possible to estimate an
optimal threshold simply by tabulating the results of these tests or by drawing a few
diagrams. Problem 4.2.2 makes it clear, however, that it is not usually enough simply
to vary the threshold for an instance whose size remains fixed. This approach may
require considerable amounts of computer time. We once asked the students in an
algorithmics course to implement the algorithm for multiplying large integers given in
Section 4.7, in order to compare it with the classic algorithm from Section 1.1. Several
groups of students tried to estimate the optimal threshold empirically, each group using
in the attempt more than 5,000 (1982) Canadian dollars worth of machine time ! On
the other hand, a purely theoretical calculation of the optimal threshold is rarely pos-
sible, given that it varies from one implementation to another.
The hybrid approach, which we recommend, consists of determining theoretically
the form of the recurrence equations, and then finding empirically the values of the
constants used in these equations for the implementation at hand. The optimal thres-
hold can then be estimated by finding the value of n at which it makes no difference,
for an instance of size n, whether we apply the basic subalgorithm directly or whether
we go on for one more level of recursion.
Coming back to our example, the optimal threshold can be found by solving
t_A(n) = 3 t_A(⌈n/2⌉) + t(n), because t_C(⌈n/2⌉) = t_A(⌈n/2⌉) if ⌈n/2⌉ ≤ n_0. The pres-
ence of a ceiling in this equation complicates things. If we neglect this difficulty, we
obtain n = 64. On the other hand, if we systematically replace ⌈n/2⌉ by (n + 1)/2, we
find n = 70. There is nothing surprising in this, since we saw in Problem 4.2.2 that in
fact no uniformly optimal threshold exists. A reasonable compromise, corresponding
to the fact that the average value of ⌈n/2⌉ is (2n + 1)/4, is to choose n_0 = 67 for our
threshold.
* Problem 4.2.3. Show that this choice of no = 67 has the merit of being subop-
timal for only two values of n in the neighbourhood of the threshold. Furthermore,
prove that there are no instances that take more than 1% longer with threshold 67 than
they would with any other threshold.
Problem 4.2.4. Let a and b be real positive constants. For each positive real
number s, consider the function f_s : IR* → IR* defined by the recurrence
f_s(x) = ax^2                  if x ≤ s
       = 3 f_s(x/2) + bx       otherwise.
Prove by mathematical induction that if u = 4b/a and if v is an arbitrary positive real
number, then f_u(x) ≤ f_v(x) for every real number x. Notice that this u is chosen so
that au^2 = 3a(u/2)^2 + bu. (For purists : even if the domain of f_u and f_v is not count-
able, the problem can be solved without recourse to transfinite induction, precisely
because infinite recursion is not a worry.)
In practice, one more complication arises. Supposing, for instance, that t_A(n) is
quadratic, it may happen that t_A(n) = an^2 + bn + c for some constants a, b, and c
depending on the implementation. Although bn + c becomes negligible compared to
an 2 when n is large, the basic subalgorithm is used in fact precisely on instances of
moderate size. It is therefore usually insufficient merely to estimate the constant a.
Instead, measure t_A(n) a number of times for several different values of n, and then
estimate all the necessary constants, probably using a regression technique.
Problem 4.3.2. Show that the algorithm takes a time in O(log n) to find x in
T[1 .. n] whatever the position of x in T.
The algorithm in fact executes only one of the two recursive calls, so that techni-
cally it is an example of simplification rather than of divide-and-conquer. Because the
recursive call is situated dynamically at the very end of the algorithm, it is easy to pro-
duce an iterative version.
situated. On the other hand, a trip round the loop in the variant will take a little longer
to execute on the average than a trip round the loop in the first algorithm. To compare
them, we shall analyse exactly the average number of trips round the loop that each
version makes. Suppose to make life simpler that T contains n distinct elements and
that x is indeed somewhere in T, occupying each possible position with equal proba-
bility. Let A (n) and B (n) be the average number of trips round the loop made by the
first and the second iterative versions, respectively.
Analysis of the First Version. Let k = 1 + ⌊n/2⌋. With probability (k - 1)/n,
x < T[k], which causes the assignment j ← k - 1, after which the algorithm starts
over on an instance reduced to k - 1 elements. With probability 1 - (k - 1)/n,
x ≥ T[k], which causes the assignment i ← k, after which the algorithm starts over on
an instance reduced to n - k + 1 elements. One trip round the loop is carried out before
the algorithm starts over, so the average number of trips round the loop is given by the
recurrence
B(1) = 0, B(2) = 1.
Define a(n) and b(n) as nA(n) and nB(n), respectively. The equations then
become
a(1) = b(1) = 0, b(2) = 2.
The first equation is easy in the case when n is a power of 2, since it then
reduces to
a(n) = 2a(n/2) + n,   n ≥ 2
a(1) = 0,
which yields a (n) = n lg n using the techniques of Section 2.3. Exact analysis for
arbitrary n is harder. We proceed by constructive induction, guessing the likely form
of the answer and determining the missing parameters in the course of a tentative proof
by mathematical induction. A likely hypothesis, already shown to hold when n is a
power of 2, is that n⌊lg n⌋ ≤ a(n) ≤ n⌈lg n⌉. What might we add to n⌊lg n⌋
to arrive at a(n)? Let n* denote the largest power of 2 that is less than or
equal to n. In particular, ⌊lg n⌋ = lg n*. It seems reasonable to hope that
a(n) = n lg n* + cn + dn*.
When n > 1 is of the form 2^i - 1, then ⌊n/2⌋ = (n-1)/2, (⌊n/2⌋)* = (n+1)/4 and
⌈n/2⌉ = (⌈n/2⌉)* = n* = (n+1)/2. To prove HI(n) in this case, it is necessary and
sufficient that
sufficient that
n lg((n+1)/2) + (c + d/2) n + d/2 = n lg((n+1)/2) + (c + 3d/4 + 1/2) n + (3d/4 + 1/2),
that is,
4c + 2d = 4c + 3d + 2   and   2d = 3d + 2.
These two equations are not linearly independent. They allow us to conclude that
d = - 2, there still being no constraints on c.
At this point we know that if only we can make the hypothesis
a(n) = n lg n* + cn - 2n*
true for the base n = 1, then we shall have proved by mathematical induction that it is
true for every positive integer n. Our final constraint is therefore
0 = a(1) = c - 2,
which gives c = 2 and implies that the general solution of the recurrence (*) for a(n)
is
a(n) = n lg n* + 2(n - n*).
The average number of trips round the loop executed by the first iterative algorithm for
binary searching, when looking for an element that is in fact present with uniform
probability distribution among the n different elements of an array sorted into
increasing order, is given by A(n) = a(n)/n, that is,
A(n) = ⌊lg n⌋ + 2(1 - n*/n).
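This closed form is easy to verify numerically. The Python sketch below (ours) computes a(n) from the recurrence a(n) = n + a(⌊n/2⌋) + a(⌈n/2⌉), a(1) = 0, which follows from the description of the first version given above, and compares it with n lg n* + 2(n - n*).

# Sketch: check the closed form for a(n) against its recurrence.
from math import log2, floor

def a_recurrence(n, memo={1: 0}):
    if n not in memo:
        memo[n] = n + a_recurrence(n // 2) + a_recurrence((n + 1) // 2)
    return memo[n]

def a_closed(n):
    n_star = 1 << floor(log2(n))            # largest power of 2 not exceeding n
    return n * log2(n_star) + 2 * (n - n_star)

for n in range(1, 2000):
    assert abs(a_recurrence(n) - a_closed(n)) < 1e-9
print("a(n) = n lg n* + 2(n - n*) checked for n = 1..1999")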
* Problem 4.3.6. Show that there exists a function π : IN^+ → IN^+ such that
(n + 1)/3 ≤ π(n) ≤ (n + 1)/2 for every positive integer n, and such that the exact solu-
tion of the recurrence is b(n) = n lg n* + n - 2n* + lg n* + 2 - π(n).
* Problem 4.3.7. Show that the function π(n) of the previous exercise is given
by
π(n - 1) = n*/2        if 2n < 3n*
         = n - n*      otherwise
for all n > 2. Equivalently,
π(n - 1) = [n* + (⌊2n/n*⌋ - 2)(2n - 3n*)] / 2.
We are finally in a position to answer the initial question : which of the two algo-
rithms for binary searching is preferable? By combining the preceding analysis of the
function a (n) with the solution to Problem 4.3.6, we obtain
A(n) - B(n) = 1 + (π(n) - ⌊lg n⌋ - 2)/n < 3/2.
Thus we see that the first algorithm makes on the average less than one and a half trips
round the loop more than the second. Given that the first algorithm takes less time on
the average than the variant to execute one trip round the loop, we conclude that the
first algorithm is more efficient than the second on the average whenever n is
sufficiently large. The situation is similar if the element we are looking for is not in
fact in the array. However, the threshold beyond which the first algorithm is preferable
to the variant can be very high for some implementations.
Let T[1 .. n] be an array of n elements for which there exists a total ordering. We are
interested in the problem of sorting these elements into ascending order. We have
already seen that the problem can be solved by selection sorting and insertion sorting
(Section 1.4), or by heapsort (Example 2.2.4 and Problem 2.2.3). Recall that an
analysis both in the worst case and on the average shows that the latter method takes a
time in O(n log n), whereas both the former methods take quadratic time.
The obvious divide-and-conquer approach to this problem consists of separating
the array T into two parts whose sizes are as nearly equal as possible, sorting these
parts by recursive calls, and then merging the solutions for each part, being careful to
preserve the order. We obtain the following algorithm :
procedure mergesort (T[1 .. n])
   { sorts array T into increasing order }
   if n is small then insert (T)
   else
      arrays U[1 .. n div 2], V[1 .. (n + 1) div 2]
      U ← T[1 .. n div 2]
      V ← T[1 + (n div 2) .. n]
      mergesort (U) ; mergesort (V)
      merge (T, U, V),
where insert (T) is the algorithm for sorting by insertion from Section 1.4, and
merge (T, U, V) merges into a single sorted array T two arrays U and V that are
already sorted.
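Here, for instance, is a rough Python transcription; it returns a new sorted array rather than sorting T in place, and the built-in sort plays the role of insert on small arrays.

# Sketch: mergesort in Python.
def mergesort(T):
    if len(T) <= 16:                 # "n is small": use the basic method
        return sorted(T)
    U = mergesort(T[:len(T) // 2])
    V = mergesort(T[len(T) // 2:])
    # merge U and V, preserving the order
    result, i, j = [], 0, 0
    while i < len(U) and j < len(V):
        if U[i] <= V[j]:
            result.append(U[i]); i += 1
        else:
            result.append(V[j]); j += 1
    return result + U[i:] + V[j:]

print(mergesort([31, 41, 59, 26, 53, 58, 97, 93, 23, 84]))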
** Problem 4.4.2. Repeat the previous problem, but without using an auxiliary
array : the sections T[1 .. k] and T[k + 1 .. n] of an array are sorted independently,
and you wish to sort the whole array T[1 .. n]. You may only use a fixed number of
working variables to solve the problem, and your algorithm must work in linear time.
mergesort separates the instance into two subinstances half the size, solves each of
these recursively, and then combines the two sorted half-arrays to obtain the solution to
the original instance.
Let t(n) be the time taken by this algorithm to sort an array of n elements.
Separating T into U and V takes linear time. By the result of Problem 4.4.1, the final
merge also takes linear time. Consequently, t(n) ∈ t(⌊n/2⌋) + t(⌈n/2⌉) + O(n). This
equation, which we analysed in Section 2.1.6, allows us to conclude that the time
required by the algorithm for sorting by merging is in O(n log n).
Problem 4.4.3. Rather than separate T into two half-size arrays, we might
choose to separate it into three arrays of size ⌊n/3⌋, ⌊(n + 1)/3⌋, and ⌊(n + 2)/3⌋,
respectively, to sort each of these recursively, and then to merge the three sorted
arrays. Give a more formal description of this algorithm, and analyse its execution
time.
The merge sorting algorithm we gave, and those suggested by the two previous
problems, have two points in common. The fact that the sum of the sizes of the subin-
stances is equal to the size of the original instance is not typical of algorithms derived
using divide-and-conquer, as we shall see in several subsequent examples. On the
other hand, the fact that the original instance is divided into subinstances whose sizes
are as nearly as possible equal is crucial if we are to arrive at an efficient algorithm.
To see why, look at what happens if instead we decide to separate T into an array U
with n -1 elements and an array V containing only 1 element. Let t'(n) be the time
required by this variant to sort n items. We obtain t'(n) E t'(n - 1) + t'(1) + O(n).
Simply forgetting to balance the sizes of the subinstances can therefore be disas-
trous for the efficiency of an algorithm obtained using divide-and-conquer.
Problem 4.4.6. This poor sorting algorithm is very like one we have already
seen in this book. Which one, and why?
4.5 QUICKSORT
The sorting algorithm invented by Hoare, usually known as "quicksort", is also based
on the idea of divide-and-conquer. Unlike sorting by merging, the nonrecursive part of
the work to be done is spent constructing the subinstances rather than combining their
solutions. As a first step, this algorithm chooses one of the items in the array to be
sorted as the pivot. The array is then partitioned on either side of the pivot : elements
are moved in such a way that those greater than the pivot are placed on its right,
whereas all the others are moved to its left. If now the two sections of the array on
either side of the pivot are sorted independently by recursive calls of the algorithm, the
final result is a completely sorted array, no subsequent merge step being necessary. To
balance the sizes of the two subinstances to be sorted, we would like to use the median
element as the pivot. (For a definition of the median, see Section 4.6.) Unfortunately,
finding the median takes more time than it is worth. For this reason we simply use the
first element of the array as the pivot. Here is the algorithm.
procedure quicksort (T[i .. j])
   { sorts array T[i .. j] into increasing order }
   if j - i is small then insert (T[i .. j])   { Section 1.4 }
   else
      pivot (T[i .. j], l)
      { after pivoting, i ≤ k < l ⟹ T[k] ≤ T[l]
        and l < k ≤ j ⟹ T[k] > T[l] }
      quicksort (T[i .. l - 1])
      quicksort (T[l + 1 .. j])
Problem 4.5.2. Show that in the worst case quicksort requires quadratic time.
Give an explicit example of an array to be sorted that causes such behaviour.
To make this more explicit, let d and n_0 be two constants such that
t(n) ≤ dn + (2/n) Σ_{k=0}^{n-1} t(k)        for n > n_0.
An equation of this type is more difficult to analyse than the linear recurrences we saw
in Section 2.3. By analogy with sorting by merging, it is, nevertheless, reasonable to
hope that t(n) will be in 0 (n log n) and to apply constructive induction to look for a
constant c such that t(n) ≤ cn lg n.
To use this approach we need an upper bound on Σ_{i=n_0+1}^{n-1} i lg i. This is
obtained with the help of a simple lemma. (We suggest you find a graphical interpre-
tation of the lemma.) Let a and b be real numbers, a < b, and let f : [a, b] → IR be
a nondecreasing function. Let j and k be two integers such that a ≤ j < k ≤ b. Then
Σ_{i=j}^{k-1} f(i) ≤ ∫_j^k f(x) dx.
Applying the lemma with f(x) = x lg x, j = n_0 + 1, and k = n, we obtain
Σ_{i=n_0+1}^{n-1} i lg i ≤ ∫_{n_0+1}^{n} x lg x dx = [x^2 lg x / 2 - (lg e) x^2 / 4]_{x=n_0+1}^{n} ≤ n^2 lg n / 2 - (lg e) n^2 / 4,
provided n_0 ≥ 1.
By combining the modification hinted at in the previous problem with the linear
algorithm from the following section, we can obtain a version of quicksort that takes a
time in O(n log n) even in the worst case. We mention this possibility only to point
out that it should be shunned : the hidden constant associated with the "improved" ver-
sion of quicksort is so large that it results in an algorithm worse than heapsort in every
case.
Let T[1 .. n] be an array of integers. What could be easier than to find the smallest
element or to calculate the mean of all the elements ? However, it is not obvious that
the median can be found so easily. Intuitively, the median of T is that element m in T
such that there are as many items in T smaller than m as there are items larger than m.
The formal definition takes care of the possibility that n may be even, or that the
elements of T may not all be distinct. Thus we define m to be the median of T if and
only if m is in T and
#{i ∈ [1 .. n] | T[i] < m} < n/2   and   #{i ∈ [1 .. n] | T[i] ≤ m} ≥ n/2.
The naive algorithm for determining the median of T consists of sorting the array
into ascending order and then extracting the ⌈n/2⌉th entry. If we use heapsort or
merge sort, this algorithm takes a time in O(n log n) to determine the median of n
elements. Can we do better? To answer this question, we consider a more general
problem : selection. Let T be an array of n elements, and let k be an integer between
1 and n. The kth smallest element of T is that element m such that
#{ i ∈ [1..n] | T[i] < m } < k, whereas #{ i ∈ [1..n] | T[i] ≤ m } ≥ k. In other
words, it is the kth item in T if the array is sorted into ascending order. For instance,
the median of T is its ⌈n/2⌉th smallest element.
The following algorithm, which is not yet completely specified, solves the selec-
tion problem in a way suggested by quicksort.
function selection (T[1..n], k)
  { finds the kth smallest element of T ;
    this algorithm assumes that 1 ≤ k ≤ n }
  if n is small then sort T
                     return T[k]
  p ← some element of T[1..n]   { to be specified later }
  u ← #{ i ∈ [1..n] | T[i] < p }
  v ← #{ i ∈ [1..n] | T[i] ≤ p }
  if k ≤ u then
    array U[1..u]
    U ← the elements of T smaller than p
    { the kth smallest element of T is
      also the kth smallest element of U }
    return selection (U, k)
  if k ≤ v then { got it ! } return p
  otherwise { k > v }
    array V[1..n−v]
    V ← the elements of T larger than p
    { the kth smallest element of T is
      also the (k−v)th smallest of V }
    return selection (V, k−v)
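A compact Python rendering of this scheme is sketched below. It follows the structure of the pseudocode, with the choice of p delegated to a helper supplied by the caller; the helper name and the cutoff are illustrative choices, not part of the original.

    def selection(T, k, choose_pivot=lambda T: T[0]):
        """Return the kth smallest element of T (1 <= k <= len(T))."""
        n = len(T)
        if n <= 10:                       # small instance: just sort
            return sorted(T)[k - 1]
        p = choose_pivot(T)               # to be specified by the caller
        U = [x for x in T if x < p]       # elements smaller than p
        V = [x for x in T if x > p]       # elements larger than p
        u, v = len(U), n - len(V)         # v counts the elements <= p
        if k <= u:
            return selection(U, k, choose_pivot)
        if k <= v:
            return p                      # got it!
        return selection(V, k - v, choose_pivot)

    def median(T):
        """The median is the ceil(n/2)th smallest element."""
        return selection(T, (len(T) + 1) // 2)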
Problem 4.6.1. Generalize the notion of pivoting from Section 4.5 to partition
the array T into three sections, T [I .. i -1 ], T [i .. j], and T [j + 1 .. n ], containing
the elements of T that are smaller than p, equal to p, and greater than p, respectively.
The values i and j should be returned by the pivoting procedure, not calculated before-
hand. Your algorithm should scan T once only, and no auxiliary arrays should be
used.
Problem 4.6.2. Using ideas from the iterative version of binary searching seen
in Section 4.3 and the pivoting procedure of the previous problem, give a nonrecursive
version of the selection algorithm. Do not use any auxiliary arrays. Your algorithm is
allowed to alter the initial order of the elements of T.
Which element of T should we use as the pivot p ? The natural choice is surely
the median of T, so that the sizes of the arrays U and V will be as similar as possible
(even if at most one of these two arrays will actually be used in a recursive call).
Suppose first that the median can be obtained by magic at unit cost. For the time
being, therefore, the algorithm works by simplification, not by divide-and-conquer. To
analyse the efficiency of the selection algorithm, notice first that, by definition of the
median, u < ⌈n/2⌉ and v ≥ ⌈n/2⌉. Consequently, n − v ≤ ⌊n/2⌋. If there is a recursive
call, the arrays U and V therefore contain a maximum of ⌊n/2⌋ elements. The
remaining operations, still supposing that the median can be obtained magically, take a
time in O (n). Let tm (n) be the time required by this method in the worst case to find
the k th smallest element of an array of at most n elements, independently of the value
of k. We have tm(n) ∈ O(n) + max{ tm(i) | i ≤ ⌊n/2⌋ }.
Problem 4.6.4. Show that tm (n) is in O (n).
Problem 4.6.6. Show, however, that in the worst case this algorithm requires
quadratic time.
This quadratic worst case can be avoided without sacrificing linear behaviour on
the average : the idea is to find quickly a good approximation to the median. This can
be done with a little cunning. Assuming n ≥ 5, consider the following algorithm :
function pseudomed (T[1..n])
  { finds an approximation to the median of array T }
  s ← n div 5
  array S[1..s]
  for i ← 1 to s do S[i] ← adhocmed5 (T[5i−4 .. 5i])
  return selection (S, (s+1) div 2) ,
where adhocmed 5 is an algorithm specially designed to find the median of exactly five
elements. Note that the time taken by adhocmed5 is bounded above by a constant.
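In Python the same idea can be sketched as follows, reusing the selection function above; adhocmed5 is realized here simply by sorting each group of five, which is enough since each group has constant size.

    def adhocmed5(group):
        """Median of at most five elements, found by brute force."""
        return sorted(group)[(len(group) - 1) // 2]

    def pseudomed(T):
        """An approximation to the median of T (assumes len(T) >= 5)."""
        s = len(T) // 5
        S = [adhocmed5(T[5 * i:5 * i + 5]) for i in range(s)]
        return selection(S, (s + 1) // 2, choose_pivot=pseudomed)

    # worst-case linear selection of the kth smallest:
    #   selection(T, k, choose_pivot=pseudomed)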
We look first at the value of the approximation to the median found by the algo-
rithm pseudomed. Let m be this approximation. Since m is the exact median of the
array S, we have
#{ i ∈ [1..s] | S[i] ≤ m } ≥ ⌈s/2⌉ .
But each element of S is the median of five elements of T. Consequently, for every i
such that S[i] ≤ m, there are three indices i₁, i₂, i₃ between 5i−4 and 5i such that
T[i₁] ≤ T[i₂] ≤ T[i₃] = S[i] ≤ m. Therefore
#{ i ∈ [1..n] | T[i] ≤ m } ≥ 3⌈s/2⌉ = 3⌈⌊n/5⌋/2⌉ ≥ (3n−12)/10 .
Let n be the number of elements in T, and let t (n) be the time required in the worst
case by this algorithm to find the k th smallest element of T, still independently of
the value of k. At the first step, calculating pseudomed (T) takes a time in
O(n) + t(⌊n/5⌋), because the array S can be constructed in linear time. Calculating u
and v also takes linear time. Problem 4.6.7 and the preceding discussion show that
u ≤ (7n − 3)/10 and v ≥ (3n − 12)/10, so n − v ≤ (7n + 12)/10. The recursive call that
may follow therefore takes a time bounded above by

max { t(i) | i ≤ (7n+12)/10 } .
The initial preparation of the arrays U and V takes linear time. Hence, there exists a
constant c such that
t(n) ≤ t(⌊n/5⌋) + max{ t(i) | i ≤ (7n+12)/10 } + cn
for every sufficiently large n.
This equation looks quite complicated. First let us solve a more general, yet
simpler problem of the same type.
* Problem 4.6.9. Let p and q be two positive real constants such that p + q < 1,
let n₀ be a positive integer, and let b be some positive real constant. Let f : ℕ → ℝ*
be any function such that

f(n) = f(⌊pn⌋) + f(⌊qn⌋) + bn

for every n > n₀. Use constructive induction to prove that f(n) ∈ O(n). □
Problem 4.6.10. Let t (n) be the time required in the worst case to find the k th
smallest element in an array of n elements using the selection algorithm discussed ear-
lier. Give explicitly a nondecreasing function f(n) defined as in Problem 4.6.9 (with
p = 1/5 and q = 3/4) such that t(n) ≤ f(n) for every integer n. Conclude that
t(n) ∈ O(n). Argue that t(n) ∈ Ω(n), and thus t(n) ∈ Θ(n).
tice, even though this does not constitute an iterative algorithm : it avoids calculating u
and v beforehand and using two auxiliary arrays U and V. To use still less auxiliary
space, we can also construct the array S (needed to calculate the pseudomedian) by
exchanging elements inside the array T itself.
In most of the preceding analyses we have taken it for granted that addition and multi-
plication are elementary operations, that is, the time required to execute these opera-
tions is bounded above by a constant that depends only on the speed of the circuits in
the computer being used. This is only reasonable if the size of the operands is such
that they can be handled directly by the hardware. For some applications we have to
consider very large integers. Representing these numbers in floating-point is not useful
unless we are concerned solely with the order of magnitude and a few of the most
significant figures of our results. If results have to be calculated exactly and all the
figures count, we are obliged to implement the arithmetic operations in software.
This was necessary, for instance, when the Japanese calculated the first 134 mil-
lion digits of π in early 1987. (At the very least, this feat constitutes an excellent
aerobic exercise for the computer !) The algorithm developed in this section is not,
alas, sufficiently efficient to be used with such operands (see Chapter 9 for more on
this). From a more practical point of view, large integers are of crucial importance in
cryptology (Section 4.8).
Problem 4.7.1. Design a good data structure for representing large integers on
a computer. Your representation should use a number of bits in O (n) for an integer
that can be expressed in n decimal digits. It must also allow negative numbers to be
represented, and it must be possible to carry out in linear time multiplications and
integer divisions by positive powers of 10 (or another base if you prefer), as well as
additions and subtractions.
Problem 4.7.2. Give an algorithm able to add an integer with m digits and an
integer with n digits in a time in O(m + n).
each of these operands into two parts of as near the same size as possible :
u = 10^s w + x and v = 10^s y + z, where 0 ≤ x < 10^s, 0 ≤ z < 10^s, and s = ⌊n/2⌋.
The integers w and y therefore both have ⌈n/2⌉ digits. See Figure 4.7.1. (For convenience,
we say that an integer has j digits if it is smaller than 10^j, even if it is not
greater than or equal to 10^(j−1).)

Figure 4.7.1 splits u into a high-order part w and a low-order part x, and v into y and z. The product is then

uv = 10^(2s) wy + 10^s (wz + xy) + xz .
We obtain the following algorithm.
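A minimal sketch in Python of a multiplication routine based directly on this decomposition — four recursive half-length multiplications combined by the formula above — could look as follows; the operands are ordinary Python integers, and the helper digits and the cutoff of four digits are choices made for this sketch.

    def digits(x):
        """Number of decimal digits of x (at least 1)."""
        return max(1, len(str(abs(x))))

    def mult4(u, v):
        """Multiply u and v with four recursive half-length multiplications."""
        n = max(digits(u), digits(v))
        if n <= 4:                       # small operands: hardware multiplication
            return u * v
        s = n // 2
        w, x = divmod(u, 10 ** s)        # u = 10^s * w + x
        y, z = divmod(v, 10 ** s)        # v = 10^s * y + z
        return (10 ** (2 * s) * mult4(w, y)
                + 10 ** s * (mult4(w, z) + mult4(x, y))
                + mult4(x, z))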
Let tb (n) be the time required by this algorithm in the worst case to multiply two
n-digit integers. If we use the representation suggested in Problem 4.7.1 and the algorithms
of Problem 4.7.2, the integer divisions and multiplications by 10^(2s) and 10^s,
as well as the additions, are executed in linear time. The same is true of the modulo
operations, since these are equivalent to an integer division, a multiplication,
and a subtraction. The last statement of the algorithm consists of four recursive
calls, each of which serves to multiply two integers whose size is about n/2. Thus
tb(n) ∈ 3tb(⌈n/2⌉) + tb(⌊n/2⌋) + O(n). This equation becomes tb(n) ∈ 4tb(n/2)
+ O(n) when n is a power of 2. By Example 2.3.7 the time taken by the preceding
algorithm is therefore quadratic, so we have not made any improvement compared to
the classic algorithm. In fact, we have only managed to increase the hidden constant !
The trick that allows us to speed things up consists of calculating wy, wz +xy,
and xz by executing less than four half-length multiplications, even if this means that
we have to do more additions. This is sensible because addition is much faster than
multiplication when the operands are large. Consider the product
r =(w+x)(y+z)=wy +(wz+xy)+xz .
After only one multiplication this includes the three terms we need in order to calculate
uv. Two other multiplications are needed to isolate these terms. This suggests we
should replace the last statement of the algorithm by
r ← mult (w+x, y+z)
p ← mult (w, y) ; q ← mult (x, z)
return 10^(2s) p + 10^s (r − p − q) + q
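As a sketch, the modified algorithm looks like this in Python, under the same conventions as the earlier sketch; only the combining step changes.

    def mult3(u, v):
        """Multiply u and v with three recursive half-length multiplications."""
        n = max(digits(u), digits(v))
        if n <= 4:
            return u * v
        s = n // 2
        w, x = divmod(u, 10 ** s)
        y, z = divmod(v, 10 ** s)
        r = mult3(w + x, y + z)          # (w+x)(y+z) = wy + (wz+xy) + xz
        p = mult3(w, y)
        q = mult3(x, z)
        return 10 ** (2 * s) * p + 10 ** s * (r - p - q) + q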
Let t(n) be the time required by the modified algorithm to multiply two integers
of size at most n. Taking account of the fact that w + x and y + z may have up to
1 + ⌈n/2⌉ digits, we find that there exist constants c ∈ ℝ⁺ and n₀ ∈ ℕ such that

t(n) ≤ t(⌊n/2⌋) + t(⌈n/2⌉) + t(1 + ⌈n/2⌉) + cn

for every n > n₀.
** Problem 4.7.5. Rework Problem 2.2.11 (analysis of the algorithm fib3) in the
context of our new multiplication algorithm, and compare your answer once again
to Example 2.2.8.
At first glance this algorithm seems to multiply two n-digit numbers in a time in the
order of n^a, where a = 1 + (lg lg n)/lg n, that is, in a time in O(n log n). Find two
fundamental errors in this analysis of supermul.
Although the idea tried in Problem 4.7.10 does not work, it is nevertheless pos-
sible to multiply two n-digit integers in a time in O(n log n log log n) by separating
each operand to be multiplied into about √n parts of about the same size and using
Fast Fourier Transforms (Section 9.5).
4.8 EXPONENTIATION :
AN INTRODUCTION TO CRYPTOLOGY
Alice and Bob do not initially share any common secret information. For some reason
they wish to establish such a secret. Their problem is complicated by the fact that the
only way they can communicate is by using a telephone, which they suspect is being
tapped by Eve, a malevolent third party. They do not want Eve to be privy to their
newly exchanged secret. To simplify the problem, we assume that, although Eve can
overhear conversations, she can neither add nor modify messages on the communica-
tions line.
** Problem 4.8.1. Find a protocol by which Alice and Bob can attain their ends.
(If you wish to think about the problem, delay reading the rest of this section!)
A first solution to this problem was given in 1976 by Diffie and Hellman.
Several other protocols have been proposed since. As a first step, Alice and Bob agree
openly on some integer p with a few hundred decimal digits, and on some other integer
g between 2 and p - 1. The security of the secret they intend to establish is not
compromised should Eve learn these two numbers.
At the second step Alice and Bob choose randomly and independently of each
other two positive integers A and B less than p. Next Alice computes a = g^A mod p
and transmits this result to Bob ; similarly, Bob sends Alice the value b = g^B mod p.
Finally, Alice computes x = b^A mod p and Bob calculates y = a^B mod p. Now x = y
since both are equal to g^(AB) mod p. This value is therefore a piece of information
shared by Alice and Bob. Clearly, neither of them can control directly what this value
will be. They cannot therefore use this protocol to exchange directly a message chosen
beforehand by one or the other of them. Nevertheless, the secret value exchanged can
now be used as the key in a conventional cryptographic system.
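The exchange is easy to simulate in Python with the built-in pow, which computes modular exponentiation; the small p and g below are placeholders chosen only to keep the example readable and offer no security.

    import secrets

    p = 0xFFFFFFFB          # a small prime (2^32 - 5), for illustration only
    g = 5                   # a base agreed on openly

    A = secrets.randbelow(p - 1) + 1    # Alice's secret exponent, 1 <= A < p
    B = secrets.randbelow(p - 1) + 1    # Bob's secret exponent, 1 <= B < p

    a = pow(g, A, p)        # Alice -> Bob over the tapped line
    b = pow(g, B, p)        # Bob -> Alice over the tapped line

    x = pow(b, A, p)        # computed by Alice
    y = pow(a, B, p)        # computed by Bob
    assert x == y           # both equal g^(A*B) mod p : the shared secret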
exchanged by the protocol could take on any value between 1 and p - 1. Prove, how-
ever, that regardless of the choice of p and g some secrets are more likely to be chosen
than others, even if A and B are chosen randomly with uniform probability between 1
and p − 1.
At the end of the exchange Eve has been able to obtain directly the values of p,
g, a, and b only. One way for her to deduce x would be to find an integer A' such
that a = g^A' mod p, and then to proceed like Alice to calculate x' = b^A' mod p. If p is
an odd prime, g a generator of Z*_p, and 1 ≤ A' < p, then A' is necessarily equal to A,
and so x' = x and the secret is correctly computed by Eve in this case.
Problem 4.8.3. Show that even if A ≠ A', still b^A mod p = b^A' mod p provided
that g^A mod p = g^A' mod p and that there exists a B such that b = g^B mod p.
The value x' calculated by Eve in this way is therefore always equal to the value x
shared by Alice and Bob.
Calculating A' from p, g, and a is called the problem of the discrete logarithm.
There exists an obvious algorithm to solve it. (If the logarithm does not exist, the
algorithm returns the value p. For instance, there is no integer A such that
3 = 2^A mod 7.)
function dlog (g, a, p)
  A ← 0 ; x ← 1
  repeat
    A ← A + 1
    x ← xg
  until (a = x mod p) or (A = p)
  return A
This algorithm takes an unacceptable amount of time, since it makes p/2 trips round
the loop on the average when the conditions of Problem 4.8.2 hold. If each trip round
the loop takes 1 microsecond, this average time is more than the age of Earth even if p
only has two dozen decimal digits. Although there exist other more efficient algo-
rithms for calculating discrete logarithms, none of them is able to solve a randomly
chosen instance in a reasonable amount of time when p is a prime with several hundred
decimal digits. Furthermore, there is no known way of recovering x from p, g, a, and
b that does not involve calculating a discrete logarithm. For the time being, it seems
therefore that this method of providing Alice and Bob with a shared secret is sound,
although no one has yet been able to prove this.
An attentive reader may wonder whether we are pulling his (or her) leg. If Eve
needs to be able to calculate discrete logarithms efficiently to discover the secret shared
by Alice and Bob, it is equally true that Alice and Bob must be able to calculate
efficiently exponentiations of the form a = g A mod p. The obvious algorithm for this
is no more subtle or efficient than the one for discrete logarithms.
function dexpo1 (g, A, p)
  a ← 1
  for i ← 1 to A do a ← ag
  return a mod p
The fact that xyz mod p = ((xy mod p) · z) mod p for every x, y, z, and p allows us
to avoid accumulation of extremely large integers in the loop. (The same improvement
can be made in dlog, which is necessary if we hope to execute each trip round the loop
in 1 microsecond.)
function dexpo2 (g, A, p)
  a ← 1
  for i ← 1 to A do a ← ag mod p
  return a
* Problem 4.8.4. Analyse and compare the execution times of dexpo1 and
dexpo2 as a function of the value of A and of the size of p. For simplicity, suppose
that g is approximately equal to p/2. Use the classic algorithm for multiplying large
integers. Repeat the problem using the divide-and-conquer algorithm from Section 4.7
for the multiplications. In both cases, assume that calculating a modulo takes a time in
the exact order of that required for multiplication. □
Happily for Alice and Bob, there exists a more efficient algorithm for computing
the exponentiation. An example will make the basic idea clear.
x^25 = (((x²x)²)²)² x
Thus x25 can be obtained with just two multiplications and four squarings. We leave
the reader to work out the connection between 25 and the sequence of bits 11001
obtained from the expression (((x²x)²1)²1)²x by replacing every x by a 1 and every
1 by a 0.
The preceding formula for x^25 arises because x^25 = x^24 x, x^24 = (x^12)², and so on.
This idea can be generalized to obtain a divide-and-conquer algorithm.
function dexpo (g, A, p)
  if A = 0 then return 1
  if A is odd then a ← dexpo (g, A−1, p)
                   return (ag mod p)
  else a ← dexpo (g, A/2, p)
       return (a² mod p)
Let h (A) be the number of multiplications modulo p carried out when we calcu-
late dexpo (g , A , p ), including the squarings. These operations dominate the execution
time of the algorithm, which consequently takes a time in O (h (A) x M (p )), where
M (p) is an upper bound on the time required to multiply two positive integers less
than p and to reduce the result modulo p. By inspection of the algorithm we find
h(A) =  0               if A = 0
        1 + h(A−1)      if A is odd
        1 + h(A/2)      otherwise .
Problem 4.8.5. Find an explicit formula for h (A) and prove your answer by
mathematical induction. (Do not try to use characteristic equations.)
Without answering Problem 4.8.5, let us just say that h (A) is situated between
once and twice the length of the binary representation of A, provided A ≥ 1. This
means that Alice and Bob can use numbers p, A and B of 200 decimal digits each and
still finish the protocol after less than 3,000 multiplications of 200-digit numbers and
3,000 computations of a 400-digit number modulo a 200-digit number, which is
entirely reasonable. More generally, the computation of dexpo (g, A, p) takes a time
in O(M(p) × log A). As was the case for binary searching, the algorithm dexpo only
requires one recursive call on a smaller instance. It is therefore an example of
simplification rather than of divide-and-conquer. This recursive call is not at the
dynamic end of the algorithm, which makes it harder to find an iterative version.
Nonetheless, there exists a similar iterative algorithm, which corresponds intuitively to
calculating x25 as x 16x8x 1.
function dexpoiter (g, A, p)
  n ← A ; y ← g ; a ← 1
  while n > 0 do
    if n is odd then a ← ay mod p
    y ← y² mod p
    n ← n div 2
  return a
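A Python transcription of dexpoiter, together with a quick check against the built-in pow, might look like this:

    def dexpoiter(g, A, p):
        """Compute g^A mod p by iterative squaring (right-to-left)."""
        n, y, a = A, g, 1
        while n > 0:
            if n % 2 == 1:          # this bit of A contributes the current power of g
                a = a * y % p
            y = y * y % p
            n //= 2
        return a

    assert dexpoiter(5, 25, 101) == pow(5, 25, 101)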
Problem 4.8.6. The algorithms dexpo and dexpoiter do not minimize the
number of multiplications (including squarings) required. For example, dexpo calcu-
lates x^15 as (((1x)²x)²x)²x, that is, with seven multiplications. On the other hand, dexpoiter
calculates x^15 as 1·x·x²·x⁴·x⁸, which involves eight multiplications (the last being
a useless computation of x^16). In both cases the number of multiplications can easily
be reduced to six by avoiding pointless multiplications by the constant 1 and the last
squaring carried out by dexpoiter. Show that in fact x 15 can be calculated with only
five multiplications.
take when the classic multiplication algorithm is used? Rework the problem using the
divide-and-conquer multiplication algorithm from Section 4.7.
The preceding problem shows that it is sometimes not sufficient to be only half-
clever !
Let A and B be two n x n matrices to be multiplied, and let C be their product. The
classic algorithm comes directly from the definition :
It is therefore possible to multiply two 2 × 2 matrices using only seven scalar multiplications.
At first glance, this algorithm does not look very interesting : it uses a large
number of additions and subtractions compared to the four additions that are sufficient
for the classic algorithm.
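For concreteness, here is one standard choice of seven products (Strassen's), sketched in Python for 2 × 2 matrices and checked against the classic definition; other equivalent formulations of the seven products exist, so this is only an illustration of the counting argument.

    def strassen_2x2(A, B):
        """Multiply 2x2 matrices A and B using seven scalar multiplications."""
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return ((m1 + m4 - m5 + m7, m3 + m5),
                (m2 + m4, m1 - m2 + m3 + m6))

    # check against the classic definition on one example
    A = ((1, 2), (3, 4))
    B = ((5, 6), (7, 8))
    classic = tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                          for j in range(2)) for i in range(2))
    assert strassen_2x2(A, B) == classic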
If we now replace each entry of A and B by an n x n matrix, we obtain an algo-
rithm that can multiply two 2n x 2n matrices by carrying out seven multiplications of
n x n matrices, as well as a number of additions and subtractions of n x n matrices.
This is possible because the basic algorithm does not rely on the commutativity of
scalar multiplication. Given that matrix additions can be executed much faster than
matrix multiplications, the few additional additions compared to the classic algorithm
are more than compensated by saving one multiplication, provided n is sufficiently
large.
T(i, j) =  i                  if i = j
           j + T(i−j, j)      if i > j
           i + T(i, j−i)      if i < j
For instance, Figure 4.10.2 shows how a block of three elements and a block of eight
elements are transposed. We have
T(i,j)=i + j -gcd(i,j),
where gcd(i, j) denotes the greatest common divisor of i and j (Section 1.7.4).
Figure 4.10.2. Transposing a block of three elements and a block of eight elements :

a b c d e f g h i j k
  exchange (1, 9, 3)
i j k d e f g h a b c
  exchange (1, 6, 3)
f g h d e i j k a b c
  exchange (1, 4, 2)
d e h f g i j k a b c
  exchange (3, 5, 1)
d e g f h i j k a b c
  exchange (3, 4, 1)
d e f g h i j k a b c
F = ( 0  1 )
    ( 1  1 )
Let i and j be any two integers. What is the product of the vector (i , j) and the matrix
F? What happens if i and j are two consecutive numbers from the Fibonacci
sequence? Use this idea to invent a divide-and-conquer algorithm to calculate this
sequence. Does this help you to understand the algorithm fib3 of Section 1.7.5 ?
assume that n is a power of 2. Exactly how many comparisons does your algorithm
require? How would you handle the situation when n is not a power of 2 ?
Problem 4.11.7. If you could not manage the previous problem, try again, but
allow your algorithm to take a time in 0 (n log n).
i. Using full adders and (i , j )-adders as primitive elements, show how to build an
efficient n-tally. You may not suppose that n has any special form.
ii. Give the recurrence, including the initial conditions, for the number of
3-tallies needed to build your n-tally. Do not forget to count the 3-tallies that are
part of any (i , j )-adders you might have used.
iii. Using the O notation, give the simplest expression you can for the number of
3-tallies needed in the construction of your n-tally. Justify your answer.
(Figure : a merge circuit that combines a first sorted group of inputs and a second sorted group of inputs into a single sequence of sorted outputs, together with a trace of the circuit on a small example.)
whenever n is sufficiently large. You may use merge circuits to your heart's content,
but their depth and size must then be taken into account. Give recurrences, including
the initial conditions, for the size T and the depth P of your circuit Sₙ. Solve these
equations exactly, and express T and P in O notation as simply as possible. □
** Problem 4.11.12. Continuing the two previous problems, show that it is pos-
sible to construct a sorting circuit for n elements whose size and depth are in
O(n log n) and O(log n), respectively. □
                  Player
                  1   2   3   4   5
n = 5   Day  1    2   1   -   5   4
             2    3   5   1   -   2
             3    4   3   2   1   -
             4    5   -   4   3   1
             5    -   4   5   2   3

                  Player
                  1   2   3   4   5   6
n = 6   Day  1    2   1   6   5   4   3
             2    3   5   1   6   2   4
             3    4   3   2   1   6   5
             4    5   6   4   3   1   2
             5    6   4   5   2   3   1
Moreover, each competitor must play exactly one match every day, with the possible
exception of a single day when he does not play at all.
** Problem 4.11.14. Closest pair of points. You are given the coordinates of n
points in the plane. Give an algorithm capable of finding the closest pair
of points in a time in O (n log n ).
4.12 REFERENCES AND FURTHER READING

Quicksort is from Hoare (1962). Mergesort and quicksort are discussed in detail in
Knuth (1973). Problem 4.4.2 was solved by Kronrod; see the solution to Exercise 18
of Section 5.2.4 of Knuth (1973). The algorithm linear in the worst case for selection
and for finding the median is due to Blum, Floyd, Pratt, Rivest, and Tarjan (1972).
The algorithm for multiplying large integers in a time in O(n^1.59) is attributed to
Karatsuba and Ofman (1962). The answer to Problems 4.7.7 and 4.7.8 is discussed in
Knuth (1969). The survey article by Brassard, Monet, and Zuffellato (1986) covers
computation with very large integers.
The algorithm that multiplies two n × n matrices in a time in O(n^2.81) comes
from Strassen (1969). Subsequent efforts to do better than Strassen's algorithm began
with the proof by Hopcroft and Kerr (1971) that seven multiplications are necessary to
multiply two 2x2 matrices in a non-commutative structure ; the first positive success
was obtained by Pan (1978), and the algorithm that is asymptotically the most efficient
known at present is by Coppersmith and Winograd (1987). The algorithm of Section
4.10 comes from Gries (1981).
The original solution to Problem 4.8.1 is due to Diffie and Hellman (1976). The
importance for cryptology of the arithmetic of large integers and of the theory of
numbers was pointed out by Rivest, Shamir, and Adleman (1978). For more informa-
tion about cryptology, consult the introductory papers by Gardner (1977) and Hellman
(1980) and the books by Kahn (1967), Denning (1983), Kranakis (1986), and Brassard
(1988). Bear in mind, however, that the cryptosystem based on the knapsack problem,
as described in Hellman (1980), has since been broken. Notice too that the integers
involved in these applications are not sufficiently large for the algorithms of Section
4.7 to be worthwhile. On the other hand, efficient exponentiation as described in Sec-
tion 4.8 is crucial. The natural generalization of Problem 4.8.6 is examined in Knuth
(1969).
The solution to Problem 4.11.1 can be found in Gries and Levin (1980) and
Urbanek (1980). Problem 4.11.3 is discussed in Pohl (1972) and Stinson (1985).
Problems 4.11.10 and 4.11.11 are solved in Batcher (1968). Problem 4.11.12 is
solved, at least in principle, in Ajtai, Komlos, and Szemeredi (1983). Problem 4.11.14
is solved in Bentley and Shamos (1976), but consult Section 8.7 for more on this
problem.
5
Dynamic Programming
5.1 INTRODUCTION
In the last chapter we saw that it is often possible to divide an instance into subin-
stances, to solve the subinstances (perhaps by further dividing them), and then to com-
bine the solutions of the subinstances so as to solve the original instance. It sometimes
happens that the natural way of dividing an instance suggested by the structure of the
problem leads us to consider several overlapping subinstances. If we solve each of
these independently, they will in turn create a large number of identical subinstances.
If we pay no attention to this duplication, it is likely that we will end up with an
inefficient algorithm. If, on the other hand, we take advantage of the duplication and
solve each subinstance only once, saving the solution for later use, then a more
efficient algorithm will result. The underlying idea of dynamic programming is thus
quite simple : avoid calculating the same thing twice, usually by keeping a table of
known results, which we fill up as subinstances are solved.
Dynamic programming is a bottom-up technique. We usually start with the
smallest, and hence the simplest, subinstances. By combining their solutions, we
obtain the answers to subinstances of increasing size, until finally we arrive at the solu-
tion of the original instance. Divide-and-conquer, on the other hand, is a top-down
method. When a problem is solved by divide-and-conquer, we immediately attack the
complete instance, which we then divide into smaller and smaller subinstances as the
algorithm progresses.
Consider the problem of calculating the binomial coefficient

C(n, k) = C(n−1, k−1) + C(n−1, k)   if 0 < k < n
C(n, k) = 1                          if k = 0 or k = n .
If we calculate C(n, k) directly by

function C (n, k)
  if k = 0 or k = n then return 1
  else return C (n−1, k−1) + C (n−1, k)
many of the values C(i, j), i < n, j < k, are calculated over and over. Since the final
result is obtained by adding up a certain number of 1s, the execution time of this algorithm
is certainly in Ω(C(n, k)). We have already met a similar phenomenon in algorithm
fib1 for calculating the Fibonacci sequence (see Section 1.7.5 and Example
2.2.7).
If, on the other hand, we use a table of intermediate results (this is of course
Pascal's triangle ; see Figure 5.1.1), we obtain a more efficient algorithm. The table
should be filled line by line. In fact, it is not even necessary to store a matrix : it is
sufficient to keep a vector of length k, representing the current line, which we update
from left to right. Thus the algorithm takes a time in O(nk) and space in O(k), if we
assume that addition is an elementary operation.
          0   1   2   3   ...   k−1   k
    0     1
    1     1   1
    2     1   2   1
    ⋮
   n−1                  C(n−1, k−1)   C(n−1, k)
    n                                 C(n, k)

Figure 5.1.1. Pascal's triangle.
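The vector-of-length-k version described above can be sketched in Python as follows; the saved value prev is what allows the row to be updated from left to right without a second array.

    def binomial(n, k):
        """C(n, k) computed with a single vector of length k+1."""
        C = [1] + [0] * k                 # row 0 of Pascal's triangle
        for i in range(1, n + 1):
            prev = C[0]                   # holds C(i-1, j-1) for the next position
            for j in range(1, min(i, k) + 1):
                prev, C[j] = C[j], prev + C[j]   # C(i, j) = C(i-1, j-1) + C(i-1, j)
        return C[k]

    assert binomial(5, 2) == 10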
Problem 5.1.1. Prove that the total number of recursive calls made during the
computation of C(n, k) is exactly 2 C(n, k) − 2.
Example 5.1.2. If the shortest route from Montreal to Toronto goes via
Kingston, then that part of the journey from Montreal to Kingston must also follow the
shortest route between these two cities : the principle of optimality applies. However,
if the fastest way to drive from Montreal to Toronto takes us first to Kingston, it does
not follow that we should drive from Montreal to Kingston as quickly as possible : if
we use too much petrol on the first half of the trip, maybe we have to stop to fill up
somewhere on the second half, losing more time than we gained by driving hard. The
subtrips Montreal-Kingston and Kingston-Toronto are not independent, and the prin-
ciple of optimality does not apply.
Problem 5.1.3. Show that the principle of optimality does not apply to the
problem of finding the longest simple path between two cities. Argue that this is due
to the fact that one cannot in general splice two simple paths together and expect to
obtain a simple path. (A path is simple if it never passes through the same place twice.
Without this restriction the longest path might be an infinite loop.)
The principle of optimality can be restated as follows for those problems for
which it applies : the optimal solution to any nontrivial instance is a combination of
optimal solutions to some of its subinstances. The difficulty in turning this principle
into an algorithm is that it is not usually obvious which subinstances are relevant to the
instance under consideration. Coming back to Example 5.1.2, it is not immediately
obvious that the subinstance consisting of finding the shortest route from Montreal to
Ottawa is irrelevant to the shortest route from Montreal to Toronto. This difficulty
prevents us from using a divide-and-conquer approach that would start from the ori-
ginal instance and recursively find optimal solutions precisely to those relevant subin-
stances. Instead, dynamic programming efficiently solves every possible subinstance in
order to figure out which are in fact relevant, and only then are these combined into an
optimal solution to the original instance.
5.2 THE WORLD SERIES

As our first example of dynamic programming, let us not worry about the principle of
optimality, but rather concentrate on the control structure and the order of resolution of
the subinstances. For this reason the problem considered in this section is not one of
optimization.
Imagine a competition in which two teams A and B play not more than 2n -1
games, the winner being the first team to achieve n victories. We assume that there
are no tied games, that the results of each match are independent, and that for any
given match there is a constant probability p that team A will be the winner and hence
a constant probability q = 1 -p that team B will win.
Let P (i, j) be the probability that team A will win the series given that they still
need i more victories to achieve this, whereas team B still needs j more victories if
they are to win. For example, before the first game of the series the probability that
team A will be the overall winner is P (n , n) : both teams still need n victories. If
team A has already won all the matches it needs, then it is of course certain that they
will win the series : P(0, i) = 1, 1 ≤ i ≤ n. Similarly P(i, 0) = 0, 1 ≤ i ≤ n. P(0, 0) is
undefined. Finally, since team A wins any given match with probability p and loses it
with probability q,

P(i, j) = pP(i−1, j) + qP(i, j−1) ,   i ≥ 1, j ≥ 1 .
Thus we can compute P (i, j) using
function P (i, j)
if i = 0 then return 1
else if j = 0 then return 0
else return pP (i -1, j) + qP (i, j -1)
Let T (k) be the time needed in the worst case to calculate P (i, j), where k = i +j.
With this method, we see that
T(1) = c
T(k) ≤ 2T(k−1) + d ,   k > 1
where c and d are constants. T(k) is therefore in O(2^k), which is O(4^n) if i = j = n. In
fact, if we look at the way the recursive calls are generated, we find the pattern shown
in Figure 5.2.1, which is identical to that followed by the naive calculation of the bino-
mial coefficient C(i+j, j) = C((i−1)+j, j) + C(i+(j−1), j−1). The total
number of recursive calls is therefore exactly 2 C(i+j, j) − 2 (Problem 5.1.1). To calculate
the probability P(n, n) that team A will win given that the series has not yet started,
the required time is thus in Ω(C(2n, n)).
Figure 5.2.1. The pattern of recursive calls : with k matches left, P(i, j) calls P(i−1, j) and P(i, j−1), which in turn call P(i−2, j), P(i−1, j−1), and P(i, j−2) (k−2 matches left), and so on.
* Problem 5.2.2. Prove that in fact the time needed to calculate P(n, n) using
the preceding algorithm is in O(4^n / √n).
function series (n, p)
  array P[0..n, 0..n]
  q ← 1 − p
  for s ← 1 to n do
    P[0, s] ← 1 ; P[s, 0] ← 0
    for k ← 1 to s−1 do
      P[k, s−k] ← pP[k−1, s−k] + qP[k, s−k−1]
  for s ← 1 to n do
    for k ← 0 to n−s do
      P[s+k, n−k] ← pP[s+k−1, n−k] + qP[s+k, n−k−1]
  return P[n, n]
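A Python sketch of the same computation is given below; here the array is simply filled row by row, which is equally valid since each entry depends only on the entry above it and the entry to its left.

    def series(n, p):
        """Probability that team A wins a best-of-(2n-1) series,
        where A wins each match independently with probability p."""
        q = 1 - p
        # P[i][j]: probability A wins given A still needs i wins and B needs j
        P = [[0.0] * (n + 1) for _ in range(n + 1)]
        for j in range(1, n + 1):
            P[0][j] = 1.0               # team A has already won
            P[j][0] = 0.0               # team B has already won
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                P[i][j] = p * P[i - 1][j] + q * P[i][j - 1]
        return P[n][n]

    # e.g. Problem 5.2.3 asks for series(4, 0.45)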
Problem 5.2.3. Using this algorithm, calculate the probability that team A
will win the series if p = 0.45 and if four victories are needed to win.
Since in essence the algorithm has to fill up an n x n array, and since a constant
time is required to calculate each entry, its execution time is in O(n²).
Problem 5.2.5. Show how to compute P (n, n) in a time in O(n). (Hint : use a
completely different approach - see Section 8.6.)
M = ( ... ((M₁M₂)M₃) ... Mₙ )
The choice of a method of computation can have a considerable influence on the time
required.
Adding the initial condition T(1) = 1, we can thus calculate all the values of T.
Among other values, we find

n       1   2   3   4   5    10      15
T(n)    1   1   2   5   14   4,862   2,674,440

The values of T(n) are called Catalan numbers.
m₂₄ = min(m₂₂ + m₃₄ + 5·89·34 , m₂₃ + m₄₄ + 5·3·34)
    = min(24208, 1845) = 1,845

Finally, for s = 3,

m₁₄ = min( {k=1}  m₁₁ + m₂₄ + 13·5·34 ,
           {k=2}  m₁₂ + m₃₄ + 13·89·34 ,
           {k=3}  m₁₃ + m₄₄ + 13·3·34 )
    = min(4055, 54201, 2856) = 2,856 .
(Figure : the array m is filled diagonal by diagonal, s = 0, 1, 2, 3.)
Problem 5.3.4. How must the algorithm be modified if we want not only to
calculate the value of m₁ₙ , but also to know how to calculate the product M in the
most efficient way ?
For s > 0, there are n - s elements to be computed in the diagonal s ; for each,
we must choose among s possibilities (the different possible values of k). The execution
time of the algorithm is therefore in the exact order of

∑_{s=1}^{n−1} (n−s)s  =  n ∑_{s=1}^{n−1} s  −  ∑_{s=1}^{n−1} s²  =  (n³ − n)/6 ,
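The diagonal-by-diagonal computation can be sketched in Python as follows; d[0..n] holds the dimensions, matrix Mᵢ being d[i−1] × d[i], with the dimensions (13, 5, 89, 3, 34) of the example above used as a check.

    def chained_matrix_order(d):
        """Minimum number of scalar multiplications to compute M1 M2 ... Mn,
        where matrix Mi has dimensions d[i-1] x d[i]."""
        n = len(d) - 1
        m = [[0] * (n + 1) for _ in range(n + 1)]      # m[i][j], 1 <= i <= j <= n
        for s in range(1, n):                          # diagonal s = j - i
            for i in range(1, n - s + 1):
                j = i + s
                m[i][j] = min(m[i][k] + m[k + 1][j] + d[i - 1] * d[k] * d[j]
                              for k in range(i, j))
        return m[1][n]

    assert chained_matrix_order([13, 5, 89, 3, 34]) == 2856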
where the array d[0..n] is global, a call on minmat(1, n) takes a time in Θ(3^n). (Hint :
for the "O" part, use constructive induction to find constants a and b such that the
time taken by a call on minmat(1, n) is no greater than a3^n − b.)
Although a call on the recursive minmat (1, n) of Problem 5.3.5 is faster than
naively trying all possible ways to parenthesize the desired product, it is still much
slower than the dynamic programming algorithm described previously. This behaviour
illustrates a point made in the first paragraph of this chapter. In order to decide on the
best way to parenthesize the product ABCDEFG, minmat recursively solves 12 subin-
stances, including the overlapping ABCDEF and BCDEFG, both of which recursively
solve BCDEF from scratch. It is this duplication of effort that causes the inefficiency
of minmat.
5.4 SHORTEST PATHS

Let G = <N, A> be a directed graph ; N is the set of nodes and A is the set of
edges. Each edge has an associated nonnegative length. We want to calculate the
length of the shortest path between each pair of nodes. (Compare this to Section 3.2.2,
where we were looking for the length of the shortest paths from one particular node,
the source, to all the others.)
As before, suppose that the nodes of G are numbered from 1 to n,
N = {1, 2, ..., n}, and that a matrix L gives the length of each edge, with L[i, i] = 0,
L[i, j] ≥ 0 if i ≠ j, and L[i, j] = ∞ if the edge (i, j) does not exist.
The principle of optimality applies : if k is a node on the shortest path from i to
j, then that part of the path from i to k, and that from k to j, must also be optimal.
We construct a matrix D that gives the length of the shortest path between each
pair of nodes. The algorithm initializes D to L. It then does n iterations. After itera-
tion k, D gives the length of the shortest paths that only use nodes in {1, 2, ..., k} as
intermediate nodes. After n iterations we therefore obtain the result we want. At itera-
tion k, the algorithm has to check for each pair of nodes (i, j) whether or not there
exists a path passing through node k that is better than the present optimal path passing
only through nodes in { 1, 2, ... , k -1 }. Let Dk be the matrix D after the k th itera-
tion. The necessary check can be written as
Dₖ[i, j] = min(Dₖ₋₁[i, j] , Dₖ₋₁[i, k] + Dₖ₋₁[k, j]) ,
where we make use of the principle of optimality to compute the length of the shortest
path passing through k. We have also implicitly made use of the fact that an optimal
path through k does not visit k twice.
At the k th iteration the values in the k th row and the k th column of D do not
change, since D [k , k ] is always zero. It is therefore not necessary to protect these
values when updating D. This allows us to get away with using only a two-
dimensional matrix D, whereas at first sight a matrix n x n x 2 (or even n x n x n) seems
necessary.
The algorithm, known as Floyd's algorithm, follows.
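Following the description above, a minimal Python sketch of Floyd's algorithm (D initialized to a copy of L and updated in place over n iterations; INF stands for ∞, and 0-based indices are used) is :

    INF = float("inf")

    def floyd(L):
        """All-pairs shortest path lengths; L[i][j] is the edge length (INF if absent)."""
        n = len(L)
        D = [row[:] for row in L]                    # D starts as a copy of L
        for k in range(n):                           # allow node k as an intermediate
            for i in range(n):
                for j in range(n):
                    if D[i][k] + D[k][j] < D[i][j]:
                        D[i][j] = D[i][k] + D[k][j]
        return D

    L = [[0, 5, INF, INF],
         [50, 0, 15, 5],
         [30, INF, 0, 15],
         [15, INF, 5, 0]]
    assert floyd(L)[0] == [0, 5, 15, 10]             # first row of D4 in the example below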
For the graph whose matrix of edge lengths is

          0    5    ∞    ∞
          50   0    15   5
D₀ = L =  30   ∞    0    15
          15   ∞    5    0

the successive iterations give

      0    5    ∞    ∞            0    5    20   10
      50   0    15   5            50   0    15   5
D₁ =  30   35   0    15     D₂ =  30   35   0    15
      15   20   5    0            15   20   5    0

      0    5    20   10           0    5    15   10
      45   0    15   5            20   0    10   5
D₃ =  30   35   0    15     D₄ =  30   35   0    15
      15   20   5    0            15   20   5    0
When the algorithm stops, P [i, j ] contains the number of the last iteration that caused
a change in D [i, j]. To recover the shortest path from i to j, look at P [i, j]. If
P [i, j] = 0, the shortest path is directly along the edge (i, j) ; otherwise, if
P [i, j ] = k, the shortest path from i to j passes through k. Look recursively at
P [i, k ] and P [k, j] to find any other intermediate nodes along the shortest path.
     0   0   4   2
     4   0   4   0
P =  0   1   0   0
     0   1   0   0

Since P[1, 3] = 4, the shortest path from 1 to 3 passes through 4. Looking now at
P[1, 4] and P[4, 3], we discover that between 1 and 4 we have to go via 2, but that
from 4 to 3 we proceed directly. Finally we see that the trips from 1 to 2 and from 2
to 4 are also direct. The shortest path from 1 to 3 is thus 1, 2, 4, 3.
ii. On a graph that has some edges whose lengths are negative, but that does not
include a negative cycle ?
Even if a graph has edges with negative length, the notion of a shortest simple
path still makes sense. No efficient algorithm is known for finding shortest simple
paths in graphs that may have edges of negative length. This is the situation we
encountered in Problem 5.1.3. These two problems are NP-complete (see Section
10.3).
Problem 5.4.2. Warshall's algorithm. In this case, the length of the edges is
of no interest ; only their existence is important. Initially, L [i, j] = true if the edge
(i, j) exists, and L [i , j] = false otherwise. We want to find a matrix D such that
D [i, j] = true if there exists at least one path from i to j, and D [i, j] = false other-
wise. (We are looking for the reflexive transitive closure of the graph G.) Adapt
Floyd's algorithm for this slightly different case. (We shall see an asymptotically
more efficient algorithm for this problem in Section 10.2.2.)
* Problem 5.4.3. Find a significantly better algorithm for Problem 5.4.2 in the
case when the matrix L is symmetric (L [i, j] = L [ j, i ]).
5.5 OPTIMAL SEARCH TREES

We begin by recalling the definition of a binary search tree. A binary tree each of
whose nodes contains a key is a search tree if the value contained in every internal
node is greater than or equal to (numerically or lexicographically) the values contained
in its left-hand descendants, and less than or equal to the values contained in its right-
hand descendants.
Problem 5.5.1. Show by an example that the following definition will not do :
"A binary tree is a search tree if the key contained in each internal node is greater than
or equal to the key of its left-hand child, and less than or equal to the key of its right-
hand child."
Figure 5.5.1 shows an example of a binary search tree containing the keys
A, B, C, ..., H. (For the rest of this section, search trees will be understood to be
binary.) To determine whether a key X is present in the tree, we first examine the key
held in the root. Suppose this key is R. If X=R, we have found the key we want, and
the search stops ; if X < R, we only need look at the left-hand subtree ; and if X > R,
we only need look at the right-hand subtree. A recursive implementation of this tech-
nique is obvious. (It provides an example of simplification : see chapter 4.)
Problem 5.5.2. Write a procedure that looks for a given key in a search tree
and returns true if the key is present and false otherwise.
The nodes may also contain further information related to the keys : in this case a
search procedure does not simply return true or false, but rather, the information
attached to the key we are looking for.
For a given set of keys, several search trees may be possible : for instance, the
tree in Figure 5.5.2 contains the same keys as those in Figure 5.5.1.
Problem 5.5.3. How many different search trees can be made with eight dis-
tinct keys ?
*Problem 5.5.4. If T (n) is the number of different search trees we can make
with n distinct keys, find either an explicit formula for T (n) or else an algorithm to
calculate this value. (Hint: reread Section 5.3.)
In Figure 5.5.1 two comparisons are needed to find the key E ; in Figure 5.5.2,
on the other hand, a single comparison suffices. If all the keys are sought with the
same probability, it takes (2+3+1+3+2+4+3+4)/8 = 22/8 comparisons on the
average to find a key in Figure 5.5.1, and (4+3+2+3+1+3+2+3)/8=21/8 com-
parisons on the average in Figure 5.5.2.
Problem 5.5.5. For the case when the keys are equiprobable, give a tree that
minimizes the average number of comparisons needed. Repeat the problem for the
general case of n equiprobable keys.
In fact, we shall solve a more general problem still. Suppose we have an ordered
set c₁ < c₂ < ··· < cₙ of n distinct keys. Let the probability that a request refers to
key cᵢ be pᵢ, i = 1, 2, ..., n. For the time being, suppose that ∑_{i=1}^{n} pᵢ = 1, that is,
all the requests refer to keys that are indeed present in the search tree.
Recall that the depth of the root of a tree is 0, the depth of its children is 1, and
so on. If some key cᵢ is held in a node at depth dᵢ, then dᵢ + 1 comparisons are necessary
to find it. For a given tree the average number of comparisons needed is

C = ∑_{i=1}^{n} pᵢ (dᵢ + 1) .
This is the function we seek to minimize.
Consider the sequence of successive keys cᵢ, cᵢ₊₁, ..., cⱼ, j ≥ i. Suppose that
in an optimal tree containing all the n keys this sequence of j−i+1 keys occupies the
nodes of a subtree. If the key cₖ, i ≤ k ≤ j, is held in a node of depth dₖ in the subtree,
the average number of comparisons carried out in this subtree when we look
for a key in the main tree (the key in question is not necessarily one of those held in
the subtree) is

∑_{k=i}^{j} pₖ (dₖ + 1) .
We observe that
We thus arrive at the principle of optimality : in an optimal tree all the subtrees must
also be optimal with respect to the keys they contain.
Let mᵢⱼ = ∑_{k=i}^{j} pₖ, and let Cᵢⱼ be the average number of comparisons carried out
in an optimal subtree containing the keys cᵢ, cᵢ₊₁, ..., cⱼ when a key is sought in
the main tree. (It is convenient to define Cᵢⱼ = 0 if j = i−1.) One of these keys, cₖ
say, must occupy the root of the subtree. In Figure 5.5.3, L is an optimal subtree containing
the keys cᵢ, cᵢ₊₁, ..., cₖ₋₁ and R is an optimal subtree containing
cₖ₊₁, ..., cⱼ. When we look for a key in the main tree, the probability that it is in
the sequence cᵢ, cᵢ₊₁, ..., cⱼ is mᵢⱼ. In this case one comparison is made with cₖ,
and others may then be made in L or R. The average number of comparisons carried
out is therefore

Cᵢⱼ = mᵢⱼ + Cᵢ,ₖ₋₁ + Cₖ₊₁,ⱼ ,
where the three terms are the contributions of the root, L and R, respectively.
Example 5.5.1. To find the optimal search tree if the probabilities associated
with five keys c1 to c5 are
i      1      2      3      4      5
pᵢ     0.30   0.05   0.08   0.45   0.12

we first calculate the matrix m.
Now, we note that Cᵢᵢ = pᵢ, 1 ≤ i ≤ 5, and next, we use (*) to calculate the other
values of Cᵢⱼ.
C₁₂ = m₁₂ + min(C₁₀ + C₂₂ , C₁₁ + C₃₂)
    = 0.35 + min(0.05, 0.30) = 0.40

Similarly

C₂₃ = 0.18 ,  C₃₄ = 0.61 ,  C₄₅ = 0.69 .

Then

C₁₃ = m₁₃ + min(C₁₀ + C₂₃ , C₁₁ + C₃₃ , C₁₂ + C₄₃)
    = 0.43 + min(0.18, 0.38, 0.40) = 0.61
In this algorithm we calculate the values of Cij first for j -i = 1, then for
j−i = 2, and so on. When j−i = m, there are n−m values of Cᵢⱼ to calculate, each
involving a choice among m+1 possibilities. The required computation time is therefore
in

Θ( ∑_{m=1}^{n−1} (n−m)(m+1) ) = Θ(n³) .
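A Python sketch of this computation is given below; it returns the matrix C, uses the probabilities of Example 5.5.1 as a check, and could record the root chosen for each subtree in the same loop if the tree itself is wanted.

    def optimal_search_tree(p):
        """C[i][j] = average comparisons in an optimal subtree for keys c_i..c_j.
        p[1..n] are the access probabilities (p[0] is unused)."""
        n = len(p) - 1
        m = [[0.0] * (n + 1) for _ in range(n + 2)]
        C = [[0.0] * (n + 1) for _ in range(n + 2)]   # C[i][i-1] = 0 by convention
        for i in range(1, n + 1):
            m[i][i] = C[i][i] = p[i]
            for j in range(i + 1, n + 1):
                m[i][j] = m[i][j - 1] + p[j]
        for s in range(1, n):                         # diagonal s = j - i
            for i in range(1, n - s + 1):
                j = i + s
                C[i][j] = m[i][j] + min(C[i][k - 1] + C[k + 1][j]
                                        for k in range(i, j + 1))
        return C

    C = optimal_search_tree([0, 0.30, 0.05, 0.08, 0.45, 0.12])
    assert abs(C[1][3] - 0.61) < 1e-9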
* Problem 5.5.9. Generalize the preceding argument to take account of the pos-
sibility that a request may involve a key that is not in fact in the tree. Specifically, let
pᵢ, i = 1, 2, ..., n, be the probability that a request concerns a key cᵢ that is in the
tree, and let qᵢ, i = 0, 1, 2, ..., n, be the probability that it concerns a missing key
situated between cᵢ and cᵢ₊₁ (with the obvious interpretation for q₀ and qₙ). We now
have

∑_{i=1}^{n} pᵢ + ∑_{i=0}^{n} qᵢ = 1 .
The optimal tree must minimize the average number of comparisons required to either
find a key, if it is present in the tree, or to ascertain that it is missing.
Give an algorithm that can determine the optimal search tree in this context.
Problem 5.5.11. Use the result of Problem 5.5.10 to show how to calculate an
optimal search tree in a time in O(n²). (Problems 5.5.10 and 5.5.11 generalize to the
case discussed in Problem 5.5.9.)
i. How much time does this algorithm take in the worst case, assuming the keys are
already sorted ?
ii. Show with the help of a simple, explicit example that this greedy algorithm does
not always find the optimal search tree. Give an optimal search tree for your
example, and calculate the average number of comparisons needed to find a key
for both the optimal tree and the tree found by the greedy algorithm.
5.6 THE TRAVELLING SALESPERSON PROBLEM

We have already met this problem in Section 3.4.2. Given a graph with nonnegative
lengths attached to the edges, we are required to find the shortest possible circuit that
begins and ends at the same node, after having gone exactly once through each of the
other nodes.
Let G = <N, A> be a directed graph. As usual, we take N = {1, 2, ..., n},
and the lengths of the edges are denoted by Lᵢⱼ, with L[i, i] = 0, L[i, j] ≥ 0 if i ≠ j,
and L[i, j] = ∞ if the edge (i, j) does not exist.
Suppose without loss of generality that the circuit begins and ends at node 1.
It therefore consists of an edge (1, j), j # 1, followed by a path from j to 1 that passes
exactly once through each node in N \ {1, j}. If the circuit is optimal (as short as pos-
sible), then so is the path from j to 1 : the principle of optimality holds.
Consider a set of nodes S ⊆ N \ {1} and a node i ∈ N \ S, with i = 1 allowed
only if S = N \ {1}. Define g(i, S) as the length of the shortest path from node i to
node 1 that passes exactly once through each node in S. Using this definition,
g(1, N \ {1}) is the length of an optimal circuit. By the principle of optimality, we see
that

g(1, N \ {1}) = min_{2 ≤ j ≤ n} ( L₁ⱼ + g(j, N \ {1, j}) ) .   (*)

More generally, if i ≠ 1, S ≠ ∅, S ≠ N \ {1}, and i ∉ S,

g(i, S) = min_{j ∈ S} ( Lᵢⱼ + g(j, S \ {j}) ) .   (**)
Furthermore,
g(i, ∅) = Lᵢ₁ ,   i = 2, 3, ..., n .
The values of g (i, S) are therefore known when S is empty. We can apply (**) to cal-
culate the function g for all the sets S that contain exactly one node (other than 1);
then we can apply (**) again to calculate g for all the sets S that contain two nodes
(other than 1), and so on. Once the value of g(j, N \ {1, j}) is known for all the
nodes j except node 1, we can use (*) to calculate g(1, N \ {1}) and solve the
problem.
Example 5.6.1. Let G be the complete graph on four nodes given in Figure
5.6.1:
     0    10   15   20
     5    0    9    10
L =  6    13   0    12
     8    8    9    0
We initialize
g(2, ∅) = 5 ,  g(3, ∅) = 6 ,  g(4, ∅) = 8 .
To know where this circuit goes, we need an extra function : J (i , S) is the value
of j chosen to minimize g at the moment when we apply (*) or (**) to calculate
g(i,S).
Example 5.6.2. (Continuation of Example 5.6.1.) In this example we find

J(2, {3, 4}) = 4
J(3, {2, 4}) = 4
J(4, {2, 3}) = 2
J(1, {2, 3, 4}) = 2

and the optimal circuit is

1  →  J(1, {2, 3, 4}) = 2
   →  J(2, {3, 4}) = 4
   →  J(4, {3}) = 3
   →  1 .
The required computation time can be calculated as follows :
Problem 5.6.1. Verify that the space required to hold the values of g and J is
in Ω(n 2^n), which is not very practical either. □
Problem 5.6.2. The preceding analysis assumes that we can find in constant
time a value of g (j , S) that has already been calculated. Since S is a set, which data
structure do you suggest to hold the values of g ? With your suggested structure, how
much time is needed to access one of the values of g ?
Table 5.6.1 illustrates the dramatic increase in the time and space necessary as n
goes up. For instance, 20²2²⁰ microseconds is less than 7 minutes, whereas 20!
microseconds exceeds 77 thousand years.
function g (i, S)
  if S = ∅ then return L[i, 1]
  ans ← ∞
  for each j ∈ S do
    distviaj ← L[i, j] + g(j, S \ {j})
    if distviaj < ans then ans ← distviaj
  return ans .
Unfortunately, if we calculate g in this top-down way, we come up once more against
the problem outlined at the beginning of this chapter : most values of g are recalcu-
lated many times and the program is very inefficient. (In fact, it ends up back in
Ω((n−1)!).)
So how can we calculate g in the bottom-up way that characterizes dynamic pro-
gramming ? We need an auxiliary program that generates first the empty set, then all
the sets containing just one element from N \ {1}, then all the sets containing two elements
from N \ {1}, and so on. Although it is maybe not too hard to write such a gen-
erator, it is not immediately obvious how to set about it.
5.7 MEMORY FUNCTIONS

One easy way to take advantage of the simplicity of a recursive formulation
without losing the efficiency offered by dynamic programming is to use a memory
function. To the recursive function we add a table of the necessary size. Initially, all
the entries in this table hold a special value to indicate that they have not yet been cal-
culated. Thereafter, whenever we call the function, we first look in the table to see
whether it has already been evaluated with the same set of parameters. If so, we return
the value held in the table. If not, we go ahead and calculate the function. Before
returning the calculated value, however, we save it at the appropriate place in the table.
In this way it is never necessary to calculate the function twice for the same values of
its parameters.
For the algorithm of Section 5.6 let gtab be a table all of whose entries are ini-
tialized to -1 (since a distance cannot be negative). Formulated in the following way :
function g (i, S)
  if S = ∅ then return L[i, 1]
  if gtab[i, S] ≥ 0 then return gtab[i, S]
  ans ← ∞
  for each j ∈ S do
    distviaj ← L[i, j] + g(j, S \ {j})
    if distviaj < ans then ans ← distviaj
  gtab[i, S] ← ans
  return ans ,
the function g combines the clarity obtained from a recursive formulation and the
efficiency of dynamic programming.
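In Python, the same memory-function idea can be sketched by letting functools.lru_cache play the role of gtab and representing the set S as a frozenset; L is the matrix of Example 5.6.1, with nodes renumbered from 0.

    from functools import lru_cache

    L = [[0, 10, 15, 20],
         [5, 0, 9, 10],
         [6, 13, 0, 12],
         [8, 8, 9, 0]]          # node 0 here plays the role of node 1 in the text
    n = len(L)

    @lru_cache(maxsize=None)
    def g(i, S):
        """Length of the shortest path from i back to node 0 passing
        exactly once through each node of the frozenset S."""
        if not S:
            return L[i][0]
        return min(L[i][j] + g(j, S - {j}) for j in S)

    tour_length = min(L[0][j] + g(j, frozenset(range(1, n)) - {j})
                      for j in range(1, n))
    assert tour_length == 35    # the circuit 1, 2, 4, 3, 1 of Example 5.6.2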
Problem 5.7.1. Show how to calculate (i) a binomial coefficient and (ii) the
function series (n , p) of Section 5.2 using a memory function.
We sometimes have to pay a price for using this technique. We saw in Section
5.1, for instance, that we can calculate a binomial coefficient C(n, k) using a time in O(nk)
and space in O(k). Implemented using a memory function, the calculation takes the
same amount of time but needs space in Ω(nk).
* Problem 5.7.2. If we are willing to use a little more space (the space needed
is only multiplied by a constant factor, however), it is possible to avoid the initializa-
tion time needed to set all the entries of the table to some special value. This is partic-
ularly desirable when in fact only a few values of the function are to be calculated, but
we do not know in advance which ones. (For an example, see Section 6.6.2.) Show
how an array T [1 .. n ] can be virtually initialized with the help of two auxiliary arrays
B [1 .. n ] and P [1 .. n ] and a few pointers. You should write three algorithms.
procedure init
{ virtually initializes T [1 .. n ] }
procedure store (i , v )
{ sets T [i ] to the value v }
function val (i)
{ returns the last value given to T [i ], if any ;
returns a default value (such as -1) otherwise }
A call on any of these procedures or functions (including a call on init !) should take
constant time in the worst case.
delete a character,
add a character,
change a character.
                    Right-hand symbol
                     a    b    c
Left-hand      a     b    b    a
symbol         b     c    b    a
               c     a    c    c
Problem 5.8.4. There are N Hudson's Bay Company posts on the River Kok-
soak. At any of these posts you can rent a canoe to be returned at any other post down-
stream. (It is next to impossible to paddle against the current.) For each possible
departure point i and each possible arrival point j the company's tariff gives the cost
of a rental between i and j. However, it can happen that the cost of renting from i to
j is higher than the total cost of a series of shorter rentals, in which case you can
return the first canoe at some post k between i and j and continue the journey in a
second canoe. There is no extra charge if you change canoes in this way.
Find an efficient algorithm to determine the minimum cost of a trip by canoe
from each possible departure point i to each possible arrival point j. In terms of N,
what is the computing time needed by your algorithm ?
also exists a coin worth 12 units (see Problem 3.1.1). The general problem can be
solved exactly using dynamic programming. Let n be the number of different coins
that exist, and let T[1..n] be an array giving the value of these coins. We suppose
that an unlimited number of coins of each value is available. Let L be a bound on the
sum we wish to obtain.
i. For 1 ≤ i ≤ n and 1 ≤ j ≤ L, let cᵢⱼ be the minimum number of coins required to
obtain the sum j if we may only use coins of types T[1], T[2], ..., T[i], or
cᵢⱼ = +∞ if the amount j cannot be obtained using just these coins. Give a
recurrence for cᵢⱼ , including the initial conditions.
ii. Give a dynamic programming algorithm that calculates all the cₙⱼ , 1 ≤ j ≤ L.
Your algorithm may use only a single array of length L. As a function of n and
L, how much time does your algorithm take ?
iii. Give a greedy algorithm that can make change using the minimum number of
coins for any amount M ≤ L once the cₙⱼ have been calculated. Your algorithm
should take a time in O(n + cₙM) provided cₙM ≠ +∞.
* Problem 5.8.6. You have n objects, which you wish to put in order using the
relations "<" and "=". For example, 13 different orderings are possible with three
objects.
A(0, n) = n + 1
A(m, 0) = A(m−1, 1)               if m > 0
A(m, n) = A(m−1, A(m, n−1))       if m, n > 0 .
grow quite large. (Hint : use two arrays val[0..m] and ind[0..m] such that at
every instant val[i] = A(i, ind[i]).)  □

Problem 5.8.8. Prove that the number of ways to cut an n-sided convex
polygon into n − 2 triangles using diagonal lines that do not cross is T(n−1), the
(n−1)st Catalan number (see Section 5.3). For example, a hexagon can be cut in
14 different ways, as shown in Figure 5.8.1.  □

Figure 5.8.1. Cutting a hexagon into triangles.
Several books are concerned with dynamic programming. We mention only Bellman
(1957), Bellman and Dreyfus (1962), Nemhauser (1966), and Lauriere (1979). The algo-
rithm in Section 5.3 is described in Godbole (1973) ; a more efficient algorithm, able
to solve the problem of chained matrix multiplications in a time in 0 (n log n), can be
found in Hu and Shing (1982, 1984). Catalan's numbers are discussed in many places,
including Sloane (1973) and Purdom and Brown (1985).
Floyd's algorithm for calculating all shortest paths is due to Floyd (1962). A
theoretically more efficient algorithm is known : Fredman (1976) shows how to solve
the problem in a time in O (n 3(log log n / log n) 1/3 ). The solution to Problem 5.4.2 is
supplied by the algorithm in Warshall (1962). Both Floyd's and Warshall's algorithms
are essentially the same as the one in Kleene (1956) to determine the regular expres-
sion corresponding to a given finite automaton (Hopcroft and Ullman 1979). All these
algorithms (with the exception of Fredman's) are unified in Tarjan (1981).
The algorithm of Section 5.5 for constructing optimal search trees, including the
solution to Problem 5.5.9, comes from Gilbert and Moore (1959). The improvements
suggested by Problems 5.5.10 and 5.5.11 come from Knuth (1971, 1973). A solution
to Problem 5.5.10 that is both simpler and more general is given by Yao (1980); this
paper gives a sufficient condition for certain dynamic programming algorithms that run
in cubic time to be transformable automatically into quadratic algorithms. The optimal
search tree for the 31 most common words in English is compared in Knuth (1973)
with the tree obtained using the greedy algorithm suggested in Problem 5.5.12.
The algorithm for the travelling salesperson problem given in Section 5.6 comes
from Held and Karp (1962). Memory functions are introduced in Michie (1968) ; for
further details see Marsh (1970). Problem 5.7.2, which suggests how to avoid initial-
izing a memory function, comes from Exercise 2.12 in Aho, Hopcroft and Ullman
(1974).
A solution to Problem 5.8.1 is given in Wagner and Fischer (1974). Problem
5.8.5 is discussed in Wright (1975) and Chang and Korsh (1976). Problem 5.8.6 sug-
gested itself to the authors one day when they set an exam including a question resem-
bling Problem 2.1.11: we were curious to know what proportion of all the possible
answers was represented by the 69 different answers suggested by the students (see
also Lemma 10.1.2). Problem 5.8.7 is based on Ackermann (1928). Problem 5.8.8 is
discussed in Sloane (1973). An important dynamic programming algorithm that we
have not mentioned is the one in Kasimi (1965) and Younger (1967), which takes
cubic time to carry out the syntactic analysis of any context-free language (Hopcroft
and Ullman 1979).
6
Exploring Graphs
6.1 INTRODUCTION
A great many problems can be formulated in terms of graphs. We have seen, for
instance, the shortest route problem and the problem of the minimal spanning tree. To
solve such problems, we often need to look at all the nodes, or all the edges, of a
graph. Sometimes, the structure of the problem is such that we need only visit some of
the nodes or edges. Up to now, the algorithms we have seen have implicitly imposed
an order on these visits : it was a case of visiting the nearest node, the shortest edge,
and so on. In this chapter we introduce some general techniques that can be used when
no particular order of visits is required.
We shall use the word "graph" in two different ways. A graph may be a data
structure in the memory of a computer. In this case, the nodes are represented by a
certain number of bytes, and the edges are represented by pointers. The operations to
be carried out are quite concrete : to "mark a node" means to change a bit in memory,
to "find a neighbouring node" means to follow a pointer, and so on.
At other times, the graph exists only implicitly. For instance, we often use
abstract graphs to represent games : each node corresponds to a particular position of
the pieces on the board, and the fact that an edge exists between two nodes means that
it is possible to get from the first to the second of these positions by making a single
legal move. When we explore such a graph, it does not really exist in the memory of
the machine. Most of the time, all we have is a representation of the current position
(that is, of the node we are in the process of visiting) and possibly representations of a
few other positions. In this case to "mark a node" means to take any appropriate meas-
ures that enable us to recognize a position we have already seen, or to avoid arriving at
the same position twice ; to "find a neighbouring node" means to change the current
position by making a single legal move ; and so on.
However, whether the graph is a data structure or merely an abstraction, the tech-
niques used to traverse it are essentially the same. In this chapter we therefore do not
distinguish the two cases.
6.2 TRAVERSING TREES

We shall not spend long on detailed descriptions of how to explore a tree. We simply
remind the reader that in the case of binary trees three techniques are often used. If at
each node of the tree we visit first the node itself, then all the nodes in the left-hand
subtree, and finally, all the nodes in the right-hand subtree, we are traversing the tree in
preorder ; if we visit first the left-hand subtree, then the node itself, and finally, the
right-hand subtree, we are traversing the tree in inorder; and if we visit first the left-
hand subtree, then the right-hand subtree, and lastly, the node itself, then we are
visiting the tree in postorder. Preorder and postorder generalize in the obvious way to
nonbinary trees.
These three techniques explore the tree from left to right. Three corresponding
techniques explore the tree from right to left. It is obvious how to implement any of
these techniques using recursion.
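For definiteness, here is how the three left-to-right traversals might be written in Python ; the node representation (attributes item, left, and right) is merely one possible choice.

    class Node:
        def __init__(self, item, left=None, right=None):
            self.item, self.left, self.right = item, left, right

    def preorder(t, visit):          # node, then left subtree, then right subtree
        if t is not None:
            visit(t.item); preorder(t.left, visit); preorder(t.right, visit)

    def inorder(t, visit):           # left subtree, then node, then right subtree
        if t is not None:
            inorder(t.left, visit); visit(t.item); inorder(t.right, visit)

    def postorder(t, visit):         # left subtree, then right subtree, then node
        if t is not None:
            postorder(t.left, visit); postorder(t.right, visit); visit(t.item)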
Lemma 6.2.1. For each of the six techniques mentioned, the time T(n) needed to explore a binary tree containing n nodes is in Θ(n).

Proof. Suppose that visiting a node takes a time in O(1), that is, the time required is bounded above by some constant c. Without loss of generality, we may suppose that c ≥ T(0).

Suppose further that we are to explore a tree containing n nodes, n > 0, of which one node is the root, g nodes are situated in the left-hand subtree, and n - g - 1 nodes are in the right-hand subtree. Then

    T(n) ≤ max { T(g) + T(n - g - 1) + c : 0 ≤ g ≤ n - 1 },    n > 0.

We prove by constructive induction that T(n) ≤ dn + c for every n ≥ 0, where d is a constant to be chosen later. The hypothesis holds for n = 0 since c ≥ T(0). Suppose it holds for every n < m, where m > 0 ; then

    T(m) ≤ max { dg + c + d(m - g - 1) + c + c : 0 ≤ g ≤ m - 1 }
         = dm - d + 3c ≤ dm + c    provided d ≥ 2c,

so the hypothesis is also true for n = m. This proves that T(n) ≤ dn + c for every n ≥ 0, and hence T(n) is in O(n).

On the other hand, it is clear that T(n) is in Ω(n) since each of the n nodes is visited. Therefore T(n) is in Θ(n).
Problem 6.2.1. Prove that for any of the techniques mentioned, a recursive implementation takes memory space in Ω(n) in the worst case.

* Problem 6.2.2. Show how the preceding exploration techniques can be implemented so as to take only a time in O(n) and space in O(1), even when the nodes do not contain a pointer to their parents (otherwise the problem becomes trivial).

Problem 6.2.3. Show how to generalize the concepts of preorder and postorder to arbitrary (nonbinary) trees. Assume the trees are represented as in Figure 1.9.5. Prove that both these techniques still run in a time in the order of the number of nodes in the tree to be traversed.
6.3 DEPTH-FIRST SEARCH : UNDIRECTED GRAPHS

Let G = <N, A> be an undirected graph all of whose nodes we wish to visit. Suppose that it is somehow possible to mark a node to indicate that it has already been visited. Initially, no nodes are marked.
To carry out a depth-first traversal of the graph, choose any node v ∈ N as the
starting point. Mark this node to show that it has been visited. Next, if there is a node
adjacent to v that has not yet been visited, choose this node as a new starting point and
call the depth-first search procedure recursively. On return from the recursive call, if
there is another node adjacent to v that has not been visited, choose this node as the
next starting point, call the procedure recursively once again, and so on. When all the
nodes adjacent to v have been marked, the search starting at v is finished.
If there remain any nodes of G that have not been visited, choose any one of
them as a new starting point, and call the procedure yet again. Continue in this way
until all the nodes of G have been marked. Here is the recursive algorithm.
procedure search (G)
    for each v ∈ N do mark [v] ← not-visited
    for each v ∈ N do
        if mark [v] ≠ visited then dfs (v)

procedure dfs (v : node)
    { node v has not been visited }
    mark [v] ← visited
    for each node w adjacent to v do
        if mark [w] ≠ visited then dfs (w)
The algorithm is called depth-first search since it tries to initiate as many recur-
sive calls as possible before it ever returns from a call. The recursivity is only stopped
when exploration of the graph is blocked and can go no further. At this point the
recursion "unwinds" so that alternative possibilities at higher levels can be explored.
Example 6.3.1. If we suppose that the neighbours of a given node are exam-
ined in numerical order, and that node 1 is the first starting point, a depth-first search
of the graph in Figure 6.3.1 progresses as follows :
Problem 6.3.1. Show how a depth-first search progresses through the graph in
Figure 6.3.1 if the neighbours of a given node are examined in numerical order but the
initial starting point is node 6.
How much time is needed to explore a graph with n nodes and a edges ? Since each node is visited exactly once, there are n calls of the procedure dfs. When we visit a node, we look at the mark on each of its neighbouring nodes. If the graph is represented in such a way as to make the lists of adjacent nodes directly accessible (type lisgraph of Section 1.9.2), this work is proportional to a in total. The algorithm therefore takes a time in O(n) for the procedure calls and a time in O(a) to inspect the marks. The execution time is thus in O(max(a, n)).
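Here is one way the two procedures might look in Python ; we assume the graph is given as a dictionary that maps each node to the list of its neighbours, which plays the role of the type lisgraph.

    def depth_first_search(graph):
        mark = {v: False for v in graph}        # not-visited
        order = []                              # nodes in the order they are visited

        def dfs(v):
            mark[v] = True                      # node v has not previously been visited
            order.append(v)
            for w in graph[v]:
                if not mark[w]:
                    dfs(w)

        for v in graph:                         # the procedure search
            if not mark[v]:
                dfs(v)
        return order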
Problem 6.3.3. Show how depth-first search can be used to find the connected
components of an undirected graph.
node       1   2   3   4   5   6   7   8
prenum     1   2   3   6   5   4   7   8
Problem 6.3.4. Exhibit the tree and the numbering generated by the search of
Problem 6.3.1.
a. Carry out a depth-first search in G, starting from any node. Let T be the tree
generated by the depth-first search, and for each node v of the graph, let
prenum [v ] be the number assigned by the search.
b. Traverse the tree T in postorder. For each node v visited, calculate lowest [v] as
the minimum of
i. prenum [v]
ii. prenum [w] for each node w such that there exists an edge {v, w} in G that has no corresponding edge in T
iii. lowest [x] for every child x of v in T.
c. Articulation points are now determined as follows :
i. The root of T is an articulation point of G if and only if it has more than one child.
ii. A node v other than the root of T is an articulation point of G if and only if v has a child x such that lowest [x] ≥ prenum [v].
Example 6.3.4. (Continuation of Examples 6.3.1, 6.3.2, and 6.3.3) The search
described in Example 6.3.1 generates the tree illustrated in Figure 6.3.2. The edges of
G that have no corresponding edge in T are represented by broken lines. The value of
prenum [v] appears to the left of each node v, and the value of lowest [v] to the right.
The values of lowest are calculated in postorder, that is, for nodes 5, 6, 3, 2, 8, 7, 4,
and 1 successively. The articulation points of G are nodes 1 (by rule c(i)) and 4
(by rule c(ii)).
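The following sketch in Python carries out steps (a) to (c) for a connected undirected graph, combining the search and the calculation of lowest as Problem 6.3.8 below suggests ; the dictionary-of-adjacency-lists representation and the names are ours.

    def articulation_points(graph, root):
        prenum, lowest, parent = {}, {}, {root: None}
        points, counter = set(), [0]

        def dfs(v):
            counter[0] += 1
            prenum[v] = lowest[v] = counter[0]
            children = 0
            for w in graph[v]:
                if w not in prenum:                  # edge {v, w} goes into the tree T
                    parent[w] = v
                    children += 1
                    dfs(w)
                    lowest[v] = min(lowest[v], lowest[w])
                    if parent[v] is not None and lowest[w] >= prenum[v]:
                        points.add(v)                # rule c(ii)
                elif w != parent[v]:                 # edge of G with no counterpart in T
                    lowest[v] = min(lowest[v], prenum[w])
            if parent[v] is None and children > 1:
                points.add(v)                        # rule c(i): the root

        dfs(root)
        return points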
Problem 6.3.5. Verify that the same articulation points are found if we start the search at node 6.
Figure 6.3.2. A depth-first search tree ; prenum on the left and lowest on the right.
Problem 6.3.8. Show how to carry out the operations of steps (a) and (b) in
parallel and write the corresponding algorithm.
* Problem 6.3.9. Write an algorithm that decides whether or not a given con-
nected graph is bicoherent.
Problem 6.3.13. Prove that for every pair of distinct nodes v and w in a
biconnected graph, there exist at least two chains of edges joining v and w that have
no nodes in common (except the starting and ending nodes).
6.4 DEPTH-FIRST SEARCH : DIRECTED GRAPHS

The algorithm is essentially the same as the one for undirected graphs, the difference
being in the interpretation of the word "adjacent". In a directed graph, node w is adja-
cent to node v if the directed edge (v , w) exists. If (v , w) exists and (w , v) does not,
then w is adjacent to v but v is not adjacent to w. With this change of interpretation
the procedures dfs and search from Section 6.3 apply equally well in the case of a
directed graph.
The algorithm behaves quite differently, however. Consider a depth-first search
of the directed graph in Figure 6.4.1. If the neighbours of a given node are examined
in numerical order, the algorithm progresses as follows :
The edges used by the search may form a forest of several trees rather than a single tree, even if G is connected. This happens in our example : the edges used, namely (1, 2), (2, 3),
(1, 4), (4, 8), (8, 7), and (5, 6), form the forest shown by the solid lines in Figure 6.4.2.
(The numbers to the left of each node are explained in Section 6.4.2.)
Let F be the set of edges in the forest. In the case of an undirected graph the
edges of the graph with no corresponding edge in the forest necessarily join some node
to one of its ancestors (Problem 6.3.6). In the case of a directed graph three kinds of
edges can appear in A \ F (these edges are shown by the broken lines in Figure 6.4.2).
i. Those like (3, 1) or (7,4) that lead from a node to one of its ancestors ;
ii. those like (1, 8) that lead from a node to one of its descendants ; and
iii. those like (5, 2) or (6, 3) that join one node to another that is neither its ancestor nor its descendant. Edges of this type are necessarily directed from right to left.
at the end of the procedure, the numbers of the nodes will be printed in reverse topo-
logical order.
Problem 6.4.5. For the graph of Figure 6.4.4, what is the topological order obtained if the neighbours of a node are visited in numerical order and if the depth-first search begins at node 1 ?
A directed graph is strongly connected if there exists a path from u to v and also a
path from v to u for every distinct pair of nodes u and v. If a directed graph is not
strongly connected, we are interested in the largest sets of nodes such that the
corresponding subgraphs are strongly connected. Each of these subgraphs is called a
strongly connected component of the original graph. In the graph of Figure 6.4.1, for instance, nodes {1, 2, 3} and the corresponding edges form a strongly connected component. Another component corresponds to the nodes {4, 7, 8}. Despite the fact that
there exist edges (1,4) and (1,8), it is not possible to merge these two strongly con-
nected components into a single component because there exists no path from node 4
to node 1.
i. Carry out a depth-first search of the graph starting from an arbitrary node. For
each node v of the graph let postnum [v] be the number assigned during the
search.
ii. Construct a new graph G' : G' is the same as G except that the direction of
every edge is reversed.
iii. Carry out a depth-first search in G'. Begin this search at the node w that has
the highest value of postnum. (If G contains n nodes, it follows that
postnum [w] = n.) If the search starting at w does not reach all the nodes,
choose as the second starting point the node that has the highest value of
postnum among all the unvisited nodes ; and so on.
iv. To each tree in the resulting forest there corresponds one strongly connected
component of G.
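In Python, the four steps might be sketched as follows ; graph maps each node to the list of nodes adjacent to it, and recording the order in which the recursive calls terminate replaces the explicit array postnum.

    def strongly_connected_components(graph):
        visited, finish_order = set(), []

        def dfs1(v):                             # step (i): search G
            visited.add(v)
            for w in graph[v]:
                if w not in visited:
                    dfs1(w)
            finish_order.append(v)               # v receives the next value of postnum

        for v in graph:
            if v not in visited:
                dfs1(v)

        reverse = {v: [] for v in graph}         # step (ii): construct G'
        for v in graph:
            for w in graph[v]:
                reverse[w].append(v)

        visited.clear()
        components = []

        def dfs2(v, comp):                       # steps (iii) and (iv): search G'
            visited.add(v)
            comp.append(v)
            for w in reverse[v]:
                if w not in visited:
                    dfs2(w, comp)

        for v in reversed(finish_order):         # highest postnum first
            if v not in visited:
                comp = []
                dfs2(v, comp)
                components.append(comp)          # one tree of the forest per component
        return components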
Example 6.4.1. On the graph of Figure 6.4.1, the first depth-first search
assigns the values of postnum shown to the left of each node in Figure 6.4.2. The
graph G' is illustrated in Figure 6.4.6, with the values of postnum shown to the left of
each node. We carry out a depth-first search starting from node 5, since
postnum [5] = 8; the search reaches nodes 5 and 6. For our second starting point, we
choose node 1, with postnum [1] = 6; this time the search reaches nodes 1, 3, and 2.
For the third starting point we take node 4, with postnum [4] = 5 ; this time the
remaining nodes 4, 7, and 8 are all reached. The corresponding forest is illustrated in
Figure 6.4.7. The strongly connected components of the original graph (Fig. 6.4.1) are the subgraphs corresponding to the sets of nodes {5, 6}, {1, 3, 2}, and {4, 7, 8}.
* Problem 6.4.6. Prove that if two nodes u and v are in the same strongly con-
nected component of G, then they are in the same tree when we carry out the depth-
first search of G'.
Figure 6.4.7. The forest of strongly connected components.
It is harder to prove the result the other way. Let v be a node that is in the tree whose root is r when we carry out the search of G', and suppose v ≠ r. This implies that there exists a path from r to v in G' ; thus there exists a path from v to r in G.
When carrying out the search of G', we always choose as a new starting point (that is,
as the root of a new tree) that node not yet visited with the highest value of postnum.
Since we chose r rather than v to be the root of the tree in question, we have
postnum [r] > postnum [v].
When we carried out the search in G, three possibilities seem to be open a priori:
r was an ancestor of v ;
r was a descendant of v ; or
r was neither an ancestor nor a descendant of v.
The second possibility is ruled out by the fact that postnum [r] > postnum [v]. In the
third case it would be necessary for the same reason that r be to the right of v.
However, there exists at least one path from v to r in G. Since in a depth-first
search the edges not used by the search never go from left to right (Problem 6.4.2),
any such path must go up the tree from v to a common ancestor (x, say) of v and r,
and then go down the tree to r. But this is quite impossible. We should have
postnum [x] > postnum [r] since x is an ancestor of r. Next, since there exists a path
from v to x in G, there would exist a path from x to v in G'. Before choosing r as
the root of a tree in the search of G', we would have already visited x (otherwise x
rather than r would be chosen as the root of the new tree) and therefore also v. This contradicts the hypothesis that v is in the tree whose root is r when we carry out the search of G'. Only the first possibility remains : r was an ancestor of v when we
searched G. This implies that there exists a path from r to v in G.
We have thus proved that if node v is in the tree whose root is r when we carry
out the search of G', then there exist in G both a path from v to r and a path from r
to v. If two nodes u and v are in the same tree when we search G', they are therefore
both in the same strongly connected component of G since there exist paths from u to
v and from v to u in G via node r.
With the result of Problem 6.4.6, this completes the proof that the algorithm
works correctly.
Problem 6.4.7. Estimate the time and space needed by this algorithm.
6.5 BREADTH-FIRST SEARCH

When a depth-first search arrives at some node v, it next tries to visit some neighbour
of v, then a neighbour of this neighbour, and so on. When a breadth-first search
arrives at some node v, on the other hand, it first visits all the neighbours of v, and not
until this has been done does it go on to look at nodes farther away. Unlike depth-first
search, breadth-first search is not naturally recursive. To underline the similarities and
the differences between the two methods, we begin by giving a nonrecursive formula-
tion of the depth-first search algorithm. Let stack be a data type allowing two opera-
tions, push and pop. The type is intended to represent a list of elements that are to be
handled in the order "last come, first served". The function top denotes the element at
the top of the stack. Here is the modified depth-first search algorithm.
procedure dfs'(v : node)
    P ← empty-stack
    mark [v] ← visited
    push v on P
    while P is not empty do
        while there exists a node w adjacent to top (P)
                such that mark [w] ≠ visited do
            mark [w] ← visited
            push w on P    { w is now top (P) }
        pop top (P)
For the breadth-first search algorithm, by contrast, we need a type queue that
allows two operations enqueue and dequeue. This type represents a list of elements
that are to be handled in the order "first come, first served". The function first denotes
the element at the front of the queue. Here now is the breadth-first search algorithm.
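In outline it is dfs' with the stack replaced by a queue. The following sketch in Python shows one way to write it ; the graph is again assumed to be a dictionary of adjacency lists, and collections.deque supplies the queue.

    from collections import deque

    def bfs(graph, v, mark):
        Q = deque()
        mark[v] = True
        Q.append(v)                      # enqueue the starting point
        while Q:
            u = Q.popleft()              # first come, first served
            for w in graph[u]:
                if not mark[w]:
                    mark[w] = True
                    Q.append(w)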
Example 6.5.1. On the graph of Figure 6.3.1, if the neighbours of a node are visited in numerical order, and if node 1 is used as the starting point, breadth-first search proceeds as follows.
        Node visited        Q
1.            1             2, 3, 4
2.            2             3, 4, 5, 6
3.            3             4, 5, 6
4.            4             5, 6, 7, 8
5.            5             6, 7, 8
6.            6             7, 8
7.            7             8
8.            8             -
As for depth-first search, we can associate a tree with the breadth-first search.
Figure 6.5.1 shows the tree generated by the search in Example 6.5.1. The edges of
the graph that have no corresponding edge in the tree are represented by broken lines.
It is easy to show that the time required by a breadth-first search is in the same order as that required by a depth-first search, namely O(max(a, n)). If the appropriate
Problem 6.5.2. Show how a breadth-first search progresses through the graph
of Figure 6.4.1, assuming that the neighbours of a node are always visited in numerical
order, and that necessary starting points are also chosen in numerical order.
Breadth-first search is most often used to carry out a partial exploration of certain
infinite graphs or to find the shortest path from one point to another.
6.6 IMPLICIT GRAPHS AND TREES

As mentioned at the outset of this chapter, various problems can be thought of in terms
of abstract graphs. For instance, we can use the nodes of a graph to represent
configurations in a game of chess and edges to represent legal moves (see Sec-
tion 6.6.2). Often the original problem translates to searching for a specific node, path
or pattern in the associated graph. If the graph contains a large number of nodes, it
may be wasteful or infeasible to build it explicitly in computer memory before
applying one of the search techniques we have encountered so far.
An implicit graph is one for which we have available a description of its nodes
and edges. Relevant portions of the graph can thus be built as the search progresses.
Therefore computing time is saved whenever the search succeeds before the entire
graph has been constructed. The economy in memory space is even more dramatic
when nodes that have already been searched can be discarded, making room for subse-
quent nodes to be explored.
Backtracking is a basic search technique on implicit graphs. One powerful appli-
cation is in playing games of strategy by techniques known as minimax and alpha-beta
6.6.1 Backtracking
The first obvious way to solve this problem consists of trying systematically all the ways of placing eight queens on a chess-board, checking each time to see whether a solution has been obtained. This approach is of no practical use, since the number of positions we would have to check is (64 choose 8) = 4,426,165,368. The first improvement we might try consists of never putting more than one queen in any given row. This reduces the computer representation of the chess-board to a simple vector of eight elements, each giving the position of the queen in the corresponding row. For instance, the vector (3, 1, 6, 2, 8, 6, 4, 7) does not represent a solution since the queens in the third and the sixth rows are in the same column, and also two pairs of queens lie on the same diagonal. Using this representation, we can write the algorithm very simply using eight nested loops.
program Queens 1
    for i1 ← 1 to 8 do
        for i2 ← 1 to 8 do
            ...
Problem 6.6.2. If you have not yet solved the previous problem, the information just given should be of considerable help !
Once we have realized that the chess-board can be represented by a vector, which
prevents us from ever trying to put two queens in the same row, it is natural to be
equally systematic in our use of the columns. Hence we now represent the board by a
vector of eight different numbers between 1 and 8, that is, by a permutation of the first
eight integers. The algorithm becomes
program Queens 2
    try ← initial-permutation
    while try ≠ final-permutation and not solution (try) do
        try ← next-permutation
    if solution (try) then write try
    else write "there is no solution"
There are several natural ways to generate systematically all the permutations of
the first n integers. For instance, we might put each value in turn in the leading posi-
tion and generate recursively, for each of these leading values, all the permutations of
the n -1 remaining elements.
procedure perm (i)
    if i = n then use (T)    { T is a new permutation }
    else for j ← i to n do
            exchange T [i] and T [j]
            perm (i + 1)
            exchange T [i] and T [j]
Problem 6.6.3. If use (T) consists simply of printing the array T on a new line, show the result of calling perm (1) when n = 4.

Problem 6.6.4. Assuming that use (T) takes constant time, how much time is needed, as a function of n, to execute the call perm (1) ? Now rework the problem assuming that use (T) takes a time in O(n).
This approach reduces the number of possible cases to 8! = 40,320. If the
preceding algorithm is used to generate the permutations, only 2,830 cases are in fact
considered before the algorithm finds a solution. Although it is more complicated to
generate permutations rather than all the possible vectors of eight integers between 1
and 8, it is, on the other hand, easier in this case to verify whether a given position is a
solution. Since we already know that two queens can neither be in the same row nor in
the same column, it suffices to verify that they are not in the same diagonal.
Starting from a crude method that tried to put the queens absolutely anywhere on
the chess-board, we progressed first to a method that never puts two queens in the
same row, and then to a better method still where the only positions considered are
those where we know that two queens can neither be in the same row nor in the same
column. However, all these algorithms share an important defect : they never test a
position to see if it is a solution until all the queens have been placed on the board.
For instance, even the best of these algorithms makes 720 useless attempts to put the
last six queens on the board when it has started by putting the first two on the main
diagonal, where of course they threaten one another !
Backtracking allows us to do better than this. As a first step, let us reformulate
the eight queens problem as a tree searching problem. We say that a vector V [1 .. k] of integers between 1 and 8 is k-promising, for 0 ≤ k ≤ 8, if none of the k queens placed in positions (1, V [1]), (2, V [2]), ... , (k, V [k]) threatens any of the others. Mathematically, a vector V is k-promising if, for every i ≠ j between 1 and k, we have V [i] − V [j] ∉ { i − j, 0, j − i }. For k ≤ 1, any vector V is k-promising. Solutions to the eight queens problem correspond to vectors that are 8-promising.
Let N be the set of k-promising vectors, 0 ≤ k ≤ 8. Let G = <N, A> be the directed graph such that (U, V) ∈ A if and only if there exists an integer k, 0 ≤ k < 8, such that U is k-promising, V is (k + 1)-promising, and U [i] = V [i] for every i ∈ [1 .. k]. This graph is a tree. Its root is the empty vector (k = 0). Its leaves are
either solutions (k = 8) or else they are dead ends (k < 8) such as [1, 4, 2, 5, 8] where it
is impossible to place a queen in the next row without her threatening at least one of
the queens already on the board. The solutions to the eight queens problem can be
obtained by exploring this tree. We do not generate the tree explicitly so as to explore
it thereafter, however : rather, nodes are generated and abandoned during the course of
the exploration. Depth-first search is the obvious method to use, particularly if we
only require one solution.
This technique has two advantages over the previous algorithm that systemati-
cally tried each permutation. First, the number of nodes in the tree is less than
8! = 40,320. Although it is not easy to calculate this number theoretically, it is
straightforward to count the nodes using a computer : #N = 2057. In fact, it suffices to
explore 114 nodes to obtain a first solution. Secondly, in order to decide whether a
vector is k-promising, knowing that it is an extension of a (k -1)-promising vector, we
only need to check the last queen to be added. This check can be speeded up if we associate with each promising node the sets of columns, of positive diagonals (at 45 degrees), and of negative diagonals (at 135 degrees) controlled by the queens already
placed. On the other hand, to decide if some given permutation represents a solution,
it seems at first sight that we have to check each of the 28 pairs of queens on the
board.
To print all the solutions to the eight queens problem, call Queens (0, ∅, ∅, ∅), where try [1 .. 8] is a global array.

procedure Queens (k, col, diag45, diag135)
    { try [1 .. k] is k-promising ; col, diag45, and diag135 are the sets of
      columns, of positive diagonals, and of negative diagonals controlled by
      the queens already placed }
    if k = 8 then write try    { an 8-promising vector is a solution }
    else for j ← 1 to 8 do
            if j ∉ col and j − k ∉ diag45 and j + k ∉ diag135
                then try [k + 1] ← j
                     { try [1 .. k + 1] is (k + 1)-promising }
                     Queens (k + 1, col ∪ { j }, diag45 ∪ { j − k }, diag135 ∪ { j + k })
Problem 6.6.5. Show that the problem for n queens may have no solution.
Find a more interesting case than n = 2.
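A sketch in Python of the same backtracking search, written for n queens as in Problem 6.6.5, may be useful for experiment ; the vector try is renamed board because try is a reserved word in Python, and the three sets are kept as ordinary Python sets.

    def queens(n):
        solutions = []
        board = [0] * n

        def place(k, col, diag45, diag135):
            # board[0..k-1] is k-promising
            if k == n:
                solutions.append(board[:])
                return
            for j in range(1, n + 1):
                if j not in col and j - k not in diag45 and j + k not in diag135:
                    board[k] = j             # board[0..k] is (k+1)-promising
                    place(k + 1, col | {j}, diag45 | {j - k}, diag135 | {j + k})

        place(0, set(), set(), set())
        return solutions

For example, len(queens(8)) is 92, the number of solutions to the eight queens problem, while queens(2) and queens(3) both return the empty list.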
Backtracking algorithms can also be used even when the solutions sought do not
necessarily all have the same length. Here is the general scheme.
procedure backtrack (v [1 .. k])
    { v is a k-promising vector }
    if v is a solution then write v
    { otherwise } for each (k + 1)-promising vector w
        such that w [1 .. k] = v [1 .. k] do backtrack (w [1 .. k + 1])
The otherwise should be present if and only if it is impossible to have two different
solutions such that one is a prefix of the other.
The n-queens problem was solved using depth-first search in the corresponding
tree. Some problems that can be formulated in terms of exploring an implicit graph
have the property that they correspond to an infinite graph. In this case, it may be
necessary to use breadth-first search to avoid the interminable exploration of some
fruitless infinite branch. Breadth-first search is also appropriate if we have to find a
solution starting from some initial position and making as few changes as possible.
(This last constraint does not apply to the eight queens problem where each solution
involves exactly the same number of pieces.) The two following problems illustrate
these ideas.
* Problem 6.6.9. Give an algorithm that determines the shortest possible series
of manipulations needed to change one configuration of Rubik's Cube into another.
If the required change is impossible, your algorithm should say so rather than calcu-
lating forever.
To determine a winning strategy for a game of this kind, we need only attach to
each node of the graph a label chosen from the set win, lose, draw. The label
corresponds to the situation of a player about to move in the corresponding position,
assuming that neither player will make an error. The labels are assigned systematically
in the following way.
i. The labels assigned to terminal positions depend on the game in question. For
most games, if you find yourself in a terminal position, then there is no legal
move you can make, and you have lost ; but this is not necessarily the case (think
of stalemate in chess).
ii. A nonterminal position is a winning position if at least one of its successors is a
losing position.
iii. A nonterminal position is a losing position if all of its successors are winning
positions.
iv. Any remaining positions lead to a draw.
Problem 6.6.11. Grasp intuitively how these rules arise. Can a player who
finds himself in a winning position lose if his opponent makes an "error"?
We illustrate these ideas with the help of a variant of Nim (also known as the
Marienbad game). Initially, at least two matches are placed on the table between two
players. The first player removes as many matches as he likes, except that he must
take at least one and he must leave at least one. Thereafter, each player in turn must
remove at least one match and at most twice the number of matches his opponent just
took. The player who removes the last match wins. There are no draws.
Example 6.6.1. There are seven matches on the table initially. If I take two
of them, my opponent may take one, two, three, or four. If he takes more than one, I
can remove all the matches that are left and win. If he takes only one match, leaving
four matches on the table, I can in turn remove a single match, and he cannot prevent
me from winning on my next turn. On the other hand, if at the outset I choose to
remove a single match, or to remove more than two, then you may verify that my
opponent has a winning strategy.
The player who has the first move in a game with seven matches is therefore cer-
tain to win provided that he does not make an error. On the other hand, you may
verify that a player who has the first move in a game with eight matches cannot win
unless his opponent makes an error.
A position in this game is not specified merely by the number of matches that
remain on the table. It is also necessary to know the upper limit on the number of
matches that it is permissible to remove on the next move. The nodes of the graph
corresponding to this game are therefore pairs < i, j >. In general, < i, j >, 1 ≤ j ≤ i, indicates that i matches remain on the table and that any number of them between 1 and j may be removed in the next move. The edges leaving node < i, j > go to the j nodes < i − k, min(2k, i − k) >, 1 ≤ k ≤ j. The node corresponding to the initial position in a game with n matches, n ≥ 2, is < n, n − 1 >. All the nodes whose second component is zero correspond to terminal positions, but only < 0, 0 > is interesting : the nodes < i, 0 > for i > 0 are inaccessible. Similarly, nodes < i, j > with j odd and j < i − 1 cannot be reached starting from any initial position.
Figure 6.6.1 shows part of the graph corresponding to this game. The square
nodes represent losing positions and the round nodes are winning positions. The heavy
edges correspond to winning moves : in a winning position, choose one of the heavy
edges in order to win. There are no heavy edges leaving a losing position,
corresponding to the fact that such positions offer no winning move.
We observe that a player who has the first move in a game with two, three, or
five matches has no winning strategy, whereas he does have such a strategy in the
game with four matches.
Problem 6.6.14. Add nodes < 8, 7 >, < 7, 6 >, < 6, 5 > and their descendants to
the graph of Figure 6.6.1.
Problem 6.6.15. Can a winning position have more than one losing position
among its successors ? In other words, are there positions in which several different
winning moves are available? Can this happen in the case of a winning initial position
<n,n-1>?
The obvious algorithm to determine whether a position is winning is the fol-
lowing.
function rec (i, j)
    { returns true if and only if the node < i, j > is winning ;
      we assume that 0 ≤ j ≤ i }
    for k ← 1 to j do
        if not rec (i − k, min(2k, i − k))
            then return true
    return false
Problem 6.6.16. Modify this algorithm so that it returns an integer k such that
k = 0 if the position is a losing position and 1 <- k <- j if it is a winning move to take
away k matches.
This algorithm suffers from the same defect as the algorithm fib1 in Section 1.7.5 : it calculates the same value over and over. For instance, rec (5, 4) returns false having called successively rec (4, 2), rec (3, 3), rec (2, 2), and rec (1, 1), but rec (3, 3) also calls rec (2, 2) and rec (1, 1).
Problem 6.6.17. Find two ways to remove this inefficiency. (If you want to
work on this problem, do not read the following paragraphs yet !)
Problem 6.6.18. The preceding algorithm only uses G [0, 0] and the values of G [l, k], 1 ≤ k ≤ l < i, to calculate G [i, j]. Show how to improve its efficiency by also using the values of G [i, k] for 1 ≤ k < j.
At first sight, there is no particular reason to favour this approach over dynamic programming, because in any case we have to take the time to initialize the whole array G [0 .. n, 0 .. n]. Using the technique suggested in Problem 5.7.2 allows us, however, to avoid this initialization and to obtain a worthwhile gain in efficiency.
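In Python a dictionary gives the effect of a memory function with virtual initialization for free ; the following sketch is one way to rewrite rec along these lines (the name G echoes Problem 6.6.18, but the details are ours).

    def make_rec():
        G = {}                                  # G[(i, j)] = True iff < i, j > is winning

        def rec(i, j):
            if (i, j) not in G:
                G[(i, j)] = any(not rec(i - k, min(2 * k, i - k))
                                for k in range(1, j + 1))
            return G[(i, j)]

        return rec

With rec = make_rec(), rec(7, 6) returns true and rec(8, 7) returns false, in agreement with the seven- and eight-match games discussed above, and each pair < i, j > is now evaluated at most once.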
The game we have considered up to now is so simple that it can be solved
without really using the associated graph. Here, without explanation, is an algorithm
for determining a winning strategy that is more efficient than any of those given previ-
ously. In an initial position with n matches, first call precond (n). Thereafter a call on whatnow (i, j), 1 ≤ j ≤ i, determines in a time in O(1) the move to make in a situation where i matches remain on the table and the next player has the right to take at most j of them. The array T [0 .. n] is global. The initial call of precond (n) is an application
of the preconditioning technique to be discussed in the next chapter.
return 1
return T [i]
* Problem 6.6.19. Prove that this algorithm works correctly and that precond (n) takes a time in O(n).
Consider now a more complex game, namely chess. At first sight, the graph
associated with this game contains cycles, since if two positions u and v of the pieces
differ only by the legal move of a rook, say, the king not being in check, then we can
move equally well from u to v and from v to u. However, this problem disappears on
closer examination. Remember first that in the game we just looked at, a position is
defined not merely by the number of matches on the table, but also by an invisible item
of information giving the number of matches that can be removed on the next move.
Similarly, a position in chess is not defined simply by the positions of the pieces on the
board. We also need to know whose turn it is to move, which rooks and kings have
moved since the beginning of the game (to know if it is legal to castle), and whether
some pawn has just been moved two squares forward (to know whether a capture en
passant is possible). Furthermore, the International Chess Federation has rules that
prevent games dragging on forever : for example, a game is declared to be a draw after
50 moves in which no irreversible action (movement of a pawn, or a capture) took
place. Thus we must include in our notion of position the number of moves made
since the last irreversible action. Thanks to such rules, there are no cycles in the graph
corresponding to chess. (For simplicity we ignore exceptions to the 50-move rule, as
well as the older rule that makes a game a draw if the pieces return three times to
exactly the same positions on the board.)
Adapting the general rules given at the beginning of this section, we can there-
fore label each node as being a winning position for White, a winning position for
Black, or a draw. Once constructed, this graph allows us to play a perfect game of
chess, that is, to win whenever it is possible and to lose only when it is inevitable.
Unfortunately (or perhaps fortunately for the game of chess), the graph contains so
many nodes that it is quite out of the question to explore it completely, even with the
fastest existing computers.
* Problem 6.6.20. Estimate the number of ways in which the pieces can be placed on a chess-board. For simplicity ignore the fact that certain positions are impossible, that is, they can never be obtained from the initial position by a legal series of moves (but take into account the fact that each bishop moves only on either white or black squares, and that both kings must be on the board). Ignore also the possibility of having promoted pawns.
Since a complete search of the graph associated with the game of chess is out of
the question, it is not practical to use a dynamic programming approach. In this situa-
tion the recursive approach comes into its own. Although it does not allow us to be
certain of winning, it underlies an important heuristic called minimax. This technique
finds a move that may reasonably be expected to be among the best moves possible
while exploring only a part of the graph starting from some given position. Explora-
tion of the graph is usually stopped before the leaves are reached, using one of several
possible criteria, and the positions thus reached are evaluated heuristically. Then we
make the move that seems to cause our opponent the most trouble. This is in a sense
merely a systematic version of the method used by some human players that consists
of looking ahead a small number of moves. Here we give only an outline of the tech-
nique.
The minimax principle. The first step is to define a static evaluation func-
tion eval that attributes some value to each possible position. Ideally, we want the
value of eval (u) to increase as the position u becomes more favourable to White. It is
customary to give values not too far from zero to positions where neither side has a
marked advantage, and large negative values to positions that favour Black. This
evaluation function must take account of many factors : the number and the type of
pieces remaining on both sides, control of the centre, freedom of movement, and so on.
A compromise must be made between the accuracy of this function and the time
needed to calculate it. When applied to a terminal position, the evaluation function
should return +∞ if Black has been mated, −∞ if White has been mated, and 0 if the game is a draw. For example, an evaluation function that takes good account of the static aspects of the position but that is too simplistic to be of real use might be the following : for nonterminal configurations, count 1 point for each white pawn, 3¼ points for each white bishop or knight, 5 points for each white rook, and 10 points for each white queen ; subtract a similar number of points for each black piece.
If the static evaluation function were perfect, it would be easy to determine the
best move to make. Suppose it is White's turn to move from position u. The best
move would be to go to the position v that maximizes eval (v) among all the succes-
sors w of u.
val ← −∞
for each configuration w that is a successor of u do
    if eval (w) ≥ val then val ← eval (w)
                            v ← w
It is clear that this simplistic approach would not be very successful using the evalua-
tion function suggested earlier, since it would not hesitate to sacrifice a queen in order
to take a pawn !
If the evaluation function is not perfect, a better strategy for White is to assume
that Black will reply with the move that minimizes the function eval, since the smaller
the value taken by this function, the better the position is supposed to be for him.
(Ideally, he would like a large negative value.) We are now looking half a move
ahead.
val ← −∞
for each configuration w that is a successor of u do
    if w has no successor
        then valw ← eval (w)
        else valw ← min { eval (x) | x is a successor of w }
    if valw ≥ val then val ← valw
                        v ← w
There is now no question of giving away a queen to take a pawn, which of course may
be exactly the wrong rule to apply if it prevents White from finding the winning move :
maybe if he looked further ahead the gambit would turn out to be profitable. On the
other hand, we are sure to avoid moves that would allow Black to mate immediately
(provided we can avoid this).
To add more dynamic aspects to the static evaluation provided by eval, it is preferable to look several moves ahead. To look n half-moves ahead from position u, White should move to the position v given by
We see why the technique is called minimax : Black tries to minimize the advan-
tage he allows to White, and White, on the other hand, tries to maximize the advantage
he obtains from each move.
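The rule can be sketched in Python as follows ; successors and evaluate (which plays the role of eval) are assumed to be supplied by the particular game, and white indicates whose turn it is to move in position u.

    def minimax(u, n, white, successors, evaluate):
        # value of position u when we look n half-moves ahead
        succ = successors(u)
        if n == 0 or not succ:
            return evaluate(u)
        values = [minimax(w, n - 1, not white, successors, evaluate) for w in succ]
        return max(values) if white else min(values)

White, about to move in position u, then goes to a successor v that maximizes minimax(v, n - 1, False, successors, evaluate).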
Problem 6.6.21. Let u correspond to the initial position of the pieces. What
can you say about White (u, 12800), besides the fact that it would take far too long to
calculate in practice? Justify your answer.
Example 6.6.2. Figure 6.6.2 shows part of the graph corresponding to some
game. If the values attached to the leaves are obtained by applying the function eval
to the corresponding positions, the values for the other nodes can be calculated using
the minimax rule. In the example we suppose that player A is trying to maximize the
evaluation function and that player B is trying to minimize it.
If A plays so as to maximize his advantage, he will choose the second of the
three possible moves. This assures him of a value of at least 10.
Example 6.6.3. Look back at Figure 6.6.2. Let < i, j > represent the j th
node in the i th row of the tree. We want to calculate the value of the root < 1, 1 >
starting from the values calculated by the function eval for the leaves < 4, j >, 1 ≤ j ≤ 18. To do this, we carry out a bounded depth-first search in the tree, visiting
the successors of a given node from left to right.
** Problem 6.6.23. Write a program that can beat the world backgammon cham-
pion. (This has already been done !)
** Problem 6.6.24. What modifications should be made to the principles set out
in this section to take account of those games of strategy in which chance plays a
certain part ? What about games with more than two players ?
6.6.3 Branch-and-Bound
Let G be the complete graph on five points with the following distance matrix :

     0   14    4   10   20
    14    0    7    8    7
     4    5    0    7   16
    11    7    9    0    2
    18    7   17    4    0
We are looking for the shortest tour starting from node 1 that passes exactly once
through each other node before finally returning to node 1.
The nodes in the implicit graph correspond to partially specified paths. For
instance, node (1,4,3) corresponds to two complete tours : (1,4,3,2,5, 1) and
(1, 4, 3, 5, 2, 1). The successors of a given node correspond to paths in which one addi-
tional node has been specified. At each node we calculate a lower bound on the length
of the corresponding complete tours.
To calculate this bound, suppose that half the distance between two points i and
j is counted at the moment we leave i, and the other half when we arrive at j. For
instance, leaving node 1 costs us at least 2, namely the lowest of the values 14/2, 4/2,
10/2, and 20/2. Similarly, visiting node 2 costs us at least 6 (at least 5/2 when we
arrive and at least 7/2 when we leave). Returning to node 1 costs at least 2, the
minimum of 14/2, 4/2, 11/2, and 18/2. To obtain a bound on the length of a path, it
suffices to add elements of this kind. For instance, a complete tour must include a
departure from node 1, a visit to each of the nodes 2, 3, 4, and 5 (not necessarily in
this order) and a return to 1. Its length is therefore at least
2+6+4+3+3+2= 20.
Notice that this calculation does not imply the existence of a solution that costs
only 20.
In Figure 6.6.5 the root of the tree specifies that the starting point for our tour is
node 1. Obviously, this arbitrary choice of a starting point does not alter the length of
the shortest tour. We have just calculated the lower bound shown for this node. (This
bound on the root of the implicit tree serves no purpose in the algorithm ; it was com-
puted here for the sake of illustration.) Our search begins by generating (as though for
a breadth-first search) the four possible successors of the root, namely, nodes (1, 2),
(1,3), (1,4), and (1,5). The bound for node (1,2), for example, is calculated as fol-
lows. A tour that begins with (1, 2) must include
The trip 1-2 : 14 (formally, leaving 1 for 2 and arriving at 2 from 1 : 7 + 7)
A departure from 2 toward 3, 4, or 5 : minimum 7/2
A visit to 3 that neither comes from 1 nor leaves for 2 : minimum 11/2
A similar visit to 4 : minimum 3
A similar visit to 5 : minimum 3
A return to 1 from 3, 4, or 5 : minimum 2
The length of such a tour is therefore at least 31. The other bounds are calculated
similarly.
Next, the most promising node seems to be (1, 3), whose bound is 24. The three
children (1,3,2), (1,3,4), and (1,3,5) of this node are therefore generated. To give
just one example, we calculate the bound for node (1, 3, 2) as follows :
(1, 5) and (1, 3, 5), which cannot possibly lead to a better solution. Even exploration of
the node (1,3,4) is pointless. (Why?) There remains only node (1,4) to explore.
The only child to offer interesting possibilities is (1,4,5). After looking at the two
complete tours (1, 4, 5, 2, 3, 1) and (1, 4, 5, 3, 2, 1), we find that the tour (1, 4, 5, 2, 3, 1)
of length 30 is optimal. This example illustrates the fact that although at one point
(1, 3) was the most promising node, the optimal solution does not come from there.
To obtain our answer, we have looked at merely 15 of the 41 nodes that are
present in a complete tree of the type illustrated in Figure 6.6.5.
Problem 6.6.25. Solve the same problem using the method of Section 5.6.
Problem 6.6.27. Show how to solve the same problem using a backtracking
algorithm that calculates a bound as shown earlier to decide whether or not a partially
defined path is promising.
The need to keep a list of nodes that have been generated but not yet completely
explored, situated in all the levels of the tree and preferably sorted in order of the
corresponding bounds, makes branch-and-bound quite hard to program. The heap is an
ideal data structure for holding this list. Unlike depth-first search and its related tech-
niques, no elegant recursive formulation of branch-and-bound is available to the pro-
grammer. Nevertheless, the technique is sufficiently powerful that it is often used in
practical applications.
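The general scheme might be sketched in Python as follows ; bound, children, is_complete, and cost are assumed to be supplied by the particular application (for the tour example, a node is a partially specified path and bound is the calculation illustrated above), and heapq keeps the unexplored nodes sorted by their bounds.

    import heapq

    def branch_and_bound(root, bound, children, is_complete, cost):
        best_cost, best_node = float("inf"), None
        tiebreak = 0                                  # keeps heap entries comparable
        heap = [(bound(root), tiebreak, root)]
        while heap:
            b, _, node = heapq.heappop(heap)          # most promising node first
            if b >= best_cost:
                continue                              # this branch can be cut off
            for child in children(node):
                if is_complete(child):
                    if cost(child) < best_cost:
                        best_cost, best_node = cost(child), child
                elif bound(child) < best_cost:
                    tiebreak += 1
                    heapq.heappush(heap, (bound(child), tiebreak, child))
        return best_node, best_cost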
It is next to impossible to give any idea of how well the technique will perform
on a given problem using a given bound. There is always a compromise to be made
concerning the quality of the bound to be calculated : with a better bound we look at fewer nodes, but on the other hand, we shall most likely spend more time at each one
calculating the corresponding bound. In the worst case it may turn out that even an
excellent bound does not allow us to cut any branches off the tree, and all the extra
work we have done is wasted. In practice, however, for problems of the size encoun-
tered in applications, it almost always pays to invest the necessary time in calculating
the best possible bound (within reason). For instance, one finds applications such as
integer programming handled by branch-and-bound, the bound at each node being
obtained by solving a related problem in linear programming with continuous vari-
ables.
Problem 6.7.6. The value 1 is available. To construct other values, you have available the two operations ×2 (multiplication by 2) and /3 (division by 3, any resulting fraction being dropped). Operations are executed from left to right. For instance,

    10 = 1 ×2 ×2 ×2 ×2 /3 ×2.
We want to express 13 in this way. Show how the problem can be expressed in terms
of exploring a graph and find a minimum-length solution.
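One possible formulation in Python : the nodes of the implicit graph are the values already constructed, the two operations define the edges, and a breadth-first search finds a minimum-length sequence. The cap on the values explored is our own safeguard, not part of the problem statement.

    from collections import deque

    def shortest_expression(target, cap=10000):
        parent = {1: None}                   # also marks the values already reached
        Q = deque([1])
        while Q:
            x = Q.popleft()
            if x == target:
                path = []
                while x is not None:         # climb back up to the starting value 1
                    path.append(x)
                    x = parent[x]
                return list(reversed(path))
            for y in (2 * x, x // 3):        # the two operations x2 and /3
                if 0 < y <= cap and y not in parent:
                    parent[y] = x
                    Q.append(y)
        return None

A call on shortest_expression(13) returns the successive values of one minimum-length solution to the problem above ; shortest_expression(10) recovers a sequence of the same length as the expression for 10 given earlier.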
* Problem 6.7.7. Show how the problem of carrying out a syntactic analysis of
a programming language can be solved in top-down fashion using a backtracking algo-
rithm. (This approach is used in a number of compilers.)
i. Give a backtracking algorithm that finds a path, if one exists, from (1, 1) to
(n, n). Without being completely formal (for instance, you may use statements
such as "for each point v that is a neighbour of x do "), your algorithm
must be clear and precise.
ii. Without giving all the details of the algorithm, indicate how to solve this
problem by branch-and-bound.
7

Preconditioning
and Precomputation
If we know that we shall have to solve several similar instances of the same problem,
it is sometimes worthwhile to invest some time in calculating auxiliary results that can
thereafter be used to speed up the solution of each instance. This is preconditioning.
Even when there is only one instance to be solved, precomputation of auxiliary tables
may lead to a more efficient algorithm.
7.1 PRECONDITIONING
7.1.1 Introduction
Let I be the set of instances of a given problem. Suppose each instance i ∈ I can be separated into two components j ∈ J and k ∈ K (that is, I ⊆ J × K).
A preconditioning algorithm for this problem is an algorithm A that accepts as
input some element j of J and produces as output a new algorithm Bj . This algorithm
Bj must be such that if k E K and < j, k > E I, then the application of Bj on k gives the
solution to the instance < j , k > of the original problem.
Let a (j) be the time required to produce the preconditioned algorithm Bj from j, let bj (k) be the time Bj takes to solve the instance k, and let t (j, k) be the time needed to solve < j, k > directly, without preconditioning. Preconditioning is typically worthwhile in one of the following two situations.
a. We need to be able to solve any instance i ∈ I very rapidly, for example to ensure a sufficiently fast response time for a real-time application. In this case it is sometimes impractical to calculate and store ahead of time the #I solutions to all the relevant instances. It may, on the other hand, be possible to calculate and store ahead of time #J preconditioned algorithms. Such an application of
preconditioning may be of practical importance even if only one crucial instance
is solved in the whole lifetime of the system : this may be just the instance that
enables us, for example, to stop a runaway reactor. The time you spend studying
before an exam may also be considered as an example of this kind of precondi-
tioning.
b. We have to solve a series of instances < j, k1 >, < j, k2 >, ... , < j, kn > with the same j. In this case the time taken to solve all the instances is

    t1 = t (j, k1) + t (j, k2) + ... + t (j, kn)

if we work without preconditioning, and

    t2 = a (j) + bj (k1) + bj (k2) + ... + bj (kn)

with preconditioning. Whenever n is sufficiently large, it often happens that t2 is much smaller than t1.
Let J be the set of all rooted trees, and let K be the set of pairs < v , w > of nodes. For
a given pair k = < v, w > and a given rooted tree j we want to know whether node v is
an ancestor of node w in tree j. (By definition, every node is its own ancestor and,
recursively, the ancestor of all the nodes of which its children are ancestors.)
If the tree j contains n nodes, any direct solution of this instance takes a time in Ω(n) in the worst case.
It is, however, possible to precondition the tree in a time in O(n), so that we can subsequently solve any particular instance in a time in O(1).
We illustrate this approach using the tree in Figure 7.1.1. It contains 13 nodes.
To precondition the tree, we traverse it first in preorder and then in postorder (see Sec-
tion 6.2), numbering the nodes sequentially as we visit them. For a node v, let
prenum [v] be the number assigned to the node when we traverse the tree in preorder,
and let postnum [v] be the number assigned during the traversal in postorder. In Figure
7.1.1 these two numbers appear to the left and the right of the node, respectively.
Let v and w be two nodes in the tree. In preorder we first number a node and
then we number its subtrees from left to right. Thus
prenum [v] ≤ prenum [w]  ⇔  v is an ancestor of w or v is to the left of w in the tree.

In postorder we first number the subtrees of a node from left to right, and then we number the node itself. Thus

postnum [v] ≥ postnum [w]  ⇔  v is an ancestor of w or v is to the right of w in the tree.

It follows that

prenum [v] ≤ prenum [w] and postnum [v] ≥ postnum [w]  ⇔  v is an ancestor of w.
Once all the values of prenum and postnum have been calculated in a time in O(n), the required condition can be checked in a time in O(1).
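A sketch in Python of this preconditioning : a tree node is assumed to carry a list children, the two numberings are kept in dictionaries, and precondition returns a constant-time ancestry test.

    def precondition(root):
        prenum, postnum = {}, {}
        pre_counter, post_counter = [0], [0]

        def traverse(v):
            pre_counter[0] += 1
            prenum[v] = pre_counter[0]          # number v, then its subtrees (preorder)
            for child in v.children:
                traverse(child)
            post_counter[0] += 1
            postnum[v] = post_counter[0]        # subtrees first, then v (postorder)

        traverse(root)

        def is_ancestor(v, w):                  # answers one instance in constant time
            return prenum[v] <= prenum[w] and postnum[v] >= postnum[w]

        return is_ancestor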
Let J be the set of polynomials in one variable x, and let K be the set of values this
variable can take. The problem consists of evaluating a given polynomial at a given
point.
For simplicity, we restrict ourselves to polynomials with integer coefficients,
evaluated at integer values of x. We use the number of integer multiplications that
have to be carried out as a barometer to measure the efficiency of an algorithm, taking
no account of the size of the operands involved nor of the number of additions and
subtractions.
Initially, we restrict ourselves even further and consider only monic polynomials
(the leading coefficient is 1) of degree n = 2k - I for some integer k ? 1.
where a is a constant and q (x) and r (x) are monic polynomials of degree 2k - I -1.
Next, we apply the same procedure recursively to q(x) and r(x). Finally, p(x) is
expressed entirely in terms of polynomials of the form x^i + c, where i is a power of 2.
In the preceding example we first express p(x) in the form
   (x^4 + a)(x^3 + q2 x^2 + q1 x + q0) + (x^3 + r2 x^2 + r1 x + r0),
and then we decompose the two cubic polynomials in the same way:
   x^3 − 5x^2 + 4x − 13 = (x^2 + 3)(x − 5) + (x + 2)
   x^3 − 3x + 9 = (x^2 − 4)x + (x + 9)
to arrive finally at the expression for p(x) given in Example 7.1.6. This expression is
the preconditioned form of the polynomial.
Let M(k) be the number of multiplications needed to evaluate the preconditioned form
of such a polynomial, not counting those required to compute x^2, x^4, ..., x^(2^(k−1)).
Then
   M(k) = 0                 if k = 1
   M(k) = 2M(k−1) + 1       if k ≥ 2.
Consequently, M(k) = 2^(k−1) − 1 for k ≥ 1. Adding the k − 1 multiplications needed to
compute x^2, x^4, ..., x^(2^(k−1)) by repeated squaring, we find that 2^(k−1) + k − 2
multiplications suffice in all. In other words, (n − 3)/2 + lg(n + 1) multiplications are
sufficient to evaluate a preconditioned polynomial of degree n = 2^k − 1.
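To make the procedure concrete, here is a sketch in Python of this preconditioning and of the subsequent evaluation, assuming the polynomial is monic of degree n = 2^k − 1 and is given by its list of coefficients from the constant term up (the decomposition follows p(x) = (x^((n+1)/2) + a) q(x) + r(x) as above; the function names are illustrative):

def precondition(coeffs):
    """coeffs[i] is the coefficient of x^i; the polynomial must be monic
    of degree 2^k - 1.  For degree 1 (x + c) the preconditioned form is
    simply c; otherwise it is a triple (a, q, r) standing for
    p(x) = (x^h + a) q(x) + r(x), where h = (degree + 1) // 2."""
    n = len(coeffs) - 1
    if n == 1:
        return coeffs[0]
    h = (n + 1) // 2
    q = coeffs[h:]                         # top half: monic of degree h - 1
    rem = coeffs[:h]                       # p(x) = x^h q(x) + rem(x)
    a = rem[h - 1] - 1                     # chosen so that r(x) is monic too
    r = [rc - a * qc for rc, qc in zip(rem, q)]
    return (a, precondition(q), precondition(r))

def evaluate(pre, x, n):
    """Evaluate a preconditioned polynomial of degree n = 2^k - 1 at x."""
    powers, e, p = {1: x}, 1, x            # x, x^2, x^4, ..., by repeated squaring
    while 2 * e <= (n + 1) // 2:
        p, e = p * p, 2 * e
        powers[e] = p
    def ev(node, deg):
        if deg == 1:
            return x + node                # a factor of the form x + c
        a, qpre, rpre = node
        h = (deg + 1) // 2
        return (powers[h] + a) * ev(qpre, h - 1) + ev(rpre, h - 1)
    return ev(pre, n)

For instance, precondition([-13, 4, -5, 1]) yields (3, -5, 2), which encodes the decomposition x^3 − 5x^2 + 4x − 13 = (x^2 + 3)(x − 5) + (x + 2) used above.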
* Problem 7.1.5. Prove that if the monic polynomial p(x) is given by its
coefficients, there does not exist an algorithm that can evaluate p(x) using fewer than
n − 1 multiplications in the worst case. In other words, the time invested in precondi-
tioning the polynomial allows us to evaluate it subsequently using essentially half the
number of multiplications otherwise required.
Problem 7.1.9. Show using an explicit example that the method described
here does not necessarily give an optimal solution (that is, it does not necessarily
minimize the required number of multiplications) even in the case of monic polyno-
mials of degree n = 2^k − 1.
The following problem occurs frequently in the design of text-processing systems (edi-
tors, macroprocessors, information retrieval systems, etc.). Given a target string con-
sisting of n characters, S = s1 s2 ⋯ sn, and a pattern consisting of m characters,
P = p1 p2 ⋯ pm, we want to know whether P is a substring of S, and if so,
whereabouts in S it occurs. Suppose without loss of generality that n ≥ m. In the ana-
lyses that follow, we use the number of comparisons between pairs of characters as a
barometer to measure the efficiency of our algorithms.
The following naive algorithm springs immediately to mind. It returns r if the
first occurrence of P in S begins at position r (that is, r is the smallest integer such that
s[r + i − 1] = p[i], i = 1, 2, ..., m), and it returns 0 if P is not a substring of S.
   for i ← 0 to n − m do
      ok ← true
      j ← 1
      while ok and j ≤ m do
         if p[j] ≠ s[i + j] then ok ← false
                            else j ← j + 1
      if ok then return i + 1
   return 0
The algorithm tries to find the pattern P at every position in S. In the worst case
it makes m comparisons at each position to see whether or not P occurs there. (Think
of S = "aaa ... aab", P = "aaaab".) The total number of comparisons to be made is
therefore in Ω(m(n − m)), which is in Ω(mn) if n is much larger than m. Can we do
better?
7.2.1 Signatures
Suppose that the target string S can be decomposed in a natural way into substrings
S1, S2, ..., and that the pattern P, if it occurs in S, must occur entirely within
one of these substrings (thus we exclude the possibility that P might straddle several
consecutive substrings). This situation occurs, for example, if the Si are the lines in a
text file S and we are searching for the lines in the file that contain P.
The basic idea is to use a Boolean function T (P, Si) that can be calculated
rapidly to make a preliminary test. If T (P , Si) is false, then P cannot be a substring of
Si ; if T (P, Si) is true, however, it is possible that P might be a substring of Si , but we
have to carry out a detailed check to verify this (for instance, using the naive algorithm
given earlier). Signatures offer a simple way of implementing such a function.
Suppose that the character set used for the strings S and P is
{ a, b, c, ..., x, y, z, other }, where we have lumped all the non-alphabetic characters
together. Suppose too that we are working on a computer with 32-bit words. Here is
one common way of defining a signature.
i. Define val("a") = 0, val("b") = 1, ..., val("z") = 25, val(other) = 26.
ii. If c1 and c2 are characters, define
      B(c1, c2) = (27 val(c1) + val(c2)) mod 32.
iii. Define the signature sig(C) of a string C = c1 c2 ⋯ cr as a 32-bit word where
the bits numbered B(c1, c2), B(c2, c3), ..., B(c(r−1), cr) are set to 1 and the other
bits are 0.
Only seven bits are set to 1 in the signature because B("e", "r") = B("r", "s") = 29.
We calculate a signature for each substring Si and for the pattern P. If Si con-
tains the pattern P, then all the bits that are set to 1 in the signature of P are also set to
1 in the signature of Si. This gives us the function T we need:
   T(P, Si) = [(sig(P) and sig(Si)) = sig(P)],
where the and operator represents the bitwise conjunction of two whole words. T can
be computed very rapidly once all the signatures have been calculated.
This is yet another example of preconditioning. Calculating the signatures for S
takes a time in O (n). For each pattern P we are given we need a further time in O (m)
to calculate its signature, but from then on we hope that the preliminary test will allow
us to speed up the search for P. The improvement actually obtained in practice
depends on the judicious choice of a method for calculating signatures.
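One possible transcription of this scheme into Python might look as follows (32-bit signatures, lower-case letters plus a catch-all "other" class, exactly as in the definition above; the function names are illustrative):

def val(c):
    """val("a") = 0, ..., val("z") = 25, and anything else counts as other = 26."""
    return ord(c) - ord("a") if "a" <= c <= "z" else 26

def sig(string):
    """32-bit signature: set bit B(c1, c2) for every pair of adjacent characters."""
    s = 0
    for c1, c2 in zip(string, string[1:]):
        s |= 1 << ((27 * val(c1) + val(c2)) % 32)
    return s

def T(pattern, substring):
    """Preliminary test: False rules the pattern out; True calls for a full check."""
    return sig(pattern) & sig(substring) == sig(pattern)

In an application the signatures of the substrings Si would of course be computed once and stored, so that each call on T costs only one bitwise conjunction and one comparison.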
Problem 7.2.2. Is the method illustrated of interest if the target string is very
long and if it cannot be divided into substrings ?
Problem 7.2.3. If T(P, Si) is true with probability ε > 0 even if Si does not
contain P, what is the order of the number of operations required in the worst case to
find P in S or to confirm that it is absent?
Many variations on this theme are possible. In the preceding example the func-
tion B takes two consecutive characters of the string as parameters. It is easy to invent
such functions based on three consecutive characters, and so on. The number of bits in
the signature can also be changed.
Problem 7.2.5. If the character set contains the 128 characters of the ASCII
code, and if the computer in use has 32-bit words, we might define B by
S   babcbabcabcaabcabcabcacabc
P   abcabcacab
    ↑
We check the characters of P from left to right. The arrows show the comparisons car-
ried out before we find a character that does not match. In this case there is only one
comparison. After this failure we try
S   babcbabcabcaabcabcabcacabc
P    abcabcacab
     ↑↑↑↑
This time the first three characters of P are the same as the characters opposite them
in S, but the fourth does not match. Up to now, we have proceeded exactly as in the
naive algorithm. However we now know that the last four characters examined in S
are abcx where x ≠ "a". Without making any more comparisons with S, we can
conclude that it is useless to slide P one, two, or three characters along: such an align-
ment cannot be correct. So let us try sliding P four characters along.
S   babcbabcabcaabcabcabcacabc
P        abcabcacab
         ↑↑↑↑↑↑↑↑
Following this mismatch, we know that the last eight characters examined in S are
abcabcax where x ≠ "c". Sliding P one or two places along cannot be right; however,
moving it three places might work.
S   babcbabcabcaabcabcabcacabc
P           abcabcacab
                ↑
There is no need to recheck the first four characters of P : we chose the movement of
P in such a way as to ensure that they necessarily match. It suffices to start checking
at the current position of the pointer. In this case we have a second mismatch in the
same position. This time, sliding P four places along might work. (A three-place
movement is not enough: we know that the last characters examined in S are ax,
where x is not a "b".)
S   babcbabcabcaabcabcabcacabc
P               abcabcacab
                ↑↑↑↑↑↑↑↑
Yet again we have a mismatch, and this time a three-place movement is necessary.
S   babcbabcabcaabcabcabcacabc
P                  abcabcacab
                       ↑↑↑↑↑↑
To implement this algorithm, we need an array next[1 .. m]. This array tells us
what to do when a mismatch occurs at position j in the pattern.
If next[j] = 0, it is useless to compare further characters of the pattern to the target
string at the current position. We must instead line up P with the first character of S
that has not yet been examined and start checking again at the beginning of P.
If next[j] = i > 0, we should align the i th character of P on the current character of S
and start checking again at this position. In both cases we slide P along j − next[j]
characters to the right with respect to S. In the preceding example we have

   j        1  2  3  4  5  6  7  8  9  10
   p[j]     a  b  c  a  b  c  a  c  a  b
   next[j]  0  1  1  0  1  1  0  5  0  1
Once this array has been calculated, here is the algorithm for finding P in S.
function KMP
   j, k ← 1
   while j ≤ m and k ≤ n do
      while j > 0 and s[k] ≠ p[j] do
         j ← next[j]
      k ← k + 1
      j ← j + 1
   if j > m then return k − m
            else return 0
Problem 7.2.6. Follow the execution of this algorithm step by step using the
strings from Example 7.2.2.
After each comparison of two characters, we move either the pointer (the arrow
in the diagrams, or the variable k in the algorithm) or the pattern P. The pointer and P
can each be moved a maximum of n times. The time required by the algorithm is
therefore in O(n). Precomputation of the array next[1 .. m] can be carried out in a
time in O(m), which can be neglected since m ≤ n. Overall, the execution time is thus
in O(n).
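For readers who prefer a runnable version, here is a sketch in Python using 0-based indexing. The helper failure computes the classic border table, from which a table equivalent to the next array above can be derived (the function names are ours; for the pattern of the example the derived table is 0 1 1 0 1 1 0 5 0 1, as in the text):

def failure(p):
    """fail[j] = length of the longest proper border of p[0 .. j] (0-based)."""
    fail, k = [0] * len(p), 0
    for j in range(1, len(p)):
        while k > 0 and p[k] != p[j]:
            k = fail[k - 1]
        if p[k] == p[j]:
            k += 1
        fail[j] = k
    return fail

def book_next(p):
    """Derive the 1-based next table of the text from the border table."""
    fail, m = failure(p), len(p)
    nxt = [0] * m
    for j in range(1, m):
        cand = fail[j - 1]                 # candidate resumption point
        nxt[j] = nxt[cand] if p[cand] == p[j] else cand + 1
    return nxt

def kmp_search(s, p):
    """Return the 0-based position of the first occurrence of p in s, or -1
    if p (assumed non-empty) does not occur."""
    fail, k = failure(p), 0
    for i, c in enumerate(s):
        while k > 0 and p[k] != c:
            k = fail[k - 1]
        if p[k] == c:
            k += 1
        if k == len(p):
            return i - len(p) + 1
    return -1

With the strings of the example, kmp_search("babcbabcabcaabcabcabcacabc", "abcabcacab") returns 15, the 0-based counterpart of the position found above.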
It is correct to talk of preconditioning in this case only if the same pattern is
sought in several distinct target strings, which does happen in some applications.
On the other hand, preconditioning does not apply if several distinct patterns are
sought in a given target string. In all cases, including the search for a single pattern in
a single target, it is correct to talk of precomputation.
Problem 7.2.8. Modify the KMP algorithm so that it finds all the occurrences
of P in S in a total time in O(n).
Like the KMP algorithm, the algorithm due to Boyer and Moore (henceforth : the BM
algorithm) finds the occurrences of P in S in a time in O (n) in the worst case. How-
ever, since the KMP algorithm examines every character of the string S at least once in
the case when P is absent, it makes at least n comparisons. The BM algorithm, on the
other hand, is often sublinear : it does not necessarily examine every character of S,
and the number of comparisons carried out can be less than n. Furthermore, the BM
algorithm tends to become more efficient as m, the number of characters in the pattern
P, increases. In the best case the BM algorithm finds all the occurrences of P in S in a
time in O(m + n/m).
As with the KMP algorithm, we slide P along S from left to right, checking
corresponding characters. This time, however, the characters of P are checked from
right to left after each movement of the pattern. We use two rules to decide how far
we should move P after a mismatch.
Again we examine P from right to left, and again there is an immediate mismatch.
Since "i" does not appear in the pattern, we try
There is once again an immediate mismatch, but this time the character "a" that
appears opposite p[m] also appears in P. We slide P one place along to align the
occurrences of the letter "a", and start checking again (at the right-hand end of P).
Now, when we slide P along one position to align the "a" in the target string with the
"a" in the pattern, P is correctly aligned. A final check, always from right to left, will
confirm this. In this example we have found P without ever using rule (ii). We have
made only 9 comparisons between a character of P and a character of S.
S   babcbabcabcaabcabcabcacabc
P   abcabcacab
          ↑↑↑↑
We examine P from right to left. The left-hand arrow shows the position of the first
mismatch. We know that starting at this position S contains the characters xcab where
x ≠ "a". If we slide P five places right, this information is not contradicted. (Under-
scores show which characters were aligned.)
S   babcbabcabcaabcabcabcacabc
P        abcabcacab
                  ↑
Unlike the KMP algorithm, we check all the positions of P after moving the pattern.
Some unnecessary checks (corresponding to the underscored characters in P) may be
made at times. In our example when we start over checking P from right to left, there
is an immediate mismatch. We slide P along to align the "c" found in S with the last
"c" in P.
S   babcbabcabcaabcabcabcacabc
P          abcabcacab
                 ↑↑↑↑
After four comparisons between P and S (of which one is unnecessary), carried out as
usual from right to left, we again have a mismatch. A second application of rule (ii)
gives us
S   babcbabcabcaabcabcabcacabc
P               abcabcacab
                         ↑
S   babcbabcabcaabcabcabcacabc
P                abcabcacab
                          ↑
S   babcbabcabcaabcabcabcacabc
P                  abcabcacab
                   ↑↑↑↑↑↑↑↑↑↑
This is the distance to move P according to rule (i) when we have an immediate
mismatch.
It is more complicated to compute d2. We shall not give the details here, but
only an example. The interpretation of d2 is the following : after a mismatch at posi-
tion i of the pattern, begin checking again at position m (that is, at the right-hand end)
of the pattern and d2[i] characters further along S.
S   ??????xs????????        x ≠ "e"
P   assesses
          ↑↑
The fact that x ≠ "e" does not rule out the possibility of aligning the "s" in p[6] with
the "s" we have found in S. It may therefore be possible to align P as follows:
S   ??????xs????????        x ≠ "e"
P     assesses
             ↑
We start checking again at the end of P, that is, 3 characters further on in S than the
previous comparison: thus d2[7] = 3.
Similarly, suppose now that we have a mismatch at position p[6]. Starting from
the position of the mismatch, the characters of S are xes, where x ≠ "s":
S   ?????xes????????        x ≠ "s"
P   assesses
         ↑↑↑
The fact that x ≠ "s" rules out the possibility of aligning the "e" and the "s" in p[4] and
p[5] with the "e" and the "s" found in S. It is therefore impossible to align P under
these characters, and we must slide P all the way to the right under the characters of S
that we have not yet examined:
S   ?????xes????????        x ≠ "s"
P           assesses
                   ↑
We start checking again at the end of P, that is, 10 characters further on in S than the
previous comparison: thus d2[6] = 10.
As a third instance, suppose we have a mismatch at position p[4]. Starting from
the position of the mismatch, the characters of S are xsses, where x ≠ "e":
S   ???xsses????????        x ≠ "e"
P   assesses
       ↑↑↑↑↑
In this case it may be possible to align P with S by sliding it three places right:

S   ???xsses????????        x ≠ "e"
P      assesses
              ↑
Now we start checking at the end of P, 7 characters further on in S than the previous
comparison, so d2[4] = 7.
For this example we find

   i       1   2   3   4   5   6   7   8
   p[i]    a   s   s   e   s   s   e   s
   d2[i]   15  14  13  7   11  10  3   1

We also have d1["s"] = 0, d1["e"] = 1, d1["a"] = 7 and d1[any other character] = 8.
Note that d1["s"] has no significance, because an immediate mismatch is impossible at
a position where S contains "s".
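The occurrence heuristic of rule (i) is easy to program; here is a sketch in Python (the function name is illustrative) that reproduces the values just given for P = "assesses":

def bm_d1(p):
    """d1[c] = distance from the rightmost occurrence of c in P to the end
    of P, or len(P) for a character that does not occur in P at all."""
    m = len(p)
    table = {}
    for i, c in enumerate(p):              # later occurrences overwrite earlier ones
        table[c] = m - 1 - i
    return lambda c: table.get(c, m)

With P = "assesses", bm_d1(P)("s") is 0, bm_d1(P)("e") is 1, bm_d1(P)("a") is 7, and any other character gives 8.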
Problem 7.2.11. In this algorithm, the choice between using rule (i) and rule
(ii) depends on the test "j = m ? ". However, even if j < m , it is possible that rule (i)
might allow k to advance more than rule (ii). Continuing from Example 7.2.5, con-
sider the following situation :
S   ??????ts????????
P   assesses
          ↑↑
The failure of the match between "t" and "e" is of the second kind, so k is increased by
d2[7] = 3 to obtain
S   ??????ts????????
P     assesses
             ↑
However, the fact that "t" does not appear in P should have allowed us to increase k
directly by d1["t"] = 8 positions.
S   ??????ts????????
P          assesses
                  ↑
** Problem 7.2.14. Prove that the total execution time of the algorithm (compu-
tation of d1 and d2 and search for P) is in O(n).
Problem 7.2.15. Modify the BM algorithm so that it will find all the
occurrences of P in S in a time in O(n).
It is easy to see intuitively why the algorithm is often more efficient for longer
patterns. For a character set of reasonable size (say, 52 letters if we count upper- and
lowercase separately, ten figures and about a dozen other characters) and a pattern that
is not too long, d1[c] is equal to m for most characters c. Thus we look at approxi-
mately one character out of every m in the target string. As long as m stays small
compared to the size of the character set, the number of characters examined goes
down as m goes up. Boyer and Moore give some empirical results : if the target string
S is a text in English, about 20% of the characters are examined when m = 6; when
m = 12, only 15% of the characters in S are examined.
Probabilistic Algorithms
8.1 INTRODUCTION
Imagine that you are the hero (or the heroine) of a fairy tale. A treasure is hidden at a
place described by a map that you cannot quite decipher. You have managed to reduce
the search to two possible hiding-places, which are, however, a considerable distance
apart. If you were at one or the other of these two places, you would immediately
know whether it was the right one. It takes five days to get to either of the possible
hiding-places, or to travel from one of them to the other. The problem is complicated
by the fact that a dragon visits the treasure every night and carries part of it away to an
inaccessible den in the mountains. You estimate that it will take four more days' com-
putation to solve the mystery of the map and thus to know with certainty where the
treasure is hidden, but if you set out on a journey you will no longer have access to
your computer. An elf offers to show you how to decipher the map if you pay him the
equivalent of the treasure that the dragon can carry away in three nights.
Problem 8.1.1. Leaving out of consideration the possible risks and costs of
setting off on a treasure-hunting expedition, should you accept the elf's offer?
Obviously it is preferable to give three nights' worth of treasure to the elf rather
than allow the dragon four extra nights of plunder. If you are willing to take a calcu-
lated risk, however, you can do better. Suppose that x is the value of the treasure
remaining today, and that y is the value of the treasure carried off every night by the
dragon. Suppose further that x > 9y. Remembering that it will take you five days to
reach the hiding-place, you can expect to come home with x - 9y if you wait four days
to finish deciphering the map. If you accept the elf's offer, you can set out
immediately and bring back x - 5y, of which 3y will go to pay the elf; you will thus
have x - 8y left. A better strategy is to toss a coin to decide which possible hiding-
place to visit first, journeying on to the other if you find you have decided wrong. This
gives you one chance out of two of coming home with x -5y, and one chance out of
two of coming home with x -10y . Your expected profit is therefore x - 7.5y. This is
like buying a ticket for a lottery that has a positive expected return.
This fable can be translated into the context of algorithmics as follows : when an
algorithm is confronted by a choice, it is sometimes preferable to choose a course of
action at random, rather than to spend time working out which alternative is the best.
Such a situation arises when the time required to determine the optimal choice is prohi-
bitive, compared to the time that will be saved on the average by making this optimal
choice. Clearly, the probabilistic algorithm can only be more efficient with respect to
its expected execution time. It is always possible that bad luck will force the algorithm
to explore many unfruitful possibilities.
We make an important distinction between the words "average" and "expected".
The average execution time of a deterministic algorithm was discussed in section 1.4.
It refers to the average time taken by the algorithm when each possible instance of a
given size is considered equally likely. By contrast, the expected execution time of a
probabilistic algorithm is defined on each individual instance : it refers to the mean
time that it would take to solve the same instance over and over again. This makes it
meaningful to talk about the average expected time and the worst-case expected time
of a probabilistic algorithm. The latter, for instance, refers to the expected time taken
by the worst possible instance of a given size, not the time incurred if the worst pos-
sible probabilistic choices are unfortunately taken.
Example 8.1.1. Section 4.6 describes an algorithm that can find the k th small-
est of an array of n elements in linear time in the worst case. Recall that this algo-
rithm begins by partitioning the elements of the array on either side of a pivot, and that
it then calls itself recursively on the appropriate section of the array if need be. One
fundamental principle of the divide-and-conquer technique suggests that the nearer the
pivot is to the median of the elements, the more efficient the algorithm will be.
Despite this, there is no question of choosing the exact median as the pivot because
this would cause an infinite recursion (see Problem 4.6.3). Thus we choose a subop-
timal so-called pseudomedian. This avoids the infinite recursion, but choosing the
pseudomedian still takes quite some time. On the other hand, we saw another algo-
rithm that is much faster on the average, but at the price of a quadratic worst case : it
simply decides to use the first element of the array as the pivot. We shall see in Sec-
tion 8.4.1 that choosing the pivot randomly gives a substantial improvement in the
expected execution time as compared to the algorithm using the pseudomedian, without
making the algorithm catastrophically bad for the worst-case instances.
We once asked the students in an algorithmics course to implement the selection
algorithm of their choice. The only algorithms they had seen were those in Sec-
tion 4.6. Since the students did not know which instances would be used to test their
programs (and suspecting the worst of their professors), none of them took the risk of
using a deterministic algorithm with a quadratic worst case. Three students, however,
thought of using a probabilistic approach. This idea allowed them to beat their col-
leagues hands down : their programs took an average of 300 milliseconds to solve the
trial instance, whereas the majority of the deterministic algorithms took between 1500
and 2600 milliseconds.
Problem 8.1.2. Show how the effect of uniform (i .. j) can be obtained if only
uniform (a, b) is available.
Example 8.1.4. Let p be a prime number, and let a be an integer such that
1 ≤ a < p. The index of a modulo p is the smallest strictly positive integer i such
that a^i ≡ 1 (mod p). It is thus the cardinality of X = { a^j mod p | j ≥ 1 }. For
example, the index of 2 modulo 31 is 5, that of 3 is 30, and that of 5 is 3. By Fermat's
theorem, an index modulo p always divides p − 1 exactly. This suggests one way of
making a random, uniform, independent choice of an element of X.

function draw(a, p)
   j ← uniform(1 .. p − 1)
   return dexpoiter(a, j, p)   { Section 4.8 }
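In Python the same random choice can be sketched with the built-in modular exponentiation, pow playing the role of dexpoiter:

import random

def draw(a, p):
    """Random, uniform, independent element of X = { a^j mod p : j >= 1 },
    assuming p is prime and 1 <= a < p."""
    j = random.randint(1, p - 1)           # uniform(1 .. p - 1)
    return pow(a, j, p)                    # a^j mod p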
Problem 8.1.3. Give other examples of sets in which there is an efficient way
to choose an element randomly, uniformly, and independently.
Truly random generators are not usually available in practice. Most of the time
pseudorandom generators are used instead : these are deterministic procedures that are
able to generate long sequences of values that appear to have the properties of a
random sequence. To start a sequence, we must supply an initial value called a seed.
The same seed always gives rise to the same sequence, so to obtain different
sequences, we may choose, for example, a seed that depends on the date or time. Most
programming languages include such a generator, although some implementations
should be used with caution. Using a good pseudorandom generator, the theoretical
results obtained in this chapter concerning the efficiency of different algorithms can
generally be expected to hold. However, the impractical hypothesis that a genuinely
random generator is available is crucial when we carry out the analysis.
The theory of pseudorandom generators is complex, but a simple example will
illustrate the general idea. Most generators are based on a pair of functions S : X → X
and R : X → Y, where X is a sufficiently large set and Y is the domain of pseu-
dorandom values to be generated. Let g ∈ X be a seed. Using the function S, this seed
defines a sequence: x0 = g and xi = S(x(i−1)) for i > 0. Finally, the function R allows
us to obtain the pseudorandom sequence y0, y1, y2, ... defined by yi = R(xi), i ≥ 0.
This sequence is necessarily periodic, with a period that cannot exceed #X. However,
if S and R (and sometimes g) are chosen properly, the period can be made very long,
and the sequence may be for most practical purposes statistically indistinguishable
from a truly random sequence of elements of Y. Suggestions for further reading are
given at the end of the chapter.
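A toy example of this S/R structure is a linear congruential generator, sketched here in Python; the constants are classical choices, but the sketch is purely illustrative, and such generators are far too weak for cryptographic purposes:

class LinearCongruential:
    """X = {0, ..., 2^32 - 1}; S(x) = (a x + c) mod 2^32; R(x) returns the
    16 high-order bits of x."""
    def __init__(self, seed):
        self.x = seed & 0xFFFFFFFF                               # x0 = g
    def next(self):
        self.x = (1664525 * self.x + 1013904223) & 0xFFFFFFFF    # xi = S(x(i-1))
        return self.x >> 16                                      # yi = R(xi)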
from one use to the next. Probabilistic algorithms can be divided into four major
classes : numerical, Monte Carlo, Las Vegas, and Sherwood. Some authors use the
term "Monte Carlo" for any probabilistic algorithm, and in particular for those we call
"numerical".
Randomness was first used in algorithmics for the approximate solution of
numerical problems. Simulation can be used, for example, to estimate the mean length
of a queue in a system so complex that it is impossible to get closed-form solutions or
to get numerical answers by deterministic methods. The answer obtained by such a
probabilistic algorithm is always approximate, but its expected precision improves as
the time available to the algorithm increases. (The error is usually inversely propor-
tional to the square root of the amount of work performed.) For certain real-life prob-
lems, computation of an exact solution is not possible even in principle, perhaps
because of uncertainties in the experimental data to be used, or maybe because a digital
computer can only handle binary or decimal values while the answer to be computed is
irrational. For other problems, a precise answer exists but it would take too long to
figure it out exactly. Sometimes the answer is given in the form of a confidence
interval.
Monte Carlo algorithms, on the other hand, are used when there is no question of
accepting an approximate answer, and only an exact solution will do. In the case of a
decision problem, for example, it is hard to see what an "approximation" might be,
since only two answers are possible. Similarly, if we are trying to factorize an integer,
it is of little interest to know that such-and-such a value is "almost a factor". A way to
put down seven queens on the chess-board is little help in solving the eight queens
problem. A Monte Carlo algorithm always gives an answer, but the answer is not
necessarily right; the probability of success (that is, of getting a correct answer)
increases as the time available to the algorithm goes up. The principal disadvantage of
such algorithms is that it is not in general possible to decide efficiently whether or not
the answer given is correct. Thus a certain doubt will always exist.
Las Vegas algorithms never return an incorrect answer, but sometimes they do
not find an answer at all. As with Monte Carlo algorithms, the probability of success
increases as the time available to the algorithm goes up. However, any answer that is
obtained is necessarily correct. Whatever the instance to be solved, the probability of
failure can be made arbitrarily small by repeating the same algorithm enough times on
this instance. These algorithms should not be confused with those, such as the simplex
algorithm for linear programming, that are extremely efficient for the great majority of
instances to be handled, but catastrophic for a few instances.
Finally, Sherwood algorithms always give an answer, and the answer is always
correct. They are used when some known deterministic algorithm to solve a particular
problem runs much faster on the average than in the worst case. Incorporating an ele-
ment of randomness allows a Sherwood algorithm to reduce, and sometimes even to
eliminate, this difference between good and bad instances. It is not a case of
preventing the occasional occurrence of the algorithm's worst-case behaviour, but
rather of breaking the link between the occurrence of such behaviour and the particular
instance to be solved. Since it reacts more uniformly than the deterministic algorithm,
a Sherwood algorithm is less vulnerable to an unexpected probability distribution of
the instances that some particular application might give it to solve (see the end of Sec-
tion 1.4).
Problem 8.2.2. Show how to obtain a Las Vegas algorithm to solve a well-
characterized problem given that you already have a Monte Carlo algorithm for the
same problem. Contrariwise, show how to obtain a Monte Carlo algorithm for any
problem whatsoever given that you already have a Las Vegas algorithm for the same
problem.
You spill a box of toothpicks onto a wooden floor. The toothpicks spread out on the
ground in random positions and at random angles, each one independently of all the
others. If you know that there were 355 toothpicks in the box, and that each one is
exactly half as long as the planks in the floor are wide (we realize that this gets
unlikelier every minute ! ), how many toothpicks will fall across a crack between two
planks ?
Clearly any answer between 0 and 355 is possible, and this uncertainty is typical
of probabilistic algorithms. However, as Georges Louis Leclerc showed, the average
number of toothpicks expected to fall across a crack can be calculated : it is almost
exactly 113.
In fact, each toothpick has one chance in π of falling across a crack. This sug-
gests a probabilistic "algorithm" for estimating the value of π by spilling a sufficiently
large number of toothpicks onto the floor. Needless to say, this method is not used in
practice since better methods of calculating the decimal expansion of π are known.
* Problem 8.3.2. Supposing that the width of the planks is exactly twice the
length of the toothpicks, how many of the latter should you drop in order to obtain
with probability at least 90% an estimate of π whose absolute error does not exceed
0.001?
Problem 8.3.3. Supposing that you have available a random generator of the
type discussed previously, give an algorithm Buffon (n) that simulates the experiment
of dropping n toothpicks. Your algorithm should count the number k of toothpicks
that fall across a crack, and return n/k as its estimate of π. Try your algorithm on a
computer with n = 1000 and n = 10,000, using a pseudorandom generator. What are
your estimates of π? (It is likely that you will need the value of π during the simula-
tion to generate the random angle - in radians - of each toothpick that falls. But then
nobody said this was a practical method!)
Consider next the experiment that consists of throwing n darts at a square target
and counting the number k that fall inside a circle inscribed in this square. We suppose
that every point in the square has exactly the same probability of being hit by a dart.
(It is much easier to simulate this experiment on a computer than to find a darts-player
with exactly the degree of expertise - or of incompetence - required.) If the radius of
the inscribed circle is r, then its area is πr², whereas that of the square target is 4r², so
the average proportion of the darts that fall inside the circle is πr²/4r² = π/4. This
allows us to estimate π by 4k/n. Figure 8.2.1 illustrates the experiment. In our
example, where 28 darts have been thrown, we are not surprised to find 21 of them
inside the circle, where we expect to see on average 28π/4 ≈ 22.
The following algorithm simulates this experiment, except that it only throws
darts into the upper right quadrant of the target.
[Figure 8.2.1. Darts thrown at a square target with an inscribed circle of radius r.]
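A sketch of such a simulation in Python (restricted to the upper right quadrant, as described; the estimate 4k/n comes from the proportion computed above, and the function name is illustrative):

import random

def darts(n):
    """Throw n random darts into the unit square and count those that fall
    inside the quarter-circle of radius 1; return 4k/n as an estimate of pi."""
    k = 0
    for _ in range(n):
        x, y = random.uniform(0, 1), random.uniform(0, 1)
        if x * x + y * y <= 1:
            k += 1
    return 4 * k / n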
This brings us to the best known of the numerical probabilistic algorithms : Monte
Carlo integration. (This name is unfortunate, because in our terminology it is not an
example of a Monte Carlo algorithm.) Recall that if f : [0, 1] → [0, 1] is a continuous
function, then the area of the surface bounded by the curve y = f(x), the x-axis,
the y-axis, and the line x = 1 is given by

   ∫₀¹ f(x) dx .
To estimate this integral, we could throw a sufficient number of darts at the unit square
and count how many of them fall below the curve.
function hitormiss(f, n)
   k ← 0
   for i ← 1 to n do
      x ← uniform(0, 1)
      y ← uniform(0, 1)
      if y ≤ f(x) then k ← k + 1
   return k/n
Thus the algorithm using darts to estimate π is equivalent to the evaluation of

   4 ∫₀¹ √(1 − x²) dx .
* Problem 8.3.5. Consider two real constants ε and δ strictly between 0 and 1.
Prove that if I is the correct value of the integral and if h is the value returned by the
preceding algorithm, then Prob[ |h − I| < ε ] > 1 − δ whenever the number n of itera-
tions is at least I(1 − I)/ε²δ. Therefore it is sufficient to use n = ⌈1/(4ε²δ)⌉ (because
I(1 − I) ≤ 1/4) to reduce below δ the probability of an absolute error exceeding ε.
Notice that this is not very good : one more decimal digit of precision requires one
hundred times more computation.
Problem 8.3.6. Let a, b, c, and d be four real numbers such that a < b and
c ≤ d, and let f : [a, b] → [c, d] be a continuous function. Generalize the preceding
algorithm to estimate

   ∫ₐᵇ f(x) dx .
Problem 8.3.7. Try to grasp intuitively why this algorithm works. Why is it
called the trapezoidal algorithm ?
Problem 8.3.8. Compare experimentally the trapezoidal algorithm and the two
probabilistic algorithms we have seen. In each case, estimate the value of π by calcu-
lating ∫₀¹ 4√(1 − x²) dx.
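The trapezoidal algorithm referred to in these problems is the classical deterministic rule. One formulation consistent with the calls trapezoidal(f, n, 0, 1) used below, in which n is taken to be the number of equally spaced sample points, is the following sketch in Python:

def trapezoidal(f, n, a, b):
    """Deterministic estimate of the integral of f over [a, b] using the
    trapezoid rule with n equally spaced sample points (n >= 2)."""
    h = (b - a) / (n - 1)
    total = (f(a) + f(b)) / 2
    for i in range(1, n - 1):
        total += f(a + i * h)
    return total * h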
In general, the trapezoidal algorithm needs many fewer iterations than does Monte
Carlo integration to obtain a comparable degree of precision. This is typical of most of
the natural functions that we may wish to integrate. However, to every deterministic
integration algorithm, even the most sophisticated, there correspond continuous func-
tions that can be constructed expressly to fool the algorithm. Consider for example the
function f(x) = sin²((100!)πx). Any call on trapezoidal(f, n, 0, 1) with 2 ≤ n ≤ 101
returns the value zero, even though the true value of this integral is ½. No function
can play this kind of trick on the Monte Carlo integration algorithm (although there is
an extremely small probability that the algorithm might manage to make a similar kind
of error, even when f is a thoroughly ordinary function).
In practice, Monte Carlo integration is of interest when we have to evaluate a
multiple integral. If a deterministic integration algorithm using some systematic
method to sample the function is generalized to several dimensions, the number of
sample points needed to achieve a given precision grows exponentially with the dimen-
sion of the integral to be evaluated. If 100 points are needed to evaluate a simple
integral, then it will probably be necessary to use all the points of a 100 x 100 grid, that
is, 10,000 points, to achieve the same precision when a double integral is evaluated;
one million points will be needed for a triple integral, and so on. In Monte Carlo
integration, on the other hand, the dimension of the integral generally has little effect
on the precision obtained, although the amount of work for each iteration is likely to
increase slightly with the dimension. In practice, Monte Carlo integration is used to
evaluate integrals of dimension four or higher. The precision of the answer can be
improved using hybrid techniques that are partly systematic and partly probabilistic.
If the dimension is fixed, it may even be preferable to use quasi Monte Carlo integra-
tion, a technique not discussed here (but Section 8.7 gives a reference for further
reading).
The intuitive answer to the preceding question is almost invariably "of course
not". Nevertheless, the probability that you would win your bet is greater than 56%.
More generally, there are n!/(n-k)! different ways of choosing k distinct objects from
among n objects, taking into account the order in which they are chosen. Since there
are n^k different ways of choosing k objects if repetitions are allowed, the probability
that k objects chosen randomly and uniformly from n (with repetition allowed) are all
distinct is n!/((n − k)! n^k).
This suggests the following probabilistic algorithm for estimating the number of
elements in a set X .
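One natural estimator of this kind, sketched here in Python, draws elements uniformly at random until the first repetition; since the expected number of draws needed is about √(π·#X/2), a count of k draws yields the estimate 2k²/π (the precise formulation and constant may be presented differently elsewhere in the text):

import math, random

def estimate_size(elements):
    """Estimate the number of distinct elements of X, given only the ability
    to draw elements of X uniformly at random (here, from a list).
    Draw until the first repetition; if k draws were needed, return 2 k^2 / pi."""
    seen = set()
    k = 0
    while True:
        k += 1
        x = random.choice(elements)
        if x in seen:
            return 2 * k * k / math.pi
        seen.add(x)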
The variance of the estimate obtained from this algorithm is unfortunately too
high for most practical applications (unless the solution to Problem 8.3.15 is used).
The following example shows that it can nonetheless be useful if we simply need to
know whether X contains fewer than a elements or more than b, where a << b.
You have the complete works of Shakespeare on a magnetic tape. How can you deter-
mine the number of different words he used, counting different forms of the same word
(plurals, possessives, and so on) as distinct items?
Two obvious solutions to this problem are based on the techniques of sorting and
searching. Let N be the total number of words on the tape, and let n be the number of
different words. The first approach might be to sort the words on the tape so as to
bring identical forms together, and then to make a sequential pass over the sorted tape
to count the number of different words. This method takes a time in O(N log N) but
requires a relatively modest amount of space in central memory if a suitable external
sorting technique is used. (Such techniques are not covered in this book.) The second
approach consists of making a single pass over the tape and constructing in central
memory a hash table (see Section 8.4.4) holding a single occurrence of each form so
far encountered. The required time is thus in O(N) on the average, but it is in Ω(Nn)
in the worst case. Moreover, this second method requires a quantity of central memory
in Ω(n), which will most likely prove prohibitive.
If we are willing to tolerate some imprecision in the estimate of n, and if we
already know an upper bound M on the value of n (or failing this, on the value of N ),
then there exists a probabilistic algorithm for solving this problem that is efficient with
respect to both time and space. We must first define what sequences of characters are
to be considered as words. (This may depend not only on the character set we are
using, but also whether we want to count such sequences as "jack-rabbit", "jack-
o'lantern", and "jack-in-the-box" as one, two, three, or four words.) Let U be the set of
such sequences. Let m be a parameter somewhat larger than lg M (a more detailed
analysis shows that m = 5 + ⌈lg M⌉ suffices). Let h : U → {0, 1}^m be a hash function
that transforms a sequence from U in a pseudorandom way into a string of bits of
length m. If y is a string of bits of length k, denote by y[i] the i th bit of y,
1 ≤ i ≤ k; denote by π(y, b), b ∈ {0, 1}, the smallest i such that y[i] = b, or k + 1 if
none of the bits of y is equal to b. Consider the following algorithm.
function wordcnt
   { initialization }
   y ← string of (m + 1) bits set to zero
   { sequential passage through the tape }
   for each word x on the tape do
      i ← π(h(x), 1)
      y[i] ← 1
   return π(y, 0)
Suppose, for example, that the value returned by this algorithm is 4. This means
that the final y begins with 1110. Consequently, there are words x 1, x2 and x3 on the
tape such that h (x1) begins with 1, 01, and 001, respectively, but there is no word x4
such that h (x4) begins with 0001. Let k be the value returned by a call on wordcnt.
Since the probability that a random binary string begins with 0001 is 2^(−4), it is unlikely
that there could be more than 16 distinct words on the tape. (The probability that
π(h(xi), 1) ≠ 4 for 16 different values of xi is (15/16)^16 ≈ 35.6% ≈ e^(−1), assuming h has
sufficiently random behaviour; in fact, Prob[ k = 4 | n = 16 ] = 31¾%.) Conversely,
since the probability that a random binary string begins with 001 is 2^(−3), it is unlikely
that there could be fewer than 4 distinct words on the tape. (The probability that
π(h(xi), 1) = 3 for at least one value of xi among 4 different values is
1 − (7/8)^4 ≈ 41.4% ≈ 1 − e^(−1/2); in fact, Prob[ k = 4 | n = 4 ] = 18¾%.) This crude rea-
soning indicates that it is plausible to expect that the number of distinct words on the
tape should lie between 2^(k−2) and 2^k. It is far from obvious how to carry out a more
precise analysis of the unbiased estimate of n given by k.
This offers a first approach for estimating the number of different words: calcu-
late k using the algorithm wordcnt and estimate n as 2^k/1.54703. Unfortunately, the
standard deviation of R, the random variable corresponding to the value returned by
wordcnt, shows that this estimate may be in error by a factor of 2, which is unacceptable.
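A Python transcription of wordcnt, with an illustrative hash function built from SHA-256 (any function with sufficiently random behaviour would do; both function names are ours), might look like this:

import hashlib

def h(word, m):
    """Illustrative hash: the first m bits of SHA-256(word), as a string of
    '0' and '1' characters."""
    digest = hashlib.sha256(word.encode("utf8")).digest()
    bits = "".join(format(byte, "08b") for byte in digest)
    return bits[:m]

def wordcnt(words, m):
    """Single pass over the words; returns k, the position of the first 0 bit
    of y.  The quantity 2^k / 1.54703 is then a rough estimate of the number
    of distinct words."""
    y = [0] * (m + 2)                      # y[1 .. m+1]; index 0 is unused
    for x in words:
        bits = h(x, m)
        i = bits.find("1") + 1             # pi(h(x), 1): first bit equal to 1
        if i == 0:
            i = m + 1                      # h(x) contains no 1 bit at all
        y[i] = 1
    k = 1
    while k <= m + 1 and y[k] == 1:        # pi(y, 0): first bit of y equal to 0
        k += 1
    return k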
Section 1.4 mentions that analysing the average efficiency of an algorithm may some-
times give misleading results. The reason is that any analysis of the average case must
be based on a hypothesis about the probability distribution of the instances to be han-
dled. A hypothesis that is correct for a given application of the algorithm may prove
disastrously wrong for a different application. Suppose, for example, that quicksort
(Section 4.5) is used as a subalgorithm inside a more complex algorithm. Analysis of
this sorting method shows that it takes an average time in O(n log n) to sort n items
provided that the instances to be sorted are chosen randomly. This analysis no longer
bears any relation to reality if in fact we tend to give the algorithm only instances that
are already almost sorted. Sherwood algorithms free us from the necessity of worrying
about such situations by evening out the time required on different instances of a given
size.
Let A be a deterministic algorithm and let tA(x) be the time it takes to solve some
instance x. For every integer n let Xn be the set of instances of size n. Supposing that
every instance of a given size is equiprobable, the average time taken by the algorithm
to solve an instance of size n is

   t̄A(n) = Σ_{x ∈ Xn} tA(x) / #Xn .

This in no way rules out the possibility that there exists an instance x of size n such
that tA(x) >> t̄A(n). We wish to obtain a probabilistic algorithm B such that
tB(x) = t̄A(n) + s(n) for every instance x of size n, where tB(x) is the expected time
taken by algorithm B on instance x and s(n) is the cost we have to pay for this unifor-
mity.
Algorithm B may occasionally take more time than tA(n)+s(n) on an instance x
of size n, but this fortuitous behaviour is only due to the probabilistic choices made by
the algorithm, independently of the specific instance x to be solved. Thus there are no
longer worst-case instances, but only worst-case executions. If we define

   t̄B(n) = Σ_{x ∈ Xn} tB(x) / #Xn ,

then clearly t̄B(n) = t̄A(n) + s(n).
We return to the problem of finding the kth smallest element in an array T of n ele-
ments (Section 4.6 and Example 8.1.1). The heart of this algorithm is the choice of a
pivot around which the other elements of the array are partitioned. Using the pseudo-
median as the pivot assures us of a linear execution time in the worst case, even
though finding this pivot is a relatively costly operation. On the other hand, using the
first element of the array as the pivot assures us of a linear execution time on the
average, with the risk that the algorithm will take quadratic time in the worst case
(Problems 4.6.5 and 4.6.6). Despite this prohibitive worst case, the simpler algorithm
has the advantage of a much smaller hidden constant on account of the time that is
saved by not calculating the pseudomedian. The decision whether it is more important
to have efficient execution in the worst case or on the average must be taken in the
light of the particular application. If we decide to aim for speed on the average thanks
to the simpler deterministic algorithm, we must make sure that the instances to be
solved are indeed chosen randomly and uniformly.
Suppose that the elements of T are distinct, and that we are looking for the
median. The execution times of the algorithms in Section 4.6 do not depend on the
values of the elements of the array, but only on their relative order. Rather than
express this time as a function solely of n, which forces us to distinguish between the
worst case and an average case, we can express it as a function of both n and a, the
permutation of the first n integers that corresponds to the relative order of the elements
of the array.
Let tp(n, σ) and ts(n, σ) be the times taken by the algorithm that uses the pseu-
domedian and by the simplified algorithm, respectively. The simplified algorithm is
generally faster: for every n, ts(n, σ) < tp(n, σ) for most values of σ. On the other
hand, the simplified algorithm is sometimes disastrous: ts(n, σ) is occasionally much
greater than tp(n, σ). More precisely, let Sn be the set of n! permutations of the first n
integers. Define t̄s(n) = Σ_{σ ∈ Sn} ts(n, σ) / n! . We have the following equations:
   (∃ cp)(∃ n1 ∈ ℕ)(∀ n ≥ n1)(∀ σ ∈ Sn) [ tp(n, σ) ≤ cp n ]
but
   while i < j do
      m ← T[uniform(i .. j)]
      partition(T, i, j, m, u, v)
      if k < u then j ← u − 1
      else if k > v then i ← v + 1
           else i ← k ; j ← k
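A self-contained Python sketch of the resulting Sherwood selection algorithm (a three-way partition around a randomly chosen pivot; the partition is written with auxiliary lists for brevity, and the function name is illustrative):

import random

def sherwood_select(t, k):
    """Return the k-th smallest element of the list t (1 <= k <= len(t)).
    The pivot is chosen uniformly at random, so the expected time is linear
    on every instance; only unlucky random choices, never an unlucky
    instance, can slow it down."""
    lo, hi, k = 0, len(t) - 1, k - 1       # switch to a 0-based rank
    while lo < hi:
        pivot = t[random.randint(lo, hi)]
        left  = [x for x in t[lo:hi + 1] if x < pivot]
        mid   = [x for x in t[lo:hi + 1] if x == pivot]
        right = [x for x in t[lo:hi + 1] if x > pivot]
        t[lo:hi + 1] = left + mid + right
        u, v = lo + len(left), lo + len(left) + len(mid) - 1
        if k < u:
            hi = u - 1
        elif k > v:
            lo = v + 1
        else:
            return pivot
    return t[lo]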
The modifications we made to the deterministic algorithms for sorting and for selection
in order to obtain Sherwood algorithms are simple. There are, however, occasions
when we are given a deterministic algorithm efficient on the average but that we
cannot reasonably expect to modify. This happens, for instance, if the algorithm is part
of a complicated, badly documented software package. Stochastic preconditioning
allows us to obtain a Sherwood algorithm without changing the deterministic algo-
rithm. The trick is to transform the instance to be solved into a random instance, to
use the given deterministic algorithm to solve this random instance, and then to deduce
the solution to the original instance.
Suppose the problem to be solved consists of the computation of some function
f : X -i Y for which we already have an algorithm that is efficient on the average. For
every integer n, let Xn be the set of instances of size n, and let An be a set with the
same number of elements. Assume random sampling with uniform distribution is pos-
sible efficiently within An. Let A be the union of all the An. Stochastic precondi-
tioning consists of a pair of functions u : X × A → X and v : A × Y → Y such that
i. (∀ n ∈ ℕ)(∀ x ∈ Xn) [ u(x, r) is uniformly distributed over Xn when r is chosen
   randomly and uniformly within An ];
ii. (∀ n ∈ ℕ)(∀ x ∈ Xn)(∀ r ∈ An) [ f(x) = v(r, f(u(x, r))) ]; and
iii. the functions u and v can be calculated efficiently in the worst case.
Problem 8.4.2. Why does the algorithm dlogRH work? Point out the func-
tions corresponding to u and v.
Problem 8.4.3. Find other problems that can benefit from stochastic precondi-
tioning.
A list of n keys sorted into ascending order is implemented using two arrays val [ 1 .. n ]
and ptr [ 1 .. n ] and an integer head. The smallest key is in val [head ], the next smal-
lest is in val [ptr [head ]], and so on. In general, if val [i] is not the largest key, then
ptr [i] gives the index of the following key. The end of the list is marked by
ptr [i] = 0. The rank of a key is the number of keys in the list that are less than or
equal to the given key. For instance, here is one way to represent the list 1, 2, 3, 5, 8,
13, 21 .
   i       1  2  3   4  5  6   7
   val[i]  2  3  13  1  5  21  8
   ptr[i]  2  5  6   1  7  0   3

Here head = 4.
Problem 8.4.4. Prove the preceding assertion. (Hint : show how a worst-case
instance can be constructed systematically from the probes made into the list by any
given deterministic algorithm.)
Despite this inevitable worst case, there exists a deterministic algorithm that is
capable of carrying out such a search in an average time in O(√n). From this we can
obtain a Sherwood algorithm whose expected execution time is in O(√n) whatever the
instance to be solved. As usual, the Sherwood algorithm is no faster on the average
than the corresponding deterministic algorithm, but it does not have such a thing as a
worst-case instance.
Suppose for the moment that the required key is always in fact present in the list,
and that all the elements of the list are distinct. Given a key x, the problem is thus to
find that index i, 1 ≤ i ≤ n, such that val[i] = x. Any instance can be characterized
by a permutation σ of the first n integers and by the rank k of the key we are looking
for. Let Sn be the set of all n! permutations. If A is any deterministic algorithm,
tA(n, k, σ) denotes the time taken by this algorithm to find the key of rank k among the
n keys in the list when the order of the latter in the array val is specified by the permu-
tation σ. In the case of a probabilistic algorithm, tA(n, k, σ) denotes the expected
value of this time. Whether the algorithm is deterministic or probabilistic, wA(n) and
mA(n) denote its worst-case and its mean time, respectively. Thus

   wA(n) = max { tA(n, k, σ) | 1 ≤ k ≤ n and σ ∈ Sn },

and

   mA(n) = (1/(n n!)) Σ_{σ ∈ Sn} Σ_{k=1}^{n} tA(n, k, σ) .

Problem 8.4.4 implies that wA(n) ∈ Ω(n) for every deterministic algorithm A. We want
a deterministic algorithm B such that mB(n) ∈ O(√n) and a Sherwood algorithm C
such that wC(n) = mB(n).
The following algorithm finds a key x starting from some position i in the list,
provided that x ≥ val[i] and that x is indeed present.
function A(x)
   return search(x, head)
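Here is a small Python sketch of this kind of search, together with a Sherwood variant along the lines discussed above: it first probes about √n random positions and then follows the pointers from the best starting point found. (The details of the text's own algorithms B and C may differ; the names below are illustrative.)

import math, random

def search(val, ptr, x, i):
    """Follow the list from position i until the key x is found; assumes
    x >= val[i] and that x is present (arrays are 1-based, index 0 unused)."""
    while x > val[i]:
        i = ptr[i]
    return i

def sherwood_search(val, ptr, head, x):
    """Probe about sqrt(n) random positions, keep the one whose key is the
    largest among those not exceeding x, then search sequentially from there.
    Expected time in O(sqrt(n)) on every instance."""
    n = len(val) - 1
    i, best = head, val[head]
    for _ in range(math.isqrt(n)):
        j = random.randint(1, n)
        if best < val[j] <= x:
            i, best = j, val[j]
    return search(val, ptr, x, i)

With the arrays of the example (val = [None, 2, 3, 13, 1, 5, 21, 8], ptr = [None, 2, 5, 6, 1, 7, 0, 3], head = 4), sherwood_search(val, ptr, 4, 8) returns 7, the index holding the key 8.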
Problem 8.4.5. Let tA(n, k) be the exact number of references to the array
val made by the algorithm A to find the key of rank k in a list of n keys. (The order σ
of the keys is irrelevant for this algorithm.) Define wA(n) and mA(n) similarly. Deter-
mine tA(n, k) for every integer n and for every k between 1 and n. Determine wA(n)
and mA(n) for every integer n.
Problem 8.4.8. Intuitively, why should we choose to execute the for loop √n
times in algorithm B ?
* Problem 8.4.9. Prove that mB(n) ∈ O(√n). [Hint: Let M_{l,n} be the random
variable that corresponds to the minimum of l integers chosen randomly, uniformly,
and independently with replacement from the set { 1, 2, ... , n }. Find a link between
this random variable and the average-case analysis of algorithm B. Show that the
expected value of M_{l,n} is about n/(l + 1) when l is a constant and about √n when
l is close to √n.]
Problem 8.4.13. Give an efficient Sherwood algorithm that takes into account
the possibility that the key we are seeking may not be in the list and that the keys may
not all be distinct. Analyse your algorithm.
Problem 8.4.14. Use the structure and the algorithms we have just seen to
obtain a Sherwood sorting algorithm that is able to sort n elements in a worst-case
expected time in O(n^(3/2)). Is this better than O(n log n)? Justify your answer.
Hash coding, or simply hashing, is used in just about every compiler to implement the
symbol table. Let X be the set of possible identifiers in the language to be compiled,
and let N be a parameter chosen to obtain an efficient system. A hash function is a
function h : X → { 1, 2, ... , N }. Such a function is a good choice if it efficiently
disperses all the probable identifiers, that is, if h(x) ≠ h(y) for most of the pairs x ≠ y
that are likely to be found in the same program. When x ≠ y but h(x) = h(y), we say
that there is a collision between x and y. The hash table is an array T[1 .. N] of lists
in which T[i] is the list of those identifiers x found in the program such that h(x) = i.
The load factor of the table is the ratio α = n/N, where n is the number of distinct
identifiers in the table. (The ratio α may well be greater than 1.) If we suppose that
every identifier and every pointer occupies a constant amount of space, the table takes
space in O(N + n) and the average length of the lists is α. Thus we see that increasing
the value of N reduces the average search time but increases the space occupied by the
table.
Problem 8.4.15. Other ways to handle collisions, besides the use of a table of
lists as outlined here, are legion. Suggest a few.
Problem 8.4.16. What do you think of the "solution" that consists of ignoring
the problem? If we are given an a priori upper bound on the number of identifiers a
program may contain, does it suffice to choose N rather larger than this bound to
ensure that the probability of a collision is negligible ? (Hint : solve Problem 8.3.9
before answering.)
Problem 8.4.17. Show that n calls on the symbol table can take a total time in
Ω(n²) in the worst case.
This technique is very efficient provided that the function h disperses the
identifiers properly. If we suppose, however, that #X is very much greater than N, it is
inevitable that certain programs will cause a large number of collisions. These pro-
grams will compile slowly every time they are submitted. In a sense they are paying
the price for all other programs to compile quickly. A Sherwood approach allows us to
retain the efficiency of hashing on the average, without arbitrarily favouring some pro-
grams at the expense of others. (If you have not yet solved Problem 8.2.3, now is the
time to think some more about it!)
The basic idea is to choose the hash function randomly at the beginning of each
compilation. A program that causes a large number of collisions during one compila-
tion will therefore probably be luckier next time it is compiled. Unfortunately, there
are far too many functions from X into { 1, 2, ... , N } for it to be reasonable to choose
one at random.
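One classical way to make such a random choice feasible is to draw the function from a small, carefully chosen family rather than from the set of all functions. The following Python sketch uses a Carter-Wegman style family; it illustrates the idea and is not necessarily the construction the text goes on to develop:

import random

def random_hash(N, p=2**61 - 1):
    """Return a hash function h : identifiers -> {1, ..., N} chosen at random
    from the family h(x) = ((a x + b) mod p) mod N + 1, where p is a fixed
    prime and a, b are picked at random (for instance at each compilation)."""
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    def h(identifier):
        x = int.from_bytes(identifier.encode("utf8"), "big") % p
        return (a * x + b) % p % N + 1
    return h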
This follows because the algorithm succeeds at the first attempt with probability p (x),
thus taking an expected time s (x). With probability 1- p (x) it first makes an unsuc-
cessful attempt to solve the instance, taking an expected time e (x), before starting all
over again to solve the instance, which still takes an expected time t(x). The recurrence
is easily solved to yield

   t(x) = s(x) + ((1 − p(x))/p(x)) e(x) .
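The repetition being analysed is simply the wrapper that keeps calling the Las Vegas algorithm until it succeeds; in Python, assuming LV returns a pair (success, solution) rather than using the text's variable parameters:

def obstinate(LV, x):
    """Call LV(x) repeatedly until it reports success, then return the answer."""
    while True:
        success, y = LV(x)
        if success:
            return y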
Problem 8.5.1. Suppose that s (x) and e (x) are not just expected times, but
that they are in fact the exact times taken by a call on LV (x , ...) in the case of success
and of failure, respectively. What is the probability that the algorithm obstinate will
find a correct solution in a time not greater than t, for any t ≥ s(x)? Give your
answer as a function of t, s(x), e(x) and p(x).
The eight queens problem (Section 6.6.1) provides a nice example of this kind of algo-
rithm. Recall that the backtracking technique used involves systematically exploring
the nodes of the implicit tree formed by the k-promising vectors. Using this technique,
we obtain the first solution after examining only 114 of the 2,057 nodes in the tree.
This is not bad, but the algorithm does not take into account one important fact : there
is nothing systematic about the positions of the queens in most of the solutions. On
the contrary, the queens seem more to have been positioned haphazardly. This obser-
vation suggests a greedy Las Vegas algorithm that places queens randomly on succes-
sive rows, taking care, however, that the queens placed on the board do not threaten
one another. The algorithm ends either successfully if it manages to place all the
queens on the board or in failure if there is no square in which the next queen can be
added. The resulting algorithm is not recursive.
         j ← i
   if nb > 0
   then { amongst all nb possibilities for the (k + 1)st queen,
          it is column j that has been chosen (with probability 1/nb) }
      try[k + 1] ← j
      col ← col ∪ { j }
      diag45 ← diag45 ∪ { j − k }
      diag135 ← diag135 ∪ { j + k }
      { try[1 .. k + 1] is (k + 1)-promising }
      k ← k + 1
until nb = 0 or k = 8
success ← (nb > 0)
To analyse the efficiency of this algorithm, we need to determine its probability p
of success, the average number s of nodes that it explores in the case of success, and
the average number e of nodes that it explores in the case of failure. Clearly s = 9
(counting the 0-promising empty vector). Using a computer we can calculate
p = 0.1293... and e = 6.971... . A solution is therefore obtained more than one
time out of eight by proceeding in a completely random fashion! The expected
number of nodes explored if we repeat the algorithm until a success is finally obtained
is given by the general formula s + (1 − p)e/p = 55.927..., less than half the
number of nodes explored by the systematic backtracking technique.
Problem 8.5.2. When there is more than one position open for the (k + 1)st
queen, the algorithm QueensLV chooses one at random without first counting the
number nb of possibilities. Show that each position has, nevertheless, the same proba-
bility of being chosen.
We can do better still. The Las Vegas algorithm is too defeatist : as soon as it
detects a failure it starts all over again from the beginning. The backtracking
algorithm, on the other hand, makes a systematic search for a solution that we know
has nothing systematic about it. A judicious combination of these two algorithms first
places a number of queens on the board in a random way, and then uses backtracking
to try and add the remaining queens, without, however, reconsidering the positions of
the queens that were placed randomly.
An unfortunate random choice of the positions of the first few queens can make
it impossible to add the others. This happens, for instance, if the first two queens are
placed in positions 1 and 3, respectively. The more queens we place randomly, the
smaller is the average time needed by the subsequent backtracking stage, but the
greater is the probability of a failure.
The resulting algorithm is similar to QueensLV, except that the last two lines are
replaced by
  until nb = 0 or k = stopVegas
  if nb > 0 then backtrack(k, col, diag45, diag135, success)
    else success ← false ,
where 1 ≤ stopVegas ≤ 8 indicates how many queens are to be placed randomly before
moving on to the backtracking phase. The latter looks like the algorithm Queens of
Section 6.6.1 except that it has an extra parameter success and that it returns immedi-
ately after finding the first solution if there is one.
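In Python, the combination might be sketched as follows (our rendering; stop_vegas plays the role of stopVegas, and a failed random phase is reported by returning None so that the caller can start all over again).

import random

def queens_stop_vegas(n=8, stop_vegas=2):
    col, diag45, diag135, board = set(), set(), set(), []

    def free(k):
        return [j for j in range(n)
                if j not in col and j - k not in diag45 and j + k not in diag135]

    def place(k, j):
        board.append(j); col.add(j); diag45.add(j - k); diag135.add(j + k)

    def unplace(k, j):
        board.pop(); col.remove(j); diag45.remove(j - k); diag135.remove(j + k)

    def backtrack(k):                      # complete rows k .. n-1 systematically
        if k == n:
            return True
        for j in free(k):
            place(k, j)
            if backtrack(k + 1):
                return True
            unplace(k, j)
        return False

    for k in range(stop_vegas):            # Las Vegas phase: random placement
        choices = free(k)
        if not choices:
            return None                     # failure, start all over again
        place(k, random.choice(choices))
    return board if backtrack(stop_vegas) else None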
The following table gives for each value of stopVegas the probability p of suc-
cess, the expected number s of nodes explored in the case of success, the expected
number e of nodes explored in the case of failure, and the expected number
t = s + (1 − p)e/p of nodes explored if the algorithm is repeated until it eventually
finds a solution. The case stopVegas = 0 corresponds to using the deterministic algo-
rithm directly.
stopVegas      p         s         e         t
    0       1.0000    114.00      --      114.00
    1       1.0000     39.63      --       39.63
    2       0.8750     22.53     39.67     28.20
    3       0.4931     13.48     15.10     29.01
    4       0.2618     10.31      8.79     35.10
    5       0.1624      9.33      7.29     46.92
    6       0.1357      9.05      6.98     53.50
    7       0.1293      9.00      6.97     55.93
    8       0.1293      9.00      6.97     55.93
In terms of nodes explored the greedy Las Vegas algorithm therefore wins, but it does not run in
half the time taken by the backtracking algorithm because we must also take into
account the time required to make the necessary pseudorandom choices of position.
Problem 8.5.3. If you are still not convinced of the value of this technique,
we suggest you try to solve the twelve queens problem without using a computer.
First, try to solve the problem systematically, and then try again, this time placing the
first five queens randomly.
For the eight queens problem, a systematic search for a solution beginning with
the first queen in the first column takes quite some time. First the trees below the
2-promising nodes [1,3] and [1,4] are explored to no effect. Even when the search
starting from node [1,5] begins, we waste time with [1,5,2] and [1,5,7]. This is one
reason why it is more efficient to place the first queen at random rather than to begin
the systematic search immediately. On the other hand, a systematic search that begins
with the first queen in the fifth column is astonishingly quick. (Try it!) This unlucky
characteristic of the upper left-hand corner is nothing more than a meaningless
accident. For instance, the same corner is a better than average starting point for the
problems with five or twelve queens. What is significant, however, is that a solution
can be obtained more rapidly on the average if several queens are positioned randomly
before embarking on the backtracking phase. Once again, this can be understood intui-
tively in terms of the lack of regularity in the solutions (at least when the number of
queens is not 4k + 2 for some integer k).
Here are the values of p, s, e, and t for a few values of stopVegas in the case of
the twelve queens problem.
stopVegas      p         s         e         t
    0       1.0000    262.00      --      262.00
    5       0.5039     33.88     47.23     80.39
   12       0.0465     13.00     10.20    222.11
On the CYBER 835 the Las Vegas algorithm that places the first five queens randomly
before starting backtracking requires only 37 milliseconds on the average to find a
solution, whereas the pure backtracking algorithm takes 125 milliseconds. As for the
greedy Las Vegas algorithm, it wastes so much time making its pseudorandom choices
of position that it requires essentially the same amount of time as the pure backtracking
algorithm.
An empirical study of the twenty queens problem was also carried out using an
Apple II personal computer. The deterministic backtracking algorithm took more than
2 hours to find a first solution. Using the probabilistic approach and placing the first
ten queens at random, 36 different solutions were found in about five and a half
minutes. Thus the probabilistic algorithm turned out to be almost 1,000 times faster
per solution than the deterministic algorithm.
* Problem 8.5.4. An obvious approach to the n queens problem is first to determine empirically the optimal
value of stopVegas, and then to apply the corresponding Las Vegas algorithm. In fact,
determining the optimal value of stopVegas takes longer than a straightforward search
for a solution using backtracking. (We needed more than 50 minutes computation on
the CYBER to establish that stopVegas = 5 is the optimal choice for the twelve queens
problem !) Find an analytic method that enables a good, but not necessarily optimal,
value of stopVegas to be determined rapidly as a function of n.
** Problem 8.5.5. Technically, the general algorithm obtained using the previous
problem (first determine stopVegas as a function of n, the number of queens, and then
try to place the queens on the board) can only be considered to be a Las Vegas algo-
rithm if its probability of success is strictly positive for every n. This is the case if
and only if there exists at least one solution. If no solution exists, the obstinate proba-
bilistic algorithm will loop forever without realizing what is happening. Prove or
disprove: the n queens problem can be solved for every n ≥ 4. Combining this with
Problem 8.5.4, can you find a constant δ > 0 such that the probability of success of the
Las Vegas algorithm to solve the n queens problem is at least δ for every n?
Problem 8.5.7. Prove, on the other hand, that no quadratic residue has more
than two distinct square roots. (Hint: assuming that a² ≡ b² (mod p), consider
a² − b².)
Problem 8.5.8. Conclude from the preceding results that exactly half the
integers between 1 and p -1 are quadratic residues modulo p.
Given x, a quadratic residue modulo p, does there exist an efficient algorithm for calculating the two square roots
of x modulo p? The problem is easy when p ≡ 3 (mod 4), but no efficient deter-
ministic algorithm is known to solve this problem when p ≡ 1 (mod 4).
There exists, however, an efficient Las Vegas algorithm to solve this problem
when p ≡ 1 (mod 4). Let us decide arbitrarily to denote by √x the smaller of the two
square roots of x. Even if the value of √x is unknown, it is possible to carry out the
symbolic multiplication of a + b√x and c + d√x modulo p, where a, b, c, and d
are integers between 0 and p − 1. This product is ((ac + bdx) mod p) +
((ad + bc) mod p)√x. Note the similarity to a product of complex numbers. The
symbolic exponentiation (a + b√x)^n can be calculated efficiently by adapting the algo-
rithms of Section 4.8.
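This symbolic arithmetic is easy to experiment with; in the Python sketch below (ours), a pair (a, b) represents a + b√x, and the exponentiation uses the repeated-squaring idea of Section 4.8.

def sym_mul(u, v, x, p):
    # (a + b*sqrt(x)) * (c + d*sqrt(x)) = (ac + bdx) + (ad + bc)*sqrt(x), modulo p
    a, b = u
    c, d = v
    return ((a * c + b * d * x) % p, (a * d + b * c) % p)

def sym_pow(u, e, x, p):
    # Compute u**e symbolically modulo p by repeated squaring.
    result = (1, 0)
    while e > 0:
        if e & 1:
            result = sym_mul(result, u, x, p)
        u = sym_mul(u, u, x, p)
        e >>= 1
    return result

# Problem 8.5.11: sym_pow((2, 1), 26, 7, 53) yields (0, 41),
# that is 0 + 41*sqrt(7) modulo 53.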
Problem 8.5.11. Using Example 8.5.1 as a guide, carry out the symbolic cal-
culation (2 + √7)^26 ≡ 0 + 41√7 (mod 53) in detail.
To obtain a square root of 7, the preceding problem shows that we need only find
the unique integer y such that 1 ≤ y ≤ 52 and 41y ≡ 1 (mod 53). This can be done
efficiently using a modification of Euclid's algorithm for calculating the greatest
common divisor (Section 1.7.4).
* Problem 8.5.12. Let u and v be two positive integers, and let d be their
greatest common divisor.
i. Prove that there exist integers a and b such that au + bv = d. [Hint: Suppose
   without loss of generality that u ≥ v. If v = d, the proof is trivial (a = 0 and
   b = 1). Otherwise, let w = u mod v. First show that d is also the greatest
   common divisor of v and w. (This is the heart of Euclid's algorithm.) By
   mathematical induction, now let a' and b' be such that a'v + b'w = d. Then we
   need only take a = b' and b = a' − b'⌊u/v⌋.]
ii. Give an efficient iterative algorithm for calculating d, a, and b from u and v.
   Your algorithm should not calculate d before starting to work on a and b.
iii. If p is prime and 1 ≤ a ≤ p − 1, prove that there exists a unique y such that
   1 ≤ y ≤ p − 1 and ay ≡ 1 (mod p). Give an efficient algorithm for calculating y
   given p and a.
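A minimal iterative sketch in Python of parts (ii) and (iii) of this problem (function names are ours):

def ext_gcd(u, v):
    # Maintain a0*u + b0*v = r0 and a1*u + b1*v = r1 throughout, so that
    # d, a and b are produced together, not one after the other.
    a0, b0, r0 = 1, 0, u
    a1, b1, r1 = 0, 1, v
    while r1 != 0:
        q = r0 // r1
        a0, a1 = a1, a0 - q * a1
        b0, b1 = b1, b0 - q * b1
        r0, r1 = r1, r0 - q * r1
    return r0, a0, b0            # d = gcd(u, v) and a*u + b*v = d

def mod_inverse(a, p):
    # Part (iii): the unique y with 1 <= y <= p-1 and a*y ≡ 1 (mod p), p prime.
    d, y, _ = ext_gcd(a, p)
    return y % p

# mod_inverse(41, 53) returns 22, which as noted above is a square root of 7 modulo 53.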
i. The Las Vegas algorithm finds a square root of x if and only if it randomly
   chooses an a that gives the key to √x; and
ii. Exactly (p + 3)/2 of the p − 1 possible random choices for a give the key to √x.
[Hint: Consider the function
   f : {1, 2, ... , p−1} \ {√x, p − √x} → {2, 3, ... , p−2}
defined by the equation (a − √x) f(a) ≡ a + √x (mod p). Prove that this func-
tion is one-to-one and that f(a) is a quadratic residue modulo p if and only if a
does not give the key to √x.] □
This shows that the Las Vegas algorithm succeeds with probability somewhat
greater than one half, so that on the average it suffices to call it twice to obtain a
square root of x. In view of the high proportion of integers that give the key to √x, it is
curious that no known efficient deterministic algorithm is capable of finding even one
of them with certainty.
Prove that the probability of failure of this algorithm is exactly (1/2)^(k−1) and that the
while loop is executed at most k − 2 times, where k is specified in the algorithm.
The two preceding problems show the computational equivalence between the
efficient deterministic calculation of square roots modulo p and the efficient deter-
ministic discovery of a quadratic nonresidue modulo p. This is an example of the
technique called reduction, which we study in Chapter 10.
Let n be an integer greater than 1. The problem of factorizing n consists of finding the
unique decomposition n = p1^m1 p2^m2 ⋯ pk^mk of n into prime powers, where the pi are distinct primes.
Problem 8.5.18. Suppose you have available an algorithm prime (n), which
tests whether or not n is prime, and an algorithm split(n), which finds a nontrivial
factor of n provided n is composite. Using these two algorithms as primitives, give an
algorithm to factorize any integer.
Section 8.6.2 concerns an efficient Monte Carlo algorithm for determining pri-
mality. Thus the preceding problem shows that the problem of factorization reduces to
the problem of splitting. Here is the naive algorithm for the latter problem.
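A minimal trial-division sketch of such a procedure in Python (our rendering of the idea, not the text's pseudocode):

def split(n):
    # Return the smallest nontrivial factor of the composite number n.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return 1          # only reached if n is in fact prime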
The preceding algorithm takes a time in Ω(√n) in the worst case to split n. It is
therefore of no practical use even on medium-sized integers: it could take more than 3
million years in the worst case to split a number with forty or so decimal digits,
counting just one microsecond for each trip round the loop. No known algorithm,
The notion of quadratic residue modulo a prime number (Section 8.5.2) general-
izes to composite numbers. Let n be any positive integer. An integer x,
1 ≤ x ≤ n − 1, is a quadratic residue modulo n if it is relatively prime to n (they have
no nontrivial common factor) and if there exists an integer y, 1 ≤ y ≤ n − 1, such that
x ≡ y² (mod n). Such a y is a square root of x modulo n. We saw that a quadratic
residue modulo p has exactly two distinct square roots modulo p when p is prime.
This is no longer true modulo n if n has at least two distinct odd prime factors. For
instance, 8² ≡ 13² ≡ 22² ≡ 27² ≡ 29 (mod 35).
* Problem 8.5.21. Prove that if n = pq, where p and q are distinct odd primes,
then each quadratic residue modulo n has exactly four square roots. Prove further that
exactly one quarter of the integers x that are relatively prime to n and such that
1 ≤ x ≤ n − 1 are quadratic residues modulo n.
Section 8.5.2 gave efficient algorithms for testing whether x is a quadratic residue
modulo p, and if so for finding its square roots. These two problems can also be
solved efficiently modulo a composite number n provided the factorization of n is
given. If the factorization of n is not given, no efficient algorithm is known for either
of these problems. The essential step in Dixon's factorization algorithm is to find two
integers a and b relatively prime to n such that a² ≡ b² (mod n) but a ≢ ±b (mod n).
This implies that a² − b² = (a − b)(a + b) ≡ 0 (mod n). Given that n is a divisor nei-
ther of a + b nor of a − b, it follows that some nontrivial factor x of n must divide
a + b while n/x divides a − b. The greatest common divisor of n and a + b is thus a
nontrivial factor of n. In the previous example, a = 8, b = 13, and n = 35, and the
greatest common divisor of a +b = 21 and n = 35 is x = 7, a nontrivial factor of 35.
Here is an outline of Dixon's algorithm.
procedure Dixon (n, var x, var success)
  { tries to find some nontrivial factor x of the composite number n }
  if n is even then x ← 2, success ← true
  else for i ← 2 to ⌊log₃ n⌋ do
         if n^(1/i) is an integer then x ← n^(1/i)
                                        success ← true
                                        return
       { since n is assumed composite, we now know that it has
         at least two distinct odd prime factors }
       a, b ← two integers such that a² ≡ b² (mod n)
       if a ≡ ±b (mod n) then success ← false
       else x ← gcd(a + b, n)   { using Euclid's algorithm }
            success ← true
So how can we find a and b such that a² ≡ b² (mod n)? Let k be an integer to
be specified later. An integer is k-smooth if all its prime factors are among the first k
prime numbers. For instance, 120 = 2^3 × 3 × 5 is 3-smooth, but 35 = 5 × 7 is not.
When k is small, k-smooth integers can be factorized efficiently by an adaptation of
the naive algorithm split(n) given earlier. In its first phase, Dixon's algorithm chooses
integers x randomly between 1 and n − 1. A nontrivial factor of n is already found if
by a lucky fluke x is not relatively prime to n. Otherwise, let y = x² mod n. If y is
k-smooth, both x and the factorization of y are kept in a table. The process is repeated
until we have k + 1 different integers for which we know the factorization of their
squares modulo n.
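A sketch in Python of this first phase (the data representation — a list of pairs (x, exponent vector of x² mod n) — and the function names are ours):

import math, random

def smooth_factorization(y, primes):
    # Exponents of y over the given primes, or None if y is not smooth.
    exponents = []
    for p in primes:
        e = 0
        while y % p == 0:
            y //= p
            e += 1
        exponents.append(e)
    return exponents if y == 1 else None

def collect_relations(n, primes):
    # First phase of Dixon's algorithm.  Returns (factor, relations):
    # factor is a nontrivial factor found by a lucky fluke (or None),
    # otherwise relations holds k+1 pairs (x, exponent vector of x*x mod n).
    k = len(primes)
    relations = []
    while len(relations) < k + 1:
        x = random.randint(2, n - 1)
        if any(x == r[0] for r in relations):
            continue                       # the k+1 integers must be different
        g = math.gcd(x, n)
        if g != 1:
            return g, []                   # lucky fluke: x shares a factor with n
        expo = smooth_factorization(x * x % n, primes)
        if expo is not None:
            relations.append((x, expo))
    return None, relations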
Example 8.5.2. Let n = 2,537 and k = 7. We are thus concerned only with
the primes 2, 3, 5, 7, 11, 13, and 17. A first integer x = 1,769 is chosen randomly.
We calculate its square modulo n: y = 1,240. An attempt to factorize
1,240 = 2^3 × 5 × 31 fails since 31 is not divisible by any of the admissible primes.
A second attempt with x = 2,455 is more successful: its square modulo n is
y = 1,650 = 2 × 3 × 5^2 × 11. Continuing thus, we obtain
   x1 = 2,455    y1 = 1,650 = 2 × 3 × 5^2 × 11
   x2 =   970    y2 = 2,210 = 2 × 5 × 13 × 17
   x3 = 1,105    y3 =   728 = 2^3 × 7 × 13
   x4 = 1,458    y4 = 2,295 = 3^3 × 5 × 17
   x5 =   216    y5 =   990 = 2 × 3^2 × 5 × 11
   x6 =    80    y6 = 1,326 = 2 × 3 × 13 × 17
   x7 = 1,844    y7 =   756 = 2^2 × 3^3 × 7
   x8 =   433    y8 = 2,288 = 2^4 × 11 × 13
Problem 8.5.22. Given that there are 512 integers x between 1 and 2,536 such
that x² mod 2,537 is 7-smooth, what is the average number of trials needed to obtain
eight successes like those in Example 8.5.2 ?
Example 8.5.3. There are seven possible ways of doing this in Example 8.5.2,
including
   y1 y2 y4 y8 = 2^6 × 3^4 × 5^4 × 7^0 × 11^2 × 13^2 × 17^2   and
   y1 y3 y4 y5 y6 y7 = 2^8 × 3^10 × 5^4 × 7^2 × 11^2 × 13^2 × 17^2 .
Problem 8.5.24. Why is there always at least one solution ? Give an efficient
algorithm for finding one. [Hint : Form a (k + 1) x k binary matrix containing the pari-
ties of the exponents. The rows of this matrix cannot be independent (in arithmetic
modulo 2) because there are more rows than columns. In Example 8.5.3, the first
dependence corresponds to
(1,1,0,0,1,0,0) + (1,0,1,0,0,1,1) + (0,1,1,0,0,0,1) + (0,0,0,0,1,1,0)
≡ (0,0,0,0,0,0,0) (mod 2) .
Use Gauss-Jordan elimination to find a linear dependence between the rows.]
This gives us two integers a and b such that a² ≡ b² (mod n). The integer a is
obtained by multiplying the appropriate x_i, and the integer b by halving the powers of
the primes in the product of the y_i. If a ≢ ±b (mod n), it only remains to calculate the
greatest common divisor of a + b and n to obtain a nontrivial factor. This occurs with
probability at least one half.
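Continuing the sketch begun above (same data representation), the second and third phases might look like this in Python; find_dependence works on the parity vectors of the exponents, as in the hint to Problem 8.5.24.

import math

def find_dependence(parities):
    # Gauss-Jordan elimination over GF(2): return the indices of a nonempty
    # subset of the rows whose sum is the zero vector modulo 2.
    rows = [(vec[:], {i}) for i, vec in enumerate(parities)]
    ncols = len(parities[0])
    for col in range(ncols):
        pivot = next((r for r in rows if r[0][col] == 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        for vec, origin in rows:
            if vec[col] == 1:
                for c in range(ncols):
                    vec[c] ^= pivot[0][c]
                origin.symmetric_difference_update(pivot[1])
    for vec, origin in rows:               # with more rows than columns,
        if not any(vec):                   # some row is now all zero
            return sorted(origin)

def factor_from_relations(n, primes, relations, subset):
    # Build a and b with a*a ≡ b*b (mod n), then try gcd(a + b, n).
    a, exps = 1, [0] * len(primes)
    for i in subset:
        x, e = relations[i]
        a = a * x % n
        exps = [s + t for s, t in zip(exps, e)]
    b = 1
    for p, e in zip(primes, exps):
        b = b * pow(p, e // 2, n) % n
    if (a - b) % n == 0 or (a + b) % n == 0:
        return None                        # bad luck: a ≡ ±b (mod n)
    return math.gcd(a + b, n)

# subset = find_dependence([[e % 2 for e in expo] for _, expo in relations])
# factor = factor_from_relations(n, primes, relations, subset)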
It remains to be seen how we choose the best value for the parameter k. The
larger this parameter, the higher the probability that x² mod n will be k-smooth when
x is chosen randomly. On the other hand, the smaller this parameter, the faster we can
carry out a test of k-smoothness and factorize the k-smooth values y_i, and the fewer
such values we need to be sure of having a linear dependence. Set L = e^√(ln n ln ln n) and
let b ∈ ℝ+. It can be shown that if k = L^b, there are about L^(1/2b) failures for every suc-
cess when we try to factorize x² mod n. Since each unsuccessful attempt requires k
divisions and since it takes k + 1 successes to end the first phase, the latter takes an
average time that is approximately in O(L^(2b + 1/2b)), which is minimized by b = 1/2. The
second phase takes a time in O(k³) = O(L^(3/2)) by Problem 8.5.24 (it is possible to do
better than this), which is negligible when compared to the first phase. The third phase
can also be neglected. Thus, if we take k = √L , Dixon's algorithm splits n
with probability at least one half in an approximate expected time in O(L²)
= O(e^(2√(ln n ln ln n))) and in a space in O(L).
Several improvements make the algorithm more practical. For example, the pro-
bability that y will be k-smooth is improved if x is chosen near ⌈√n⌉, rather than
being chosen randomly between 1 and n − 1. A generalization of this approach,
known as the continued fraction algorithm, has been used successfully. Unlike
Dixon's algorithm, however, its rigorous analysis is unknown. It is therefore more
properly called a heuristic. Another heuristic, the quadratic sieve, operates in a time in
O(L^(9/8)) and space in O(√L). In practice, we would never implement Dixon's algo-
rithm because the heuristics perform so much better. More recently, H. W. Lenstra Jr.
has proposed a factorization algorithm based on the theory of elliptic curves.
A number of identical synchronous processors are linked into a network in the shape of
a ring, as in Figure 8.5.1. Each processor can communicate directly with its two
immediate neighbours. Each processor starts in the same state with the same program
and the same data in its memory. Such a network is of little use so long as all the pro-
cessors do exactly the same thing at exactly the same time, for in this case a single
processor would suffice (unless such duplication is intended to catch erratic behaviour
from faulty processors in a sensitive real-time application). We seek a protocol that
allows the network to choose a leader in such a way that all the processors are in
agreement on its identity. The processor that is elected leader can thereafter break the
symmetry of the network in whatever way it pleases by giving different tasks to dif-
ferent processors.
No deterministic algorithm can solve this problem, no matter how much time is
available. Whatever happens, the processors continue to do the same thing at the same
time. If one of them decides that it wants to be the leader, for example, then so do all
the others simultaneously. We can compare the situation to the deadlock that arises
when two people whose degree of courtesy is exactly equal try to pass simultaneously
through the same narrow door. However, if each processor knows in advance how
many others are connected in the network, there exists a Las Vegas algorithm that is
able to solve this problem in linear expected time. The symmetry can be broken on the
condition that the random generators used by the processors are independent. If the
generators are in fact pseudorandom and not genuinely random, and if each processor
starts from the same seed, then the technique will not work.
Suppose there are n processors in the ring. During phase zero, each processor
initializes its local variable m to the known value n, and its Boolean indicator active to
the value true. During phase k, k > 0, each active processor chooses an integer ran-
domly, uniformly, and independently between 1 and m. Those processors that chose 1
inform the others by sending a one-bit message round the ring. (The inactive proces-
sors continue, nevertheless, to pass on messages.) After n − 1 clock pulses, each pro-
cessor knows the number l of processors that chose 1. There are three possibilities. If
l = 0, phase k produces no change in the situation. If l > 1, only those processors that
chose 1 remain active, and they set their local variable m to the value l. In either case
phase k + 1 is now begun. The protocol ends when l = 1 with the election of the
single processor that just chose 1.
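A simulation of this protocol is easy to write; the Python sketch below (ours) models only the random choices made at each phase, not the actual message traffic round the ring.

import random

def elect_leader(n):
    # Returns (index of the elected processor, number of phases used).
    active = list(range(n))        # processors still allowed to choose
    m = n                          # every processor starts with m = n
    phases = 0
    while True:
        phases += 1
        chose_one = [i for i in active if random.randint(1, m) == 1]
        l = len(chose_one)         # known to all after n - 1 clock pulses
        if l == 1:
            return chose_one[0], phases
        if l > 1:
            active, m = chose_one, l   # only these remain active
        # if l == 0 the phase changed nothing; begin the next phase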
This protocol is classed as a Las Vegas algorithm, despite the fact that it never
ends by admitting a failure, because there is no upper bound on the time required for it
to succeed. However, it never gives an incorrect solution: it can neither end after
electing more than one leader nor after electing none.
Let l(n) be the expected number of phases needed to choose a leader among n
processors using this algorithm (not counting phase zero, the initialization). Let
p(n, j) = C(n, j) n^(−j) (1 − 1/n)^(n−j) be the probability that j processors out of n randomly
choose the value 1 during the first stage. With probability p(n, 1) only a single phase
is needed; with probability p(n, 0) we have to start all over again; and with
* Problem 8.5.27. Show that l(n) < e ≈ 2.718 for every n ≥ 2.
One phase of this protocol consists of single bit messages passed round the ring.
Since each phase thus takes linear time, the preceding problems show that the choice
of a leader also takes expected linear time.
** Problem 8.5.29. Prove that no protocol (not even a Las Vegas protocol) can
solve the problem of choosing a leader in a ring of n identical processors if they are
not given the value n . Nevertheless, given an arbitrary parameter p > 0, give a Monte
Carlo algorithm that is able to determine n exactly with a probability of error less than
p whatever the number of processors.
There exist problems for which no efficient algorithm is known that is able to obtain a
correct solution every time, whether it be deterministic or probabilistic. A Monte
Carlo algorithm occasionally makes a mistake, but it finds a correct solution with high
probability whatever the instance considered. This is not the same as saying that it
works correctly on a majority of instances, only failing now and again in some special
cases. No warning is usually given when the algorithm makes a mistake.
Problem 8.6.1. Prove that the following algorithm decides correctly whether
or not an integer is prime in more than 80% of the cases. (It performs much better on
small integers.) Show, on the other hand, that it is not a Monte Carlo algorithm by
exhibiting an instance on which the algorithm systematically gives a wrong answer.
function wrong (n)
  if gcd(n, 30030) = 1   { using Euclid's algorithm }
then return true
else return false
What constant should be used instead of 30,030 to bring the proportion of successes
above 85% even if very large integers are considered?
Let p be a real number such that 1/2 < p < 1. A Monte Carlo algorithm is
p-correct if it returns a correct solution with probability not less than p, whatever the
instance considered. The advantage of such an algorithm is p − 1/2. The algorithm is
consistent if it never gives two different correct solutions to the same instance. Some
Monte Carlo algorithms take as a parameter not only the instance to be solved but also
an upper bound on the probability of error that is acceptable. The time taken by such
algorithms is then expressed as a function of both the size of the instance and the
reciprocal of the acceptable error probability. To increase the probability of success of
a consistent, p-correct algorithm, we need only call it several times and choose the
most frequent answer. This increases our confidence in the result in a way similar to
the world series calculation of Section 5.2.
More generally, let ε and δ be two positive real numbers such that ε + δ < 1/2. Let
q = 1 − p = 1/2 − ε. Suppose the algorithm is called n = 2m − 1 times on the same
instance and the most frequent answer is returned. The repetitive algorithm finds the
correct answer if the latter is obtained at least m times. Its error probability is therefore
at most

   Σ_{i=0}^{m−1} Prob[ i correct answers in n tries ]

      ≤  Σ_{i=0}^{m−1} C(n, i) p^i q^(n−i)

      =  (pq)^(n/2) Σ_{i=0}^{m−1} C(n, i) (q/p)^(n/2 − i)

      ≤  (pq)^(n/2) Σ_{i=0}^{m−1} C(n, i)        since q/p < 1 and n/2 − i ≥ 0

      <  (pq)^(n/2) 2^n  =  (4pq)^(n/2)  =  (1 − 4ε²)^(n/2)

      ≤  δ   provided n ≥ 2 lg(1/δ) / lg(1/(1 − 4ε²)), since 0 < 1 − 4ε² < 1.
For example, suppose we have a consistent Monte Carlo algorithm whose advan-
tage is 5% and we wish to obtain an algorithm whose probability of error is less than
5% (that is, we wish to go from a 55%-correct algorithm to a 95%-correct algorithm).
The preceding theorem tells us that this can be achieved by calling the original algo-
rithm about 600 times on the given instance. A more precise calculation shows that it
is enough to repeat the original algorithm 269 times, and that repeating it 600 times
yields an algorithm that is better than 99%-correct. [This is because some of the ine-
qualities used in the proof are rather crude. A more complicated argument shows that
if a consistent (1/2 + ε)-correct Monte Carlo algorithm is repeated 2m − 1 times, the
resulting algorithm is (1 − δ)-correct, where

   δ  =  1/2 − ε Σ_{i=0}^{m−1} C(2i, i) (1/4 − ε²)^i  <  (1 − 4ε²)^m / (4ε √(πm)) .
The first part of this formula can be used efficiently to find the exact number of repeti-
tions required to reduce the probability of error below any desired threshold δ. Alterna-
tively, a good upper bound on this number of repetitions is quickly obtained from the
second part: find x such that e^x √x ≥ 1/(2δ√π) and then set m = ⌈x/4ε²⌉.]
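The exact formula is easy to exploit; a small Python sketch (ours):

from math import comb

def repetitions_needed(eps, delta):
    # Smallest number 2m - 1 of repetitions of a consistent (1/2 + eps)-correct
    # Monte Carlo algorithm for which the majority answer is (1 - delta)-correct.
    m, total = 0, 0.0
    while True:
        total += comb(2 * m, m) * (0.25 - eps * eps) ** m
        m += 1
        if 0.5 - eps * total <= delta:
            return 2 * m - 1

# repetitions_needed(0.05, 0.05) recovers the figure of 269 quoted above.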
Repeating an algorithm several hundred times to obtain a reasonably small pro-
bability of error is not attractive. Fortunately, most Monte Carlo algorithms that occur
in practice are such that we can increase our confidence in the result obtained much
more rapidly. Assume for simplicity that we are dealing with a decision problem and
that the original Monte Carlo algorithm is biased in the sense that it is always correct
whenever it returns the answer true, errors being possible only when it returns the
answer false. If we repeat such an algorithm several times to increase our confidence
in the final result, it would be silly to return the most frequent answer : a single true
outweighs any number of falses. As we shall see shortly, it suffices to repeat such an
i. the solution returned by the algorithm is always correct whenever the instance to
be solved is not in X, and
ii. the correct solution to all the instances that belong to X is yo, but the algorithm
may not always return the correct solution to these instances.
Suppose, for example, that p = 1/2 (once again, this is not allowed for general
Monte Carlo algorithms, but it causes no problems with a biased algorithm). It suffices
to repeat the algorithm at most 20 times to be either sure that the correct solution is yo
(if either of the first two cases previously described is observed), or extremely
confident that the solution obtained on every one of the trials is correct (since other-
wise the probability of obtaining the results observed is less than one chance in a mil-
lion). In general, k repetitions of a consistent, p-correct, y0-biased algorithm yield an
algorithm that is (1 − (1−p)^k)-correct and still consistent and y0-biased.
Assume your consistent, p-correct, yo-biased Monte Carlo algorithm has yielded
k times in a row the same answer y ≠ y0 on some instance x. It is important to under-
stand how to interpret such behaviour correctly. It may be tempting to conclude that
"the probability that y is an incorrect answer is at most (1-p )k ". Such a conclusion
of course makes no sense because either the correct answer is indeed y, or not. The
probability in question is therefore either 0 or 1, despite the fact that we cannot tell for
sure which it is. The correct interpretation is as follows : "I believe that y is the correct
answer and if you quiz me enough times on different instances, my proportion of errors
should not significantly exceed (1−p)^k".
The "proportion of errors" in question, however, is averaged over the entire
sequence of answers given by the algorithm, not only over those occurrences in which
the algorithm actually provides such probabilistic answers. Indeed, if you systemati-
cally quiz the algorithm with instances for which the correct solution is yo, it will
always be wrong whenever it "believes" otherwise. This last remark may appear
trivial, but it is in fact crucial if the probabilistic algorithm is used to generate with
high probability some random instance x on which the correct answer is a specific
y ≠ y0.
To illustrate this situation, consider a nonempty finite set I of instances. We are
interested in generating a random member of some subset S ⊆ I. (As a practical
example, we may be interested in generating a random prime of a given length - see
Section 8.6.2.) Let MC be a false-biased, p-correct Monte Carlo algorithm to decide,
given any x ∈ I, whether x ∈ S. Let q = 1 − p. By definition of a false-biased algo-
rithm, Prob[ MC(x) = true ] = 1 for each instance x ∈ S and Prob[ MC(x) = true ] ≤ q
for each instance x ∉ S. Consider the following algorithms.
function repeatMC (x, k)
  i ← 0
  ans ← true
  while ans and i < k do
    i ← i + 1
    ans ← MC(x)
  return ans
* Problem 8.6.4. Let r denote the probability that x ∈ S given that x is returned
by a call on uniform(I). Prove that the probability that a call on genrand(k) errone-
ously returns some x ∉ S is at most

   1 / ( 1 + (r/(1 − r)) q^(−k) ) .
We are not aware of any unbiased Monte Carlo algorithm sufficiently simple to
feature in this introduction. Thus the section continues with some examples of biased
Monte Carlo algorithms. They all involve the solution of a decision problem, that is,
the only possible answers that they can return are true and false.
Problem 8.6.5. Let A and B be two efficient Monte Carlo algorithms for
solving the same decision problem. Algorithm A is p-correct and true-biased, whereas
algorithm B is q-correct and false-biased. Give an efficient Las Vegas algorithm
LV (x , var y, var success) to solve the same problem. What is the best value of r you
can deduce so that your Las Vegas algorithm succeeds with probability at least r on
each instance? □
maj (T) has returned false on an array with a majority element does not change the
probability that it will return true on the following call on the same instance.
Problem 8.6.6. Show that the probability that k successive calls of maj(T) all
return false is less than 2^(−k) if T contains a majority element. On the other hand, as
soon as any call returns true, we can be certain that T contains a majority element. □
The following Monte Carlo algorithm solves the problem of detecting the pres-
ence of a majority element with a probability of error less than ε for every ε > 0.
function majMC (T, ε)
  k ← ⌈lg(1/ε)⌉
  for i ← 1 to k do
    if maj(T) then return true
  return false
The algorithm takes a time in O(n log(1/ε)), where n is the number of elements in the
array and ε is the acceptable probability of error. This is interesting only as an illustra-
tion of a Monte Carlo algorithm since a linear time deterministic algorithm is known
(Problems 4.11.5 and 4.11.6).
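In Python the pair of algorithms might be rendered as follows; the body of maj is our reconstruction (pick one element at random and verify it), which is consistent with the properties used in Problem 8.6.6.

import random
from math import ceil, log2

def maj(T):
    # true is always reliable; false is wrong with probability less than 1/2
    # whenever T really does contain a majority element.
    x = random.choice(T)
    return sum(1 for y in T if y == x) > len(T) / 2

def maj_mc(T, eps):
    # Repeat the biased test ceil(lg(1/eps)) times: a single true settles it.
    for _ in range(ceil(log2(1 / eps))):
        if maj(T):
            return True
    return False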
This classic Monte Carlo algorithm recalls the algorithm used to determine whether or
not an array has a majority element. The problem is to decide whether a given integer
is prime or composite. No deterministic or Las Vegas algorithm is known that can
solve this problem in a reasonable time when the number to be tested has more than a
few hundred decimal digits. (It is currently possible to establish with certainty the pri-
mality of numbers up to 213 decimal digits within approximately 10 minutes of com-
puting time on a CDC CYBER 170/750.)
A first approach to finding a probabilistic algorithm might be
function prime (n)
  d ← uniform(2 .. ⌊√n⌋)
  return ((n mod d) ≠ 0)
If the answer returned is false, the algorithm has been lucky enough to find a non-
trivial factor of n, and we can be certain that n is composite. Unfortunately, the
answer true is returned with high probability even if n is in fact composite. Consider
for example n = 2,623 = 43 x 61. The algorithm chooses an integer randomly between
2 and 51. Thus there is only a meagre 2% probability that it will happen upon d = 43
and hence return false. In 98% of calls the algorithm will inform us incorrectly that n
is prime. For larger values of n the situation gets worse. The algorithm can be
improved slightly by testing whether n and d are relatively prime, using Euclid's algo-
rithm, but it is still unsatisfactory.
To obtain an efficient Monte Carlo algorithm for the primality problem, we need
a theorem whose proof lies beyond the scope of this book. Let n be an odd integer
greater than 4, and let s and t be positive integers such that n − 1 = 2^s t, where t is
odd. Let a be an integer such that 2 ≤ a ≤ n − 2. We say that n is a strong pseudo-
prime to the base a if a^t ≡ 1 (mod n) or if there exists an integer i such that 0 ≤ i < s
and a^(2^i t) ≡ −1 (mod n).
If n is prime, it is a strong pseudoprime to any base. There exist however com-
posite numbers that are strong pseudoprimes to some bases. Such a base is then
a false witness of primality for this composite number. For example, 158 is a false
witness of primality for 289 because 288 = 2^5 × 9, 158^9 ≡ 131 (mod 289),
158^(2×9) ≡ 131² ≡ 110 (mod 289), 158^(4×9) ≡ 110² ≡ 251 (mod 289), and finally,
158^(8×9) ≡ 251² ≡ −1 (mod 289).
The theorem assures us that if n is composite, it cannot be a strong pseudoprime
to more than (n − 9)/4 different bases. The situation is even better if n is composed of
a large number r of distinct prime factors: in this case it cannot be a strong pseu-
doprime to more than φ(n)/2^(r−1) − 2 different bases, where φ(n) < n − 1 is Euler's
totient function. The theorem is generally pessimistic. For instance, 289 has only 14
false witnesses of primality, whereas 737 does not even have one.
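The definition translates directly into code; here is a sketch in Python (ours) of the strong pseudoprimality test and of the Monte Carlo primality test obtained by trying several random bases.

import random

def strong_pseudoprime(n, a):
    # Is the odd integer n > 4 a strong pseudoprime to the base a?
    # A False answer proves that n is composite.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2                     # n - 1 = 2**s * t with t odd
    x = pow(a, t, n)
    if x == 1:
        return True
    for _ in range(s):
        if x == n - 1:              # a**(2**i * t) ≡ -1 (mod n)
            return True
        x = x * x % n
    return False

def probably_prime(n, k=20):
    # Declare n prime unless some random base exposes it; a composite n
    # survives each round with probability less than 1/4 by the theorem above.
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    return all(strong_pseudoprime(n, random.randint(2, n - 2))
               for _ in range(k))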
If the test is repeated several times with independently chosen bases, a composite
number is therefore identified as such with near certainty. Similarly there is better than
a 50% chance that a will provide a key for finding √x. Nevertheless the algorithm for
testing primality is only a Monte Carlo algorithm whereas the one for finding square
roots is Las Vegas. This difference is explained by the fact that the Las Vegas
algorithm is able to detect when it has been unlucky: the fact that a does not provide a
key for √x is easy to test. On the other hand, if n is a
strong pseudoprime to the base a, this can be due either to the fact that n is indeed
prime or to the fact that a is a false witness of primality for the composite number n.
The difference can also be explained using Problem 8.2.2.
As usual, the probability of error can be made arbitrarily small by repeating the
algorithm. A philosophical remark is once again in order: the algorithm does not reply
"this number is prime with probability 1-E", but rather, "I believe this number to be
prime ; otherwise I have observed a natural phenomenon whose probability of
occurrence was not greater than E". The first reply would be nonsense, since every
integer larger than I is either prime or composite.
The naive way to solve this problem is to keep the sets in arrays, lists, search
trees, or hash tables. Whatever structure is chosen, each test of equality will take a
time in Ω(k), if indeed it is not in Ω(k log k), where k is the cardinality of the larger of
the two sets concerned.
For any ε > 0 fixed in advance, there exists a Monte Carlo algorithm that is able
to handle a sequence of m questions in an average total time in O(m). The algorithm
never makes an error when Si = Sj; in the opposite case its probability of error does
not exceed ε. This algorithm provides an interesting application of universal hashing
(Section 8.4.4).
Let ε > 0 be the error probability that can be tolerated for each request to test the
equality of two sets. Let k = ⌈lg(max(m, 1/ε))⌉. Let H be a universal₂ class of func-
tions from U into {0,1}^k, the set of k-bit strings. The Monte Carlo algorithm first
chooses a function at random in this class and then initializes a hash table that has U
for its domain. The table is used to implement a random function rand : U → {0,1}^k
as follows.
function rand (x)
if x is in the table then return its associated value
  y ← some random k-bit string
add x to the table and associate y to it
return y
Notice that this is a memory function in the sense of Section 5.7. Each call of rand (x)
returns a random string chosen with equal probability among all the strings of length k.
Two different calls with the same argument return the same value, and two calls with
different arguments are independent. Thanks to the use of universal hashing, each call
of rand (x) takes constant expected time.
To each set Si we associate a variable v [i] initialized to the binary string com-
posed of k zeros. Here is the algorithm for adding an element x to the set Si . We sup-
pose that x is not already a member of Si .
procedure add (i, x)
  v[i] ← v[i] ⊕ rand(x)
The notation t ⊕ u stands for the bit-by-bit exclusive-or of the binary strings t and u.
The algorithm to test the equality of Si and Sj is:
function test (i, j)
  if v[i] = v[j]
    then return true
    else return false
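A compact Python sketch of the whole scheme (ours: an ordinary dictionary plays the part of the hash table, and Python's random bits stand in for the universal₂ hash function, which simplifies the real construction).

import random

class SetSignatures:
    def __init__(self, nsets, k):
        self.k = k
        self.table = {}              # memory function rand: U -> k-bit strings
        self.v = [0] * nsets         # one k-bit signature per set, initially zero

    def rand(self, x):
        if x not in self.table:
            self.table[x] = random.getrandbits(self.k)
        return self.table[x]

    def add(self, i, x):             # x assumed not already a member of set i
        self.v[i] ^= self.rand(x)

    def test(self, i, j):            # equal signatures: "probably the same set"
        return self.v[i] == self.v[j]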
It is obvious that Si ≠ Sj if v[i] ≠ v[j]. What is the probability
that v[i] = v[j] when Si ≠ Sj? Suppose without loss of generality that there
exists an x0 ∈ Si such that x0 ∉ Sj. Let S'_i = Si \ {x0}. For a set S ⊆ U, let
XOR(S) be the exclusive-or of the rand(x) for every x ∈ S. By definition,
v[i] = XOR(Si) = rand(x0) ⊕ XOR(S'_i) and v[j] = XOR(Sj). Let y0 = XOR(S'_i)
⊕ XOR(Sj). The fact that v[i] = v[j] implies that rand(x0) = y0; the probability of
this happening is only 2^(−k) since the value of rand(x0) is chosen independently of those
values that contribute to y0. Notice the similarity to the use of signatures in Sec-
tion 7.2.1.
This Monte Carlo algorithm differs from those in the two previous sections in
that our confidence in an answer "Si = Sj" cannot be increased by repeating the call of
test(i, j). It is only possible to increase our confidence in the set of answers obtained
to a sequence of requests by repeating the application of the algorithm to the entire
sequence. Moreover, the different tests of equality are not independent. For instance,
if Si ≠ Sj, x ∉ Si ∪ Sj, Sk = Si ∪ {x}, Sl = Sj ∪ {x}, and if an application of the
algorithm replies incorrectly that Si = Sj, then it will also reply incorrectly that
Sk = Sl.
Problem 8.6.12. Show how you could also implement a procedure elim (i, x),
which removes the element x from the set Si . A call of elim (i, x) is only permitted
when x is already in Si .
Problem 8.6.13. Modify the algorithm so that it will work correctly (with pro-
bability of error ε) even if a call of add(i, x) is made when x ∈ Si. Also implement a
request member(i, x), which decides without ever making an error whether x ∈ Si.
A sequence of m requests must still be handled in an expected time in O(m).
You have three n × n matrices A, B, and C and you would like to decide whether
AB = C. Here is an intriguing false-biased, 1/2-correct Monte Carlo algorithm that is
capable of solving this problem in a time in O(n²). Compare this with the fastest
known deterministic algorithm to compute the product AB (Section 4.9), which takes a
time in O(n^2.376), and with the probabilistic algorithm mentioned in Section 8.3.5,
which only computes the product approximately.
function goodproduct (A, B, C, n)
  array X[1 .. n]   { to be considered as a column vector }
  for i ← 1 to n do X[i] ← uniform({−1, 1})
  if ABX = CX then return true
    else return false
In order to take a time in O(n²), we must compute ABX as A times BX, providing a
dramatic example of the topic discussed in Section 5.3.
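A direct transcription in Python (pure lists, integer entries assumed; the amplified version simply repeats the test).

import random

def mat_vec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def good_product(A, B, C):
    # Compare A(BX) with CX for a random vector X of +1/-1 entries.
    # False is always right; True is wrong with probability at most 1/2
    # whenever AB differs from C.
    n = len(A)
    X = [random.choice((-1, 1)) for _ in range(n)]
    return mat_vec(A, mat_vec(B, X)) == mat_vec(C, X)

def good_product_mc(A, B, C, k=20):
    # k independent repetitions push the error probability below 2**(-k).
    return all(good_product(A, B, C) for _ in range(k))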
Problem 8.6.17. Given three polynomials p(x), q(x), and r(x) of degrees
n, n and 2n, respectively, give a false-biased, 1/2-correct Monte Carlo algorithm to
decide whether r(x) is the symbolic product of p(x) and q(x). Your algorithm should
run in a time in O(n). (In the next chapter we shall see a deterministic algorithm that
is capable of computing the symbolic product of two polynomials of degree n in a time
in O(n log n), but no such algorithm is known that only takes a time in O(n).)
The experiment devised by Leclerc (1777) was carried out several times in the
nineteenth century; see for instance Hall (1873). It is no doubt the earliest recorded
probabilistic algorithm. The term "Monte Carlo" was introduced into the literature by
Metropolis and Ulam (1949), but it was already in use in the secret world of atomic
research during World War II, in particular in Los Alamos, New Mexico. Recall that it
is often used to describe any probabilistic algorithm. The term "Las Vegas" was intro-
duced by Babai (1979) to distinguish probabilistic algorithms that occasionally make a
mistake from those that reply correctly if they reply at all. The term "Sherwood" is
our own. For the solution to Problem 8.2.3, see Anon. (1495).
Problem 8.5.17 is given by the algorithm of Shanks (1972) and Adleman, Manders,
and Miller (1977). The integer factorization algorithm of Pollard (1975) has a proba-
bilistic flavour. The probabilistic integer factorization algorithm discussed in Sec-
tion 8.5.3 originated with Dixon (1981) ; for a comparison with other methods, refer to
Pomerance (1982). The algorithm based on elliptic curves is discussed in Lenstra
(1986). For efficiency considerations in factorization algorithms, consult Montgomery
(1987). The algorithm for electing a leader in a network, including Problem 8.5.29,
comes from Itai and Rodeh (1981).
Amplification of the advantage of an unbiased Monte Carlo algorithm is used to
serve cryptographic ends in Goldwasser and Micali (1984). The probabilistic test of
primality presented here is equivalent to the one in Rabin (1976, 1980b). The test of
Solovay and Strassen (1977) was discovered independently. The expected number of
false witnesses of primality for a random composite integer is investigated in Erdos
and Pomerance (1986) ; see also Monier (1980). More information on number theory
can be found in the classic Hardy and Wright (1938). The implication of Problem 8.6.4
for the generation of random numbers that are probably prime is explained in
Beauchemin, Brassard, Crépeau, Goutier, and Pomerance (1988), which also gives a
fast probabilistic splitting algorithm whose probability of success on any given compo-
site integer is at least as large as the probability of failure of Rabin's test on the same
integer. A theoretical solution to Problem 8.6.10 is given in Goldwasser and Kilian
(1986) and Adleman and Huang (1987). For more information on tests of primality
and their implementation, consult Williams (1978), Lenstra (1982), Adleman, Pomer-
ance, and Rumely (1983), Kranakis (1986), and Cohen and Lenstra (1987). The proba-
bilistic test for set equality comes from Wegman and Carter (1981); they also give a
cryptographic application of universal hashing. The solution to Problem 8.6.14 is in
Brassard and Kannan (1988). The Monte Carlo algorithm to verify matrix multiplica-
tion (Section 8.6.4) and the solution to Problem 8.6.17 are given in Freivalds (1979);
also read Freivalds (1977).
Several interesting probabilistic algorithms have not been discussed in this
chapter. We close by mentioning a few of them. Given the cartesian coordinates of
points in the plane, Rabin (1976) gives an algorithm that is capable of finding the
closest pair in expected linear time (contrast this with Problem 4.11.14). Rabin
(1980a) gives an efficient probabilistic algorithm for factorizing polynomials over arbi-
trary finite fields, and for finding irreducible polynomials. A Monte Carlo algorithm is
given in Schwartz (1978) to decide whether a multivariate polynomial over an infinite
domain is identically zero and to test whether two such polynomials are identical.
Consult Zippel (1979) for sparse polynomial interpolation probabilistic algorithms. An
efficient probabilistic algorithm is given in Karp and Rabin (1987) to solve the string-
searching problem discussed in Section 7.2. Our favourite unbiased Monte Carlo algo-
rithm for a decision problem, which allows us to decide efficiently whether a given
integer is a perfect number and whether a pair of integers is amicable, is described in
Bach, Miller, and Shallit (1986). For an anthology of probabilistic algorithms, read
Valois (1987).
9
Transformations
of the Domain
9.1 INTRODUCTION
It is sometimes useful to reformulate a problem before trying to solve it. If you were
asked, for example, to multiply two large numbers given in Roman figures, you would
probably begin by translating them into Arabic notation. (You would thus use an
algorism, with this word's original meaning!) More generally, let D be the domain of
objects to be manipulated in order to solve a given problem. Let f : D^t → D be a
function to be calculated. An algebraic transformation consists of a transformed
domain R, an invertible transformation function σ : D → R and a transformed function
g : R^t → R such that

   f(x1, x2, ... , xt) = σ^(−1)( g(σ(x1), σ(x2), ... , σ(xt)) )

for all x1, x2, ... , xt in the domain D. Such a transformation is of interest if g can
be calculated in the transformed domain more rapidly than f can be calculated in the
original domain, and if the transformations σ and σ^(−1) can also be computed efficiently.
Figure 9.1.1 illustrates this principle.
Example 9.1.1. The most important transformation used before the advent of
computers resulted from the invention of logarithms by Napier in 1614. Kepler found
this discovery so useful that he dedicated his Tabulae Rudolphinae to Napier. In this
case, D = ℕ+ or ℝ+, f(u, v) = u × v, R = ℝ, σ(u) = ln u and g(x, y) = x + y. This
allows a multiplication to be replaced by the calculation of a logarithm, an addition,
and an exponentiation. Since the computation of σ and σ^(−1) would take more time than
the original multiplication, this idea is only of interest when tables of logarithms are
computed beforehand. Such tables, calculated once and for all, thus furnish a historical
example of preconditioning (Chapter 7).
Example 9.1.3. Most computers that handle numerical data read these data
and print the results in decimal but carry out their computations in binary.
For the rest of this chapter all the arithmetic operations involved are carried out either
modulo some integer m to be determined or in the field of complex numbers - or
more generally in any commutative ring. They are assumed to be executed at unit cost
unless it is explicitly stated otherwise. Let n > 1 be a power of 2. We denote by ω
some constant such that ω^(n/2) = −1. If n = 8, for example, then ω = 4 is a possible
value if we are using arithmetic modulo 257, and ω = (1+i)/√2 is a possible value in
the field of complex numbers.
Consider an n-tuple a = (a_0, a_1, ... , a_(n−1)). This defines in the natural way a
polynomial p_a(x) = a_(n−1) x^(n−1) + a_(n−2) x^(n−2) + ⋯ + a_1 x + a_0 of degree less than n.
The discrete Fourier transform of a with respect to ω is the n-tuple Fω(a) =
(p_a(1), p_a(ω), p_a(ω²), ... , p_a(ω^(n−1))). As in Example 9.1.4, it appears at first glance
that the number of scalar operations needed to calculate this transform is in Ω(n²).
However, this is not in fact the case, thanks to an algorithm known as the Fast Fourier
Transform (FFT). This algorithm is vitally important for a variety of applications,
particularly in the area of signal processing (Section 1.7.6).
Suppose that n > 2 and set t = n/2. The t-tuples b and c defined by
b = (a_0, a_2, ... , a_(n−4), a_(n−2)) and c = (a_1, a_3, ... , a_(n−3), a_(n−1)) are such that
p_a(x) = p_b(x²) + x p_c(x²). In particular, p_a(ω^i) = p_b(α^i) + ω^i p_c(α^i), where α = ω².
Clearly, α^(t/2) = (ω²)^(t/2) = ω^t = ω^(n/2) = −1, so it is legitimate to talk about Fα(b) and
Fα(c). Furthermore, α^t = 1 and ω^t = −1, hence α^(t+i) = α^i and ω^(t+i) = −ω^i, so that
p_a(ω^(t+i)) = p_b(α^i) − ω^i p_c(α^i). The Fourier transform Fω(a) is calculated using the
divide-and-conquer technique.
function FFT (a[0 .. n−1], ω) : array [0 .. n−1]
  { n is a power of 2 and ω^(n/2) = −1 }
  array A[0 .. n−1]   { the answer is computed in this array }
  if n = 1 then A[0] ← a[0]
  else t ← n/2
       arrays b, c, B, C[0 .. t−1]   { intermediate arrays }
       { creation of the sub-instances }
       for i ← 0 to t−1 do b[i] ← a[2i]
                           c[i] ← a[2i + 1]
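The recursive calls and the combination loop complete the procedure; a compact sketch of the whole transform in Python, over the field of complex numbers, might read as follows (our rendering, not the text's pseudocode).

from cmath import exp, pi

def fft(a, omega):
    # Fourier transform of the n-tuple a (n a power of 2) with respect to omega,
    # where omega ** (n // 2) == -1.
    n = len(a)
    if n == 1:
        return list(a)
    B = fft(a[0::2], omega * omega)    # transform of the even-indexed terms
    C = fft(a[1::2], omega * omega)    # transform of the odd-indexed terms
    A, power = [0] * n, 1
    for i in range(n // 2):
        A[i] = B[i] + power * C[i]             # p_a(omega ** i)
        A[i + n // 2] = B[i] - power * C[i]    # p_a(omega ** (t + i))
        power *= omega
    return A

# Over the complex numbers, omega = exp(2j * pi / n) is a principal n-th root
# of unity (Problem 9.3.2).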
Despite its importance in signal processing, our concern here is to use the discrete
Fourier transform as a tool for transforming the domain of a problem. Our principal
aim is to save the idea proposed in Example 9.1.4. Using the fast Fourier transform
allows us to evaluate the polynomials p(x) and q(x) at the points 1, ω, ω², ... , ω^(n−1)
in a time in O(n log n), where n is a power of 2 greater than the degree of the product
1. ω ≠ 1,
2. ω^n = 1, and
3. Σ_{j=0}^{n−1} ω^(jp) = 0 for every 1 ≤ p < n.
i. Conditions (1) and (2) are obviously fulfilled. To show condition (3), let n = 2^k
   and decompose p = 2^u v, where u and v are integers and v is odd. Let
   s = 2^(k−u−1). Show that ω^(jp) = −ω^((j+s)p) for every integer j. Conclude by split-
   ting Σ_{j=0}^{n−1} ω^(jp) into 2^u subsums of 2s elements, each summing to zero.
ii. Obvious.
iii. Notice that (ω^(−1))^(n/2) = −1.
iv. Assume ω^i = ω^j for 0 ≤ i < j < n and let p = j − i. Then ω^p = 1 and
    1 ≤ p ≤ n − 1. Use condition (3) and n ≠ 0 to obtain a contradiction.
v. Use p = n/2 in condition (3), and use the existence of n^(−1).
Problem 9.3.2. Prove that e^(2πi/n) is a principal n th root of unity in the field of
complex numbers.
Theorem 9.3.2. Let A and B be the matrices just defined. Then AB = I_n, the
n × n identity matrix.
Proof. Let C = AB. By definition, C_ij = Σ_{k=0}^{n−1} A_ik B_kj = n^(−1) Σ_{k=0}^{n−1} ω^((i−j)k).
There are three cases to consider.
i. If i = j, then ω^((i−j)k) = ω^0 = 1, and so C_ii = n^(−1) Σ_{k=0}^{n−1} 1 = n^(−1) × n = 1.
ii. If i > j, let p = i − j. Now C_ij = n^(−1) Σ_{k=0}^{n−1} ω^(pk) = 0 by property (3) of a principal
    n th root of unity, since 1 ≤ p < n.
Problem 9.3.5. Prove that Fω^(−1)(Fω(a)) = Fω(Fω^(−1)(a)) = a for every a.
The inverse Fourier transform can be calculated efficiently by the following algo-
rithm, provided that n^(−1) is either known or easily calculable.
function FFTinv (a[0 .. n−1], ω) : array [0 .. n−1]
  { computes the inverse Fourier transform of a }
  array F[0 .. n−1]
  F ← FFT(a, ω^(n−1))
  for i ← 0 to n−1 do F[i] ← n^(−1) F[i]
  return F
Example 9.3.1. Let n = 8 and a = (255, 85, 127, 200, 78, 255, 194, 75). Let us
calculate Fω^(−1)(a) in arithmetic modulo m = 257, where ω = 4. By Problem 9.3.3, ω is
indeed a principal n th root of unity. First we calculate FFT(a, ω'), where
ω' = ω^7 = 193. To do this, a is decomposed into b = (255, 127, 78, 194)
and c = (85, 200, 255, 75). The recursive calls with ω'^2 = 241 yield
B = FFT(b, ω'^2) = (140, 221, 12, 133) and C = FFT(c, ω'^2) = (101, 143, 65, 31).
Combined, these results give A = (241, 64, 0, 9, 39, 121, 24, 0). There remains the mul-
tiplication by n^(−1) = m − (m−1)/n = 225 (Problem 9.3.3). The final result is thus
F = (255, 8, 0, 226, 37, 240, 3, 0), which is consistent with Example 9.2.1.
If the Fourier transform is calculated in the field of complex numbers (Problem
9.3.2), rounding errors may occur on the computer. On the other hand, if the transform
is calculated modulo m (Problem 9.3.3), it may be necessary to handle large integers.
For the rest of this section we no longer suppose that arithmetic operations can be per-
formed at unit cost: the addition of two numbers of size l takes a time in O(l). We
already know (Problem 9.3.4) that reductions modulo m can be carried out in a time in
O(log m), thanks to the particular form chosen for m. Furthermore, the fact that ω is
a power of 2 means that multiplications in the FFT algorithm can be replaced by
shifts. For this it is convenient to modify the algorithm slightly. First, instead of
giving ω as the second argument, we supply the base 2 logarithm of ω, denoted by γ.
Secondly, the recursive calls are made with 2γ rather than ω² as the second argument.
The final loop becomes
  β ← 0
  for i ← 0 to t−1 do
    { β = iγ }
    A[i] ← B[i] + C[i]↑β
    A[t+i] ← B[i] − C[i]↑β
    β ← β + γ ,
where x↑y denotes the value of x shifted left y binary places, that is, x × 2^y. All the
arithmetic is carried out modulo m = ω^(n/2) + 1 using Problem 9.3.4.
The heart of the algorithm consists of executing instructions of the form
A ← (B ± C↑β) mod m, where 0 ≤ B < m and 0 ≤ C < m. The value of the shift β
never exceeds (n/2 − 1) lg ω, even when the recursive calls are taken into account.
Consequently −ω^(n−1) ≤ B ± C↑β ≤ ω^(n−1) + ω^(n/2), which means that it can be reduced
modulo m in a time in O(log m) = O(n log ω) by Problem 9.3.4. Since the number of
operations of this type is in O(n log n), the complete computation of the Fourier
transform modulo m can be carried out in a time in O(n² log n log ω). (From a prac-
tical point of view, if m is sufficiently small that arithmetic modulo m can be con-
sidered to be elementary, the algorithm takes a time in O(n log n).)
Problem 9.3.6. Show that the inverse transform modulo m = ω^(n/2) + 1 can also
be computed in a time in O(n² log n log ω). (The algorithm FFTinv has to be
modified. Otherwise a direct call on the new FFT with γ = (n − 1) lg ω, corresponding
to the use of ω' = ω^(n−1) as a principal root of unity, causes shifts that can go up to
(n/2 − 1)(n − 1) lg ω, which means that Problem 9.3.4 can no longer be applied.
Similarly, the final multiplication by n^(−1) can profitably be replaced by a multiplication
by −n^(−1) followed by a change of sign, since −n^(−1) ≡ ω^(n/2)/n (mod m) is a power of 2.)
We now have available the tools that are necessary to finish Example 9.1.4. Let
p(x) = a_s x^s + a_(s−1) x^(s−1) + ⋯ + a_1 x + a_0 and q(x) = b_t x^t + b_(t−1) x^(t−1) + ⋯ + b_1 x + b_0 be
two polynomials of degrees s and t, respectively. We want to calculate symbolically
the product polynomial r(x) = c_d x^d + c_(d−1) x^(d−1) + ⋯ + c_1 x + c_0 = p(x) q(x)
of degree d = s + t. Let n be the smallest power of 2 greater than d, and let ω
be a principal n th root of unity. Let a, b, and c be the n-tuples defined
by a = (a_0, a_1, ... , a_s, 0, 0, ... , 0), b = (b_0, b_1, ... , b_t, 0, 0, ... , 0), and
c = (c_0, c_1, ... , c_d, 0, 0, ... , 0), respectively. (Padding c with zeros is unneces-
sary if d + 1 is a power of 2.) Let A = Fω(a), B = Fω(b), and C = Fω(c). By
definition of the Fourier transform, C_i = r(ω^i) = p(ω^i) q(ω^i) = A_i B_i. Therefore C is
the pointwise product of A and B. By Problem 9.3.5, c = Fω^(−1)(C).
Putting all this together, the coefficients of the product polynomial r(x) are given
by the first d + 1 entries in c = Fω^(−1)(Fω(a) × Fω(b)). Notice that this reasoning made
no use of the classic unique interpolation theorem, and this is fortunate because unique
interpolation does not always hold when the arithmetic is performed in a ring rather
than in a field. (Consider, for instance, p1(x) = 2x + 1 and p2(x) = 5x + 1 in the ring of
integers modulo 9. Both of these degree 1 polynomials evaluate to 1 and 7 at the
points 0 and 3, respectively.)
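Putting the pieces together in Python over the complex numbers (a sketch of what Problem 9.4.1 asks for; the rounding at the end restricts it to modest integer coefficients, which a modular variant as in Problem 9.4.3 avoids).

from cmath import exp, pi

def fft(a, omega):
    n = len(a)
    if n == 1:
        return list(a)
    B = fft(a[0::2], omega * omega)
    C = fft(a[1::2], omega * omega)
    A, power = [0] * n, 1
    for i in range(n // 2):
        A[i] = B[i] + power * C[i]
        A[i + n // 2] = B[i] - power * C[i]
        power *= omega
    return A

def poly_mult(p, q):
    # Coefficient lists, lowest degree first; returns the coefficients of p*q.
    d = (len(p) - 1) + (len(q) - 1)        # degree of the product
    n = 1
    while n <= d:
        n *= 2                             # smallest power of 2 greater than d
    omega = exp(2j * pi / n)
    A = fft(p + [0] * (n - len(p)), omega)
    B = fft(q + [0] * (n - len(q)), omega)
    C = [x * y for x, y in zip(A, B)]      # pointwise product
    c = fft(C, omega ** (n - 1))           # inverse transform: omega ** -1 ...
    return [round((z / n).real) for z in c[:d + 1]]   # ... then divide by n

# Example 9.4.1: poly_mult([1, -1, -5, 3], [-2, 6, -4, 1]) gives the
# coefficients of (3x^3 - 5x^2 - x + 1)(x^3 - 4x^2 + 6x - 2).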
Problem 9.4.1. Give explicitly the algorithm we have just sketched. Show
that it can be used to multiply two polynomials whose product is of degree d with a
number of scalar operations in O(d log d), provided that a principal n th root of unity
and the multiplicative inverse of n are both easily obtainable, where n is the smallest
power of 2 greater than d.
Problem 9.4.2. Let p (x) and q (x) be two polynomials with integer
coefficients. Let a and b be the maxima of the absolute values of the coefficients of
p (x) and q (x), respectively. Let u be the maximum of the degrees of the two polyno-
mials. Prove that no coefficient of the product polynomial p(x)q(x) exceeds ab(u + 1)
in absolute value. (In Example 9.1.4, a = 5, b = 6, and u = 3, so no coefficient of
r(x) can exceed 120 in absolute value.)
Example 9.4.1. (Continuation of Example 9.1.4) We wish to multiply symbolically the polynomials p(x) = 3x³ − 5x² − x + 1 and q(x) = x³ − 4x² + 6x − 2. Since the product is of degree 6, it suffices to take n = 8. By Problem 9.4.2, all the coefficients of the product polynomial r(x) = p(x)q(x) lie between −120 and 120; thus it suffices to calculate them modulo m = 257. By Problem 9.3.3, ω = 4 is a principal n-th root of unity in arithmetic modulo 257, and n^(−1) = 225.
Let a = (1, −1, −5, 3, 0, 0, 0, 0) and b = (−2, 6, −4, 1, 0, 0, 0, 0). Two applications of the algorithm FFT yield
A = F_ω(a) = (255, 109, 199, 29, 251, 247, 70, 133)
and
B = F_ω(b) = (1, 22, 82, 193, 244, 103, 179, 188).
The pointwise product C = A × B followed by the inverse transform c = F_ω^(−1)(C), with each entry mapped back to the interval from −128 to 128, then yields the coefficients of r(x).
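To make the arithmetic of this example concrete, here is a minimal Python sketch (not the book's algorithm FFT): a naive transform modulo 257 stands in for the recursive FFT, and the convention F_ω(a)_i = Σ_j a_j ω^(ij) mod m is assumed.

# Sketch (not the book's algorithm): symbolic multiplication of the polynomials
# of Example 9.4.1 using the discrete Fourier transform modulo m = 257.
# A naive transform stands in for the recursive FFT.
m, n, w = 257, 8, 4                   # modulus, number of points, principal 8th root of unity
n_inv = 225                           # multiplicative inverse of 8 modulo 257

def transform(v, root):
    return [sum(v[j] * pow(root, i * j, m) for j in range(n)) % m
            for i in range(n)]

a = [1, -1, -5, 3, 0, 0, 0, 0]        # coefficients of p(x), constant term first
b = [-2, 6, -4, 1, 0, 0, 0, 0]        # coefficients of q(x)

A = transform(a, w)
B = transform(b, w)
C = [x * y % m for x, y in zip(A, B)]              # pointwise product
w_inv = pow(w, n - 1, m)                           # w^(n-1) = w^(-1) modulo m
c = [x * n_inv % m for x in transform(C, w_inv)]   # inverse transform
c = [x - m if x > m // 2 else x for x in c]        # map residues back to signed coefficients
print(A)   # [255, 109, 199, 29, 251, 247, 70, 133], as in the text
print(c)   # the coefficients of r(x) = p(x)q(x), constant term first

Replacing the naive transform by the recursive FFT changes nothing in the arithmetic; only the running time improves.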
The analysis of the algorithm of Problem 9.4.3 depends on the degrees s and t of the polynomials to be multiplied and on the size of their coefficients. If the latter are sufficiently small that it is reasonable to consider operations modulo m to be elementary, the algorithm multiplies p(x) and q(x) symbolically in a time in O(d log d),
where d = s + t. The naive algorithm would have taken a time in O(st). On the other hand, if we are obliged to use multiple-precision arithmetic, the initial computation of the Fourier transforms and the final calculation of the inverse take a time in O(d² log d log ω), and the intermediate pointwise multiplication of the transforms takes a time in O(d M(d log ω)), where M(l) is the time required to multiply two integers of size l. Since M(l) ∈ O(l log l log log l) with the best-known algorithm for integer multiplication (Section 9.5), the first term in this analysis can be neglected. The total time is therefore in O(d M(d log ω)), where ω = 2 suffices if none of the coefficients of the polynomials to be multiplied exceeds 2^(n/4)/√(2(1 + max(s, t))) in absolute value. (Remember that n is the smallest power of 2 greater than d.) By comparison, the naive algorithm takes a time in O(st M(l)), where l is the size of the largest coefficient in the polynomials to be multiplied. It is possible for this time to be in O(st) in practice, if arithmetic can be carried out on integers of size l at unit cost. The naive algorithm is therefore preferable to the "fast" algorithm if d is very large and l is reasonably small. In every case, the algorithm that uses ω = e^(2πi/n) can multiply the two polynomials approximately in a time in O(d log d).
We return once more to the problem of multiplying large integers (Sections 1.1, 1.7.2, and 4.7). Let a and b be two n-bit integers whose product we wish to calculate. Suppose for simplicity that n is a power of 2 (nonsignificant leading zeros are added at the left of the operands if necessary). The classic algorithm takes a time in Ω(n²), whereas the algorithm using divide-and-conquer requires only a time in O(n^1.59), or even in O(n^α) for any α > 1 (Problem 4.7.8). We can do better than this thanks to a double transformation of the domain. The original integer domain is first transformed into the domain of polynomials represented by their coefficients; then the symbolic product of these polynomials is obtained using the discrete Fourier transform.
We denote by p_a(x) the polynomial of degree less than n whose coefficients are given by the successive bits of the integer a. For instance, p_53(x) = x⁵ + x⁴ + x² + 1 because 53 in binary is 00110101. Clearly, p_a(2) = a for every integer a. To obtain the product of the integers a and b, we need only calculate symbolically the polynomial r(x) = p_a(x) p_b(x) using the fast Fourier transform (Section 9.4), and then evaluate r(2). The algorithm is recursive because one of the stages in the symbolic multiplication of polynomials consists of a pointwise multiplication of Fourier transforms.
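As an illustration of this double transformation, here is a small Python sketch (ours, not the book's algorithm): the bits of each operand become the coefficients of a polynomial, a naive convolution stands in for the symbolic FFT multiplication of Section 9.4, and the product is recovered by evaluating at 2.

# Sketch of the double transformation of the domain for integer multiplication.
def bits(a, n):
    # Coefficients of p_a(x): the n successive bits of a, low order first.
    return [(a >> i) & 1 for i in range(n)]

def poly_mult(p, q):
    # Naive symbolic product; the fast algorithm would use Fourier transforms.
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

a, b, n = 53, 39, 8
pa, pb = bits(a, n), bits(b, n)
assert sum(c * 2**i for i, c in enumerate(pa)) == a      # p_a(2) = a
r = poly_mult(pa, pb)
print(sum(c * 2**i for i, c in enumerate(r)))            # r(2) = a*b = 2067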
When n is a power of 2, prove that t(n) = ... Show that t(n) ∉ O(n^k), whatever the value of the constant k.
The preceding problem shows that the modified algorithm is still bad news, even
if we do not take into account the time required to compute the Fourier transforms !
This is explained by the fact that we used the "fast" algorithm for multiplying two
polynomials in exactly the circumstances when it should be avoided : the polynomials
are of high degree and their coefficients are small. To correct this, we must lower the
degree of the polynomials still further.
Let l = 2^⌈(lg n)/2⌉; that is, l = √n or l = √(2n), depending on whether lg n is even or odd. Let k = n/l. Note that l and k are powers of 2. This time, denote by p_a(x) the polynomial of degree less than k whose coefficients correspond to the k blocks of l successive bits in the binary representation of a. Thus we have that p_a(2^l) = a. To
calculate the product of the integers a and b, we need only calculate symbolically the polynomial r(x) = p_a(x) p_b(x), using Fourier transforms, and then evaluate r(2^l).
Let d = 2k, a power of 2 greater than the degree of the product polynomial r(x). This time we need to choose a principal d-th root of unity ω. Since the coefficients of the polynomials p_a(x) and p_b(x) lie between 0 and 2^l − 1, and the degree of these
polynomials is less than k, the largest coefficient possible in r(x) is k(2^l − 1)². It suffices therefore that k 2^(2l) < m = ω^(d/2) + 1, that is, lg ω ≥ (2l + lg k)/(d/2). In the case when lg n is even, l = k = d/2 = √n and lg ω ≥ (2√n + lg √n)/√n = 2 + (lg √n)/√n. Similarly, when lg n is odd, we obtain lg ω ≥ 4 + (lg √(n/2))/√(n/2). Consequently, ω = 8 suffices to guarantee that the computation of the coefficients of r(x) will be correct when lg n is even, and ω = 32 is sufficient when lg n is odd.
The multiplication of two n-bit integers is thus carried out using a symbolic multiplication of two polynomials, which takes a time in d M((d/2) lg ω) + O(d² log d log ω). As far as the final evaluation of r(2^l) is concerned, this can easily be carried out in a time in O(d² log ω), which is negligible. When lg n is even, d = 2√n and ω = 8, which gives M(n) ∈ 2√n M(3√n) + O(n log n). When lg n is odd, d = √(2n) and ω = 32; hence, M(n) ∈ √(2n) M(5√(n/2)) + O(n log n).
* Problem 9.5.2. Let γ > 0 be a real constant, and let t(n) be a function satisfying the asymptotic recurrence t(n) ∈ γ t(O(√n)) + O(log n). Prove that
t(n) ∈ O(log n) if γ < 2
t(n) ∈ O(log n log log n) if γ = 2
t(n) ∈ O((log n)^(lg γ)) if γ > 2.
[Hints: For the second case use the fact that lg lg(β√n) ≤ (lg lg n) − lg(5/3) provided that n ≥ β^10, for every real constant β > 1. For the third case prove by constructive induction that t(n) ≤ δ[(lg n)^(lg γ) − ψ(lg n)^((lg γ)−1)] − ρ lg n, for some constants δ, ψ, and ρ that you must determine and for n sufficiently large. Also use the fact that
(lg β√n)^(lg γ) ≤ γ^(−1)(lg n)^(lg γ) + 2 lg γ lg β (lg √n)^((lg γ)−1)
provided √n ≥ β^(2 lg γ), for all real constants β ≥ 1 and γ > 2.]
Let t(n) = M(n)/n. The equations obtained earlier for M(n) lead to t(n) ∈ 6t(3√n) + O(log n) when lg n is even, and t(n) ∈ 5t(5√(n/2)) + O(log n) when lg n is odd. By Problem 9.5.2, t(n) ∈ O((log n)^(lg 6)) ⊆ O((log n)^2.59). Consequently, this algorithm can multiply two n-bit integers in a time in M(n) = n t(n) ∈ O(n (log n)^2.59).
Problem 9.5.3. Prove that O(n (log n)^2.59) ⊆ O(n^α) whatever the value of the real constant α > 1. This algorithm therefore outperforms all those we have seen previously, provided n is sufficiently large.
l = 2^(i+⌈(lg n)/2⌉) and k = n/l for an arbitrary constant i ≥ 0. Detailed analysis shows that this gives rise to 2^(1−i)√n recursive calls on integers of size (2^(i+1) + 2^(−i))√n if lg n is even, and 2^(−i)√(2n) recursive calls on integers of size (2^(i+2) + 2^(−i))√(n/2) if lg n is odd.
9.6 REFERENCES AND FURTHER READING

The first published algorithm for calculating discrete Fourier transforms in a time in O(n log n) is by Danielson and Lanczos (1942). These authors mention that the source of their method goes back to Runge and König (1924). In view of the great practical importance of Fourier transforms, it is astonishing that the existence of a fast algorithm remained almost entirely unknown until its rediscovery nearly a quarter of a century later by Cooley and Tukey (1965). For a more complete account of the history of the fast Fourier transform, read Cooley, Lewis, and Welch (1967). An efficient implementation and numerous applications are suggested in Gentleman and Sande (1966)
and Rabiner and Gold (1974). The book by Brigham (1974) is also worth mentioning.
The nonrecursive algorithm suggested in Problem 9.2.2 is described in several refer-
ences, for example Aho, Hopcroft, and Ullman (1974).
Pollard (1971) studies the computation of Fourier transforms in a finite field.
The solution to Problems 9.4.5 and 9.4.6 is given in Aho, Hopcroft, and Ullman
(1974). Further ideas concerning the symbolic manipulation of polynomials, evalua-
tion, and interpolation can be found in Borodin and Munro (1971, 1975), Horowitz and
Sahni (1978), and Turk (1982).
The second edition of Knuth (1969) includes a survey of algorithms for integer multiplication. A practical algorithm for the rapid multiplication of integers with up to 10 thousand decimal digits is given in Pollard (1971). The algorithm that is able to multiply two integers of size n in a time in O(n log²n) is attributed to Karp and described in Borodin and Munro (1975). The details of the algorithm by Schönhage and Strassen (1971) are spelled out in Brassard, Monet, and Zuffellato (1986), although the solution given there to Problem 9.3.3 is unnecessarily complicated in the light of Theorem 9.3.1. Also read Turk (1982). The algorithm used by the Japanese to compute π to 10 million decimal places is described in Kanada, Tamura, Yoshino and Ushiro (1986); Cray Research (1986) mentions an even more precise computation of the decimals of π, but does not explain the algorithm used. The empire struck back shortly thereafter when the Japanese computed 134 million decimals, which is the world record at the time of this writing; read Gleick (1987). And the saga goes ever on.
10
Introduction to Complexity
10.1 DECISION TREES

This technique applies to a variety of problems that use the concept of comparisons between elements. We illustrate it with the sorting problem. Thus we ask the following question: what is the minimum number of comparisons that are necessary to sort n elements? For simplicity we count only comparisons between the elements to be sorted, ignoring those that may be made to control the loops in our program. Consider first the following algorithm.
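The algorithm in question is countsort. A minimal Python sketch of one standard form of it, assuming integer keys, is the following.

# Sketch (one standard form of counting sort, assuming integer keys).
def countsort(T):
    lo, hi = min(T), max(T)
    count = [0] * (hi - lo + 1)          # one counter per possible value
    for x in T:
        count[x - lo] += 1               # arithmetic on the elements themselves
    result = []
    for v, c in enumerate(count):
        result.extend([v + lo] * c)      # emit each value as often as it occurred
    return result

print(countsort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]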
This algorithm is very efficient if the difference between the largest and the smal-
lest values in the array to be sorted is not too large. For example, if
max(T) - min(T) = #T , the algorithm provides an efficient and practical way of sorting
an array in linear time. However, it becomes impractical, on account of both the
memory and the time it requires, when the difference between the elements to be
sorted is large. In this case, variants such as radix sort or lexicographic sort (not dis-
cussed here) can sometimes be used to advantage. However, only in rare applications
will these algorithms prove preferable to quicksort or heapsort. The most important
characteristic of countsort and its variations is that they work using transformations :
arithmetic operations are carried out on the elements to be sorted. On the other hand,
all the sorting algorithms considered in the preceding chapters work using com-
parisons : the only operation allowed on the elements to be sorted consists of com-
paring them pairwise to determine whether they are equal and, if not, which is the
greater. This difference resembles that between binary search and hash coding. In this
book we pay no further attention to algorithms for sorting by transformation.
Problem 10.1.2. Show exactly how countsort can be said to carry out arith-
metic operations on the elements to be sorted. As a function of n, the number of ele-
ments to be sorted, how many comparisons between elements are made?
Coming back to the question we asked at the beginning of this section : what is
the minimum number of comparisons that are necessary in any algorithm for sorting n
elements by comparison ? Although the theorems set out in this section still hold even
if we consider probabilistic sorting algorithms (Section 8.4.1), we shall for simplicity
confine our discussion to deterministic algorithms. A decision tree is a labelled,
directed binary tree. Each internal node contains a comparison between two of the ele-
ments to be sorted. Each leaf contains an ordering of the elements. Given a total order
relation between the elements, a trip through the tree consists of starting from the root
and asking oneself the question that is found there. If the answer is "yes", the trip con-
tinues recursively in the left-hand subtree ; otherwise it continues recursively in the
right-hand subtree. The trip ends when it reaches a leaf ; this leaf contains the verdict
associated with the order relation used. A decision tree for sorting n elements is valid
if to each possible order relation between the elements it associates a verdict that is
compatible with this relation. Finally, a decision tree is pruned if all its leaves are
accessible from the root by making some consistent sequence of decisions. The fol-
lowing problem will help you grasp these notions.
Problem 10.1.3. Verify that the decision tree given in Figure 10.1.1 is valid
for sorting three elements A, B, and C.
Every valid decision tree for sorting n elements gives rise to an ad hoc sorting
algorithm for the same number of elements. For example, to the decision tree of
Figure 10.1.1 there corresponds the following algorithm.
procedure adhocsort3(T[1 .. 3])
  A ← T[1], B ← T[2], C ← T[3]
  if A < B then if B < C then { already sorted }
                else if A < C then T ← A, C, B
                              else T ← C, A, B
           else if B < C then if A < C then T ← B, A, C
                                       else T ← B, C, A
                         else T ← C, B, A
Similarly, to every deterministic algorithm for sorting by comparison there
corresponds, for each value of n, a decision tree that is valid for sorting n elements.
Figures 10.1.2 and 10.1.3 give the trees corresponding to the insertion sorting algo-
rithm (Section 1.4) and to heapsort (Section 1.9.4 and Problem 2.2.3), respectively,
when three elements are to be sorted. (The annotations on the trees are intended to
help follow the progress of the corresponding algorithms.) Notice that heapsort
sometimes makes unnecessary comparisons. For instance, if B < A < C, the decision
tree of Figure 10.1.3 first tests whether B >A (answer : no), and then whether C > A
(answer: yes). It would now be possible to establish the correct verdict, but it,
nonetheless, asks again whether B > A before reaching its conclusion. (Despite this,
the tree is pruned : the leaf that would correspond to a contradictory answer "yes" to
the third question has been removed, so that every leaf can be reached by some con-
sistent sequence of decisions.) Thus heapsort is not optimal insofar as the number of
comparisons is concerned. This situation does not occur with the decision tree of
Figure 10.1.2, but beware of appearances : it occurs even more frequently with the
insertion sorting algorithm than with heapsort when the number of elements to be
sorted increases.
Problem 10.1.4. Give the pruned decision trees corresponding to the algo-
rithms for sorting by selection (Section 1.4) and by merging (Section 4.4), and to
quicksort (Section 4.5) for the case of three elements. In the two latter cases do not
stop the recursive calls until there remains only a single element to be "sorted". 0
Problem 10.1.5. Give the pruned decision trees corresponding to the insertion
sorting algorithm and to heapsort for the case of four elements. (You will need a big
piece of paper !) 0
The following observation is crucial: the height of the pruned decision tree
corresponding to any algorithm for sorting n elements by comparison, that is, the dis-
tance from the root to the most distant leaf, gives the number of comparisons carried
out by this algorithm in the worst case. For example, a possible worst case for sorting
three elements by insertion is encountered if the array is already sorted into descending
order (C < B < A ); in this case the three comparisons B < A ?, C < A ?, and C < B ?
situated on the path from the root to the appropriate verdict in the decision tree all have
to be made.
The decision trees we have seen for sorting three elements are all of height 3.
Can we find a valid decision tree for sorting three elements whose height is less ? If so,
we shall have an ad hoc algorithm for sorting three elements that is more efficient in
the worst case. Try it : you will soon see that this cannot be done. We now prove more
generally that such a tree is impossible.
Lemma 10.1.1. Any binary tree with k leaves has a height of at least ⌈lg k⌉.
Proof. It is easy to show (by mathematical induction on the total number of nodes in the tree) that any binary tree with k leaves must have at least k − 1 internal nodes. To say the same thing differently, a binary tree with t nodes in all cannot have more than ⌈t/2⌉ leaves. Now a binary tree of height h can have at most 2^(h+1) − 1 nodes in all (by another simple argument using mathematical induction, this time on the height of the tree), and hence it has at most 2^h leaves. The lemma follows immediately.
Lemma 10.1.2. Any valid decision tree for sorting n elements contains at least
n! leaves. (It may have more than n! leaves if it is not pruned or if some of the leaves
can only be reached when some keys are equal. The upper limit on the number of
leaves of any pruned decision tree can be computed with Problem 5.8.6.)
Proof. A valid tree must be able to produce at least one verdict corresponding to each of the n! possible orderings of the n elements to be sorted.
This proof shows that any deterministic algorithm for sorting by comparison must make at least ⌈lg(n!)⌉ comparisons in the worst case when sorting n elements. This certainly does not mean that it is always possible to sort n elements with as few as ⌈lg(n!)⌉ comparisons in the worst case. In fact, it has been proved that 30 comparisons are necessary and sufficient in the worst case for sorting 12 elements, and yet ⌈lg(12!)⌉ = 29. In the worst case, the insertion sorting algorithm makes 66 comparisons when sorting 12 elements, whereas heapsort makes 59 (of which the first 18 are made during construction of the heap).
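These figures are easy to reproduce. The following short Python computation (ours, not the book's) compares the lower bound ⌈lg(n!)⌉ with the n(n−1)/2 comparisons that insertion sorting makes in the worst case.

# Sketch: the information-theoretic lower bound versus insertion sorting's
# worst case, which uses n(n-1)/2 comparisons on a reverse-sorted array.
from math import factorial, log2, ceil

for n in (12, 50):
    lower = ceil(log2(factorial(n)))     # ceil(lg(n!))
    insertion_worst = n * (n - 1) // 2
    print(n, lower, insertion_worst)     # prints 12 29 66, then 50 215 1225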
Problem 10.1.6. Give exact formulas for the number of comparisons carried out in the worst case by the insertion sorting algorithm and by the selection sorting algorithm when sorting n elements. How well do these algorithms do when compared to the lower bound ⌈lg(n!)⌉ for n = 50? □
** Problem 10.1.7. Prove that the number of comparisons carried out by heapsort on n elements, n ≥ 2, is never greater than 2n lg n. Prove further that if n is a power of 2, then mergesort makes n lg n − n + 1 comparisons in the worst case when sorting n elements. What can you say about sorting by merging in the general case?
More precise analysis shows that ⌈lg(n!)⌉ ∈ n lg n − O(n). The previous problem therefore shows that heapsort is optimal to within a factor of 2 as far as the number of comparisons needed in the worst case is concerned, and that sorting by merging almost attains the lower bound. (Some modifications of heapsort come very close to being optimal for the worst-case number of comparisons.)
Problem 10.1.8. Suppose we ask our sorting algorithm not merely to determine the order of the elements but also to determine which ones, if any, are equal. For example, a verdict such as A < B ≤ C is not acceptable: the algorithm must answer either A < B < C or A < B = C. Give a lower bound on the number of comparisons required in the worst case to handle n elements. Rework this problem assuming that there are three possible outcomes of a comparison between A and B: A < B, A = B, or A > B.
Problem 10.1.9. Let T[1 .. n] be an array sorted into ascending order, and let x be some element. How many comparisons between elements are needed in the worst case to locate x in the array? As in Section 4.3, the problem is to find an index i such that 0 ≤ i ≤ n and T[i] ≤ x < T[i+1], with the logical convention that T[0] = −∞ and T[n+1] = +∞. How does binary search compare to this lower bound? What lower bound on the number of comparisons do you obtain using the decision tree technique if the problem is simply to determine whether x is in the array, rather than to determine its position? □
Decision trees can also be used to analyse the complexity of a problem on the
average rather than in the worst case. Let T be a binary tree. Define the average
height of T as the sum of the depths of all the leaves divided by the number of leaves.
For example, the decision tree of Figure 10.1.1 has an average height
(2+3+3+3+3+2)/6=8/3. If each verdict is equally likely, then 8/3 is the average
number of comparisons made by the sorting algorithm associated with this tree. Sup-
pose for simplicity that the n elements are all distinct.
Lemma 10.1.3. Any binary tree with k leaves has an average height of at least
lgk. (By comparison with Lemma 10.1.1, we see that there is little difference between
the worst case and the average.)
Proof. Let T be a binary tree with k leaves. Define H (T) as the sum of the
depths of the leaves. For example, H (T) = 16 for the tree in Figure 10.1.1. By
definition, the average height of T is H (T )/k . The root of T can have 0, 1, or 2 chil-
dren. In the first case the root is the only leaf in the tree and H (T) = 0. In the second
case, the single child is the root of a subtree A, which also has k leaves. Since the dis-
tance from each leaf to the root of A is one less than the distance from the same leaf to
the root of T, we have H (T) = H (A) + k. In the third case the tree T is composed of a
root and of two subtrees B and C with i and k − i leaves, respectively, for some 1 ≤ i < k. By a similar argument we obtain this time H(T) = H(B) + H(C) + k.
For k ≥ 1, define h(k) as the smallest value possible for H(X) for all the binary trees X with k leaves. In particular, h(1) = 0. If we define h(0) = 0, the preceding discussion and the principle of optimality used in dynamic programming lead to
h(k) = min { h(i) + h(k−i) + k | 0 ≤ i ≤ k }
for every k > 1. At first sight this recurrence is not well founded since it defines h(k) in terms of itself (when we take i = 0 or i = k in the minimum, which corresponds to the root having only one child). This difficulty disappears because it is impossible that h(k) = h(k) + k. We can thus reformulate the recurrence that defines h(k):
h(k) = 0 if k ≤ 1
h(k) = k + min { h(i) + h(k−i) | 1 ≤ i ≤ k−1 } if k > 1
Now, consider the function g(x) = x lg x + (k−x) lg(k−x), where x ∈ ℝ is such that 1 ≤ x ≤ k−1. Calculating the derivative gives g′(x) = lg x − lg(k−x), which is
zero if and only if x = k − x, that is, if x = k/2. Since the second derivative is positive, g(x) attains its minimum at x = k/2. This minimum is g(k/2) = (k lg k) − k.
The proof that h(k) ≥ k lg k for every integer k ≥ 1 now follows by mathematical induction. The base k = 1 is immediate. Let k > 1. Suppose by the induction hypothesis that h(j) ≥ j lg j for every strictly positive integer j ≤ k − 1. By definition,
h(k) = k + min { h(i) + h(k−i) | 1 ≤ i ≤ k−1 }
     ≥ k + min { i lg i + (k−i) lg(k−i) | 1 ≤ i ≤ k−1 }
     ≥ k + g(k/2) = k + k lg k − k = k lg k.
10.2 REDUCTION

We have just shown that any algorithm for sorting by comparison takes a minimum time in Ω(n log n) to sort n elements, both on the average and in the worst case. On the other hand, we know that heapsort and mergesort both solve the problem in a time in O(n log n). Except for the value of the multiplicative constant, the question of the complexity of sorting by comparison is therefore settled: a time in O(n log n) is both necessary and sufficient for sorting n elements. Unfortunately, it does not often happen in the present state of our knowledge that the bounds derived from algorithmics and complexity meet so satisfactorily.
Because it is so difficult to determine the exact complexity of most of the prob-
lems we meet in practice, we often have to be content to compare the relative difficulty
of different problems. There are two reasons for doing this. Suppose we are able to
prove that a certain number of problems are equivalent in the sense that they have
about the same complexity. Any algorithmic improvement in the method of solution
of one of these problems now automatically yields, at least in theory, a more efficient
algorithm for all the others. From a negative point of view, if these problems have all
been studied independently in the past, and if all the efforts to find an efficient algo-
rithm for any one of them have failed, then the fact that the problems are equivalent
makes it even more unlikely that such an algorithm exists. Section 10.3 goes into this
second motivation in more detail.
Even if we are not able to determine the complexities of A and B exactly, when A ≡_l B we can be sure that they are the same. In the remainder of this section we shall see a number of examples of reduction from a variety of application areas.
Problem 10.2.2. Prove that the relations ≤_l and ≡_l are transitive.
Problem 10.2.3.
i. Prove that any strongly quadratic function is at least quadratic. (Hint: apply Problem 2.1.20.)
ii. Give an explicit example of an eventually nondecreasing function that is at least quadratic but not strongly quadratic.
iii. Show that n² log n is strongly quadratic but not supra quadratic. □
An upper triangular matrix is a square matrix M whose entries below the diagonal are all zero, that is, M_ij = 0 when i > j. We saw in Section 4.9 that a time in O(n^2.81) (or even O(n^2.376)) is sufficient to multiply two arbitrary n × n matrices, contrary to the intuition that may suggest that this problem will inevitably require a time in Ω(n³). Is it possible that multiplication of upper triangular matrices could be carried out significantly faster than the multiplication of two arbitrary square matrices? From another point of view, experience might well lead us to believe that inverting nonsingular upper triangular matrices should be an operation inherently more difficult than multiplying them.
We denote these three problems, that is, multiplication of arbitrary square matrices, multiplication of upper triangular matrices, and inversion of nonsingular upper triangular matrices, by MQ, MT, and IT, respectively. We shall show under reasonable assumptions that MQ ≡_l MT ≡_l IT. (The problem of inverting an arbitrary nonsingular matrix is also linearly equivalent to the three preceding problems (Problem 10.2.9), but the proof of this is much more difficult, it requires a slightly stronger assumption, and the resulting algorithm is numerically unstable.) Once again this means that any new algorithm that allows us to multiply upper triangular matrices more efficiently will also provide us with a new, more efficient algorithm for inverting arbitrary nonsingular matrices (at least in theory). In particular, it implies that we can invert any nonsingular n × n matrix in a time in O(n^2.376).
In what follows we measure the complexity of algorithms that manipulate n × n matrices in terms of n, referring to an algorithm that runs in a time in O(n²) as quadratic. Formally speaking, this is incorrect because the running time should be given as a function of the size of the instance, so that a time in O(n²) is really linear. No confusion should arise from this. Notice that the problems considered are at least quadratic in the worst case because any algorithm that solves them must look at each entry of the matrix or matrices concerned.
Proof. Any algorithm that can multiply two arbitrary square matrices can be used directly for multiplying upper triangular matrices. Conversely, let A and B be two arbitrary n × n matrices to be multiplied, and consider the following product of upper triangular matrices:

[ 0  A  0 ]   [ 0  0  0 ]   [ 0  0  AB ]
[ 0  0  0 ] × [ 0  0  B ] = [ 0  0  0  ]
[ 0  0  0 ]   [ 0  0  0 ]   [ 0  0  0  ]

where the "0" are n × n matrices all of whose entries are zero. This product shows us how to obtain the desired result AB by multiplying two upper triangular 3n × 3n matrices. The time required for this operation is in O(n²) for the preparation of the two big matrices and the extraction of AB from their product, plus O(t(3n)) for the multiplication of the two upper triangular matrices. By the smoothness of t(n), t(3n) ∈ O(t(n)). Because t(n) is at least quadratic, n² ∈ O(t(n)). Consequently, the total time required to obtain the product AB is in O(t(n)).
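In code, the embedding used in this proof looks as follows; this is a sketch of ours in which plain nested loops stand in for the hypothetical fast algorithm for multiplying upper triangular matrices.

# Sketch of the reduction: to multiply arbitrary n x n matrices A and B, embed
# them in upper triangular 3n x 3n matrices and read AB off the product.
def embed_upper(block, pos, n):
    # 3n x 3n zero matrix with `block` placed at block position `pos`.
    M = [[0] * (3 * n) for _ in range(3 * n)]
    bi, bj = pos
    for i in range(n):
        for j in range(n):
            M[bi * n + i][bj * n + j] = block[i][j]
    return M

def mat_mult(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
n = 2
P = mat_mult(embed_upper(A, (0, 1), n), embed_upper(B, (1, 2), n))
AB = [row[2 * n: 3 * n] for row in P[0: n]]   # the upper-right block of the product
print(AB)                                     # [[19, 22], [43, 50]]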
which we now have to invert, are smaller than the original matrix A. Using the divide-and-conquer technique suggests a recursive algorithm for inverting A in a time in O(g(n)), where g(n) ∈ 2g(n/2) + 2t(n/2) + O(n²). The fact that t(n) ∈ Ω(n²) and the assumption that t(n) is eventually nondecreasing (since it is strongly quadratic) yield g(n) ∈ 2g(n/2) + O(t(n)) when n is a power of 2. By Problem 10.2.6, using the assumption that t(n) is strongly quadratic (or at least supra linear), this implies that g(n) ∈ O(t(n) | n is a power of 2).
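The recursive algorithm referred to here can be written using the standard block identity [B C; 0 D]^(-1) = [B^(-1), -B^(-1) C D^(-1); 0, D^(-1)]. The following Python sketch (ours, with exact rational arithmetic and naive multiplication in place of the fast algorithm) handles sizes that are powers of 2.

# Sketch of divide-and-conquer inversion of a nonsingular upper triangular
# matrix: two half-size inversions and two half-size multiplications suffice.
from fractions import Fraction

def mat_mult(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def invert_upper(A):
    n = len(A)
    if n == 1:
        return [[Fraction(1, 1) / A[0][0]]]
    h = n // 2
    B = [row[:h] for row in A[:h]]          # upper-left block
    C = [row[h:] for row in A[:h]]          # upper-right block
    D = [row[h:] for row in A[h:]]          # lower-right block
    Bi, Di = invert_upper(B), invert_upper(D)
    top_right = [[-x for x in row] for row in mat_mult(mat_mult(Bi, C), Di)]
    return ([Bi[i] + top_right[i] for i in range(h)] +
            [[Fraction(0)] * h + Di[i] for i in range(h)])

A = [[Fraction(v) for v in row] for row in
     [[2, 1, 3, 0], [0, 1, 4, 1], [0, 0, 5, 2], [0, 0, 0, 3]]]
product = mat_mult(A, invert_upper(A))
print([[int(x) for x in row] for row in product])   # the 4 x 4 identity matrix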
Problem 10.2.4. Let IT2 be the problem of inverting nonsingular upper triangular matrices whose size is a power of 2. All that the proof of theorem 10.2.4 really shows is that IT2 ≤_l MQ. Complete the proof that IT ≤_l MQ.
Problem 10.2.7. An upper triangular matrix is unitary if all the entries on its diagonal are 1. Denote by SU the problem of squaring a unitary upper triangular matrix. Prove that SU ≡_l MQ under suitable assumptions. What assumptions do you need?
Problem 10.2.8. A matrix A is symmetric if A_ij = A_ji for all i and j. Denote by MS the problem of multiplying symmetric matrices. Prove that MS ≡_l MQ under suitable assumptions. What assumptions do you need?
In this section ℝ^∞ denotes ℝ* ∪ {+∞}, with the natural conventions that x + (+∞) = +∞ and min(x, +∞) = x for all x ∈ ℝ^∞.
Let X, Y, and Z be three sets of nodes. Let f : X × Y → ℝ^∞ and g : Y × Z → ℝ^∞ be two functions representing the cost of going directly from one node to another. An infinite cost represents the absence of a direct route. Denote by fg : X × Z → ℝ^∞ the function defined by fg(u, w) = min { f(u, v) + g(v, w) | v ∈ Y }, the minimum cost of going from a node of X to a node of Z while passing through exactly one node of Y; when X = Y = Z, f^i denotes the product of i copies of f, with f^0(u, v) = 0 if u = v and +∞ otherwise.
The minimum cost of going from one node to another without restrictions on the number of nodes on the path, which we write f*, is therefore f* = min { f^i | i ≥ 0 }. This definition is not practical because it apparently implies an infinite computation; it is not even immediately clear that f* is well defined. However, f never takes negative values. Any path that passes twice through the same node can therefore be shortened by taking out the loop thus formed, without increasing the cost of the resulting path. Consequently, it suffices to consider only those paths whose length is less than the number of nodes in X. Let this number be n. We thus have that f* = min { f^i | 0 ≤ i < n }. At first sight, computing f* for a given function f appears to require appreciably more work than computing a single product fg; in fact the two problems are linearly equivalent. The existence of an algorithm more efficient than O(n³) for solving the problem of calculating fg therefore implies that Floyd's algorithm for calculating shortest routes is not optimal, at least in theory.
Denote by MUL and TRC the problems consisting of calculating fg and f*,
respectively. As in the previous section, time complexities will be measured as a func-
tion of the number of nodes in the graphs concerned. An algorithm such as Dijkstra's,
for instance, would be considered quadratic even though it is linear in the number of
edges (for dense graphs). Again, the problems considered are at least quadratic in the
worst case because any algorithm that solves them must look at each edge concerned.
h*(u, v) = 0 if u = v
           f(u, v) if u ∈ X and v ∈ Y
           g(u, v) if u ∈ Y and v ∈ Z
           fg(u, v) if u ∈ X and v ∈ Z
           +∞ otherwise
Therefore the restriction of h* to X × Z is precisely the product fg we wished to calculate. Let n = n_1 + n_2 + n_3 be the cardinality of W. Using the algorithm for calculating h* thus allows us to compute fg in a time in
t(n) + O(n²) ⊆ O(t(3 max(n_1, n_2, n_3))) + O(n²) ⊆ O(t(max(n_1, n_2, n_3)))
because t(n) is smooth and at least quadratic.
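For concreteness, here is a small Python sketch (ours) of the product fg and of the closure f* computed by successive products, with float('inf') representing the absence of a direct route. The algorithms discussed in this section do much better than this naive closure.

# Sketch: the product fg and the closure f* on cost matrices.
INF = float('inf')

def product(f, g):
    # (fg)(u, w) = min over v of f(u, v) + g(v, w).
    n = len(f)
    return [[min(f[u][v] + g[v][w] for v in range(n)) for w in range(n)]
            for u in range(n)]

def closure(f):
    # f* = min of f^i for 0 <= i < n, computed by n - 1 successive products.
    n = len(f)
    star = [[0 if u == v else INF for v in range(n)] for u in range(n)]   # f^0
    power = star
    for _ in range(n - 1):
        power = product(power, f)
        star = [[min(star[u][v], power[u][v]) for v in range(n)] for u in range(n)]
    return star

f = [[INF, 3, INF],
     [INF, INF, 1],
     [2, INF, INF]]
print(closure(f))   # for instance, the cost from node 0 to node 2 is 3 + 1 = 4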
* Problem 10.2.10. Prove formally that the preceding formula for h* is correct.
When the range of the cost functions is restricted to {0, +∞}, calculating f* comes down to determining for each pair of nodes whether or not there is a path joining them, regardless of the cost of the path. We saw that Warshall's algorithm (Problem 5.4.2) solves this problem in a time in O(n³). Let MULB and TRCB be the problems consisting of calculating fg and f*, respectively, when the cost functions are restricted in this way. It is clear that MULB ≤_l MUL and TRCB ≤_l TRC since the general algorithms can also be used to solve instances of the restricted problems. Furthermore, the proof that MUL ≡_l TRC can easily be adapted to show that MULB ≡_l TRCB. This is interesting because MULB ≤_l MQ, where MQ is the problem of multiplying arbitrary arithmetic square matrices (Problem 10.2.12). Unlike the case of arbitrary cost functions, Strassen's algorithm can therefore be used to solve the problems MULB and TRCB in a time in O(n^2.81), thus showing that Warshall's algorithm is not optimal. Note, however, that using Strassen's algorithm requires a number of arithmetic operations in O(n^2.81); the time in O(n³) taken by Warshall's algorithm counts only Boolean operations as elementary. No algorithm is known that can solve MULB faster than MQ.
We return to the problems posed by the arithmetic of large integers (Sections 1.7.2, 4.7, and 9.5). We saw that it is possible to multiply two integers of size n in a time in O(n^1.59) and even in O(n log n log log n). What can we say about integer division and taking square roots? Our everyday experience leads us to believe that the second of
these problems, and probably the first one, too, is genuinely more difficult than multi-
plication. Once again this turns out not to be true. Let SQR, MLT, and DIV be the
problems consisting of squaring an integer of size n, of multiplying two integers of
size n, and of determining the quotient when an integer of size 2n is divided by an
integer of size n, respectively. Clearly, these problems are at least linear because any
algorithm that solves them must take into account every bit of the operands involved.
(For simplicity we measure the size of integers in bits. As mentioned in Section 1.7.2,
however, this choice is not critical : the time taken by the various algorithms would be
in the same order if given as a function of the size of their operands in decimal digits
or computer words. This is the case precisely because we assume all these algorithms
to be smooth.)
Theorem 10.2.7. SQR ≡_l MLT ≡_l DIV, assuming these three problems are smooth and MLT is strongly linear (weaker but more complicated assumptions would suffice).
Proof outline. The full proof of this theorem is long and technical. Its conceptual beauty is also defaced in places by the necessity of using an inordinate number of ad hoc tricks to circumvent the problems caused by integer truncation (see Problem 10.2.22). For this reason we content ourselves in the rest of this section with showing the equivalence of these operations in the "cleaner" domain of polynomial arithmetic. Nonetheless, we take a moment to prove that SQR ≡_l MLT, assuming SQR is smooth (a weaker assumption would do).
Clearly, SQR ≤_l MLT, since squaring is only a special case of multiplication. To show that MLT ≤_l SQR, suppose there exists an algorithm that is able to square an integer of size n in a time in O(t(n)), where t(n) is smooth (it is enough to assume that t(n+1) ∈ O(t(n))). Let x and y be two integers of size n to be multiplied. Assume without loss of generality that x ≥ y. The following formula enables us to obtain their product by carrying out two squaring operations of integers of size at most n + 1, a few additions, and a division by 4:
xy = ((x+y)² − (x−y)²)/4.
Since the additions and the division by 4 can be carried out in a time in O(n), we can solve MLT in a time in 2t(n+1) + O(n) ⊆ O(t(n)) because t(n) is smooth and t(n) ∈ Ω(n).
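In code, this reduction is a one-liner; the following Python sketch (ours) obtains a product from two squarings.

# Sketch of the reduction MLT <=_l SQR: one multiplication from two squarings.
def multiply_via_squaring(x, y, square=lambda z: z * z):
    # xy = ((x + y)^2 - (x - y)^2) / 4; the additions, the subtraction and the
    # division by 4 (a shift) all take linear time.
    return (square(x + y) - square(x - y)) // 4

print(multiply_via_squaring(1234, 5678))   # 7006652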
case p(2) = 24 and d(2) = 8, but q(2) = 4 ≠ 24/8. We even have that p(1) = 7 is not divisible by d(1) = 4. This is all due to the remainder of the division, r(x) = −3x − 2. Despite this difficulty, it is possible to determine the quotient and the remainder produced when a polynomial of degree 2n is divided by a polynomial of degree n in a time in O(n log n) by reducing these problems to a certain number of polynomial multiplications calculated using the Fourier transform.
Recall that p(x) = Σ_(i=0)^n a_i x^i is a polynomial of degree n provided that a_n ≠ 0. By convention the polynomial p(x) = 0 is of degree −1. Let p(x) be a polynomial of degree n, and let d(x) be a nonzero polynomial of degree m. Then there exists a unique polynomial r(x) of degree strictly less than m and a unique polynomial q(x) such that p(x) = q(x) d(x) + r(x). The polynomial q(x) is of degree n − m if n ≥ m; otherwise q(x) = 0. We call q(x) and r(x), respectively, the quotient and the remainder of the division of p(x) by d(x). By analogy with the integers, the quotient is denoted by q(x) = ⌊p(x)/d(x)⌋.
Problem 10.2.14. Prove the existence and the uniqueness of the quotient and the remainder. Show that if both p(x) and d(x) are monic polynomials (the coefficient of highest degree is 1) with integer coefficients, then both q(x) and r(x) have integer coefficients and q(x) is monic (unless q(x) = 0).
* Problem 10.2.17. Prove that if p(x), p_1(x), and p_2(x) are three arbitrary polynomials, and if d(x), d_1(x), and d_2(x) are three nonzero polynomials, then
A direct attempt to calculate the square of a polynomial p(x) using the analogous formula (p*(x) − (p(x)+1)*)* − p(x) has no chance of working: the degree of this expression cannot be greater than the degree of p(x). This failure is caused by truncation errors, which we can, nevertheless, eliminate using an appropriate scaling factor.
Suppose there exists an algorithm that is able to calculate the inverse of a polynomial of degree n in a time in O(t(n)), where t(n) is a smooth function. Let p(x) be a polynomial of degree n > 1 whose square we wish to calculate. The polynomial x^(2n) p(x) is of degree 3n, so
[x^(2n) p(x)]* = ⌊x^(6n)/(x^(2n) p(x))⌋ = ⌊x^(4n)/p(x)⌋.
Similarly
[x^(2n)(p(x)+1)]* = ⌊x^(4n)/(p(x)+1)⌋.
By Problem 10.2.17,
[x^(2n) p(x)]* − [x^(2n)(p(x)+1)]* = ⌊x^(4n)/p(x)⌋ − ⌊x^(4n)/(p(x)+1)⌋
  = ⌊(x^(4n)(p(x)+1) − x^(4n) p(x))/(p(x)(p(x)+1))⌋
  = ⌊x^(4n)/(p²(x)+p(x))⌋
  = [p²(x)+p(x)]*.
The last equality follows from the fact that p²(x) + p(x) is of degree 2n. By Problem 10.2.16, we conclude finally that
p²(x) = [[x^(2n) p(x)]* − [x^(2n)(p(x)+1)]*]* − p(x).
This gives us an algorithm for calculating p²(x) by performing two inversions of polynomials of degree 3n, one inversion of a polynomial of degree 2n, and a few operations (additions, subtractions, multiplications by powers of x) that take a time in O(n). This algorithm can therefore solve SQRP in a time in
2t(3n) + t(2n) + O(n) ⊆ O(t(n))
because t(n) is smooth and at least linear.
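The identity can be checked mechanically. The following Python sketch (ours; the helper names are hypothetical) uses exact rational arithmetic and a straightforward polynomial division to verify it on the polynomial p(x) = x + 1.

# Sketch verifying p^2(x) = [[x^(2n) p(x)]* - [x^(2n)(p(x)+1)]*]* - p(x), where
# q*(x) denotes the quotient of x^(2 deg q) by q(x).  Polynomials are lists of
# coefficients, constant term first.
from fractions import Fraction

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_quotient(p, d):
    # Quotient of the division of p(x) by d(x).
    p = [Fraction(c) for c in p]
    q = [Fraction(0)] * max(len(p) - len(d) + 1, 1)
    for i in range(len(p) - len(d), -1, -1):
        q[i] = p[i + len(d) - 1] / d[-1]
        for j, dj in enumerate(d):
            p[i + j] -= q[i] * dj
    return q

def star(p):
    # p*(x): the quotient of x^(2 deg p) by p(x).
    p = trim(p)
    return poly_quotient([Fraction(0)] * (2 * (len(p) - 1)) + [Fraction(1)], p)

def poly_sub(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])

p = [1, 1]                                   # p(x) = x + 1, of degree n = 1
deg = len(p) - 1
xp = [0] * (2 * deg) + p                     # x^(2n) p(x)
xp1 = [0] * (2 * deg) + [p[0] + 1] + p[1:]   # x^(2n) (p(x) + 1)
lhs = poly_sub(star(poly_sub(star(xp), star(xp1))), p)
print([int(c) for c in lhs])                 # [1, 2, 1], that is x^2 + 2x + 1 = p^2(x)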
x/y = x y^(−1)
If we try to calculate the quotient of a polynomial p(x) divided by a nonzero polynomial d(x) using directly the analogous formula p(x) d*(x), the degree of the result is too high. To solve this problem we divide the result by an appropriate scaling factor.
Suppose there exists an algorithm that is able to calculate the inverse of a polynomial of degree n in a time in O(t(n)), where t(n) is a smooth function. Let p(x) be a polynomial of degree less than or equal to 2n, and let d(x) be a polynomial of degree n. We wish to calculate ⌊p(x)/d(x)⌋. Let r(x) be the remainder of the division of x^(2n) by d(x), which is to say that d*(x) = ⌊x^(2n)/d(x)⌋ = (x^(2n) − r(x))/d(x) and that the degree of r(x) is strictly less than n. Now consider
x^(2n) − p(x)h(x) = x^(4k−2) − p(x) x^k (x^(3k−2) − r(x))/p(x) = x^k r(x)

p(x)q(x) = [(p(x)h(x))(2x^(2n) − p(x)h(x)) + p(x)s(x)]/x^(2n)
  = [(x^(2n) − x^k r(x))(x^(2n) + x^k r(x)) + p(x)s(x)]/x^(2n)
  = [x^(4n) − x^(2k) r²(x) + p(x)s(x)]/x^(2n)
  = x^(2n) + (p(x)s(x) − x^(2k) r²(x))/x^(2n)
It remains to remark that the polynomials p(x)s(x) and x^(2k) r²(x) are of degree at most 3n − 1 to conclude that the degree of x^(2n) − p(x)q(x) is less than n, hence q(x) = p*(x), which is what we set out to calculate.
Combining these two stages, we obtain the following recursive formula:
Let g(n) be the time taken to calculate the inverse of a polynomial of degree n by the divide-and-conquer algorithm suggested by this formula. Taking into account the recursive evaluation of the inverse of ⌊p(x)/x^k⌋, the two polynomial multiplications that allow us to improve our approximation, the subtractions, and the multiplications and divisions by powers of x, we see that
g(n) ∈ g((n−1)/2) + t((n−1)/2) + t(n) + O(n) ⊆ g((n−1)/2) + O(t(n))
because t(n) is strongly linear. Using Problem 10.2.19, we conclude that g(n) ∈ O(t(n)).
Problem 10.2.18. Let INVP2 be the problem of calculating p*(x) when p(x) is a polynomial of degree n such that n + 1 is a power of 2. All that the proof of theorem 10.2.12 really shows is that INVP2 ≤_l MLTP. Complete the proof that INVP ≤_l MLTP.
* Problem 10.2.21. We saw in Section 9.4 how Fourier transforms can be used to perform the multiplication of two polynomials of degrees not greater than n in a time in O(n log n). Theorems 10.2.11 and 10.2.12 allow us to conclude that this time is also sufficient to determine the quotient obtained when a polynomial of degree at most 2n is divided by a polynomial of degree n. However, the proof of theorem 10.2.11 depends crucially on the fact that the degree of the dividend is not more than double the degree of the divisor. Generalize this result by showing how we can divide a polynomial of degree m by a polynomial of degree n in a time in O(m log n).
inverse for an integer: if i is an n-bit integer (that is, 2^(n−1) ≤ i ≤ 2^n − 1), define i* = ⌊2^(2n−1)/i⌋. Notice that i* is also an n-bit integer, unless i is a power of 2. The problem INV is defined on the integers in the same way as INVP on the polynomials. The difficulties start with the fact that (i*)* is not always equal to i, contrary to Problem 10.2.16. (For example, 13* = 9 but 9* = 14.) This hinders all the proofs. For example, consider how we prove that DIV ≤_l INV. Let i be an integer of size 2n and let j be an integer of size n; we want to calculate ⌊i/j⌋. If we define z = ⌊i j*/2^(2n−1)⌋ by analogy with the calculation of ⌊(p(x) d*(x))/x^(2n)⌋ in the proof of theorem 10.2.11, we no longer obtain automatically the desired result z = ⌊i/j⌋. Detailed analysis shows, however, that z ≤ ⌊i/j⌋ < z + 2. The exact value of ⌊i/j⌋ can therefore be obtained by a correction loop that goes around at most three times.
z ← ⌊i j*/2^(2n−1)⌋
t ← (z + 1) × j
while t ≤ i do
    t ← t + j
    z ← z + 1
return z
The other proofs have to be adapted similarly.
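A quick numerical check of this correction loop is easy to carry out; in the Python sketch below (ours), i* is simply computed by an integer division rather than by a fast algorithm for INV.

# Sketch of the reduction DIV <=_l INV: the quotient of a 2n-bit integer i by
# an n-bit integer j is recovered from j* = floor(2^(2n-1)/j) followed by a
# short correction loop.
def divide_via_inverse(i, j, n):
    j_star = (1 << (2 * n - 1)) // j        # the "inverse", computed directly here
    z = (i * j_star) >> (2 * n - 1)
    t = (z + 1) * j
    while t <= i:                           # at most a few corrections are needed
        t += j
        z += 1
    return z

n = 8
i, j = 54321, 201                           # i of size 2n bits, j of size n bits
print(divide_via_inverse(i, j, n), i // j)  # both print 270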
** Problem 10.2.23. Let SQRT be the problem of computing the largest integer less than or equal to the square root of a given integer of size n. Prove under suitable assumptions that SQRT ≡_l MLT. What assumptions do you need? (Hint: for the reduction SQRT ≤_l MLT, follow the general lines of theorem 10.2.12 but use Newton's method to find the positive zero of f(w) = w² − x; for the inverse reduction, use the fact that
** Problem 10.2.25. Let GCD be the problem of computing the greatest common divisor of two integers of size at most n. Prove or disprove GCD ≡_l MLT. (Warning: at the time of writing, this is an open problem.)
10.3 INTRODUCTION TO NP-COMPLETENESS

There exist many real-life, practical problems for which no efficient algorithm is known, but whose intrinsic difficulty no one has yet managed to prove. Among these are such different problems as the travelling salesperson (Sections 3.4.2, 5.6, and 6.6.3), optimal graph colouring (Section 3.4.1), the knapsack problem, Hamiltonian circuits (Example 10.3.2), integer programming, finding the longest simple path in a
graph (Problem 5.1.3), and the problem of satisfying a Boolean expression. (Some of
these problems are described later.) Should we blame algorithmics or complexity ?
Maybe there do in fact exist efficient algorithms for these problems. After all, com-
puter science is a relative newcomer: it is certain that new algorithmic techniques
remain to be discovered.
This section presents a remarkable result : an efficient algorithm to solve any one
of the problems we have listed in the previous paragraph would automatically provide
us with efficient algorithms for all of them. We do not know whether these problems
are easy or hard to solve, but we do know that they are all of similar complexity. The
practical importance of these problems ensured that each of them separately has been
the object of sustained efforts to find an efficient method of solution. For this reason it
is widely conjectured that such algorithms do not exist. If you have a problem to solve
and you are able to show that it is equivalent (see Definition 10.3.1) to one of those
mentioned previously, you may take this result as convincing evidence that your
problem is hard (but evidence is not a proof). At the very least you will be certain that
nobody else claims to be able to solve your problem efficiently at the moment.
Before going further it will help to define what we mean by an efficient algorithm. Does this mean it takes a time in O(n log n)? O(n²)? O(n'-")? It all depends on the problem to be solved. A sorting algorithm taking a time in O(n²) is inefficient, whereas an algorithm for matrix multiplication taking a time in O(n² log n) would be
an astonishing breakthrough. So we might be tempted to say that an algorithm is
efficient if it is better than the obvious straightforward algorithm, or maybe if it is the
best possible algorithm to solve our problem. But then what should we say about the
dynamic programming algorithm for the travelling salesperson problem (Section 5.6)
or the branch-and-bound algorithm (Section 6.6.3) ? Although more efficient than an
exhaustive search, in practice these algorithms are only good enough to solve instances
of moderate size. If there exists no significantly more efficient algorithm to solve this
problem, might it not be reasonable to decide that the problem is inherently intract-
able ?
For our present purposes we answer this question by stipulating that an algorithm
is efficient (or polynomial-time) if there exists a polynomial p (n) such that the algo-
rithm can solve any instance of size n in a time in 0 (p (n)). This definition is
motivated by the comparison in Section 1.6 between an algorithm that takes a time in
O(2^n) and one that only requires a time in O(n³), and also by Sections 1.7.3, 1.7.4, and 1.7.5. An exponential-time algorithm becomes rapidly useless in practice, whereas generally speaking a polynomial-time algorithm allows us to solve much larger instances. The definition should, nevertheless, be taken with a grain of salt. Given two algorithms requiring a time in O(n^(lg lg n)) and in O(n^10), respectively, the first, not being polynomial, is "inefficient". However, it will beat the polynomial algorithm on all instances of size less than 10^300, assuming that the hidden constants are similar. In fact, it is not reasonable to assert that an algorithm requiring a time in O(n^10) is
Notice that SolveNBF works in polynomial time (counting calls of SolveSF at no cost) because no integer n can have more than ⌊lg n⌋ prime factors, even taking repetitions into account.
The usefulness of this definition is brought out by the two following exercises.
Problem 10.3.1. Let X and Y be two problems such that X ≤_T Y. Suppose there exists an algorithm that is able to solve problem Y in a time in O(t(n)), where t(n) is a nonzero, nondecreasing function. Prove that there exist a polynomial p(n) and an algorithm that is able to solve problem X in a time in O(p(n) t(p(n))).
Problem 10.3.2. Let X and Y be two problems such that X ≤_T Y. Prove that the existence of an algorithm to solve problem Y in polynomial time implies that there also exists a polynomial-time algorithm to solve problem X.
When X ≤_m Y and Y ≤_m X both hold, then X and Y are many-one polynomially equivalent, denoted X ≡_m Y.
Example 10.3.2. Let TSPD and HAM be the travelling salesperson decision problem and the Hamiltonian circuit problem, respectively. An instance of TSPD consists of a directed graph with costs on the edges, together with some bound L used to turn the travelling salesperson optimization problem (as in Sections 5.6 and 6.6.3) into a decision problem: the question is to decide whether there exists a tour in the graph that begins and ends at some node, after having visited each of the other nodes exactly once, and whose cost does not exceed L. An instance of HAM is a directed graph, and the question is to decide whether there exists a circuit in the graph passing exactly once through each node (with no optimality constraint).
To prove that HAM ≤_m TSPD, let G = <N, A> be a directed graph for which you would like to decide if it has a Hamiltonian circuit. Define f(G) as the instance for TSPD consisting of the complete graph H = <N, N×N>, the cost function
c(u, v) = 1 if (u, v) ∈ A
          2 otherwise
and the bound L = #N, the number of nodes in G. Clearly, G ∈ HAM if and only if <H, c, L> ∈ TSPD. It is also the case that TSPD ≤_m HAM, but this is significantly harder to prove.
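In code, the transformation f(G) is immediate; the following Python sketch (ours) builds the cost function and the bound from a directed graph given by its node and edge sets.

# Sketch of the transformation f(G) used to prove HAM <=_m TSPD: the complete
# graph on the nodes of G, with cost 1 on the edges of G and cost 2 elsewhere,
# and the bound L equal to the number of nodes.
def ham_to_tspd(nodes, edges):
    cost = {(u, v): (1 if (u, v) in edges else 2)
            for u in nodes for v in nodes if u != v}
    return nodes, cost, len(nodes)          # the instance <H, c, L> of TSPD

nodes = {1, 2, 3}
edges = {(1, 2), (2, 3), (3, 1)}            # a directed graph with a Hamiltonian circuit
H, c, L = ham_to_tspd(nodes, edges)
print(L, c[(1, 2)], c[(2, 1)])              # 3, cost 1 along an edge of G, cost 2 otherwise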
Lemma 10.3.1. If X and Y are two decision problems such that X ≤_m Y, then X ≤_T Y.
* Problem 10.3.4. Prove that the converse of Lemma 10.3.1 does not necessarily hold by giving explicitly two decision problems X and Y for which you can prove that X ≤_T Y whereas it is not the case that X ≤_m Y. □
Problem 10.3.5. Prove that the relations ≤_T, ≤_m, ≡_T, and ≡_m are transitive.
The introduction of TSPD in Example 10.3.2 shows that the restriction to decision
problems is not a severe constraint. In fact, most optimization problems are polynomi-
ally equivalent in the sense of Turing to an analogous decision problem, as the fol-
lowing exercise illustrates.
Prove that COLD ≡_T COLO ≡_T COLC. Conclude that there exists a polynomial-time algorithm to determine the chromatic number of a graph, and even to find an optimal colouring, if and only if COLD ∈ P.
These graph colouring problems have the characteristic that although it is perhaps
difficult to decide whether or not a graph can be coloured with a given number of
colours, it is easy to check whether a suggested colouring is valid.
Definition 10.3.5. NP is the class of decision problems for which there exists a proof system such that the proofs are succinct and easy to check. More precisely, a decision problem X is in NP if and only if there exist a proof space Q, a proof system F ⊆ X × Q, and a polynomial p(n) such that
We do not require that there should exist an efficient way to find a proof of x when x ∈ X, only that there should exist an efficient way to check the validity of a proposed short proof.
return true
return false
Although COLD is in NP and COLO ≡_T COLD, it does not appear that NP contains the problem of deciding, given a graph G and an integer k, whether k is the chromatic number of G. Indeed, although it suffices to exhibit a valid colouring to prove that a graph can be coloured with a given number of colours (Example 10.3.4), no one has yet been able to invent an efficient proof system to demonstrate that a graph cannot be coloured with less than k colours.
Problem 10.3.11. Let A and B be two decision problems. Do you believe that if A ≤_T B and B ∈ NP, then A ∈ NP? Justify your answer.
Problem 10.3.12. Show that HAM, the Hamiltonian circuit problem defined in
Example 10.3.2, is in NP.
Example 10.3.7. In 1903, two centuries after Mersenne claimed without proof that 2^67 − 1 is a prime number, Frank Cole showed that
2^67 − 1 = 193 707 721 × 761 838 257 287.
It took him "three years of Sundays" to discover this factorization. He was lucky that the number he chose to attack is indeed composite, since this enabled him to offer a proof of his result that is both short and easy to check. (This was not all luck: Lucas had already shown in the nineteenth century that 2^67 − 1 is composite, but without finding the factors.)
The story would have had quite a different ending if this number had been prime.
In this case the only "proof" of his discovery that Cole would have been able to pro-
duce would have been a thick bundle of papers covered in calculations. The proof
would be far too long to have any practical value, since it would take just as long to
check as it did to produce in the first place. (A similar argument may be advanced
concerning the "proof" by computer of the famous four colour theorem.) This results
from a phenomenon like the one mentioned in connection with the chromatic number
of a graph : the problem of recognizing composite numbers is in NP (Example 10.3.3),
but it seems certain at first sight not to be in co-NP, that is, the complementary
problem of recognizing prime numbers seems not to be in NP.
However, nothing is certain in this world except death and taxes : this problem
too is in NP, although the notion of a proof (or certificate) of primality is rather more
subtle than that of a proof of nonprimality. A result from the theory of numbers shows
that n, an odd integer greater than 2, is prime if and only if there exists an integer x
such that
0 < x < n,
x^(n−1) ≡ 1 (mod n), and
x^((n−1)/p) ≢ 1 (mod n) for each prime factor p of n − 1.
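Checking such a certificate only requires modular exponentiation. The following Python sketch (ours) verifies the three conditions, assuming the prime factors of n − 1 are supplied; in a complete certificate each of these factors would in turn carry its own recursive proof of primality.

# Sketch: checking the certificate of primality described above, given x and
# the prime factors of n - 1.
def check_certificate(n, x, factors_of_n_minus_1):
    if not 0 < x < n:
        return False
    if pow(x, n - 1, n) != 1:
        return False
    return all(pow(x, (n - 1) // p, n) != 1 for p in factors_of_n_minus_1)

# 67 is prime: 66 = 2 * 3 * 11, and x = 2 is a suitable witness.
print(check_certificate(67, 2, [2, 3, 11]))   # True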
* Problem 10.3.13. Complete the proof sketched in Example 10.3.7 that the
problem of primality is in NP. It remains to show that the length of a recursive proof
of primality is bounded above by a polynomial in the size (that is, the logarithm) of the
integer n concerned, and that the validity of such a proof can be checked in polynomial
time.
Problem 10.3.14. Let F = { <x, y> | x, y ∈ ℕ and x has a prime factor less than y }. Let FACT be the problem of decomposing an integer into prime factors. Prove that
i. F ∈ NP ∩ co-NP; and
ii. F ≡_T FACT.
The fundamental question concerning the classes P and NP is whether the inclusion P ⊆ NP is strict. Does there exist a problem that allows an efficient proof system but for which it is inherently difficult to discover such proofs in the worst case? Our intuition and experience lead us to believe that it is generally more difficult to discover a proof than to check it: progress in mathematics would be much faster were this not so. In our context this intuition translates into the hypothesis that P ≠ NP. It is a cause of considerable chagrin to workers in the theory of complexity that they can neither prove nor disprove this hypothesis. If indeed there exists a simple proof that P ≠ NP, it has certainly not been easy to find!
On the other hand, one of the great successes of this theory is the demonstration
that there exist a large number of practical problems in NP such that if any one of
them were in P then NP would be equal to P. The evidence that supports the
hypothesis P # NP therefore also lends credence to the view that none of these prob-
lems can be solved by a polynomial-time algorithm in the worst case. Such problems
are called NP-complete.
i. X ∈ NP; and
ii. for every problem Y ∈ NP, Y ≤_T X.
Problem 10.3.15. Prove that there exists an NP-complete problem X such that X ∈ P if and only if P = NP.
Be sure to work this important problem. It provides the fundamental tool for
proving NP-completeness. Suppose we have a pool of problems that have already
been shown to be NP-complete. To prove that Z is NP-complete, we can choose an
appropriate problem X from the pool and show that X is polynomially reducible to Z
(either many-one or in the sense of Turing). We must also show that Z E NP by exhi-
biting an efficient proof system for Z. Several thousand NP-complete problems have
been enumerated in this way.
This is all well and good once the process is under way, since the more problems
there are in the pool, the more likely it is that we can find one that can be reduced
without too much difficulty to some new problem. The trick, of course, is to get the
ball rolling. What should we do at the outset when the pool is empty to prove for the
very first time that some particular problem is NP-complete? (Problem 10.3.17 is then
powerless.) This is the tour de force that Steve Cook managed to perform in 1971,
opening the way to the whole theory of NP-completeness. (A similar theorem was
discovered independently by Leonid Levin.)
Example 10.3.8. Here are three Boolean expressions using the Boolean vari-
ables p and q .
i. (p+q)=pq
ii. (p b q) (p +q)(p +q )
iii. p (p +q)q
L (p +q +r)(p +q +r)q r
ii. (p +qr)(p +q (p +r))
W. (p =* q)b(p+q)
Expression (i) is composed of four clauses. It is in 3-CNF (and therefore in CNF), but
not in 2-CNF. Expression (ii) is not in CNF since neither p + qr nor p + q(p + r) is a
clause. Expression (iii) is also not in CNF since it contains operators other than
conjunction, disjunction, and negation.
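For readers who want to experiment with these definitions, here is a minimal brute-force satisfiability checker, sketched in Python with an assumed representation of a CNF expression as a list of clauses, each clause a list of signed variable numbers. It decides satisfiability correctly but examines all 2^n assignments, which illustrates why the ease of checking a satisfying assignment does not obviously yield an efficient way of finding one.

from itertools import product

def satisfiable_cnf(clauses, nvars):
    # clauses: list of clauses; each clause is a list of literals, where
    # literal +i stands for variable i and -i for its negation (1 <= i <= nvars).
    # Exhaustive search over all 2^nvars truth assignments.
    for assignment in product([False, True], repeat=nvars):
        def holds(lit):
            value = assignment[abs(lit) - 1]
            return value if lit > 0 else not value
        if all(any(holds(lit) for lit in clause) for clause in clauses):
            return True          # the assignment found is itself a succinct proof
    return False                 # no assignment satisfies every clause

# (p+q)(~p+r)(~q+~r), an expression in 2-CNF used purely as an illustration.
print(satisfiable_cnf([[1, 2], [-1, 3], [-2, -3]], 3))   # True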
*Problem 10.3.19.
Theorem (Cook). SAT-CNF is NP-complete.
Proof. We already know that SAT-CNF is in NP. Thus it remains to prove that
X ≤_T SAT-CNF for every problem X ∈ NP. Let Q be a proof space and F an efficient
proof system for X. Let p(n) be the polynomial (given by the definition of NP) such
Problem 10.3.21. Prove that in fact X ≤_m SAT-CNF for any decision problem
X ∈ NP.
We have just seen that SAT-CNF is NP-complete. Let X ∈ NP be some other decision
problem. To show that X too is NP-complete, we need only prove that
SAT-CNF ≤_T X (Problem 10.3.17). Thereafter, to show that Y ∈ NP is NP-complete,
we have the choice of proving SAT-CNF ≤_T Y or X ≤_T Y. We illustrate this principle
with several examples.
Example 10.3.10. Boolean expressions in CNF are simply a special case of general
Boolean expressions. More precisely, if we imagine that the satisfiability of Boolean
expressions can be decided at no cost by a call on DecideSAT, here is a polynomial-time
algorithm for solving SAT-CNF.
function DecideSATCNF(Ψ)
  if Ψ is not in CNF then return false
  if DecideSAT(Ψ) then return true
    else return false
Problem 10.3.22. Prove that SAT ≤_T SAT-CNF. Using Example 10.3.10,
conclude that SAT ≡_T SAT-CNF. (Hint: This problem has a very simple solution.
However, resist the temptation to use Problem 10.3.19(i) to obtain the algorithm
function DecideSAT(Ψ)
  let ξ be a Boolean expression in CNF equivalent to Ψ
  if DecideSATCNF(ξ) then return true
    else return false
because Problem 10.3.19(ii) shows that expression ξ can be exponentially longer than
expression Ψ, so it cannot be computed in polynomial time in the worst case.)
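A classic illustration of this blow-up (offered as an aside; it is not necessarily the example intended by Problem 10.3.19) is the expression x_1 y_1 + x_2 y_2 + ... + x_n y_n, whose length grows only linearly with n. Distributing the conjunctions over the disjunctions yields an equivalent CNF expression consisting of the 2^n clauses obtained by taking one of x_i or y_i from each of the n terms, and it can be shown that no equivalent CNF expression does better.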
Problem 10.3.24. Prove that SAT-CNF ≤_m SAT-3-CNF still holds even if in
the definition of SAT-3-CNF we insist that each clause should contain exactly three
literals.
Each widget is linked to five other nodes of the graph: nodes C and T of the
control triangle, and three nodes chosen from the y_i and z_i so as to correspond to the
three literals of the clause concerned. Because these input nodes 1, 2 and 3 cannot be
the same colour as C, Problem 10.3.25 shows that the widget can be coloured with the
colours assigned to C, T, and F if and only if at least one of the nodes 1, 2, and 3 is
coloured with the same colour as node T. In other words, since the colour assigned to
node T represents true, the widget simulates the disjunction of the three literals
represented by the nodes to which it is joined.
This ends the description of the graph G, which can be coloured with three
colours if and only if Ψ is satisfiable. It is clear that the graph can be constructed
efficiently starting from the Boolean expression Ψ in 3-CNF. We conclude that
SAT-3-CNF ≤_m 3-COL, and therefore that 3-COL is NP-complete.
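To see the target problem of this reduction in executable form, here is a minimal sketch in Python of an exhaustive decision procedure for 3-COL, with an assumed representation of the graph as a number of nodes and a list of edges. A proper colouring, once guessed, is trivial to check, so 3-COL is in NP; finding one by brute force takes time proportional to 3^n.

from itertools import product

def three_colourable(n, edges):
    # Nodes are numbered 0, 1, ..., n-1; edges is a list of pairs (u, v).
    # Try every one of the 3^n colourings and accept as soon as no edge
    # joins two nodes of the same colour.
    for colouring in product(range(3), repeat=n):
        if all(colouring[u] != colouring[v] for (u, v) in edges):
            return True
    return False

# A triangle (such as the control triangle on C, T and F) can be 3-coloured,
# but joining a fourth node to all three of its corners makes this impossible.
print(three_colourable(3, [(0, 1), (1, 2), (0, 2)]))                          # True
print(three_colourable(4, [(0, 1), (1, 2), (0, 2), (0, 3), (1, 3), (2, 3)]))  # False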
** Problem 10.3.31. Prove that the problem of the Hamiltonian circuit (Example
10.3.2 and Problem 10.3.12) is NP-complete.
* Problem 10.3.32. The halting problem consists of deciding, given any pro-
gram as instance, whether the latter will ever halt when started. The notion of reduci-
bility extends in a natural way to unsolvable problems such as the halting problem
(although it is usual to drop the polynomial-time aspect of the reductions - which we
do not do here). A function f : ℕ → ℕ is polynomially bounded if the size of its value
on any argument is polynomially bounded by the size of its argument. Prove that the
problem of computing any polynomially bounded computable function is polynomially
reducible to the halting problem in the sense of Turing. Prove however that there exist
decision problems that are not polynomially reducible to the halting problem in the
sense of Turing.
10.3.4 Non-determinism
The class NP is usually defined quite differently, although the definitions are
equivalent. The classic definition involves the notion of non-deterministic algorithms,
which we only sketch here. The name NP arose from this other definition: it
represents the class of problems that can be solved by a Non-deterministic algorithm in
Polynomial time.
A non-deterministic algorithm may use the instruction choose n between i and j,
whose effect is to set n to some value between i and j inclusive. The actual value
assigned to n is not specified by the algorithm, nor is it subject to the laws of proba-
bility. The effect of the algorithm is determined by the existence or the nonexistence
of sequences of non-deterministic choices that lead to the production of a result. We
are not concerned with how such sequences could be determined efficiently or how
their nonexistence could be established. For simplicity, we write
return success ← bool
as an abbreviation for
success ← bool
return
choices are made. It is even possible for a computation to be arbitrarily long, provided
that the same instance also admits at least one polynomially bounded computation.
prime ← false
choose m between 2 and n-1
if m divides n then success ← true
  else success ← false
else { the guess is that n is prime - let's guess a proof! }
  prime ← true
  choose x between 1 and n-1
  { the guess is that x is as in Example 10.3.7 }
  if dexpo(x, n-1, n) ≠ 1 then return success ← false
  m ← n-1
  while m > 1 do
    choose p between 2 and m
    { the guess is that p is a new prime divisor of n-1 }
    primeND(p, pr, suc)
    if suc and pr and p divides m and dexpo(x, (n-1)/p, n) ≠ 1
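Since non-deterministic algorithms are only sketched here, the following fragment (a minimal sketch in Python, not part of the algorithm above) shows the obvious deterministic simulation of a single choose instruction: the first branch above guesses a divisor m of n, and a deterministic machine can decide whether some guess succeeds simply by trying them all, at the cost of multiplying the running time by the number of possible choices.

def composite_by_simulation(n):
    # Deterministic simulation of the first branch of primeND: the instruction
    # "choose m between 2 and n-1" is replaced by a loop over all possible
    # choices, and the answer is "success" as soon as one choice works.
    for m in range(2, n):          # every value the choose could have taken
        if n % m == 0:             # this choice sets success to true
            return True
    return False                   # no sequence of choices leads to success

print(composite_by_simulation(91))   # True: the choice m = 7 succeeds
print(composite_by_simulation(89))   # False: 89 is prime

For the second branch, which makes a whole sequence of choices, the same idea leads to an exhaustive backtracking search over all the sequences of choices, and nothing essentially better is known in general; this is precisely the gap between the existence of accepting computations and our ability to find them.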
* Problem 10.3.34. Prove that a decision problem can be solved by a total
consistent polynomial-time non-deterministic algorithm if and only if it belongs to
NP ∩ co-NP.
The preceding theorem and problem suggest the alternative (and more usual)
definition for NP: it is the class of decision problems that are the domain of some
polynomial-time non-deterministic algorithm. In this case, we are only concerned with
the existence or nonexistence of computations (usually called accepting computations
in this context); the actual result returned by the algorithm in case of success is
irrelevant, and the corresponding parameter may be ignored altogether (there is no
point in algorithm XND setting ans to true when it finds a q such that <x, q> ∈ F).
Although the authors prefer the definition based on proof systems, it is sometimes
easier to show that a problem belongs to NP with this other definition. For instance, it
makes Problems 10.3.8 and 10.3.9 completely obvious.
Bibliography
ABADI, M., J. FEIGENBAUM, and J. KILIAN (1987), "On hiding information from an oracle",
Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp. 195-203.
ACKERMANN, W. (1928), "Zum Hilbertschen Aufbau der reellen Zahlen", Mathematische
Annalen, 99, 118-133.
ADEL'SON-VEL'SKII, G. M. and E. M. LANDIS (1962), "An algorithm for the organization of
information" (in Russian), Doklady Akademii Nauk SSSR, 146, 263-266.
ADLEMAN, L. M. and M.-D. A. HUANG (1987), "Recognizing primes in random polynomial
time", Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp.
462-469.
ADLEMAN, L. M., K. MANDERS, and G. MILLER (1977), "On taking roots in finite fields",
Proceedings of 18th Annual IEEE Symposium on the Foundations of Computer Science, pp.
175-178.
ADLEMAN, L. M., C. POMERANCE, and R. S. RUMELY (1983), "On distinguishing prime numbers
from composite numbers", Annals of Mathematics, 117, 173-206.
AHO, A. V. and M. J. CORASICK (1975), "Efficient string matching : An aid to bibliographic
search", Communications of the ACM, 18(6), 333-340.
AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN (1974), The Design and Analysis of Computer
Algorithms, Addison-Wesley, Reading, MA.
AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN (1976), "On finding lowest common ancestors in
trees", SIAM Journal on Computing, 5(1), 115-132.
AHO, A. V., J. E. HOPCROFT, and J. D. ULLMAN (1983), Data Structures and Algorithms,
Addison-Wesley, Reading, MA.
AJTAI, M., J. KOMLOS, and E. SZEMEREDI (1983), "An O(n logn) sorting network", Proceedings
of 15th Annual ACM Symposium on the Theory of Computing, pp. 1-9.
ANON. (c. 1495) Lytell Geste of Robyn Hode, Wynkyn de Worde, London.
BLUM, M., R. W. FLOYD, V. R. PRATT, R. L. RIVEST, and R. E. TARJAN (1972), "Time bounds for
selection", Journal of Computer and System Sciences, 7(4), 448-461.
BLUM, M. and S. MICALI (1984), "How to generate cryptographically strong sequences of
pseudo-random bits", SIAM Journal on Computing, 13(4), 850-864.
BORODIN, A. B. and J. I. MUNRO (1971), "Evaluating polynomials at many points", Information
Processing Letters, 1(2), 66-68.
BORODIN, A. B. and J. I. MUNRO (1975), The Computational Complexity of Algebraic and
Numeric Problems, American Elsevier, New York, NY.
BORUVKA, O. (1926), "O jistem problemu minimalnim", Praca Moravske Prirodovedecke
Spolecnosti, 3, 37-58.
BOYER, R. S. and J. S. MOORE (1977), "A fast string searching algorithm", Communications of
the ACM, 20(10), 762-772.
BRASSARD, G. (1979), "A note on the complexity of cryptography", IEEE Transactions on Infor-
mation Theory, IT-25(2), 232-233.
BRASSARD, G. (1985), "Crusade for a better notation", SIGACT News, ACM, 17(1), 60-64.
BRASSARD, G. (1988), Modern Cryptology: A Tutorial, Lecture Notes in Computer Science,
Springer-Verlag, New York, NY.
BRASSARD, G. and S. KANNAN (1988), "The generation of random permutations on the fly",
Information Processing Letters (in press).
BRASSARD, G. and S. MONET (1982), "L'indecidabilite sans larme (ni diagonalisation)", Publica-
tion no. 445, Departement d'informatique et de recherche operationnelle, Universite de
Montreal.
BRASSARD, G., S. MONET, and D. ZUFFELLATO (1986), "L'arithmetique des tres grands entiers",
TSI: Technique et Science Informatiques, 5(2), 89-102.
BRATLEY, P., B. L. FOX, and L. E. SCHRAGE (1983), A Guide to Simulation, Springer-Verlag,
New York, NY; second edition, 1987.
BRIGHAM, E.O. (1974), The Fast Fourier Transform, Prentice-Hall, Englewood Cliffs, NJ.
BUNCH, J. and J. E. HOPCROFT (1974), "Triangular factorization and inversion by fast matrix
multiplication", Mathematics of Computation, 28(125), 231-236.
BUNEMAN, P. and L. LEVY (1980), "The towers of Hanoi problem", Information Processing
Letters, 10(4,5), 243-244.
CARASSO, C. (1971), Analyse numerique, Lidec, Montreal, Quebec, Canada.
CARLSSON, S. (1986), Heaps, Doctoral dissertation, Department of Computer Science, Lund
University, Lund, Sweden, CODEN: LUNFD6/(NFCS-I(X)3)/(1-70)/(1986).
CARLSSON, S. (1987), "Average case results on heapsort", Bit, 27, 2-17.
CARTER, J. L. and M. N. WEGMAN (1979), "Universal classes of hash functions", Journal of
Computer and System Sciences, 18(2), 143-154.
CHANG, L. and J. KORSH (1976), "Canonical coin changing and greedy solutions", Journal of the
ACM, 23(3), 418-422.
CHERITON, D. and R. E. TARJAN (1976), "Finding minimum spanning trees", SIAM Journal on
Computing, 5(4), 724-742.
CHRISTOFIDES, N. (1975), Graph Theory: An Algorithmic Approach, Academic Press,
New York, NY.
CHRISTOFIDES, N. (1976), "Worst-case analysis of a new heuristic for the traveling salesman
problem", Management Sciences Research Report no. 388, Carnegie-Mellon University, Pitts-
burgh, PA.
COHEN, H. and A. K. LENSTRA, (1987), "Implementation of a new primality test", Mathematics
of Computation, 48(177), 103-121.
COOK, S. A. (1971), "The complexity of theorem-proving procedures", Proceedings of 3rd
Annual ACM Symposium on the Theory of Computing, pp. 151-158.
COOK, S. A. and S. O. AANDERAA (1969), "On the minimum complexity of functions", Transac-
tions of the American Mathematical Society, 142, 291-314.
COOLEY, J. M., P. A. LEWIS, and P. D. WELCH (1967), "History of the fast Fourier transform",
Proceedings of the IEEE, 55, 1675-1679.
COOLEY, J. M. and J. W. TUKEY (1965), "An algorithm for the machine calculation of complex
Fourier series", Mathematics of Computation, 19(90), 297-301.
COPPERSMITH, D. and S. WINOGRAD (1987), "Matrix multiplication via arithmetic progressions",
Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp. 1-6.
CRAY RESEARCH (1986), "CRAY-2 computer system takes a slice out of pi", Cray Channels,
8(2), 39.
CURTISS, J. H. (1956), "A theoretical comparison of the efficiencies of two classical methods and
a Monte Carlo method for computing one component of the solution of a set of linear alge-
braic equations", in Symposium on Monte Carlo Methods, H. A. Meyer, ed., John Wiley &
Sons, New York, NY, pp. 191-233.
DANIELSON, G. C. and C. LANCZOS (1942), "Some improvements in practical Fourier analysis
and their application to X-ray scattering from liquids", Journal of the Franklin Institute, 233,
365-380, 435-452.
DE BRUIJN, N. G. (1961), Asymptotic Methods in Analysis, North Holland, Amsterdam.
DEMARS, C. (1981), "Transformee de Fourier rapide", Micro-Systemes, 155-159.
DENNING, D.E.R. (1983), Cryptography and Data Security, Addison-Wesley, Reading, MA.
DEVROYE, L. (1986), Non-Uniform Random Variate Generation, Springer-Verlag, New York,
NY.
DEWDNEY, A. K. (1984), "Computer recreations : Yin and yang : recursion and iteration, the
tower of Hanoi and the Chinese rings", Scientific American, 251(5), 19-28.
DEYONG, L. (1977), Playboy's Book of Backgammon, Playboy Press, Chicago, IL.
DIFFIE, W. and M. E. HELLMAN (1976), "New directions in cryptography", IEEE Transactions
on Information Theory, IT-22(6), 644-654.
DIJKSTRA, E. W. (1959), "A note on two problems in connexion with graphs", Numerische
Mathematik, 1, 269-271.
DIXON, J. D. (1981), "Asymptotically fast factorization of integers", Mathematics of Computa-
tion, 36(153), 255-260.
DROMEY, R. G. (1982), How to Solve It by Computer, Prentice-Hall, Englewood Cliffs, NJ.
ERDOS, P. and C. POMERANCE (1986), "On the number of false witnesses for a composite
number", Mathematics of Computation, 46(173), 259-279.
EVEN, S. (1980), Graph Algorithms, Computer Science Press, Rockville, MD.
FEIGENBAUM, J. (1986), "Encrypting problem instances, or .... can you take advantage of
someone without having to trust him?", Proceedings of CRYPTO 85, Springer-Verlag, Berlin,
pp. 477-488.
FISCHER, M. J. and A. R. MEYER (1971), "Boolean matrix multiplication and transitive closure",
Proceedings of IEEE 12th Annual Symposium on Switching and Automata Theory, pp.
129-131.
FLAJOLET, P. and G. N. MARTIN (1985), "Probabilistic counting algorithms for data base applica-
tions", Journal of Computer and System Sciences, 31(2), 182-209.
FLOYD, R. W. (1962), "Algorithm 97: Shortest path", Communications of the ACM, 5(6), 345.
FOX, B. L. (1986), "Algorithm 647: Implementation and relative efficiency of quasirandom
sequence generators", ACM Transactions on Mathematical Software, 12(4), 362-376.
FREDMAN, M. L. (1976), "New bounds on the complexity of the shortest path problem", SIAM
Journal on Computing, 5(1), 83-89.
FREDMAN, M. L. and R. E. TARJAN (1984), "Fibonacci heaps and their uses in improved network
optimization algorithms", Proceedings of 25th Annual IEEE Symposium on the Foundations of
Computer Science, pp. 338-346.
FREIVALDS, R. (1977), "Probabilistic machines can use less running time", Proceedings of Infor-
mation Processing 77, pp. 839-842.
FREIVALDS, R. (1979), "Fast probabilistic algorithms", Proceedings of 8th Symposium on the
Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, 74,
Springer-Verlag, Berlin, pp. 57-69.
FURMAN, M.E. (1970), "Application of a method of fast multiplication of matrices in the
problem of finding the transitive closure of a graph" (in Russian), Doklady Akademii Nauk
SSSR, 194, 524.
GARDNER, M. (1977), "Mathematical games : A new kind of cipher that would take millions of
years to break", Scientific American, 237(2), 120-124.
GARDNER, M. and C. H. BENNETT (1979), "Mathematical games : The random number omega
bids fair to hold the mysteries of the universe", Scientific American, 241(5), 20-34.
GAREY, M. R. and D. S. JOHNSON (1976), "Approximation algorithms for combinatorial prob-
lems : An annotated bibliography", in Traub (1976), pp. 41-52.
GAREY, M. R. and D. S. JOHNSON (1979), Computers and Intractability: A Guide to the Theory
of NP-Completeness, W. H. Freeman and Co., San Francisco, CA.
GENTLEMAN, W. M. and G. SANDE (1966), "Fast Fourier transforms-for fun and profit",
Proceedings of AFIPS Fall Joint Computer Conference, 29, Spartan, Washington, DC, pp.
563-578.
GILBERT, E. N. and E. F. MOORE (1959), "Variable length encodings", Bell System Technical
Journal, 38(4), 933-968.
GLEICK, J. (1987), "Calculating pi to 134 million digits hailed as great test for computer", New
York Times, c. March 14.
GODBOLE, S. (1973), "On efficient computation of matrix chain products", IEEE Transactions on
Computers, C-22(9), 864-866.
GOLDWASSER, S. and J. KILIAN (1986), "Almost all primes can be quickly certified", Proceed-
ings of 18th Annual ACM Symposium on the Theory of Computing, pp. 316-329.
GOLDWASSER, S. and S. MICALI (1984), "Probabilistic encryption", Journal of Computer and
System Sciences, 28(2), 270-299.
GOLOMB, S. and L. BAUMERT (1965), "Backtrack programming", Journal of the ACM, 12(4),
516-524.
GONDRAN, M. and M. MINOUX (1979), Graphes et algorithmes, Eyrolles, Paris; translated as:
Graphs and Algorithms (1984), John Wiley & Sons, New York, NY.
GONNET, G. H. (1984), Handbook of Algorithms and Data Structures, Addison-Wesley, Reading,
MA.
GONNET, G. H. and J. I. MUNRO (1986), "Heaps on heaps", SIAM Journal on Computing, 15(4),
964-971.
GOOD, I. J. (1968), "A five-year plan for automatic chess", in Machine Intelligence, vol. 2,
E. Dale and D. Michie, eds., American Elsevier, New York, NY, pp. 89-118.
GRAHAM, R. L. and P. HELL (1985), "On the history of the minimum spanning tree problem",
Annals of the History of Computing, 7(1), 43-57.
GREENE, D. H. and D.E. KNUTH (1981), Mathematics for the Analysis of Algorithms, Birkhauser,
Boston, MA.
GRIES, D. (1981), The Science of Programming, Springer-Verlag, New York, NY.
GRIES, D. and G. LEVIN (1980), "Computing Fibonacci numbers (and similarly defined func-
tions) in log time", Information Processing Letters, 11(2), 68-69.
HALL, A. (1873), "On an experimental determination of π", Messenger of Mathematics, 2,
113-114.
HAMMERSLEY, J. M. and D.C. HANDSCOMB (1965), Monte Carlo Methods; reprinted in 1979 by
Chapman and Hall, London.
HARDY, G. H. and E. M. WRIGHT (1938), An Introduction to the Theory of Numbers, Oxford Sci-
ence Publications, Oxford, England ; fifth edition, 1979.
HAREL, D. (1987), Algorithmics : The Spirit of Computing, Addison-Wesley, Reading, MA.
HARRISON, M. C. (1971), "Implementation of the substring test by hashing", Communications of
the ACM, 14(12), 777-779.
HELD, M. and R. KARP (1962), "A dynamic programming approach to sequencing problems",
SIAM Journal on Applied Mathematics, 10(1), 196-210.
HELLMAN, M. E. (1980), "The mathematics of public-key cryptography", Scientific American,
241(2), 146-157.
HOARE, C.A.R. (1962), "Quicksort", Computer Journal, 5(1), 10-15.
HOPCROFT, J. E. and R. KARP (1971), "An algorithm for testing the equivalence of finite auto-
mata", Technical report TR-71-114, Department of Computer Science, Cornell University,
Ithaca, NY.
HOPCROFT, J. E. and L. R. KERR (1971), "On minimizing the number of multiplications necessary
for matrix multiplication", SIAM Journal on Applied Mathematics, 20(1), 30-36.
HOPCROFT, J. E. and R. E. TARJAN (1973), "Efficient algorithms for graph manipulation", Com-
munications of the ACM, 16(6), 372-378.
HOPCROFT, J. E. and R. E. TARJAN (1974), "Efficient planarity testing", Journal of the ACM,
21(4), 549-568.
HOPCROFT, J. E. and J. D. ULLMAN (1973), "Set merging algorithms", SIAM Journal on Com-
puting, 2(4), 294-303.
HOPCROFT, J. E. and J. D. ULLMAN (1979), Introduction to Automata Theory, Languages, and
Computation, Addison-Wesley, Reading, MA.
HOROWITZ, E. and S. SAHNI (1976), Fundamentals of Data Structures, Computer Science Press,
Rockville, MD.
HOROWITZ, E. and S. SAHNI (1978), Fundamentals of Computer Algorithms, Computer Science
Press, Rockville, MD.
Hu, T. C. and M. R. SHING (1982), "Computations of matrix chain products", Part I, SIAM
Journal on Computing, 11(2), 362-373.
Hu, T.C. and M. R. SHING (1984), "Computations of matrix chain products", Part II, SIAM
Journal on Computing, 13(2), 228-251.
ITAI, A. and M. RODEH (1981), "Symmetry breaking in distributive networks", Proceedings of
22nd Annual IEEE Symposium on the Foundations of Computer Science, pp. 150-158.
JANKO, W. (1976), "A list insertion sort for keys with arbitrary key distribution", ACM Transac-
tions on Mathematical Software, 2(2), 143-153.
JARNIK, V. (1930), "O jistem problemu minimalnim", Praca Moravske Prirodovedecke Spolec-
nosti, 6, 57-63.
JENSEN, K. and N. WIRTH (1985), Pascal User Manual and Report, third edition revised by
A. B. Michel and J. F. Miner, Springer-Verlag, New York, NY.
JOHNSON, D. B. (1975), "Priority queues with update and finding minimum spanning trees",
Information Processing Letters, 4(3), 53-57.
JOHNSON, D. B. (1977), "Efficient algorithms for shortest paths in sparse networks", Journal of
the ACM, 24(1), 1-13.
KAHN, D. (1967), The Codebreakers: The Story of Secret Writing, Macmillan, New York, NY.
KALISKI, B.S., R.L. RIVEST, and A.T. SHERMAN (1988), "Is the Data Encryption Standard a
group?", Journal of Cryptology, (1), in press.
KANADA, Y., Y. TAMURA, S. YOSHINO, and Y. USHIRO (1986), "Calculation of π to 10,013,395
decimal places based on the Gauss-Legendre algorithm and Gauss arctangent relation",
manuscript.
KARATSUBA, A. and Y. OFMAN (1962), "Multiplication of multidigit numbers on automata"
(in Russian), Doklady Akademii Nauk SSSR, 145, 293-294.
KARP, R. (1972), "Reducibility among combinatorial problems", in Complexity of Computer
Computations, R. E. Miller and J. W. Thatcher, eds., Plenum Press, New York, NY, pp.
85-104.
KARP, R. and M. O. RABIN (1987), "Efficient randomized pattern-matching algorithms", IBM
Journal of Research and Development, 31(2), 249-260.
KASAMI, T. (1965), "An efficient recognition and syntax algorithm for context-free languages",
Scientific Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, MA.
KLAMKIN, M.S. and D.J. NEWMAN (1967), "Extensions of the birthday surprise", Journal of
Combinatorial Theory, 3(3), 279-282.
KLEENE, S.C. (1956), "Representation of events in nerve nets and finite automata", in Automata
Studies, C.E. Shannon and J. McCarthy, eds., Princeton University Press, Princeton, NJ, pp.
3-40.
KNUTH, D. E. (1968), The Art of Computer Programming, 1: Fundamental Algorithms,
Addison-Wesley, Reading, MA; second edition, 1973.
KNUTH, D. E. (1969), The Art of Computer Programming, 2: Seminumerical Algorithms,
Addison-Wesley, Reading, MA; second edition, 1981.
RIVEST, R. L. and R. W. FLOYD (1973), "Bounds on the expected time for median computations",
in Combinatorial Algorithms, R. Rustin, ed., Algorithmics Press, New York, NY, pp. 69-76.
RIVEST, R. L., A. SHAMIR, and L.M. ADLEMAN, (1978), "A method for obtaining digital signa-
tures and public-key cryptosystems", Communications of the ACM, 21(2), 120-126.
ROBSON, J.M. (1973), "An improved algorithm for traversing binary trees without auxiliary
stack", Information Processing Letters, 2(1), 12-14.
ROSENTHAL, A. and A. GOLDNER (1977), "Smallest augmentation to biconnect a graph", SIAM
Journal on Computing, 6(1), 55-66.
RUDIN, W. (1953), Principles of Mathematical Analysis, McGraw-Hill, New York, NY.
RUNGE, C. and H. KÖNIG (1924), Die Grundlehren der Mathematischen Wissenschaften, 11,
Springer, Berlin.
RYTTER, W. (1980), "A correct preprocessing algorithm for Boyer-Moore string searching",
SIAM Journal on Computing, 9(3), 509-512.
SAHNI, S. and E. HOROWITZ (1978), "Combinatorial problems: reducibility and approximation",
Operations Research, 26(4), 718-759.
SCHONHAGE, A. and V. STRASSEN (1971), "Schnelle Multiplikation grosser Zahlen", Computing,
7, 281-292.
SCHWARTZ, E. S. (1964), "An optimal encoding with minimum longest code and total number of
digits", Information and Control, 7(1), 37-44.
SCHWARTZ, J. (1978), "Probabilistic algorithms for verification of polynomial identities", Com-
puter Science Department, Courant Institute, New York University, Technical Report no. 604.
SEDGEWICK, R. (1983), Algorithms, Addison-Wesley, Reading, MA.
SHAMIR, A. (1979), "Factoring numbers in O (log n) arithmetic steps", Information Processing
Letters, 8(1), 28-31.
SHANKS, D. (1972), "Five number-theoretic algorithms", Proceedings of the Second Manitoba
Conference on Numerical Mathematics, pp. 51-70.
SLOANE, N.J. A. (1973), A Handbook of Integer Sequences, Academic Press, New York, NY.
SOBOL', I. M. (1974), The Monte Carlo Method, second edition, University of Chicago Press,
Chicago, IL.
SOLOVAY, R. and V. STRASSEN (1977), "A fast Monte-Carlo test for primality", SIAM Journal
on Computing, 6(1), 84-85; erratum (1978), ibid, 7, 118.
STANDISH, T. A. (1980), Data Structure Techniques, Addison-Wesley, Reading, MA.
STINSON, D. R. (1985), An Introduction to the Design and Analysis of Algorithms, The Charles
Babbage Research Centre, St. Pierre, Manitoba.
STOCKMEYER, L.J. (1973), "Planar 3-colorability is polynomial complete", SIGACT News, 5(3),
19-25.
STOCKMEYER, L.J. and A. K. CHANDRA (1979), "Intrinsically difficult problems", Scientific
American, 240(5), 140-159.
STONE, H. S. (1972), Introduction to Computer Organization and Data Structures, McGraw-Hill,
New York, NY.
STRASSEN, V. (1969), "Gaussian elimination is not optimal", Numerische Mathematik, 13,
354-356.
TARJAN, R. E. (1972), "Depth-first search and linear graph algorithms", SIAM Journal on Com-
puting, 1(2), 146-160.
TARJAN, R. E. (1975), "On the efficiency of a good but not linear set merging algorithm",
Journal of the ACM, 22(2), 215-225.
TARJAN, R.E. (1981), "A unified approach to path problems", Journal of the ACM, 28(3),
577-593.
TARJAN, R.E. (1983), Data Structures and Network Algorithms, SIAM, Philadelphia, PA.
TRAUB, J. F., ed. (1976), Algorithms and Complexity : Recent Results and New Directions,
Academic Press, New York, NY.
TURING, A. M. (1936), "On computable numbers, with an application to the
Entscheidungsproblem", Proceedings of the London Mathematical Society, 2(42), 230-265.
TURK, J. W. M. (1982), "Fast arithmetic operations on numbers and polynomials", in Lenstra and
Tijdeman (1982), pp. 43-54.
URBANEK, F. J. (1980), "An O (log n) algorithm for computing the n th element of the solution of
a difference equation", Information Processing Letters, 11(2), 66-67.
VALOIS, D. (1987), Algorithmes prohabilistes: une anthologie, Masters Thesis, Departement
d'informatique et de recherche operationnelle, Universite de Montreal.
VAZIRANI, U. V. (1986), Randomness, Adversaries and Computation, Doctoral dissertation,
Computer Science, University of California, Berkeley, CA.
VAZIRANI, U. V. (1987), "Efficiency considerations in using semi-random sources", Proceedings
of 19th Annual ACM Symposium on the Theory of Computing, pp. 160-168.
VICKERY, C. W. (1956), "Experimental determination of eigenvalues and dynamic influence
coefficients for complex structures such as airplanes", Symposium on Monte Carlo Methods,
H. A. Meyer, ed., John Wiley & Sons, New York, NY, pp. 145-146.
WAGNER, R. A. and M. J. FISCHER (1974), "The string-to-string correction problem", Journal of
the ACM, 21(1), 168-173.
WARSHALL, S. (1962), "A theorem on Boolean matrices", Journal of the ACM, 9(1), 11-12.
WARUSFEL, A. (1961), Les nombres et leurs mysteres, Editions du Seuil, Paris.
WEGMAN, M.N. and J. L. CARTER (1981), "New hash functions and their use in authentication
and set equality", Journal of Computer and System Sciences, 22(3), 265-279.
WILLIAMS, H. (1978), "Primality testing on a computer", Ars Combinatoria, 5, 127-185.
WILLIAMS, J.W.J. (1964), "Algorithm 232: Heapsort", Communications of the ACM, 7(6),
347-348.
WINOGRAD, S. (1980), Arithmetic Complexity of Computations, SIAM, Philadelphia, PA.
WRIGHT, J. W. (1975), "The change-making problem", Journal of the ACM, 22(1), 125-128.
YAO, A. C. (1975), "An O(|E| log log |V|) algorithm for finding minimum spanning trees", Infor-
mation Processing Letters, 4(1), 21-23.
YAO, A. C. (1982), "Theory and applications of trapdoor functions", Proceedings of 23rd Annual
IEEE Symposium on the Foundations of Computer Science, pp. 80-91.
YAO, F. F. (1980), "Efficient dynamic programming using quadrangle inequalities", Proceedings
of 12th Annual ACM Symposium on the Theory of Computing, pp. 429-435.
YOUNGER, D. H. (1967), "Recognition of context-free languages in time n 3 ", Information and
Control, 10(2), 189-208.
ZIPPEL, R. E. (1979), Probabilistic Algorithms for Sparse Polynomials, Doctoral dissertation,
Massachusetts Institute of Technology, Cambridge, MA.
Index
Monet, S., 140, 291, 337, 343 Papadimitriou, C.H., 35, 204, 348, 349
Monic polynomial, 136, 209-211, 286, 310 parent, 23
Monier, L., 276, 349 Pascal (computer language), 3, 8-17, 35, 106, 205
Monte Carlo algorithm, 227, 262-274, 322 Pascal's triangle, 143, 146
See also Numeric integration; Quasi Monte Carlo Path compression, 33, 34, 62
Montgomery, P.L., 276, 349 Pattern, 211, 212-222
Moore, E.F., 167, 345 p-correct probabilistic algorithm, 263
Moore, J.S., 216, 222, 343 Peralta, R.C., 275, 349
Morris, J.H., 222, 348 Percolate, 26, 27, 28-30, 91
Multiplication. See also Large integer arithmetic; percolate, 27
Polynomial arithmetic Permutation. See Generation of Permutation; Random
a la russe, 2-4, 5, 13-14, 35, 124 permutation
classic, 1, 2, 13-14, 124-132, 278 Pi (π), 124, 228-234, 281, 290-291
of matrices, 132-133, 140, 146-150, 237, 274, Pippenger, N., 336, 349
302-308 Pivot, 116, 120, 224, 239-240
Munro, J.I., 35, 36, 291, 336, 337, 343, 346 pivot, 117
Planar graph, 204, 332
Pohl, I., 141, 349
Napier, J., 277 Pointer, 3, 20
n-ary-node, 24 Pollard, J.M., 276, 291, 349
Nebut, J.-L., 35, 348 Polynomial algorithm, 6, 316
Nemhauser, G., 167, 204, 342, 349 Polynomial arithmetic:
Nested asymptotic notation, 45 division, 309-314
Newman, D.J., 275, 347 evaluation, 209-211, 222, 286, 291
Newton's binomial, 10 interpolation, 136, 286
Newton's method, 313, 315 multiplication, 136, 278, 284-286, 309-314
Nievergelt, J., 35, 204, 349 reductions, 308-314
Nilsson, N., 35, 204, 349 Polynomial. See Equivalence; Reduction
Nim, 190-194
Node, 20, 21 postnum, 180, 208
node, 21 Postorder, 170, 174, 208
Nondecreasing. See Eventually nondecreasing function Pratt, V.R., 140, 222, 336, 343, 348, 349
Non-determinism, 332-335, 336 Precomputation, 205, 211-222
Non-deterministic algorithm, 333 Preconditioning, 154-159, 166, 193, 205-211, 215,
Nontrivial factor, 225, 228, 256, 334 240-242, 278
NP, 321, 332-335 prenum, 173, 208
Preorder, 170, 208
NP-complete, 102, 103, 153, 324-332
Prim, R.C., 104, 349
NP-completeness, 315-335, 336-337
Prim's algorithm, 85-87, 92
Number theory. See Theory of numbers
Primality testing, 9, 256, 269-271, 276, 334
Numeric integration:
See also Certificate of primality
Monte Carlo, 230-232, 275
primeND, 334
multiple, 232, 275
Principal root of unity, 281
trapezoidal, 231
Principle of invariance, 6, 38
Numerical probabilistic algorithm, 227, 228-237, 275
Principle of optimality, 143, 144-159, 298
Priority list, 25, 27, 30, 199
Objective function, 79-80 Probabilistic algorithm. See Biased probabilistic algo-
obstinate, 248 rithm; Consistent probabilistic algorithm; Las
Ofman, Y., 140, 347 Vegas algorithm; Monte Carlo algorithm;
Omega (Ω), 41
One-way equality, 51, 78 probabilistic algorithm; Sherwood algorithm;
Operations on asymptotic notation, 43-45 Unbiased probabilistic algorithm
Optimal graph colouring, 315, 320, 330-332 Probabilistic counting, 232-237, 275
Optimal search tree, 154-159, 167-168, 207 Probability of success, 247, 263, 333
Optimality. See Principle of optimality Problem, 4
Order of (O), 6, 37
Programming language, 2
Promising, 79
P, 318 See also k-promising
Pan, V., 133, 140, 349 Proof space, 320
Shortest path, 30, 87-92, 104, 150-153, 167, Switch. See Telephone switching
304-308 Symbolic manipulation of polynomials.
Shortest simple path, 153 See Polynomial arithmetic
Shub, M., 275, 342 Symmetric matrix, 22, 86, 103, 153, 304
shuffle, 241 Syntactic analysis, 168, 203, 205
Sibling, 23 Szemeredi, E., 141, 341
Sift-down, 27, 28-30, 55-56, 91
sift-down, 27
sift-up. See Percolate tablist, 20
Signal processing, 19, 279, 280 Tally circuit, 137
Signature, 211-213, 222, 273 Tamura, Y., 291, 347
Simple path, 144, 153, 315 Target string, 211, 213-222
Simplex, 227 Tarjan, R.E., xiv, 35, 36, 78, 104, 140, 167, 204, 343,
Simplification, 106, 109-115, 121, 128-132, 134-136 345, 346, 350, 351
Simula, 205 Tautology, 325, 326-327, 336
Simulation, 28, 227, 275 Telephone switching, 137, 336
Sink, 203 Text editor, 211
Size of circuit, 138 Text processing, 211
Size of instance, 5 Theory of numbers, 140, 276
Sloane, N.J.A., 167, 168, 350 Threshold for asymptotic notation, 37, 39, 43
slow-make-heap, 28, 56 Threshold for divide-and-conquer, 106
Smooth algorithm, 301 determination, 107-109
Smooth function, 46, 74, 301 ultimate, 108, 123
Smooth integer. See k-smooth integer Tijdeman, R., 348, 349, 351
Smooth problem, 301 Timetable, 139-140
Sobol', I.M., 275, 350 Top-down technique, 142
Solovay, R., 276, 350 Topological sorting, 178-179, 190
Sort. See Ad hoc sorting; Batcher's sorting circuit; Tournament, 139-140, 144-146
Comparison sort; countsort; Heapsort; Insertion Towers of Hanoi, 64, 69, 78
sort; Lexicographic sorting; Quicksort; Radix Transformation:
sort; Selection sort; Topological sorting; of the domain, 277-291
Transformation sort function, 277
Source, 30, 87, 150 sort, 293, 336
Sparse graph, 87, 91, 151 Transformed domain, 277
Special path, 87, 88-90 Transformed function, 277
Splitting, 256, 276 Trapezoidal algorithm. See Numeric integration
Square root. See also Large integer arithmetic Traub, J.F., 345, 349, 351
modulo n, 257 Travelling salesperson, 102, 103-104, 159-162, 168,
modulo p, 252-256 199-202, 204, 315, 319
Stack, 21, 107, 182, 199 See also Euclidean travelling salesperson
Stanat, D.F., 275, 342 Traversal of tree. See Inorder; Postorder;
Standish, T.A., 35, 350 Preorder; Searching
Statistical test, 226, 275 Tree, 22
Steele, J.M., 275, 342 See also Ancestry in a rooted tree, Balanced tree,
Steiglitz, K., 35, 204, 349 Binary tree, Decision tree, Minimum spanning
Stinson, D.R., 35, 141, 350 tree, Optimal search tree, Rooted tree,
Stirling's formula, 233 Searching in a tree, Traversal of tree
Stochastic advantage. treenode, 23
See Amplification of stochastic advantage Triangle inequality, 103
Stochastic preconditioning, 240-242 Triangular matrix, 302
Stockmeyer, L.J., 336, 337, 350 Trip. See Decision tree
Stone, H.S., 35, 350 TRS-80, 35
Strassen, V., 15, 35, 132, 140, 276, 290, 291, 350 Tukey, J.W., 19, 290, 344
Strassen's algorithm, 132-133, 140, 305, 308 Turing, A.M., 337, 351
String. See Searching See also Equivalence; Reduction
Strong pseudoprime, 270, 320 Turing machine, 327
Strongly connected, 179-182 Turk, J.W.M., 291, 351
Strongly quadratic, 301 2-edge-connected. See Bicoherent
Supra quadratic, 301 2-3 tree, 25, 35
Now, this innovative new book gives readers the basic tools they need to
develop their own algorithms--in whatever field of application they may be
required!
CONTENT HIGHLIGHTS:
Concentrates on the techniques needed to design and analyze
algorithms.
Details each technique in full.
Illustrates each technique with concrete examples of algorithms taken
from such different applications as optimization, linear algebra,
cryptography, operations research, symbolic computation, artificial
intelligence, numerical analysis, computing in the humanities, among
others.
Presents real-life applications for most algorithms.
Contains approximately 500 exercises, many of which call for an
algorithm to be implemented on a computer so that its efficiency may be
measured experimentally and compared to the efficiency of alternative
solutions.
ISBN 0-13-023243-2