Algebra of Programming
Prentice Hall International Series in Computer Science
Richard Bird
and
Oege de Moor
University of Oxford
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
Contents
Foreword ix
Preface xi
Programs 1
1.1 Datatypes 1
1.2 Natural numbers 4
1.3 Lists 7
1.4 Trees 14
1.5 Inverses 16
1.6 Polymorphic functions 18
1.7 Pointwise and point-free 19
Applications 55
3.1 Banana-split 55
3.2 Ruby triangles and Horner's rule 58
3.3 The TeX problem - part one 62
3.4 Conditions and conditionals 66
3.5 Concatenation and currying 70
Appendix 265
Bibliography 271
Index 291
Foreword
algebraic properties of their specifications. The formulae and equations that you
will see here share the elegance of those which underlie physics or chemistry or any
other branch of basic science; and like them, they inspire our interest, enlarge our
understanding, and hold out promise of enduring benefits in application.
Tony Hoare
Preface
construction that form the core of the subject known as Algorithm Design. Examples
of such principles include: dynamic programming, greedy algorithms, exhaustive
search, and divide and conquer.
The main ideas of the algebraic approach are illustrated by an extensive study of
optimisation problems, conducted in Chapters 7-10. These are problems that
involve finding a largest, smallest, cheapest, and so on, member of a set of possible
solutions satisfying some given constraints. It is surprising how many
computational problems can be specified in terms of optimising some suitable measure, even
problems that do not at first sight fall into the traditional areas of combinatorial
optimisation. However, the book is not primarily about optimisation problems,
rather it is about one approach to solving programming problems in general.
Specific algorithms for specific problems are obtained by checking that the
conditions hold and instantiating the results. The solution may take the form of a
function, but more usually a relation, characterised as the solution to a certain
recursive equation. The recursive equation is then refined to a recursive program
that delivers a function, and the result is translated into a functional programming
language. All the programs derived in Chapters 7-10 follow this pattern, and the
popular language Gofer (Jones 1994) is used to implement the results.
A categorical calculus provides not only a means for formulating algorithmic
strategies abstractly, but also a smooth and integrated framework for conducting proofs.
The style employed throughout the book is one of equational and inequational
point-free reasoning with functions and relations. A point-free calculation is one in
which the expressions under manipulation denote functions or relations, built using
functional and relational composition as the basic combining form. In contrast,
pointwise reasoning is reasoning conducted at the level of functional or relational
application and expressed in a formalism such as the predicate calculus.
The point-free style is intrinsic to a categorical approach, but is less common in
proofs about programs. One of the advantages of a point-free style is that one is
unencumbered by many of the complications involved in manipulating formulae
dealing with bound variables introduced by explicit quantifications. Point-free
reasoning is therefore well suited to mechanisation, though none of the many calculations
recorded in this book were in fact produced with the help of a mechanical proof
assistant.
Audience
not excluded. Although we have taken pains to make the book as self-contained as
possible, and have provided lots of exercises for self-study, the reader will need some
mathematical background to appreciate and master the more abstract material.
Outline
Roughly speaking, the first half of the book (Chapters 1-6) is devoted to basic
theory, while the second half (Chapters 7-10) pursues the theme of finding efficient
solutions for various kinds of optimisation problem. But most of the early chapters
contain some applications of the theory to programming problems.
The methods explored in Chapters 7-10 fall into two basic kinds, depending on
whether a solution to an optimisation problem is viewed as being composed out
of smaller ones, or decomposed into smaller ones. The two views are
complementary and individual problems can fall into both classes. Chapter 7 discusses greedy
algorithms that assemble a solution to a problem by a bottom-up process of
constructing solutions to smaller problems, while Chapter 10 studies another class of
greedy algorithm that chooses an optimal decomposition at each stage.
The remaining chapters, Chapter 8 and Chapter 9, deal with similar views of
dynamic programming. Each of these four chapters contains three or four case studies
of non-trivial problems, most of which have been taken from textbooks on algorithm
design, general programming, and combinatorial optimisation.
The chapters are intended to be read in sequence. Bibliographical remarks are
included at the end of each chapter, and the majority of individual sections contain
a selection of exercises. Answers to all the exercises in the first six chapters can be
obtained from the World-wide Web: see the URL
http://www.comlab.ox.ac.uk/oucl/publications/books/algebra/
Acknowledgements
Sharon Curtis, Jeremy Gibbons, Martin Henson, Tony Hoare, Guy LaPalme, Bernhard Möller, Jesus Ravelo, and Philip Wadler.
We would like to thank Jim Davies for knocking our LaTeX into shape, and Jackie
Harbor, our editor at Prentice Hall, for enthusiasm, moral support, and a number
of lunches.
The diagrams in this book were drawn using Paul Taylor's package (Taylor 1994).
Richard Bird would like to record a special debt of gratitude to Lambert Meertens
for his friendship and collaboration over many years. Oege de Moor would like to
thank the Dutch STOP project and British Petroleum for the financial assistance
that enabled him to come and work in Oxford. The first part of this book was
written at the University of Tokyo, while visiting Professors Masato Takeichi and
Hidohiko Tanaka. Their hospitality and the generosity of Fujitsu, which made the
visit possible, are gratefully acknowledged.
April, 1996
'Now, then,' she said, somewhat calmer. 'An explanation,
if you please, and a categorical one. What's the idea?
What's it all about? Who the devil's that inside the winding-sheet?'
P.G. Wodehouse, The Code of the Woosters
Chapter 1
Programs
Most of the derivations recorded in this book end with a program, more specifically,
a functional program. In this opening chapter we settle notation for expressing
functional programs and review those features of functional languages that will
emerge again in a more general setting later on. Many aspects of modern functional
languages (of which there is an abundance, e.g. Gofer, Haskell, Hope, Miranda™,
Orwell, SML) are not covered. For example, we will not go into questions of strict
versus non-strict semantics, infinite values, evaluation strategies, cost models, or
operating environments. For fuller information we refer the reader to the standard
texts on the subject, some of which are mentioned in the bibliographical remarks
at the end of the chapter. Our main purpose is to identify familiar landmarks that
will help readers to navigate through the abstract material to come.
1.1 Datatypes
At the heart of functional programming is the ability to introduce new datatypes
and to define functions thatmanipulate their values. Datatypes can be introduced
by simple enumeration of their elements; for example:

Bool ::= false | true
Char ::= ascii0 | ascii1 | ... | ascii127

The type Bool consists of two values and Char consists of 128. It would be painful
to refer to characters only by their ASCII numbers, so most languages provide
an alternative syntax, allowing one to write 'A' for ascii65, 'a' for ascii97, '\n'
for ascii10, and so on. The various identifiers, ascii0, true, and so on, are called
constructors and the vertical bar | is interpreted as the operation of disjoint union.
Datatypes can also be built by combining existing ones; for example:

Either ::= bool Bool | char Char
Both ::= tuple (Bool, Char)

The type Either consists of 130 values: bool false, bool true, char ascii0, and so on.
The type Both consists of 256 values, one for each combination of a value in Bool
with a value in Char. In these datatypes the constructors bool, char and tuple
denote functions; for example, char produces a value of type Either given a value
of type Char.
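In a modern Haskell-like language these declarations might be rendered as ordinary data declarations. The following sketch uses our own names (B, C, EitherBC, BothBC), since Bool and Char are built into Haskell, and cuts the character type down to three constructors:

```haskell
-- Hypothetical Haskell renderings of the book's enumerated datatypes.
-- B and C stand in for the book's Bool and Char (C is truncated to
-- three constructors here; the book's Char has 128).
data B = F | T deriving (Eq, Show)
data C = Ascii0 | Ascii1 | Ascii2 deriving (Eq, Show)

-- Disjoint union, as in the book's Either (here 2 + 3 = 5 values):
data EitherBC = MkBool B | MkChar C deriving (Eq, Show)

-- Pairing, as in the book's Both (here 2 * 3 = 6 values):
data BothBC = Tuple B C deriving (Eq, Show)

-- The constructors denote functions; for example, MkChar produces a
-- value of type EitherBC given a value of type C.
allBoth :: [BothBC]
allBoth = [Tuple b c | b <- [F, T], c <- [Ascii0, Ascii1, Ascii2]]
```

As in the text, the number of values of the product type is the product of the numbers of values of its components.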
Composition of functions will be written f · g, with the conventional order in
which functions are applied to arguments on the right. In the alternative, so-called
diagrammatic form, one writes x f for application and f ; g for composition, where
x (f ; g) = (x f) g. The conventional order is consistent with adjectival order in
English, in which adjectives are functions taking noun phrases to noun phrases.
not false = true
not true = false
Functions of more than one argument can be defined in one of two basic styles:
either by pairing the arguments, as in

and (false, b) = false
and (true, b) = b

or by currying, as in

cand false b = false
cand true b = b.
The function cand has type

cand : (Bool ← Bool) ← Bool.
More generally, we can define a function f of two arguments by choosing any of the
types

f : A ← (B × C)
f : (A ← B) ← C
f : (A ← C) ← B.

With the first type we would write f (b, c); with the second, f c b; and with the
third, f b c. For obvious reasons, the first and third seem more natural companions.
The function curry, with type

curry : ((A ← C) ← B) ← (A ← (B × C)),

converts a non-curried function into a curried one:

curry f b c = f (b, c).
One can also define a function uncurry that goes the other way.
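The two conversion functions can be sketched in Haskell, whose arrows point the opposite way from the book's; the paired and' and its curried companion cand follow the definitions above:

```haskell
import Prelude hiding (curry, uncurry, and)

-- curry : ((A <- C) <- B) <- (A <- (B x C)), with Haskell's arrows.
curry :: ((b, c) -> a) -> b -> c -> a
curry f b c = f (b, c)

-- uncurry goes the other way.
uncurry :: (b -> c -> a) -> (b, c) -> a
uncurry f (b, c) = f b c

-- The book's paired 'and' and its curried companion 'cand'.
and :: (Bool, Bool) -> Bool
and (False, _) = False
and (True,  b) = b

cand :: Bool -> Bool -> Bool
cand = curry and
```

Note that curry and uncurry are mutually inverse, which is the sense in which the two styles of definition are interchangeable.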
with the datatype, and we write the names of functions using lower case letters.
The declaration

Nat ::= zero | succ Nat

introduces the type of natural numbers. Nat is the union of an infinite number
of distinct values: zero, succ zero, succ (succ zero), and so on. If two distinct
expressions of Nat denoted the same value, we could show, for some element n of
Nat, that both zero and succ n denoted the same value, contradicting the basic
assumption that different constructors produce different values.
Forcing the programmer to write succ (succ (succ zero)) instead of 3, and to re-create
all of arithmetic from scratch, would be a curious decision, to say the least, by the
designers of a programming language, so a standard syntax for numbers is usually
provided, as well as the basic arithmetic operations. In particular, zero is written
0 and succ n is written n + 1. With these conventions, we can write definitions in
a more perspicuous form; for example,

fact 0 = 1
fact (n + 1) = (n + 1) × fact n

defines the factorial function, and

fib 0 = 0
fib 1 = 1
fib (n + 2) = fib n + fib (n + 1)

defines the Fibonacci function. The expression n + 2 corresponds to the pattern
succ (succ n), which is disjoint from the patterns zero and succ zero.
Not every recursion equation defines a function; consider, for example,

f n = f (n + 1).

Every constant function satisfies the equation for f, but none is defined by it. On
f 0 = c
f (n + 1) = h (f n)

do define a unique function f for every constant c and function h of appropriate
types. More precisely, if c has type A for some A, and if h has type h : A ← A, then
f is defined uniquely for every natural number and has type f : A ← Nat. The above
scheme is called definition by structural recursion over the natural numbers, and is
an instance of a slightly more general scheme called primitive recursion. Much of
this book is devoted to understanding and exploiting the idea of defining a function
(or, more generally, a relation) by structural recursion over a datatype.
The two equations given above can be captured in terms of a single function foldn
that takes the constant c and function h as arguments; thus f = foldn (c, h). The
function foldn is called the fold operator for the type Nat. Observe that foldn (c, h)
works by taking a natural number expressed in terms of zero and succ, replacing
zero by c and succ by h, and then evaluating the result. In other words, foldn (c, h)
describes a homomorphism of Nat.
It is a fact that not every computable function over the natural numbers can be
described using structural recursion, so certainly some functional programs are
inaccessible if only structural recursion is allowed. However, in the presence of currying
and other bits and pieces, structural recursion is both a flexible and powerful tool
(see Exercise 1.6). For example,

plus m = foldn (m, succ)
mult m = foldn (0, plus m)
expn m = foldn (1, mult m)

define curried versions of addition, multiplication and exponentiation. In these
definitions currying plays an essential role since foldn gives us no way of defining
a binary function directly. The factorial function can be computed by

fact = outr · foldn ((0, 1), f)
outr (m, n) = n
f (m, n) = (m + 1, (m + 1) × n),
and the Fibonacci function can be computed by

fib = outl · foldn ((0, 1), f)
outl (m, n) = m
f (m, n) = (n, m + n).
The programs for fact and fib can be regarded as implementations of the recursive definitions.
The program for fib has the advantage that values of fib are computed in linear
time, while the recursive definition, if implemented directly, would require
exponential time. The program for fib illustrates an important idea, called tabulation,
in which function values are stored for subsequent use rather than being calculated
afresh each time. Here, the table is very simple, consisting of just a pair of values:
foldn ((0, 1), f) n returns the pair (fib n, fib (n + 1)). The theme of tabulation will
emerge again in Chapter 9 on dynamic programming.
Exercises
1.1 Give an example of a recursion equation that is not satisfied by any function.
1.3 Construct a datatype Nat⁺ for representing the integers > 0, together with an
operator foldn⁺ for iterating over such numbers. Give functions f : Nat⁺ ← Nat
and g : Nat ← Nat⁺ such that f · g is the identity function on Nat⁺ and g · f is the
identity function on Nat.
1.4 Express the squaring function sqr : Nat ← Nat in the form sqr = f · foldn (c, h)
for suitable f, c and h.
1.5 Consider the function lastp : Nat ← Nat such that lastp n returns the largest
natural number m ≤ n satisfying p : Bool ← Nat. Assuming that p 0 holds, construct
suitable f, c and h so that lastp = f · foldn (c, h).
1.6 Ackermann's function ack : Nat ← (Nat × Nat) is defined by the equations

ack (0, y) = y + 1
ack (x + 1, 0) = ack (x, 1)
ack (x + 1, y + 1) = ack (x, ack (x + 1, y)).

The function curry ack can be expressed as foldn (succ, f) for an appropriate f.
What is f? (Remark: this shows that, in the presence of currying, functions which
are not primitive recursive can be expressed in terms of foldn.)
1.3 Lists
There are two basic views of lists, given by the type declarations

listr A ::= nil | cons (A, listr A)
listl A ::= nil | snoc (listl A, A)

The former describes the type of cons-lists, in which elements are added to the front
of a list; the latter describes the type of snoc-lists, in which elements are added to
the rear. Thus listr builds lists from the right, while listl builds lists from the left.
The constructor nil is overloaded in that it denotes both the empty cons-list and
the empty snoc-list; in any program making use of both forms of list, distinct names
would have to be chosen.
The two types of list are different, though isomorphic to one another. For example,
the function convert : listr A ← listl A that converts a snoc-list into a cons-list can
be defined recursively by

convert nil = nil
convert (snoc (x, a)) = snocr (convert x, a),

where snocr (x, a) appends the element a to the rear of the cons-list x. (An
implementation supporting both views could convert from one view to the other,
quietly and efficiently, as occasion demands.) Cons-lists
are taken as the basic view, and special syntax is provided for nil and cons. The
empty list is written [], and cons (a, x) is written a : x. In addition, [a] can be used
for a : [], [a, b] for a : b : [], and so on. However, since we want to treat both
types of list on an equal footing, we will not use the syntax a : x; for now we stick
with the slightly cumbersome forms nil, cons (a, x) and snoc (x, a).
Lists can be concatenated with the function ++; for example,

[1, 2, 3] ++ [4, 5] = [1, 2, 3, 4, 5].

In particular,

cons (a, x) = [a] ++ x
snoc (x, a) = x ++ [a].
Later on, but not just yet, we will use the expressions on the right as alternatives
for those on the left; this is an extension of a similar convention for Nat, in which
we wrote n + 1 for succ n, thereby harnessing the operation + for a more primitive
purpose.
The type and definition of concatenation depends on the version of lists under
consideration. For example, taking the snoc view, concatenation can be defined by

x ++ nil = x
x ++ snoc (y, a) = snoc (x ++ y, a).
Using this definition, we can show that ++ is an associative operation, and that nil
is a left unit as well as a right one. The proof that

x ++ (y ++ z) = (x ++ y) ++ z

proceeds by induction on z. The base case z = nil is

x ++ (y ++ nil)
=   {first equation defining ++}
x ++ y
=   {first equation defining ++}
(x ++ y) ++ nil.
The induction step is

x ++ (y ++ snoc (z, a))
=   {second equation defining ++, applied twice}
snoc (x ++ (y ++ z), a)
=   {induction hypothesis}
snoc ((x ++ y) ++ z, a)
=   {second equation defining ++ (backwards)}
(x ++ y) ++ snoc (z, a).
We leave the proof that nil is the unit of ++ as an exercise. The above style and
format for calculations will be adopted throughout the book.
A most useful operation on lists is the function that applies a function to every
element of a list. Traditionally, this operation is called map f. If f : A ← B, then
listr f : listr A ← listr B denotes this operation on cons-lists: the name of the type
plays a dual role, signifying the type in type expressions, and
the map operation in value expressions. The same convention is extended to other
parameterised types. The reason for this choice will emerge in the next chapter.
listr f nil = nil
listr f (cons (a, x)) = cons (f a, listr f x).
There is a similar definition for listl f. Instead of writing down recursion equations
we can appeal to a standard recipe similar to that introduced for Nat. Consider
the scheme
f nil = c
f (cons (a, x)) = h (a, f x)

for defining a recursive function f with source type listr A for some A. We
encapsulate this pattern of recursion by a function foldr, so that f = foldr (c, h). In other
words,

listr f = foldr (nil, h) where h (a, x) = cons (f a, x).
Similarly,

listl f = foldl (nil, h) where h (x, a) = snoc (x, f a).
The functions foldr (c, h) and foldl (c, h) work in a similar fashion to foldn (c, h) of
the preceding section: foldr (c, h) transforms a list by systematically replacing nil
by c and cons by h; similarly, foldl (c, h) replaces nil by c and snoc by h. Like
foldn on the natural numbers, these two functions embody structural recursion on
their respective datatypes and can be used to define many useful functions. For
example, on snoc-lists we can define a curried version of concatenation by

cat x = foldl (x, snoc).

We have cat x y = x ++ y. This definition mirrors the earlier definition of addition:
plus m = foldn (m, succ). We leave it as an exercise to define a version of cat over
cons-lists.
sum = foldr (0, plus)
product = foldr (1, mult)
concat = foldr (nil, cat)
length = sum · listr one, where one a = 1.
The function concat : listr A ← listr (listr A) concatenates a list of lists into one long
list, and length returns the length of a list. The length function can also be defined
in terms of a single foldr:

length = foldr (0, h), where h (a, n) = n + 1.
More generally, any function expressed as a fold after
a mapping operation can also be expressed as a single fold. We will
state and prove a suitably general version of this result in the next chapter.
Another example is provided by the function filter p : listr A ← listr A, where p has
type p : Bool ← A. This function filters a list, retaining only those elements that
satisfy p:

filter p = concat · listr (p → wrap, nilp),

where the conditional (p → f, g) is defined by

(p → f, g) a = f a,  if p a
(p → f, g) a = g a,  otherwise
wrap a = cons (a, nil)
nilp a = nil.
Alternatively, filter can be expressed as a single fold:

filter p = foldr (nil, (p · outl → cons, outr)).

The projection functions outl and outr were introduced earlier. Applied to (a, x),
the function (p · outl → cons, outr) returns cons (a, x) if p a is true, and x otherwise.
Yet another way to express filter is given in the last section of this chapter.
Finally, let us consider an example where the difference between cons-lists and
snoc-lists plays an essential role. Consider the problem of converting some suitable
representation of a decimal value into the real number it represents. Suppose the
number is the sum of a whole part

w = 10^m × d_m + 10^(m-1) × d_(m-1) + ... + 10^0 × d_0

and a fractional part

f = e_1/10 + e_2/10^2 + ... + e_n/10^n.

Observing that

w = 10 × (... (10 × (10 × 0 + d_m) + d_(m-1)) ...) + d_0
f = (e_1 + (... (e_(n-1) + e_n/10)/10 ...))/10,
we can see that one sensible way to represent decimal numbers is by a pair of lists
in listl Digit × listr Digit. We can then define the evaluating function eval by

eval : Real ← (listl Digit × listr Digit)
eval (x, y) = foldl (0, f) x + foldr (0, g) y
f (n, d) = 10 × n + d
g (e, r) = (e + r)/10.
It is appropriate to represent the whole number part by a snoc-list because it is
evaluated more conveniently by processing the digits from left to right; on the other
hand, the fractional part is more appropriately represented by a cons-list since the
processing is from right to left.
In traditional functional programming, the functions foldr (c, h) and foldl (c, h) are
defined a little differently. There are two minor differences and one major one. First,
foldr and foldl are usually defined as curried functions, writing foldr h c instead of
foldr (c, h), and similarly for foldl. One small advantage of the switch of arguments
is that some functions can be defined more succinctly; for example, the curried
function cat on snoc-lists can now be defined by
cat = foldl snoc.

The second minor difference is that the argument h in foldr h c is also curried
because cons is usually introduced as a curried function. Since we have introduced
cons to have type listr A ← (A × listr A), it is appropriate to take the type of h to
be B ← (A × B).
The more important difference is that in traditional functional programming the
basic view of lists is cons-lists and, because foldl is a useful operation to have, the
type assigned to foldl (c, h) is B ← listr A, for some A and B. This means that foldl
is given a different definition, one that consumes a cons-list from the left. To
compare the two, consider the function reverse. As a function on cons-lists, we can
define

reverse = foldr (nil, append)
append (a, x) = snocr (x, a),
where snocr was defined earlier. As a function on snoc-lists, we can define
reverse = foldl (nil, prepend)
prepend (x, a) = cons (a, x).
As an implementation of reverse on cons-lists, the first definition takes 0(n2) steps
to reverse a list of length n, the reason being that snocr requires linear time.
However, interpreting foldl as an operation on cons-lists, the second definition of reverse
takes linear time because cons takes constant time.
Non-empty lists
Having the empty list around sometimes causes more trouble than it is worth.
Fortunately, we can always introduce the types
listr⁺ A ::= wrap A | cons (A, listr⁺ A)
listl⁺ A ::= wrap A | snoc (listl⁺ A, A)
of non-empty cons-lists and snoc-lists. Here, wrap returns a singleton list and the
generic fold operation replaces the function wrap by a function / and cons by a
function g:
foldr⁺ (f, g) (wrap a) = f a
foldr⁺ (f, g) (cons (a, x)) = g (a, foldr⁺ (f, g) x).
In particular, the function head : A ← listr⁺ A that returns the first element of a
list is given by

head = foldr⁺ (id, outl).
In some functional languages the fold operator on non-empty cons-lists is denoted
by foldr1, with the definition

foldr1 f = foldr⁺ (id, f).
So foldr1 cannot express the general fold operator on non-empty cons-lists, but
only the special case (admittedly the most frequent in practice) in which the first
argument is the identity function.
List comprehensions
A useful piece of notation is the list comprehension, which produces a list of values
of a given form, drawn from one or more generators and possibly filtered by boolean
conditions. For example,

[n × n | n ← [1 .. 10]; even n]

produces, in order, the list of squares of even numbers n in the range 1 ≤ n ≤ 10.
In particular, we have
listr f x = [f a | a ← x]
filter p x = [a | a ← x; p a].
There is a more general form of list comprehension, but we will not need it; indeed,
list comprehensions are used only occasionally in what follows.
Exercises
1.7 Construct the function convert : listr A ← listl A in the form foldl (c, h) for
suitable c and h.
1.11 Construct the iterative function foldl over cons-lists in terms of foldr.
1.12 The function take n : listr A ← listr A takes the first n items of a list, or the
whole list if its length is no larger than n. Construct suitable h and c for which
take n x = foldr (c, h) x n. Similarly, define the function drop n (which drops the
first n items from a list) in terms of foldr.
1.4 Trees
We will briefly consider two more examples of recursive datatypes to drive home
the points made in preceding sections. First, consider the type

tree A ::= tip A | bin (tree A, tree A)

The generic form of the fold operator for binary trees is foldt (f, g), defined by

foldt (f, g) (tip a) = f a
foldt (f, g) (bin (x, y)) = g (foldt (f, g) x, foldt (f, g) y).

Here, foldt (f, g) : B ← tree A if f : B ← A and g : B ← (B × B). In particular, the
map function for trees is given by

tree f = foldt (tip · f, bin).
The functions size and depth for determining the size and the depth of a tree are
given by

size = foldt (one, plus), where one a = 1
depth = foldt (zero, succ · bmax), where zero a = 0.

Here, bmax (x, y) (short for 'binary maximum') returns the greater of x and y; the
depth of the tree bin (x, y) is one more than the greater of the depths of trees x
and y.
The final example is of two mutually recursive datatypes. Consider the types

tree A ::= fork (A, forest A)
forest A ::= null | grow (tree A, forest A)

defining trees and forests in terms of each other. The type forest A is in fact
isomorphic to listr (tree A), so we could also have introduced trees using lists rather
than forests.
The generic fold operation for this kind of tree is not defined by a single
recursion, but as the first of a pair of functions, foldt (g, c, h) and foldf (g, c, h), defined
simultaneously by mutual recursion:

foldt (g, c, h) (fork (a, xs)) = g (a, foldf (g, c, h) xs)
foldf (g, c, h) null = c
foldf (g, c, h) (grow (x, xs)) = h (foldt (g, c, h) x, foldf (g, c, h) xs).
For example, the size of a tree is defined by
size = foldt (succ · outr, 0, plus).
We have now seen enough examples to get the general idea: when introducing
a new datatype, also define the generic fold operation for that datatype. When
the datatype is parameterised, also introduce the appropriate mapping operation.
Given these functions, a number of other useful functions can be quickly defined.
It would be nice if we could give, once and for all, a single completely generic
definition of the fold operator, parameterised by the structure of the datatype being
defined. Indeed, we shall do just this in the next chapter. But in most functional
languages currently available, this is not possible: we can parameterise functions with
abstract operators, but we cannot parameterise functions with abstract datatypes.
Recently, several authors have proposed new languages that overcome this
restriction, and some references can be found in the bibliographical remarks at the end of
this chapter.
Exercises
f (g (a, b), h (c), d)
as an element of gtree Char. Convert this expression to curried form, and represent
the result as an element of tree Char. Using this translation as a guide, construct
functions
1.5 Inverses
Another theme that will emerge in subsequent chapters is the use of inverses in
program specification and synthesis. Some functions are best specified as inverses
to other functions. Consider, for example, the function zip with type
zip : listr (A × B) ← (listr A × listr B),
which is defined informally as taking a pair of lists to the list of pairs of
corresponding elements. A companion function is

unzip = pair (listr outl, listr outr),
where pair (f, g) x = (f x, g x). Thus, unzip takes a list of pairs and returns a pair
of lists consisting of the first components (listr outl) and the second components
(listr outr). The function unzip can also be expressed in terms of a single fold:

unzip = foldr (nils, conss)
nils = (nil, nil)
conss ((a, b), (x, y)) = (cons (a, x), cons (b, y)).
This is another example of a general result that we will give later in the book.
Now, unzip is an injective function, which means that we can specify zip by the
condition

zip · unzip = id.

In particular, zip (nil, y) = nil.
As another example, consider the function decimal : listl Digit ← Nat that converts
a natural number to the decimal numeral that represents it. The inverse function
in this case is eval : Nat ← listl Digit defined by
eval = foldl (0, f)
f (n, d) = 10 × n + d.
Since eval is not injective, either we can define a type Decimal, replacing listl Digit,
so that eval is an injective function on Decimal; or else specify decimal n to be, say,
a shortest member of the set {x | eval x = n}. Both methods will be examined in
due course, so we
will not go into details at this stage. The main point we want to make here is
that definition by inverse is a useful method of specification, but one that involves
difficulties when working exclusively with functions. The solution, as we shall see, is
to move to relations: all relations possess a unique converse, so there is no problem
in specifying a relation as the converse of another.
Exercises
1.16 Define a datatype Digits that represents non-empty lists of digits, not
beginning with zero. Define the generic fold function for Digits, and use it to construct
the evaluating function eval : Nat⁺ ← Digits, where Nat⁺ is the type of positive
integers. Can you specify decimal : Digits ← Nat⁺ as the inverse of eval?
1.6 Polymorphic functions

The map operation interacts with many list-processing functions in a uniform
way; for example,

listr f · concat = concat · listr (listr f).
This equation can be interpreted as the assertion that the recipe of concatenating
a list of
lists, and then renaming the elements, has the same outcome as renaming
each element in the list of lists, and then concatenating. Thus, concat does not
depend on the structure of the elements of the lists being concatenated. A formal
proof of the equation above is left as an exercise.
As another example, consider the function inits : listl (listl A) ← listl A that returns
a list of all prefixes of a list:

inits = foldl ([nil], f)
f (snoc (xs, x), a) = snoc (snoc (xs, x), snoc (x, a)).
For example, inits [1, 2] = [[], [1], [1, 2]]. It does not matter whether we first take
prefixes and then process each element in each list, or first process each element and
then take prefixes. We therefore have the identity
listl (listl f) · inits = inits · listl f.

In a similar fashion, the function reverse : listr A ← listr A satisfies the identity

listr f · reverse = reverse · listr f.

Finally, the function zip : listr (A × B) ← (listr A × listr B) satisfies the identity

listr (cross (f, g)) · zip = zip · cross (listr f, listr g),
where cross (f, g) (a, b) = (f a, g b). Functions, like concat, inits, reverse, zip, and
so on, which do not depend in any essential way on the structure of the elements in
their arguments, will be studied in a general setting later on, where they are called
natural transformations.
Exercises
1.17 Give proofs by induction of the various identities cited in this section.
1.18 Suppose you are given a polymorphic function foo with type
1.7 Pointwise and point-free

Recall that the function filter p can be defined on lists by the equation
filter p = concat · listr (p → wrap, nilp).

The function wrap takes a value and returns a singleton list; thus wrap a = [a]. The
function nilp takes a value and returns the empty list; thus, nilp a = nil. Our aim
in this section is to prove the identity

filter p = listr outl · filter outr · zip · pair (id, listr p).
The operational reading of the right-hand side is that each element of a list is
paired with its boolean value under p, and then those elements paired with true
are selected. Although we won't go into details, the identity is useful in optimising
certain programs. The calculation below makes use of the functions

pair (f, g) a = (f a, g a)
outl (a, b) = a
outr (a, b) = b.

These functions are associated with the notion of a categorical product, and for
them we have the rules

outl · pair (f, g) = f (1.1)
outr · pair (f, g) = g (1.2)
For the function nilp we have two rules:
nilp · f = nilp (1.3)
listr f · nilp = nilp. (1.4)
The first rule states that nilp is a constant function, and the second rule states that
this constant is the empty list.
listr f · wrap = wrap · f (1.5)
listr f · concat = concat · listr (listr f). (1.6)
These state that wrap and concat are natural transformations. A related rule for
zip is

zip · pair (listr f, listr g) = listr (pair (f, g)). (1.7)

The next two rules say that listr preserves composition and identity:

listr (f · g) = listr f · listr g (1.8)
listr id = id. (1.9)
As we will see in the next chapter, these rules say that listr is what is known as a
functor.
(p → f, g) · h = (p · h → f · h, g · h) (1.10)
h · (p → f, g) = (p → h · f, h · g). (1.11)

These rules say how composition distributes over conditionals.

f · id = f (1.12)
id · f = f. (1.13)
The two occurrences of id denote different instances of the identity function, one
for each type at which it is used.
It might appear that these dozen or so rules have been plucked out of thin air but,
as we have hinted, they form coherent groups based on a small number of concepts
(products, functors, natural transformations, and so on) to be studied in the next
chapter. For now we just accept them.
listr outl · filter outr · zip · pair (id, listr p)
=   {definition of filter}
listr outl · concat · listr (outr → wrap, nilp) · zip · pair (id, listr p)
=   {equation (1.6)}
concat · listr (listr outl) · listr (outr → wrap, nilp) · zip · pair (id, listr p)
=   {equation (1.8) (backwards)}
concat · listr (listr outl · (outr → wrap, nilp)) · zip · pair (id, listr p)
=   {equations (1.11), (1.5), and (1.4)}
concat · listr (outr → wrap · outl, nilp) · zip · pair (id, listr p)
=   {equation (1.9) (backwards)}
concat · listr (outr → wrap · outl, nilp) · zip · pair (listr id, listr p)
=   {equation (1.7)}
concat · listr (outr → wrap · outl, nilp) · listr (pair (id, p))
=   {equation (1.8) (backwards)}
concat · listr ((outr → wrap · outl, nilp) · pair (id, p))
=   {equations (1.10), (1.1), and (1.3)}
concat · listr (p → wrap · id, nilp)
=   {equation (1.12)}
concat · listr (p → wrap, nilp)
=   {definition of filter}
filter p.
A different ordering of some of the steps is possible; for example, we could have
simplified zip · pair (id, listr p) to listr (pair (id, p)) earlier in the calculation.
Apart from this, almost every step is forced. Indeed, when some students were set
the problem in an examination, almost nobody had difficulties solving it. The
problem was also given as a test example to a graduate student who had designed
a simple proof editor, including a 'go' button that automatically applied identities
from a given set from left to right until no more rules in the set were applicable.
Apart from expressing rules (1.8) and (1.9) in reverse form, the calculation
proceeded quickly and automatically to the desired conclusion, somewhat to the
student's surprise.
With this single exercise we hope to have convinced the reader that point-free
calculation is a practical proposition. A caveat, though: while calculations, point-free
or pointwise, are satisfying to do, they are far less satisfying to read. It has
been said that calculating is not a spectator sport. Therefore, our advice to the
reader in studying a calculation is first to try and do it for oneself. Only when
difficulties arise should the text be consulted. Although we have strived to present
calculations in the best possible way, there will no doubt be occasions when the
diligent reader can find a shorter or clearer route to the desired conclusion.
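The identity just proved can also be checked experimentally. Here is a sketch in Haskell (our own transcription, not the book's notation: Haskell's zip is curried, so we uncurry it to match the pair-consuming zip used above, and filter is renamed filterp to avoid the Prelude clash):

```haskell
-- Both sides of filter p = listr outl . filter outr . zip . pair (id, listr p),
-- with listr rendered as map and the conditional (p -> f, g) spelt out.
wrap :: a -> [a]
wrap a = [a]

nilp :: a -> [a]
nilp _ = []

cond :: (a -> Bool) -> (a -> b) -> (a -> b) -> a -> b
cond p f g a = if p a then f a else g a

-- filter as defined in the text: concat . listr (p -> wrap, nilp)
filterp :: (a -> Bool) -> [a] -> [a]
filterp p = concat . map (cond p wrap nilp)

pair :: (c -> a, c -> b) -> c -> (a, b)
pair (f, g) c = (f c, g c)

-- the right-hand side of the identity
rhs :: (a -> Bool) -> [a] -> [a]
rhs p = map fst . filterp snd . uncurry zip . pair (id, map p)
```

On any finite list, filterp p and rhs p agree, as the calculation guarantees.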
Bibliographical remarks
http://www.lpac.ac.uk/SEL-HPC/Articles/FuncArchive.html
Readers who wish to experiment with the programs presented in this book might
consider the Gofer system (Jones 1994), which is freely available from
ftp://ftp.cs.nott.ac.uk/nott-fp/languages/gofer/
In fact, in later chapters, when we come to study some non-trivial programming
examples, we shall present the result of our derivations as Gofer programs.
The realisation that functional programs are good for equational reasoning is as
old as the subject itself. Two landmark papers are (Backus 1978; Burstall and
1989). The material of this book evolved from all these works. Quite similar in
spirit, but slightly different in notation and style are (Backus 1981, 1985;
Harrison and Khoshnevisan 1988; Harrison 1988; Williams 1982), and (Pettorossi and
Recently there has been a surge of interest in functional languages that, given the
definition of a datatype, automatically provide the user with the associated fold.
One approach, which is quite transparent to the naive user, can be found in (Fegaras,
Sheard, and Stemple 1992; Sheard and Fegaras 1993; Kieburtz and Lewis 1995).
Another approach, which is more elegant but also requires more understanding on
the user's part, is the use of constructor classes in (Jeuring 1995; Jones 1995; Meijer
and Hutton 1995).
Chapter 2
Functions and Categories
This chapter provides a brief introduction to the elements of category theory that are
necessary for understanding the rest of the book. In particular, it emphasises ways
in which category theory offers economy in definitions and proofs. Subsequently, it
is shown how category theory can be used in defining the basic building blocks of
datatypes, and how these definitions give rise to a set of combinators that unify the
operators found in functional programming and program derivation. In Chapter 3
these combinators, and the associated theory, are illustrated in a number of small
but representative programming examples.
One does not so much learn category theory as absorb it over a period of time. It is
difficult, at a first or second reading, to appreciate the point of many definitions and
the reasons for the subject's abstract nature. We have tried to take this into account
in two ways: first, by adopting a strictly minimalist style, leaving out anything that
is not germane to our purpose; and second, by confining attention to a small range
of examples, all drawn from the area of program specification and derivation, which
is, after all, our main topic.
2.1 Categories
A category C is an algebraic structure consisting of a class of objects, denoted by
A, B, C, and so on, and a class of arrows, denoted by f, g, h, and so on,
together with three total operations and one partial operation.
The first two total operations are called target and source; both assign an object
to an arrow. We write f : A ← B (pronounced 'f is of type A from B') to indicate
that the target of the arrow f is A and the source of f is B.
The third total operation takes an object A to an arrow idA : A ← A, called the
identity arrow on A.
The partial operation is called composition and takes two arrows to another one.
Composition is required to be associative,
f · (g · h) = (f · g) · h
for all f : A ← B, g : B ← C and h : C ← D, and the identity arrows are required to
satisfy idA · f = f = f · idB for all f : A ← B.
Examples of categories
The motivating example of a category is Fun, the category of sets and total
functions. In this category the objects are sets and the arrows are typed functions.
More precisely, an arrow is a triple (f, A, B) in which the set A contains the range
of f and the set B is the domain of f. By definition, A is the target and B the
source of (f, A, B). The identity arrow idA : A ← A is the identity function on A,
and the composition of two arrows (f, A, B) and (g, C, D) is defined if and only if
B = C, in which case
(f, A, B) · (g, B, D) = (f · g, A, D),
where, on the right, f · g denotes the usual composition of functions f and g.
Another example of a category is Par, the category of sets and partial functions.
The definition is similar to Fun except that, now, the triple (f, A, B) is an arrow if
A contains the range of f and B contains the domain of f. Since a total function is
a special case of a partial function, Fun is a subcategory of Par.
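To experiment with Par, one can model a partial function with target A and source B as a Haskell function B -> Maybe A. This encoding is our own illustration, not the book's definition:

```haskell
-- Arrows of Par: an arrow of type a <- b is modelled as b -> Maybe a.
type Par a b = b -> Maybe a

-- the identity arrow is total
idPar :: Par a a
idPar = Just

-- composition is defined exactly where both stages are defined
compPar :: Par a b -> Par b c -> Par a c
compPar f g = \c -> g c >>= f

-- the embedding of Fun (total functions) as a subcategory of Par
total :: (b -> a) -> Par a b
total f = Just . f
```

For example, compPar (total (+1)) (total (*2)) 3 yields Just 7, while composing with an everywhere-undefined arrow yields Nothing.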
A third example is Rel, the category of sets and relations. Here an arrow is a triple
(R, A, B) in which the relation R is a subset of the
cartesian product A × B. Again, the target of (R, A, B) is A and the source B. The
identity arrow idA : A ← A is the relation
idA = { (a, a) | a ∈ A }
and the composition of arrows (R, A, B) and (S, B, C) is the arrow (T, A, C), where,
writing aRb for (a, b) ∈ R, we have
aTc ≡ (∃b : aRb ∧ bSc).
Finally, any two categories A and B determine a product category A × B, whose
objects are pairs (A, B) of objects and whose arrows are pairs (f, g) of arrows, with
composition defined componentwise:
(f, g) · (h, k) = (f · h, g · k).
The identity arrow id(A,B) is, of course, (idA, idB).
Although we shall see a number of other examples of categories in due course, Fun,
Par and Rel will remain the primary examples in this book.
Diagrams
As illustrated by the above examples, the requirement that each arrow has a unique
target and source can be something of a burden when it comes to spelling out the
details of an expression or equation. For this reason it is quite common to refer to an
arrow f : A ← B simply by the identifier f, leaving A and B implicit. Furthermore,
whenever one writes a composition, it is implicitly assumed to be well defined. For
these abbreviations to be legitimate, the type information should always be clear
from the context.
In a diagram, an arrow f : A ← B is represented as A ← B, and its composition with
an arrow g : B ← C is represented as A ← B ← C. For example, one can depict the
type information in the equation idA · f = f as
(diagram: a triangle with idA : A ← A along the top and the arrow f : A ← B down each side)
This diagram has the property that any two paths between the same pair of objects
depict the same arrow; in such cases, a diagram is said to commute. As another
example, here is the diagram that illustrates one of the laws of the last chapter,
namely, listr f · wrap = wrap · f:
(diagram: a commuting square with wrap : listr A ← A on top, wrap : listr B ← B below,
and listr f, f down the sides)
It is possible to phrase precise rules about reasoning with diagrams, giving them the
same formal status as, say, formulae in predicate calculus (Freyd and Scedrov 1990).
However, in what follows we shall use diagrams mainly for the simple purpose of
supplying necessary type information. Just occasionally we will use a diagram in
place of a calculational proof.
An arrow m : A ← B is said to be monic if
m · f = m · g ≡ f = g
for all f, g : B ← C. In the particular case of Fun, an arrow is monic if and only if it
is injective. To appreciate the calculational advantage of the above definition over
the usual set-theoretic one, let us prove that the composition of two monic arrows
m and n is again monic:
m · n · f = m · n · g
≡   {since m is monic}
n · f = n · g
≡   {since n is monic}
f = g.
Similarly, an arrow e : B ← C is said to be epic if
f = g ≡ f · e = g · e
for all f, g : A ← B. In the particular case of Fun, an arrow is epic if and only if
it is surjective. A symmetrical proof to the one given above for monics shows that
the composition of two epics is again epic.
Duality
Every category C gives rise to an opposite category Cop. This category has the
same objects and arrows as C, but the source and target operators are interchanged,
and composition is defined by
f · g in Cop = g · f in C.
The category Cop may be thought of as being obtained from C by reversing all
arrows. Reversing the arrows twice does not change anything, so (Cop)op = C.
Now, let S(C) be a statement about the objects and arrows of a category C. By
reversing the direction of all arrows in S(C), we obtain another statement Sop(C) =
S(Cop) about C. If S(C) holds for each category C, it follows that Sop(C) also
holds for each category C. The converse implication is also true, because (Cop)op = C.
Hence
(∀C : S(C)) ≡ (∀C : Sop(C)).
This special case of symmetry is called duality.
To illustrate, recall that above we proved that for any category C, the statement
S(C) = 'the composition of two monics in C is monic' is true. Reversing the arrows
in the definition of monic gives precisely the definition of epic, and therefore the
statement Sop(C) = 'the composition of two epics in C is epic' is also true for any
category C. This argument is summarised by saying that epics are dual to monics.
Some definitions do not change when the arrows are reversed, and a typical example
is the notion of an isomorphism. An isomorphism is an arrow i : A ← B such that
there exists an arrow in the opposite direction, say j : B ← A, satisfying
j · i = idB and i · j = idA.
It is easy to show that there exists at most one j satisfying this condition, and this
unique arrow is called the inverse i⁻¹ of i. If there exists an isomorphism i : A ← B,
then the objects A and B are said to be isomorphic, and we write A ≅ B. In Fun
an arrow that is both monic and epic is also an isomorphism, but this is a particular
property of Fun that does not hold in every category (see Exercise 2.6 below).
Exercises
2.2 Suppose we have four arrows f : A ← B, g : C ← A, h : B ← A, and k : C ← B.
Which of the following compositions are well defined:
k · h · f · h        g · k · h ?
2.3 An arrow r : A ← B is a retraction if there exists an arrow r' : B ← A such that
r · r' = idA. Show that if r : A ← B is a retraction, then for any arrow f : A ← C
there exists an arrow g : B ← C such that r · g = f. What is the dual of a retraction?
2.4 Show that f · g = id implies that g is monic and f is epic. It follows that
retractions are epic.
2.5 Show that if f · g is epic, then f is epic. What is the dual statement?
2.6 Any preorder (A, ≤) can be regarded as a category: the objects are the elements
of A, and there exists a unique arrow a ← b precisely when a ≤ b. What are the
monic arrows? What are the epic arrows? Is every arrow that is both monic and
epic an isomorphism?
2.7 A relation R : A ← B is onto if for all a ∈ A, there exists b ∈ B such that
aRb. Is every onto relation an epic arrow in Rel? If not, are epic arrows in Rel
necessarily partial functions?
2.8 For any category A, it is possible to construct a category Arr(A) whose objects
are the arrows of A. What is a suitable choice for the arrows of Arr(A)? What are
the identity arrows, and how is composition defined?
2.2 Functors
A functor is a mapping F between categories that takes objects to objects and
arrows to arrows, preserving types, identities and composition. (Single-letter identifiers
for functors will be written using sans serif font. On the other hand, multiple-letter
identifiers for functors will be written in the normal italic font. For example, id
denotes both the identity functor and the identity arrow.)
The two component mappings of a functor F are required to satisfy the properties
F(idA) = idFA and F(f · g) = Ff · Fg.
It is customary to denote the action of a functor on objects and on arrows by the
same symbol. Although we take advantage of this convention, it is not without
ambiguity, since there may be many functors that have the same action on objects.
In such cases we will, of course, specify both parts of a functor.
Examples of functors
Let us now look at some examples of functors. As we have already mentioned, there
is an identity functor id : C ← C for every category C. This functor leaves objects
and arrows unchanged.
An equally trivial functor is the constant functor KA : A ← B that maps each object
B of B to one and the same object A of A, and each arrow f of B to the arrow idA
of A. This functor preserves composition since idA · idA = idA.
A more interesting example is the squaring functor (_)² : Fun ← Fun, defined by
A² = { (a, b) | a ∈ A, b ∈ A }
f² (a, b) = (f a, f b).
It is easy to check that the squaring functor preserves identities and composition
and we leave details to the reader.
Compare the squaring functor to the product functor (×) : Fun ← Fun × Fun. We
will write A × B and f × g in preference to ×(A, B) and ×(f, g). This functor is
defined by
(f × g) (a, b) = (f a, g b).
Again, we leave the proof that × preserves identities and composition to the reader.
We met f × g in a programming context in the last chapter, where it was written
as cross (f, g).
Note that (×) takes two arguments (more precisely, a single argument consisting of
a pair of values); such functors are usually referred to as bifunctors. A bifunctor is
therefore a functor whose source is a product category. When F is a bifunctor, the
functor laws take the form
F(id, id) = id
F(f · h, g · k) = F(f, g) · F(h, k).
Next, consider the functor listr : Fun ← Fun that takes a set A to the set listr A
of cons-lists over A, and a function f to the function listr f that applies f to each
element of a list. We met listr in the last chapter, where we made use of the
following pair of laws:
listr (f · g) = listr f · listr g
listr id = id.
Now we can see that these laws are simply the defining properties of the action of
a functor on arrows. We can also see why this action is denoted by listr f rather
than the more traditional map f.
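In Haskell, where listr is the Prelude's map, the two functor laws can be checked on sample data (a spot check, of course, not a proof):

```haskell
-- map (f . g) = map f . map g  and  map id = id, on sample inputs
lawCompose :: Bool
lawCompose = map ((+ 1) . (* 2)) xs == (map (+ 1) . map (* 2)) xs
  where xs = [1, 2, 3] :: [Int]

lawId :: Bool
lawId = map id xs == id xs
  where xs = "abc"
```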
Next, the powerset functor P : Fun ← Fun maps a set A to the powerset PA, which
is defined by
PA = { x | x ⊆ A },
and a function f to the function Pf that applies f to all elements of a given set. The
powerset functor is, of course, closely related to the list functor, the only difference
being that it acts on sets rather than lists.
Next, the existential image functor E : Fun ← Rel maps a set A to PA, the powerset
of A, and a relation to its existential image function:
(ER) x = { a | (∃b : aRb ∧ b ∈ x) }.
For example, the existential image of a finite set x : P Nat under the relation
(≤) : Nat ← Nat is the smallest initial segment of Nat containing x. Again, if
∈ : A ← PA denotes the membership relation on sets, then E(∈) is the function that
takes a set of sets to the union of its members; in symbols, E(∈) = union.
Note that E and P are very similar (they both send a set to its powerset), but they
are functors between different categories: E : Fun ← Rel while P : Fun ← Fun.
In fact, as we shall make more precise in a moment, P is the restriction of E to
functions.
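A finite fragment of E can be sketched in Haskell by representing a relation as a list of pairs (an assumed encoding chosen for illustration):

```haskell
import Data.List (nub)

-- a relation R : A <- B as a finite list of pairs (a, b) with a R b
type Rel a b = [(a, b)]

-- existential image: (E R) x = { a | exists b in x with a R b }
eImage :: (Eq a, Eq b) => Rel a b -> [b] -> [a]
eImage r x = nub [a | (a, b) <- r, b `elem` x]
```

With leq the relation (≤) restricted to {0..5}, eImage leq [3] is the initial segment [0,1,2,3], as in the example above.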
Finally, the graph functor J : Rel ← Fun goes the other way round to E. This
functor maps every function to the corresponding set of pairs, but leaves the objects
unchanged. The graph functor is an example of an inclusion functor that embeds
a category as a subcategory of a larger one. In particular, we have P = EJ, which
formalises the statement that P is the restriction of E to functions.
Exercises
2.9 Prove that functors preserve isomorphisms. That is, for any functor F and
isomorphism i, the arrow Fi is again an isomorphism.
2.10 What is a functor between preorders? (See Exercise 2.6 for the treatment of
preorders as categories.)
H(A, B) = { f | f : A ← B in C }
H(f, h) g = f · g · h.
This gives a mapping taking sets to sets. Extend this mapping to a functor, i.e.
define tree on functions. (Later in this chapter we shall see how this can be done
in general.)
P'A = PA
P'(f : A ← B) x = { a ∈ A | (∀b ∈ B : f b = a : b ∈ x) }.
Prove that this does indeed define a functor. Show that P' is different from P. It
follows that P cannot be defined by merely giving its action on objects.
2.3 Natural transformations
Let F, G : A ← B be functors. A natural transformation φ : F ← G is a collection of
arrows φA : FA ← GA, one for each object A of B, such that
Fh · φB = φA · Gh
for every arrow h : A ← B. The naturality condition can be pictured as a commuting
square:
(diagram: φB : FB ← GB on top, φA : FA ← GA below, with Fh and Gh down the sides)
Another example, again in Fun: the function forkA : A² ← A defined by fork a =
(a, a) is a natural transformation fork : (_)² ← id. The naturality condition is
f² · fork = fork · f.
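The naturality condition for fork can be checked pointwise in Haskell:

```haskell
-- fork a = (a, a), and the squaring functor's action f^2 on pairs;
-- naturality: f^2 . fork = fork . f
fork :: a -> (a, a)
fork a = (a, a)

squareMap :: (a -> b) -> (a, a) -> (b, b)
squareMap f (x, y) = (f x, f y)

naturalAt :: Int -> Bool
naturalAt n = squareMap f (fork n) == fork (f n)
  where f = (* 3)
```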
A natural transformation is called a natural isomorphism if its components are
bijective. For example, in Fun the arrows swapA,B : A × B ← B × A defined by
swap (b, a) = (a, b) form a natural isomorphism, with naturality condition
(g × f) · swap = swap · (f × g).
The above examples are typical: all polymorphic functions in functional
programming languages are natural transformations. This informal statement can be made
precise (see, for instance, Wadler 1989), but to do so here would go beyond the
scope of the book.
Relations, that is, arrows of Rel, can also be natural transformations. For example,
the membership relations ∈A : A ← PA are the components of a natural
transformation ∈ : id ← JE. To see what this means, recall that the existential image functor
E has type Fun ← Rel and the inclusion functor J has type Rel ← Fun. Thus,
JE : Rel ← Rel. The naturality condition is
R · ∈ = ∈ · JER.
For any functor F, the identity transformation idF : F ← F is given by (idF)A = idFA.
Natural transformations φ : F ← G and ψ : G ← H can be composed componentwise,
giving φ · ψ : F ← H with components
(φ · ψ)A = φA · ψA.
It can easily be checked that φ · ψ is natural, for one can paste two diagrams together:
(diagram: the naturality squares of φ and ψ placed side by side, with FA ← GA ← HA
along one edge and FB ← GB ← HB along the other)
The outer rectangle commutes because the inner two squares do. Thus, natural
transformations form the arrows of a category whose objects are functors.
A natural transformation φ : F ← G can also be composed with a functor H, giving
Hφ : HF ← HG with components (Hφ)A = H(φA). Naturality follows by applying H
to the naturality condition of φ:
HFh · HφA = H(Fh · φA) = H(φB · Gh) = HφB · HGh.
An example is the natural transformation E(∈) : E ← EJE. As we have seen, E(∈A) =
unionA, the function that returns the union of a collection of sets over A.
Exercises
2.14 The text did not explicitly state the functors in the naturality condition of
swap. What are they?
2.15 The function τA takes an element of A and turns it into a singleton set. Verify
that τ : P ← id. Do we also have Jτ : JE ← id?
(In this book, we follow functional programming practice by writing φ instead of φA.)
2.18 The list functor listr : Fun ← Fun can be generalised to a functor Par ← Par
by stipulating that listr f x is undefined if there exists an element in x that is not
in the domain of f. For each set A, we have an arrow head : A ← listr A in Par
that returns the first element of a list. Is head a natural transformation id ← listr?
2.19 The category AB has as its objects the functors A ← B and as its arrows natural
transformations. Take for B the category consisting of two objects, with one arrow
between them. Find a category that is isomorphic to AB, whose description does
not make use of natural transformations or functors.
In the remainder of this chapter we show how one can
characterise various kinds of datatype, such as products, sums, lists and trees, purely
in terms of composition. These definitions therefore make sense in any category,
although it can happen that, in a particular category, some datatypes do not exist.
When these definitions are interpreted in Fun they describe the datatypes we know
from functional programming.
The simplest datatype is a datatype with only one element, so we begin with the
categorical abstraction of the notion of a singleton set.
Terminal objects
An object 1 of a category C is called terminal if for each object A there is precisely
one arrow !A : 1 ← A. Taking A = 1, it follows that
!1 = id1. (2.2)
This identity is known as the reflection law. We have also the fusion law
!A · f = !B ⇐ f : A ← B, (2.3)
because !A · f : 1 ← B. Note that the fusion law may be restated as saying that
! is a natural transformation K1 ← id, where KA is the constant functor defined
in Section 2.2. Like universal properties, there will be many examples of other
reflection and fusion laws in due course.
In Fun the terminal object is a singleton set, say {p}. The arrow !A is the constant
function that maps every element of A to p. The statement that the terminal object
is unique up to unique isomorphism states that all singleton sets are isomorphic in
a unique way. In Par and Rel the terminal object is the empty set; in both cases
!A is the empty partial function or relation.
Initial objects
Dually, an object is initial if there is precisely one arrow from it to each object A;
like terminal objects,
initial objects are unique up to unique isomorphism. A commonly used notation for
the initial object of C is 0, and the unique arrow A ← 0 is denoted ¡A. In Fun the
initial object is the empty set and ¡A is the empty function. Thus, the names 0 and
1 for the initial and terminal objects connote the cardinality of the corresponding
sets in Fun. In Par and Rel the initial object is also the empty set, so in these
categories the initial and terminal objects coincide.
Exercises
2.21 An object A is said to be empty if the only arrows with target A are ¡A and
idA. What are the empty objects in Fun? Same question for Rel and Fun × Fun.
2.22 What does it mean to say that a preorder has a terminal object? (See Exercise
2.6 for the interpretation of preorders as categories.)
2.23 Let A and B be categories that have initial and terminal objects. Does A x B
have initial and terminal objects?
2.24 Assuming that A and B have terminal objects, what is the terminal object
in AB? (For the definition of AB, see Exercise 2.19.)
New datatypes can be built by tupling existing datatypes or by taking their disjoint
union; the categorical abstractions considered here are the notions of product and
coproduct.
Products
A product of two objects A and B consists of an object and two arrows. The object
is written A × B and the arrows are written outl : A ← A × B and outr : B ← A × B.
These three things are required to satisfy the following property: for each pair of
arrows f : A ← C and g : B ← C there exists an arrow ⟨f, g⟩ : A × B ← C such that
h = ⟨f, g⟩ ≡ outl · h = f and outr · h = g (2.4)
for all h : A × B ← C. The type information is summarised in the diagram:
(diagram: outl : A ← A × B and outr : B ← A × B along the top, with ⟨f, g⟩ : A × B ← C
mediating the arrows f and g out of C)
The universal property (2.4) yields, first of all, the cancellation laws
outl · ⟨f, g⟩ = f and outr · ⟨f, g⟩ = g, (2.5)
which are obtained by taking h = ⟨f, g⟩ in the right-hand side of (2.4).
Taking outl for f and outr for g in (2.4), we obtain the reflection law
id = ⟨outl, outr⟩.
Taking ⟨h, k⟩ · m for h in (2.4) and using (2.5), we obtain the fusion law
⟨h, k⟩ · m = ⟨f, g⟩ ⇐ h · m = f and k · m = g.
In other words,
⟨h, k⟩ · m = ⟨h · m, k · m⟩. (2.6)
Use of these rules in subsequent calculations will be signalled simply by the hint
products.
Examples of products
In Fun products are given by pairing. That is, A × B is the cartesian product of A
and B, and outl and outr are the obvious projection functions. In the last chapter
we wrote pair (f, g) for the arrow ⟨f, g⟩, with the definition
pair (f, g) a = (f a, g a).
This construction does not define a product in Par or Rel since, for example, the
cancellation laws (2.5) fail when f and g are partial; we return to this point below.
Any two categories A and B also have a product. As we have seen, the category
A × B has as its objects pairs (A, B), where A is an object of A and B is an
object of B. Similarly, the arrows are pairs (f, g) where f is an arrow of A and g
is an arrow of B. Composition is defined component-wise, and outl and outr are
the obvious projection functions. In fact we can turn outl and outr into functors
outl : A ← A × B and outr : B ← A × B by defining two mappings:
outl (A, B) = A and outl (f, g) = f,
and similarly for outr.
Spans
A span over A and B is a pair of arrows f : A ← C and g : B ← C with a common
source. Spans form the objects of a category Span(A, B): the
objects of Span(A, B) are spans over A and B, and the arrows m : (f, g) ← (h, k)
are arrows m : C ← D of C satisfying
f · m = h and g · m = k.
(diagram: the span f : A ← C, g : B ← C, with m : C ← D mediating from the span
h : A ← D, k : B ← D)
In these terms, a product of A and B is precisely a terminal object of Span(A, B).
If each pair of objects in C has a product, one says that C has products. In such a
case × can be made into a bifunctor C ← C × C by defining it on arrows as follows:
f × g = ⟨f · outl, g · outr⟩. (2.7)
We met × in the special case C = Fun in Section 2.2. For general C the proof
that × preserves identities is immediate from the reflection law ⟨outl, outr⟩ = id.
To show that × also preserves composition, that is,
(f × g) · (h × k) = (f · h) × (g · k),
we first establish the law
(f × g) · ⟨p, q⟩ = ⟨f · p, g · q⟩. (2.8)
Taking p = h · outl and q = k · outr in (2.8) gives the desired result. Here is a proof
of (2.8):
(f × g) · ⟨p, q⟩
=   {definition of ×}
⟨f · outl, g · outr⟩ · ⟨p, q⟩
=   {fusion law (2.6)}
⟨f · outl · ⟨p, q⟩, g · outr · ⟨p, q⟩⟩
=   {cancellation law (2.5)}
⟨f · p, g · q⟩.
Using the definition of × and the cancellation law (2.5), we now obtain that outl
and outr are natural transformations:
outl · (f × g) = f · outl and outr · (f × g) = g · outr.
Coproducts
Coproducts are the dual construction. A coproduct of two objects A and B consists
of an object, written A + B, and two arrows inl : A + B ← A and inr : A + B ← B,
such that for each pair of arrows f : C ← A and g : C ← B there exists an arrow
[f, g] : C ← A + B satisfying
h = [f, g] ≡ h · inl = f and h · inr = g (2.9)
for all h : C ← A + B. The following diagram spells out the type information:
(diagram: inl : A + B ← A and inr : A + B ← B along the top, with [f, g] : C ← A + B
mediating the arrows f and g into C)
The properties of coproducts follow at once from those of products by duality, but
we can also describe them in a direct fashion. The cancellation properties are
[f, g] · inl = f and [f, g] · inr = g,
and the reflection law reads
id = [inl, inr].
Taking m · [h, k] for h, we obtain the fusion law
m · [h, k] = [f, g] ⇐ m · h = f and m · k = g,
or, in other words, m · [h, k] = [m · h, m · k]. On arrows, the coproduct functor is
defined by
f + g = [inl · f, inr · g].
The composition and fusion laws
(f + g) · (h + k) = (f · h) + (g · k)
[f, g] · (h + k) = [f · h, g · k]
follow at once by duality, though one can also give a direct proof.
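In Haskell, coproducts in Fun appear as the Either type; [f, g] is the Prelude's either, and f + g can be defined from it. A sketch:

```haskell
-- junc f g plays the role of [f, g]; plus f g plays the role of f + g
junc :: (a -> c) -> (b -> c) -> Either a b -> c
junc f _ (Left a)  = f a
junc _ g (Right b) = g b

plus :: (a -> c) -> (b -> d) -> Either a b -> Either c d
plus f g = junc (Left . f) (Right . g)
```

The cancellation laws then read junc f g . Left = f and junc f g . Right = g.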
In Fun the coproduct of A and B is their disjoint union, which can be defined by
A + B = { (a, 0) | a ∈ A } ∪ { (b, 1) | b ∈ B }.
Thus inl adds a 0 tag, while inr adds a 1. In a functional programming style one
can do this rather more directly, avoiding the artifice of using 0 and 1 as tags, by
defining A + B with a datatype declaration having two constructors inl and inr.
Returning to Par: as remarked above, the cancellation laws
outl · ⟨f, g⟩ = f and outr · ⟨f, g⟩ = g
fail to hold under the interpretation ⟨f, g⟩ a = (f a, g a) when f and g are partial.
To be sure, in lazy functional programming these laws are restored to good health
by extending the notion of function to include a bottom element ⊥ and making
constructions such as pairing non-strict. We will not go into details because this
approach is not exploited in this book.
A genuine product of A and B in Par can nevertheless be constructed on the object
A + (A × B) + B, with injections inl, mid and inr. Define outl by
outl (inl a) = a
outl (mid (a, b)) = a
outl (inr b) = undefined,
and outr by
outr (inl a) = undefined
outr (mid (a, b)) = b
outr (inr b) = b.
The pairing ⟨f, g⟩ of two partial functions is then given by
⟨f, g⟩ a = inl (f a), if defined (f a) and undefined (g a)
         = inr (g a), if undefined (f a) and defined (g a)
         = mid (f a, g a), otherwise.
In Rel the situation is different again. The essential point is that every relation
has a converse, and so Rel is the same as Relop. This is
the same reason why initial objects and terminal objects coincide in Rel. We will
discuss this situation in more depth in Chapter 5.
Polynomial functors
A functor built up from products and coproducts is called polynomial. More precisely:
the identity functor id and the constant functors KA for varying A are polynomial;
and if F and G are polynomial, then so are their composition FG, their pointwise
sum F + G and their pointwise product F × G. These pointwise functors are
defined by
(F + G) h = Fh + Gh
(F × G) h = Fh × Gh.
Polynomial functors are useful in the construction of datatypes, but they are not
enough by themselves; we also need type functors, which correspond to recursively
defined types. These are discussed in Section 2.7. For datatypes that make use
of function spaces, and for a categorical treatment of currying in general, we need
exponential objects; these are discussed in Chapter 3.
Exercises
2.25 Verify that there are natural isomorphisms
swap : A × B ← B × A
assocr : A × (B × C) ← (A × B) × C.
These natural isomorphisms arise in a number of examples later in the book. The
inverse arrow for assocr will be denoted by assocl; thus,
assocl : (A × B) × C ← A × (B × C)
satisfies assocl · assocr = id and assocr · assocl = id.
2.27 Prove the exchange law
⟨[f, g], [h, k]⟩ = [⟨f, h⟩, ⟨g, k⟩].
2.28 Consider products and coproducts in Fun. Are the projections (outl, outr)
epic? Are the injections (inl, inr) monic? If the answers to these two questions are
different, does this contradict duality?
2.29 Let A be a category with products. What are the products in Arr(A)? (See
Exercise 2.8 for the definition of Arr(A).)
2.30 Complete the verification of the construction of products in Par.
2.6 Initial algebras
Let F be a functor. An F-algebra is an arrow of type A ← FA; the object A is called
the carrier of the algebra. For example, the
algebra (Nat, +) of the natural numbers and addition is an algebra of the functor
F defined by FA = A × A and Fh = h × h.
An F-homomorphism to an algebra f : A ← FA from an algebra g : B ← FB is an
arrow h : A ← B such that
h · g = f · Fh.
The type information is provided by the diagram:
(diagram: g : B ← FB on top and f : A ← FA below, with h and Fh down the sides)
To give just one simple illustration, consider the algebra (+) : Nat ← Nat² of
addition, and the algebra (⊕) : Natp ← Natp² of addition modulo p, where Natp =
{0, 1, ..., p − 1} and n ⊕ m = (n + m) mod p. The function h n = n mod p is a
(_)²-homomorphism to ⊕ from +.
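The homomorphism condition h · (+) = (⊕) · (h × h) can be checked pointwise in Haskell (taking p = 7 for concreteness):

```haskell
-- h n = n mod 7 is a homomorphism from (Nat, +) to (Nat_7, oplus)
modulus :: Int
modulus = 7

h :: Int -> Int
h n = n `mod` modulus

oplus :: Int -> Int -> Int
oplus n m = (n + m) `mod` modulus

-- the homomorphism condition at a single pair of arguments
homoAt :: (Int, Int) -> Bool
homoAt (n, m) = h (n + m) == oplus (h n) (h m)
```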
Consider now the category Alg(F) whose objects are the F-algebras and whose
arrows are the homomorphisms between them. For a wide class of functors, including the polynomial functors
of Fun, this category has an initial object, which we shall denote by α : T ← FT (the
letter T stands for 'Type' and also for 'Term' since such algebras are often called
term algebras). The proof that these initial algebras exist is beyond the scope of
the book; the interested reader should consult (Manes and Arbib 1986).
The existence of an initial F-algebra means that for any other F-algebra f : A ← FA,
there is a unique homomorphism to f from α. We will denote this homomorphism
by (|f|), so (|f|) : A ← T is characterised by the universal property
h = (|f|) ≡ h · α = f · Fh. (2.10)
The type information is summarised in the diagram:
(diagram: α : T ← FT on top and f : A ← FA below, with (|f|) and F(|f|) down the sides)
Arrows of the form (|f|) are called catamorphisms, and we shall refer to uses of
the above equivalence by the hint catamorphisms. (The word 'catamorphism' is
derived from the Greek preposition κατά, meaning 'downwards'.) Catamorphisms,
like other constructions by universal properties, satisfy fusion and reflection laws.
Before giving these, let us first pause to give two examples that reveal the notion
of a catamorphism to be a familiar idea in abstract clothing.
Natural numbers
The type declaration
Nat ::= zero | succ Nat
declares [zero, succ] : Nat ← F Nat to be the initial algebra of the functor F defined
by FA = 1 + A and Fh = id1 + h. Here zero : Nat ← 1 is a constant function.
The names Nat, zero and succ are inspired by the fact that we can think of Nat
as the natural numbers, zero as the constant function returning 0, and succ as the
successor function. The functor F is polynomial, so the category Alg(F) has an
initial object; the purpose of the type declaration is to give a name to this initial
algebra.
Every algebra of the functor F : Fun ← Fun takes the form [c, f] for some constant
function c : A ← 1 and function f : A ← A. To see this, let h : A ← FA be an
F-algebra. We have h = [h · inl, h · inr], so we can set c = h · inl and f = h · inr.
It is clumsy to write (|[c, f]|), so we shall drop the inner brackets and write (|c, f|)
instead.
To see what (|c, f|) is, we reason:
h · α = [c, f] · Fh
≡   {definition of F}
h · α = [c, f] · (id1 + h)
≡   {coproduct}
h · α = [c, f · h]
≡   {since α = [zero, succ]}
h · [zero, succ] = [c, f · h]
≡   {coproduct}
[h · zero, h · succ] = [c, f · h]
≡   {cancellation}
h · zero = c and h · succ = f · h.
Writing 0 for the particular constant returned by the constant function zero and
n + 1 for succ n, we now see that h = (|c, f|) is the unique solution of the two
equations
h 0 = c
h (n + 1) = f (h n).
In other words, h = foldn (c, f). Thus (|c, f|) = foldn (c, f) in the datatype Nat.
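The correspondence can be made concrete in Haskell with a Peano-style datatype (a toy version, assumed here for illustration):

```haskell
-- Nat as the carrier of an initial algebra; foldn (c, f) is its catamorphism
data Nat = Zero | Succ Nat

foldn :: (a, a -> a) -> Nat -> a
foldn (c, _) Zero     = c
foldn (c, f) (Succ n) = f (foldn (c, f) n)

-- addition of m, written as the catamorphism (| m, succ |)
add :: Nat -> Nat -> Nat
add m = foldn (m, Succ)

-- interpreting Nat as an Int, also a catamorphism
toInt :: Nat -> Int
toInt = foldn (0, (+ 1))
```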
Strings
The second example deals with lists of characters, also called strings:
String ::= nil | cons (Char, String)
Later we treat lists over an arbitrary alphabet,
but it is worth while considering the simpler case first. The above declaration
names [nil, cons] : String ← F String to be the initial algebra of the functor F defined
by FA = 1 + (Char × A).
Like the example of Nat given above, every algebra of this functor takes the form
[c, f] for some constant c : A ← 1 and function f : A ← Char × A. Simplifying, we
find that h = (|c, f|) is the unique solution of the equations
h nil = c
h (cons (a, x)) = f (a, h x).
In other words, (|c, f|) = foldr (c, f) in the datatype String. So, once again, (|c, f|)
corresponds to a fold operator.
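Written in Haskell, with the algebra supplied as a pair as in the text:

```haskell
-- the catamorphism on strings: the second component consumes its arguments
-- as a pair, matching f : A <- Char x A
foldrStr :: (b, (Char, b) -> b) -> String -> b
foldrStr (c, _) []      = c
foldrStr (c, f) (a : x) = f (a, foldrStr (c, f) x)

-- example: the length of a string as a catamorphism
lengthStr :: String -> Int
lengthStr = foldrStr (0, \(_, n) -> n + 1)
```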
Fusion
Returning to the general case, catamorphisms satisfy the reflection law
(|α|) = id (2.11)
and the very useful fusion law
h · (|f|) = (|g|) ⇐ h · f = g · Fh. (2.12)
The fusion law can be proved by looking at the diagram
(diagram: α : T ← FT on top, f : A ← FA in the middle and g : B ← FB at the bottom,
with (|f|), F(|f|) and then h, Fh down the sides)
This diagram commutes because the lower part does (by assumption) and the upper
part does (by definition of catamorphism). But since (|g|) is the unique homomorphism
to g from α, we conclude that h · (|f|) = (|g|).
The fusion law for catamorphisms is probably the most useful tool in the arsenal of
techniques for program derivation, and we shall see literally dozens of uses in the
programming examples given in the remainder of the book. In particular, it can be
used to prove that α is an isomorphism. Suppose in the statement of fusion we take
both g and h to be α. Then we obtain α · (|f|) = (|α|) = id provided α · f = α · Fα,
and this condition is satisfied by taking f = Fα. For the other direction, we reason:
(|Fα|) · α
2.71 Type functors 49
=
{cata}
Fa FflFaD
=
{F functor}
F(a-dFaD)
=
{above}
Fid
=
{F functor}
id
The fact that a is an isomorphism was first recorded in (Lambek 1968), and it
is sometimes referred to as Lambek's Lemma. Motivated by his lemma, Lambek
called (α, T) a fixpoint of F, but we shall not use this terminology.
Exercises
2.36 Let α : T ← FT be the initial algebra of F. Show that if f : A ← T, then
f = outl · ([g]) for some g.
2.37 Give an example of a functor of Fun that does not have an initial algebra.
(Hint: think of an operator F taking sets to sets such that F A is not isomorphic to
A for any A.)
2.7 Type functors

Datatypes are often parameterised. For example, we can generalise the example of
strings described above to a datatype of cons-lists over an arbitrary A:
F(A, B) = 1 + (A × B).

Let F be a bifunctor with the collection of initial algebras αA : TA ← F(A, TA). The construction T can be made into a functor by defining

Tf = ([α · F(f, id)]).  (2.13)
For example, the cons-list functor is defined by
listr f = ([[nil, cons] · (id + (f × id))]),

which simplifies to listr f = ([nil, cons · (f × id)]). Translated to the point level, this reads

listr f nil = nil
listr f (cons (a, x)) = cons (f a, listr f x),

so listr f is just what functional programmers would call map f, or maplist f.
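With cons-lists modelled by built-in Python lists, listr f is the fold that rebuilds the list while applying f at each cons (an illustrative sketch):

```python
def foldr(f, c, xs):
    r = c
    for a in reversed(xs):
        r = f(a, r)
    return r

def listr(f):
    # ([nil, cons . (f x id)]): nil goes to nil, cons(a, x) to cons(f a, x)
    return lambda xs: foldr(lambda a, ys: [f(a)] + ys, [], xs)
```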
To establish that T is a functor, we must show that T preserves identities and composition; two short calculations do it. First:

T id
=        {definition}
([α · F(id, id)])
=        {bifunctors preserve identities}
([α])
=        {reflection law}
id.
Second:
Tf · Tg
=        {definition}
([α · F(f, id)]) · Tg
=        {fusion (see below)}
([α · F(f, id) · F(g, id)])
=        {F bifunctor}
([α · F(f · g, id)])
=        {definition}
T(f · g).
The appeal to fusion is justified by the following more general argument:
([h]) · Tg = ([h · F(g, id)])
≡        {definition of T}
([h]) · ([α · F(g, id)]) = ([h · F(g, id)])
⟸        {fusion}
([h]) · α · F(g, id) = h · F(g, id) · F(id, ([h]))
≡        {cata}
h · F(id, ([h])) · F(g, id) = h · F(g, id) · F(id, ([h]))
≡        {F bifunctor}
true.

Thus we have

([h]) · Tg = ([h · F(g, id)]).  (2.14)
In words, a catamorphism composed with its type functor can always be expressed
as a single catamorphism. Equation (2.14) is quite useful by itself and we shall refer
to it in calculations by the hint type functor fusion. To give just one example now:
if sum = ([zero, plus]) is the function sum : Nat ← listr Nat, then

sum · listr f = ([zero, plus · (f × id)]).
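Operationally, (2.14) is the familiar optimisation that a fold after a map can be done in one pass. The sum example in Python (illustrative names):

```python
def foldr(f, c, xs):
    r = c
    for a in reversed(xs):
        r = f(a, r)
    return r

f = lambda a: a + 10
xs = [1, 2, 3]
# sum . listr f: map first, then fold ...
two_passes = foldr(lambda a, b: a + b, 0, [f(a) for a in xs])
# ... equals the single catamorphism ([zero, plus . (f x id)]).
one_pass = foldr(lambda a, b: f(a) + b, 0, xs)
```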
Now that we have established that T is a functor, we can show that α : T ← G is a natural transformation, where G is the functor defined by GA = F(A, TA) and Gf = F(f, Tf):

Tf · α  =  α · F(f, id) · F(id, Tf)  =  α · F(f, Tf)  =  α · Gf.
In what follows we will say that (a, T) is the initial type defined by the bifunctor F.
The second remark is that, subject to certain healthiness conditions on the functor
involved, the initial algebras in Par and Rel coincide with those in Fun. This will
be proved in Chapter 5.
The third remark concerns duality. As with the definitions of terminal objects and products, one may dualise the above discussion to coalgebras. This gives a clean description, for instance, of infinite lists. We shall not have any use for such infinite
data structures, however, and their discussion is therefore omitted. The interested
reader is referred to (Manes and Arbib 1986; Malcolm 1990b; Hagino 1989) for
details.
Exercises
2.38 The discussion of initial types does in fact allow bifunctors of type F : A ← (B × A). Consider the initial type (α, T). Between what categories is T a functor? An example where B = Fun × Fun and A = Fun is

F((f, g), h) = f + g.
2.39 Let F be a bifunctor, and let (α, T) be the corresponding initial type. Let G and H be unary functors, and define LA = F(GA, HA). Prove that if φ : H ← L, then

([φ]) : H ← TG.
2.40 A monad is a functor H : A ← A, together with two natural transformations η : H ← id and μ : H ← HH, such that

μ · Hη = id = μ · η  and  μ · μ = μ · Hμ.
Many initial types give rise to a monad, and the purpose of this exercise is to prove that fact. Let F be a bifunctor given by

F(f, g) = f + Gg,

for some other functor G. Let (α, T) be the initial type of F. Define φ = α · inl and ψ = ([id, α · inr]). Prove that (T, φ, ψ) is a monad, and work out what this means.
Bibliographical remarks
The material presented in this chapter is well documented in the literature. There is now a variety of textbooks on category theory aimed at the computing science community, for instance (Asperti and Longo 1991; Barr and Wells 1990; Pierce 1991; Rydeheard and Burstall 1988; Walters 1992a). The idea to use initiality for reasoning about programs goes back at least to (Burstall and Landin 1969), and was reinforced in (Goguen 1980). However, this work did not make use of F-algebras and thus lacks the conciseness that gives the approach its charm. Nevertheless, the advantages of algebra in program construction were amply demonstrated by the CIP-L project, see e.g. (Bauer et al. 1985; Bauer et al. 1987; Partsch 1990).
The notion of F-algebras first appeared in the categorical literature during the 1960s,
for instance in (Lambek 1968). Long before the applications to program derivation
were realised, numerous authors e.g. (Lehmann and Smyth 1981; Manes and Arbib
1986) pointed out the advantages of F-algebras in the area of program semantics.
Hagino used a generalisation of F-algebras in designing a categorical programming
language (Hagino 1987a, 1987b, 1989, 1993), and (Cockett and Fukushima 1991)
have similar goals.
It is (Malcolm 1990a, 1990b) who deserves credit for first making the program
derivation community aware of this work. The
particular treatment of
datatypes
given here is strongly influenced by the presentations of our colleagues in (Spivey
1989; Gibbons 1991, 1993; Fokkinga 1992a, 1992b, 1992c; Jeuring 1991, 1993;
Meertens 1992; Meijer 1992; Paterson 1988). In particular, Fokkinga's thesis
contains a much more thorough account of the foundations, and Jeuring presents some
spectacular applications. The paper by (Meijer, Fokkinga, and Paterson 1991) is
an introduction specially aimed at functional programmers.
One topic that we avoid in this book (except briefly in Section 5.6) is the categorical treatment of datatypes that satisfy equational laws. An example of such a datatype
is, for instance, the datatype of finite bags. Our reason for not discussing such datatypes is that we feel the benefits in later chapters would not quite justify the technical machinery required. The neatest categorical approach that we know of to
datatypes with laws is (Fokkinga 1996); see also (Manes 1975). There are of course
many data structures that are not easily expressed in terms of initial algebras, but
recently it has been suggested that even graphs fit the framework presented here,
provided laws are introduced (Gibbons 1995).
Another issue that we shall not address is that of mechanised reasoning. We are
hopeful, however, that the material presented here can be successfully employed in
a mechanised reasoning system: see, for instance, (Martin and Nipkow 1990).
Chapter 3
Applications
Let us now come down to earth and illustrate some of the abstract machinery we have been developing.
3.1 Banana-split
Recall that the type of cons-lists over A is the initial type of the bifunctor F(A, B) = 1 + (A × B), with initial algebra [nil, cons]. The function sum : Nat ← listr Nat is defined by the catamorphism

sum = ([zero, plus]),

where plus (a, b) = a + b. Similarly, the function length is defined by a catamorphism

length = ([zero, succ · outr]).
Given these two functions we can define the function average by, say, average = div · ⟨sum, length⟩, where div (a, b) = a/b. For efficiency, we would like to write the pair ⟨sum, length⟩ as a single catamorphism. This is in fact possible for any pair of catamorphisms, irrespective of the details of this particular problem: we have

⟨([h]), ([k])⟩ = ([⟨h · F outl, k · F outr⟩]),

where F is the base functor involved. This identity is called the banana-split law. To prove it, we reason:
⟨([h]), ([k])⟩ · α
=        {split fusion}
⟨([h]) · α, ([k]) · α⟩
=        {catamorphisms}
⟨h · F([h]), k · F([k])⟩
=        {split cancellation (backwards)}
⟨h · F(outl · ⟨([h]), ([k])⟩), k · F(outr · ⟨([h]), ([k])⟩)⟩
=        {F functor}
⟨h · F outl · F⟨([h]), ([k])⟩, k · F outr · F⟨([h]), ([k])⟩⟩
=        {split fusion (backwards)}
⟨h · F outl, k · F outr⟩ · F⟨([h]), ([k])⟩.
Applying the banana-split law to the particular problem of writing ⟨sum, length⟩ as a single catamorphism, we obtain

⟨sum, length⟩ = ([zeros, pluss]),

where zeros = ⟨zero, zero⟩ and pluss (a, (b, n)) = (a + b, n + 1). The banana-split law is a perfect example of the power of the categorical approach: a simple calculation, carried out once at the right level of abstraction, yields a result that applies in many different situations.
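In Python, the fused catamorphism ([zeros, pluss]) computes the pair in a single traversal (illustrative sketch):

```python
def foldr(f, c, xs):
    r = c
    for a in reversed(xs):
        r = f(a, r)
    return r

def pluss(a, bn):
    b, n = bn
    return (a + b, n + 1)

xs = [4, 8, 6]
sum_len = foldr(pluss, (0, 0), xs)   # ([zeros, pluss]) with zeros = (0, 0)
average = sum_len[0] / sum_len[1]
```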
Exercises
3.3 A list of numbers is called steep if each element is greater than the sum of the elements that follow it. Use the banana-split law to derive an efficient test for steepness.
3.4 The pattern in the preceding exercise can be generalised as follows. Suppose that h : B ← FB, and the diagram

              α
       T ←——————— FT
       |           |
       f      F⟨f, ([h])⟩
       ↓           ↓
       A ←—— F(A × B)
             g

commutes. Construct k such that f = outl · ([k]) and prove that your construction works.
3.5 Apply the preceding exercise to obtain an efficient program for testing whether a tree is balanced.
3.6 The function preds : listr Nat ← Nat takes a natural number n and returns the list [n, n−1, ..., 1]. Apply Exercise 3.4 to write preds in terms of a catamorphism.

3.7 The factorial function can be defined by

fact = product · preds,

where product returns the product of a list of numbers. Use the preceding exercise and fusion to obtain a more efficient solution.
3.8 (Mutual recursion) Prove the following rule:

f · α = h · F⟨f, g⟩  and  g · α = k · F⟨f, g⟩  ≡  ⟨f, g⟩ = ([⟨h, k⟩]).

It may be helpful to start by drawing a diagram of the types involved. Show that the banana-split law and Exercise 3.4 are special cases of the mutual recursion theorem.
3.2 Ruby triangles and Horner's rule

The initial type of cons-lists is the basis of the circuit design language Ruby (Jones and Sheeran 1990), which is in many ways similar to the calculus used in this book.
Ruby does, however, have a number of additional primitives. One of these primitives
is called triangle. For any function f : A ← A, the function tri f : listr A ← listr A is defined informally by

tri f [x0, x1, ..., xn−1] = [f⁰ x0, f¹ x1, ..., f^(n−1) xn−1],

where f^i denotes the i-fold composition of f. In Ruby the following law holds:

([c, g]) · tri f = ([c, g · (id × f)])  ⟸  f · c = c  and  f · g = g · (f × f).
In Ruby, this fact is called Horner's rule, because it generalises the well-known
method for evaluating polynomials. If we take c = 0, g (a, b) = a + b, and f a = a × x, then the provisos amount to

0 × x = 0
(a + b) × x = a × x + b × x,

and we have

a0 + a1 × x + a2 × x² + ... + an × xⁿ = a0 + (a1 + (a2 + ... + (an + 0) × x ...) × x) × x.
In Ruby, Horner's rule is stated only for the type of lists built up from nil and cons.
The purpose of this section is to generalise Horner's rule to arbitrary initial types,
and then to illustrate it with a small, familiar programming problem.
The formal definition of tri is

tri f = ([nil, cons · (id × listr f)]).
The base functor of cons-lists is F(A, B) = 1 + (A × B), and the initial algebra is α = [nil, cons], so we can write the above in the form

tri f = ([α · F(id, listr f)]).

This immediately gives the abstract definition: let F be a bifunctor with initial type (α, T); then

tri f = ([α · F(id, Tf)]).
For the definition to make sense we require f to be of type A ← A for some A, in which case tri f : TA ← TA. We aim to generalise Horner's rule by finding conditions such that
([g]) · tri f = ([g · F(id, f)]).

The typing is as follows: f : A ← A, g : A ← F(A, A), tri f : TA ← TA, and ([g]) : A ← TA, so both sides of the equation have type A ← TA. By fusion, the equation holds provided

([g]) · α · F(id, Tf) = g · F(id, f) · F(id, ([g])).
We calculate:
([g]) · α · F(id, Tf)
=        {catamorphisms}
g · F(id, ([g])) · F(id, Tf)
=        {F bifunctor}
g · F(id, ([g]) · Tf)
=        {type functor fusion (2.14)}
g · F(id, ([g · F(f, id)]))
=        {claim: see below}
g · F(id, f · ([g]))
=        {F bifunctor}
g · F(id, f) · F(id, ([g])).

The claim is that ([g · F(f, id)]) = f · ([g]). Appealing to fusion a second time, this equation holds if

f · g = g · F(f, id) · F(id, f).
Since F(f, id) · F(id, f) = F(f, f), we have established the generalised rule:

([g]) · tri f = ([g · F(id, f)])  ⟸  f · g = g · F(f, f).
For the special case of lists this is precisely the statement of Horner's rule in Ruby.
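For lists, the rule is exactly the usual Horner scheme for polynomial evaluation, as a quick Python check shows (names illustrative):

```python
def foldr(f, c, xs):
    r = c
    for a in reversed(xs):
        r = f(a, r)
    return r

x = 3
coeffs = [2, 0, 5, 1]                        # 2 + 0*x + 5*x^2 + 1*x^3
# ([0, +]) . tri (*x): weight each a_i by x^i, then sum ...
naive = sum(a * x**i for i, a in enumerate(coeffs))
# ... equals the single fold ([0, \(a, b) -> a + b*x]).
horner = foldr(lambda a, b: a + b * x, 0, coeffs)
```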
Depth of a tree
Now consider the problem of computing the depth of a binary tree. We define such trees with the declaration

tree A ::= tip A | bin (tree A, tree A),

whose base functor is F(A, B) = A + (B × B). A function f = ([g, h]) satisfies

f (tip a) = g a
f (bin (x, y)) = h (f x, f y),

so ([g, h]) is the generic fold operator for binary trees. In particular, the map operator tree f for binary trees is defined by

tree f = ([[tip, bin] · F(f, id)]).
At the point level, this equation translates into two equations:

tree f (tip a) = tip (f a)
tree f (bin (x, y)) = bin (tree f x, tree f y).
The function max : Nat ← tree Nat returns the maximum of a tree of numbers:

max = ([id, bmax]),
where bmax (a, b) returns the maximum of a and b. The function depths : tree Nat ← tree A takes a tree and replaces every tip by its depth in the tree:

depths = tri succ · tree zero,

where zero is the constant function returning 0, and succ is the successor function. Finally, we specify the depth of a tree by

depth = max · depths.
Evaluated directly, this takes O(n²) steps. We aim to improve the efficiency by applying the generalised statement of Horner's rule to the term max · tri succ. The proviso of Horner's rule in this case is that

succ · bmax = bmax · (succ × succ),

which clearly holds. Now we calculate:

depth
=        {definitions}
max · tri succ · tree zero
=        {Horner's rule}
([[id, bmax] · (id + succ × succ)]) · tree zero
=        {coproducts}
([id, bmax · (succ × succ)]) · tree zero
=        {since bmax · (succ × succ) = succ · bmax}
([id, succ · bmax]) · tree zero
=        {type functor fusion (2.14)}
([zero, succ · bmax]).

The result is a catamorphism that computes depth in a single traversal of the tree.
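The two programs for depth can be compared concretely in Python, with trees modelled as nested tuples (an illustrative sketch; depths is rendered with an accumulating depth argument rather than via tri):

```python
def fold_tree(g, h, t):
    # ([g, h]): the generic fold for binary trees ('tip', a) / ('bin', l, r)
    if t[0] == 'tip':
        return g(t[1])
    return h(fold_tree(g, h, t[1]), fold_tree(g, h, t[2]))

def depths(t, d=0):
    # replace every tip by its depth in the tree
    if t[0] == 'tip':
        return ('tip', d)
    return ('bin', depths(t[1], d + 1), depths(t[2], d + 1))

t = ('bin', ('tip', 'a'), ('bin', ('tip', 'b'), ('tip', 'c')))
spec = fold_tree(lambda a: a, max, depths(t))                   # max . depths
fused = fold_tree(lambda a: 0, lambda x, y: 1 + max(x, y), t)   # ([zero, succ . bmax])
```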
The moral of this example is that categorical proofs of familiar laws about lists (such as Horner's rule) are free of the syntactic clutter that a specialised proof would require.
Furthermore, the categorically formulated law sometimes applies to
programming examples that have nothing to do with lists.
Exercises
3.9 The function slice : listr (listr⁺A) ← listr (listr⁺A) is given informally by

slice [x0, x1, ..., xn−1] = [drop 0 x0, drop 1 x1, ..., drop (n−1) xn−1],

where drop n x drops the first n elements from the list x. Define the function slice in terms of tri.
3.10 The binary hyperproduct of a sequence of numbers [a0, a1, ..., an−1] is given by the product of the terms ar^(2^r) for 0 ≤ r < n. Using Horner's rule, derive an efficient program for computing binary hyperproducts.
3.11 Horner's rule can be generalised: find conditions under which

([g]) · tri f = ([g · F(id, h)]).

Draw a diagram of the types involved and prove the new rule.
3.12 Show that, when the new rule of the preceding exercise is applied to polynomial evaluation, there is only one possible choice for h.
3.13 Specify the problem of computing the sum of the terms i × ai (for 0 ≤ i < n) in terms of tri. Horner's rule is not immediately applicable, but it is if you consider computing the pair (Σ i × ai, Σ ai) instead. Work out the details of this application.
3.3 The TeX problem - part one

The TeX problem (Knuth 1990; Gries 1990a) is to do with converting between binary and decimal numbers in Knuth's text processing system TeX (used to produce this book). TeX uses integer arithmetic, with all fractions expressed as integer multiples of 2⁻¹⁶. Since the input language of TeX documents is decimal, there is the problem of converting between decimal fractions and their nearest representable binary equivalents.
Here, we are interested only in the decimal to binary problem; the converse problem,
which is more difficult, will be dealt with in Chapter 10. Let x denote the decimal
fraction 0.d1 d2 ... dk and let

val(x) = Σ (j = 1 to k) dj/10^j  (3.1)
be the corresponding real number. The problem is to find the integer multiple of 2⁻¹⁶ nearest to val(x), that is, to round 2¹⁶ val(x) to the nearest integer. If two integers are equally near this quantity, we will take the larger; so we want n = ⌊2¹⁶ val(x) + 1/2⌋. The value n will lie in the range 0 ≤ n ≤ 2¹⁶.
Representing decimal fractions by the datatype Decimal = listr Digit, the function val can be expressed as a catamorphism:

val = ([zero, shift])
shift (d, r) = (d + r)/10.

For example, with x = [d1, d2, d3] we obtain that val x is the number

(d1 + (d2 + (d3 + 0)/10)/10)/10 = d1/10 + d2/100 + d3/1000.
Writing [0, 2¹⁶] for the set of integers n in the range 0 ≤ n ≤ 2¹⁶, our problem is to compute intern : [0, 2¹⁶] ← Decimal, where

intern = round · val
round r = ⌊(2¹⁷ r + 1)/2⌋,
under the restriction that only integer arithmetic is allowed.
The converse problem concerns a function extern : Decimal ← [0, 2¹⁶], where again n represents a multiple of 2⁻¹⁶ in the range 0 ≤ n ≤ 2¹⁶. The function extern is defined by the condition that for all arguments n the value of extern n should be a shortest decimal fraction satisfying intern (extern n) = n. We cannot yet formalise this specification, let alone solve
the problem, since the definition does not identify a unique decimal fraction, and
so extern cannot be described solely within a functional framework. On the other
hand, extern can be specified using relations, a point that motivates the remainder
of the book.
Let us return to the problem of computing intern. Given its definition, it is tempting
to try and use the fusion law for catamorphisms, promoting the computation of
round into the catamorphism. However, this idea does not quite work. To solve the
problem, we need to make use of the following 'rule of floors': for integers a and b,
with b > 0, and real r we have
⌊(a + r)/b⌋ = ⌊(a + ⌊r⌋)/b⌋.
The first step is to split round into two functions:

halve n = (n + 1) div 2
convert r = ⌊2¹⁷ r⌋.

By the rule of floors, round = halve · convert. This division of round into two components turns out to be necessary because, as we shall see, we can apply fusion with convert but not with halve. Fusing convert with the catamorphism for val, again with the help of the rule of floors, we obtain

convert · val = ([zero, cshift]),  where  cshift (d, n) = (2¹⁷ × d + n) div 10,

and hence intern = halve · ([zero, cshift]), a program that uses integer arithmetic only.
Two further remarks are in order. The first is a small calculation to show that the expression halve · ([zero, cshift]) cannot be optimised by a second appeal to fusion: halve is not injective, and one can exhibit d, n0 and n1 with halve n0 = halve n1 but halve (cshift (d, n0)) ≠ halve (cshift (d, n1)), so no suitable fused operator exists.
The second remark concerns the fact that nowhere above have we exploited any property of 2¹⁷ except that it is a non-negative integer. For the particular value 2¹⁷, the algorithm can be optimised: except for the first 17, all digits of the given decimal can be discarded since they do not affect the answer. A proof of this fact
can be found in (Knuth 1990).
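Assuming the fused form derived above (cshift (d, n) = (2¹⁷ × d + n) div 10, followed by halve), the whole algorithm fits in a few lines of integer-only Python and can be checked against the rounding specification computed with exact rationals:

```python
from fractions import Fraction

def intern(digits):
    # halve . ([zero, cshift]) with cshift(d, n) = (2**17 * d + n) div 10;
    # the fold runs from the rightmost digit inwards, exactly like a
    # cons-list catamorphism.
    n = 0
    for d in reversed(digits):
        n = (2**17 * d + n) // 10
    return (n + 1) // 2

def spec(digits):
    # Specification: round 2**16 * val(x) to the nearest integer, ties upward,
    # i.e. floor((2**17 * val(x) + 1) / 2), with exact arithmetic.
    r = sum(Fraction(d, 10 ** (j + 1)) for j, d in enumerate(digits))
    return (2**17 * r + 1) // 2
```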
Exercises
3.15 Taking Decimal = listr Digit (why is it valid to do so?), the function val could be specified as

val = sum · tri (/10) · listr (/10),

where (/10) divides its argument by 10. Derive the catamorphism in the text.
3.16 Supposing we take 2² rather than 2¹⁶, characterise those decimals whose intern values are n, for 0 ≤ n ≤ 4.
3.18 The rule of indirect equality states that two integers m and n are equal if and only if

(k ≤ m ≡ k ≤ n) for all k.

Prove the rule of indirect equality. Can you generalise the rule to arbitrary ordered sets?
3.19 The floor ⌊x⌋ of a real number x is defined by the property that, for all integers n,

n ≤ x ≡ n ≤ ⌊x⌋.

Prove the rule of floors using this definition and the rule of indirect equality.
3.20 Show that the rule of floors is not valid when a or b is not an integer.
3.21 Show that if f : A ← B is injective, then for any binary operator (⊕) : B ← C × B there exists a binary operator (⊗) : A ← C × A such that

f (c ⊕ b) = c ⊗ f b.
3.22 Let f : A ← B and (⊕) : B ← C × B. To prove that there exists no binary operator (⊗) : A ← C × A such that

f (c ⊕ b) = c ⊗ f b,

it suffices to find c, b0 and b1 such that f b0 = f b1 but f (c ⊕ b0) ≠ f (c ⊕ b1).
3.4 Conditions and conditionals

In the theory developed so far there are two important omissions: definition by cases and currying. Currying will be dealt with in the following section; here we are concerned with how to characterise definition by cases. A typical example is the function filter p, defined in functional programming by

filter p [ ] = [ ]
filter p (cons (a, x)) = cons (a, filter p x),  if p a
                       = filter p x,            otherwise.
Given the McCarthy conditional form (p → f, g) for writing conditionals, we can define

filter p = ([nil, test p])
test p = (p · outl → cons, outr).
The question thus arises: how can we express and characterise the conditional form (p → f, g) in a categorical setting?
In functional programming the datatype Bool is declared by

Bool ::= true | false,

so Bool is the coproduct 1 + 1, with constants true, false : Bool ← 1. For example, negation is defined by

not = [false, true].
Distributive categories
In any category with products and coproducts one can define an arrow

undistr : A × (B + C) ← (A × B) + (A × C)

by undistr = [id × inl, id × inr] (undistr is short for 'un-distribute-right'). Thus,

(f × (g + h)) · undistr = undistr · ((f × g) + (f × h))

for all f, g and h of the appropriate types. In a distributive category undistr is, by assumption, a natural isomorphism. This means that there is an arrow

distr : (A × B) + (A × C) ← A × (B + C)

such that distr · undistr = id and undistr · distr = id.
A distributive category is also required to have an isomorphism

unnull : A × 0 ← 0.

Thus, in a distributive category we have

A × (B + C) ≅ (A × B) + (A × C)
A × 0 ≅ 0,

in addition to the natural isomorphisms

A × B ≅ B × A        A + B ≅ B + A
A × 1 ≅ A            A + 0 ≅ A

described in Exercise 2.26. Below we shall sometimes omit brackets in products and coproducts that consist of more than two components.
One consequence of a category being distributive is that there are non-trivial arrows whose source is a product; we will leave the proof that Bool × Bool ≅ 1 + 1 + 1 + 1 as an exercise. It follows that we can define arrows of type Bool ← Bool² in terms of arrows of type Bool ← 1 + 1 + 1 + 1. For example, we can define

and = [true, false, false, false] · quad,

where quad : 1 + 1 + 1 + 1 ← Bool × Bool is the isomorphism just mentioned, and the conjunction of p, q : Bool ← A by and · ⟨p, q⟩. Other boolean connectives can also be defined by such 'truth tables'.
We can now define the conditional by

(p → f, g) = [f, g] · p?.  (3.2)

The arrow p? is defined by

p? = (unit + unit) · distr · ⟨id, p⟩,

where unit : A ← A × 1 (in fact unit = outl). The typing is as follows: ⟨id, p⟩ : A × Bool ← A; since Bool = 1 + 1, distr : (A × 1) + (A × 1) ← A × Bool; and unit + unit : A + A ← (A × 1) + (A × 1). Hence p? : A + A ← A.
The association of conditions p with arrows p? is injective (see Exercise 3.25). Using (3.2), let us now show that the following three defining properties of conditionals hold:
h · (p → f, g) = (p → h · f, h · g)  (3.3)
(p → f, g) · h = (p · h → f · h, g · h)  (3.4)
(p → f, f) = f.  (3.5)
Equation (3.3) is immediate from (3.2) using the distributivity property of coproducts. For (3.4) it is sufficient to show that p? · h = (h + h) · (p · h)?:

(h + h) · (p · h)?
=        {definition}
(h + h) · (unit + unit) · distr · ⟨id, p · h⟩
=        {naturality of distr and unit}
(unit + unit) · distr · (h × id) · ⟨id, p · h⟩
=        {products}
(unit + unit) · distr · ⟨id, p⟩ · h
=        {definition}
p? · h.

Finally, for (3.5) we reason:

(p → f, f)
=        {definition}
[f, f] · (unit + unit) · distr · ⟨id, p⟩
=        {coproducts; naturality of unit}
unit · [f × id, f × id] · distr · ⟨id, p⟩
=        {claim: see below}
unit · (f × id) · ⟨id, p⟩
=        {products}
unit · ⟨f, p⟩
=        {since unit = outl}
f.

The claim is an instance of the fact that

f × [g, h] = [f × g, f × h] · distr.

The proof, which we leave as a short exercise, uses the definition of undistr and the fact that undistr · distr = id.
Exercises
Show that if null : A × 0 ← 0 makes the evident triangle with outr commute, then there must exist an arrow unnull such that unnull · null = id and null · unnull = id.
3.25 Show that a condition p can be recovered from p? by

p = (! + !) · p?.

Hint: first show that (! + !) · distr = (! + !) · outr.
Show that the operator (_)? can be viewed as a natural transformation of type R ← G, where RA = A + A and GA = A × Bool.
Prove that

filter p = concat · listr (p → wrap, nil),

where wrap a = [a] and nil : listr A ← A is a constant function returning the empty list.
3.5 Concatenation and currying

Consider once more the type listr A of cons-lists over A. In functional programming the function cat : listr A ← listr A × listr A is written as an infix operator ++ and defined by the two equations

[ ] ++ y = y
cons (a, x) ++ y = cons (a, x ++ y).

In terms of our categorical combinators these equations become

cat · (nil × id) = outr
cat · (cons × id) = cons · (id × cat) · assocr,

and the two can be combined into the single equation

cat · (α × id) = [outr, cons] · (id + id × cat) · φ,  (3.6)

where α = [nil, cons] and

φ = (id + assocr) · distl.
But how do we know that equation (3.6) defines cat uniquely? The function cat is not itself a catamorphism, at least not in any obvious way. The answer is to consider a variant ccat of cat in which the arguments are curried. Suppose we define ccat : (listr A ← listr A) ← listr A by ccat x y = x ++ y. Then we have

ccat = ([const id, compose · (ccons × id)]),

where const f is a constant function returning f, ccons = curry cons, and compose (f, g) = f · g. Just to check this, we expand the catamorphism to two equations:

ccat · nil = const id
ccat · cons = compose · (ccons × ccat).

Applying the first equation to the element of the terminal object, and the second to (a, x), we obtain the pointwise versions

ccat nil = id
ccat (cons (a, x)) = ccons a · ccat x,

which are equivalent to the two equations defining ++.
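The curried fold is easy to transcribe into Python: the value of the fold at a list is a function, and cons-cells become function compositions (illustrative sketch):

```python
def foldr(f, c, xs):
    r = c
    for a in reversed(xs):
        r = f(a, r)
    return r

def compose(f, g):
    return lambda y: f(g(y))

def ccons(a):
    # curry cons: ccons a is the function that prepends a
    return lambda y: [a] + y

def ccat(xs):
    # ccat = ([const id, compose . (ccons x id)])
    return foldr(lambda a, g: compose(ccons(a), g), lambda y: y, xs)
```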
All this leads to a more general problem: consider a functor F with initial type (α, T), another functor G, and a transformation φ(A,B) : G(A × B) ← FA × B. What conditions on φ guarantee that the recursion

f · (α × id) = h · Gf · φ  (3.7)

defines a unique function f for each choice of h?
In a diagram we have

                   α × id
       TA × B ←—————————— F(TA) × B
          |                   |
          f                   φ
          ↓                   ↓
          C ←——— GC ←—— G(TA × B)
              h        Gf
To solve the problem we use the same idea as before and curry the function f. To do this we need the idea of a function space object A ← B, more usually written in the form A^B. Function space objects are called exponentials.
Exponentials
The exponential A^B comes equipped with an arrow apply : A ← A^B × B, and for each arrow f : A ← C × B there is an arrow curry f : A^B ← C, characterised by the universal property

g = curry f  ≡  apply · (g × id) = f.
For fixed A and B, this definition can be regarded as defining a terminal object in the category Exp, constructed as follows. The objects of Exp are arrows A ← C × B in C. An arrow h ← k of Exp, where h : A ← C × B and k : A ← D × B, is an arrow f : C ← D of C such that h · (f × id) = k. In this category apply is the terminal object.
Among the consequences of the universal property, the cancellation law reads apply · (curry f × id) = f, the reflection law reads

curry apply = id,

and the fusion law reads

curry f · g = curry (f · (g × id)).
If a category has finite products, and for every pair of objects A and B the exponential A^B exists, the category is said to be cartesian closed. In what follows, we
Returning to the problem of solving equation (3.7), the fusion law gives us that

f · (α × id) = h · Gf · φ  ≡  curry f · α = curry (h · Gf · φ).
Our aim now is to find a k such that curry (h · Gf · φ) = k · F(curry f), for then curry f · α = k · F(curry f) and so curry f = ([k]). We calculate:

curry (h · Gf · φ)
=        {curry cancellation}
curry (h · G(apply · (curry f × id)) · φ)
=        {functor}
curry (h · G apply · G(curry f × id) · φ)
=        {assumption: φ natural}
curry (h · G apply · φ · (F(curry f) × id))
=        {curry fusion law (backwards)}
curry (h · G apply · φ) · F(curry f).

The assumption here is that φ is natural in its first argument:

G(h × id) · φ = φ · (Fh × id).
In summary, we have proved the following structural recursion theorem.
Suppose φ satisfies the naturality condition above. Then

f · (α × id) = h · Gf · φ

if and only if

f = apply · (([curry (h · G apply · φ)]) × id).
Let us now see what this gives in the case of cat. We started with

cat · (α × id) = [outr, cons] · (id + id × cat) · φ,

where φ = (id + assocr) · distl. So h = [outr, cons] and Gf = id + (id × f). The naturality condition on φ is

(id + id × (h × id)) · φ = φ · ((id + (id × h)) × id),

which follows from the naturality of assocr and distl. The structural recursion theorem now yields

cat = ([curry ([outr, cons] · (id + id × apply) · (id + assocr) · distl)]),
which simplifies to
cat = ([curry ([outr, cons · (id × apply) · assocr] · distl)]).
We leave it as an instructive exercise to recover the pointwise definition of cat from
the above catamorphism.
Tree traversal
Let us look at another illustration. Consider again the initial type of trees
introduced in Section 3.2. The function tips returns the list of tips of a given tree:
tips =
([wrap, cat]).
Here cat (x, y) = x ++ y is the concatenation function on lists from the last section, and wrap is the function that converts an element into a singleton list, so wrap a = [a]. In most functional languages, the computation of x ++ y takes time proportional to the length of x. Therefore, when we attempt to implement the above definition directly in such a language, the result is a quadratic-time program.
To improve the efficiency, we aim to design a curried function tipcat such that
tipcat t x = tips t ++ x.

Since the empty list is the unit of concatenation we have tips t = tipcat t [ ], so tipcat determines tips. Using curry, we can write the above definition of tipcat more briefly as

tipcat = curry (cat · (tips × id)).

This suggests an application of the fusion law, and a calculation very similar to the one for cat yields

tipcat = ([curry cons, compose]).

Hence we have

tips t = ([curry cons, compose]) t [ ].
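Transcribed to Python, the accumulating version runs in linear time because it never concatenates lists (illustrative; trees as nested tuples, as before):

```python
def tipcat(t, x):
    # tipcat t x = tips t ++ x, but computed without ++:
    # a tip prepends its label (curry cons); a bin composes the two halves.
    if t[0] == 'tip':
        return [t[1]] + x
    return tipcat(t[1], tipcat(t[2], x))

def tips(t):
    return tipcat(t, [])

t = ('bin', ('bin', ('tip', 1), ('tip', 2)), ('tip', 3))
```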
Exercises
3.31 Using the final catamorphism for cat given in the text, verify the equation cat · (nil × id) = outr.
3.32 Prove that any cartesian closed category that has coproducts is distributive.
3.34 Construct a bijection between arrows of type A ← B and arrows of type A^B ← 1.
3.35 What does it mean for a preorder to be cartesian closed? (See Exercise 2.6
for the interpretation of preorders as categories.)
3.36 Let B be an object in a cartesian closed category. Show how (_)^B can be made into a functor by defining f^B for an arbitrary arrow f.
3.37 Show that if A is cartesian closed, then so is A^B. (See Exercise 2.19 for the definition of A^B.)
3.39 The function cpr (short for 'cartesian product, right') with type

cpr : listr (A × B) ← listr A × B

is given by cpr (x, b) = [(a, b) | a ← x]. Give a point-free definition of cpr in terms of listr.
3.40 In a cartesian closed category, a functor F is said to be strong if its action on arrows can be internalised as an arrow

map(A,B) : (FA)^(FB) ← A^B.

Show that every functor of Fun is strong. Give an example of a functor that is not strong. (Warning: in the literature, strength usually involves a number of additional conditions. Interested readers should consult the references at the end of this chapter.)
3.41 What conditions guarantee that

f · (id × α) = h · Gf · φ

has a unique solution for each choice of h?
3.42 Show that the following equations uniquely determine iter (g, h) : A ← (Nat × B), for each choice of g : A ← B and h : A ← A:

iter (g, h) (0, b) = g b
iter (g, h) (n + 1, b) = h (iter (g, h) (n, b)).

Prove also that

iter (id, h) · (plus × id) = iter (id, h) · (id × iter (id, h)) · assocr,

which says in pointwise terms that h^(m+n) = h^m · h^n.
3.46 Recall the function convert : listr A ← listl A which produces the cons-list corresponding to a given snoc-list. It is defined by
convert =
([nil, snocr]),
where snocr (x,a) =
x -H- [a]. Improve the efficiency of convert by introducing an
accumulation parameter.
3.47 The function reverse can be defined by

reverse = ([nil, snocr]),

where snocr was defined above. Improve the efficiency of reverse by introducing an accumulation parameter.
3.48 The function depths, as defined in terms of tri, takes quadratic time. Derive a linear-time implementation by introducing an accumulation parameter. Hint: take k a n = tree (+n) a, and e = 0.
3.49 In analogy with the depth of a tree example, we can also define the minimum depth, and the minimum depth can be written as a catamorphism. Direct evaluation of the catamorphism is inefficient because it will explore subtrees all the way down to the tips, even if it has already found a tip at a lesser depth. Improve the efficiency by introducing an accumulation parameter, taking e = (0, ∞).
3.50 Consider the recursion scheme:
3.51 Using the preceding exercise and Exercise 3.46, check that convcat x y = convert x ++ y satisfies

uncurry convcat = loop cons.

Hence show that

([e, f]) · convert = loop f · ⟨id, e · !⟩.

How can snoc-list catamorphisms be implemented by a loop over cons-lists?
Bibliographical remarks
The banana-split law was first recorded by (Fokkinga 1992a), who attributes its
catchy name to Meertens and Van der Woude. Of course, similar transformations
have been studied in other contexts; they are usually classified under the names
tupling (Pettorossi 1984) or parallel loop fusion.
Our exposition of Horner's rule in Ruby does not do justice either to Ruby or to the use of this particular rule. We entirely ignored several important aspects of Ruby, partly because we can only introduce these once relations have been discussed. The standard introduction to Ruby is (Jones and Sheeran 1990). Other references are
(Jones and Sheeran 1993; Hutton 1992; Sheeran 1987, 1990). (Harrison 1991)
describes a categorical approach to the synthesis of static parallel algorithms which
is very similar to the theory described here, and is also similar to Ruby. (Skillicorn
1995) considers the categorical view of datatypes as an appropriate setting for
reasoning about architecture-independent parallel programs. In (Gibbons, Cai, and
Skillicorn 1994), some parallel tree-based algorithms are discussed.
Distributive categories are the subject of new and exciting developments on the
border between computation and category theory. The exposition given here was
heavily influenced by the text (Walters 1992a), as well as by a number of research
papers (Carboni, Lack, and Walters 1993; Cockett 1993; Walters 1989, 1992b).
The connection between distributive categories and the algebra of conditionals was
definitively explored by Robin Cockett (Cockett 1991).
The subject of cartesian closed categories is rich and full of deep connections to
computation. Almost all introductory books on category theory mention cartesian
closed categories; the most comprehensive treatment can be found in (Lambek and
Scott 1986). The trick of using currying to define such operations as concatenation
in terms of catamorphisms goes at least back to Lawvere's Recursion theorem for
natural numbers: see e.g. (Lambek and Scott 1986). In (Cockett 1990; Cockett and
Spencer 1992) it is considered how the same effect can be achieved in categories that
do not have exponentials. The key is to concentrate on functors that have tensorial strength, that is, a natural transformation φ : F(A × B) ← FA × B satisfying certain coherence conditions.
The interplay between fusion and accumulation parameters was first studied in
(Bird 1984). Our appreciation of the connection with currying grew while reading
(Meijer 1992; Meijer and Hutton 1995), and by discussions with Masato Takeichi
and his colleagues at Tokyo University (Hu, Iwasaki, and Takeichi 1996).
Chapter 4

Relations and Allegories
We now generalise from functions to relations. There are a number of reasons for this step. First, like the move from real numbers to complex ones, the move to relations increases our powers of expression. Relations, unlike functions, are essentially nondeterministic, and one can employ them to specify nondeterministic problems. In particular, one can specify an optimal solution among a set of candidates without also having to specify precisely which one should be chosen. Every relation has a well-defined converse, so one can specify problems in terms of converses of other problems.
On the other hand, in the hundred years or so of its existence, the calculus of
relations has gained a good deal of notoriety for the apparently enormous number
of operators and laws that one has to memorise in order to do proofs effectively.
In this chapter we aim to cope with this problem by presenting the calculus in five
successive stages, each of which is motivated by categorical considerations and is
sufficiently small to be studied as a unit. We will see how these parts interact, and
how they can be put to use in developing a concise and effective style of reasoning.
4.1 Allegories
An allegory is a category with additional structure: besides the operations of a
category — target, source, composition and identities — there are three more operators
on arrows. These extra operators are inspired by
the category Rel of sets and relations. Briefly, we can compare relations with a
partial order ⊆, take the intersection of two relations with ∩, and take a relation to
its converse with the unary operator (_)°. The purpose of this section is to describe
these operators axiomatically.
Inclusion
The first assumption is that any two arrows with the same source and target can be
compared with a partial order C, and that composition is monotonic with respect
to this order: that is,
(S1 ⊆ S2) and (T1 ⊆ T2) implies (S1 · T1) ⊆ (S2 · T2).
In Rel, where a relation R : A ← B is interpreted as a subset R ⊆ A × B, inclusion
of relations is the same as set-theoretic inclusion; thus

R ⊆ S ≡ (∀a, b : aRb ⇒ aSb).
Monotonicity of composition is so fundamental that we often apply it tacitly in
proofs. An expression of the form S C T is called an inequation, and most of the
laws in the relational calculus are inequations rather than equations. The proof
format used in the preceding chapter adapts easily to reasoning about inequations,
as long as we don't mix reasoning with ⊆ and reasoning with ⊇. A proof of R = S
by two separate proofs, one of R ⊆ S and one of S ⊆ R, is sometimes called a ping-
pong argument. Use of ping-pong arguments can often be avoided either by direct
equational reasoning, or by an indirect proof in which the following equivalence is
exploited:
R = S ≡ (X ⊆ R ≡ X ⊆ S) for all X.
A diagram whose two paths are related by an inequation, such as S1 · T1 ⊆ S2 · T2,
is said to semi-commute.
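Throughout the chapter, claims about Rel can be tried out concretely by modelling an arrow R : A ← B as a set of pairs (a, b) with aRb. The following sketch is our own illustration, not part of the text; the helper names comp and conv are ours. It checks monotonicity of composition on a small example.

```python
# A toy model of Rel: an arrow R : A <- B is a set of pairs (a, b) with a R b.
def comp(R, S):
    # composition R . S : a (R.S) c iff there is a b with a R b and b S c
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    # converse R°
    return {(b, a) for (a, b) in R}

S1 = {(1, 'x')}
S2 = {(1, 'x'), (2, 'y')}          # S1 ⊆ S2, i.e. S1 <= S2 as Python sets
T1 = {('x', 0)}
T2 = {('x', 0), ('y', 0)}          # T1 ⊆ T2
# monotonicity: S1 ⊆ S2 and T1 ⊆ T2 implies S1 . T1 ⊆ S2 . T2
assert comp(S1, T1) <= comp(S2, T2)
```

The same two helpers suffice for checking most of the laws that follow.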
Meet
The second assumption is that for all arrows R, S : A ← B there is an arrow
R ∩ S : A ← B, called the meet of R and S, and characterised by the universal
property

X ⊆ (R ∩ S) ≡ (X ⊆ R) and (X ⊆ S), (4.1)

for all X : A ← B. In words, R ∩ S is the greatest lower bound of R and S. Using this
universal property of meet it can easily be established that meet is commutative,
associative and idempotent. In symbols:
R ∩ S = S ∩ R
R ∩ (S ∩ T) = (R ∩ S) ∩ T
R ∩ R = R.

Since composition is monotonic, we also have the two sub-distribution laws:

R · (S ∩ T) ⊆ (R · S) ∩ (R · T)
(R ∩ S) · T ⊆ (R · T) ∩ (S · T).
Given ∩ as an associative, commutative, and idempotent operation, we need not
postulate inclusion of arrows as a primitive concept, for R ⊆ S can be defined as
an abbreviation for R ∩ S = R.
Converse
The third operator takes each arrow R : A ← B to an arrow R° : B ← A, the converse
of R (and also known as the reverse or reciprocal of R). The converse operator has
three properties. First, it is an involution:
(R°)° = R. (4.2)
Second, it is order-preserving:
R ⊆ S ≡ R° ⊆ S°. (4.3)
Third, it is contravariant:
(R · S)° = S° · R°. (4.4)
Using (4.2) and (4.3), together with the universal property (4.1), we obtain that
converse distributes over meet:
(R ∩ S)° = R° ∩ S°. (4.5)
Use of these four properties in calculations will usually be signalled just by the hint
converse.
There is one more axiom that connects all three operators in an allegory. The axiom
is called the modular law and states that

(R · S) ∩ T ⊆ R · (S ∩ (R° · T)). (4.6)

By applying converse to both sides of the modular law and renaming, we obtain the
dual variant

(R · S) ∩ T ⊆ (R ∩ (T · S°)) · S. (4.7)

The two variants can be combined into a single symmetric version:

(R · S) ∩ T ⊆ (R ∩ (T · S°)) · (S ∩ (R° · T)). (4.8)
Let us prove that (4.8) is equivalent to the preceding two versions. First, mono-
tonicity of composition gives at once that (4.8) implies both (4.6) and (4.7). For
the other direction, we reason:

(R · S) ∩ T
= {meet idempotent}
((R · S) ∩ T) ∩ T
⊆ {inequation (4.7), writing U = R ∩ (T · S°)}
(U · S) ∩ T
⊆ {inequation (4.6)}
U · (S ∩ (U° · T))
⊆ {since U ⊆ R; converse; monotonicity}
U · (S ∩ (R° · T)).
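In the Rel model, the modular law and its variants can be checked by brute force over a small universe. The sketch below is our own (the names comp and conv are not the book's); it tests (4.6), (4.7) and (4.8) on random relations over a three-element set.

```python
import random

# arrows of Rel as sets of pairs (a, b) with a R b
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

random.seed(0)
pairs = [(a, b) for a in range(3) for b in range(3)]
for _ in range(200):
    R = {p for p in pairs if random.random() < 0.4}
    S = {p for p in pairs if random.random() < 0.4}
    T = {p for p in pairs if random.random() < 0.4}
    lhs = comp(R, S) & T
    assert lhs <= comp(R, S & comp(conv(R), T))                        # (4.6)
    assert lhs <= comp(R & comp(T, conv(S)), S)                        # (4.7)
    assert lhs <= comp(R & comp(T, conv(S)), S & comp(conv(R), T))     # (4.8)
```

Such exhaustive checks prove nothing about allegories in general, but they are a quick sanity test when manipulating the laws.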
As an example of applying the modular law, one can prove the inclusion

(R° · S) ∩ id ⊆ (R ∩ S)° · (R ∩ S). (4.9)
This inclusion is useful when reasoning about the range operator, defined below.
Another consequence of the modular law is the inclusion

R ⊆ R · R° · R. (4.10)
This completes the formal definition of an allegory. Note that neither the join
operation (∪) nor the complementation operator (¬) on arrows is part of the
definition of an allegory, even though both are meaningful in Rel.
Exercises
4.1 Using an indirect proof and the universal property of meet, prove that meet is
associative: R ∩ (S ∩ T) = (R ∩ S) ∩ T.
4.2 Special properties of arrows

Various properties of relations that relate to order are familiar from ordinary set
theory, and can be stated very concisely in the language of relations and allegories.
An arrow R : A ← A is said to be symmetric if R ⊆ R°. Because converse is a
monotonic involution, this is the same as saying that R = R°.
A less familiar notion, but one which turns out to be extremely useful, is that of
a coreflexive arrow. An arrow C : A ← A is called coreflexive if C ⊆ id. One
can think of C as a subset of A. Every coreflexive arrow is both transitive and
symmetric. Here is a proof of symmetry:

C
⊆ {inequation (4.10)}
C · C° · C
⊆ {since C is coreflexive}
id · C° · id
= {identity arrows}
C°.
Associated with every arrow R : A ← B are two coreflexives ran R : A ← A and
dom R : B ← B, called the range and domain of R respectively. Below we shall only
discuss range; the properties of domain follow by duality since, by definition,

dom R = ran R°.

The range operator can be characterised by the universal property

ran R ⊆ X ≡ R ⊆ X · R for all X ⊆ id. (4.11)
The intended interpretation in Rel is that a(ran R)a holds if there exists an element
b such that aRb.
There is also an explicit formula for range:

ran R = (R · R°) ∩ id. (4.12)
To prove that (4.12) implies (4.11), note first that, by the modular law,

R = R ∩ R ⊆ ((R · R°) ∩ id) · R = ran R · R.

Since ran R ⊆ id, the reverse inclusion ran R · R ⊆ R is immediate, and hence

R = ran R · R. (4.13)

Now, if ran R ⊆ X, then R = ran R · R ⊆ X · R by (4.13). For the converse
implication, we reason:

(R · R°) ∩ id
⊆ {assume R ⊆ X · R}
(X · R · R°) ∩ id
⊆ {modular law}
X · ((R · R°) ∩ X°)
⊆ {meet}
X · X°
⊆ {assuming X is a coreflexive}
X.
Taking R = S · T and X = ran S in (4.11), and using (4.13), we obtain
ran (S · T) ⊆ ran S.
Finally, let us consider briefly how the range operator interacts with meet. From
the direct definition of range (4.12) and monotonicity, we have

ran (R ∩ S) ⊆ id ∩ (R · S°).

The converse inequation also holds, by (4.9), and therefore

ran (R ∩ S) = id ∩ (R · S°). (4.15)
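In the Rel model, (4.12) and (4.13) can be computed directly; the sketch below is our own illustration (helper names ours), building ran and dom from composition, converse and meet.

```python
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

def ident(xs):
    return {(x, x) for x in xs}

A = {0, 1, 2}
B = {'p', 'q'}
R = {(0, 'p'), (1, 'p')}                      # R : A <- B
ran_R = comp(R, conv(R)) & ident(A)           # (4.12): ran R = (R . R°) ∩ id
dom_R = comp(conv(R), R) & ident(B)           # dom R = ran R°
assert ran_R == {(0, 0), (1, 1)}              # exactly the elements R can produce
assert dom_R == {('p', 'p')}                  # 'q' is related to nothing
assert comp(ran_R, R) == R                    # (4.13): R = ran R . R
```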
An arrow S : A ← B is called simple if S · S° ⊆ id. Simple arrows are also known
as imps (short for implementations) or partial
functions. In set-theoretic terms, S is simple if for every b there exists at most one a
such that aSb. Simple arrows satisfy various algebraic properties not enjoyed by
arbitrary arrows. For example, when S is simple the modular law can be strengthened
to an identity:

(S · R) ∩ T = S · (R ∩ (S° · T)),

since S · (R ∩ (S° · T)) ⊆ (S · R) ∩ (S · S° · T) ⊆ (S · R) ∩ T. Another example is

(R ∩ T) · S = (R · S) ∩ (T · S) provided S is simple. (4.17)

Again, the proof makes essential use of the modular law.
An arrow R : A ← B is called entire if

id ⊆ R° · R.

In set-theoretic terms, R is entire if every b is related to at least one a. An arrow
that is both simple and entire is called a function; we reserve the identifiers f, g, h
for functions. Using (4.9), one can prove that

R ∩ S entire ≡ id ⊆ R° · S. (4.18)

This condition will be useful below.
The following two shunting rules for functions are very useful:
f · R ⊆ S ≡ R ⊆ f° · S (4.19)
R · f° ⊆ S ≡ R ⊆ S · f. (4.20)
To prove (4.19) we reason:
f · R ⊆ S
⇒ {monotonicity}
f° · f · R ⊆ f° · S
⇒ {since f is entire}
R ⊆ f° · S
⇒ {monotonicity}
f · R ⊆ f · f° · S
⇒ {since f is simple}
f · R ⊆ S.
The dual form (4.20) is obtained by taking converses. Any arrow f satisfying either
(4.19) or (4.20) for all R and S is necessarily a function; the proof is left to the
reader.
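Shunting can also be watched in action in the Rel model. The sketch below is our own: f is the graph of a concrete (hypothetical) parity function, and the equivalence (4.19) is tested against random relations.

```python
import random

def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

# the graph of the parity function, a function f : A <- B
f = {('even', 0), ('odd', 1), ('even', 2), ('odd', 3)}
A = ['even', 'odd']
B = range(4)
random.seed(1)
for _ in range(100):
    R = {(b1, b2) for b1 in B for b2 in B if random.random() < 0.3}  # R : B <- B
    S = {(a, b) for a in A for b in B if random.random() < 0.3}      # S : A <- B
    # shunting rule (4.19): f . R ⊆ S  iff  R ⊆ f° . S
    assert (comp(f, R) <= S) == (R <= comp(conv(f), S))
```

Replacing f by a non-simple or non-entire relation makes the assertion fail, which matches the remark that shunting characterises functions.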
An easy consequence of the shunting rules is the fact that inclusion of functions
reduces to equality:

(f ⊆ g) ≡ (f = g) ≡ (f ⊇ g).

We reason:

f ⊆ g
≡ {shunting}
id ⊆ f° · g
≡ {shunting}
g° ⊆ f°
≡ {converse is a monotonic involution}
g ⊆ f.
Functions can also be characterised without making explicit use of the converse
operator:

Proposition 4.1 An arrow R is a function if and only if there exists an arrow S
such that R · S ⊆ id and id ⊆ S · R; moreover, these two inequations imply that
S = R°.

To see why S = R°, we reason:

S
⊆ {since R is entire}
R° · R · S
⊆ {since R · S ⊆ id}
R°.

By taking R := S° and S := R° in the above argument, we also have R° ⊆ S, and
so S = R°.
Exercises
4.11 Let C be coreflexive. Prove that

(C · X) ∩ id = (X · C) ∩ id = (C · X · C) ∩ id = C · (X ∩ id) = (X ∩ id) · C.
4.16 Prove that ran (R ∩ (S · T)) = ran ((R · T°) ∩ S).
4.17 Prove that dom R · f = f · dom (R · f).
4.18 A locale is a partial order (⊆, V) in which every subset X ⊆ V has a least
upper bound ∪X, any two elements a, b have a greatest lower bound a ∩ b, and
binary meets distribute over arbitrary joins:

(∪X) ∩ b = ∪{a ∩ b | a ∈ X}.

A V-valued relation of type A ← B is a function V ← (A × B). Show that V-valued
relations form an allegory.
4.3 Tabular allegories

Tabulations
A pair of functions f : A ← C and g : B ← C is called a tabulation of R : A ← B if

R = f · g° and (f° · f) ∩ (g° · g) = id.

The second condition states that f and g are jointly monic, that is, for all
functions h and k,

h = k ≡ (f · h = f · k and g · h = g · k).
In one direction we reason:

f · h = f · k and g · h = g · k
≡ {shunting of functions}
h · k° ⊆ f° · f and h · k° ⊆ g° · g
≡ {meet}
h · k° ⊆ (f° · f) ∩ (g° · g)
≡ {assumption}
h · k° ⊆ id
≡ {shunting of functions}
h ⊆ k
≡ {inclusion of functions is equality}
h = k.
For the other direction, assume that (h, k) is a tabulation of (f° · f) ∩ (g° · g):

h · k° ⊆ (f° · f) ∩ (g° · g)
≡ {as before}
f · h = f · k and g · h = g · k
≡ {assuming (f, g) is jointly monic}
h = k,

and so (f° · f) ∩ (g° · g) = h · h° ⊆ id. But since f and g are entire, this inclusion
can be strengthened to an equality.
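In Rel every arrow has a tabulation: take C to be the set of related pairs, and f and g the two projections. The following quick check is our own illustration of both tabulation conditions, not part of the text.

```python
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

R = {(1, 'x'), (1, 'y'), (2, 'y')}             # R : A <- B
C = set(R)                                     # one element of C per related pair
f = {(a, (a, b)) for (a, b) in R}              # f : A <- C, first projection
g = {(b, (a, b)) for (a, b) in R}              # g : B <- C, second projection
assert comp(f, conv(g)) == R                   # R = f . g°
idC = {(c, c) for c in C}
assert (comp(conv(f), f) & comp(conv(g), g)) == idC   # (f°. f) ∩ (g°. g) = id
```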
The kind of reasoning seen in the last part of this proof is typical: we are essentially
doing pointwise proofs in a relational framework. Similar reasoning arises in the
proof of the next result, which gives a characterisation of inclusion in terms of
functions.
Proposition 4.2 Suppose that (f, g) is a tabulation of R, and that h and k are
functions with a common source. Then h · k° ⊆ R if and only if there exists a
(necessarily unique) function m such that h = f · m and k = g · m.

For the 'if' direction, assuming h = f · m and k = g · m, we have
h · k°
= {assumption, converse}
f · m · m° · g°
⊆ {m simple}
f · g°
= {(f, g) tabulates R}
R.
For the 'only if' direction, define m = (f° · h) ∩ (g° · k). First, m is simple:

m · m°
= {definition and converse}
((f° · h) ∩ (g° · k)) · ((h° · f) ∩ (k° · g))
⊆ {monotonicity}
(f° · h · h° · f) ∩ (g° · k · k° · g)
⊆ {h and k are simple}
(f° · f) ∩ (g° · g)
= {(f, g) tabulates R}
id.

Second, by (4.18), m is entire provided that id ⊆ h° · f · g° · k, which we establish
as follows:

id
⊆ {h and k are entire}
h° · h · k° · k
⊆ {assumption and (f, g) tabulates R}
h° · f · g° · k.
Finally, m satisfies the triangle conditions: since

f · m ⊆ f · f° · h ⊆ h,

we obtain that f · m = h because inclusion of functions is equality. By symmetry,
it is also the case that g · m = k. The fact that m is uniquely defined by
f · m = h and g · m = k follows at once from the fact that (f, g) is jointly monic.
One import of this result is that tabulations are unique up to unique isomorphism,
that is, if both (f, g) and (h, k) tabulate R, then there is a unique isomorphism m
such that h = f · m and k = g · m.
Unit
A unit in an allegory is an object U with two properties. First, id_U is the largest
arrow of type U ← U, that is, R ⊆ id_U for every R : U ← U. Second, for every
object A there is an entire arrow p_A : U ← A. An allegory with a unit is called
unitary. The allegory Rel is unitary: a unit is a singleton set and p_A is the unique function
mapping every element of A to the sole inhabitant of the singleton. Recall that a
singleton set in Rel is a terminal object in Fun, its subcategory of functions.
In a unitary allegory we have, for any relation R : A ← B, that

R ⊆ p_A° · p_B
≡ {shunting functions}
p_A · R · p_B° ⊆ id_U
≡ {definition of unit}
true.

In other words, p_A° · p_B is the largest arrow of type A ← B. From now on, we shall
denote this arrow by Π : A ← B. In Rel, the arrow Π is just the product A × B.
As a special case of the above result we have that Π : U ← A is the arrow p_A, and
since inclusion of functions is equality, it follows that p_A is the only function with
this type. Thus a unit in an allegory is always a terminal object in its subcategory
of functions, a point we illustrated above in the case of Rel.
Conversely, suppose that the subcategory of functions has a terminal object 1, and
write ! for the unique function into 1. In a tabular allegory any R : 1 ← 1 has a
tabulation (f, g), and both f and g must equal the unique function of their type, so

R = f · g° = ! · !° ⊆ id,

and the claim is proved. Based on this equivalence of terminal objects and units,
we will write !_A in preference to p_A, and 1 instead of U when discussing units.
Finally, let us consider the tabulation (f, g) of Π : A ← B, where f : A ← C and
g : B ← C for some C. Since h · k° ⊆ Π for any two functions h : A ← D and
k : B ← D, we obtain from Proposition 4.2 that m = (f° · h) ∩ (g° · k) is the unique
arrow such that h = f · m and k = g · m. But this just says that C, together with
(f, g), is a product of A and B in the subcategory of functions. Accordingly, we
write A × B for C and (outl, outr) for (f, g); the pairing operator then satisfies

⟨h, k⟩ = (outl° · h) ∩ (outr° · k).

We will use this fact in the following chapter, when we discuss how to obtain a
product-like construction for arbitrary relations.
Set-theoretic relations
Finally, let us briefly return to the question of pointless and pointwise proofs. There
is a meta-theorem about unitary tabular allegories which makes precise our intuition
that they are very much like relations in set theory.
A Horn-sentence is a statement of the form

E1 = D1 ∧ E2 = D2 ∧ … ∧ En = Dn ⇒ En+1 = Dn+1,
where Ei and Di are expressions that refer only to the operators of an allegory, as
well as tabulations and units. The meta-theorem is that a Horn-sentence holds for
every unitary tabular allegory if and only if it is true for the specific category Rel
of sets and relations. In other words, everything we have said so far could have
been proved by checking it in set theory. A proof of this remarkable meta-theorem
is outside the scope of this book and the interested reader is referred to (Freyd and
Scedrov 1990) for details.
Although set-theoretic proofs are valid for any unitary tabular allegory, direct proofs
from the axioms are usually simpler, possess more structure, and are more revealing.
Exercises
4.20 Prove that for every function f there exist functions c and m such that

f = m · c and c · c° = id and m° · m = id.

Is this factorisation in some sense unique? (In textbooks on category theory, this
factorisation is introduced under the heading 'epi-monic factorisation'.)
4.21 Show that if (f, g) tabulates R and R is simple, then g is monic.
4.23 Using the above two exercises, show that if (f, g) tabulates R and R is a …

4.24 Show that h · k° ⊆ R · S iff there exists an entire relation Q such that …
4.4 Locally complete allegories

In this section we study the join, or union, of two relations. In fact, we will consider
the more general operator ∪ that returns the union of a set of relations. In particular,
we shall see how its distributivity properties give rise to two other operations,
implication (⇒) and division (\).
An allegory is said to be locally complete if for any set H of arrows A ← B there is
an arrow ∪H : A ← B, called the join of H, characterised by the universal property

∪H ⊆ X ≡ (∀R ∈ H : R ⊆ X)

for all X : A ← B.
Furthermore, meet and composition are required to distribute over arbitrary joins:

(∪H) ∩ S = ∪{R ∩ S | R ∈ H}
(∪H) · S = ∪{R · S | R ∈ H}.
Neither of these equations follows from the universal property of join. On the other
hand, the universal property does give us that converse distributes over join:
(∪H)° = ∪{R° | R ∈ H}.
This is because converse is a monotonic involution. In the special case where H is
the empty set we write ∅ for ∪H; when H = {R, S} we write R ∪ S. By taking H
to be the empty set we obtain that ∅ is a zero both of meet and composition. In
Rel, the arrow ∅ is the empty relation.
Implication
Given two arrows R, S : A ← B, the implication (R ⇒ S) : A ← B can be defined
by the universal property

X ⊆ (R ⇒ S) ≡ (X ∩ R) ⊆ S for all X : A ← B.

In a locally complete allegory, R ⇒ S can be constructed explicitly as a join:

R ⇒ S = ∪{X | (X ∩ R) ⊆ S}.
To prove that this definition satisfies the universal property, assume first that
(X ∩ R) ⊆ S; then X is one of the relations joined together, so X ⊆ (R ⇒ S).
For the other direction, we reason:

X ⊆ (R ⇒ S)
≡ {definition of R ⇒ S}
X ⊆ ∪{Y | (Y ∩ R) ⊆ S}
⇒ {meet distributes over join}
X ∩ R ⊆ ∪{Y ∩ R | (Y ∩ R) ⊆ S}
⇒ {universal property of join}
X ∩ R ⊆ S.
Composing preorders
One use of implication is in composing preorders. Consider, for instance, the lexical
ordering on pairs of numbers: (a, b) precedes (c, d) if a < c, or a = c and b ≤ d.
Orderings of this kind can be built from their components with the combinator

R ; S = R ∩ (R° ⇒ S),

which in Rel reads a (R ; S) b ≡ aRb ∧ (bRa ⇒ aSb).
If R and S are preorders, then R ; S is a preorder. The proof makes use of the
symmetric modular law (4.8) and is left as an exercise. One can also show that (;)
is associative with unit Π. This, too, is left as an exercise.
Division
Given two arrows R and S with a common target, the left-division operation R\S
(pronounced 'R under S') is defined by the universal property

X ⊆ R\S ≡ R · X ⊆ S.

Thus R\S is the largest arrow X for which the composite R · X is included in S.
In Rel, the pointwise interpretation is

a (R\S) c ≡ (∀b : bRa ⇒ bSc),

so the operation (\) gives us a way of expressing specifications that involve universal
quantification.
In a locally complete allegory, left-division can be constructed explicitly as a join:

R\S = ∪{X | R · X ⊆ S}.
The proof that this works hinges on the fact that composition distributes over join,
and is analogous to the argument for implication spelled out above. Note that R\S
is monotonic in S, but anti-monotonic in R. In fact, we have

(R ∪ S)\T = (R\T) ∩ (S\T) and R\(S ∩ T) = (R\S) ∩ (R\T).
The universal property of left-division also gives the identity

(R · S)\T = S\(R\T),

but nothing interesting can be said about R\(S · T).
The cancellation law of division is

R · (R\S) ⊆ S.
Right-division
Since composition is symmetric in both arguments we can define the dual operation
of right-division S/R (pronounced 'S over R') for any relations S and R with a
common source:

X ⊆ S/R ≡ X · R ⊆ S.

In Rel, the pointwise interpretation is

a (S/R) b ≡ (∀c : bRc ⇒ aSc).
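The universal property of division can be verified exhaustively in the Rel model. The sketch below is ours (the function ldiv implements the pointwise reading of R\S); it checks X ⊆ R\S ≡ R · X ⊆ S over every X on a tiny universe.

```python
import random

def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

A = {'a1', 'a2'}
B = {0, 1}
C = {'c1'}

def ldiv(R, S):
    # R\S : A <- C, pointwise: a (R\S) c iff every b with b R a satisfies b S c
    return {(a, c) for a in A for c in C
            if all((b, c) in S for (b, a2) in R if a2 == a)}

pairs_AC = [(a, c) for a in A for c in C]
random.seed(2)
for _ in range(200):
    R = {(b, a) for b in B for a in A if random.random() < 0.5}   # R : B <- A
    S = {(b, c) for b in B for c in C if random.random() < 0.5}   # S : B <- C
    D = ldiv(R, S)
    for bits in range(1 << len(pairs_AC)):
        X = {p for i, p in enumerate(pairs_AC) if bits >> i & 1}
        # universal property: X ⊆ R\S  iff  R . X ⊆ S
        assert (X <= D) == (comp(R, X) <= S)
```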
Since converse is a monotonic involution the two division operators can be defined
in terms of each other:

R\S = (S°/R°)° and S/R = (R°\S°)°.

Sometimes it is better to use one version of division rather than the other; the choice
is usually dictated by the desire to reduce the number of converses in an expression.
Exercises
4.28 Prove that the meet of a collection H of arrows can be constructed as a join:

∩H = ∪{S | (∀R ∈ H : S ⊆ R)}.
4.30 In a locally complete allegory, the subtraction operator R − S can be defined
by the universal property

R − S ⊆ X ≡ R ⊆ S ∪ X.

Prove the following properties of subtraction:

R − ∅ = R
R ∪ S = R ∪ (S − R)
R − (S ∪ T) = (R − S) − T
(R ∪ S) − T = (R − T) ∪ (S − T).
4.31 Using the modular law, prove that

((R · S) ∩ T = ∅) ≡ (R ∩ (T · S°) = ∅).
4.32 Prove the following properties of implication using its universal property:
R ⇒ (S ⇒ T) = (R ∩ S) ⇒ T
(R ∪ S) ⇒ T = (R ⇒ T) ∩ (S ⇒ T)
R ⇒ (S ∩ T) = (R ⇒ S) ∩ (R ⇒ T)
R ∩ (R ⇒ S) = R ∩ S.
4.33 Prove that, for functions f and g,

f° · (R ⇒ S) · g = (f° · R · g) ⇒ (f° · S · g).
4.34 Prove that R ; S is a preorder if R and S are. Also prove that (;) is associative
with unit Π.
4.35 Prove that, for a function f,

(R\S) · f = R\(S · f)
f° · (R\S) = (R · f)\S.
4.36 Let (≤, A) and (⊑, B) be preorders (in the ordinary set-theoretic sense, not as
arrows in an allegory). A Galois connection is a pair (f, g) of monotonic mappings
such that

f x ⊑ y ≡ x ≤ g y.
The function f is called the lower adjoint, and g the upper adjoint. For example,
defining

f X = X ∩ R and g Y = R ⇒ Y,
we see that the universal property of ⇒ in fact asserts the existence of a Galois
connection. Spot the Galois connection in the universal properties of division and
subtraction (see Exercise 4.30).
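The ⇒/∩ Galois connection can be checked pointwise in Rel. The sketch below is our own (the helper imp computes R ⇒ S pointwise); it verifies X ∩ R ⊆ Y ≡ X ⊆ (R ⇒ Y) for every X on a small universe.

```python
import random

A = ['u', 'v']
B = [0, 1]
pairs = [(a, b) for a in A for b in B]

def imp(R, S):
    # R => S, pointwise: holds at (a, b) unless a R b holds but a S b fails
    return {p for p in pairs if p not in R or p in S}

random.seed(3)
for _ in range(100):
    R = {p for p in pairs if random.random() < 0.5}
    Y = {p for p in pairs if random.random() < 0.5}
    for bits in range(1 << len(pairs)):
        X = {p for i, p in enumerate(pairs) if bits >> i & 1}
        # Galois connection: f X = X ∩ R is lower adjoint to g Y = R => Y
        assert ((X & R) <= Y) == (X <= imp(R, Y))
```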
The following few exercises continue the theme of Galois connections.
4.37 What does it mean to say that the mapping X ↦ X · R is lower adjoint to
Y ↦ S\Y?
4.38 In Exercise 3.19, we defined the floor of a rational number using a universal
property. This property can be phrased as a Galois connection; identify the relevant
preorders and the adjoints.
4.39 Show how the universal property of binary meet can be viewed as a Galois
connection.
4.40 Now consider a monotonic mapping f between complete lattices (partially
ordered sets where every subset has both a least upper bound (lub) and a greatest
lower bound (glb)). Prove that the following two statements are equivalent: f has
an upper adjoint g, and f preserves least upper bounds. Show, moreover, that in
this case

g y = lub{x | f x ≤ y}.
4.5 Boolean allegories

In a locally complete allegory, the negation of an arrow R : A ← B is defined by
¬R = (R ⇒ ∅). This notion of negation satisfies a number of the properties one would
expect. First of all, negation is order-reversing. Furthermore, we have De Morgan's
law

¬(R ∪ S) = (¬R) ∩ (¬S).

In general, however, it is not true that negation is an involution.
102 4 J Relations and Allegories
particular, Rel is boolean. Boolean allegories satisfy many properties that are not
valid in other allegories. For instance, division can be defined in terms of negation:
X/Y = ¬(¬X · Y°). (4.21)
(From now on, we omit the brackets round ¬X, giving ¬ the highest priority in
formulae.) This definition expresses the relationship between universal and existential
quantification in classical logic. To prove (4.21), it suffices to show that

¬R/Y = ¬(R · Y°), (4.22)

because taking R = ¬X and using X = ¬(¬X) gives the desired result. It turns
out that equation (4.22) is valid in any locally complete allegory:
X ⊆ ¬R/Y
≡ {division}
X · Y ⊆ ¬R
≡ {negation}
X · Y ⊆ (R ⇒ ∅)
≡ {implication}
(X · Y) ∩ R ⊆ ∅
≡ {Exercise (4.31)}
X ∩ (R · Y°) ⊆ ∅
≡ {implication; negation}
X ⊆ ¬(R · Y°).
Notice that the above proof uses indirect equational reasoning, proving that R = S
by showing that X ⊆ R ≡ X ⊆ S for arbitrary X.
In our own experience, it is best to avoid proofs that involve negation as much as
possible, since the number of rules in the relational calculus becomes quite
unmanageable when boolean negation is considered.
Exercises
4.41 Prove the following properties of negation:

¬Π = ∅
¬R = ¬¬¬R
¬(R ∪ S) = ¬R ∩ ¬S
¬¬(R ∪ ¬R) = Π.
4.42 Prove that a locally complete allegory is boolean iff R ∪ ¬R = Π for all R.
4.43 In relational calculi that take negation as primitive, the use of division is often
replaced by an appeal to Schröder's rule, which asserts that

(¬T · S° ⊆ ¬R) ≡ (R · S ⊆ T) ≡ (R° · ¬T ⊆ ¬S).

Prove that Schröder's rule is valid in any boolean allegory.
4.6 Power allegories

A power allegory is an allegory with, for each object A, a power object PA, a
membership arrow ∈ : A ← PA, and, for each arrow R : A ← B, a function
ΛR : PA ← B called the power transpose of R. These three things are defined (up
to unique isomorphism) by the following universal property. For all R : A ← B and
f : PA ← B,

f = ΛR ≡ ∈ · f = R.

In Rel, the power transpose is given by

(ΛR) b = {a | aRb}.
Indeed, one might say that the definition of Λ is merely a restatement of the
set-theoretic principle of comprehension.
Let us now see how the universal property of Λ can be used to prove some simple
identities in set theory. First of all, by taking f = ΛR, we have the cancellation
property

∈ · ΛR = R.

Taking R = ∈, we also obtain the reflection law Λ∈ = id. Finally, there is a
fusion law

Λ(R · f) = ΛR · f,

which is valid only if f is a function.
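In finite set theory, Λ, ∈ and the cancellation law can be computed directly, with frozensets standing in for the power object. The sketch (ours, with ad hoc helper names) checks ∈ · ΛR = R.

```python
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

B = {'x', 'y'}
R = {(1, 'x'), (2, 'x')}                       # R : A <- B

def transpose(R, B):
    # Lambda R : PA <- B, sending b to the set of all a with a R b
    return {(frozenset(a for (a, b2) in R if b2 == b), b) for b in B}

LR = transpose(R, B)
# the membership arrow ∈ : A <- PA, restricted to the sets that occur in LR
member = {(a, x) for (x, b) in LR for a in x}
assert comp(member, LR) == R                   # cancellation: ∈ . ΛR = R
```

Note that 'y', which R relates to nothing, is sent by ΛR to the empty set, exactly as the pointwise formula (ΛR) b = {a | aRb} prescribes.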
For completeness we remark that the definition of (PA, Λ, ∈) can also be phrased as
asserting the existence of a terminal object in a certain category. Given an allegory
A and an object A, consider the category A/A whose objects are all arrows R of
A with target A, and whose arrows m : R ← S are those functions m for which
R · m = S. Composition in A/A is the same as that in A. Now, the terminal object
of A/A is the relation ∈_A : A ← PA, because

f = ΛR ≡ f : ∈_A ← R,

and so ΛR is just another notation for !_R.
Existential image
The existential image functor E is defined on arrows by

ER = Λ(R · ∈).

In set theory we have

(ER) x = {a | (∃b : aRb ∧ b ∈ x)}.
It is easy to see from the reflection law Λ∈ = id that E preserves identities. To
show that E also preserves composition, it suffices to prove the absorption property

ER · ΛS = Λ(R · S).

Taking S = T · ∈ gives us ER · ET = E(R · T), which is what is required. The
absorption property itself is proved as follows:

ER · ΛS = Λ(R · S)
≡ {defining property of Λ}
∈ · ER · ΛS = R · S
≡ {definition of E, Λ-cancellation}
R · ∈ · ΛS = R · S
≡ {Λ-cancellation}
true.
Restricted to functions, E gives the power functor P = E · J, where J embeds
Fun(A) in A. In set theory,

Pf x = {f a | a ∈ x}.

In the following chapter we shall show how to extend P to a functor P : A ← A.
From now on we will omit J in the composition E · J, silently embedding Fun(A) in
A; thus we assume that E : A ← A.
The singleton arrow τ : PA ← A is defined by τ = Λ id; in set theory, τ takes an
element and returns a singleton set. Using the fusion law Λ(R · f) = ΛR · f, we
obtain

Λf = Λ(id · f) = Λ id · f = τ · f,

and so

Pf · τ = Ef · Λ id = Λ(f · id) = Λf = τ · f.

Thus τ is a natural transformation P ← id. These and similar facts illustrate the
difference between P and E, for τ is not a natural transformation E ← id.
There is also a multiplication μ : PA ← PPA, defined by μ = E∈, which satisfies

μ · Pτ = id = μ · τ and μ · Pμ = μ · μ.

The proof of the last equation runs:

μ · Pμ = μ · PE∈ = μ · EE∈ = E∈ · μ = μ · μ.
Finally, consider the relation subset : PA ← PA, defined by subset = ∈\∈. In Rel
we have

x subset y ≡ (∀a : a ∈ x ⇒ a ∈ y).

Note the distinction between subset and ⊆: the former models the inclusion relation
between sets, while the latter is the primitive operation that compares arrows in an
allegory.
The relation subset is a partial order. Reflexivity and transitivity are immediate
from the properties of division, but the proof that subset is anti-symmetric, that is,
subset ∩ subset° = id, requires a little more effort, and in fact holds only for a
unitary tabular allegory. Given this assumption, one can derive the explicit formula

ΛR = (∈\R) ∩ (R\∈)°.

Anti-symmetry of subset then follows from the reflection law Λ∈ = id.
To prove the above equation for Λ, we invoke the rule of indirect equality and
establish that

X ⊆ ΛR ≡ X ⊆ (∈\R) ∩ (R\∈)°

for all X. Assume that (f, g) is a tabulation of X. We reason:

f · g° ⊆ ΛR
≡ {shunting of functions; inclusion of functions is equality}
f = ΛR · g
≡ {fusion}
f = Λ(R · g)
≡ {universal property of Λ}
∈ · f = R · g
≡ {anti-symmetry of ⊆}
(∈ · f ⊆ R · g) and (R · g ⊆ ∈ · f)
≡ {shunting of functions}
(∈ · f · g° ⊆ R) and (R · g · f° ⊆ ∈)
≡ {division}
(f · g° ⊆ ∈\R) and (g · f° ⊆ R\∈)
≡ {converse, meet}
f · g° ⊆ (∈\R) ∩ (R\∈)°.
Exercises
is a preorder if and only if there exists a partial order S and a function f such that
R = f° · S · f.

f ≤ g ≡ ∈ · f ⊆ ∈ · g.
R ⊆ S ≡ wlp S ≤ wlp R.

(Exercise 4.50 will come in handy here.) Finally, show how to associate with any
predicate transformer p : PA ← PB an arrow S : B ← A so that

p ≤ wlp R ≡ wlp S ≤ wlp R.
Bibliographical remarks
The calculus of relations has a rich history, going back to (De Morgan 1860), (Peirce
1870) and (Schröder 1895). The subject as we know it today was mostly shaped
by Tarski and his students in a series of articles, starting with (Tarski 1941). An
overview of the origins of the relational calculus can be found in (Maddux 1991;
Pratt 1992).
Although we have omitted to elaborate on this, the axioms introduced here, namely
those of a tabular allegory that has a unit and power objects, precisely constitute
the definition of a topos. There are many books on topos theory (Barr and Wells
1985; Goldblatt 1986; Mac Lane and Moerdijk 1992; McLarty 1992), and several of
the results quoted here are merely special cases of theorems that can be found in
these books.
During the 1970s the calculus of relations was applied to various areas of computing
science, e.g. (De Bakker and De Roever 1973; De Roever 1972, 1976). This work
culminated in (Sanderson 1980), where it was suggested that relational algebra could
be a unifying instrument in theoretical computing. Our own understanding of the
applications to program semantics has been much influenced by (Hoare and He
1986a, 1986b, 1987; Hoare, He, and Sanders 1987). It is probably fair to say that
until recently, most of this work was based on Tarski's axiomatisation; the standard
reference is (Schmidt and Ströhlein 1993). Related references are (Berghammer
and Zierer 1986; Berghammer, Kempf, Schmidt, and Ströhlein 1991; Desharnais,
Mili, and Mili 1993; Mili 1983; Mili, Desharnais, and Mili 1987). (Mili, Desharnais,
and Mili 1994) is particularly similar to this book, in that it focuses on program
derivation.
A topic that we shall not address in this book is that of executing relational
expressions. Clearly, it would be desirable to do so, but it is as yet unclear what the model
of computation should be. One promising proposal, which involves rewriting using
the axioms of an allegory, has been put forward by (Broome and Lipton 1994).
Chapter 5
Datatypes in Allegories
The solution proposed in this chapter is to define all relevant datatype constructions
in Fun(A), and then extend them in some canonical way to A. In fact, we show
that the base functors of datatypes in Fun(A), the power functor, and type functors
can all be extended to monotonic functors of A. The key to the success of the
whole enterprise is the notion of tabulation introduced in the preceding chapter.
As a result, catamorphisms can be extended to include relational algebras.
5.1 Relators
A relator is a monotonic functor on an allegory, that is, a functor F satisfying
FR ⊆ FS whenever R ⊆ S. As we shall see in Theorem 5.1 below, a relator can also
be characterised as a functor on relations that preserves converse, that is,

(FR)° = F(R°).
As a first step towards proving this result, we prove the following lemma: a
monotonic functor F takes functions to functions, with (Ff)° = F(f°). For a
function f we have

Ff · F(f°) = F(f · f°) ⊆ F id = id
F(f°) · Ff = F(f° · f) ⊇ F id = id.

Now recall Proposition 4.1 of the preceding chapter, which states that R is a function
if and only if there exists an S such that R · S ⊆ id and id ⊆ S · R. Furthermore,
these two inequations imply that S = R°. It follows that Ff is a function with
converse F(f°). □
Now let F be a monotonic functor, and let (f, g) be a tabulation of R, so that
R = f · g°. Using the lemma, we calculate

F(R°) = F((f · g°)°) = F(g · f°) = Fg · F(f°)
(FR)° = (Ff · F(g°))° = (Ff · (Fg)°)° = Fg · (Ff)° = Fg · F(f°).

Thus F(R°) = (FR)°.
For the reverse direction, suppose that F preserves converse; we again use tabulations.
Suppose R ⊆ S, say R = h · k° and S = f · g° with (f, g) jointly monic.
By Proposition 4.2 of the preceding chapter there exists
a function m such that h = f · m and k = g · m. Hence, we can reason:

FR
= {definition of R and F a functor}
Fh · F(k°)
= {F preserves converse}
Fh · (Fk)°
= {definition of m and F a functor}
Ff · Fm · (Fm)° · (Fg)°
⊆ {since Fm is a function and so is simple}
Ff · (Fg)°
= {F preserves converse; F a functor; definition of S}
FS,

and so F is monotonic.
It also follows that a relator is determined by its action on functions: if relators F
and G agree on all functions, then for any R with tabulation (f, g),

FR = F(f · g°) = Ff · (Fg)° = Gf · (Gg)° = GR.
One consequence of Theorem 5.1 is that, when F is a relator, it is safe to write FR°
to mean either (FR)° or F(R°), a convention we henceforth adopt.
Exercises
5.2 Show that any relator preserves meets of coreflexives, that is,

F(X ∩ Y) = FX ∩ FY for coreflexive X and Y.
5.3 Consider the functor F of Fun defined on objects by

FA = ∅ if A = ∅, and FA = {0} otherwise,

and on arrows in the only possible way.
Now suppose F can be extended to a relator on Rel. Consider the constant functions
one, two : {1, 2} ← {0} returning 1 and 2 respectively. Use the definition of F to
show that F(one° · two) = id, where id : {0} ← {0}.
Now use the assumption that F preserves converse to show that F(one° · two) = ∅.
(This exercise shows that not every functor of Fun can be extended to a monotonic
functor of Rel.)
Prove that any relator preserves domains:

F(dom R) = dom (FR).
5.2 Relational products

Let us now see how we can extend the product functor to a relator. Recall from the
discussion of units in the preceding chapter that Fun(A) has products whenever A
is a unitary tabular allegory. Recall also that (outl, outr) is the tabulation of Π and
the pairing operator satisfies

⟨f, g⟩ = (outl° · f) ∩ (outr° · g).

This immediately suggests how pairing might be generalised to relations: define

⟨R, S⟩ = (outl° · R) ∩ (outr° · S). (5.1)

The interpretation of ⟨R, S⟩ in Rel is, of course, that (a, b) ⟨R, S⟩ c when aRc and
bSc. Given (5.1), we can define × in the normal way by

R × S = ⟨R · outl, S · outr⟩. (5.2)

Note that the same sign × is used for the generalised product construction (hereafter
called the relational product) in A as for the categorical product in the subcategory
Fun(A). However, relational product is not a categorical product.
The task now is to show that relational product is a monotonic bifunctor. First,
it is clear that ⟨R, S⟩ is monotonic both in R and S, and from the definition of ×
we obtain by a short proof that (R × S)° = R° × S°. Thus × preserves converse.
Furthermore, × preserves identity arrows (because they are functions), so the nub
of the matter is to show that it also preserves composition, that is,

(R × S) · (U × V) = (R · U) × (S · V).

This result follows from the more general absorption property

(R × S) · ⟨X, Y⟩ = ⟨R · X, S · Y⟩. (5.3)
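Definition (5.1) can be tried out in the Rel model; the sketch below is our own illustration with ad hoc names. Pairing ⟨R, S⟩ relates (a, b) to c precisely when aRc and bSc.

```python
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

A = {1, 2}
B = {'x'}
AxB = {(a, b) for a in A for b in B}
outl = {(a, (a, b)) for (a, b) in AxB}        # outl : A <- A x B
outr = {(b, (a, b)) for (a, b) in AxB}        # outr : B <- A x B
R = {(1, 'c1'), (2, 'c2')}                    # R : A <- C
S = {('x', 'c1')}                             # S : B <- C
split = comp(conv(outl), R) & comp(conv(outr), S)   # (5.1): <R, S>
assert split == {((1, 'x'), 'c1')}            # 1 R c1 and x S c1, but not for c2
```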
One inclusion, (R × S) · ⟨X, Y⟩ ⊆ ⟨R · X, S · Y⟩, follows from the universal property
of meet and monotonicity. The proof in the other direction is a little tricky, so we
will break it into stages. Below, we will give a direct proof of the special case

⟨R · X, Y⟩ ⊆ (R × id) · ⟨X, Y⟩. (5.4)

A symmetrical argument gives

⟨X, S · Y⟩ ⊆ (id × S) · ⟨X, Y⟩, (5.5)

and then we can reason:

⟨R · X, S · Y⟩
⊆ {(5.4)}
(R × id) · ⟨X, S · Y⟩
⊆ {(5.5)}
(R × id) · (id × S) · ⟨X, Y⟩
⊆ {since (R × id) · (id × S) ⊆ (R × S) (exercise)}
(R × S) · ⟨X, Y⟩.
To prove (5.4) we argue:

⟨R · X, Y⟩
= {(5.1)}
(outl° · R · X) ∩ (outr° · Y)
= {claim: outl · (R × id) = R · outl for all R; converse}
((R × id) · outl° · X) ∩ (outr° · Y)
⊆ {modular law}
(R × id) · ((outl° · X) ∩ ((R° × id) · outr° · Y))
⊆ {claim: outr · (R × id) ⊆ outr; converse; monotonicity}
(R × id) · ((outl° · X) ∩ (outr° · Y))
= {(5.1)}
(R × id) · ⟨X, Y⟩.
In the above calculation we appealed to two claims, both of which follow from the
more general facts

outl · ⟨R, S⟩ = R · dom S and outr · ⟨R, S⟩ = S · dom R. (5.6)

Here is the proof of the first:

outl · ⟨R, S⟩
= {(5.1)}
outl · ((outl° · R) ∩ (outr° · S))
= {modular identity, since outl simple; outl · outl° = id}
R ∩ (outl · outr° · S)
= {since outl · outr° = Π}
R ∩ (Π · S)
= {Exercise (4.27)}
R · dom S.
Equation (5.6) indicates why (outl, outr) does not form a categorical product in the
allegorical setting: for any arrow R, we have outl · ⟨R, ∅⟩ = ∅, not R.
A similar calculation yields the cancellation law
⟨R, S⟩° · ⟨X, Y⟩ = (R° · X) ∩ (S° · Y):

⟨R, S⟩° · ⟨X, Y⟩
= {converse, absorption (backwards)}
⟨id, id⟩° · (R° × S°) · ⟨X, Y⟩
= {(5.3)}
⟨id, id⟩° · ⟨R° · X, S° · Y⟩
= {(5.1)}
⟨id, id⟩° · ((outl° · R° · X) ∩ (outr° · S° · Y))
= {distribution over meet, since ⟨id, id⟩ is a function}
(⟨id, id⟩° · outl° · R° · X) ∩ (⟨id, id⟩° · outr° · S° · Y)
= {products}
(R° · X) ∩ (S° · Y).
It is worthwhile observing that all the above equations and inclusions can also be
proved by appeal to the meta-theorem of Section 4.3; such indirect proofs are an
alternative to the calculations given here.
Exercises
An exponential of two objects A and B is an object A^B together with an arrow
eval : A ← A^B × B such that for each f : A ← C × B there is a unique arrow
curry f : A^B ← C satisfying

g = curry f ≡ eval · (g × id) = f.
5.3 Relational coproducts

Fortunately, coproducts are simpler than products, at least in the setting of power
allegories. Let (inl, inr, A + B) be the coproduct of A and B in Fun(A), where A
is a power allegory. Then it is also a coproduct in the whole allegory A:

T · inl = R and T · inr = S
≡ {power transpose isomorphism}
Λ(T · inl) = ΛR and Λ(T · inr) = ΛS
≡ {Λ fusion (backwards)}
ΛT · inl = ΛR and ΛT · inr = ΛS
≡ {coproduct of functions}
ΛT = [ΛR, ΛS]
≡ {Λ-cancellation}
T = ∈ · [ΛR, ΛS].

Hence we can define [R, S] = ∈ · [ΛR, ΛS]. The following diagram illustrates this
calculation:
The border arrows of the diagram also suggest an explicit formula for [R, S]:

[R, S] = (R · inl°) ∪ (S · inr°). (5.9)

The proof of (5.9) is left as Exercise 5.12.
Given the definition of [R, S], we can define the coproduct relator + in the usual
way by

R + S = [inl · R, inr · S]. (5.10)
It is easy to check that + is monotonic in both arguments, so + is a relator. In
analogy with products, we obtain the useful cancellation law

[R, S] · [U, V]° = (R · U°) ∪ (S · V°). (5.11)
The proof is easier than the corresponding cancellation law for products, and details
are left as an exercise.
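The explicit formula (5.9) and the coproduct property can be checked in the Rel model, with tagged pairs standing in for A + B. The tags 'L' and 'R' and the helper names below are our own encoding, not the book's.

```python
def comp(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def conv(R):
    return {(b, a) for (a, b) in R}

A = {1, 2}
B = {'x'}
inl = {(('L', a), a) for a in A}              # inl : A + B <- A
inr = {(('R', b), b) for b in B}              # inr : A + B <- B
R = {('c', 1)}                                # R : C <- A
S = {('c', 'x')}                              # S : C <- B
junc = comp(R, conv(inl)) | comp(S, conv(inr))    # (5.9): [R, S]
assert comp(junc, inl) == R                   # [R, S] . inl = R
assert comp(junc, inr) == S                   # [R, S] . inr = S
```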
Exercises
5.12 Define X = [id, ∅]. Use the universal property of coproducts to show that
X · inl = id and X · inr = ∅. Give the corresponding equations satisfied by
Y = [∅, id], and hence prove that

(inl · X) ∪ (inr · Y) = [inl, inr] = id.

Use this result to establish the explicit formula

[R, S] = (R · inl°) ∪ (S · inr°).
5.13 Prove (5.11). Why not say simply that this result follows by duality from the
corresponding law for products?
5.14 Prove the equation
(R + S) ∩ ([U, V]°·[P, Q]) = (R ∩ (U°·P)) + (S ∩ (V°·Q)).
5.4 / The power relator
Next we show how to extend P to a relator. Recall from the last chapter that, over functions, P was defined as the restriction of existential image to functions: P = EJ. Furthermore, we know that Pf = Λ(f·∈) because ER = Λ(R·∈), and we also know from the final section of the preceding chapter the explicit formula ΛR = (∈\R) ∩ (R\∈)° for Λ. Putting these bits together we obtain
Pf = (∈\(f·∈)) ∩ ((f·∈)\∈)°.
The second term can be rephrased using the fact that (f·R)\S = R\(f°·S), so
Pf = (∈\(f·∈)) ∩ ((∋·f)/∋),
where ∋ is a convenient abbreviation for ∈°. Replacing f by an arbitrary relation R, we define
PR = (∈\(R·∈)) ∩ ((∋·R)/∋).
In Rel, this reads
x(PR)y ≡ (∀a ∈ x : ∃b ∈ y : aRb) ∧ (∀b ∈ y : ∃a ∈ x : aRb).
In words, if x(PR)y, then every element of x is related by R to some element in y and conversely.
To show
PR·PS ⊆ P(R·S),
observe that the right-hand side is the meet of two relations. By the universal properties of meet and division, the result follows if we can prove the inclusions
∈·PR·PS ⊆ R·S·∈
PR·PS·∋ ⊆ ∋·R·S.
Both follow from the definition of P and the cancellation laws of division.
Now for the hard part, which is to show that P(R·S) ⊆ PR·PS. The proof involves tabulations. Let (x, z) be a tabulation of P(R·S) and define
y = Λ((R°·∈·x) ∩ (S·∈·z)).
The first inclusion follows because y is a function and functions are entire. For the second inclusion, we prove that x·y° ⊆ PR, and appeal to the symmetrical argument to obtain y·z° ⊆ PS.
By definition of division, x·y° ⊆ PR is equivalent to
∈·x·y° ⊆ R·∈ and x·y°·∋ ⊆ ∋·R.
For the first of these:
∈·x·y° ⊆ R·∈
≡    {shunting of functions}
∈·x ⊆ R·∈·y
≡    {definition of y, Λ-cancellation}
∈·x ⊆ R·((R°·∈·x) ∩ (S·∈·z))
⇐    {modular law}
∈·x ⊆ (∈·x) ∩ (R·S·∈·z)
≡    {definition of meet}
∈·x ⊆ R·S·∈·z
≡    {shunting of z; division}
x·z° ⊆ ∈\(R·S·∈)
≡    {since x·z° = P(R·S)}
true.
For the second:
x·y°·∋
=    {definition of y, Λ-cancellation}
x·((R°·∈·x) ∩ (S·∈·z))°
⊆    {monotonicity, converse}
x·x°·∋·R
⊆    {since x is simple}
∋·R.
Exercises
(X·α = R·FX)  ≡  (X = ∈·([Λ(R·F∈)])).   (5.12)
The proof is:
X·α = R·FX
≡    {Λ is an isomorphism}
Λ(X·α) = Λ(R·FX)
≡    {Λ-cancellation (backwards)}
Λ(X·α) = Λ(R·F(∈·ΛX))
≡    {relators; Λ fusion (backwards, twice)}
ΛX·α = Λ(R·F∈)·FΛX
≡    {catamorphisms of functions}
ΛX = ([Λ(R·F∈)])
≡    {Λ-cancellation}
X = ∈·([Λ(R·F∈)]).
(Diagram: the catamorphism ([Λ(R·F∈)]) : PA ← T and F([Λ(R·F∈)]) : FPA ← FT, with Λ(R·F∈) : PA ← FPA, ∈ : A ← PA, F∈ : FA ← FPA and R : A ← FA.)
In particular, taking X = ([R]) we obtain
([R]) = ∈·([Λ(R·F∈)]).
Equivalently, Λ([R]) = ([Λ(R·F∈)]). This identity was first exploited in (Eilenberg and Wright 1967) to reason algebraically about the equivalence between deterministic and nondeterministic automata. For this reason we will refer to it subsequently as the Eilenberg-Wright Lemma.
Type relators
Let F be a binary relator with initial type (α, T), so T is a type functor. To show that T is a relator, it is sufficient to prove that it preserves converse:
(TR)° = T(R°)
≡    {definition of T, catamorphisms}
(TR)°·α = α·F(R°, (TR)°)
≡    {since α is an isomorphism}
(TR)° = α·F(R°, (TR)°)·α°
≡    {converse, F relator}
TR = α·F(R, TR)·α°
≡    {as before}
true.
Exercises
5.17 Provided the allegory is Boolean, every coreflexive C can be associated with another coreflexive ~C such that
C ∩ ~C = 0 and C ∪ ~C = id.
Show that
(C → R, S) = [R, S]·guard C
and that
R ⊆ (C → S, T)  ≡  (R·C ⊆ S) and (R·~C ⊆ T).
This useful exercise gives us another way of modelling conditional expressions, and is the one that we will adopt in the future.
5.6 / Combinatorial functions 123
5.18 Let F be a binary relator, with initial type (α, T). Suppose that F preserves meets, i.e.,
F(R ∩ X, S ∩ Y) = F(R, S) ∩ F(X, Y).
Show that T also preserves meets. Note that this proves, for instance, that the list functor preserves meets.
5.19 Prove that ([R]) is entire when R is entire. Hint: use reflection to show that
dom ([R]) = id.
We also take the opportunity to fix some conventions. In the second half of the book
we will be
writing functional programs to solve a number of combinatorial problems
and, since cons-lists are given privileged status in functional programming, we will
stipulate that, in the future, lists mean cons-lists except where otherwise stated.
We will henceforth write list rather than listr to denote the type functor for lists.
Subsequences
subseq = ([nil, cons ∪ outr]).
The function Λsubseq takes a list x and returns the set of all subsequences of x. This definition is very succinct, but is not available in functional programming, which does not allow either relations or sets. To express Λsubseq without using relations,
recall the Eilenberg-Wright lemma, which states that
Λ([R]) = ([Λ(R·F(id, ∈))]).
If we can find e and / such that
singleton set.
To find an appropriate expression for f we will need the power transpose of the join of two relations. This is given by
Λ(R ∪ S) = cup·(ΛR, ΛS),
where cup = Λ((∈·outl) ∪ (∈·outr)) is the function that returns the union of two sets. The proof of the above equation is a simple exercise using the universal property of Λ, and we omit the details.
Now we continue:
f = cup·(Pcons·Λ(id × ∈), outr),
and so
Λsubseq = ([τ·nil, cup·(Pcons·Λ(id × ∈), outr)]).
The final task in implementing Λsubseq is to replace the sets by lists. The result is simple enough to see if we write e and f in the form
e = {[]}
f(a, xs) = {cons(a, x) | x ∈ xs} ∪ xs.
e = [[]]
f(a, xs) = [cons(a, x) | x ← xs] ++ xs.
subseqs = ([wrap·nil, cat·(list cons·cpr, outr)]),
where cpr : list(A × B) ← A × list B (short for "cartesian product, right") is defined by
cpr(a, x) = [(a, b) | b ← x].
To justify the definition of subseqs we need the function setify : PA ← list A that turns a list into the set of its elements:
setify = ([ω, cup·(τ × id)]),
where ω returns the empty set. With the help of setify we can formalise the relationship between each set-theoretic operation and the list-theoretic function that implements it. For instance,
setify·nil = ω
setify·wrap = τ
setify·cat = cup·(setify × setify)
setify·concat = union·setify·list setify
setify·list f = Pf·setify
setify·cpr = Λ(id × ∈)·(id × setify).
In particular,
setify·subseqs = Λsubseq.
Cartesian product
The function cpr used above implements one member of an important class of
combinators associated with cartesian product. Two other functions, cpp and cpl,
defined by
cpp(x, y) = [(a, b) | a ← x, b ← y]
cpl(x, b) = [(a, b) | a ← x],
satisfy
setify·cpp = Λ(∈ × ∈)·(setify × setify)
setify·cpl = Λ(∈ × id)·(setify × id).
All three functions are among the combinators catalogued in the Appendix for useful
point-free programming.
The three functions are examples of a more general pattern. For a relator F, the function
cp(F) : PFA ← FPA
is defined by cp(F) = ΛF(∈). In particular, cpp is an implementation of cp(F) for FA = A × A, and cpl is a similar implementation when FA = A × B.
cp(list) = ([Λ[nil, cons·(∈ × ∈)]]).
Expanding this definition, we obtain
cp(list) = ([Λnil, Λ(cons·(∈ × ∈))]) = ([τ·nil, Pcons·Λ(∈ × ∈)]).
The function cp(list) is implemented by a function cplist : list(list A) ← list(list A), obtained by representing sets by lists in the way we have seen above. The result is:
cplist = ([wrap·nil, list cons·cpp]).
The function cplist is another example of a useful combinator for point-free
programming. We will meet the cp-family again in Section 8.3.
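Under the same list-for-set representation, cpp and cplist can be rendered in Haskell; cplist is the fold ([wrap·nil, list cons·cpp]). The transcription is ours:

```haskell
-- cpp: all pairs, left component from the first list, right from the second.
cpp :: ([a], [b]) -> [(a, b)]
cpp (x, y) = [(a, b) | a <- x, b <- y]

-- cplist: cartesian product of a list of lists, i.e. all ways of
-- choosing one element from each component list.
cplist :: [[a]] -> [[a]]
cplist = foldr step [[]]
  where step x xss = [a : xs | (a, xs) <- cpp (x, xss)]
```

For example, cplist [[1,2],[3,4]] gives [[1,3],[1,4],[2,3],[2,4]].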
The relation prefix describes the prefix relation on lists, so x prefix y when x ++ z = y for some z. Thus, prefix = outl·cat°. Alternatively, we can define prefix = init*, where init* is the reflexive transitive closure of the relation init = outl·snoc° that removes the last element of a list. The reflexive transitive closure R* of a relation R will be defined formally in the following chapter.
We can also describe prefix by a relational catamorphism:
prefix = ([nil, nil ∪ cons]).
The first nil in the catamorphism has type list A ← 1, while the second has type list A ← A × list A. Strictly speaking, we should write the second one in the form nil·!, where ! : 1 ← A × list A.
Λprefix = ([τ·nil, cup·(τ·nil, Pcons·Λ(id × ∈))]).
Replacing sets by lists in the usual way, we obtain an implementation of Λprefix by a function inits defined by
inits = ([wrap·nil, cat·(wrap·nil, list cons·cpr)]).
This translates to two familiar equations:
inits [] = [[]]
inits([a] ++ x) = [[]] ++ [[a] ++ y | y ← inits x].
Note that inits returns a list of initial segments in increasing order of length.
The relation suffix is dual to prefix, but we have to use snoc-lists to describe it as a relational catamorphism. Alternatively, suffix = tail*, where tail = outr·cons° removes the first element from a list. The implementation of Λsuffix is by a function tails that returns a list of tail segments in decreasing order of length. The two equations defining tails are
tails [] = [[]]
tails(x ++ [a]) = [y ++ [a] | y ← tails x] ++ [[]].
This is not a legitimate implementation in most functional languages. Instead, one can use the equations
tails [] = [[]]
tails([a] ++ x) = [[a] ++ x] ++ tails x.
We will see in Section 6.7 how these equations are obtained. Alternatively, tails can be expressed as a catamorphism
tails = ([wrap·nil, extend]),
where extend(a, xss) = [[a] ++ head xss] ++ xss.
We can put inits and tails together to give an implementation of the function Λcat° that splits a list in all possible ways:
splits = zip·(inits, tails).
For this implementation to work it is essential that inits and tails order their
segments in opposite ways.
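The equations for inits and tails, and the definition of splits, transcribe to Haskell as follows. These shadow the similarly named Data.List functions; they are a sketch of the text's equations, not the library versions:

```haskell
-- inits: initial segments in increasing order of length.
inits :: [a] -> [[a]]
inits []      = [[]]
inits (a : x) = [] : [a : y | y <- inits x]

-- tails: tail segments in decreasing order of length.
tails :: [a] -> [[a]]
tails []      = [[]]
tails (a : x) = (a : x) : tails x

-- splits: all ways of splitting a list, implementing the converse of cat.
splits :: [a] -> [([a], [a])]
splits x = zip (inits x) (tails x)
```

Because inits and tails order their segments oppositely, the zip pairs each prefix with the matching suffix.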
Partitions
is defined by partition = concat°, where concat = ([nil, cat]) and cat is restricted to the type list A ← list⁺A × list A. One can also express partition as a catamorphism
partition = ([nil, new ∪ glue]),
where
new = cons·(wrap × id)
glue = cons·(cons × id)·assocl·(id × cons°).
The pointwise definitions are
new(a, xs) = [[a]] ++ xs
glue(a, [x] ++ xs) = [[a] ++ x] ++ xs.
To justify this definition we must verify that
concat·new ⊆ cons·(id × concat)
concat·glue ⊆ cons·(id × concat).
For the second of these:
concat·glue
=    {definition of glue}
concat·cons·(cons × id)·assocl·(id × cons°)
⊆    cons·(id × concat).
We also have to show that [nil, new ∪ glue] is surjective; here one uses
glue·glue° = cons·(cons·cons° × id)·cons°
(Exercise 5.25) to show that (nil·nil°) ∪ (new·new°) ∪ (glue·glue°) = id.
Λpartition = ([Λnil, Λ((new ∪ glue)·(id × ∈))]).
partitions = ([wrap·nil, concat·list(cons·(new, glues))·cpr]),
where glues implements Λglue:
glues(a, []) = []
glues(a, [x] ++ xs) = [[[a] ++ x] ++ xs].
The proof of setify·partitions = Λpartition is left as an exercise.
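A Haskell sketch of partitions under the list-for-set representation (our transcription; news returns the single result of new, so the step combines both alternatives):

```haskell
-- partitions: all ways of breaking a list into non-empty segments.
-- At each element the partition either starts a fresh first segment
-- (news) or glues the element onto the existing first segment (glues).
partitions :: [a] -> [[[a]]]
partitions = foldr step [[]]
  where
    step a xss = concat [news (a, xs) ++ glues (a, xs) | xs <- xss]
    news (a, xs)    = [[a] : xs]      -- always applicable
    glues (_, [])   = []              -- no first segment to extend
    glues (a, x:xs) = [(a : x) : xs]
```

A list of length n has 2^(n-1) partitions, e.g. partitions [1,2] = [[[1],[2]],[[1,2]]].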
Permutations
Finally, consider the relation perm that holds between two lists if one is a
permutation of the other. There are a number of ways to specify perm; perhaps the
simplest is to use the type bag A of finite bags over A as an intermediate datatype.
This type can be described as a functional F-algebra [bnil, bcons] of the functor F(A, B) = 1 + A × B. The function bcons satisfies the property that
bcons(a, bcons(b, x)) = bcons(b, bcons(a, x)),
which in point-free style reads
bcons·(id × bcons) = bcons·(id × bcons)·exch,
where exch : B × (A × C) ← A × (B × C) is the natural isomorphism
exch = assocr·(swap × id)·assocl.
The property captures the fact that the order of the elements in a bag is irrelevant, but duplicates do have to be taken into account. The function bagify : bag A ← list A turns a list into the bag of its elements, and is defined by the catamorphism
bagify = ([bnil, bcons]).
Since every finite bag is the bag of elements of some list, bagify is a surjective function, so bagify·bagify° = id. Now define
perm = bagify°·bagify.
In words, x is a permutation of y if the bag of values in x is equal to the bag of values in y. In particular, it follows at once from the definition that perm = perm°.
The above specification of perm does not lead directly to a functional program for computing Λperm. One possibility is to express perm as a catamorphism perm = ([nil, add]) and then follow the path taken with all the examples given above. It is easy to show (exercise) that
perm·cons = perm·cons·(id × perm),
so we can take add = perm·cons, although, of course, the result is not useful for computing perm. An alternative choice for add is the relation
add = cat·(id × cons)·exch·(id × cat°).
In words, add(a, x) = y ++ [a] ++ z where y ++ z = x, so add(a, x) adds a somewhere to the list x. Although this definition of add is intuitively straightforward, the proof that perm = ([nil, add]) depends on the fact that bags can be viewed as an initial algebra, and we won't go into it.
The function Λadd can be implemented as a function adds, defined by
adds(a, x) = [y ++ [a] ++ z | (y, z) ← splits x],
where splits is the implementation of Λcat° described above. The function perms that implements Λperm is given by
perms = ([wrap·nil, concat·list adds·cpr]).
We will meet perm again in the following chapter when we derive some sorting
algorithms.
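The development above transcribes into Haskell as follows (a hypothetical rendering; splits is written recursively rather than via inits and tails):

```haskell
-- splits: all ways of splitting a list into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits []      = [([], [])]
splits (a : x) = ([], a : x) : [(a : y, z) | (y, z) <- splits x]

-- adds: insert a into every possible position of x.
adds :: a -> [a] -> [[a]]
adds a x = [y ++ [a] ++ z | (y, z) <- splits x]

-- perms: all permutations, by inserting each element everywhere
-- into each permutation of the rest.
perms :: [a] -> [[a]]
perms = foldr step [[]]
  where step a xss = concat [adds a xs | xs <- xss]
```

For example, adds 0 [1,2] = [[0,1,2],[1,0,2],[1,2,0]] and perms [1,2] = [[1,2],[2,1]].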
Exercises
Λ(R ∪ S) = cup·(ΛR, ΛS)
Λ(R ∩ S) = cap·(ΛR, ΛS)
Λ(R × S) = cross·(ΛR × ΛS).
setify·nil = ω
setify·wrap = τ
setify·cons = cup·(τ × setify)
setify·list f = Pf·setify.
catamorphism. How?
5.25 Prove that
new·new° = cons·(wrap·wrap° × id)·cons°
glue·glue° = cons·(cons·cons° × id)·cons°.
Λpartition = ([Λnil, cup·(Pnew, union·PΛglue)·Λ(id × ∈)]),
and hence find another implementation of Λpartition.
5.28 Using bagify·bagify° = id, show that
perm·cons = perm·cons·(id × perm).
5.29 A list x is an interleaving of two sequences y and z if it can be split into a
inequation. Formally, a collection of arrows φ_A : FA ← GA is called a lax natural transformation, and is denoted by φ : F ⇽ G, if
(Diagram: a lax square with φ_A : FA ← GA, φ_B : FB ← GB, FR : FA ← FB and GR : GA ← GB.)
FR·φ ⊇ φ·GR.   (5.13)
The main result concerning lax natural transformations is the following theorem.
Theorem 5.2 Let F, G : A ← B be relators and J : B ← Fun(B) the inclusion of functions into relations. Then φ : F ⇽ G  ≡  φ : FJ ← GJ.
Proof. First, assume that φ : F ⇽ G, so in particular we have Ff·φ ⊇ φ·Gf. But also
Ff·φ ⊆ φ·Gf
≡    {shunting of functions}
φ·Gf° ⊆ Ff°·φ
⇐    {inequation (5.13) with R = f°}
true,
and so Ff·φ = φ·Gf for all f.
Conversely, assume that φ : FJ ← GJ, so Fg·φ = φ·Gg for all functions g. By shunting of functions, this gives Fg°·φ ⊇ φ·Gg°, since F and G are relators and thus preserve converse.
134 5 / Datatypes in Allegories
FR·φ
=    {let (f, g) be a tabulation of R}
F(f·g°)·φ
=    {relators}
Ff·Fg°·φ
⊇    {above}
Ff·φ·Gg°
=    {since φ : FJ ← GJ}
φ·Gf·Gg°
=    {relators}
φ·G(f·g°)
=    {since f·g° tabulates R}
φ·GR.
□
Exercises
5.30 Each of the following pairs is related by ⊆ or ⊇. State in each case what the relationship is:
PR·τ and τ·R
(R × R)·(id, id) and (id, id)·R
Bibliographical Remarks
The notion of a relator was first introduced in (Kawahara 1973a); the concept then went unnoticed for a long time, until it was reinvented in (Carboni, Kelly, and Wood 1991). Almost simultaneously, Backhouse and his colleagues started to write a series of papers that demonstrated the relevance of relators to computing (Backhouse et al. 1991, 1992; Backhouse and Hoogendijk 1993). The discussion in this chapter owes much to that work. Our definition of relators is more restrictive than that of (Mitchell and Scedrov 1993).
Several authors have considered the use of relational products in the context of
program derivation, e.g. (De Roever 1976). The (sometimes excruciating) symbol
manipulations that result from their introduction can be made more attractive by adopting a graphical notation (Brown and Hutton 1994; Curtis and Lowe 1995).
As already mentioned in the text, relational algebras and catamorphisms were
first employed in (Eilenberg and Wright 1967) to reason about the equivalence
of deterministic and nondeterministic automata. This work was further amplified
in (Goguen and Meseguer 1983). Numerous examples of relational catamorphisms
can be found in (Meertens 1987), and these examples awakened our own interest
in the topic (Bird and De Moor 1993c; De Moor 1992a, 1994). The circuit design
language Ruby is essentially a relational programming language based on
catamorphisms (Jones and Sheeran 1990).
Recursive Programs
The recursive programs we have seen so far have all been based, one way or the
other, on catamorphisms. But not all the recursions that arise in programming
are homomorphisms of a datatype. For example, one may want to implement the
converse of a catamorphism, or a divide and conquer scheme.
deriving fast exponentiation, one or two sorting algorithms, and an algorithm for
computing the closure of a relation.
The problem in this section is simply to convert a natural number to its decimal
representation. The decimal representation of a nonzero natural number is a list of
digits starting with a nonzero digit. The representation of zero is exceptional in this
respect, in that it is a list with one element, namely zero itself. Having observed
this anomaly, we shall concentrate on deriving an algorithm for converting positive
natural numbers.
The first step is to specify the problem formally. The types involved are four in number: the type Nat⁺ of positive natural numbers; the type Digit = {0, 1, ..., 9} of digits; the type Digit⁺ = {1, 2, ..., 9} of nonzero digits; and finally the type Decimal of decimal representations, the initial type of the functor
FA = Digit⁺ + (A × Digit).
The function val : Nat⁺ ← Decimal is a catamorphism
val = ([embed, op]),
where embed : Nat⁺ ← Digit⁺ is the inclusion of digits into natural numbers, and op(n, d) = 10n + d. To check this, let x be the decimal
x = snoc(snoc(wrap d2, d1), d0).
Then we have
val x = 10(10 d2 + d1) + d0 = 10² d2 + 10¹ d1 + 10⁰ d0.
We can now specify the function digits, which takes a number and returns its decimal representation, by digits ⊆ val°. We reason:
val°
=    {definition}
([embed, op])°
=    {catamorphisms}
([embed, op]·F val·[wrap, snoc]°)°
=    {converse}
[wrap, snoc]·F val°·[embed, op]°
=    {definition of F}
[wrap, snoc]·(id + val° × id)·[embed, op]°
=    {coproduct}
[wrap, snoc·(val° × id)]·[embed, op]°
=    {coproduct}
(wrap·embed°) ∪ (snoc·(val° × id)·op°).
Hence val° satisfies the recursive equation
val° = (wrap·embed°) ∪ (snoc·(val° × id)·op°).
6.1 j Digits of a number 139
In order to see what relation is given by op° : Nat⁺ × Digit ← Nat⁺, we reason:
op(n, d) = m
≡    {definition of op}
10n + d = m
≡    {arithmetic and 0 ≤ d < 10}
n = m div 10 ∧ d = m mod 10.
To obtain the right type for op° we need to ensure that 0 < n in the above calculation, and this means that we require 10 ≤ m as a precondition. So op° is a partial function, defined if and only if its argument is at least 10. On the other hand, embed° is also a partial function, defined if and only if its argument is less than 10. The join in the recursive equation for val° can therefore be replaced by a conditional expression, and we obtain
digits m = [m], if m < 10
digits m = digits(m div 10) ++ [m mod 10], otherwise.
The program runs in quadratic time because the implementation of snoc on lists takes linear time. To obtain a linear-time program we can introduce an accumulating parameter, defining digits m = f(m, []), where
f(m, x) = [m] ++ x, if m < 10
f(m, x) = f(m div 10, [m mod 10] ++ x), otherwise.
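Both versions of the digits program can be written in Haskell (our transcription; the accumulating-parameter version is the linear-time one):

```haskell
-- Quadratic version: the snoc (++ [d]) costs linear time at each step.
digits :: Integer -> [Integer]
digits m
  | m < 10    = [m]
  | otherwise = digits (m `div` 10) ++ [m `mod` 10]

-- Linear version: accumulate the digits already produced.
digitsFast :: Integer -> [Integer]
digitsFast m = f m []
  where
    f n x
      | n < 10    = n : x
      | otherwise = f (n `div` 10) (n `mod` 10 : x)
```

Both give digits 1203 = [1,2,0,3]; only the cost differs.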
Simple as it is, the digits of a number example illustrates a basic strategy for program derivation using a relational calculus. First, a function of interest is specified as a refinement of some relation R. Then, after due process of manipulation, R is
Exercises
The key result for reasoning about least fixed points is the celebrated Knaster-Tarski theorem (Knaster 1928; Tarski 1955), which in our terminology is as follows: each of the equations φX ⊆ X and X = φX has a least solution, and these least solutions coincide. Dually, each of the equations X ⊆ φX and X = φX has a greatest solution, and these greatest solutions coincide.
Proof. Let 𝒳 = {X | φX ⊆ X} and define R = ⋂𝒳. We first show that φR ⊆ R, or, equivalently, that X ∈ 𝒳 implies φR ⊆ X. We reason:
X ∈ 𝒳
⇒    {definition of R}
R ⊆ X
⇒    {φ monotonic}
φR ⊆ φX
⇒    {since φX ⊆ X}
φR ⊆ X.
But now, since R ∈ 𝒳, it follows that X = R is the least solution of φX ⊆ X. It remains to show that R ⊆ φR:
R ⊆ φR
⇐    {definition of R}
φ(φR) ⊆ φR
⇐    {since φ monotonic}
φR ⊆ R
≡    {above}
true.
□
6.2 / Least fixed points
Let us now consider what the Knaster-Tarski theorem says about datatypes and catamorphisms:
X = ([R])  ≡  X·α = R·FX,
Exercises
6.2 Where in the proof of the Knaster-Tarski theorem did we use the locally
complete property of the allegory?
6.3 Say that φ is continuous if it preserves joins of ascending chains of relations. That is, if X0 ⊆ X1 ⊆ X2 ⊆ ..., then φ(⋃{Xn | 0 ≤ n}) = ⋃{φXn | 0 ≤ n}. Prove Kleene's theorem (Kleene 1952), which states that, under the conditions of the Knaster-Tarski theorem and the assumption that φ is continuous,
(μX : φX) = ⋃{φⁿ0 | 0 ≤ n},
where φⁿX = φ(φ(⋯(φX)⋯)) (n times).
6.4 Use the Knaster-Tarski theorem to justify the following method for showing (μX : φX) ⊆ A: show that φA ⊆ A.
Use Kleene's theorem to justify the alternative method: show that X ⊆ A implies φX ⊆ A. This method is called fixed point induction.
6.5 If φ is a monotonic mapping, then the least solution of φX ⊆ X satisfies φX = X. Show that this is a special case of Lambek's lemma when the partial
6.3 Hylomorphisms
The composition of a catamorphism with the converse of a catamorphism is called
a hylomorphism. Thus hylomorphisms are expressions of the form ([/?]) ([5])°.
Hylomorphisms are important because they capture the idea of using an intermediate
data structure in the solution of a problem.
More precisely, suppose that R : A <- FA and also that S : B <- FB. Then we have
([#]) ([S])0 : A <r- B, where ([/?]): A *- T and ([5])° : T *- B, and where T is the
initial type of F. The type T is the intermediate data structure.
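Restricted to functions, this scheme has a familiar rendering in Haskell: unfold a seed with the coalgebra (the converse of a catamorphism), then fold with the algebra, without ever building the intermediate structure as a whole. The names below are ours, not the book's:

```haskell
-- A hylomorphism: g produces one layer of the intermediate datatype,
-- h consumes one layer of results.
hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
hylo h g = h . fmap (hylo h g) . g

-- Base functor FX = 1 + A x X, the shape of non-empty countdowns.
data NatF a x = Zero | Succ a x
instance Functor (NatF a) where
  fmap _ Zero       = Zero
  fmap k (Succ a x) = Succ a (k x)

-- factorial as a hylomorphism: the (virtual) intermediate structure
-- is the list n, n-1, ..., 1.
fact :: Integer -> Integer
fact = hylo alg coalg
  where
    coalg 0 = Zero
    coalg n = Succ n (n - 1)
    alg Zero       = 1
    alg (Succ n r) = n * r
```

The recursion terminates because the coalgebra eventually produces Zero, which is exactly the inductiveness condition discussed later in the chapter.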
Theorem 6.2 Suppose that R : A ← FA and S : B ← FB are two F-algebras. Then ([R])·([S])° : A ← B is given by
([R])·([S])° = (μX : R·FX·S°).
Proof. First we show that ([R])·([S])° is a fixed point:
R·F(([R])·([S])°)·S°
=    {relators}
R·F([R])·F([S])°·S°
=    {catamorphisms}
([R])·α·F([S])°·S°
=    {converse; catamorphisms}
([R])·([S])°.
Next we show that it is the least solution of R·FX·S° ⊆ X, and appeal to Knaster-Tarski. The proof makes use of the division operation:
([R])·([S])° ⊆ X
≡    {division}
When FX = GX + HX, so F-algebras are coproducts, we can appeal to the following corollary of Theorem 6.2:
Corollary 6.1
([R₁, R₂])·([S₁, S₂])° = (μX : (R₁·GX·S₁°) ∪ (R₂·HX·S₂°)).
This follows because
[R₁, R₂]·(GX + HX)·[S₁, S₂]°
=    {coproduct}
[R₁·GX, R₂·HX]·[S₁, S₂]°
=    {coproduct}
(R₁·GX·S₁°) ∪ (R₂·HX·S₂°).
Exercises
6.9 Specify the function that converts the binary representation of a number to its
octal representation as a hylomorphism.
6.10 Show that hylomorphisms preserve simplicity: if R is simple and S is simple, then ([R])·([S])° is simple.
exp a = ([one, mult a]).
This definition encapsulates the two equations a⁰ = 1 and a^(b+1) = a × a^b. The computation of exp a b by the catamorphism takes O(b) steps, but by using a divide and conquer scheme we can improve the running time to O(log b) steps.
To derive the fast exponentiation algorithm consider the type Bin of binary numbers, defined by Bin = listl Bit, where Bit = {0, 1}. For example, as an element of Bin the number 6 is given as [1, 1, 0]. The partial function convert : Nat ← Bin converts a well-formed binary number, that is, a sequence of bits that is either empty or begins with a 1, into a natural number and is given by a snoc-list catamorphism
convert = ([zero, shift]),
where shift : Nat ← Nat × Bit is given by shift(n, d) = 2 × n + d.
exp a
⊇    {since convert is simple}
exp a·convert·convert°
=    {fusion, see below}
([one, op a])·convert°
=    {corollary to hylomorphism theorem}
(μX : (one·zero°) ∪ (op a·(X × id)·shift°)).
The fusion step is justified by the equation
exp a·shift = op a·(exp a × id),
where op a(x, d) = x² × a^d.
By construction, zero and shift have disjoint ranges, so we can proceed as in the
digits of a number example and replace the join by a conditional expression. The
result is the following program for computing exp a b:
a^b = 1, if b = 0
a^b = (a^(b div 2))² × a^(b mod 2), otherwise.
Modulus computation
Exactly the same idea can be used to compute a mod b for natural a and positive natural b. The curried function mod : (Nat ← Nat) ← Nat⁺ is defined by the catamorphism
mod b = ([zero, succ_b]),
where succ_b n = (n = b − 1 → 0, n + 1). The computation of a mod b by this method takes O(a) steps. But, as before, we can argue:
mod b
⊇    {since convert is simple}
mod b·convert·convert°
=    {fusion, see below}
([zero, op b])·convert°
=    {hylomorphisms}
(μX : (zero·zero°) ∪ (op b·(X × id)·shift°)).
The fusion step is justified by the equation
mod b·shift = op b·(mod b × id),
where op b(r, d) = (n ≥ b → n − b, n) and n = 2 × r + d.
Replacing the join by a conditional expression as before yields a program that computes a mod b in O(log a) steps, with one case for a = 0 and one depending on whether a is odd.
These simple exercises demonstrate how divide and conquer schemes can be
introduced by invoking a suitable intermediate datatype.
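The two divide and conquer programs can be transcribed into Haskell (the names and the explicit recursion are ours):

```haskell
-- Fast exponentiation: O(log b) multiplications, driven by the
-- binary representation of the exponent.
fastExp :: Integer -> Integer -> Integer
fastExp a b
  | b == 0    = 1
  | otherwise = let x = fastExp a (b `div` 2)
                in x * x * (if odd b then a else 1)

-- Modulus by the same scheme: O(log a) steps, using only doubling,
-- addition of a bit, and one comparison per step.
fastMod :: Integer -> Integer -> Integer
fastMod a b
  | a == 0    = 0
  | otherwise = let n = 2 * fastMod (a `div` 2) b + a `mod` 2
                in if n >= b then n - b else n
```

The invariant for fastMod is that the recursive result is below b, so n < 2b and a single subtraction suffices.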
Exercises
6.11 Why do the programs for exponentiation and modulus terminate and deliver
functions?
Consider the equation
X = ([zero, id])·([zero, positive])°
on natural numbers, where positive is the coreflexive that holds only on positive integers (so positive = succ·succ°). The catamorphism ([zero, id]) describes the constant function that always returns zero, and ([zero, positive]) describes the coreflexive that holds only on zero. Hence X is again the coreflexive that holds only on zero. Now X satisfies
X = [zero, id]·(id + X)·[zero, positive]°,
which simplifies to X = (zero·zero°) ∪ (X·positive). This equation has other solutions, including X = id.
Note also that [zero, positive]° : 1 + Nat ← Nat is a function, as is [zero, id], but the least solution of the recursion is not even entire. However, Exercise 6.10 shows that if R and S are simple relations, then so is (μX : R·FX·S°).
It is important to know when a recursion equation X = R·FX·S° has a unique solution, and when the solution is a function. It is not sufficient for R and S° to be functions, as we saw above. The condition is simple to state: we need the fact that member(F)·S° is an inductive relation, where member(F)_A : A ← FA is the membership relation for the datatype FA. The two sections that follow explain the essential ideas without going into full details.
6.5 / Unique fixed points
Inductive relations
A relation R : A ← A is inductive if
R\X ⊆ X ⇒ Π ⊆ X
for all X : A ← B. At the point level, this definition says that if
(∀c : cRa ⇒ cXb) ⇒ aXb
for all a and b, then aXb holds for all a and b.
To take a simple example, let < be the usual ordering on natural numbers. For fixed b, the implication
aXb. As another example, take the relation tail = outr·cons°. The induction principle here is that if a relation holds for a list x whenever it holds for tail x, then it holds for every list.
There is another way to define the notion of an inductive relation, but it requires the allegory to be Boolean. A relation R : A ← A is well-founded if
X ⊆ X·R ⇒ X ⊆ 0
for all X : B ← A. This corresponds to the set-theoretic notion that there are no infinite chains a0, a1, ... such that a(i+1) R ai for all i ≥ 0. If a relation is inductive, then it is also well-founded, but the converse holds only in a Boolean allegory.
Membership
The other key idea is membership. Datatypes record the presence of elements, so one would expect a relator F to come equipped with a membership arrow member_A : A ← FA. Not every relator has one, but all the relators that arise in programming (the polynomial relators, the power relator, and the type relators) do. Here are the membership relations for the polynomial relators, in which we write member(F) to emphasise the dependence on F:
member(id) = id
member(K_A) = 0
member(F + G) = [member(F), member(G)]
member(F × G) = (member(F)·outl) ∪ (member(G)·outr)
member(F·G) = member(G)·member(F).
Most of these are intuitively obvious, given the informal idea of membership. For
example, in Rel the relator FA = A × A returns pairs of elements, and x is a member of a pair (y, z) if x = y or x = z. On the other hand the constant relator K_A(B) = A
The membership relation for the power relator is ∈, as one would expect. That leaves the membership relation for type relators. In a power allegory, the problem of defining the membership relation for a type functor T is the same problem as defining setify for the type. We have
member(T) = ∈·setify(T)
setify(T) = Λ member(T).
There is an alternative method (see Exercise 6.17) for defining the membership
relation of type functors that does not depend on sets.
So far we have not said what it means for a relation to be a membership relation.
One might expect that the formal definition would be straightforward, but in fact
it is not and we will not discuss it in the text (but see Exercise 6.18). If F does have
a membership relation member, then
R·member ⊇ member·FR.
Consequences
The central result about the existence of inductive relations is that member (F) a°
is inductive, where a is the initial F-algebra. For example, consider the initial type
([zero, succ], Nat) of the functor FX = 1 + X. The membership relation here is
[0, id], so we now know that
The theorem referred to earlier about unique solutions is the following one.
entire.
Corollary 6.2 Suppose member(F)·g is inductive. Then the unique solution of X = f·FX·g is a function.
Proof. The unique solution is X = ([f])·([g°])°, which is entire by the theorem, since f and g are. But Exercise 6.10 shows that the solution is also simple, since f and g are. □
For the next result, recall that R is surjective if id C R R°. Thus, R is surjective
if and only if R° is entire.
Proof. X = α·FX·R° has the unique solution X = ([R])°. □
Using these results, we can now prove the theorem used in Section 5.6 to justify the
definition of partition as a catamorphism.
([R]) ⊆ f°
≡    {shunting and ([α]) = id}
f·([R]) ⊆ ([α])
⇐    {fusion}
f·R ⊆ α·Ff
⇐    {assumption}
true.
Next,
f° ⊆ ([R])
⇐    {claim: ([R]) is surjective}
([R])·([R])°·f° ⊆ ([R])
⇐    {since f·([R]) ⊆ id from above; converse}
true.
By Corollary 6.3 the claim follows by showing that member·R° is inductive. But
member·R°
⊆    {since f·R ⊆ α·Ff, shunting}
member·Ff°·α°·f
⊆    {since member : id ⇽ F}
f°·member·α°·f.
Exercises
6.15 Show that the meet of two inductive relations is again inductive. Give a counter-example to show that the join of two inductive relations need not be inductive.
FR·(member\id) = member\R
for all R. Show that F has a membership relation member if and only if
FR·(member\S) = member\(R·S)
for all R and S.
6.19 Assume that id is the largest lax natural transformation of type id ⇽ id, and that relator F has a membership relation member. Show that member is the largest lax natural transformation of type id ⇽ F.
6.20 Prove that for any relators F and G, the relation member(F)\member(G) is the largest lax natural transformation of type F ⇽ G.
6.21 Prove that in a Boolean allegory member(F) is entire if and only if F0 = 0.
terminology is total, but this invites confusion with the quite different idea of an entire relation. The function sort is specified by
sort ⊆ ordered·perm,
where perm was defined in the preceding chapter, and ordered is the coreflexive that tests whether a list is ordered under R.
ordered = ([nil, cons·ok]),
where the coreflexive ok is defined by the corresponding predicate
ok(a, x) = (∀b : b inlist x : aRb).
The relation inlist : A <— list A is the membership relation for lists. Thus ordered
rebuilds its argument list, ensuring at each step that only smaller elements are
added to the front. There is an alternative definition of ok, namely,
ok(a, x) = (x = [] ∨ aR(head x)),
but this definition turns out not to be so useful for our purposes.
Selection sort
ordered·perm
=    {since perm = perm° and ordered = ordered°}
(perm·ordered)°
=    {since ordered = ([nil, cons·ok])}
(perm·([nil, cons·ok]))°
=    {fusion, using perm·cons = perm·cons·(id × perm) (Section 5.6) and the claim (id × perm)·ok = ok·(id × perm) (Exercise 6.22)}
([nil, perm·cons·ok])°.
The hylomorphism theorem now applies, with
select = ok·cons°·perm.
In words, select is defined by the rule that if (a, y) = select x, then [a] ++ y is a permutation of x with aRb for all elements b of y. The relation select is not a function because it is undefined on the empty list. But we do want it to be a
The functions base and step are specified by the fusion conditions, the first of which reads
base ⊆ ok·cons°·perm·wrap.
They are implemented by
base a = (a, [])
step(a, (b, x)) = (a, [b] ++ x), if aRb
step(a, (b, x)) = (b, [a] ++ x), otherwise.
X = (nil·nil°) ∪ (cons·(id × X)·select),
so we can implement sort by
sort x = [], if x = []
sort x = [a] ++ sort y, otherwise
where (a, y) = select x.
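The derived program transcribes into Haskell as follows (we call it ssort to avoid clashing with the Prelude, and specialise the preorder R to <=):

```haskell
-- select: extract a minimal element together with the remaining list.
-- Undefined on the empty list, as in the text.
select :: Ord a => [a] -> (a, [a])
select [a]     = (a, [])
select (a : x) = step a (select x)
  where
    step c (b, y)
      | c <= b    = (c, b : y)   -- c is the new minimum
      | otherwise = (b, c : y)   -- b stays minimal
select []      = error "select: empty list"

-- Selection sort: repeatedly select a minimum.
ssort :: Ord a => [a] -> [a]
ssort [] = []
ssort x  = let (a, y) = select x in a : ssort y
```

The recursion terminates because select returns a strictly shorter remainder.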
Quicksort
development of Hoare's quicksort (Hoare 1962), which follows the path of selection
sort quite closely.
flatten = ([nil, join]),
where join(x, a, y) = x ++ [a] ++ y. Thus flatten produces a list of the elements in a tree in left to right order.
ordered · perm
⊇    {since flatten is a function}
ordered · flatten · flatten° · perm
=    {claim: ordered · flatten = flatten · inordered (see below)}
flatten · inordered · flatten° · perm
=    {converses}
flatten · (perm · flatten · inordered)°
⊇    {fusion, for an appropriate definition of split}
flatten · ([nil, split°])°.
an intermediate datatype.
inordered = ([null, fork · check]),
where the coreflexive check holds for (x, a, y) if
To establish this condition we need the coreflexive check' that holds for (x,a,y) if
We now reason:
Formal proofs of the three claims are left as exercises. In words, split is defined
by the rule that if (y, a, z) = split x, then y ++ [a] ++ z is a permutation of x with
bRa for all b in y and aRb for all b in z. As in the case of selection sort, we can
implement split with a catamorphism on non-empty lists:

split = ([base, step]) · embed

base a = ([], a, [])

step (a, (x, b, y)) = ([a] ++ x, b, y),    if aRb
                    = (x, b, [a] ++ y),    otherwise.
X = (nil · nil°) ∪ (join · (X × id × X) · split).
sort x = [],                             if x = []
       = sort y ++ [a] ++ sort z,        otherwise, where (y, a, z) = split x.
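As with selection sort, the derived quicksort transcribes directly into Haskell; again R is instantiated to (<=) as an assumption made here for illustration.

```haskell
-- split partitions a non-empty list around a pivot, following base and step
split :: Ord a => [a] -> ([a], a, [a])
split [a] = ([], a, [])
split (a : rest)
  | a <= b    = (a : x, b, y)   -- a goes to the left of the pivot b
  | otherwise = (x, b, a : y)   -- a goes to the right
  where (x, b, y) = split rest

sort :: Ord a => [a] -> [a]
sort [] = []
sort xs = sort y ++ [a] ++ sort z
  where (y, a, z) = split xs
```

The tree of the derivation is implicit in the recursion: split builds a node (y, a, z), and the two recursive calls flatten its subtrees.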
The derivation of quicksort is thus very similar to that of selection sort except for
the introduction of trees as an intermediate datatype.
Exercises
6.22 Using Exercise 6.19, show that inlist = inlist · perm. Give a point-free
definition of ok. Using the preceding exercise, prove that
(id × perm) · ok = ok · (id × perm).
6.23 Why is the recursion in the programs for selection sort and quicksort
guaranteed to terminate?
6.24 Writing ordered R to show explicitly the dependence on the preorder R, prove
6.25 Consider the problem of sorting a (finite) set. Why is the second of the
specifications
sort ⊆ ordered R · setify°
sort ⊆ ordered (R ∩ neq) · setify°,

more sensible? The relation neq satisfies a neq b if a ≠ b. Develop the second
specification to a version of selection sort, assuming that the input is presented as
a list possibly with duplicates.
6.26 Sort using the type tree A as in quicksort, but changing the definition of
flatten = ([nil, join]) by taking join (x, a, y) = [a] ++ x ++ y.
merge ([], y) = y
merge (x, []) = x

merge ([a] ++ x, [b] ++ y) = [a] ++ merge (x, [b] ++ y),    if aRb
                           = [b] ++ merge ([a] ++ x, y),    otherwise.
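The merge above is the heart of merge sort. A hedged Haskell sketch follows; the driver msort and its halving strategy are assumptions made here, not taken from the text.

```haskell
-- merge combines two sorted lists, with R instantiated to (<=)
merge :: Ord a => ([a], [a]) -> [a]
merge ([], y) = y
merge (x, []) = x
merge (a : x, b : y)
  | a <= b    = a : merge (x, b : y)
  | otherwise = b : merge (a : x, y)

-- a standard top-down merge sort built on merge (assumed driver)
msort :: Ord a => [a] -> [a]
msort []  = []
msort [a] = [a]
msort xs  = merge (msort l, msort r)
  where (l, r) = splitAt (length xs `div` 2) xs
```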
6.29 What goes wrong if one attempts to sort using the intermediate datatype
tree A ::= null | tip A | fork (tree A, tree A)?
6.7 / Closure    157
add = cat · (id × cons) · exch · (id × cat°),

and exch = assocr · (swap × id) · assocl. Using this characterisation of perm, we can
reason:

ordered · perm
=    {using perm = ([nil, add])}
6.7 Closure
A good illustration of the problem of how to compute the least fixed point of a
recursion equation, when other fixed points may exist, is provided by relational
closure. For every relation R : A ← A, there exists a smallest preorder R* containing
R, called the reflexive transitive closure of R. Our primary aim in this section is
to show how to compute E(R*) : PA ← PA whenever the result is known to be a
finite set (so the computation will terminate). Many graph algorithms make use of
such a computation, for instance in determining the set of vertices reachable from
a given vertex.
R ⊆ X ≡ R* ⊆ X for all preorders X.

R* = (μX : id ∪ (X · R))     (6.7)
R* = (μX : id ∪ (R · X)).    (6.8)
The proof that these definitions coincide is left as an exercise.
id ⊆ id ∪ (S · R) ⊆ S,
R ⊆ id ∪ R ⊆ id ∪ (S · R) ⊆ S,

S · S ⊆ S
≡    {left-division}
S ⊆ S\S
⇐    {definition of S}
id ∪ ((S\S) · R) ⊆ S\S
≡    {division}
S · (id ∪ ((S\S) · R)) ⊆ S
≡    {composition over join}
S ∪ (S · (S\S) · R) ⊆ S
⇐    {cancellation of division}
S ∪ (S · R) ⊆ S
≡    {definition of S}
true.
Note the similarity of the proof to that of Theorem 6.2, with a switch of division
operator. Finally, suppose X is a preorder that contains R. Then we have

id ∪ (X · R) ⊆ id ∪ (X · X) ⊆ X,

and so S ⊆ X.
Computing closure

Λsuffix = cup · (τ, Λ(suffix · tail)).
Using the fact that tail is not defined on the empty list, we can introduce a case
analysis, obtaining

(Λsuffix) []         = {[]}
(Λsuffix) ([a] ++ x) = {[a] ++ x} ∪ (Λsuffix) x.

Representing sets by lists in the usual way, we now obtain the following recursive
program for computing the function tails of Section 5.6:

tails []         = [[]]
tails ([a] ++ x) = [[a] ++ x] ++ tails x.
All this is very straightforward, but the method only works because the basic
recursion equation has a unique solution. In this section we show how to compute
Λ(R*) when R is not an inductive relation.
possible.
R - S ⊆ T ≡ R ⊆ S ∪ T.

R - 0 = R
R ∪ S = R ∪ (S - R)
R - (S ∪ T) = R - S - T
(R ∪ S) - T = (R - T) ∪ (S - T).

In the third identity the subtraction operator is assumed to associate to the left, so
R - S - T = (R - S) - T. Use of these rules will be signalled just with the hint
subtraction.
We will also make use of the following property of least fixed points, called the
rolling rule:

(μX : φ(ψX)) = φ(μX : ψ(φX)).

The proof of the rolling rule is left as an exercise, as are two other identities for

R* · S = (μX : S ∪ (R · X))

This too is left as an exercise.
Since

θ(P, 0) = P ∪ (μX : R · X - P) = P ∪ 0 = P,

it follows that θ(P, 0) = P. We can also obtain an expression for θ(P, Q), arguing:

θ(P, Q)
=    {definition}
P ∪ (μX : Q ∪ (R · X - P))
=    {subtraction}
P ∪ (μX : Q ∪ (R · X - P - Q))
=    {rolling rule, with φX = Q ∪ X and ψX = R · X - P - Q}
P ∪ Q ∪ (μX : R · (Q ∪ X) - P - Q)
=    {subtraction}
P ∪ Q ∪ (μX : (R · Q - P - Q) ∪ (R · X - P - Q))
=    {definition of θ}
θ(P ∪ Q, R · Q - P - Q).
θ(0, S) = R* · S
θ(P, 0) = P
θ(P, Q) = θ(P ∪ Q, R · Q - P - Q).

These three equations can be used in a recursive method for computing R* · S:
compute θ(0, S), where

θ(P, Q) = P,                             if Q = 0
        = θ(P ∪ Q, R · Q - P - Q),       otherwise.
The algorithm will not terminate unless, regarded as a set of pairs, R* · S is a finite
relation; under this restriction the algorithm will terminate because the size of P
increases at each recursive step. If it does terminate, then the three properties of θ
given above guarantee that the result is R* · S.
The above algorithm is non-standard in that relations appear as data objects. But
we can rewrite the algorithm to use sets as data rather than relations. Think of
the relations P, Q and S as being elements (that is, having type A ← 1, where
R : A ← A is given) and let p = ΛP, q = ΛQ and s = ΛS. Then p, q and s are the
corresponding sets and, recalling that Λ(R* · S) = E(R*) · ΛS, we obtain the
following method for computing E(R*) s: compute close (0, s), where

close (p, q) = p,                                 if q = 0
             = close (p ∪ q, (ER) q - p - q),     otherwise.

In this algorithm the operations are set-theoretic rather than relational; thus ∪ is
set union and (-) is set difference. As before, the algorithm is not guaranteed to
terminate unless the closure of s under R is a finite set.
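The set-based algorithm can be sketched in Haskell, representing sets by duplicate-free lists and the relation R by a successor function step — an assumed representation chosen here for the sketch, not prescribed by the text.

```haskell
import Data.List (nub, (\\))

-- close (p, q): p holds elements already processed, q is the frontier;
-- the result is the closure of the initial frontier under step
close :: Eq a => (a -> [a]) -> ([a], [a]) -> [a]
close _    (p, []) = p
close step (p, q)  = close step (p ++ q, nub (concatMap step q) \\ (p ++ q))

-- the set of vertices reachable from s, including s itself (R* is reflexive)
reachable :: Eq a => (a -> [a]) -> [a] -> [a]
reachable step s = close step ([], s)
```

Termination mirrors the text: the accumulated set p grows at each step, so the computation terminates exactly when the closure is finite.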
Exercises
6.35 The μ-calculus. There are just two defining properties of (μX : φX):

φ(μX : φX) = (μX : φX)
φY ⊆ Y ⇒ (μX : φX) ⊆ Y.

The first one states that (μX : φX) is a fixed point of φ, and the second one states
that (μX : φX) is a lower bound on all fixed points. Use these two rules to give a
proof of the rolling rule:

(μX : φ(ψX)) = φ(μX : ψ(φX)).

The diagonal rule of the μ-calculus states that

(μX : μY : φ(X, Y)) = (μX : φ(X, X)).

Prove the diagonal rule.
(R ∪ S)* = R* · (S · R*)*

6.37 Using the preceding exercise, show that for any coreflexive C

R · C = C ⇒ R* = (R · ~C)*,

where ~C is defined in Exercise 5.17.
Bibliographical remarks
The first account of the Knaster-Tarski fixed point theorem occurs in (Knaster
1928). That version only applied to powersets, and the generalisation to arbitrary
complete lattices was published in (Tarski 1955). The use of this theorem (and of
Kleene's celebrated result) is all-pervading in computing science. A much more
in-depth account of the relevant theory can be found in (Davey and Priestley 1990).
The term hylomorphism was coined in (Meijer 1992). According to Meijer, the
terminology is inspired by the Aristotelian philosophy that form and matter are
one, ὕλη meaning 'dust' or 'matter'. The original proof that hylomorphisms can
was initiated by (Mikkelsen 1976), and further elaborated in (Brook 1977). The
account given here is based on (Doornbos and Backhouse 1995), which contains proofs
of the quoted results, as well as various techniques for establishing that a relation is
inductive. The calculus of least fixed points, partly developed in the exercises, has
its roots in (De Bakker and De Roever 1973) in the context of relations. A more
The idea of using function converses in program specification and synthesis was
originally suggested by (Dijkstra 1979), and has since been elaborated by various
authors (Chen and Udding 1990; Harrison and Khoshnevisan 1992; Gries 1981). Our
own interest in the topic was revived by reading (Augusteijn 1992; Schoenmakers
1992; Knapen 1993), and this led to the statement of Theorem 6.4. Indeed, this
theorem seems to be at the heart of several of the references cited above.
The idea that algorithms can be classified through their synthesis is fundamental
to this book, and it is a recurring theme in the literature on formal program
development. Clark and Darlington first illustrated the idea by a classification of sorting
algorithms (Darlington 1978; Clark and Darlington 1980), and the exposition given
here is inspired by that pioneering work. An even more impressive classification of
parsing algorithms was undertaken by (Partsch 1986); in (Bird and De Moor 1994)
we have attempted to improve on a tiny portion of Partsch's results using the
Optimisation Problems

min R · Λ(([S]) · ([T])°).

This asks for a minimum element under the relation R in the set of results returned
by a hylomorphism. Problems of this form will be referred to as optimisation
problems.
The present chapter and Chapter 10 deal with greedy algorithms, while Chapters 8
and 9 are concerned with dynamic programming. This chapter and Chapter 8
consider a restricted class of optimisation problem in which T is the initial algebra
of the intermediate datatype of the hylomorphism, so the problems take the form
min R · Λ([S]). Chapters 9 and 10 deal with the general case.
The central result of this chapter is Theorem 7.2, which gives a simple condition
under which an optimum result can be computed by computing an optimum partial
result at each stage. The same format (specification, derivation, program) is
followed for each optimisation problem that we solve in the remainder of the book.
We begin by defining the relation min R formally and establishing its properties.
Some proofs illustrate the interaction of division, membership and power-transpose,
while others show the occasional need to bring in tabulations; many are left as
instructive exercises in the relational calculus.
166    7 / Optimisation Problems
min R = ∈ ∩ (R/∋).

In words, a is a minimum element of x under R if a is both an element of x and a
lower bound of x under R. The definition of min R does not require that R be a
preorder, but it is only really useful when such is the case. The definition of min R
can be phrased as the universal property

X ⊆ min R ≡ X ⊆ ∈ and X · ∋ ⊆ R.

max R = min R°,
The following three properties of lower bounds are easy consequences of the fact
that (R/S) · f = R/(f° · S):

(R/∋) · τ = R                  (7.1)
(R/∋) · ΛS = R/S°              (7.2)
(R/∋) · union = (R/∋)/∋.       (7.3)

From (7.1) and (7.2) we obtain

min R · τ = id ∩ R             (7.4)
min R · ΛS = S ∩ (R/S°).       (7.5)
Equation (7.4) gives that R is reflexive if and only if the minimum element under
R of a singleton set is its sole inhabitant. Equation (7.5) can be rephrased as the
universal property

X ⊆ min R · ΛS ≡ X ⊆ S and X · S° ⊆ R.

This rule is used frequently and is indicated in calculations by the hint universal
property of min.

min R · ΛS = min (R ∩ (S · S°)) · ΛS.    (7.6)
For the proof we argue as follows:
7.1 / Minimum and maximum    167
min (R ∩ (S · S°)) · ΛS
=    {(7.5)}
S ∩ ((R ∩ (S · S°))/S°)
=    {division}
S ∩ (R/S°) ∩ ((S · S°)/S°)
=    {commutativity of meet, and S ⊆ (S · S°)/S°}
S ∩ (R/S°)
=    {(7.5)}
min R · ΛS.
Since ES = Λ(S · ∈), equation (7.5) leads to:

min R · ES = (S · ∈) ∩ (R/(S · ∈)°).    (7.7)

One application of (7.7) is the following result, which shows how to shunt a function
through a minimum:

min R · Pf = f · min (f° · R · f).    (7.8)
We reason:

min R · Pf
=    {(7.7) and E = P on functions}
(f · ∈) ∩ (R/(f · ∈)°)
=    {converse; division}
(f · ∈) ∩ ((R · f)/∋)
=    {since f is a function; definition of min}
f · min (f° · R · f).
As an intermediate step in the above proof, we showed that

min R · Pf = (f · ∈) ∩ ((R · f)/∋).

This suggests the truth of

min R · PS = (S · ∈) ∩ ((R · S)/∋).    (7.9)

Equation (7.9) does in fact hold provided that R is reflexive. In one direction the
proof involves tabulations. The other half, namely

min R · PS ⊆ (S · ∈) ∩ ((R · S)/∋),    (7.10)

is easier and is all we will need later on. For the proof, observe that by the universal
property of meet, (7.10) is equivalent to

min R · PS ⊆ ∈ · PS ⊆ S · ∈
min R · PS · ∋ ⊆ min R · ∋ · S ⊆ R · S.
Given a collection of non-empty sets, one can select a minimum of the union by
selecting a minimum element in each collection and then taking a minimum of the
set of minimums. Since a minimum of the empty set is not defined, the procedure
breaks down if any set in the collection is empty, which is why we have inclusion
rather than equality in:
Using fusion with the power functor, the other half, namely
Implementing min

bmin R (a, b) = (aRb → a, b).

The function minlist R chooses the leftmost minimum element in the case of ties.
In the Appendix, minlist is defined as a function that takes as argument a Boolean
function of type Bool ← A × A.
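A Haskell rendering of bmin and minlist, with the preorder passed as a Boolean-valued function of pairs as described above:

```haskell
-- bmin chooses the left-hand argument in the case of ties, so minlist
-- returns the leftmost minimum element of a non-empty list
bmin :: ((a, a) -> Bool) -> (a, a) -> a
bmin r (a, b) = if r (a, b) then a else b

minlist :: ((a, a) -> Bool) -> [a] -> a
minlist r = foldr1 (\a b -> bmin r (a, b))
```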
Exercises
7.8 Prove that if R and S are reflexive, then R ∩ S° = min R · (min S)°. (Hint: for
the direction ⊆ use tabulations, letting (f, g) tabulate R ∩ S° and h = Λ(f ∪ g).)
7.10 Suppose that R and S are reflexive. Prove that min R ⊆ min S if and only if
R ⊆ S.
7.12 Suppose that R is a preorder. Using Exercise 7.5, show that
min R = ∈ ∩ (R · min R).
7.13 Show that if R is a preorder and S is a function, then R ∩ (S° · S) is a preorder.
sup R = min R · Λ(R°/∋)
sup R = ((∈\R)/R)° ∩ (R/(∈\R)).

Prove that these two definitions are equivalent if R is a preorder.
7.18 One proof of the other half of (7.9) makes use of (7.4). Given (7.4) it suffices
to show
7.19 The following few exercises, many of which originate from (Bleeker 1994), deal
with minimal elements. Informally, a minimal element of a set x under a relation
R is an element a ∈ x such that for all b ∈ x with bRa we have aRb. The formal
definition is

mnl R = min (R° ⇒ R).

Prove that (R° ⇒ R) is reflexive for any R, but that (R° ⇒ R) is not necessarily a
preorder.

7.20 Prove that min R ⊆ mnl R, with equality only if R is a connected preorder.
class Q = cap · (id, ΛQ · ∈),

where cap returns the intersection of two sets. Informally, class Q takes a set and
returns some equivalence class under Q. Prove that if R is a preorder, then

mnl (R ; S) = mnl S · class (R ∩ R°) · Λ(mnl R).
7.26 The remaining exercises deal with the notion of a well-bounded preorder. In
set-theoretic terms, a preorder R is well bounded if every non-empty set has a
minimum element; formally,

dom (∈) = dom (min R).
7.29 Using the preceding exercise, show that if R is a well-bounded preorder, then
S · FR ⊆ R · S.
To illustrate, consider the function plus : Nat ← Nat × Nat. Addition of natural
numbers is monotonic on leq, the normal linear ordering on numbers, a fact we can
express as

min x + min y = min {a + b | a ∈ x ∧ b ∈ y},

provided that x and y are non-empty. Here min = min leq.
Proof. We argue:

f · F(min R) ⊆ min R · Λ(f · F∈)
≡    {universal property of min}
f · F(min R) ⊆ f · F∈  and  f · F(min R) · (f · F∈)° ⊆ R
≡    {the first conjunct holds since min R ⊆ ∈}
f · F(min R) · (f · F∈)° ⊆ R
7.2 / Monotonic algebras    173
≡    {converse; relators}
f · F(min R · ∋) · f° ⊆ R
≡    {since min R · ∋ = R if R is reflexive}
f · FR · f° ⊆ R.
In this chapter the main result about monotonicity is the following, which we will
refer to subsequently as the greedy theorem.
⇐    {since min R · ΛS ⊆ S}
([min R · ΛS]) · ([S])° ⊆ R
⇐    {hylomorphism theorem (see below)}
min R · ΛS · FR · S° ⊆ R
⇐    {monotonicity: FR · S° ⊆ S° · R}
min R · ΛS · S° · R ⊆ R.
For an alternative formulation of the greedy theorem see Exercise 7.37. For problems
involving max rather than min, the relevant condition of the greedy theorem is that
S should be monotonic on R, not R°. Note also that we can always bring in context
if we need to, and show that S is monotonic on R° ∩ (([S]) · ([S])°).
The exercises given below explore some simple consequences of the greedy theorem.
In the remainder of this chapter we will look at three other problems, each chosen
to bring out a different aspect of the theory.
Exercises
7.33 Express the fact that a + b ≤ a + b' implies that b ≤ b' in a point-free manner.
7.34 Let a be the initial F-algebra. Prove that if a is monotonic on R, then R is
reflexive.
7.35 Sometimes we want monotonicity and distributivity to hold only on the set of
values returned by a relation. Find a suitably weakened definition of monotonicity
that implies
7.36 Use the preceding exercise to give a necessary, as well as a sufficient, condition
for establishing the conclusion of the greedy theorem.
7.37 Prove the following variation of the greedy theorem: if f is monotonic on R
and f ⊆ min R · ΛS, then ([f]) ⊆ min R · Λ([S]).

7.38 Prove that if S is monotonic on R°, then min R · ΛS · min (FR) ⊆ min R · ES.
7.40 The maximum segment sum problem (Gries 1984, 1990b) is specified by

mss = max · Λ(sum · segment),

where max is an abbreviation for max leq. Using segment = prefix · suffix, express
this problem in the form

mss = max · P(max · Λ(sum · prefix)) · Λsuffix.

Express prefix as a catamorphism on cons-lists, and use fusion to express sum · prefix
as a catamorphism. Hence use the greedy theorem to show that
where oplus = max · Λ(zero ∪ plus). Finally, express list ([c, f]) · tails as a
catamorphism.
words, filter p x returns the longest subsequence of x with the property that all its
elements satisfy p. (Question: again, why the rather than a longest subsequence?)
Using subseq = ([nil, cons ∪ outr]) and the greedy theorem, derive the standard

cons · (id × L) ⊆ L · cons.

Now justify the facts that: (i) a lexically largest subsequence of a given sequence is
necessarily a descending sequence; and (ii) if x is descending and a ≥ head x, then
[a] ++ x is lexically larger than x. Use point-free versions of these facts to prove
(formally!) that
We will solve this problem with a greedy algorithm. The moral of the exercise
is that our classification of what is a greedy algorithm can include problems that

R = (sum · list rating)° · leq · (sum · list rating),

and rating : Real ← Employee is the conviviality function for individual employees.
We can define party : list A ← tree A in terms of a catamorphism that produces two
parties, one that includes the root and one that excludes it:

party = choose · ([(include, exclude)]).

The relation choose is defined by choose = outl ∪ outr. The relation include includes
the root of the tree, so by the president's ruling the roots of the immediate subtrees
have to be excluded. The relation exclude excludes the root, so we have an arbitrary
choice between including or excluding the roots of the immediate subtrees. The
formal definitions are:

include = cons · (id × concat · list outr)
Derivation
The derivation involves two appeals to monotonicity, both of which we will justify
afterwards.
We argue:
max R · Λparty
=    {definition of party}
max R · Λ(choose · ([(include, exclude)]))
=    {since Λ(X · Y) = EX · ΛY}
max R · Echoose · Λ([(include, exclude)])
⊇    {claim: the greedy theorem is applicable}
max R · Echoose · ([max (R × R) · Λ(include, exclude)]).
The first claim requires us to show that choose is monotonic on R, that is,

choose · (R × R) ⊆ R · choose.

The proof is left as a simple exercise. The second claim requires us to show that

(include, exclude) · (id × list (R × R)) ⊆ (R × R) · (include, exclude).
We outline the proof of one subclaim and leave the other as an exercise. We argue:

include · (id × list (R × R))
⊆    {products; functors}
cons · (id × concat · list R · list outr)
⊆    {claim: concat · list R ⊆ R · concat (exercise)}
cons · (id × R · concat · list outr)
⊆    {since cons is monotonic on R (exercise)}
R · cons · (id × concat · list outr)
=    {definition of include}
R · include.
Since include is a function, the first term simplifies to include. We will leave it as
an exercise to show that the second term can be refined to

exclude = concat · list (max R · Λchoose) · outr.
The program

For efficiency, a list x is represented by the pair (x, sum (list rating x)). The relation
max R · choose is refined to the standard function bmax R that chooses the left-hand
argument in the case of ties. All functions not defined in the following Gofer program
appear in the list of standard functions given in the Appendix. (Actually, bmax r
is a standard function, but is given here for clarity.) Employees are identified by
their conviviality rating:
> party   = bmax r . treecata (pair (include, exclude))
> include = cons' . cross (id, concat' . list outr)
> exclude = concat' . list (bmax r) . outr
> cons'   = cross (cons, plus) . dupl
> concat' = cross (concat, sum) . unzip
> r       = leq . cross (outr, outr)
> bmax r  = cond (r . swap) (outl, outr)
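A self-contained Haskell version of this program may be easier to run than the Gofer original. The rose-tree type and the helper parties are names assumed here; each subtree yields a pair (best party including its root, best party excluding it), with parties paired with their rating sums for efficiency as in the text.

```haskell
data Tree = Node Int [Tree]   -- employees identified by their conviviality rating

type Party = ([Int], Int)     -- a party together with its total rating

-- chooses the left-hand argument in the case of ties, as bmax r does
bmax :: Party -> Party -> Party
bmax p q = if snd p >= snd q then p else q

parties :: Tree -> (Party, Party)
parties (Node a ts) = (include, exclude)
  where
    ps      = map parties ts
    -- root in: the roots of the immediate subtrees must be excluded
    include = let (xs, s) = combine (map snd ps) in (a : xs, a + s)
    -- root out: free choice (the better of the two) below
    exclude = combine (map (uncurry bmax) ps)
    combine qs = (concatMap fst qs, sum (map snd qs))

party :: Tree -> [Int]
party = fst . uncurry bmax . parties
```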
Exercises
7.44 Answer the remaining questions in the problem, namely (i) what is the running
time of the algorithm; and (ii) how can the professor ensure that the president gets
invited to his or her own party?
11 53 34 73 18 53 99 52 31 54
4 72 24 6 46 17 63 82 89 25
—> 67 22 10 97 99 64 33 45 81 76 —>
24 71 46 62 18 11 54 40 17 51
99 8 57 76 7 51 90 92 51 21
adjacent. A path is to be threaded from the entry side of the cylinder
to the exit side, subject to the restriction that from a given square it
is possible to go to only one of the three positions in the next column
adjacent to the current position. The path may begin at any position
on the entry side and may end at any position on the exit side. The cost
of such a path is the sum of the integers in the squares through which
it passes. Thus the cost of the sample path shown above (in boldface)
is 429. Show how the dynamic programming approach to exhaustive
search allows a path of least cost to be found in O(n × m) time.
relation with type paths : PLNat ← LNNat. Because of the restriction on moves it
is not possible to define paths by the power transpose of a relational catamorphism,
so that, strictly speaking, the problem does not fall within the class described at
the outset of the chapter. Instead, we will define it in terms of a relation
for any relation R. To define generate we will need a number of other lax natural
transformations of different types; what follows is an attempt to motivate their
introduction.
First of all, it is clear that we have to take into account the restriction on moves in
generating legal paths. The relation moves : PNA ← NA is defined by

moves x = {up x, x, down x},

where up and down rotate columns:

up (a1, a2, ..., an)   = (an, a1, ..., an-1)
down (a1, a2, ..., an) = (a2, a3, ..., an, a1).
These functions are easily implemented when tuples are represented by lists. The
relation F(id, moves) has type
The next step is to make use of a function trans : NPA ← PNA that transposes a
In the final program, when sets and n-tuples are both represented by lists, trans
will be implemented by a catamorphism of type LLA ← LLA.
F(NA, NPLA) ← F(NA, NPLA).
We now have generate = S · F(id, N union · trans · moves) for an appropriately
chosen relation S.
The next step is to make use of a function zip : NF(A, B) ← F(NA, NB) that
commutes N with F. In the final program zip is replaced by the standard function on
lists. The relation zip · F(id, N union · trans · moves) has type

NF(A, PLA) ← F(NA, NPLA).

The next step is to make use of the function cp : PF(A, B) ← F(A, PB), defined by
cp = ΛF(id, ∈). The relation Ncp · zip · F(id, N union · trans · moves) has type

NPF(A, LA) ← F(NA, NPLA),

so generate = S · Ncp · zip · F(id, N union · trans · moves) for some relation S.
Finally, we bring in α : LA ← F(A, LA), the initial algebra of non-empty cons-lists.
The relation N(Pα · cp) · zip · F(id, N union · trans · moves) has type

NPLA ← F(NA, NPLA),

and is the definition of generate.
We have motivated the definition of generate by following the arrows, but one can
Derivation
The derivation that follows relies heavily on the fact that all the above functions
and relations are lax natural transformations of appropriate type. The monotonicity
condition takes the form

α · F(id, min R) ⊆ min R · Pα · cp.    (7.13)
Armed with this fact, we calculate:

min R · paths
=    {definition of paths}
min R · union · setify · ([generate])
⊇    {distribution over union (7.11), since R is a preorder}
min R · P(min R) · setify · ([generate])
⊇    {naturality of setify}
min R · setify · N(min R) · ([generate])
⊇    {fusion (see below for definition of Q)}
min R · setify · ([Q]).
N(min R) · generate
=    {definition of generate}
N(min R · Pα · cp) · zip · F(id, N union · trans · moves)
⊇    {(7.13); functors}
Nα · NF(id, min R) · zip · F(id, N union · trans · moves)
⊇    {naturality of zip}
Nα · zip · F(Nid, N(min R)) · F(id, N union · trans · moves)
=    {functors}
Nα · zip · F(id, N(min R · union) · trans · moves)
⊇    {distribution over union (7.11)}
Nα · zip · F(id, N(min R · P(min R)) · trans · moves)
=    {functors}
Nα · zip · F(id, N(min R) · NP(min R) · trans · moves)
⊇    {naturality of trans}
Nα · zip · F(id, N(min R) · trans · PN(min R) · moves)
⊇    {naturality of moves}
Nα · zip · F(id, N(min R) · trans · moves · N(min R))
=    {functors, introducing Q}
Q · F(id, N(min R)),

where Q = Nα · zip · F(id, N(min R) · trans · moves), or, written in terms of the
two components of F,

Q = [Nwrap, Ncons · zip · (id × N(min R) · trans · moves)].
With this definition of Q, the solution is min R · setify · ([Q]).
The program
> path  = minlist r . catallist (list wrap', list cons' . step)
> step  = zip . cross (id, list (minlist r) . trans . moves)
> r     = leq . cross (outr, outr)
> wrap' = pair (wrap, id)
> cons' = cross (cons, plus) . dupl
> moves x = [up x, x, down x]
> up x   = tail x ++ [head x]
> down x = [last x] ++ init x
Exercises
7.45 Did we use the fact that N was a relator in the derivation?
moves x = {up (up x), up x, x, down x, down (down x)}?
The following problem, invented by Hans Zantema, is typical of the sort that can
Let us call a sequence [a1, a2, ..., an] of transactions secure if there is an amount
reserve of between three and six units. Given the constraint that N is no smaller
than any single transaction, every singleton sequence is secure, so a valid schedule
certainly exists.
where sum : Nat ← list Nat sums a list of numbers and prefix is the prefix relation
on non-empty lists. Then a sequence x of transactions is secure if there is an r ≥ 0
such that
We leave it as a short exercise to show that this condition can be phrased in the
equivalent form
prefix · secure ⊆ secure · prefix.
is secure.
partition = ([nil, new ∪ glue]),

where

new  = cons · (wrap × id)
glue = cons · (cons × id) · assocl · (id × cons°).
old =
cons -
Derivation
A greedy algorithm exists if [nil, new ∪ old] is monotonic on R°. The monotonicity
condition is equivalent to two conditions:

new · (id × R°) ⊆ R° · (new ∪ old)    (7.14)
old · (id × R°) ⊆ R° · (new ∪ old).   (7.15)

Well, (7.14) is true but (7.15) is false.
=    {definition of new}
R° · new
⊆    {monotonicity of join}
R° · (new ∪ old).
To see why (7.15) is false, let [x] ++ xs and [y] ++ ys be two equal-length partitions
of the same sequence, so, certainly,

([x] ++ xs) R° ([y] ++ ys).

Suppose also that [a] ++ x is secure. Then (7.15) states that one or other of the
following two possibilities must hold:
However, the analysis given above does suggest a way out: if y is a prefix of x, then
secure ([a] ++ x) does imply secure ([a] ++ y) because secure is prefix-closed. Suppose
we refine the order R to R ; H, where

H = (head° · prefix · head) ∪ (nil · nil°).
7.5 / The security van problem    187
R ; H = R ∩ (R° ⇒ H).

In words, [] (R ; H) [] and ([y] ++ ys) (R ; H) ([x] ++ xs) if ys is strictly shorter than
xs, or it has the same length and y prefix x. Since R ; H ⊆ R we can still obtain a
greedy algorithm for our problem if we can show that S is monotonic on (R ; H)°
and that ([min (R ; H) · ΛS]) can be refined to a function. The second task is easy
since old returns a shorter result than new if it returns any result at all; in symbols,
old ⊆ (R ; H) · new. Hence we obtain
([wrap -
where the coreflexive ok holds on (a, xs) if xs ≠ [] and [a] ++ head xs is secure.
new · (id × (R ; H)°) ⊆ (R ; H)° · (new ∪ old)    (7.16)
old · (id × (R ; H)°) ⊆ (R ; H)° · (new ∪ old).   (7.17)
Condition (7.16) follows from the fact that
old · (id × (|R°| ∪ (R° ∩ H°)))
=    {distributing join}
(old · (id × |R°|)) ∪ (old · (id × (R° ∩ H°)))
⊆    {conditions (7.19), (7.20) and (7.21)}
((R° ∩ H°) · new) ∪ ((R° ∩ H°) · old)
⊆    {since X ∩ Y ⊆ X ; Y, and converses}
(R ; H)° · (new ∪ old).
The greedy condition is established.
Up to this point we have used no property of secure other than the fact that it
is prefix-closed. For the final program we need to implement the security test
efficiently. To do this, recall from Section 5.6 that the prefix relation
prefix : list A ← list A can be defined as a catamorphism

prefix = ([nil, cons ∪ nil]).
Since sum · prefix = ([zero, plus ∪ zero]), two baby-sized applications of the greedy
theorem yield:

ceiling = ([zero, omax · plus])
floor   = ([zero, omin · plus]),

where omax a = bmax (a, 0) and omin a = bmin (a, 0). Recall that [a] ++ x is secure if

omax (a + b) - omin (a + c) ≤ N,

where b = ceiling x and c = floor x. This condition implies omax b - omin c ≤ N,
so x is secure. This proves that secure is suffix-closed.
In summary, we have derived the following program for computing a valid schedule,
in which schedule is parameterised by N and ok is expressed as a predicate rather
than a coreflexive:
schedule N = ([nil, (ok N → glue, new)])

ok N (a, [])        = false
ok N (a, [x] ++ xs) = omax (a + ceiling x) - omin (a + floor x) ≤ N.
The program
In the final Gofer program we represent the empty partition by ([],(0,0)) and a
> schedule n = catalist (start, cond (ok n) (glue', new'))
> new'  = cross (new, augment) . dupl
> glue  = cons . cross (cons, id) . assocl . cross (id, split)
> new   = cons . cross (wrap, id)
> omax  = cond (>= 0) (id, zero)
> omin  = cond (<= 0) (id, zero)
Exercises
7.48 Prove that 0 ≤ r + floor x ≤ N and 0 ≤ r + ceiling x ≤ N for some r ≥ 0 if
and only if
that y is secure?
is that no line in the paragraph should have a width that exceeds some fixed quantity
W, where the width of a line x is the sum of the lengths of the words in x, plus
some suitable value for the interword spaces. Calling the associated coreflexive fits,
argue that fits is both prefix- and suffix-closed. Why is the following formulation
not a reasonable specification of the problem?:
Q = (nil° · !) ∪ (prefix ; (tail° · Q · tail)).
This defines a preorder, and a linear order on partitions of the same sequence. Using
only the fact that secure is prefix-closed, show that both new and old are monotonic
on Q°. Although it is not true that Q ⊆ R, we nevertheless do have
not a refinement of R, it is still the case that the (unique) minimum partition under
Q is a minimum partition under R. The proof is a slightly tricky combinatorial
argument. The advantage of taking this Q is that we can replace R by a more
general preorder R = cost° · leq · cost and establish general properties of cost under
Bibliographical remarks
first explored in (Brook 1977). Most of the ideas found there are also apparent in
earlier work on the relational calculus, for instance (Riguet 1948). We adapted those
works for applications to optimisation problems in (De Moor 1992a). Of course, the
definitions in relational calculus are obvious, and have also been applied by others,
see e.g. (Schmidt, Berghammer, and Zierer 1989).
Many researchers have attempted a classification of greedy algorithms before. An
overview can be found in (Korte, Lovasz, and Schrader 1991), which proposes a
Thinning Algorithms
In this chapter we continue to study problems of the form min R · Λ([S]). The
greedy theorem of the last chapter gave a rather strong condition under which such
a problem could be solved by maintaining a single partial solution at each stage.
At the other extreme, the Eilenberg-Wright lemma shows that Λ([S]) can always
be implemented as a set-valued catamorphism. This leads to an exhaustive search
algorithm in which all possible partial solutions are maintained at each stage.
Between the two extremes of all and one, there is a third possibility: at each stage keep
a representative collection of partial solutions, namely those that might eventually
8.1 Thinning
Given a relation Q : A ← A, the relation thin Q : PA ← PA is defined by
thin Q = (∈\∈) ∩ ((∋ · Q)/∋).    (8.1)

Informally, thin Q is a nondeterministic mapping that takes a set y, and returns
some subset x of y with the property that all elements of y have a lower bound
under Q in x. To see this, note that x (∈\∈) y means that x is a subset of y, and

x ((∋ · Q)/∋) y ≡ (∀b ∈ y : ∃a ∈ x : aQb).
Thus, to thin a set x with thin Q means to reduce the size of x without losing the
possibility of taking a minimum element of x under Q. Unlike the case of min R,
we can implement thin Q even when Q is not a connected preorder (see Section 8.3).
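Definition (8.1) is easy to animate for finite sets. The following Haskell predicate (the name is ours, not the book's) checks the two conjuncts directly, with sets represented as lists:

```haskell
-- Check whether x is a legal result of applying `thin q` to y:
-- x must be a subset of y, and every element of y must have a
-- lower bound under q in x.
isThinning :: Eq a => (a -> a -> Bool) -> [a] -> [a] -> Bool
isThinning q x y =
  all (`elem` y) x                          -- x (∈\∈) y : x is a subset of y
    && all (\b -> any (\a -> a `q` b) x) y  -- x ((∋·Q)/∋) y : lower bounds survive
```

For instance, under (<=) the set [1] is a valid thinning of [1,2,3], but [2] is not, since 1 would lose its lower bound.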
Definition (8.1) can be restated as the universal property

X ⊆ thin Q · ΛS  ≡  ∈ · X ⊆ S  and  X · S° ⊆ ∋ · Q.
Properties of thinning
We can introduce thin into an optimisation problem with the following rule, called
thin-introduction:
min R = min R · thin Q,    provided that Q ⊆ R.
The proof, left as an exercise, depends on the assumption that Q and R are preorders.
There is also a thin-elimination rule:

thin Q ⊇ τ · min Q,    (8.2)

where τ : PA <- A returns singleton sets. However, unless Q is a connected preorder,
the domain of τ · min Q is smaller than that of thin Q. For instance, thin id is entire,
but the domain of τ · min id consists only of singleton sets. So use of (8.2) may
result in an infeasible refinement. At the other extreme, thin Q ⊇ id, so thin Q can
always be refined to the identity relation on sets.
There is a useful variant of thin-elimination: by the universal property, the refinement
τ · min R · ΛS ⊆ thin Q · ΛS holds provided that

∈ · τ · min R · ΛS ⊆ S
τ · min R · ΛS · S° ⊆ ∋ · Q.

The first inclusion is immediate from ∈ · τ = id. For the second, we argue:

   τ · min R · ΛS · S° ⊆ ∋ · Q
≡    {shunting τ, and ∈ · τ = id}
   min R · ΛS · S° ⊆ Q
≡    {context}
   min (R ∩ (S · S°)) · ΛS · S° ⊆ Q
⇐    {since ΛS · S° ⊆ ∋}
   min (R ∩ (S · S°)) · ∋ ⊆ Q
The following theorem and corollary show how the use of thinning can be exploited
in solving optimisation problems. Both exhaustive search and the greedy algorithm
follow as special cases. As usual, F is the base functor of the catamorphism. By the
universal property, the conclusion rests on the two inclusions

∈ · ([thin Q · Λ(S · F∈)]) ⊆ ([S])
([thin Q · Λ(S · F∈)]) · ([S])° ⊆ ∋ · Q.

The first is an easy exercise in fusion and, by the hylomorphism theorem, the second
follows if we can show that

We reason:
Exercises
8.4 Prove that cup {thin R, thin Q) C thin R by showing more generally that
cup
-
where subset is the inclusion relation on sets. You will need the inclusion
exe c (e,e)'cup,
so prove that as well. Does equality hold in the original inclusion when Q = R?
8.5 Prove that min R = τ° · thin R and hence prove the thin-elimination rule.
8.8 Prove that the greedy algorithm is a special case of Theorem 8.1.
8.9 Show that if Q is well-supported (see Exercise 7.30), then Λ(min Q) ⊆ thin Q.
where aj ∈ xj for 0 < j < n. With each path is associated a cost, defined by
obtain that

minpath ⊆ min R · Λ([a · F(∈, id)]),

where a = [wrap, cons] and F is the base bifunctor of non-empty cons-lists.
cost = outr · ([wrapz, consw]),

where wrapz = (wrap, zero) and
Derivation
a · F(∈, Q°) ⊆ Q° · a · F(∈, id).

Of course, if we can take Q = R, then we can appeal to the greedy theorem. Suppose p
and q are two paths in the network [x1, ..., xn] with cost p > cost q. Then the
monotonicity condition with Q = R says that for any set of vertices x0, and any
a, b in x0,

cost ([a] ++ p) > cost ([b] ++ q).

In particular, this condition should hold when x0 = {a}, and so a = b. Using
cost ([a] ++ p) = wt (a, head p) + cost p, we therefore require

However, since wt (a, head q) can be arbitrarily large, this condition fails unless
head p = head q. On the other hand, if head p = head q, then the inequality reduces
path beginning with v. This motivates the following calculation, in which the term
thin Q is eliminated:
⊇    {thin distributes over union (8.4)}
   union · P(thin Q · Λ(a · F(id, ∈))) · ΛF(∈, id)
⊇    {thin-elimination (8.3), see below}

For the appeal to thin-elimination we must check:

   R ∩ (S · S°) ⊆ Q
⇐    {definition of Q}
   S · S° ⊆ head° · head
≡    {shunting}
   (head · S) · (head · S)° ⊆ id
⇐    {since head · S ⊆ [id, outl] (exercise), so head · S is simple}
   true.
The above derivation is quite general and makes hardly any use of the specific
datatype. For the base bifunctor F of non-empty cons-lists we have
where the functions cpl and cpr were defined in Section 5.6. Hence, finally, we have
The program
In the Gofer program we represent sets by lists in the usual way, and represent a
path p by (p, (headp, costp)). The program is parameterised by the function wt:
> path    = minlist r . catallist (list wrap', list step . cpl)
> step    = minlist r . list cons' . cpr
> r       = leq . cross (cost, cost)
> cost    = outr . outr
> wrap'   = pair (wrap, pair (id, zero))
> cons'   = cross (cons, augment) . dupl
> augment = pair (outl, plus . cross (wt, id) . assocl)
Exercises
8.10 Can we replace the cons-list bifunctor with one or both of the following
bifunctors?

F(A, B) = A + (A × (B × B))
F(A, B) = A + (B × B).
What is the interpretation of the generalised layered network problem?
8.11 The derivation above is an instance of the following more general result.
Suppose Q C R and S =
Si -
In the layered network example we were fortunate in that the thinning step could
be eliminated, but most often we have to implement thinning as part of the final
algorithm. As with min R, we can implement thin Q only when it is applied to finite
sets; unlike min R, we do not require the sets to be non-empty, nor that Q be a
connected preorder.
The function thinlist Q might be specified by
where setify : PA <- list A. However, we want to impose an extra condition upon
thinlist Q, namely that
thinlist Q ⊆ subseq.
In words, we want thinlist Q to preserve the relative order of the elements in the
list. The reason for this additional restriction will emerge below.
thinlist Q x = [minlist Q x],    (8.5)
where minlist Q was defined in the preceding chapter.
thinlist Q = ([nil, bump Q]),

where

bump Q (a, [])       = [a]
bump Q (a, [b] ++ x) = (aQb → [a] ++ x, bQa → [b] ++ x, [a] ++ [b] ++ x).
This gives a linear-time algorithm in the number of evaluations of Q, though it is
not always guaranteed to deliver a shortest result. There are other possible choices
for thinlist Q, some of which are explored in the exercises.
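The catamorphism above transcribes directly into Haskell, with foldr playing the role of the list catamorphism; this is a sketch under the usual representation of the preorder Q as a boolean-valued function:

```haskell
-- thinlist q keeps a sublist, in order, such that every discarded
-- element has a lower bound under q among those kept.
thinlist :: (a -> a -> Bool) -> [a] -> [a]
thinlist q = foldr bump []
  where
    bump a []      = [a]
    bump a (b : x)
      | a `q` b    = a : x        -- a is below b: b is redundant
      | b `q` a    = b : x        -- b is below a: a is redundant
      | otherwise  = a : b : x    -- incomparable: keep both
```

With the divisibility preorder q a b = b `mod` a == 0, for instance, thinlist q [4,6,2,3] yields [2,3]: the discarded 4 and 6 both have the lower bound 2. As remarked above, bump compares adjacent elements only, so the result is not always as short as possible.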
Sorting sets
In the main theorem of this section we make use of the idea ofmaintaining a finite
set as a sorted list. We will use a version of sort from Chapter 6, taking
sort P = ordered P · setify°,
where P : A <- A is some connected preorder. Note that sort P is not a function,
even when P is a linear order: for example, sort leq {1,2,3} may produce [1,2,3] or
[1,1,2,3], or any one of a number of similar lists.
   thinlist Q · sort P
=    {definition of sort P}
   thinlist Q · ordered P · setify°
⊆    {claim: thinlist Q · ordered P ⊆ ordered P · thinlist Q}
   ordered P · thinlist Q · setify°
⊆    {specification of thinlist Q and shunting}
   ordered P · setify° · thin Q
=    {definition of sort P}
   sort P · thin Q.
For the claim it is sufficient to show that thinlist Q · ordered P ⊆ ordered P:

   thinlist Q · ordered P
⊆    {specification of thinlist Q}
   subseq · ordered P
⊆    {since subseq · ordered P ⊆ ordered P if P is connected}
   ordered P.
It is important to note that the choice of P can affect the success of the subsequent
thinning process; ideally, sort P should bring together elements that are comparable.
There are five other properties about sort P that we will need. Proofs are left as
The fifth property deals with an implementation of the general cartesian product
function cp (F) = Λ(F∈) described in Section 5.6. We met the special case cp (list)
earlier. The implementation listcp (F) is required to satisfy

listcp (F) · F(sort P) ⊆ sort (FP) · cp (F).    (8.11)
Not every functor F admits an implementation of listcp (F) satisfying (8.11); one
requirement is that F distributes over arbitrary joins. It is left as an exercise to
define listcp (F) for each polynomial functor F. It follows that if F is polynomial
and distributes over arbitrary joins (such a functor is called linear), then (8.11) can
be satisfied. In what follows we will assume that (8.11) can be satisfied.
Inclusions (8.8), (8.9) and (8.11) are used in the proof of the following lemma, which
is required in the theorem to come:
   sort P · Λ(p · f · F∈)
=    {Λ of composition and cp (F) = ΛF∈}

⊇    {(8.9)}
   filter p · sort P · Pf · cp (F)
⊇    {(8.8)}
   filter p · list f · sort (f° · P · f) · cp (F)
⊇    {since f is monotonic on P}
   filter p · list f · sort (FP) · cp (F)
⊇    {(8.11)}
   filter p · list f · listcp (F) · F(sort P).
Binary thinning
With these preliminaries out of the way, the main theorem of this section can now
1. S = (p1 · f1) ∪ (p2 · f2), where p1 and p2 are coreflexives.

2. Q is a preorder with Q ⊆ R and such that p1 · f1 and p2 · f2 are both monotonic
on Q°.
Then

minlist R · ([thinlist Q · merge P · (g1, g2) · listcp]) ⊆ min R · Λ([S]),

where gi = filter pi · list fi.
Proof. We reason:

   min R · Λ([S])
⊇    {thinning theorem, since S is monotonic on Q°}
   min R · ([thin Q · Λ(S · F∈)])
⊇    {(8.7)}
   minlist R · sort P · ([thin Q · Λ(S · F∈)])
⊇    {fusion}
   minlist R · ([thinlist Q · merge P · (g1, g2) · listcp]).

The fusion step is justified by the calculation

⊇    {(8.10)}
   thinlist Q · merge P · (sort P · Λ(p1 · f1 · F∈), sort P · Λ(p2 · f2 · F∈))
⊇    {Lemma 8.1}
   thinlist Q · merge P · (g1, g2) · listcp · F(sort P).
Exercises
thinlist Q []  = []
thinlist Q [a] = [a]
thinlist Q ([a] ++ [b] ++ x) = thinlist Q ([a] ++ x), if aQb

Give examples where this version of thinlist Q may return a shorter or a longer
result than that of the text.
connected Q x ≡ (∀ a : a inlist x : (∀ b : b inlist x : aQb ∨ bQa)).
In words, we partition a list into the smallest number of components, each of whose
elements are all connected under Q, and then take a minimum under Q of each
component. Use the fact that connected Q is prefix-closed (in fact, subsequence-closed)
to give a greedy algorithm for the optimisation problem on the right. Apply
type functor fusion to obtain a catamorphism for thinlist Q.
that Q · Q° ⊆ Q ∪ Q°?
8.17 Prove that subseq · sort P ⊆ sort P · subset, provided P is a connected preorder.
8.20 Can you define listcp (T) for an arbitrary type functor T?
8.21 Give a counter-example showing that (8.11) fails for non-linear polynomial
relators.
8.22 Formalise and prove a version of binary thinning in which the algebra S takes
the form S = (f1 · p1) ∪ (f2 · p2).
Let Item denote the type of items to be packed and vol, wt : Real <- Item the
associated value and weight functions. The input consists of an element x of type
list Item and a given capacity w.
We will model selections as subsequences of the given list of items. The relation
subseq : list A <— list A can be expressed in the form
subseq = ([[nil, cons] ∪ [nil, outr]]).
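One functional refinement of subseq simply enumerates all subsequences; the following Haskell sketch (the name subseqs is ours) interprets the two summands of the algebra as "keep the element" (cons) and "discard it" (outr):

```haskell
-- All subsequences of a list, corresponding to the relational
-- catamorphism ([[nil, cons] ∪ [nil, outr]]): at each element we
-- either cons it onto a subsequence of the tail, or discard it.
subseqs :: [a] -> [[a]]
subseqs = foldr (\a xss -> xss ++ map (a :) xss) [[]]
```

A list of n items has 2^n subsequences, which is exactly why exhaustive search is infeasible and thinning is needed.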
The total value and weight of a selection are given by two functions value, weight : Real <- list Item:

value  = sum · list vol
weight = sum · list wt.

The coreflexive within w holds on a selection x if weight x ≤ w, and we have

within w · subseq = ([(within w · [nil, cons]) ∪ [nil, outr]]).
Of course, the right-hand side simplifies to ([nil, (within w · cons) ∪ outr]); the form
above suggests that binary thinning might be applicable.
Derivation
We first check to see whether (within w · [nil, cons]) ∪ [nil, outr] is monotonic on
have

listcp (F) = wrap + cpr.
Furthermore, g1 = [list nil, h1] and g2 = [list nil, h2], where

h1 = filter (within w) · list cons
h2 = list outr.
The program
We represent a list x of items by the pair (x, (value x, weight x)). The following
program is parameterised by the functions val and wt:
> start    = [([],(0,0))]
> step w   = pair (filter (within w) . list cons', list outr) . cpr
> within w = (<= w) . weight
> cons'    = cross (cons, augment) . dupl
> augment  = cross (addin val, addin wt) . dupl
> addin f  = plus . cross (f, id)
> r        = geq . cross (value, value)
> p        = leq . cross (weight, weight)
> q        = meet (p,r)
> value    = outl . outr
The algorithm, though it takes exponential time in the worst case, is quite efficient
in practice. The knapsack problem is presented in many textbooks as an
application of dynamic programming, in which a recursive formulation of the problem
is implemented efficiently under the assumption that the weights and capacity are
integers. Dynamic programming will be the topic of the next chapter, but the
thinning approach to knapsack gives a simpler algorithm that does not depend on the
inputs being integers. Moreover, if the weights and capacity are integers, then the
algorithm is as efficient as the dynamic programming scheme.
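To make the thinning idea concrete, here is a self-contained Haskell sketch of knapsack by thinning. It is our own simplification of the Gofer program above, not a transcription: a partial selection is represented only by its (value, weight) pair, the thinning preorder is dominance ("at least as valuable and no heavier"), and the thinning step reuses the bump strategy of the previous section:

```haskell
-- Knapsack by thinning: at each item either skip it or add it (if it
-- still fits), then thin the list of partial solutions by dominance.
knapsack :: Double -> [(Double, Double)] -> Double
knapsack cap items = maximum [v | (v, _) <- foldr step [(0, 0)] items]
  where
    step (val, wt) sols =
      thin dominates
           (sols ++ [ (v + val, w + wt) | (v, w) <- sols, w + wt <= cap ])
    -- (v1,w1) dominates (v2,w2) if it is at least as valuable and no heavier
    dominates (v1, w1) (v2, w2) = v1 >= v2 && w1 <= w2
    -- keep a sublist in which every discarded pair is dominated
    thin q = foldr bump []
      where
        bump a []      = [a]
        bump a (b : x)
          | a `q` b    = a : x
          | b `q` a    = b : x
          | otherwise  = a : b : x
```

For example, knapsack 10 [(60,5),(50,4),(40,6)] returns 110, choosing the first two items. Thinning is safe because a dominated selection can never be extended into a strictly better one.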
maximum possible line width. The width of a line is the sum of the widths of its
words plus some measure of the interword spaces. It is assumed that w is sufficiently
large that any word will at least fit on a line by itself.
Line =
list+ Word
Para =
list+ Line.
We will build paragraphs from right to left, so our lists are cons-lists. Certainly,
no sensible greedy algorithm for the paragraph problem can be based on cons-lists
(see Exercise 7.56), but thinning algorithms consider all possibilities and are not
sensitive to the kind of list being used.
To complete the specification we need to define waste w, fits w and partition. The
type assigned to partition is Para <- list+ Word and we can define it as a catamorphism
on non-empty lists by changing the definition given in Section 5.6 slightly:
partition = ([wrap · wrap, new ∪ glue]),
where
Note that glue is a (total) function on non-empty lists, but only a partial function
on possibly empty lists. We will need the fact that glue is a function in the thinning
algorithm to come.
The coreflexive fits w holds on a line x if width x ≤ w, where width is given by the
catamorphism

width = ([length, succ · plus · (length × id)]).
It is assumed that interword spaces contribute one unit toward the width of a line,
which accounts for the term succ in the catamorphism above.
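Spelled out as a Haskell sketch on ordinary lists (the book's width is a catamorphism on non-empty cons-lists of words):

```haskell
-- width of a line: the sum of the word lengths, plus one unit for
-- each interword space (hence the succ in the recursive case).
width :: [String] -> Int
width [w]     = length w
width (w : x) = succ (length w + width x)
width []      = error "width: a line contains at least one word"
```

So width ["ab","c"] is 4: two letters, one letter, and one interword space.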
Finally, the function waste w depends on the 'white-space' that occurs at the end
of all the lines of the paragraph, except for the very last line, which, by definition,
has no white-space associated with it. Formally,
where init: list A <— list+ A removes the last element from a list, and
white w x = w - width x.
Provided it satisfies certain properties, the precise definition of collect is not too
important, but for concreteness we will take
where sqr m = m². This definition is suggested in (Knuth and Plass 1981).
After an appeal to fusion, using the assumption that each individual word will fit
on a line by itself, we can phrase the paragraph problem in the form
paragraph w ⊆ min R · Λ([wrap · wrap, new ∪ (ok w · glue)]),

where ok w holds on ([x] ++ xs) if width x ≤ w. Since an individual word will fit on
a line by itself, we can rewrite the algebra of the catamorphism in the form
Derivation
Before proceeding with the derivation of an algorithm, we note that the obvious
greedy algorithm does not solve this specification. The greedy algorithm is a left
to right algorithm, filling lines for as long as possible before starting a new line.
The left-hand side of Figure 8.1 shows the output of the greedy algorithm on the
opening sentence of this section, and an optimal paragraph (with the given definition
of collect) on the right.
One reason why the greedy algorithm fails is that glue is not monotonic on R°.
Even for paragraphs [x] ++ xs and [y] ++ ys of the same input, the implication

waste ([x] ++ xs) > waste ([y] ++ ys)
⇒ waste ([[a] ++ x] ++ xs) > waste ([[a] ++ y] ++ ys)

does not hold unless x = y. Even then, we require an extra condition, namely that
cons is monotonic under collect° · leq · collect. This condition holds for the given
definition of collect, among others.
Given this property of collect, we do have that both new and ok w · glue are
monotonic on Q°, where

Q = R ∩ (head° · head).
We leave the formal justification as an exercise. So all the conditions for binary
thinning are in place, except for the choice of the connected preorder P. Unlike the
case of the knapsack problem we cannot take P = R. The choice of P is a sensitive
one. Taking

P = head° · L · head,

it is easy to show that both new and glue are monotonic on P. However, all this is
overkill because a much simpler choice of P suffices, namely P = Π, the universal
relation. Trivially, all functions are monotonic on Π. The reason why Π works is
that we have

merge Π = cat,

and so

paragraph w = minlist R · ([thinlist Q · cat · (g1, g2) · listcp]),
where

g1 = list [wrap · wrap, new]
g2 = filter (ok w) · list [wrap · wrap, glue].
For the functor FA = Word + (Word × A) we have

listcp (F) = wrap + cpr.
paragraph w = minlist R · ([start, thinlist Q · cat · (h1, h2) · cpr]),

where

start = wrap · wrap · wrap
h1    = list new
h2    = filter (ok w) · list glue.
The program
> paragraph w = unpara . para w . words
> unpara  = unlines . list unwords . outl
> para w  = minlist r . catallist (start w, thinlist q . step w)
> step w  = cat . pair (list (new' w), filter ok . list glue') . cpr
> start w = wrap . pair (wrap . wrap, augment) .
> new' w  = cross (new, augment) . dupl
>   where augment = cross ((w-) . length, waste) .
> glue'   = cross (glue, augment) . dupl
> r       = leq . cross (waste . outr, waste . outr)
> p       = eql . cross (outl . outr, outl . outr)
> q       = meet (r,p)
> waste   = omax . plus . cross (sqr, id)
> omax    = cond (>= 0) (id, zero)
> sqr     = times . pair (id, id)
> ok      = (>= 0) . outl . outr
> neginf  = const (-10000)
Exercises
This leads to less pleasing paragraphs, but a greedy algorithm is possible provided
we switch to snoc-lists. Derive this algorithm.
Describe an O(n²)-time algorithm for determining an optimal bitonic tour. You may
assume that no two points have the same x-coordinate. (Hint: scan right to left,
maintaining optimal possibilities for the two parts of the tour.)
8.6 Bitonic tours
We will solve a generalised version of the bitonic tours problem in which distances
are not necessarily euclidean nor necessarily symmetric. We suppose only that with
each ordered pair (a, b) of points (called cities below) is associated a travelling
cost tc(a,b), not necessarily positive nor necessarily equal to tc(b,a). The final
algorithm will take O(n²) time, where n is the length of the input, assuming that
tc can be computed in constant time.
It does not make sense to talk about a bitonic tour of one city, so we will assume
that the input is a list of at least two cities, the order of the cities in the list being
relevant. We will take the hint in the formulation of the problem and build tours
from right to left, but this is only because cons-lists are more efficient than snoc-lists
in functional programming. Formally, all this means that we are dealing with the
base functor
FA = (City × City) + (City × A)
of cons-lists of length at least two.
We will describe tours, much as a travel agent would, by a pair of lists (x, y), where
x represents the outward journey and y the return (reading from right to left). For
example, the tour that proceeds directly from New York to Rome but visits London
on its return is represented by the itinerary
because the travelling costs may depend on the direction of travel. As we have
described them, both parts of the tour are subsequences of the input, and have
lengths at least two.
Suppose the first example is extended to include Los Angeles as the new starting
point. This gives rise to two extended tours:
tour = ([start, dropl ∪ dropr]),

where start (a, b) = ([a, b], [a, b]) and

dropl (a, ([b] ++ x, y)) = ([a] ++ x, [a] ++ y)
dropr (a, (x, [b] ++ y)) = ([a] ++ x, [a] ++ y).
Each partial tour (x, y) maintains the property that the first elements of x and y
are the same, as are the last elements.
Derivation
As usual, analysis of why [start, dropl ∪ dropr] is not monotonic on R° will help to
suggest an appropriate Q for the thinning step. The monotonicity condition comes
down to two inclusions:
dropl · (id × R°) ⊆ R° · dropl
dropr · (id × R°) ⊆ R° · dropr.
To see what these conditions mean, observe that cost (dropl (a, (x, y))) equals

cost (x, y) + tc (a, next x) - tc (head x, next x) + tc (head y, a),

where next ([a] ++ [b] ++ x) = b. Dually, cost (dropr (a, (x, y))) equals

cost (x, y) + tc (a, head x) - tc (next y, head y) + tc (next y, a).
Now, the first condition says that if cost (x, y) ≤ cost (u, v), then

The second condition is similar. Neither holds under an arbitrary cost function
unless

(head x, head y) = (head u, head v)  and  (next x, next y) = (next u, next v).
The first conjunct will hold whenever (x, y) and (m, v) are tours of the same input.
It is now clear that we have to define Q by

Q = R ∩ ((next2)° · next2),

for then dropl and dropr are both monotonic under Q° (and Q).
Since merge Π = cat, we can appeal to binary thinning and take

mintour = minlist R · ([thinlist Q · cat · (g1, g2) · listcp]),

where g1 = list [start, dropl] and g2 = list [start, dropr]. As before, we have
listcp = wrap + cpr, so mintour simplifies to

minlist R ·
The algorithm takes quadratic time because just two new tours are added to the list
of partial solutions at each stage (see Exercise 8.25). If the list of partial solutions
grows linearly and it takes linear time to generate the new tours, then the total
time is quadratic in the length of the input.
The program
For efficiency a tour t is represented by the pair (t, cost t). The Gofer program is
parameterised by the function tc:
> mintour = minlist r . cata2list (wrap . start, thinlist q . step)
> step    = cat . pair (list dropl, list dropr) . cpr
> adjustl (a, b:c:x, d:y) = tc (a,c) - tc (b,c) + tc (d,a)
> adjustr (a, b:x, d:e:y) = tc (a,b) - tc (e,d) + tc (e,a)
> r       = leq . cross (outr, outr)
> p       = eql . cross (next2, next2)
> q       = meet (r,p)
> next2   = cross (next, next) . outl
> next    = head . tail
Exercises
8.25 Determine next (dropl (a, (x, y))) and next (dropr (a, (x, y))). Hence show by
induction that the next values of the list of tours maintained after processing the
input [a0, a1, ..., an] are:
8.26 Consider the case where tc is symmetric, so the tour (y, x) is essentially the
same as (x,y). Show how dropl and dropr can be modified to avoid generating the
same tour twice. What is the resulting algorithm?
8.27 One basic assumption of the problem was that a city could not be visited
both on the outward and inward part of the journey. Reformulate the problem to
remove this restriction. What is the algorithm?
8.28 The other assumption was that each city should be visited at least once.
8.30 The rally driver's problem is described as follows: imagine a long stretch of
road along which n gas stations are placed. At gas station i (1 ≤ i ≤ n) is a
quantity of fuel fi. The distance from station i to station i+1 (or to the end of the
road if i = n) is a known quantity di, where both fuel and distance are measured
in terms of the same unit. Imagine that the rally driver is at the beginning of the
road, with a quantity f0 of fuel in the tank. Suppose also that the capacity of the
fuel tank is some fixed quantity c. Assuming that the rally driver can get to the
end of the road, devise a thinning algorithm for determining a minimum number of
stops to pick up fuel. (Hint: model the problem using partitions.)
8.31 Solve the following exercise from (Denardo 1982) by a thinning algorithm: A
long one-way street consists of m blocks of equal length. A bus runs 'uptown' from
one end of the street to the other. A fixed number n of bus stops are to be located
so as to minimise the total distance walked by the population. Assume that each
person taking an uptown bus trip walks to the nearest bus stop, gets on the bus,
rides, gets off at the stop nearest his or her destination, and walks the rest of the
way. During the day, exactly Bj people from block j start uptown bus trips, and
Cj complete uptown bus trips at block j. Write a program that finds an optimal
location of bus stops.
Bibliographical remarks
The motivation of this chapter was to capture the essence of sequential decision
processes as first introduced by Bellman (Bellman 1957), and rigorously defined
by (Karp and Held 1967). In particular, Theorem 8.2 could be seen as a 'generic
program' for sequential decision processes (De Moor 1995). In that paper it is
indicated how the abstract relational expressions of Theorem 8.2 can actually be
written as an executable computer program.
In programming methodology, our work is very much akin to that of (Smith and
Lowry 1990; Smith 1991). Smith's notion of problem reduction generators is in fact
quite similar to the generic algorithm presented here, but bears a closer
resemblance to the results of the following chapter.
novel solution to the 0/1 knapsack problem that outperforms all others in practice
(Ning 1997).
Chapter 9
Dynamic Programming
min R · Λ(([S]) · ([T])°).

However, we will only consider the case where S = h, a function. This chapter
If the principle of optimality is satisfied, then one can decompose the problem in
all possible ways into subproblems, solve the subproblems recursively, and assemble
an optimal solution from the partial results. This is the content of Theorem 9.1.
The sets of decompositions associated with different subproblems are usually not
disjoint, so a naive approach to solving the subproblems recursively will involve
repeating work. For this reason there is a second phase of dynamic programming in
which the subproblems are solved more efficiently. There are two complementary
schemes: memoisation and tabulation. (The terminology is standard, but tabulation
has nothing to do with tabular allegories.)
The memoisation scheme is top-down; the computation follows that of the recursive
program but solutions to subproblems are recorded and retrieved for subsequent
use. Some functional languages provide a built-in memoisation facility as an
9.1 Theory
As mentioned above, we consider only the case that S = h, a function. To save ink,
define H = ([h]) · ([T])° in all that follows, where h and T are F-algebras. The basic
theorem about dynamic programming is the following one.
Proof. It follows from Knaster-Tarski that the conclusion holds if we can show
that
min R · P(h · FM) · ΛT° ⊆ M.    (9.1)

Using the universal property of min we can rewrite (9.1) as two inclusions:

min R · P(h · FM) · ΛT° ⊆ H    (9.2)
min R · P(h · FM) · ΛT° · H° ⊆ R.    (9.3)
To prove (9.2) and (9.3) we will need the rule
   min R · P(h · FM) · ΛT°
⊆    {since (9.4) gives min R · PX ⊆ X · ∈}
   h · FM · ∈ · ΛT°
=    {Λ cancellation}
   h · FM · T°
⊆    {definition of M and universal property of min}
   h · FH · T°
=    {definition of H and hylomorphism theorem (Theorem 6.2)}
   H.
   min R · P(h · FM) · ΛT° · H°
⊆    {since (9.4) gives min R · PX ⊆ (R · X)/∋}
   ((R · h · FM)/∋) · ΛT° · H°
=    {definition of H and hylomorphism theorem}
   ((R · h · FM)/∋) · ΛT° · T · FH° · h°
⊆    {since ΛT° · T ⊆ ∋; division; functors}
   R · h · F(M · H°) · h°
⊆    {assumption h · FR · h° ⊆ R}
   R · R
⊆    {since R is transitive}
   R.
will never lead to better results than others. The basic theorem can be refined by
bringing in a thinning step to eliminate unprofitable decompositions. This leads to
the following version of dynamic programming; the proof follows the preceding one
very closely, and we leave it as an exercise:
Both theorems conclude that an optimal solution can be computed as a least fixed
point of a certain equation. Theorem 6.3 says that the equation has a unique fixed
point if member (F) T° is an inductive relation. Furthermore, if AT° returns finite
non-empty sets and R is a connected preorder, then the unique solution is entire.
By suitably refining min R and thin Q AT°, we can then implement the solution
-
as a recursive function.
Proposition 9.1 Suppose that V1 and V2 have disjoint ranges, that is, suppose
that V1° · V2 = ∅. Then

h · FR ⊆ R · h
h · FH · Q° ⊆ R° · h · FH.
To ease the task of checking these conditions, we can often appeal to one or other
of the following results; the first could have been given in an earlier chapter.
then h · FR ⊆ R · h.
Proof. We argue:
   h · FR ⊆ R · h
≡    {definition of R and shunting}
   cost · h · FR ⊆ leq · cost · h
≡    {assumption on cost}

using the assumptions cost · h = k · F(cost, H°) and

k · F(leq × id) ⊆ leq · k,
Proof. We argue:
≡    {definition of R and shunting}
   cost · h · F((cost° · leq · cost) ∩ (H · H°)) ⊆ leq · cost · h
≡    {products}
   cost · h · F((cost, H°)° · (leq · cost, H°)) ⊆ leq · cost · h
≡    {assumption on cost}
   k · F((cost, H°) · (cost, H°)° · (leq · cost, H°)) ⊆ leq · k · F(cost, H°)
In the next result we take F to be a bifunctor, writing F(id, X) rather than FX.
Proof. Monotonicity follows at once from the reflexivity of U. For the second part
we argue as follows:
   h · F(id, H) · Q°
=    {taking Q = F(U, V); converse; bifunctors}
   h · F(U°, H · V°)
⊆    {assumption on V}
   h · F(U°, R° · H)
⊆    {bifunctors}
   h · F(U°, R°) · F(id, H)
⊆    {assumption on h (taking converse and shunting)}
   R° · h · F(id, H).
□
Exercises
h · F(R ∩ (H · H°)) ⊆ R · h · FH.
9.4 Prove that the thinning condition of Theorem 9.2 can be satisfied by taking
Q = F(M° · R · M). Why may this not be a good choice in practice?
9.5 This exercise deals with the proof of Proposition 9.1. Relations V1 and V2 have
disjoint ranges if ran V2 ⊆ ~ran V1, where ~ is the complementation operator on
coreflexives; equivalently, ran V2 = ran V2 · ~ran V1.
Λ[V1, V2]° = (ran V1 → Λ(inl · V1°), Λ(inr · V2°)).
Now show that
There are many possible choices for these operations, but for simplicity we assume
that we are given just three: copy, delete and insert. Their meanings are as follows:
The point about these operations is that if we swap the roles of delete and insert,
then we obtain a sequence transforming the target string back into the source.
In fact, the operations contain enough information to construct both strings from
scratch: we merely have to interpret copy a as meaning "append a to both strings";
delete a as "append a to the left string"; and insert a as "append a to the right
string". Since there are many different edit sequences from which the two strings
can be reconstituted, we ask for a shortest edit sequence.
To specify the problem formally we will use cons-lists for both strings and edit
sequences; thus a string is an element of list Char and an edit sequence is an element
of list Op, where
The function edit: (list Char x list Char) <- list Op reconstitutes the two strings:
edit =
([base, step]),
where base returns the pair ([],[]) and
step (cpy a, (x, y)) = ([a] ++ x, [a] ++ y)
step (del a, (x, y)) = ([a] ++ x, y)
step (ins a, (x, y)) = (x, [a] ++ y).
The specification of the string edit problem is to find a function mle (short for
"minimum length edit") satisfying

mle ⊆ min R · Λedit°,

where R = length° · leq · length.
226 9 J Dynamic Programming
Derivation
length = ([zero, succ · outr])
and the monotonicity of succ under leq.
For this problem we can go further and make good use of a thinning step. The
a · F(U, R) ⊆ R · a    and    V · edit ⊆ edit · R.
Since smj0?# =
tail*, it is sufficient to show that
remove e from es. The result is an edit sequence fs that is no longer than es and
The result of this analysis is that a shortest edit sequence can be obtained by
computing the least fixed point of the recursion equation
X = min R · P[nil, cons · (id × X)] · thin Q · Λ[base, step]°,
Since base and step have disjoint ranges we can appeal to Proposition 9.1 and obtain
X = (empty → nil, min R · P(cons · (id × X)) · thin (U × V) · Λstep°),

where empty (x, y) holds if both x and y are empty lists.
[(cpy a, (x, y))],                                if a = b
[(del a, (x, [b] ++ y)), (ins b, ([a] ++ x, y))], otherwise.
The relation min R is implemented as the function minlist R on lists. The result is
that mle can be implemented by the program
mle = (empty → nil, minlist R · list (cons · (id × mle)) · unstep).

The program terminates because the second components of unstep (x, y) are pairs
of strings whose combined length is strictly smaller than that of x and y.
The problem with this implementation is that the running time is an exponential
function of the sizes of the two input strings. The reason is that the same
subproblem is solved many times over. A suitably chosen tabulation scheme can bring this
down to quadratic time, and this is the next part of the derivation.
Tabulation
The tabulation scheme for mle is motivated by the observation that in order to
compute mle (x, y) we also need to compute mle (u, v) for all tails u of x and tails
v of y. It is helpful to imagine these values arranged in columns: for example,
column x y = [mle (u, y) | u <- tails x],
column x =
([fstcol x, nextcol x]).
It is easy to check from the definition of mle that fstcol = tails-list del. The function
nextcol is to satisfy the equation

column x ([b] ++ y) = nextcol x (b, column x y).
Thus, each entry in the left column may depend on the one below it (if a delete is
best), the one to the right (if an insert is best), and the one below that (if a copy
is best).
In order to have all the necessary information available in the right place, the catamorphism for nextcol is applied to the sequence

xus = zip (x, zip (init us, tail us)).

The elements of x are needed for the case analysis in the definition of mle, and adjacent pairs of elements in column x y are needed to determine the value of mle. The bottom element of nextcol x (b, column x y) is obtained from the bottom element of column x y as a special case. With this explanation, the remaining auxiliary definition is

base (b, u) = [[ins b] ++ u].
The program
The only change in the Gofer program is that an edit sequence v is represented by
the pair (v, length v) for efficiency. The program is
> data Op = Cpy Char | Del Char | Ins Char
> mle (x,y) = outl (head (column x y))
> column x = catalist (fstcol x, nextcol x)
> fstcol x = zip (tails (list Del x), countdown (length x))
> cpy b (ops,n) = (Cpy b : ops, n+1)
> countdown 0 = [0]
> countdown (n+1) = (n+1) : countdown n
Finally, let us show that this program takes quadratic time. The evaluation of column x y requires q evaluations of nextcol, where q is the length of y, and the time to compute each evaluation of nextcol is O(p) steps, where p is the length of x. Hence the time to construct column x y is O(p × q) steps.
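As an illustration of the column-by-column tabulation, here is a self-contained sketch in modern Haskell rather than Gofer. It computes only the length of a minimum-length edit sequence (counting copies as well as inserts and deletes, as mle does); the name mleLength and the use of foldr to build each column are our choices, not the book's:

```haskell
-- Hypothetical Haskell sketch of the column-based tabulation: each new
-- column is built from the previous one by a foldr, so the entry below
-- (needed for a delete) is available while the column is constructed.
mleLength :: String -> String -> Int
mleLength x y = head (foldr nextcol firstcol y)
  where
    n = length x
    -- column for an empty second string: delete everything
    firstcol = [n, n - 1 .. 0]
    -- build the column for (b : v) from the column for v
    nextcol b prev = foldr step [1 + last prev] (zip x (zip prev (tail prev)))
      where
        step (a, (right, diag)) rest@(down : _)
          | a == b    = (1 + diag) : rest            -- copy a
          | otherwise = (1 + min down right) : rest  -- delete a or insert b
```

Each column has p + 1 entries and there are q + 1 columns, giving the quadratic bound claimed above.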
Exercises

9.3 Optimal bracketing
bin (bin (tip a1, tip a2), bin (tip a3, tip a4)),

while the alternative bracketing a1 ⊕ ((a2 ⊕ a3) ⊕ a4) is represented by the tree

bin (tip a1, bin (bin (tip a2, tip a3), tip a4)).
A tree can be flattened by the function flatten : list+ A ← tree A defined by

flatten = ([wrap, cat]),

where cat : list+ A ← (list+ A)². This function produces the list of tip values in left to right order. Our problem, therefore, is to find a function mct (short for "minimum cost tree") satisfying

mct ⊆ min R · Λ([wrap, cat])°,

where R = cost° · leq · cost.
The interesting part is the definition of cost. Here is the general scheme:

cost (tip a) = 0
cost (bin (x, y)) = cb (size x, size y) + cost x + cost y
size (tip a) = st a
size (bin (x, y)) = sb (size x, size y)

In words, the cost of building a single tip is zero, while the cost of building a node is some function cb of the sizes of the expressions associated with the two subtrees, plus the cost of building the two subtrees. The function size (which, by the way, has no relation to the function that returns the number of elements in a tree) is given in terms of two further functions st and sb. Cost and size can be computed together as a catamorphism

(cost, size) = ([opt, opb]),

where opt = (zero, st) and opb ((c, s), (c', s')) = (cb (s, s') + c + c', sb (s, s')).
Derivation

There is no obvious criterion for preferring some decompositions over others, so the thinning step is omitted and we will aim for an application of Theorem 9.1. To establish the monotonicity condition we will need the assumption that the size of an expression is dependent only on its atomic constituents, not on the bracketing. This condition is satisfied if the function sb in the definition of size is associative. It follows that

size = sz · flatten

for some function sz on non-empty lists.
For the monotonicity condition we will use Proposition 9.3 and choose a function g satisfying

cost · [tip, bin] = g · (id + (cost, flatten)²)   (9.5)
g · (id + (leq × id)²) ⊆ leq · g.   (9.6)

We take

g = [zero, outl · opb · (id × sz)²],

where sz is the function introduced above.
For (9.5) we calculate:

g · (id + (cost, flatten)²)
=    {definition of g; coproducts and products}
[zero, outl · opb · (cost, sz · flatten)²]
=    {assumption sz · flatten = size}
[zero, outl · opb · (cost, size)²]
=    {since (cost, size) = ([opt, opb]); products}
[cost · tip, cost · bin]
=    {coproducts}
cost · [tip, bin].
A similar calculation, using the fact that the cost component of opb is monotonic in the costs of the two subtrees, establishes (9.6):

g · (id + (leq × id)²)
⊆    {monotonicity of opb in its cost arguments}
leq · [zero, outl · opb · (id × sz)²]
=    {definition of g}
leq · g.
The dynamic programming theorem is therefore applicable and says that we can
compute a minimum cost tree by computing the least fixed point of the recursion
equation
X = (single → tip · wrap°, min R · P(bin · (X × X)) · Λcat°),
where single x holds if x is a singleton list. The recursion can be implemented by representing Λcat° as the function splits, where
splits = zip · (inits+, tails+),

and inits+ and tails+ return the lists of proper initial and tail segments of a list;
mct = (single → tip · head, minlist R · list (bin · (mct × mct)) · splits).
The program terminates because the recursive calls are on shorter arguments: if
(y, z) is an element of splits x, then both y and z are shorter than x. As in the
case of the string editing problem, multiple evaluations of the same subproblem
mean that the program has an exponential running time, so, once again, we need a
tabulation scheme.
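The exponential recursion itself can be sketched directly in Haskell. The following fragment computes only the minimum cost, and the choices st = id, sb = (+) (associative, as the derivation requires) and cb (s, t) = s + t are illustrative stand-ins, not fixed by the text:

```haskell
-- splits produces the proper two-way decompositions of a non-empty list,
-- implementing the relation cat° used in the text.
splits :: [a] -> [([a], [a])]
splits x = [splitAt k x | k <- [1 .. length x - 1]]

-- Minimum cost of bracketing a non-empty list of atom sizes.
-- Assumed cost model (hypothetical): sizes combine by (+), and building
-- a node costs the sum of the sizes of its two subtrees.
mcc :: [Int] -> Int
mcc [_] = 0
mcc x   = minimum [mcc y + mcc z + size y + size z | (y, z) <- splits x]
  where size = sum
```

With this cost model the problem is the familiar one of optimally merging adjacent segments, so different bracketings really do have different costs.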
Tabulation
In order to compute mct x we also need to compute mct y for every non-empty segment y of x. It is helpful to picture these values as a two-dimensional array:
mct (a1)
mct (a1 a2)      mct (a2)
mct (a1 a2 a3)   mct (a2 a3)   mct (a3)
The object is to compute the bottom entry of the leftmost column. We will represent
the array as a list of rows, although we will also need to consider individual columns.
Hence we define

row = list mct · tails
col = list mct · inits
array = list row · inits.
The functions inits and tails both have type list+(list+ A) ← list+ A; the function inits returns the list of non-empty initial segments in increasing order of length, and tails the tail segments in decreasing order of length.
In order to tackle the main calculation, which is to show how to compute array, we will need various subsidiary identities, so let us begin by expressing mct in terms of row and col. For the recursive case we can argue:
mct
=    {recursive case of mct and definition of splits}
minlist R · list (bin · (mct × mct)) · zip · (inits+, tails+)
=    {since list (f × g) · zip = zip · (list f × list g)}
minlist R · list bin · zip · (list mct · inits+, list mct · tails+)
=    {since list mct · inits+ = col · init and list mct · tails+ = row · tail}
mix · (col · init, row · tail).

Hence

mct = (single → tip · head, mix · (col · init, row · tail))   (9.7)

where

mix = minlist R · list bin · zip.
Next, let us express col in terms of row and col. For the recursive case, we argue:

col
=    {definition of col}
list mct · inits
=    {since inits = snoc · (inits · init, id) on non-singletons}
snoc · (list mct · inits · init, mct)
=    {definition of col; (9.7)}
snoc · (col · init, mix · (col · init, row · tail))
=    {products}
snoc · (outl, mix) · (col · init, row · tail).

Hence

col = (single → wrap · tip · head, next · (col · init, row · tail))   (9.8)

where

next = snoc · (outl, mix).

Equation (9.8) can be used to justify the implementation of col as a loop (see Exercise 9.13):

col = loop next · (wrap · tip · head, list row · inits · tail).
Below, we will need this equation in the equivalent form:

col · cons = process · (id × array),   (9.9)

where

process = loop next · ((wrap · tip) × id).
As a final preparatory step, we express row in terms of mct and col. For the recursive case we can argue:

row
=    {definition of row}
list mct · tails
=    {since tails = cons · (id, tails · tail) on non-singletons}
cons · (mct, list mct · tails · tail)
=    {definition of row}
cons · (mct, row · tail).

Hence

row = (single → wrap · tip · head, cons · (mct, row · tail)).   (9.10)
array = ([fstcol, addcol]),

for appropriate functions fstcol and addcol. It is easy to check that

fstcol = wrap · wrap · tip,

and for addcol we calculate:
array · cons
=    {definition of array}
list row · inits · cons
=    {since inits · cons = cons · (wrap · outl, tail · inits · cons)}
cons · (row · wrap · outl, list row · tail · inits · cons)
=    {(9.10)}
cons · (wrap · tip · outl, list (cons · (mct, row · tail)) · tail · inits · cons).
-
We continue by simplifying the second term, abbreviating tail · inits · cons by tic:

list (cons · (mct, row · tail)) · tic
=    {since list (f, g) = zip · (list f, list g)}
list cons · zip · (list mct · tic, list (row · tail) · tic)
=    {since list mct · tic = tail · col · cons and list (row · tail) · tic = array · outr}
list cons · zip · (tail · col · cons, array · outr)
=    {(9.9)}
list cons · zip · (tail · process · (id × array), array · outr)
=    {products}
list cons · zip · (tail · process, outr) · (id × array).

Hence we may take

addcol = cons · (wrap · tip · outl, step)
step = list cons · zip · (tail · process, outr).
-
The program

The following Gofer program follows the above scheme, except that we label the trees with cost and size information. More precisely, the tree bin (x, y) is represented by bin (c, s) (x, y), where c = cost (bin (x, y)) and s = size (bin (x, y)):

> mct = head . last . array
> array = catalist (fstcol, addcol)
> fstcol = wrap . wrap . tip
> addcol = cons . pair (wrap . tip . outl, step)
> step = list cons . zip . pair (tail . process, outr)
> process = loop next . cross (wrap . tip, id)
> next = snoc . pair (outl, minlist r . list bin . zip)
> tip = Tip
> bin (x,y) = Bin (c,s) (x,y)
>   where c = cb (size x, size y) + cost x + cost y
>         s = sb (size x, size y)
Finally, let us estimate the running time of the program. To build an (n × n) array, the operation addcol is performed n − 1 times. For execution of addcol on an array of size (m × m), the operation step takes O(m²) steps, since next is executed m times and takes O(m) steps. So the total is O(n³) steps.
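The cubic behaviour can also be demonstrated with a segment-indexed table. The following Haskell sketch uses a lazy array, a standard dynamic-programming idiom rather than the book's row-by-row scheme, with the same illustrative cost model as in the earlier sketch:

```haskell
import Data.Array

-- O(n^3) tabulation of the bracketing cost: table ! (i, j) is the minimum
-- cost of bracketing atoms i..j.  Each entry refers only to strictly
-- smaller segments, so the lazy array ties the recursive knot safely.
mccTab :: [Int] -> Int
mccTab xs = table ! (0, n - 1)
  where
    n   = length xs
    acc = listArray (0, n) (scanl (+) 0 xs) :: Array Int Int
    size i j = acc ! (j + 1) - acc ! i          -- size of segment i..j
    table = array ((0, 0), (n - 1, n - 1))
              [((i, j), cost i j) | i <- [0 .. n - 1], j <- [i .. n - 1]]
    cost i j
      | i == j    = 0
      | otherwise = minimum [ table ! (i, k) + table ! (k + 1, j)
                              + size i k + size (k + 1) j
                            | k <- [i .. j - 1] ]
```

There are quadratically many entries and each is computed as a minimum over at most n alternatives, giving the O(n³) bound.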
Exercises
9.9 Same questions as in the preceding exercise, but for the problem of computing the product x1 × x2 × ⋯ × xn.

9.10 Same questions, but for matrix multiplication in which we want to compute M1 × M2 × ⋯ × Mn, where Mj is an (r_{j−1}, r_j) matrix.
9.11 Prove the claim that concat is best evaluated in terms of a catamorphism on
cons-lists.
9.13 The standard function loop f is defined in the Appendix by the equations

loop f · (id × nil) = outl
loop f · (id × cons) = loop f · (f × id) · assocl.

Prove that loop f also satisfies

loop f · (id × nil) = outl
loop f · (id × snoc) = f · (loop f × id) · assocl.

Using these equations, show that

k = loop f · (g × (list h · inits))

if and only if

k · (id × nil) = g · outl
k · (id × snoc) = f · (k · (id × outl), h · snoc · outr).

Hence prove (9.9).
9.14 The optimal bracketing problem can be phrased, like the knapsack problem, in terms of catamorphisms. Using the converse function theorem, express flatten° as a catamorphism, and hence find a thinning algorithm for the problem. (This is a research problem.)
9.15 Explore the variation of the bracketing problem in which ⊕ is assumed to be commutative as well as associative. This gives us the freedom to choose an optimal bracketing from among all possible permutations of the input [a1, a2, ..., an].
9.4 Data compression

decode = ([nil, extend]),
where
extend (x, sym a) = x ++ [a]
extend (x, ptr (y, z)) = x ++ z,  provided (y ++ z) init+ (x ++ z).
The relation init+ is the transitive closure of init and describes the proper prefix relation. Note in the second equation that it is not required that y ++ z be a prefix of x; in particular, a pointer may refer to characters that are produced only in the course of decoding it.
We have chosen to define pointers as pairs of strings, but the success of data
compression in practice results from representing each pointer (y,z) simply by the
lengths of y and z. For this new representation, the decoding of a pointer is given by
x ⊗ (m, 0) = x
x ⊗ (m, n + 1) = (x ++ [x_m]) ⊗ (m + 1, n).

Here, x_m is the m-th element of x (counting from 0). This change of representation yields a compact representation of strings.
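A minimal sketch of this decoding in Haskell (the function name is ours) makes the self-reference visible: because characters are copied one at a time, a pointer may reach into text that it is itself producing:

```haskell
-- Decode one (position, length) pointer against the text produced so far.
-- Copying proceeds one character at a time, exactly as in the equations
-- for the operator (x) above, so overlapping self-referential copies work.
copyPtr :: String -> (Int, Int) -> String
copyPtr x (_, 0) = x
copyPtr x (m, n) = copyPtr (x ++ [x !! m]) (m + 1, n - 1)
```

For instance, copyPtr "ab" (0, 3) yields "ababa": the third character copied is one that the pointer itself produced.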
Bearing the new representation of pointers in mind, we define the size of a code
sequence by
size = ([zero, plus · [id × c, id × p] · distr]),
where c and p are given constant functions returning the amount of space to store
symbols and pointers. Typically, symbols require one byte, while pointers require four bytes (three bytes for the first number, and one byte for the second). Both
c and p are determined by the implementation of the algorithm on a particular
computer.
The function size induces a preorder R = size° · leq · size, so our problem is to compute a function encode satisfying

encode ⊆ min R · Λdecode°.
Derivation

There is no obvious way of choosing between extending a code sequence with a symbol or a pointer, assuming that both choices are possible. On the other hand, it is possible to choose between pointers: a pointer (y, z) should be better than (y′, z′) whenever z is longer than z′, because a longer portion of the input will then be consumed. More precisely, suppose

w = extend (x, ptr (y, z)) and w = extend (x′, ptr (y′, z′)),

so w = x ++ z = x′ ++ z′. Now, z is longer than z′ if and only if z′ is a suffix of z.
Q = F(Π + Π, prefix),

where the first Π is the universal relation on symbols, and the second Π is the universal relation on pointers. The functor F is given by F(A, B) = 1 + (B × A).
The first condition is routine, using the fact that the sizes of symbols and pointers are constants (i.e. [c, p] · (Π + Π) = [c, p]), and we leave details as an exercise. The second condition follows if we can show
init · decode ⊆ decode · R.
We give an informal proof. Suppose decode cs = snoc (x, a); either cs ends with the code element sym a, in which case drop it from cs, or it ends with the code element ptr (y, z ++ [a]) for some y and z; in the second case, replace it by ptr (y, z) if z ≠ [], or drop it if z = []. The result is a new code sequence that decodes to x, and whose size is no larger.
The dynamic programming theorem states that the data compression problem can be solved by computing the least fixed point of the equation

X = (null → nil, min R · P(snoc · (X × id)) · thin (U × V) · Λextend°).
9.4 J Data compression 241
We have

(Λextend°) (w ++ [a]) = {(w, sym a)} ∪ {(x, ptr (y, z)) | x ++ z = w ++ [a] ∧ (y ++ z) init+ (w ++ [a])}.

Let

lrt w = min (U × V) {(x, (y, z)) | x ++ z = w ∧ (y ++ z) init+ w},

and so implement thin (U × V) · Λextend° by a function reduce defined by

reduce (w ++ [a]) = [(w, sym a), (x, ptr (y, z))],  if z ≠ []
                  = [(w, sym a)],                   otherwise,

where (x, (y, z)) = lrt (w ++ [a]).
There is a fast algorithm for computing lrt (Crochemore 1986), but we give only a simple implementation.

encode = (null → nil, minlist R · list (snoc · (encode × id)) · reduce).
As with preceding problems, the computation of encode is inefficient since the same
subproblem may be computed more than once. We will not, however, go into
the details of a tabulation phase; although the general scheme is clear, namely, to
compute encode on all initial segments of the input string, the details are messy.
The program

> nil' = const ([], 0)
> snoc' = cross (snoc, plus . cross (id, bytes)) . dupr
> reduce w = [(init w, Sym (last w)), (x, Ptr (y,z))], if z /= []
>          = [(init w, Sym (last w))], otherwise
>   where (x,(y,z)) = lrt w
> lrt w = head [(x,(y,z)) | (x,z) <- splits w, y <- locs (w,z)]
> locs (w,z) = [y | (y,v) <- splits (init w), prefix (z,v)]
> prefix ([], v) = True
> prefix (z, []) = False
> prefix (a:z, b:v) = (a == b) && prefix (z,v)
Exercises
Bibliographical remarks
In 1957, Bellman published the first book on dynamic programming (Bellman 1957).
Bellman showed that the use of dynamic programming is governed by the principle
of optimality, and many authors have since considered the formalisation of that
principle as a monotonicity condition, e.g. (Bonzon 1970; Mitten 1964; Karp and
Held 1967; Sniedovich 1986). The paper by Karp and Held places a lot of emphasis
on the sequential nature of dynamic programming, essentially by concentrating on
list-based programming problems. The preceding chapter deals with that type of
problem.
(Helman and Rosenthal 1985; Helman 1989a) present a wider view of dynamic
programming, generalising from lists to more general tree-like datatypes. Our approach
is a natural reformulation of those ideas to a categorical setting, making the
definitions and proofs more compact by parameterising specifications and programs
with functors. Furthermore, the relational calculus admits a clean treatment of
indeterminacy.
The work of Smith (Smith and Lowry 1990; Smith 1991) shows close parallels with
the view of dynamic programming put forward here: in fact the main difference is
in the style of presentation. Smith's work has the additional aim of mechanising the
algorithm design process. To this end, Smith has built a system that implements
his ideas (Smith 1990), and has illustrated its use with an impressive number of
examples. As said before, we have not investigated whether the results of this book
are amenable to mechanical application, although we believe they are. The ideas
underlying Smith's work are also of an algebraic nature (Smith 1993), but, again,
this is rather different in style from the approach taken here.
Besides Bellman's original book, there are many other texts on dynamic
programming, e.g. (Bellman and Dreyfus 1962; Denardo 1982; Dreyfus and Law 1977).
There is a fair amount of work on tabulation, and on ways in which tabulation
schemes may be formally derived (Bird 1980; Boiten 1992; Cohen 1979; Pettorossi
1984). These methods are, however, still ad-hoc, and a more generic solution to the
problem of tabulation remains elusive.
Greedy Algorithms
10.1 Theory
As in the preceding chapter, define H = ([h]) · ([T])°, where h and T are F-algebras.
The proof of the following theorem is very similar to that of Theorem 9.2 and is left as an exercise:

(μX : h · FX · min Q · ΛT°) ⊆ M.
Theorem 10.1 has exactly the same hypotheses as Theorem 9.2 but appears to give a much stronger result. Indeed it does, but the crucial point is that it is much harder to refine the result to a computationally useful program. To do so, we need, in addition to the conditions described in the preceding chapter, the further, and very strong, condition that Q is a connected preorder on the sets returned by ΛT°. This was not the case with the examples given in the preceding chapter. Since Q is a relation on FA (for some A) and FA is often a coproduct, we can make use of the following result, which is a variation on Proposition 9.1.
Proposition 10.1 Suppose that V1 and V2 have disjoint ranges, that is, suppose that V1° · V2 = ∅. Then

[U1, U2] · min (Q1 + Q2) · Λ[V1, V2]° = (ran V1 → W1, W2),

where Wi = Ui · min Qi · ΛVi° for i = 1, 2.
Recall also Proposition 9.4, which states that the hypotheses of the greedy theorem can be satisfied by taking Q = F(U, V), where U and V are preorders such that

h · F(U, R) ⊆ R · h  and  H · V° ⊆ R° · H.

However, such a choice of Q is not always appropriate when heading for a greedy algorithm, since we also require min Q · ΛT° to be entire.
10.2 The detab-entab problem

Exercise 1-20. Write a program detab that replaces tabs in the input
with the proper number of blanks to space to the next tab stop. Assume
a fixed set of tab stops, say every n columns. Should n be a variable or
a symbolic parameter?
Exercise 1-21. Write a program entab that replaces strings of blanks
by the minimum number of tabs and blanks to achieve the same spacing.
Use the same tab stops as for detab. When either a tab or a single blank
would suffice to reach a tab stop, which should be given preference?
Our aim in this section is to solve these two exercises. They go together because
entab is specified as an optimum converse to detab.
Detab

detab = ([nil, expand]),

where

expand (x, a) = (a = TB → fill x, x ++ [a])
fill x = x ++ blanks (n − col x mod n),

and

col = ([zero, count])
count (c, a) = (a = NL → 0, c + 1).
The expression blanks m returns a string of m blanks, TB denotes the tab character,
and NL the newline character. The function col counts the columns in each line of
the input, and tab stops occur every n columns.
step ((x, c), a) = (x ++ [NL], 0),          if a = NL
                 = (x ++ blanks m, c + m),  if a = TB
                 = (x ++ [a], c + 1),       otherwise,

where m = n − c mod n, and convert converts cons-lists to snoc-lists. The resulting Gofer program is:
> detab = outl . loop step . pair (pair (nil, zero), id)
> step ((x,c), a) = (x ++ ['\n'], 0), if a == '\n'
>                 = (x ++ blanks m, c+m), if a == '\t'
>                 = (x ++ [a], c+1), otherwise
>   where m = n - (c 'mod' n)
> blanks 0 = []
> blanks (m+1) = ' ' : blanks m
There is another optimisation that improves efficiency still further. Observe that base and step take a particular form, namely,

base = (nil, c0)
step ((x, c), a) = (x ++ f (c, a), g (c, a)),

for some constant c0 and functions f and g. When base and step have this form, we can instead use an accumulating function loop' satisfying

loop' (f, g) (c, []) = []
loop' (f, g) (c, [a] ++ x) = f (c, a) ++ loop' (f, g) (g (c, a), x),

in which the concatenations are associated to the right rather than in the expensive form ((x1 ++ x2) ++ ⋯) ++ xn produced by the loop.
> detab x = detab' (0, x)
> detab' (c, []) = []
> detab' (c, a:x) = ['\n'] ++ detab' (0, x), if a == '\n'
>                 = blanks m ++ detab' (c+m, x), if a == '\t'
>                 = [a] ++ detab' (c+1, x), otherwise
>   where m = n - (c 'mod' n)
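For reference, the same program transcribed into modern Haskell, with the tab width n passed as an explicit parameter instead of being a global constant (that change is ours):

```haskell
-- Direct transcription of detab': expand tabs to the next multiple-of-n
-- column, tracking the current column c through the recursion.
detab :: Int -> String -> String
detab n = go 0
  where
    go _ []         = []
    go _ ('\n' : x) = '\n' : go 0 x
    go c ('\t' : x) = replicate m ' ' ++ go (c + m) x
      where m = n - c `mod` n
    go c (a : x)    = a : go (c + 1) x
```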
Entab
Derivation
Taking Q = F(U, V) = id + (V × U), and appealing to Proposition 9.4, the greedy condition holds if we can find U and V to satisfy the two conditions

α · F(U, R) ⊆ R · α  and  V · detab ⊆ detab · R,

where α = [nil, snoc]. Bear in mind, however, the additional requirement that min Q · Λ[nil, expand]° be entire.
More precisely, define

V = prefix ∩ (fill° · fill).
To prove V · detab ⊆ detab · R we reason as follows. Since detab is a catamorphism, and leaving aside a claim to be established in a moment, X = V · detab satisfies the inequation X ⊆ detab ∪ (X · init). The greatest solution of this inequation is the unique solution of the corresponding equation, namely X = detab ∪ (X · init). But the unique solution is X = detab · prefix, so V · detab ⊆ detab · prefix. It is immediate that prefix ⊆ R, so we are done.
For the remaining claim we calculate:

V · expand
=    {definition of expand}
V · (istab · outr → fill · outl, snoc)
=    {conditionals}
(istab · outr → V · fill · outl, V · snoc)
=    {claim: V · fill = fill (exercise)}
(istab · outr → fill · outl, V · snoc).
The conditions of the greedy theorem are established, so we can solve our problem by computing the least fixed point of the equation

X = [nil, snoc] · (id + (X × id)) · min Q · Λ[nil, expand]°,

where Q = id + (V × U). Appeal to Proposition 10.1 gives

X = (null → nil, snoc · (X × id) · min (V × U) · Λexpand°).

It remains to implement min (V × U) · Λexpand°.
Since

Λexpand° (x ++ [a]) = {(y, TB) | fill y = x ++ [a]} ∪ {(x, a)},

and

(∃ y : fill y = x ++ [a]) ≡ (a = BL ∧ col (x ++ [a]) mod n = 0),
we have
min V {(y, TB) | fill y = x ++ [a]} = (unfill x, TB),

where unfill x is the shortest prefix of x satisfying

fill (unfill x) = fill x.
unfill [] = []
unfill (x ++ [a]) = unfill x,   if a = BL and col (x ++ [a]) mod n ≠ 0
                  = x ++ [a],   otherwise.
> entab x = [], if null x
>         = entab y ++ [a], otherwise
>   where (y,a) = contract x
> contract x = (unfill y, '\t'), if a == ' ' && (col x) 'mod' n == 0
>            = (y,a), otherwise
>   where (y,a) = (init x, last x)
> unfill x = [], if null x
>          = unfill y, if a == ' ' && col x 'mod' n /= 0
>          = x, otherwise
>   where (y,a) = (init x, last x)
> count (c,a) = 0, if a == '\n'
>             = c+1, otherwise
The idea is to define a function tbc (short for 'trailing blanks count') satisfying

entab x = entab (unfill x) ++ blanks (tbc x).   (10.1)

The function tbc is characterised by

tbc [] = 0
tbc (x ++ [a]) = 0,          if a ≠ BL or col (x ++ [a]) mod n = 0
              = tbc x + 1,   otherwise.
The pair (tbc, col) can be computed as a snoc-list catamorphism

(tbc, col) = ([base, op]),

where base returns (0, 0) and

op ((t, c), a) = (t + 1, c + 1),  if a = BL and (c + 1) mod n ≠ 0
              = (0, c + 1),      if a = BL and (c + 1) mod n = 0
              = (0, 0),          if a = NL
              = (0, c + 1),      otherwise.
Finally, the triple (entab · unfill, (tbc, col)) can itself be computed as a snoc-list catamorphism:

triple = ([base, op]),

where base returns ([], (0, 0)) and

op ((x, (t, c)), a) = (x, (t + 1, c + 1)),                if a = BL and (c + 1) mod n ≠ 0
                    = (x ++ [TB], (0, c + 1)),            if a = BL and (c + 1) mod n = 0
                    = (x ++ blanks t ++ [NL], (0, 0)),    if a = NL
                    = (x ++ blanks t ++ [a], (0, c + 1)), otherwise.

By (10.1), entab can then be recovered by

entab = cat · (id × blanks) · outl · assocl · triple.
> entab x = entab' (0, 0, x)
> entab' (t, c, []) = blanks t
> entab' (t, c, a:x) = entab' (t+1, c+1, x), if a == ' ' && d /= 0
>                    = ['\t'] ++ entab' (0, c+1, x), if a == ' ' && d == 0
>                    = blanks t ++ ['\n'] ++ entab' (0, 0, x), if a == '\n'
>                    = blanks t ++ [a] ++ entab' (0, c+1, x), otherwise
>   where d = (c+1) 'mod' n
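Again for reference, a transcription of entab' into modern Haskell with the tab width as a parameter (our adaptation); the four cases correspond exactly to those of the program above:

```haskell
-- Transcription of entab': t counts trailing blanks not yet committed,
-- c the current column; a run of blanks reaching a tab stop becomes '\t'.
entab :: Int -> String -> String
entab n = go 0 0
  where
    go t _ []      = replicate t ' '
    go t c (a : x)
      | a == ' ' && d /= 0 = go (t + 1) (c + 1) x
      | a == ' '           = '\t' : go 0 (c + 1) x
      | a == '\n'          = replicate t ' ' ++ '\n' : go 0 0 x
      | otherwise          = replicate t ' ' ++ a : go 0 (c + 1) x
      where d = (c + 1) `mod` n
```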
Exercises
10.4 For V = prefix ∩ (fill° · fill) prove that

V · nil = nil
V · fill = fill
V · snoc ⊆ snoc ∪ (V · outl).

Which of these conditions does not hold for V = prefix?
10.3 The minimum tardiness problem
where bagify turns a list into a bag. The function cost is defined in terms of three functions ct, dt and wt, giving the processing time, due time and weight of each job:

penalty (x, j) = (sum (list ct x) + ct j − dt j) × wt j
cost = max leq · P([zero, penalty] · α°) · Λprefix,

where α = [nil, snoc]. We can also describe cost recursively by:
cost [] = 0
cost (x ++ [j]) = bmax (cost x, penalty (x, j)).

Consider, for example, three jobs with the following data:

        1    2    3
ct      5   10   15
dt     10   20   20
wt      1    3    3
The best schedules are [2, 3, 1] and [3, 2, 1], each with a cost of 20; for example:

           2    3    1
time      10   25   30
dt        20   20   10
penalty    0   15   20
The definition of cost is given in terms of snoc-lists, although we can use either snoc-lists or cons-lists to build schedules. As we have seen in Chapter 7, the choice of what kind of list to use can be critical in the success of a greedy approach. Suppose we did go for snoc-lists. Then the final greedy algorithm would take the form
schedule u = [],                 if emptybag u
           = schedule v ++ [j],  otherwise,
where (v, j) = pick u,
for some function pick. At each stage, therefore, we pick the job that is best
placed at the end of the schedule. Such algorithms are known as backward greedy
algorithms. If schedules are described by cons-lists, then the greedy algorithm would
involve picking a job that is best placed first in the schedule. In general, it does
not follow that if a greedy algorithm exists for snoc-lists, then a similar algorithm
exists for cons-lists.
bagify = ([nil, snag]),

where nil returns the empty bag, and snag (a contraction of snoc and bag, somewhat more attractive than bsnoc) takes a pair (u, j) and places j in the bag u, thereby 'snagging' it.
There is another strategic decision that should be mentioned at this point. In the final algorithm the input will be presented as a list rather than a bag. That means we are, in effect, seeking a permutation of the input that minimises cost, so we may specify schedule ⊆ min R · Λperm, where R = cost° · leq · cost. With this specification another avenue of attack is opened up. The relation perm can be defined as a snoc-list catamorphism (see Section 5.6):
perm = ([nil, add]),

where add (x, j) = y ++ [j] ++ z for some decomposition x = y ++ z.
In this form, the minimum tardiness problem might be solvable by the greedy
method of Chapter 7. However, no greedy algorithm based on catamorphisms
exists—or at least no simple one, which is why the problem appears in this chapter
and not earlier. To appreciate why, recall that a greedy method based on snoc-
list catamorphisms solves not only the problem associated with the given list, but
also the problems associated with all its prefixes. Dually, one based on cons-list
catamorphisms solves all suffixes of the input.
Now, consider again the three example jobs described above. With the input presented as [1, 2, 3], the best schedule for the prefix [1, 2] is [1, 2] itself, incurring zero cost. However, this schedule cannot be extended to either [2, 3, 1] or [3, 2, 1], the two best solutions for the three jobs. Dually, with a cons-list catamorphism, suppose the input is presented as [3, 2, 1]; again a best schedule for [2, 1] is [1, 2], but [1, 2] cannot be extended to either [2, 3, 1] or [3, 2, 1].
Derivation
Although nil and snag have disjoint ranges, a development along the lines of the
detab-entab problem does not work here. For this problem we need to bring context
into both the monotonicity and greedy conditions. As a result, the proof of the
greedy condition is a little tricky.
With α = [nil, snoc], β = [nil, snag] and FX = 1 + (X × Job), the monotonicity and greedy conditions read:

α · F(R ∩ (bagify° · bagify)) ⊆ R · α   (10.2)
α · Fbagify° · Q° ⊆ R° · α · Fbagify°.   (10.3)
To prove (10.2) we need the fact that cost can be expressed in the form

cost [] = 0
cost (x ++ [j]) = bmax (cost x, penalty (perm x, j)).
This is identical to the earlier recursive characterisation of cost, except for the term perm x. It holds because penalty (x, j) depends only on the jobs in the schedule x, not on their order. The reason is that penalty (x, j) is defined in terms of the sum of the completion times of jobs in x, and sum applied to a list returns the same result on any permutation of the list.
Using perm = bagify° · bagify, the new expression for cost can be put in the form for which Proposition 9.3 applies:

cost · α = k · F(cost, bagify),

where

k = [zero, bmax · (id × (penalty · (bagify° × id))) · assocr].
It is easy to check that

k · F(leq × id) ⊆ leq · k.
For the greedy condition (10.3) we will need the fact that the original definition of cost can be rewritten in the form

cost · α = bmax · (g, h)   (10.4)

where

g = [zero, penalty]   (10.5)
h = [zero, cost · outl].   (10.6)
We will also need two additional facts. Firstly, the cost of a schedule can only
increase when more jobs are added to it. In symbols,

add ⊆ R° · outl,   (10.7)

where add is the relation for which perm = ([nil, add]). A formal proof of (10.7) is left as Exercise 10.7. Secondly, we need the identity

bagify° · β = [nil, add] · Fbagify°.   (10.8)
The proof is left as Exercise 10.8. Putting (10.7) and (10.8) together, we obtain

bagify° · β
=    {(10.8)}
[nil, add] · Fbagify°
⊆    {(10.7)}
[nil, R° · outl] · Fbagify°
⊆    {definition of R and nil ⊆ cost° · geq · zero}
cost° · geq · [zero, cost · outl] · Fbagify°
=    {definition of h}
cost° · geq · h · Fbagify°.
-
C {monotonicity of composition}
(a Fbagify0 Q°) n (a Fbagify0 /?° /?)
=
{catamorphisms, since bagify ([/?])} =
«
((9° 9^ 9) H (a° cos*° #eg ft)) Fbagify0
* * *
=
{products}
a (g, cost a)° (geq #, ^e^ ft) Fbagify0
The choice Q = f° · leq · f, where f = [zero, penalty · (bagify° × id)], satisfies the required specification. In words, a minimum under Q identifies a job with the least penalty.
It remains to show that

α · (g, cost · α)° · (geq · g, geq · h) ⊆ R° · α.

We reason:

(g, cost · α)° · (geq · g, geq · h)
⊆    {monotonicity of bmax}
(cost · α)° · geq · bmax · (g, h)
=    {(10.4)}
(cost · α)° · geq · cost · α,

and composing with α on the left gives α · α° · cost° · geq · cost · α ⊆ R° · α, since α · α° ⊆ id.
The greedy condition (10.3) is now established, so we can solve our problem by computing the least fixed point of

X = [nil, snoc] · (id + (X × id)) · min Q · Λ[nil, snag]°.

Appealing to Proposition 10.1, this simplifies to

X = (null → nil, snoc · (X × id) · min Q′ · Λsnag°),

where Q′ = f° · leq · f and f = penalty · (bagify° × id).
schedule = (null → nil, snoc · (schedule × id) · pick).
The program
In the Gofer program we represent bags by lists and represent a list x by the pair
(x, sum (list ct x)). The function pick is implemented by choosing the first job in
the list with minimum penalty.
> schedule = schedule' . pair (id, sum . list ct)
> schedule' (x,t) = [], if null x
>                 = schedule' (x',t') ++ [j], otherwise
>   where x' = delete j x
>         t' = t - ct j
>         j = pick (x,t)
> pick (x,t) = outl (minlist r [(j, (t - dt j) * wt j) | j <- x])
>   where r = leq . cross (outr, outr)
> delete j [] = []
> delete j (k:x) = x, if j == k
>                = k : delete j x, otherwise
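In modern Haskell the backward greedy algorithm can be rendered directly. Representing a job as a (processing time, due time, weight) triple is our choice, as is recomputing the total completion time at each step rather than threading it through:

```haskell
import Data.List (delete, minimumBy)
import Data.Ord (comparing)

-- Backward greedy schedule for minimum tardiness: at each stage, pick the
-- job that incurs the least penalty when placed *last*, and recurse on
-- the remaining jobs.
type Job = (Int, Int, Int)   -- (ct, dt, wt)

schedule :: [Job] -> [Job]
schedule [] = []
schedule xs = schedule (delete j xs) ++ [j]
  where
    t = sum [ct | (ct, _, _) <- xs]    -- completion time of the last job
    pen (_, dt, wt) = (t - dt) * wt    -- penalty if placed last
    j = minimumBy (comparing pen) xs
```

On the three example jobs above this produces the schedule [3, 2, 1], one of the two optimal schedules of cost 20.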
Exercises
10.6 Prove that k · F(leq × id) ⊆ leq · k, where

k = [zero, bmax · (id × (penalty · (bagify° × id))) · assocr].
10.7 To show that add ⊆ R° · outl we can use a recursive characterisation of add:

add = (μX : snoc ∪ (snoc · (X × id) · exch · (snoc° × id))),

where exch : (A × C) × B ← (A × B) × C.
Prove that add ⊆ R° · outl using fixed-point induction (see Exercise 6.4) and the fact that

penalty · (add × id) ⊆ geq · penalty · (outl × id).

10.8 Prove that

bagify° · β = [nil, add] · Fbagify°.
10.9 Assuming all weights are the same, give an 0(n log n) algorithm for computing
the complete schedule.
10.10 The minimum lateness problem is similar to the minimum tardiness problem,
except that the cost function is defined by
cost [] = −∞
cost (x ++ [j]) = bmax (penalty (x, j), cost x).
It follows that costs can be negative. How does this change affect the development?
10.11 Does the problem in which cost is defined by
cost [] = 0
cost (x ++ [j]) = plus (penalty (x, j), cost x),
have a greedy solution?
10.4 The TeX problem - part two

As a final example, let us solve the second of the TeX problems described in Chapter 3. Recall that the task is to convert between decimal fractions and integer multiples of 2^−16. The function extern has type Decimal ← [0, 2^16) and is specified by the property that extern n should be some shortest decimal whose internal representation is n:
intern = round · val
round r = ⌊2^16 r + 1/2⌋
val = ([zero, shift])
shift (d, r) = (d + r)/10,
The first job is to cast the problem of computing extern into the standard mould:

extern ⊆ min R · Λ(val° · round°).

Since round° is not a function, we cannot simply take it out of the Λ expression. Instead, we use the fact that

n = ⌊2^16 r + 1/2⌋ ≡ 2n − 1 < 2^17 r < 2n + 1,

so that

round° = inrange · interval,

where

interval n = ((2n − 1)/2^17, (2n + 1)/2^17)
r inrange (a, b) ≡ (a < r < b).
Since interval is a function, we can rewrite the specification of extern to read

extern ⊆ min R · Λ(val° · inrange) · interval.

The next step is to express inrange° · val as a catamorphism on cons-lists:

inrange° · val = ([arb, step]).

The conditions to be satisfied are

inrange° · zero = arb
inrange° · shift = step · (id × inrange°).
The first condition determines arb, and to determine step we argue:

(a, b) (inrange° · shift) (d, r)
≡    {definitions of inrange and shift}
a < (d + r)/10 < b
≡    {arithmetic}
10a − d < r < 10b − d
≡    {introducing step (d, (a′, b′)) = ((d + a′)/10, (d + b′)/10)}
(a, b) (step · (id × inrange°)) (d, r).
So far we haven't considered the restriction on the problem, namely, that the argument to extern is an integer n in the range 0 ≤ n < 2^16. For n in this range we have interval n = (a, b), where a and b have the property that

a = p/2^17 and b = q/2^17 for odd integers p and q.   (10.9)
Derivation

The specification now has the standard form extern ⊆ min R · Λh° · interval, where h = ([arb, step]).
For this problem a simple choice of Q suffices. To see why, consider the expression Λ[arb, step]°. Writing * for the sole inhabitant of the terminal object, we have for (a, b) of type Interval that
(Λarb°) (a, b) = (a < 0 → {*}, { })
(Λstep°) (a, b) = {(d, (10a − d, 10b − d)) | d a digit, 0 < 10b − d < 1}.
But for digits d1 and d2, bearing (10.9) in mind,

(0 < 10b − d1 < 1) ∧ (0 < 10b − d2 < 1) ⟹ d1 = d2.
Hence step° : (Digit × Interval) ← Interval is, in fact, a function:

step° (a, b) = (d, (10a − d, 10b − d)),  where d = ⌊10b⌋.
It follows that

(Λ[arb, step]°) (a, b) = {inl *, inr (step° (a, b))},  if a < 0
                       = {inr (step° (a, b))},         otherwise,

and so Q need only choose between two alternatives. The appropriate definition of Q is

Q = (inl · ! · inr°) ∪ id,

where ! : 1 ← (Digit × Interval). With this choice of Q the inhabitant of the terminal object is preferred whenever possible.
The greedy condition is established as follows:

Q · Fh · α° ⊆ Fh · α° · R
≡    {definition of Q}
((inl · ! · inr°) ∪ id) · Fh · α° ⊆ Fh · α° · R
≡    {since R is reflexive}
inl · ! · inr° · Fh · α° ⊆ Fh · α° · R
≡    {definition of F}
inl · ! · (id × h) · inr° · α° ⊆ Fh · α° · R
≡    {universal property of ! and α · inr = cons}
inl · ! · cons° ⊆ Fh · α° · R
≡    {shunting}
id ⊆ !° · inl° · Fh · α° · R · cons
≡    {definition of F}
id ⊆ !° · inl° · α° · R · cons
≡    {since α · inl = nil}
id ⊆ !° · nil° · R · cons
≡    {shunting}
nil · ! ⊆ R · cons,

and this last inclusion holds because the empty decimal is at least as short as any other.
We know how to simplify min Q · Λ[arb, step]°, and the result is that

extern = f · interval,

where

f (a, b) = [],                           if a < 0
         = [d] ++ f (10a − d, 10b − d),  otherwise,
where d = ⌊10b⌋.
The final step is to introduce the restriction that the computation should be performed using integer arithmetic only. This turns out to be easy: writing w = 2^17, every interval (a, b) computed during the algorithm satisfies a = p/w and b = q/w for some integers p and q. Initially, we have interval n = ((2n − 1)/w, (2n + 1)/w), and if a = p/w, then 10a − d = (10p − wd)/w, and similarly for b. Representing (p/w, q/w) by (p, q), we therefore obtain extern n = f (2n − 1, 2n + 1), where

f (p, q) = [],                             if p < 0
         = [d] ++ f (10p − wd, 10q − wd),  otherwise,
where d = (10q) div w.
The program

> extern = f . interval
> f (p,q) = [], if p <= 0
>         = [d] ++ f (10*p - w*d, 10*q - w*d), otherwise
>           where d = (10*q) `div` w
> interval n = (2*n - 1, 2*n + 1)
> w = 131072
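For readers without Gofer to hand, the program transcribes directly into modern Haskell; this is a sketch in which guard syntax replaces the "e, if p" conditional style used above:

```haskell
-- Haskell transcription of the Gofer program above.
extern :: Integer -> [Integer]
extern = f . interval

f :: (Integer, Integer) -> [Integer]
f (p, q)
  | p <= 0    = []                              -- interval has reached zero
  | otherwise = d : f (10*p - w*d, 10*q - w*d)  -- emit a digit and rescale
  where d = (10 * q) `div` w

interval :: Integer -> (Integer, Integer)
interval n = (2*n - 1, 2*n + 1)

w :: Integer
w = 131072  -- 2^17
```

For example, extern 32768 = [5] and extern 16384 = [2,5]: the decimal expansions of 1/2 and 1/4 of 2^16.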
Exercises

10.14 Show that the only property of w = 2^17 assumed in the derivation is that
10q/w should not be an integer for any q with 0 < q < w. How can this restriction
be removed?
Bibliographical remarks
For general remarks about the literature on greedy algorithms, see Chapter 7. The
approach of this chapter is arguably more general, and closer to the view of greedy
algorithms in the literature. We originally published the idea that dynamic
programming and greedy algorithms are closely related in (Bird and De Moor 1993a).
A similar suggestion occurs in (Helman 1989b).
In this chapter we have considered only problems where the base functor F is
linear: no tree-like structures were introduced. For non-linear F, the recursion
would be more appropriately termed 'divide-and-conquer'. We have not
investigated this in any depth.
Appendix

The following Gofer prelude file contains definitions of the standard functions
necessary for running all the programs in this book. As a prelude for general
functional programming it is incomplete.
infixr 9 .
infixl 7 *
infix  7 /, `div`, `mod`
infixl 6 +, -
infixr 5 ++, :
infixr 3 &&
infixr 2 ||
Standard combinators:

(f . g) x = f (g x)
const k a = k
id a = a
outl (a,b) = a
outr (a,b) = b
assocl (a,(b,c)) = ((a,b),c)
assocr ((a,b),c) = (a,(b,c))
266 Appendix
dupl (a,(b,c)) = ((a,b),(a,c))
dupr ((a,b),c) = ((a,c),(b,c))
pair (f,g) a = (f a, g a)
cross (f,g) (a,b) = (f a, g b)
cond p (f,g) a = if (p a) then (f a) else (g a)
curry f a b = f (a,b)
uncurry f (a,b) = f a b
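The uncurried pairing combinators behave as one would expect; the following self-contained Haskell sketch reproduces them so they can be exercised directly:

```haskell
-- The prelude's uncurried pairing combinators, reproduced in Haskell.
pair :: (a -> b, a -> c) -> a -> (b, c)
pair (f, g) a = (f a, g a)

cross :: (a -> c, b -> d) -> (a, b) -> (c, d)
cross (f, g) (a, b) = (f a, g b)

cond :: (a -> Bool) -> (a -> b, a -> b) -> a -> b
cond p (f, g) a = if p a then f a else g a
```

For instance, pair ((+1), (*2)) 3 gives (4, 6), while cross ((+1), (*2)) (3, 4) gives (4, 8).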
Boolean functions:
false = const False
true = const True
False && x = False
True && x = x
False || x = x
True || x = True
not True = False
not False = True
otherwise = True
Relations:
eql = uncurry (==)
neq = uncurry (/=)
gtr = uncurry (>)
geq = uncurry (>=)
meet (r,s) = cond r (s, false)
join (r,s) = cond r (true, s)
wok r = r . swap
Numerical functions:
zero = const 0
succ = (+1)
pred = subtract 1
negative = (< 0)
positive = (> 0)
List-processing functions:
[] ++ y = y
(a:x) ++ y = a : (x ++ y)
null [] = True
null (a:x) = False
nil = const []
wrap = cons . pair (id, nil)
cons = uncurry (:)
cat = uncurry (++)
concat = catalist ([], cat)
snoc = cat . cross (id, wrap)
head (a:x) = a
tail (a:x) = x
split = pair (head, tail)
last = cata1list (id, outr)
init = cata1list (nil, cons)
inits = catalist ([[]], extend)
        where extend (a,xs) = [[]] ++ list (a:) xs
tails = catalist ([[]], extend)
        where extend (a,x:xs) = (a:x) : x : xs
splits = zip . pair (inits, tails)
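The definitions of inits, tails and splits above are catamorphisms, so they can be rendered with foldr in Haskell; a sketch, with names suffixed C to avoid clashing with the standard Data.List versions:

```haskell
-- inits, tails and splits as foldr, mirroring the catamorphisms above.
initsC :: [a] -> [[a]]
initsC = foldr (\a xss -> [] : map (a:) xss) [[]]

tailsC :: [a] -> [[a]]
tailsC = foldr (\a xss@(x:_) -> (a:x) : xss) [[]]

-- Every way of splitting a list into a prefix and the matching suffix.
splitsC :: [a] -> [([a], [a])]
splitsC x = zip (initsC x) (tailsC x)
```

For example, splitsC "ab" yields [("","ab"), ("a","b"), ("ab","")].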
cpp (x,y) = [(a,b) | a <- x, b <- y]
cpl (x,b) = [(a,b) | a <- x]
cpr (a,y) = [(a,b) | b <- y]
cplist = catalist ([[]], list cons . cpp)
minlist r = cata1list (id, bmin r)
bmin r = cond r (outl, outr)
maxlist r = cata1list (id, bmax r)
bmax r = cond (r . swap) (outl, outr)
thinlist r = catalist ([], bump r)
             where bump r (a,[]) = [a]
                   bump r (a,b:x) | r (b,a)   = b:x
                                  | otherwise = a:b:x
length = catalist (0, succ . outr)
sum = catalist (0, plus)
trans = cata1list (list wrap, list cons . zip)
list f = catalist ([], cons . cross (f, id))
filter p = catalist ([], cond (p . outl) (cons, outr))
catalist (c,f) [] = c
catalist (c,f) (a:x) = f (a, catalist (c,f) x)
loop f (a,[]) = a
loop f (a,b:x) = loop f (f (a,b), x)
zip (x,[]) = []
zip ([],y) = []
zip (a:x,b:y) = (a,b) : zip (x,y)
unzip = pair (list outl, list outr)
words = filter (not . null) . catalist ([[]], cond ok (glue, new))
        where ok (a,xs) = (a /= ' ' && a /= '\n')
              glue (a,x:xs) = (a:x):xs
              new (a,xs) = []:xs
lines = catalist ([[]], cond ok (glue, new))
        where ok (a,xs) = (a /= '\n')
              glue (a,x:xs) = (a:x):xs
              new (a,xs) = []:xs
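The catamorphic lines can be checked by rendering it with foldr in Haskell; a sketch, named linesC to avoid the Prelude clash (note that, unlike the standard lines, it maps the empty string to [""]):

```haskell
-- The prelude's `lines` as a foldr over the input string.
linesC :: String -> [String]
linesC = foldr step [[]]
  where step a (x:xs) | a /= '\n' = (a:x) : xs   -- glue onto the current line
                      | otherwise = [] : (x:xs)  -- start a new line
```

For example, linesC "ab\ncd" gives ["ab","cd"].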
unwords = cata1list (id, join)
          where join (x,y) = x ++ " " ++ y
unlines = cata1list (id, join)
          where join (x,y) = x ++ "\n" ++ y
show :: a -> String
show x = primPrint 0 x []
flip f a b = f b a
Aarts, C. J., Backhouse, R. C., Hoogendijk, P. F., Voermans, E., and Van der
Woude, J. C. S. P. (1992). A relational theory of datatypes. Available from
URL http://www.win.tue.nl/win/cs/wp/papers/papers.html.
Ahrens, J. H. and Finke, G. (1975). Merging and sorting applied to the 0-1
knapsack problem.
Asperti, A. and Longo, G. (1991). Categories, Types, and Structures: An
Introduction to Category Theory for the Working Computer Scientist.
Foundations of Computing Series. MIT Press.
Backhouse, R. C., De Bruin, P., Malcolm, G., Voermans, T. S., and Van der
Woude, J. C. S. P. (1991). Relational catamorphisms. In Möller, B., editor,
Constructing Programs from Specifications, pages 287-318. Elsevier Science
Publishers.
272 Bibliography
Backus, J. (1978). Can programming be liberated from the von Neumann style? A
functional style and its algebra of programs. Communications of the ACM,
21(8), 613-641.
Barr, M. and Wells, C. (1985). Toposes, Triples and Theories, Volume 278 of
Grundlehren der Mathematischen Wissenschaften. Springer-Verlag.
Bauer, F. L., Berghammer, R., Broy, M., Dosch, W., Geiselbrechtinger, F., Gnatz,
R., Hangel, E., Hesse, W., Krieg-Brückner, B., Laut, A., Matzner, T., Möller,
B., Nickl, F., Partsch, H., Pepper, P., Samelson, K., Wirsing, M., and
Wössner, H. (1985). The Munich Project CIP. Volume I: The Wide Spectrum
Language CIP-L, Volume 183 of Lecture Notes in Computer Science.
Springer-Verlag.
Bauer, F. L., Ehler, H., Horsch, A., Möller, B., Partsch, H., Paukner, O., and
Pepper, P. (1987). The Munich Project CIP. Volume II: The Program
Transformation System CIP-S, Volume 292 of Lecture Notes in Computer
Science. Springer-Verlag.
Bird, R. S., Gibbons, J., and Jones, G. (1989). Formal derivation of a pattern
matching algorithm. Science of Computer Programming, 12(2), 93-104.
Carboni, A., Kasangian, S., and Street, R. (1984). Bicategories of spans and
relations. Journal of Pure and Applied Algebra, 33(3), 259-267.
Carboni, A., Lack, S., and Walters, R. F. C. (1993). Introduction to extensive and
distributive categories. Journal of Pure and Applied Algebra, 84(2), 145-158.
Chen, W. and Udding, J. T. (1990). Program inversion: more than fun! Science
of Computer Programming, 15(1), 1-13.
Desharnais, J., Mili, A., and Mili, F. (1993). On the mathematics of sequential
decompositions. Science of Computer Programming, 20(3), 253-289.
Eppstein, D., Galil, Z., Giancarlo, R., and Italiano, G. F. (1992). Sparse dynamic
programming II: Convex and concave cost functions. Journal of the ACM,
39(3), 546-567.
Fegaras, L., Sheard, T., and Stemple, D. (1992). Uniform traversal combinators:
definition, use and properties. In Kapur, D., editor, Automated Deduction,
Volume 607 of Lecture Notes in Computer Science, pages 148-162.
Springer-Verlag.
Gibbons, J., Cai, W., and Skillicorn, D. B. (1994). Efficient parallel algorithms for
tree accumulations. Science of Computer Programming, 23, 1-18.
Gries, D. (1984). A note on a standard strategy for developing loop invariants and
loops. Science of Computer Programming, 2, 207-214.
Gries, D. (1990a). Binary to decimal, one more time. In Beauty is our Business: A
Birthday Salute to Edsger W. Dijkstra, pages 141-148. Springer-Verlag.
Jay, C. B. (1994). Matrices, monads and the fast Fourier transform. In Proceedings
of the Massey Functional Programming Workshop 1994, pages 71-80.
Knuth, D. E. (1990). A simple program whose proof isn't. In Feijen, W., Gries,
D., and Van Gasteren, A. J. M., editors, Beauty is Our Business: A Birthday
Salute to Edsger W. Dijkstra. Springer-Verlag.
Kock, A. (1972). Strong functors and monoidal monads. Archiv für Mathematik,
23, 113-120.
Mac Lane, S. and Moerdijk, I. (1992). Sheaves in Geometry and Logic: A First
Introduction to Topos Theory. Universitext. Springer-Verlag.
Meijer, E. and Hutton, G. (1995). Bananas in space: extending fold and unfold to
exponential types. In Functional Programming Languages and Computer
Architecture. Association for Computing Machinery.
Meijer, E., Fokkinga, M., and Paterson, R. (1991). Functional programming with
bananas, lenses, envelopes and barbed wire. In Hughes, J., editor,
Proceedings of the 1991 ACM Conference on Functional Programming
Languages and Computer Architecture, Volume 523 of Lecture Notes in
Computer Science, pages 124-144. Springer-Verlag.
Mili, A., Desharnais, J., and Mili, F. (1987). Relational heuristics for the design of
deterministic programs. Acta Informatica, 24(3), 239-276.
Mili, A., Desharnais, J., and Mili, F. (1994). Computer Program Construction.
Oxford University Press.
Schmidt, G. W., Berghammer, R., and Zierer, H. (1989). Symmetric quotients and
domain construction. Information Processing Letters, 33(3), 163-168.
Schröder, E. (1895). Vorlesungen über die Algebra der Logik (Exakte Logik).
Dritter Band: Algebra und Logik der Relative. Teubner, Leipzig.
Wadler, P. (1987). Views: a way for pattern matching to cohabit with data
abstraction. In Principles of Programming Languages, pages 307-313.
Association for Computing Machinery.
inequation, 82
infinite lists, 52
initial algebra, 45-49
initial object, 37-38
initial type, 51
injection, 17, 28, 65, 68
insertion sort, 157
inverse, 16-18, 29
involution, 83, 101
isomorphism, 29, 33, 48
iterative definition, 12, 14
jointly monic arrows, 92
Kleene's theorem, 141
knapsack problem, 205
Knaster-Tarski theorem, 140
Lambek's Lemma, 49, 142
large category, 31
Lawvere's Recursion Theorem, 78
lax natural transformation, 132, 148, 182
layered network problem, 196
lazy functional programming, 43, 45
lexical ordering, 98, 175
linear functor, 202
linear order, 152
list comprehension, 13
locale, 91
locally complete allegory, 96
longest upsequence problem, 217
loops, 12, 264
lower adjoint, 101
maximum, 166
maximum segment sum problem, 174
membership relation, 32, 34, 103, 147-151
memoisation, 219
mergesort, 156
merging loops, 56
minimal elements, 170
minimum tardiness problem, 253
Miranda, 1
modular identity, 88
modular law, 84
modulus computation, 145
monad, 52
monic arrow, 28
monotonic algebra, 172
monotonic functor, see relator
monotonicity
  of composition, 82
  of division, 99
μ-calculus, 161
natural isomorphism, 34, 67
natural transformation, 19, 33-35
naturality condition, 34, 133
negation operator, 2
non-empty lists, 13
non-empty power object, 107
non-strict constructor, 43
non-strict semantics, 22
nondeterminism, 81
objects of a category, 25
one-pass program, 56
opposite category, 28
optimal bracketing problem, 230
Orwell, 1
pair operator, 39, 43
paragraph problem, 190, 207
parallel loop fusion, 78
partial function, 26, 30, 88
partial order, 44, 86, 108
partitions, 128
pattern matching, 2, 66
permutations, 130
ping-pong argument, 82
point-free, 19-22
pointwise, 19-22
polymorphism, 18-19, 34, 35
polynomial functor, 44-45
power allegory, 103, 117
power functor, 105
294 Index
rally driver's problem, 217
set theory, 95, 104, 162, 169
tree
  balanced, 57
  binary, 14
  general, 16
  weighted path length, 62
truth tables, 68
tupling, 78
type functor, 44, 49-52
type information, 27
type relator, 122
unit, 91, 94
unitary allegory, 94
universal property
  of min, 166
  of thin, 193
  of catamorphisms, 46
  of closure, 157
  of coproducts, 41
  of division, 98
  of implication, 97
  of join, 96
  of meet, 83
  of power transpose, 103
  of products, 39
  of range, 86
  of terminal object, 37
universal quantification, 98
upper adjoint, 101
Prentice Hall International Series in Computer Science (continued)