Algebra of Programming

Prentice Hall International Series in Computer Science

C.A.R. Hoare, Series Editor

APT, K.R., From Logic to Prolog


ARNOLD, A., Finite Transition Systems
ARNOLD, A. and GUESSARIAN, I., Mathematics for Computer Science
BARR, M. and WELLS, C., Category Theory for Computing Science (2nd edn)
BEN-ARI, M., Principles of Concurrent and Distributed Programming
BEN-ARI, M., Mathematical Logic for Computer Science
BEST, E., Semantics of Sequential and Parallel Programs
BIRD, R. and de MOOR, O., The Algebra of Programming
BIRD, R. and WADLER, P., Introduction to Functional Programming
BOVET, D.P. and CRESCENZI, P., Introduction to the Theory of Complexity
de BROCK, B., Foundations of Semantic Databases

BRODA, EISENBACH, KHOSHNEVISAN and VICKERS, Reasoned Programming


BRUNS, G., Distributed Systems Analysis with CCS
BURKE, E. and FOXLEY, E., Logic Programming
BURKE, E. and FOXLEY, E., Logic and Its Applications
CLEMENT, T., Programming Language Paradigms
DAHL, O.-J., Verifiable Programming
DUNCAN, E., Microprocessor Programming and Software Development
ELDER, J., Compiler Construction
ELLIOTT, R.J. and HOARE, C.A.R. (eds), Scientific Applications of Multiprocessors
FREEMAN, T.L. and PHILLIPS, R.C., Parallel Numerical Algorithms
GOLDSCHLAGER, L. and LISTER, A., Computer Science: A modern introduction (2nd edn)
GORDON, M.J.C., Programming Language Theory and Its Implementation
GRAY, P.M.D., KULKARNI, K.G. and PATON, N.W., Object-oriented Databases
HAYES, I. (ed.), Specification Case Studies (2nd edn)
HEHNER, E.C.R., The Logic of Programming
HINCHEY, M.G. and BOWEN, J.P., Applications of Formal Methods
HOARE, C.A.R., Communicating Sequential Processes
HOARE, C.A.R. and GORDON, M.J.C. (eds), Mechanized Reasoning and Hardware Design
HOARE, C.A.R. and JONES, C.B. (eds), Essays in Computing Science
HUGHES, J.G., Database Technology: A software engineering approach
HUGHES, J.G., Object-oriented Databases
INMOS LTD, Occam 2 Reference Manual
JONES, C.B., Systematic Software Development Using VDM (2nd edn)
JONES, C.B. and SHAW, R.C.F. (eds), Case Studies in Systematic Software Development
JONES, G. and GOLDSMITH, M., Programming in Occam2
JONES, N.D., GOMARD, C.K. and SESTOFT, P., Partial Evaluation and Automatic Program Generation
JOSEPH, M. (ed.), Real-time Systems: Specification, verification and analysis
KALDEWAIJ, A., Programming: The derivation of algorithms
KING, P.J.B., Computer and Communications Systems Performance Modelling
LALEMENT, R., Computation as Logic
McCABE, F.G., Logic and Objects
McCABE, F.G., High-level Programmer's Guide to the 68000
MEYER, B., Introduction to the Theory of Programming Languages
MEYER, B., Object-oriented Software Construction
MILNER, R., Communication and Concurrency
MITCHELL, R., Abstract Data Types and Modula 2
MORGAN, C., Programming from Specifications (2nd edn)
OMONDI, A.R., Computer Arithmetic Systems
PATON, COOPER, WILLIAMS and TRINDER, Database Programming Languages
PEYTON JONES, S.L., The Implementation of Functional Programming Languages
Algebra of
Programming

Richard Bird
and

Oege de Moor

University of Oxford

An imprint of Pearson Education

Harlow, England · London · New York · Reading, Massachusetts · San Francisco
Toronto · Don Mills, Ontario · Sydney · Tokyo · Singapore · Hong Kong · Seoul
Taipei · Cape Town · Madrid · Mexico City · Amsterdam · Munich · Paris · Milan
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web at:


http://www.pearsoneduc.com

First published by Prentice Hall

© Prentice Hall Europe 1997

All rights reserved. No part of this publication may be reproduced,


stored in a retrieval system, or transmitted, in any form, or by any
means, electronic, mechanical, photocopying, recording or otherwise,
without prior permission, in writing, from the publisher.

Printed and bound in Great Britain by


MPG Books Ltd, Bodmin, Cornwall

Library of Congress Cataloguing-in-Publication Data

Available from the publisher

British Library Cataloguing in Publication Data

A catalogue record for this book is available from


the British Library
ISBN 0-13-507245-X

10 9 8 7 6 5 4 3 2

04 03 02 01 00
Contents

Foreword
Preface

1 Programs
1.1 Datatypes
1.2 Natural numbers
1.3 Lists
1.4 Trees
1.5 Inverses
1.6 Polymorphic functions
1.7 Pointwise and point-free

2 Functions and Categories
2.1 Categories
2.2 Functors
2.3 Natural transformations
2.4 Constructing datatypes
2.5 Products and coproducts
2.6 Initial algebras
2.7 Type functors

3 Applications
3.1 Banana-split
3.2 Ruby triangles and Horner's rule
3.3 The TeX problem - part one
3.4 Conditions and conditionals
3.5 Concatenation and currying

4 Relations and Allegories
4.1 Allegories
4.2 Special properties of arrows
4.3 Tabular allegories
4.4 Locally complete allegories
4.5 Boolean allegories
4.6 Power allegories

5 Datatypes in Allegories
5.1 Relators
5.2 Relational products
5.3 Relational coproducts
5.4 The power relator
5.5 Relational catamorphisms
5.6 Combinatorial functions
5.7 Lax natural transformations

6 Recursive Programs
6.1 Digits of a number
6.2 Least fixed points
6.3 Hylomorphisms
6.4 Fast exponentiation and modulus computation
6.5 Unique fixed points
6.6 Sorting by selection
6.7 Closure

7 Optimisation Problems
7.1 Minimum and maximum
7.2 Monotonic algebras
7.3 Planning a company party
7.4 Shortest paths on a cylinder
7.5 The security van problem

8 Thinning Algorithms
8.1 Thinning
8.2 Paths in a layered network
8.3 Implementing thin
8.4 The knapsack problem
8.5 The paragraph problem
8.6 Bitonic tours

9 Dynamic Programming
9.1 Theory
9.2 The string edit problem
9.3 Optimal bracketing
9.4 Data compression

10 Greedy Algorithms
10.1 Theory
10.2 The detab-entab problem
10.3 The minimum tardiness problem
10.4 The TeX problem - part two

Appendix
Bibliography
Index
Foreword

It is a great pleasure and privilege to introduce this book on the Algebra of
Programming as the hundredth book in the Prentice Hall International Series in Computing
Science. It develops and consolidates one of the abiding and central themes of the
series: it codifies the basic laws of algorithmics, and shows how they can be used
to classify many ingenious and important programs into families related by the
algebraic properties of their specifications. The formulae and equations that you
will see here share the elegance of those which underlie physics or chemistry or any
other branch of basic science; and like them, they inspire our interest, enlarge our
understanding, and hold out promise of enduring benefits in application.

Tony Hoare
Preface

Our purpose in this book is to show how to calculate programs. We describe an
algebraic approach to programming, suitable both for the derivation of individual
programs and for the study of programming principles in general. The programming
principles we have in mind are those paradigms and strategies of program
construction that form the core of the subject known as Algorithm Design. Examples
of such principles include: dynamic programming, greedy algorithms, exhaustive
search, and divide and conquer.

The main ideas of the algebraic approach are illustrated by an extensive study of
optimisation problems, conducted in Chapters 7-10. These are problems that
involve finding a largest, smallest, cheapest, and so on, member of a set of possible
solutions satisfying some given constraints. It is surprising how many
computational problems can be specified in terms of optimising some suitable measure, even
problems that do not at first sight fall into the traditional areas of combinatorial
optimisation. However, the book is not primarily about optimisation problems,
rather it is about one approach to solving programming problems in general.

Our mathematical framework is a categorical calculus of relations. The calculus is
categorical because we want to formulate algorithmic strategies without reference to
specific datatypes, and relational because we need a degree of freedom in
specification and proof that a calculus of functions alone would not provide. With the help
of this calculus, the standard principles of algorithm design can be formulated as
theorems about classes of problems whose specifications possess a particular
structure. The problems are abstract in the sense that they are parameterised by one or
more datatypes. These theorems say that, under appropriate conditions, a certain
strategy works and leads to a particular form of abstract solution.

Specific algorithms for specific problems are obtained by checking that the
conditions hold and instantiating the results. The solution may take the form of a
function, but more usually a relation, characterised as the solution to a certain
recursive equation. The recursive equation is then refined to a recursive program
that delivers a function, and the result is translated into a functional programming
language. All the programs derived in Chapters 7-10 follow this pattern, and the
popular language Gofer (Jones 1994) is used to implement the results.
A categorical calculus provides not only a means for formulating algorithmic
strategies abstractly, but also a smooth and integrated framework for conducting proofs.
The style employed throughout the book is one of equational and inequational
point-free reasoning with functions and relations. A point-free calculation is one in
which the expressions under manipulation denote functions or relations, built using
functional and relational composition as the basic combining form. In contrast,
pointwise reasoning is reasoning conducted at the level of functional or relational
application and expressed in a formalism such as the predicate calculus.
The point-free style is intrinsic to a categorical approach, but is less common in
proofs about programs. One of the advantages of a point-free style is that one is
unencumbered by many of the complications involved in manipulating formulae
dealing with bound variables introduced by explicit quantifications. Point-free
reasoning is therefore well suited to mechanisation, though none of the many calculations
recorded in this book were in fact produced with the help of a mechanical proof
assistant.

Audience

The book is addressed primarily to the mathematically inclined functional
programmer, though the non-functional (but still mathematically inclined) programmer is
not excluded. Although we have taken pains to make the book as self-contained as
possible, and have provided lots of exercises for self-study, the reader will need some
mathematical background to appreciate and master the more abstract material.

A first course in functional programming will help quite a lot, since many of the
ideas we describe can be found there in more concrete clothing. Prior exposure to
the basic ideas of set theory would be a significant bonus, as would some familiarity
with relations and equational reasoning in a logical calculus. The bibliographical
remarks at the end of each chapter describe where appropriate background material
can be found.

Outline

Roughly speaking, the first half of the book (Chapters 1-6) is devoted to basic
theory, while the second half (Chapters 7-10) pursues the theme of finding efficient
solutions for various kinds of optimisation problem. But most of the early chapters
contain some applications of the theory to programming problems.

Chapter 1 reviews some basic concepts and techniques in functional programming,
and the same ideas are presented again in categorical terms in Chapter 2. This
material is followed in Chapter 3 with one or two simple applications to program
derivation, as well as a discussion of additional categorical ideas. Building on this
material, Chapters 4 and 5 present a categorical calculus of relations, and Chapter 6
contains a treatment of recursion in a relational setting. This chapter also contains
discussions of various problems, including sorting and breadth-first search.

The methods explored in Chapters 7-10 fall into two basic kinds, depending on
whether a solution to an optimisation problem is viewed as being composed out
of smaller ones, or decomposed into smaller ones. The two views are
complementary and individual problems can fall into both classes. Chapter 7 discusses greedy
algorithms that assemble a solution to a problem by a bottom-up process of
constructing solutions to smaller problems, while Chapter 10 studies another class of
greedy algorithm that chooses an optimal decomposition at each stage.
The remaining chapters, Chapter 8 and Chapter 9, deal with similar views of
dynamic programming. Each of these four chapters contains three or four case studies
of non-trivial problems, most of which have been taken from textbooks on algorithm
design, general programming, and combinatorial optimisation.
The chapters are intended to be read in sequence. Bibliographical remarks are
included at the end of each chapter, and the majority of individual sections contain
a selection of exercises. Answers to all the exercises in the first six chapters can be
obtained from the World-wide Web: see the URL

http://www.comlab.ox.ac.uk/oucl/publications/books/algebra/

Acknowledgements

Many people have had a significant impact on the work, and detailed
acknowledgements about who did what can be found at the end of each
chapter. We owe a particular debt of gratitude to the following people, who took
time to comment on an earlier draft, and to make many constructive suggestions:
Roland Backhouse, Sharon Curtis, Jeremy Gibbons, Martin Henson, Tony Hoare,
Guy LaPalme, Bernhard Möller, Jesus Ravelo, and Philip Wadler.

We would like to thank Jim Davies for knocking our LaTeX into shape, and Jackie
Harbor, our editor at Prentice Hall, for enthusiasm, moral support, and a number
of lunches.

The diagrams in this book were drawn using Paul Taylor's package (Taylor 1994).

Richard Bird would like to record a special debt of gratitude to Lambert Meertens
for his friendship and collaboration over many years. Oege de Moor would like to
thank the Dutch STOP project and British Petroleum for the financial assistance
that enabled him to come and work in Oxford. The first part of this book was
written at the University of Tokyo, while visiting Professors Masato Takeichi and
Hidehiko Tanaka. Their hospitality and the generosity of Fujitsu, which made the
visit possible, are gratefully acknowledged.

We would be pleased to hear of any errors, oversights and comments.

Richard Bird (bird@comlab.ox.ac.uk)
Oege de Moor (oege@comlab.ox.ac.uk)

April, 1996
'Now, then,' she said, somewhat calmer. 'An explanation,
if you please, and a categorical one. What's the idea?
What's it all about? Who the devil's that inside the winding-sheet?'
P.G. Wodehouse, The Code of the Woosters
Chapter 1

Programs

Most of the derivations recorded in this book end with a program, more specifically,
a functional program. In this opening chapter we settle notation for expressing
functional programs and review those features of functional languages that will
emerge again in a more general setting later on. Many aspects of modern functional
languages (of which there is an abundance, e.g. Gofer, Haskell, Hope, Miranda™,
Orwell, SML) are not covered. For example, we will not go into questions of strict
versus non-strict semantics, infinite values, evaluation strategies, cost models, or
operating environments. For fuller information we refer the reader to the standard
texts on the subject, some of which are mentioned in the bibliographical remarks
at the end of the chapter. Our main purpose is to identify familiar landmarks that
will help readers to navigate through the abstract material to come.

1.1 Datatypes
At the heart of functional programming is the ability to introduce new datatypes
and to define functions that manipulate their values. Datatypes can be introduced
by simple enumeration of their elements; for example:

Bool ::= false | true

Char ::= ascii0 | ascii1 | ... | ascii127.

The type Bool consists of two values and Char consists of 128. It would be painful
to refer to characters only by their ASCII numbers, so most languages provide
an alternative syntax, allowing one to write 'A' for ascii65, 'a' for ascii97, '\n'
for ascii10, and so on. The various identifiers, ascii0, true, and so on, are called
constructors and the vertical bar | is interpreted as the operation of disjoint union.
Thus, distinct constructors are associated with distinct values.

Datatypes can be defined in terms of other datatypes; for example:

Either ::= bool Bool | char Char

Both ::= tuple (Bool, Char).

The type Either consists of 130 values: bool false, bool true, char ascii0, and so on.
The type Both consists of 256 values, one for each combination of a value in Bool
with a value in Char. In these datatypes the constructors bool, char and tuple
denote functions; for example, char produces a value of type Either given a value
of type Char.
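
As a rough illustration, the declarations above might be transcribed into Haskell as follows. This is only a sketch: Haskell requires capitalised constructor names, the type `Ascii` is a hypothetical stand-in for the 128-constructor type Char of the text, `EitherBC` is renamed to avoid the Prelude's own `Either`, and the enumeration helpers are not part of the original.

```haskell
-- Hypothetical stand-in for the text's type Char; the wrapped Int is
-- assumed to lie in the range 0..127.
newtype Ascii = Ascii Int deriving (Eq, Show)

-- The two datatypes built from Bool and the character type.
data EitherBC = BoolE Bool | CharE Ascii deriving (Eq, Show)
newtype Both = Tuple (Bool, Ascii) deriving (Eq, Show)

allBool :: [Bool]
allBool = [False, True]

allAscii :: [Ascii]
allAscii = map Ascii [0 .. 127]

-- Disjoint union: the numbers of values add (2 + 128).
allEither :: [EitherBC]
allEither = map BoolE allBool ++ map CharE allAscii

-- Product: the numbers of values multiply (2 * 128).
allBoth :: [Both]
allBoth = [Tuple (b, c) | b <- allBool, c <- allAscii]
```

Under these assumptions, `length allEither` evaluates to 130 and `length allBoth` to 256, matching the counts given in the text.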

As a departure from tradition, we write f : A <- B rather than f : B -> A to indicate
the source and target types associated with a function f. Thus

char : Either <- Char

tuple : Both <- (Bool × Char).

The reason for this choice has to do with functional composition, whose definition
now takes the smooth form: if f : A <- B and g : B <- C, then f · g : A <- C is
defined by (f · g) x = f (g x). Writing the target type on the left and the source
type on the right is also consistent with the normal notation for application, in
which functions are applied to arguments on the right. In the alternative, so-called
diagrammatic forms, one writes x f for application and f ; g for composition, where
x (f ; g) = (x f) g. The conventional order is consistent with adjectival order in
English, in which adjectives are functions taking noun phrases to noun phrases.

Given the assurance about different constructors producing different values, we can
define functions on datatypes by pattern matching; for example,

not false = true
not true = false

defines the negation operator not : Bool <- Bool, and

switch (tuple (b, c)) = tuple (not b, c)

defines a function switch : Both <- Both.
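
The two definitions by pattern matching can be sketched in Haskell as below. The sketch assumes Haskell's built-in `Char` in place of the text's Char, renames negation to `not'` to avoid the Prelude's `not`, and writes types in Haskell's source-to-target direction (`Both -> Both`), the reverse of the convention adopted in this book.

```haskell
-- Both pairs a Bool with a character (Haskell's Char is an assumption here).
newtype Both = Tuple (Bool, Char) deriving (Eq, Show)

-- Negation: one equation per constructor of Bool.
not' :: Bool -> Bool
not' False = True
not' True  = False

-- switch negates the first component and leaves the second unchanged.
switch :: Both -> Both
switch (Tuple (b, c)) = Tuple (not' b, c)
```

Because the two patterns of `not'` are built from distinct constructors, exactly one equation applies to each argument; applying `switch` twice returns the original value.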

Functions of more than one argument can be defined in one of two basic styles:
either by pairing the arguments, as in

and (false, b) = false
and (true, b) = b

or by currying, as in

cand false b = false
cand true b = b.

The difference between and and cand is just one of type:

and : Bool <- (Bool × Bool)
cand : (Bool <- Bool) <- Bool.

More generally, we can define a function f of two arguments by choosing any of the
types

f : A <- (B × C)
f : (A <- B) <- C
f : (A <- C) <- B.

With the first type we would write f (b, c); with the second, f c b; and with the
third, f b c. For obvious reasons, the first and third seem more natural companions.
The function curry, with type

curry : ((A <- C) <- B) <- (A <- (B × C)),

converts a non-curried function into a curried one:

curry f b c = f (b, c).

One can also define a function uncurry that goes the other way.
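
A minimal Haskell sketch of the two styles follows. The Prelude's own `and`, `curry` and `uncurry` are hidden so that the names of the text can be reused; the definition of `uncurry` is one possible choice, since the text only asserts that such a function exists.

```haskell
import Prelude hiding (and, curry, uncurry)

-- Non-curried conjunction: takes its two arguments as a pair.
and :: (Bool, Bool) -> Bool
and (False, _) = False
and (True,  b) = b

-- Curried conjunction: takes its arguments one at a time.
cand :: Bool -> Bool -> Bool
cand False _ = False
cand True  b = b

-- curry f b c = f (b, c), as in the text.
curry :: ((b, c) -> a) -> b -> c -> a
curry f b c = f (b, c)

-- One possible uncurry, going the other way (an assumption of this sketch).
uncurry :: (b -> c -> a) -> (b, c) -> a
uncurry g (b, c) = g b c
```

With these definitions `curry and` agrees with `cand` on every pair of arguments, and `uncurry cand` agrees with `and`.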

Functional programmers prefer to curry their functions as a matter of course, one
reason being that it usually leads to fewer brackets. However, we will be more
sparing with currying, reserving its use for those situations that really need it. The
reason is that the product type A × B is a simpler object than the function space
type C <- D in an abstract setting. We will see some examples of curried functions
below, but functional programmers are warned at this point that some familiar
functions will make their appearance in non-curried form.

To return to datatypes, we can parameterise datatypes with other types; for
example, the definition

maybe A ::= nothing | just A

introduces a type maybe A in terms of a parameter type A. For example, just true
has type maybe Bool, while just ascii0 has type maybe Char. We will write
non-parameterised datatypes using a capital initial letter, and parameterised datatypes
using lower case letters only. The reason, as we shall explain later on, is that the
name of a parameterised datatype will also be used for a certain function associated
with the datatype, and we write the names of functions using lower case letters.

1.2 Natural numbers

Datatypes can also be defined recursively; for example,

Nat ::= zero | succ Nat

introduces the type of natural numbers. Nat is the union of an infinite number
of distinct values: zero, succ zero, succ (succ zero), and so on. If two distinct
expressions of Nat denoted the same value, we could show, for some element n of
Nat, that both zero and succ n denoted the same value, contradicting the basic
assumption that different constructors produce different values.

Functions over Nat can be defined by recursion; for example,

plus (m, zero) = m
plus (m, succ n) = succ (plus (m, n))

and

mult (m, zero) = zero
mult (m, succ n) = plus (m, mult (m, n)).

Forcing the programmer to write succ (succ (succ zero)) instead of 3, and to re-create
all of arithmetic from scratch, would be a curious decision, to say the least, by the
designers of a programming language, so a standard syntax for numbers is usually
provided, as well as the basic arithmetic operations. In particular, zero is written
0 and succ n is written n + 1. With these conventions, we can write definitions in
a more perspicuous form; for example,

fact 0 = 1
fact (n + 1) = (n + 1) × fact n

defines the factorial function, and

fib 0 = 0
fib 1 = 1
fib (n + 2) = fib n + fib (n + 1)

defines the Fibonacci function. The expression n + 2 corresponds to the pattern
succ (succ n), which is disjoint from the patterns zero and succ zero.
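
The recursive definitions of this section can be transcribed into Haskell roughly as follows. Haskell has no n + 1 patterns, so this sketch spells them out with the constructors `Zero` and `Succ`; the conversion helpers `toInt` and `fromInt` are additions for reading results and do not appear in the text.

```haskell
data Nat = Zero | Succ Nat deriving (Eq, Show)

plus :: (Nat, Nat) -> Nat
plus (m, Zero)   = m
plus (m, Succ n) = Succ (plus (m, n))

mult :: (Nat, Nat) -> Nat
mult (_, Zero)   = Zero
mult (m, Succ n) = plus (m, mult (m, n))

-- fact (n + 1) = (n + 1) × fact n, with n + 1 read as the pattern Succ n.
fact :: Nat -> Nat
fact Zero     = Succ Zero
fact (Succ n) = mult (Succ n, fact n)

-- fib (n + 2) = fib n + fib (n + 1), with n + 2 read as Succ (Succ n).
fib :: Nat -> Nat
fib Zero            = Zero
fib (Succ Zero)     = Succ Zero
fib (Succ (Succ n)) = plus (fib n, fib (Succ n))

-- Helpers (not in the text) for moving between Nat and Integer.
toInt :: Nat -> Integer
toInt Zero     = 0
toInt (Succ n) = 1 + toInt n

fromInt :: Integer -> Nat
fromInt k = if k <= 0 then Zero else Succ (fromInt (k - 1))
```

For instance, `toInt (fact (fromInt 4))` evaluates to 24 and `toInt (fib (fromInt 10))` to 55. The three patterns of `fib` are pairwise disjoint, so each argument matches exactly one equation.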

Some systems of recursive equations do not define functions; for example,

f n = f (n + 1).

Every constant function satisfies the equation for f, but none is defined by it. On
the other hand, the two equations

f 0 = c
f (n + 1) = h (f n)

do define a unique function f for every constant c and function h of appropriate
types. More precisely, if c has type A for some A, and if h has type h : A <- A, then
f is defined uniquely for every natural number and has type f : A <- Nat. The above
scheme is called definition by structural recursion over the natural numbers, and is
an instance of a slightly more general scheme called primitive recursion. Much of
this book is devoted to understanding and exploiting the idea of defining a function
(or, more generally, a relation) by structural recursion over a datatype.

The two equations given above can be captured in terms of a single function foldn
that takes the constant c and function h as arguments; thus f = foldn (c, h). The
function foldn is called the fold operator for the type Nat. Observe that foldn (c, h)
works by taking a natural number expressed in terms of zero and succ, replacing
zero by c and succ by h, and then evaluating the result. In other words, foldn (c, h)
describes a homomorphism of Nat.
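
One way to write the fold operator in Haskell is sketched below, keeping the paired argument (c, h) of the text. The two instances `toInt` and `isEven` are illustrative names chosen for this sketch and are not taken from the book.

```haskell
data Nat = Zero | Succ Nat deriving (Eq, Show)

-- foldn (c, h) replaces Zero by c and every Succ by h.
foldn :: (a, a -> a) -> Nat -> a
foldn (c, _) Zero     = c
foldn (c, h) (Succ n) = h (foldn (c, h) n)

-- Replace Zero by 0 and Succ by (+ 1): the usual integer reading of a Nat.
toInt :: Nat -> Integer
toInt = foldn (0, (+ 1))

-- Replace Zero by True and Succ by negation: a parity test.
isEven :: Nat -> Bool
isEven = foldn (True, not)
```

Applied to `Succ (Succ (Succ Zero))`, the function `foldn (c, h)` computes `h (h (h c))`; in particular `foldn (Zero, Succ)` is the identity function on `Nat`.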

It is a fact that not every computable function over the natural numbers can be
described using structural recursion, so certainly some functional programs are
inaccessible if only structural recursion is allowed. However, in the presence of currying
and other bits and pieces, structural recursion is both a flexible and powerful tool
(see Exercise 1.6). For example,

plus m = foldn (m, succ)
mult m = foldn (0, plus m)
expn m = foldn (1, mult m)

define curried versions of addition, multiplication and exponentiation. In these
definitions currying plays an essential role since foldn gives us no way of defining
recursive functions on pairs of numbers.
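
The three curried definitions carry over to Haskell almost verbatim. In this sketch `foldn` is repeated so that the fragment stands alone, and `toInt` and `fromInt` are helper functions added only for inspecting results.

```haskell
data Nat = Zero | Succ Nat deriving (Eq, Show)

foldn :: (a, a -> a) -> Nat -> a
foldn (c, _) Zero     = c
foldn (c, h) (Succ n) = h (foldn (c, h) n)

-- Each operation iterates the previous one, starting from a suitable constant.
plus, mult, expn :: Nat -> Nat -> Nat
plus m = foldn (m, Succ)
mult m = foldn (Zero, plus m)
expn m = foldn (Succ Zero, mult m)

-- Helpers (not in the text) for moving between Nat and Integer.
toInt :: Nat -> Integer
toInt = foldn (0, (+ 1))

fromInt :: Integer -> Nat
fromInt k = if k <= 0 then Zero else Succ (fromInt (k - 1))
```

Here `expn m n` applies `mult m` to `Succ Zero` a total of n times, so `toInt (expn (fromInt 2) (fromInt 5))` evaluates to 32. The first argument m is fixed before the fold begins, which is the sense in which currying is essential.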

As two more examples, the factorial function can be computed by

fact = outr · foldn ((0, 1), f)
outr (m, n) = n
f (m, n) = (m + 1, (m + 1) × n),

and the Fibonacci function can be computed by

fib = outl · foldn ((0, 1), f)
outl (m, n) = m
f (m, n) = (n, m + n).

The two functions outl (short for 'out-left') and outr ('out-right') are projection
functions that select the left and right elements of a pair of values. These programs
for fact and fib can be regarded as implementations of the recursive definitions.
The program for fib has the advantage that values of fib are computed in linear
time, while the recursive definition, if implemented directly, would require
exponential time. The program for fib illustrates an important idea, called tabulation,
in which function values are stored for subsequent use rather than being calculated
afresh each time. Here, the table is very simple, consisting of just a pair of values:
foldn ((0, 1), f) n returns the pair (fib n, fib (n + 1)). The theme of tabulation will
emerge again in Chapter 9 on dynamic programming.
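
The two tupled programs might be rendered in Haskell as in the following sketch. It assumes that table entries are ordinary `Integer` values rather than elements of `Nat`, and it adds a helper `fromInt` that is not in the text; `outl` and `outr` coincide with the Prelude's `fst` and `snd`.

```haskell
data Nat = Zero | Succ Nat deriving (Eq, Show)

foldn :: (a, a -> a) -> Nat -> a
foldn (c, _) Zero     = c
foldn (c, h) (Succ n) = h (foldn (c, h) n)

-- Projection functions on pairs.
outl :: (a, b) -> a
outl (m, _) = m

outr :: (a, b) -> b
outr (_, n) = n

-- The fold maintains the pair (n, n!).
fact :: Nat -> Integer
fact = outr . foldn ((0, 1), f)
  where f (m, n) = (m + 1, (m + 1) * n)

-- The fold maintains the two-entry table (fib n, fib (n + 1)).
fib :: Nat -> Integer
fib = outl . foldn ((0, 1), f)
  where f (m, n) = (n, m + n)

-- Helper (not in the text): build a Nat from a non-negative Integer.
fromInt :: Integer -> Nat
fromInt k = if k <= 0 then Zero else Succ (fromInt (k - 1))
```

Each step of the fold does a constant number of arithmetic operations, so `fib (fromInt 30)`, which evaluates to 832040, takes about thirty steps instead of the exponentially many calls made by the direct recursive definition.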

Further examples of recursive datatypes appear in subsequent sections.

Exercises

1.1 Give an example of a recursion equation that is not satisfied by any function.

1.2 Consider the recursion equation

m (x, y) = y + 1, if x = y
         = m (x, m (x - 1, y + 1)), otherwise.

Does this determine a unique function m?

1.3 Construct a datatype Nat+ for representing the integers > 0, together with an
operator foldn+ for iterating over such numbers. Give functions f : Nat+ <- Nat
and g : Nat <- Nat+ such that f · g is the identity function on Nat+ and g · f is the
identity function on Nat.

1.4 Express the squaring function sqr : Nat <- Nat in the form sqr = f · foldn (c, h)
for suitable f, c and h.

1.5 Consider the function lastp : Nat <- Nat such that lastp n returns the largest
natural number m ≤ n satisfying p : Bool <- Nat. Assuming that p 0 holds, construct
suitable f, c and h so that lastp = f · foldn (c, h).

1.6 Ackermann's function ack : Nat <- Nat × Nat is defined by the equations

ack (0, y) = y + 1
ack (x + 1, 0) = ack (x, 1)
ack (x + 1, y + 1) = ack (x, ack (x + 1, y)).

The function curry ack can be expressed as foldn (succ, f) for an appropriate f.
What is f? (Remark: this shows that, in the presence of currying, functions which
are not primitive recursive can be expressed in terms of foldn.)

1.3 Lists

The datatype of lists dominates functional programming; much of the subject is
taken up with notation for lists, and the names and properties of useful functions
for manipulating them. The Appendix contains a summary of the more important
list-processing functions, together with other combinators we will use throughout
the book.

There are two basic views of lists, given by the type declarations

listr A ::= nil | cons (A, listr A)

listl A ::= nil | snoc (listl A, A).

The former describes the type of cons-lists, in which elements are added to the front
of a list; the latter describes the type of snoc-lists, in which elements are added to
the rear. Thus listr builds lists from the right, while listl builds lists from the left.
The constructor nil is overloaded in that it denotes both the empty cons-list and
the empty snoc-list; in any program making use of both forms of list, distinct names
would have to be chosen.

The two types of list are different, though isomorphic to one another. For example,
the function convert : listr A <- listl A that converts a snoc-list into a cons-list can
be defined recursively by

convert nil = nil
convert (snoc (x, a)) = snocr (convert x, a)

snocr (nil, b) = cons (b, nil)
snocr (cons (a, x), b) = cons (a, snocr (x, b)).

The function snocr : listr A <- (listr A × A) appends an element to the end of a
cons-list. This function takes O(n) steps on a list of length n, so convert takes
O(n^2) steps to convert a list of length n. The number of steps can be brought down
to O(n) using a technique known as an accumulation parameter (see the exercises).
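
The conversion can be sketched in Haskell as follows. Since a Haskell program cannot overload the constructor nil, the sketch gives the two empty lists the distinct names `Nil` and `NilL`, as the text anticipates; this is the direct quadratic version, not the accumulating one left to the exercises.

```haskell
-- Cons-lists and snoc-lists, each constructor taking a pair as in the text.
data Listr a = Nil  | Cons (a, Listr a) deriving (Eq, Show)
data Listl a = NilL | Snoc (Listl a, a) deriving (Eq, Show)

-- snocr appends an element to the end of a cons-list, walking the whole list.
snocr :: (Listr a, a) -> Listr a
snocr (Nil, b)         = Cons (b, Nil)
snocr (Cons (a, x), b) = Cons (a, snocr (x, b))

-- convert rebuilds a snoc-list as a cons-list, calling snocr once per element.
convert :: Listl a -> Listr a
convert NilL          = Nil
convert (Snoc (x, a)) = snocr (convert x, a)
```

For example, `convert (Snoc (Snoc (Snoc (NilL, 1), 2), 3))` yields `Cons (1, Cons (2, Cons (3, Nil)))`. Each of the n calls to `snocr` traverses a list of up to n elements, which is where the quadratic bound comes from.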
It is inconvenient to have to manipulate two versions of what is essentially the same
datatype, so functional languages have traditionally given privileged status to just
one of them. (The alternative, explored in (Wadler 1987), is to regard both types
as different views of one and the same type, and to create a mechanism for moving
from one view to the other, quietly and efficiently, as occasion demands.) Cons-lists
are taken as the basic view, and special syntax is provided for nil and cons. The
empty list is written [], and cons (a, x) is written a : x. In addition, [a] can be used
for a : [], [a, b] for a : b : [], and so on. However, since we want to treat both
types of list on an equal footing, we will not use the syntax a : x; for now we stick
with the slightly cumbersome forms nil, cons (a, x) and snoc (x, a).

The concatenation of two lists x and y is denoted by x ++ y. For example,

[1, 2, 3] ++ [4, 5] = [1, 2, 3, 4, 5].

In particular,

cons (a, x) = [a] ++ x
snoc (x, a) = x ++ [a].

Later on, but not just yet, we will use the expressions on the right as alternatives
for those on the left; this is an extension of a similar convention for Nat, in which
we wrote n + 1 for succ n, thereby harnessing the operation + for a more primitive
purpose.

The type and definition of concatenation depend on the version of lists under
consideration. For example, taking

(++) : listl A <- listl A × listl A,

so that x ++ y abbreviates (++) (x, y), we can define ++ by

x ++ nil = x
x ++ snoc (y, a) = snoc (x ++ y, a).

Using this definition, we can show that ++ is an associative operation, and that nil
is a left unit as well as a right one. The proof that

x ++ (y ++ z) = (x ++ y) ++ z

proceeds by induction on z. The base case is

    x ++ (y ++ nil)
=     {first equation defining ++}
    x ++ y
=     {first equation defining ++}
    (x ++ y) ++ nil.

The induction step is

    x ++ (y ++ snoc (z, a))
=     {second equation defining ++}
    x ++ snoc (y ++ z, a)
=     {second equation defining ++}
    snoc (x ++ (y ++ z), a)
=     {induction hypothesis}
    snoc ((x ++ y) ++ z, a)
=     {second equation defining ++ (backwards)}
    (x ++ y) ++ snoc (z, a).

We leave the proof that nil is the unit of ++ as an exercise. The above style and
format for calculations will be adopted throughout the book.
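
The definition of concatenation on snoc-lists can be tried out with a Haskell sketch such as the one below. The Prelude's `++` is hidden so that the operator can be redefined in curried form; the helper `fromList` and the value `assocHolds` are additions of this sketch, and the latter is only a spot check on sample lists, not a substitute for the inductive proof.

```haskell
import Prelude hiding ((++))

data Listl a = Nil | Snoc (Listl a, a) deriving (Eq, Show)

-- Concatenation, defined by recursion on the right-hand argument.
infixr 5 ++
(++) :: Listl a -> Listl a -> Listl a
x ++ Nil         = x
x ++ Snoc (y, a) = Snoc (x ++ y, a)

-- Helper (not in the text): build a snoc-list from a Haskell list.
fromList :: [a] -> Listl a
fromList = foldl (\x a -> Snoc (x, a)) Nil

-- Spot check of associativity on three sample lists.
assocHolds :: Bool
assocHolds = x ++ (y ++ z) == (x ++ y) ++ z
  where
    x = fromList [1, 2, 3 :: Int]
    y = fromList [4, 5]
    z = fromList [6]
```

The recursion follows the structure of the second argument, which is why the proof of associativity proceeds by induction on z.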

A most useful operation on lists is the function that applies a function to every
element of a list. Traditionally, this operation is called map f. If f : A <- B, then
map f : list A <- list B is defined informally by

map f [a1, a2, ..., an] = [f a1, f a2, ..., f an].

We will not, however, use the traditional name, preferring to use listr f for the map
operation on cons-lists, and listl f for the same operation on snoc-lists. Thus the
name of the type plays a dual role, signifying the type in type expressions, and
the map operation in value expressions. The same convention is extended to other
parameterised types. The reason for this choice will emerge in the next chapter.

The function listr f can be defined recursively:

listr f nil = nil
listr f (cons (a, x)) = cons (f a, listr f x).

There is a similar definition for listl f. Instead of writing down recursion equations
we can appeal to a standard recipe similar to that introduced for Nat. Consider
the scheme

f nil = c
f (cons (a, x)) = h (a, f x)

for defining a recursive function f with source type listr A for some A. We
encapsulate this pattern of recursion by a function foldr, so that f = foldr (c, h). In other
words,

foldr (c, h) nil = c
foldr (c, h) (cons (a, x)) = h (a, foldr (c, h) x).

Given h : B <- A × B and c : B, we have foldr (c, h) : B <- listr A. In particular,

listr f = foldr (nil, h) where h (a, x) = cons (f a, x).
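
A Haskell sketch of foldr on cons-lists, in the paired form used in this book, is given below. It hides the Prelude's curried `foldr` so that the name can be reused; note that in Haskell the type name `Listr` and the map function `listr` must be spelled differently, whereas the text uses one name for both.

```haskell
import Prelude hiding (foldr)

data Listr a = Nil | Cons (a, Listr a) deriving (Eq, Show)

-- foldr (c, h) replaces Nil by c and Cons by h.
foldr :: (b, (a, b) -> b) -> Listr a -> b
foldr (c, _) Nil           = c
foldr (c, h) (Cons (a, x)) = h (a, foldr (c, h) x)

-- The map operation on cons-lists, expressed as a fold.
listr :: (a -> b) -> Listr a -> Listr b
listr f = foldr (Nil, h)
  where h (a, x) = Cons (f a, x)
```

For instance, `listr (+ 1) (Cons (1, Cons (2, Nil)))` yields `Cons (2, Cons (3, Nil))`: the fold rebuilds the list, applying f to each element as it goes.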

In a similar spirit, we define

foldl (c, h) nil = c
foldl (c, h) (snoc (x, a)) = h (foldl (c, h) x, a),

so that foldl (c, h) : B <- listl A provided h : B <- B × A and c : B. Now we have

listl f = foldl (nil, h) where h (x, a) = snoc (x, f a).

The functions foldr (c, h) and foldl (c, h) work in a similar fashion to foldn (c, h) of
the preceding section: foldr (c, h) transforms a list by systematically replacing nil
by c and cons by h; similarly, foldl (c, h) replaces nil by c and snoc by h. Like
foldn on the natural numbers, these two functions embody structural recursion on
their respective datatypes and can be used to define many useful functions. For
example, on snoc-lists we can define a curried version of concatenation by

cat x = foldl (x, snoc).

We have cat x y = x ++ y. This definition mirrors the earlier definition of addition:
plus m = foldn (m, succ). We leave it as an exercise to define a version of cat over
cons-lists.
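The snoc-list fold and the curried concatenation cat can be sketched in Haskell along the same lines. The names Listl, NilL, foldL and fromList are assumptions of this sketch, chosen to avoid clashes with the Prelude; the defining equations follow the text.

```haskell
-- Snoc-lists: elements are added at the right-hand end.
data Listl a = NilL | Snoc (Listl a, a) deriving (Eq, Show)

-- foldL (c, h) replaces NilL by c and Snoc by h (the text's foldl (c, h)).
foldL :: (b, (b, a) -> b) -> Listl a -> b
foldL (c, _) NilL          = c
foldL (c, h) (Snoc (x, a)) = h (foldL (c, h) x, a)

-- The map operation on snoc-lists as a fold.
listl :: (a -> b) -> Listl a -> Listl b
listl f = foldL (NilL, h)
  where h (x, a) = Snoc (x, f a)

-- Curried concatenation: start from x and snoc on the elements of the argument.
cat :: Listl a -> Listl a -> Listl a
cat x = foldL (x, Snoc)

-- Hypothetical helper: build a Listl from a built-in list, left to right.
fromList :: [a] -> Listl a
fromList = foldl (\x a -> Snoc (x, a)) NilL
```

The definition of cat has the same shape as plus m = foldn (m, succ): the first argument supplies the starting value and the constructor is reapplied once per element.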

Other examples on cons-lists include

sum = foldr (0, plus)
product = foldr (1, mult)
concat = foldr (nil, cat)
length = sum · listr one, where one a = 1.

The function concat : listr A <- listr (listr A) concatenates a list of lists into one long
list, and length returns the length of a list. The length function can also be defined
in terms of a single foldr:

length = foldr (0, h), where h (a, n) = n + 1.

This is an example of a general phenomenon: any function which can be expressed
as a fold after a mapping operation can also be expressed as a single fold. We will
state and prove a suitably general version of this result in the next chapter.
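A small Haskell sketch may make the fold-after-map observation concrete. It assumes built-in Haskell lists as a stand-in for cons-lists, and the names foldR, sumL, productL, concatL, lengthMap and lengthFold are hypothetical names introduced here.

```haskell
-- Uncurried fold over built-in lists, standing in for the text's foldr (c, h).
foldR :: (b, (a, b) -> b) -> [a] -> b
foldR (c, h) = foldr (\a r -> h (a, r)) c

sumL, productL :: [Integer] -> Integer
sumL     = foldR (0, uncurry (+))
productL = foldR (1, uncurry (*))

concatL :: [[a]] -> [a]
concatL = foldR ([], uncurry (++))

-- length as a fold after a map ...
lengthMap :: [a] -> Integer
lengthMap = sumL . map (const 1)

-- ... and the same function as a single fold.
lengthFold :: [a] -> Integer
lengthFold = foldR (0, h)
  where h (_, n) = n + 1
```

The two versions of length agree on every finite list; the single fold simply absorbs the mapped function one into the step function.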

Another example is provided by the function filter p : listr A <- listr A, where p has
type p : Bool <- A. This function filters a list, retaining only those elements that
satisfy p. It can be defined as follows:

filter p = concat · listr (p -> wrap, nilp)

(p -> f, g) a = f a, if p a
              = g a, otherwise
wrap a = cons (a, nil)
nilp a = nil.

The McCarthy conditional form (p -> f, g) is used to describe conditional
expressions, wrap turns its argument into a singleton list, and nilp is a constant function
that returns the empty list for each argument. The function filter p works by
turning an element that satisfies p into a singleton list, and an element that doesn't
satisfy p into the empty list, and concatenating the results.

We can express filter as a single foldr:

filter p = foldr (nil, (p · outl -> cons, outr)).

The projection functions outl and outr were introduced earlier. Applied to (a, x),
the function (p · outl -> cons, outr) returns cons (a, x) if p a is true, and x otherwise.

Yet another way to express filter is given in the last section of this chapter.
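Both formulations of filter can be tried out in Haskell. The sketch below assumes built-in lists in place of cons-lists, writes the McCarthy conditional (p -> f, g) as a hypothetical function cond, and uses fst and snd for outl and outr; the names filterMap and filterFold are likewise introduced only for this illustration.

```haskell
-- Uncurried fold over built-in lists, standing in for the text's foldr (c, h).
foldR :: (b, (a, b) -> b) -> [a] -> b
foldR (c, h) = foldr (\a r -> h (a, r)) c

-- McCarthy conditional (p -> f, g).
cond :: (a -> Bool) -> (a -> b) -> (a -> b) -> a -> b
cond p f g a = if p a then f a else g a

wrap :: a -> [a]
wrap a = [a]

nilp :: a -> [b]
nilp _ = []

-- filter as concat after a map of conditionals.
filterMap :: (a -> Bool) -> [a] -> [a]
filterMap p = concat . map (cond p wrap nilp)

-- filter as a single fold: (p . outl -> cons, outr).
filterFold :: (a -> Bool) -> [a] -> [a]
filterFold p = foldR ([], cond (p . fst) cons snd)
  where cons (a, x) = a : x
```

The second version avoids building the intermediate list of singleton and empty lists.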

Finally, let us consider an example where the difference between cons-lists and
snoc-lists plays an essential role. Consider the problem of converting some suitable
representation of a decimal value into the real number it represents. Suppose the
number is

d_m d_{m-1} ... d_0 . e_1 e_2 ... e_n,

which represents the number w + f, where

w = 10^m d_m + 10^{m-1} d_{m-1} + ... + 10^0 d_0
f = e_1/10^1 + e_2/10^2 + ... + e_n/10^n.

Observing that

w = 10 × (... (10 × (10 × 0 + d_m) + d_{m-1}) ...) + d_0
f = (e_1 + ... (e_{n-1} + e_n/10)/10 ...)/10,

we can see that one sensible way to represent decimal numbers is by a pair of lists
listl Digit × listr Digit. We can then define the evaluating function eval by

eval : Real <- (listl Digit × listr Digit)
eval (x, y) = foldl (0, f) x + foldr (0, g) y
f (n, d) = 10 × n + d
g (e, r) = (e + r)/10.

It is appropriate to represent the whole number part by a snoc-list because it is
evaluated more conveniently by processing the digits from left to right; on the other
hand, the fractional part is more appropriately represented by a cons-list since the
processing is from right to left.
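The evaluation scheme can be sketched in Haskell as follows. This is an illustration under assumptions: both digit sequences are held in built-in lists (most significant digit first), the Prelude's curried foldl and foldr play the roles of the two folds, and digits are represented as Int values.

```haskell
-- Whole part: digits consumed left to right with an accumulator (foldl).
-- Fractional part: digits consumed right to left (foldr).
eval :: ([Int], [Int]) -> Double
eval (x, y) = foldl f 0 x + foldr g 0 y
  where
    f n d = 10 * n + fromIntegral d      -- f (n, d) = 10 × n + d
    g e r = (fromIntegral e + r) / 10    -- g (e, r) = (e + r)/10
```

For the numeral 123.45 the whole part is accumulated as 1, 12, 123, while the fractional part is computed as (4 + (5 + 0)/10)/10.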

Lists in functional programming

In traditional functional programming, the functions foldr (c, h) and foldl (c, h) are
defined a little differently. There are two minor differences and one major one. First,
foldr and foldl are usually defined as curried functions, writing foldr h c instead of
foldr (c, h), and similarly for foldl. One small advantage of the switch of arguments
is that some functions can be defined more succinctly; for example, the curried
function cat on snoc-lists can now be defined by

cat = foldl snoc.

The second minor difference is that the argument h in foldr h c is also curried
because cons is usually introduced as a curried function. Since we have introduced
cons to have type listr A <- (A × listr A), it is appropriate to take the type of h to
be B <- (A × B).
The more important difference is that in traditional functional programming the
basic view of lists is cons-lists and, because foldl is a useful operation to have, the
type assigned to foldl (c, h) is B <- listr A, for some A and B. This means that foldl
is given a different definition, namely,

foldl (c, h) nil = c
foldl (c, h) (cons (a, x)) = foldl (h (c, a), h) x.

This is essentially an iterative definition and corresponds to a loop in imperative
programming. The first component of the first argument of foldl is treated as an
accumulation parameter, and models the state of an imperative program. We leave
it as an exercise to show that the two definitions are equivalent, and to discover a
way of expressing this version of foldl in terms of foldr.

This definition of foldl as an operation on cons-lists can be used to good effect.
Consider, for example, the function reverse that reverses the elements of a list. As
a function on cons-lists, we can define

reverse = foldr (nil, append)
append (a, x) = snocr (x, a),

where snocr was defined earlier. As a function on snoc-lists, we can define

reverse = foldl (nil, prepend)
prepend (x, a) = cons (a, x).

As an implementation of reverse on cons-lists, the first definition takes O(n^2) steps
to reverse a list of length n, the reason being that snocr requires linear time.
However, interpreting foldl as an operation on cons-lists, the second definition of reverse
takes linear time because cons takes constant time.
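The contrast between the two definitions of reverse can be sketched in Haskell. Built-in lists stand in for cons-lists, foldlIter is a hypothetical name for the iterative foldl given above, and x ++ [a] plays the role of snocr (x, a); these choices are assumptions of the sketch.

```haskell
-- Iterative foldl over cons-lists, uncurried as in the text:
-- the first component acts as an accumulation parameter.
foldlIter :: (b, (b, a) -> b) -> [a] -> b
foldlIter (c, _) []      = c
foldlIter (c, h) (a : x) = foldlIter (h (c, a), h) x

-- Quadratic reverse: each step appends at the far end, which is linear.
reverseSlow :: [a] -> [a]
reverseSlow = foldr append []
  where append a x = x ++ [a]

-- Linear reverse: each step is a constant-time cons onto the accumulator.
reverseFast :: [a] -> [a]
reverseFast = foldlIter ([], prepend)
  where prepend (x, a) = a : x
```

Both functions return the same result; they differ only in the number of steps taken.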

Non-empty lists

Having the empty list around sometimes causes more trouble than it is worth.
Fortunately, we can always introduce the types
listr+ A ::= wrap A | cons (A, listr+ A)
listl+ A ::= wrap A | snoc (listl+ A, A)

of non-empty cons-lists and snoc-lists. Here, wrap returns a singleton list and the
generic fold operation replaces the function wrap by a function f and cons by a
function g:

foldr+ (f, g) (wrap a) = f a
foldr+ (f, g) (cons (a, x)) = g (a, foldr+ (f, g) x).

In particular, the function head : A <- listr+ A that returns the first element of a
non-empty list can be defined by

head = foldr+ (id, outl).

In some functional languages the fold operator on non-empty cons-lists is denoted
by foldr1, with the definition

foldr1 f = foldr+ (id, f).

So foldr1 cannot express the general fold operator on non-empty cons-lists, but
only the special case (admittedly the most frequent in practice) in which the first
argument is the identity function.
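A Haskell rendering of non-empty cons-lists might look as follows. The names ListrP, Wrap, ConsP, foldrP, headP and foldr1P are assumptions of this sketch, since + cannot appear in a Haskell identifier; fst plays the role of outl.

```haskell
-- Non-empty cons-lists: the base case is a singleton rather than nil.
data ListrP a = Wrap a | ConsP (a, ListrP a) deriving (Eq, Show)

-- foldrP (f, g) replaces Wrap by f and ConsP by g (the text's foldr+ (f, g)).
foldrP :: (a -> b, (a, b) -> b) -> ListrP a -> b
foldrP (f, _) (Wrap a)       = f a
foldrP (f, g) (ConsP (a, x)) = g (a, foldrP (f, g) x)

-- head = foldr+ (id, outl)
headP :: ListrP a -> a
headP = foldrP (id, fst)

-- The foldr1-style special case, with the identity as first argument.
foldr1P :: ((a, a) -> a) -> ListrP a -> a
foldr1P g = foldrP (id, g)
```

Because the first argument of foldr1P is fixed to the identity, its result type must coincide with the element type, which is exactly the restriction noted above.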

List comprehensions

Finally, we introduce a useful piece of syntax that can be used as an alternative to

many expressions involving listr and filter. An expression of the form

[expr0 | var <- expr1; expr2]

is called a list comprehension and produces a list of values of the form expr0 for
values var drawn from the list expression expr1 and satisfying the boolean expression
expr2. For example,

[n × n | n <- [1..10]; even n]

produces, in order, the list of squares of even numbers n in the range 1 ≤ n ≤ 10.
In particular, we have

listr f x = [f a | a <- x]
filter p x = [a | a <- x; p a].

There is a more general form of list comprehension, but we will not need it; indeed,
list comprehensions are used only occasionally in what follows.
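For readers experimenting in Haskell, the same comprehensions can be written as below. Note that Haskell separates the generator from the guard with a comma rather than a semicolon; the names squaresOfEvens, mapLC and filterLC are hypothetical.

```haskell
-- Squares of the even numbers between 1 and 10, in order.
squaresOfEvens :: [Int]
squaresOfEvens = [n * n | n <- [1 .. 10], even n]

-- The map operation as a comprehension.
mapLC :: (a -> b) -> [a] -> [b]
mapLC f x = [f a | a <- x]

-- filter as a comprehension with a guard.
filterLC :: (a -> Bool) -> [a] -> [a]
filterLC p x = [a | a <- x, p a]
```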

Exercises

1.7 Construct the function convert : listr A <- listl A in the form foldl (c, h) for
suitable c and h.

1.8 Consider the curried function catconv : (listr A <- listl A) <- listr A defined by
catconv x y = convert x ++ y. Express catconv in the form foldl (c, h) and hence
show how convert can be carried out in linear time.

1.9 Prove that nil is a left unit of ++.

1.10 Construct cat : (listr A <- listr A) <- listr A.

1.11 Construct the iterative function foldl over cons-lists in terms of foldr.

1.12 The function take n : listr A <— listr A takes the first n items of a list, or the
whole list if its length is no larger than n. Construct suitable h and c for which
take n x = foldr (c, h) x n. Similarly, define the function drop n (which drops the
first n items from a list) in terms of foldr.

1.4 Trees

We will briefly consider two more examples of recursive datatypes to drive home
the points made in preceding sections. First, consider the type

tree A ::= tip A | bin (tree A, tree A)


of binary trees with elements from A in the tips. In particular, the expression

bin (tip 0, bin (tip 1, tip 2))


denotes an element of tree Nat, while

bin (tip 'A', bin (tip 'B', tip 'C'))


denotes an element of tree Char.

The generic form of the fold operator for binary trees is foldt (f, g), defined by

foldt (f, g) (tip a) = f a
foldt (f, g) (bin (x, y)) = g (foldt (f, g) x, foldt (f, g) y).

Here, foldt (f, g) : B <- tree A if f : B <- A and g : B <- B × B. In particular, the
map function for trees is given by

tree f = foldt (tip · f, bin).

The functions size and depth for determining the size and the depth of a tree are
given by

size = foldt (one, plus), where one a = 1
depth = foldt (zero, succ · bmax), where zero a = 0.

Here, bmax (x, y) (short for 'binary maximum') returns the greater of x and y; the
depth of the tree bin (x, y) is one more than the greater of the depths of trees x
and y.
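The tree fold and the functions defined with it can be sketched in Haskell as follows. The constructor names Tip and Bin are capitalised as Haskell requires, and uncurry (+) and uncurry max are assumed stand-ins for plus and bmax.

```haskell
-- Binary trees with elements in the tips.
data Tree a = Tip a | Bin (Tree a, Tree a) deriving (Eq, Show)

-- foldt (f, g) replaces Tip by f and Bin by g.
foldt :: (a -> b, (b, b) -> b) -> Tree a -> b
foldt (f, _) (Tip a)      = f a
foldt (f, g) (Bin (x, y)) = g (foldt (f, g) x, foldt (f, g) y)

-- The map operation for trees: tree f = foldt (tip . f, bin).
tree :: (a -> b) -> Tree a -> Tree b
tree f = foldt (Tip . f, Bin)

-- Number of tips.
size :: Tree a -> Int
size = foldt (const 1, uncurry (+))

-- Depth: a tip has depth 0; a bin is one deeper than its deeper subtree.
depth :: Tree a -> Int
depth = foldt (const 0, succ . uncurry max)
```

Applied to Bin (Tip 0, Bin (Tip 1, Tip 2)), size counts three tips and depth finds two levels of Bin above the deepest tip.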

The final example is of two mutually recursive datatypes. Consider the types

tree A ::= fork (A, forest A)


forest A ::= null | grow (tree A, forest A),

defining trees and forests in terms of each other. The type forest A is in fact
isomorphic to listr (tree A), so we could also have introduced trees using lists rather
than forests.

The generic fold operation for this kind of tree is not defined by a single
recursion, but as the first of a pair of functions, foldt (g, c, h) and foldf (g, c, h), defined
simultaneously by mutual recursion:

foldt (g, c, h) (fork (a, xs)) = g (a, foldf (g, c, h) xs)
foldf (g, c, h) null = c
foldf (g, c, h) (grow (x, xs)) = h (foldt (g, c, h) x, foldf (g, c, h) xs).

For example, the size of a tree is defined by

size = foldt (succ · outr, 0, plus).
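The mutually recursive folds can be transcribed into Haskell roughly as follows. The type signatures are our reading of the equations (trees fold to a type b, forests to a type c), and snd and uncurry (+) are assumed stand-ins for outr and plus.

```haskell
-- Trees and forests defined in terms of each other.
data Tree a   = Fork (a, Forest a)
data Forest a = Null | Grow (Tree a, Forest a)

-- foldt and foldf share one triple of arguments and call each other.
foldt :: ((a, c) -> b, c, (b, c) -> c) -> Tree a -> b
foldt (g, c, h) (Fork (a, xs)) = g (a, foldf (g, c, h) xs)

foldf :: ((a, c) -> b, c, (b, c) -> c) -> Forest a -> c
foldf (_, c, _) Null           = c
foldf (g, c, h) (Grow (x, xs)) = h (foldt (g, c, h) x, foldf (g, c, h) xs)

-- Number of nodes: one for the root plus the total size of the forest.
size :: Tree a -> Int
size = foldt (succ . snd, 0, uncurry (+))
```

A root with two leaf children therefore has size 3: each leaf contributes succ 0, the forest sums to 2, and the root adds one more.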

We have now seen enough examples to get the general idea: when introducing
a new datatype, also define the generic fold operation for that datatype. When
the datatype is parameterised, also introduce the appropriate mapping operation.
Given these functions, a number of other useful functions can be quickly defined.

It would be nice if we could give, once and for all, a single completely generic
definition of the fold operator, parameterised by the structure of the datatype being
defined. Indeed, we shall do just this in the next chapter. But in most functional
languages currently available, this is not possible: we can parameterise functions with
abstract operators, but we cannot parameterise functions with abstract datatypes.

Recently, several authors have proposed new languages that overcome this
restriction, and some references can be found in the bibliographical remarks at the end of
this chapter.

Exercises

1.13 Consider the type

gtree A ::= node (A, listl (gtree A))


of general trees with nodes labelled with elements from A. Define the generic foldg
function for this kind of tree, and hence construct functions size and depth for
computing the size and depth of a tree.
1.14 Continuing on from the preceding exercise, represent the expression

f(g(a,b),h(c),d)
as an element of gtree Char. Convert this expression to curried form, and represent
the result as an element of tree Char. Using this translation as a guide, construct

functions

curry : tree A «— gtree A


uncurry : gtree A <— tree A

for converting from general trees to binary trees and vice-versa.

1.5 Inverses

Another theme that will emerge in subsequent chapters is the use of inverses in
program specification and synthesis. Some functions are best specified as inverses
to other functions. Consider, for example, the function zip with type

zip : listr (A × B) <- (listr A × listr B),

which is defined informally by

zip ([a1, a2, ..., an], [b1, b2, ..., bn]) = [(a1, b1), (a2, b2), ..., (an, bn)].

One way of specifying zip is as the inverse of a function

unzip : listr A × listr B <- listr (A × B),

defined by

unzip = pair (listr outl, listr outr),

where pair (f, g) x = (f x, g x). Thus, unzip takes a list of pairs and returns a pair

of lists consisting of the first components (listr outl) and the second components
(listr outr). The function unzip can also be expressed in terms of a single fold:

unzip = foldr (nils, conss)
nils = (nil, nil)
conss ((a, b), (x, y)) = (cons (a, x), cons (b, y)).
This is another example of a general result that we will give later in the book: a

pair of folds can always be expressed as a single fold.

Now, unzip is an injective function, which means that we can specify zip by the
condition

zip · unzip = id.

Using this specification, we can synthesise a direct recursive definition of zip:

zip (nil, nil) = nil
zip (cons (a, x), cons (b, y)) = cons ((a, b), zip (x, y)).
Note that zip is a partial function, defined only on lists of equal length. This is
because unzip is not surjective. In functional programming, zip is made total by
extending its recursive definition to read

zip (nil, y) = nil
zip (cons (a, x), nil) = nil
zip (cons (a, x), cons (b, y)) = cons ((a, b), zip (x, y)).
This version of zip works for two lists of different lengths, stopping when either list
is exhausted.
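The specification zip · unzip = id can be checked on examples with a short Haskell sketch. Built-in lists stand in for cons-lists; the names unzipP, unzipF and zipT are hypothetical and chosen to avoid the Prelude's own zip and unzip.

```haskell
pair :: (a -> b, a -> c) -> a -> (b, c)
pair (f, g) x = (f x, g x)

-- unzip as a pair of maps ...
unzipP :: [(a, b)] -> ([a], [b])
unzipP = pair (map fst, map snd)

-- ... and as a single fold with nils and conss.
unzipF :: [(a, b)] -> ([a], [b])
unzipF = foldr conss nils
  where
    nils = ([], [])
    conss (a, b) (x, y) = (a : x, b : y)

-- The total version of zip, stopping when either list is exhausted.
zipT :: ([a], [b]) -> [(a, b)]
zipT ([], _)        = []
zipT (_ : _, [])    = []
zipT (a : x, b : y) = (a, b) : zipT (x, y)
```

On any list of pairs ps, zipT (unzipP ps) returns ps again, which is the content of the specification; the converse composition is not the identity on pairs of lists of unequal length.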

As another example, consider the function decimal : listl Digit <- Nat that converts
a natural number to the decimal numeral that represents it. The inverse function
in this case is eval : Nat <- listl Digit defined by

eval = foldl (0, f)
f (n, d) = 10 × n + d.

However, eval is not an injective function, so we cannot specify decimal simply
by the equation decimal · eval = id. There are two ways out of this problem:
either we can define a type Decimal, replacing listl Digit, so that eval is an injective
function on Decimal; or else specify decimal n to be, say, a shortest member of
the set {x | eval x = n}. Both methods will be examined in due course, so we
will not go into details at this stage. The main point we want to make here is
that definition by inverse is a useful method of specification, but one that involves
difficulties when working exclusively with functions. The solution, as we shall see, is
to move to relations: all relations possess a unique converse, so there is no problem
in specifying one relation as the converse of another. If we want to specify a function
in this way, then we have to find some functional refinement of the converse. We
shall also study methods for doing just this.

Exercises

1.15 Construct the curried version zip : (listr (A × B) <- listr B) <- listr A in the
form foldr (c, h) for suitable h and c.

1.16 Define a datatype Digits that represents non-empty lists of digits, not
beginning with zero. Define the generic fold function for Digits, and use it to construct
the evaluating function eval : Nat+ <- Digits, where Nat+ is the type of positive
integers. Can you specify decimal : Digits <- Nat+ as the inverse of eval?

1.6 Polymorphic functions


Some of the list-processing functions defined above are polymorphic in that they
do not depend in any essential way on the particular type of lists being considered.
For example, concat : listr A «— listr (listr A) does not depend in any essential way
on the type A. Such functions satisfy certain identities appropriate to their type.

For example, we have

listr f · concat = concat · listr (listr f).

This equation can be interpreted as the assertion that the recipe of concatenating
a list of lists, and then renaming the elements, has the same outcome as renaming
each element in the list of lists, and then concatenating. Thus, concat does not
depend on the structure of the elements of the lists being concatenated. A formal
proof of the equation above is left as an exercise.
As another example, consider the function inits : listl (listl A) <- listl A that returns
a list of all prefixes of a list:

inits = foldl ([nil], f)
f (snoc (xs, x), a) = snoc (snoc (xs, x), snoc (x, a)).

For example,

inits [a1, a2, a3] = [[], [a1], [a1, a2], [a1, a2, a3]].

Like concat, the function inits does not depend in any essential way on the nature
of elements in the list; the result is the same whether we take the prefixes and
then process each element in each list, or first process each element and then take
prefixes. We therefore have the identity

listl (listl f) · inits = inits · listl f.

In a similar fashion, the function reverse : listr A <- listr A satisfies the identity

listr f · reverse = reverse · listr f.

Finally, the function zip : listr (A × B) <- (listr A × listr B) satisfies the identity

listr (cross (f, g)) · zip = zip · cross (listr f, listr g),

where cross (f, g) (a, b) = (f a, g b). Functions, like concat, inits, reverse, zip, and
so on, which do not depend in any essential way on the structure of the elements in
their arguments, will be studied in a general setting later on, where they are called
natural transformations.
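The four identities of this section can be tested on sample data with the Haskell sketch below. It is not a proof, only a way of checking instances; built-in lists stand in for both kinds of list, and the names initsL, zipU and the nat... predicates are hypothetical.

```haskell
-- Prefixes of a list as a left fold, mirroring the definition of inits:
-- the last prefix so far is extended by the new element.
initsL :: [a] -> [[a]]
initsL = foldl step [[]]
  where step xss a = xss ++ [last xss ++ [a]]

cross :: (a -> c, b -> d) -> (a, b) -> (c, d)
cross (f, g) (a, b) = (f a, g b)

-- Uncurried zip, as in the text.
zipU :: ([a], [b]) -> [(a, b)]
zipU = uncurry zip

-- Each predicate compares the two sides of one identity on a given input.
natConcat :: Eq b => (a -> b) -> [[a]] -> Bool
natConcat f xss = map f (concat xss) == concat (map (map f) xss)

natInits :: Eq b => (a -> b) -> [a] -> Bool
natInits f xs = map (map f) (initsL xs) == initsL (map f xs)

natReverse :: Eq b => (a -> b) -> [a] -> Bool
natReverse f xs = map f (reverse xs) == reverse (map f xs)

natZip :: (Eq c, Eq d) => (a -> c, b -> d) -> ([a], [b]) -> Bool
natZip (f, g) xy = map (cross (f, g)) (zipU xy) == zipU (cross (map f, map g) xy)
```

Each predicate should return True for every choice of function and finite input, which is what the identities assert.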

Exercises

1.17 Give proofs by induction of the various identities cited in this section.

1.18 Suppose you are given a polymorphic function foo with type

foo : tree (A × B) <- (listr A × B).


What identity would you expect foo to satisfy?
1.19 Similarly to the preceding exercise, guess the identity for

foo : listl A <— gtree A.

1.7 Pointwise and point-free


There are two basic styles for expressing functions, the pointwise style and the
point-free style. In the pointwise style we describe a function by describing its application
to arguments. Many of the examples above are expressed in the pointwise style.
In the point-free style we describe a function exclusively in terms of functional
composition; we have seen some examples in this style too. In this section we
want to illustrate with the aid of a small example how a point-free style leads to a
very simple method for reasoning about functions.

Recall that the function filter p can be defined on lists by the equation

filter p = concat · listr (p -> wrap, nilp).

The function wrap takes a value and returns a singleton list; thus wrap a = [a]. The
function nilp takes a value and returns the empty list; thus, nilp a = nil. Our aim
in this section is to prove the identity

filter p = listr outl · filter outr · zip · pair (id, listr p).

The operational reading of the right-hand side is that each element of a list is
paired with its boolean value under p, and then those elements paired with true
are selected. Although we won't go into details, the identity is useful in optimising
computations of filter p when the structure of the predicate p enables listr p to be
computed efficiently.
For the proof we will need a number of identities concerning the functions that
appear in the two expressions for filter. The first group concerns the following

combinators for expressing pairing:

pair (f, g) a = (f a, g a)
outl (a, b) = a
outr (a, b) = b.

These functions are related by the properties

outl · pair (f, g) = f      (1.1)
outr · pair (f, g) = g.     (1.2)
As we shall see in the next chapter, these properties characterise the notion of a

categorical product.
For the function nilp we have two rules:

nilp · f = nilp             (1.3)
listr f · nilp = nilp.      (1.4)
The first rule states that nilp is a constant function, and the second rule states that
this constant is the empty list.

For wrap and concat we have the rules

listr f · wrap = wrap · f                        (1.5)
listr f · concat = concat · listr (listr f).     (1.6)
These state that wrap and concat are natural transformations.

For the function zip we use a similar rule:

zip · pair (listr f, listr g) = listr (pair (f, g)).     (1.7)

This states that zip is a natural transformation taking pairs of lists to lists of pairs.

For listr we have two rules:

listr (f · g) = listr f · listr g     (1.8)
listr id = id.                        (1.9)
As we will see in the next chapter, these rules say that listr is what is known as a

functor.

For conditionals we will use the following rules:

(p -> f, g) · h = (p · h -> f · h, g · h)     (1.10)
h · (p -> f, g) = (p -> h · f, h · g).        (1.11)
These rules say how composition distributes over conditionals.

Finally, the identity function id satisfies two properties, namely

f · id = f      (1.12)
id · f = f.     (1.13)

The two occurrences of id denote different instances of the identity function, one
on the source type of f, and one on its target type.

It might appear that these dozen or so rules have been plucked out of thin air but,
as we have hinted, they form coherent groups based on a small number of concepts
(products, functors, natural transformations, and so on) to be studied in the next
chapter. For now we just accept them.

Having armed ourselves with sufficient tools, we calculate:

listr outl · filter outr · zip · pair (id, listr p)
=    {definition of filter}
listr outl · concat · listr (outr -> wrap, nilp) · zip · pair (id, listr p)
=    {equation (1.6)}
concat · listr (listr outl) · listr (outr -> wrap, nilp) · zip · pair (id, listr p)
=    {equation (1.8) (backwards)}
concat · listr (listr outl · (outr -> wrap, nilp)) · zip · pair (id, listr p)
=    {equations (1.11), (1.5), and (1.4)}
concat · listr (outr -> wrap · outl, nilp) · zip · pair (id, listr p)
=    {equation (1.9) (backwards)}
concat · listr (outr -> wrap · outl, nilp) · zip · pair (listr id, listr p)
=    {equation (1.7)}
concat · listr (outr -> wrap · outl, nilp) · listr (pair (id, p))
=    {equation (1.8) (backwards)}
concat · listr ((outr -> wrap · outl, nilp) · pair (id, p))
=    {equations (1.10), (1.1), and (1.3)}
concat · listr (p -> wrap · id, nilp)
=    {equation (1.12)}
concat · listr (p -> wrap, nilp)
=    {definition of filter}
filter p.

Although this calculation is fairly long (and would have been twice the length if
we had not combined steps), it is very simple. Some slight variation in the order
of the steps is possible; for example, we could have simplified zip · pair (id, listr p)
to listr (pair (id, p)) earlier in the calculation. Apart from this, almost every step is
forced. Indeed, when some students were set the problem in an examination, almost
nobody had difficulties solving it. The problem was also given as a test example to
a graduate student who had designed a simple proof editor, including a 'go' button

that automatically applied identities from a given set from left to right until no
more rules in the set were applicable. Apart from expressing rules (1.8) and (1.9)

in reverse form, the calculation proceeded quickly and automatically to the desired
conclusion, somewhat to the student's surprise.
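The identity just proved can also be exercised on concrete inputs. The following Haskell sketch is offered as a sanity check rather than as part of the calculation; it assumes built-in lists, uses the Prelude's filter, fst and snd for filter, outl and outr, and introduces the hypothetical name filterViaZip for the right-hand side.

```haskell
pair :: (a -> b, a -> c) -> a -> (b, c)
pair (f, g) x = (f x, g x)

-- Right-hand side of the identity:
-- listr outl . filter outr . zip . pair (id, listr p)
filterViaZip :: (a -> Bool) -> [a] -> [a]
filterViaZip p = map fst . filter snd . uncurry zip . pair (id, map p)
```

Each element is paired with its boolean value under p, the pairs marked True are kept, and the booleans are then discarded, so filterViaZip p agrees with filter p on every finite list.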

With this single exercise we hope to have convinced the reader that point-free
reasoning can be effective reasoning. Indeed, most of the many calculations to
come are done in a point-free style. However, while calculations (whether
point-free or pointwise) are satisfying to do, they are far less satisfying to read. It has
been said that calculating is not a spectator sport. Therefore, our advice to the
reader in studying a calculation is first to try and do it for oneself. Only when
difficulties arise should the text be consulted. Although we have strived to present
calculations in the best possible way, there will no doubt be occasions when the
diligent reader can find a shorter or clearer route to the desired conclusion.

Bibliographical remarks

There are numerous introductory textbooks on functional programming; probably
the best background for the material presented here is (Bird and Wadler 1988). A
more modern text that is based on Haskell is (Davie 1992). Both of these books
take non-strict semantics as the point of departure; a good introduction to strict
functional programming can be found in (Paulson 1991). Other recommended books
on functional programming are (Field and Harrison 1988; Henson 1987; Reade 1988;
Wickstrom 1987). There is an archive for functional programming on the world-wide
web which contains a wealth of articles describing the latest developments:

http://www.lpac.ac.uk/SEL-HPC/Articles/FuncArchive.html
Readers who wish to experiment with the programs presented in this book might
consider the Gofer system (Jones 1994), which is freely available from

ftp://ftp.cs.nott.ac.uk/nott-fp/languages/gofer/
In fact, in later chapters, when we come to study some non-trivial programming
examples, we shall present the result of our derivations as Gofer programs.

The realisation that functional programs are good for equational reasoning is as
old as the subject itself. Two landmark papers are (Backus 1978; Burstall and
Darlington 1977). More recent work on an algebraic approach to the derivation of
functional programs, in which we were involved ourselves, is described in e.g. (Bird
1986, 1987; Bird and Meertens 1987; Bird, Gibbons, and Jones 1989; Bird 1989a,
1989b, 1990; Bird and De Moor 1993b; Jeuring 1989, 1990, 1994; Meertens 1987,
1989). The material of this book evolved from all these works. Quite similar in
spirit, but slightly different in notation and style are (Backus 1981, 1985;
Harrison and Khoshnevisan 1988; Harrison 1988; Williams 1982), and (Pettorossi and
Burstall 1983; Pettorossi 1985).

Recently there has been a surge of interest in functional languages that, given the
definition of a datatype, automatically provide the user with the associated fold.

One approach, which is quite transparent to the naive user, can be found in (Fegaras,
Sheard, and Stemple 1992; Sheard and Fegaras 1993; Kieburtz and Lewis 1995).
Another approach, which is more elegant but also requires more understanding on
the user's part, is the use of constructor classes in (Jeuring 1995; Jones 1995; Meijer
and Hutton 1995).
Chapter 2

Functions and Categories

This chapter provides a brief introduction to the elements of category theory that are
necessary for understanding the rest of the book. In particular, it emphasises ways
in which category theory offers economy in definitions and proofs. Subsequently, it
is shown how category theory can be used in defining the basic building blocks of
datatypes, and how these definitions give rise to a set of combinators that unify the
operators found in functional programming and program derivation. In Chapter 3
these combinators, and the associated theory, are illustrated in a number of small
but representative programming examples.

One does not so much learn category theory as absorb it over a period of time. It is
difficult, at a first or second reading, to appreciate the point of many definitions and

the reasons for the subject's abstract nature. We have tried to take this into account
in two ways: first, by adopting a strictly minimalist style, leaving out anything that
is not germane to our purpose; and second, by confining attention to a small range

of examples, all drawn from the area of program specification and derivation, which
is, after all, our main topic.

2.1 Categories
A category C is an algebraic structure consisting of a class of objects, denoted by
A, B, C, ..., and so on, and a class of arrows, denoted by f, g, h, ..., and so on,
together with three total operations and one partial operation.

The first two total operations are called target and source; both assign an object
to an arrow. We write f : A <- B (pronounced 'f is of type A from B') to indicate
that the target of the arrow f is A and the source of f is B.

The third total operation takes an object A to an arrow id_A : A <- A, called the
identity arrow on A.

The partial operation is called composition and takes two arrows to another one.
The composition f · g (pronounced 'f after g') is defined if and only if f : A <- B
and g : B <- C for some objects A, B, and C, in which case f · g : A <- C. In other
words, if the source of f is the target of g, then f · g is an arrow whose target is the
target of f and whose source is the source of g.

Composition is required to be associative and to have identity arrows as units:

f · (g · h) = (f · g) · h

for all f : A <- B, g : B <- C and h : C <- D, and

id_A · f = f = f · id_B

for all f : A <- B.

Examples of categories

The motivating example of a category is Fun, the category of sets and total
functions. In this category the objects are sets and the arrows are typed functions.
More precisely, an arrow is a triple (f, A, B) in which the set A contains the range
of f and the set B is the domain of f. By definition, A is the target and B the
source of (f, A, B). The identity arrow id_A : A <- A is the identity function on A,
and the composition of two arrows (f, A, B) and (g, C, D) is defined if and only if
B = C, in which case

(f, A, B) · (g, B, D) = (f · g, A, D),

where, on the right, f · g denotes the usual composition of functions f and g.

Another example of a category is Par, the category of sets and partial functions.
The definition is similar to Fun except that, now, the triple (f, A, B) is an arrow if
A contains the range of f and B contains the domain of f. Since a total function is
a special case of a partial function, Fun is a subcategory of Par.

Generalising still further, a third example of a category is Rel, the category of sets
and relations. This time the arrows are triples (R, A, B), where R is a subset of the
cartesian product A × B. Again, the target of (R, A, B) is A and the source B. The
identity arrow id_A : A <- A is the relation

id_A = {(a, a) | a ∈ A}

and the composition of arrows (R, A, B) and (S, B, C) is the arrow (T, A, C), where,
writing aRb for (a, b) ∈ R, we have

aTc = (∃b : aRb ∧ bSc).

We can also combine two categories A and B to form another category A × B,
called the product category of A and B. The product category has, as objects,
pairs (A, B), where A is an object of A and B is an object of B. The arrows are
pairs (f, g), where f is an arrow of A and g is an arrow of B. Composition is defined
component-wise:

(f, g) · (h, k) = (f · h, g · k).

The identity arrow id_(A × B) is, of course, (id_A, id_B).

Although we shall see a number of other examples of categories in due course, Fun,
Par and Rel, and especially Fun and Rel, will be our main focus of interest.
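For finite sets, the identity and composition of Rel described above can be modelled directly. The Haskell sketch below is an illustration under assumptions: a relation is represented as a list of pairs, with (a, b) in the list meaning aRb, and the names Rel, idRel and compose are hypothetical. The result of compose may contain duplicate pairs, which does not affect membership.

```haskell
-- A finite relation as a list of pairs; (a, b) in the list means a R b.
type Rel a b = [(a, b)]

-- The identity relation on a finite carrier set, given as a list.
idRel :: [a] -> Rel a a
idRel as = [(a, a) | a <- as]

-- Composition: a T c holds when some b satisfies a R b and b S c.
compose :: Eq b => Rel a b -> Rel b c -> Rel a c
compose r s = [(a, c) | (a, b) <- r, (b', c) <- s, b == b']
```

Composing with the identity relation on the appropriate carrier leaves a relation unchanged, mirroring the unit law for identity arrows.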

Diagrams

As illustrated by the above examples, the requirement that each arrow has a unique
target and source can be something of a burden when it comes to spelling out the
details of an expression or equation. For this reason it is quite common to refer to an
arrow f : A <- B simply by the identifier f, leaving A and B implicit. Furthermore,
whenever one writes a composition it is implicitly assumed to be well defined. For
these abbreviations to be legitimate, the type information should always be clear
from the context.

A useful device for recording type information is a diagram. In a diagram an arrow
f : A <- B is represented as A <-f- B, and its composition with an arrow g : B <- C is
represented as A <-f- B <-g- C. For example, one can depict the type information
in the equation id_A · f = f as

[Diagram: commuting triangle with arrows id_A : A <- A and f : A <- B]

This diagram has the property that any two paths between the same pair of objects
depict the same arrow: in such cases, a diagram is said to commute. As another
example, here is the diagram that illustrates one of the laws of the last chapter,
namely, listr f · wrap = wrap · f:

[Diagram: commuting square with wrap : listr A <- A along the top, wrap : listr B <- B along the bottom, and vertical arrows listr f and f]

It is possible to phrase precise rules about reasoning with diagrams, giving them the
same formal status as, say, formulae in predicate calculus (Freyd and Scedrov 1990).
However, in what follows we shall use diagrams mainly for the simple purpose of
supplying necessary type information. Just occasionally we will use a diagram in
place of a calculational proof.

Reasoning with arrows

As a model for the algebra of functions a category is rather a simple structure, and
one has to interpret familiar ideas about functions in terms of composition alone. As
a typical example, consider how the notion of an injective function can be rendered
in an arbitrary category. An arrow m : A ← B is said to be monic if

f = g ≡ m · f = m · g

for all f, g : B ← C. In the particular case of Fun, an arrow is monic if and only if it
is injective. To appreciate the calculational advantage of the above definition over
the usual set-theoretic one, let us prove that the composition of two monic arrows
is again monic. Suppose m : A ← B and n : B ← C are monic. Then we have

m · n · f = m · n · g
≡ {since m is monic}
n · f = n · g
≡ {since n is monic}
f = g,

and so m · n : A ← C is monic.
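
For functions between small finite sets, the definition of monic can be tested by brute force. The following Python sketch is an illustration only and is not part of the original development: it assumes that functions are modelled as dictionaries, it tests against a single object C rather than all objects, and the helper names all_functions, compose, is_monic and is_injective are hypothetical.

```python
from itertools import product

def all_functions(source, target):
    """Enumerate every total function target <- source as a dict."""
    source, target = list(source), list(target)
    return [dict(zip(source, images))
            for images in product(target, repeat=len(source))]

def compose(f, g):
    """The composite f . g: apply g first, then f."""
    return {x: f[g[x]] for x in g}

def is_monic(m, b, c):
    """Check that m . f = m . g forces f = g for all f, g : b <- c,
    where b is the source of m and c is one chosen test object."""
    fs = all_functions(c, b)
    return all(f == g for f in fs for g in fs
               if compose(m, f) == compose(m, g))

def is_injective(m):
    return len(set(m.values())) == len(m)

B = [0, 1]
C = [0, 1]
m = {0: "a", 1: "b"}   # an injective function with source B
k = {0: "a", 1: "a"}   # a function with source B that is not injective
n = {0: 1, 1: 0}       # an injective function n : B <- C
```

On these samples is_monic agrees with is_injective, and compose(m, n) is again monic, as the calculation above predicts.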

We can model the notion of a surjective function in an arbitrary category in a
similar fashion. An arrow e : B ← C is said to be epic if

f = g ≡ f · e = g · e

for all f, g : A ← B. In the particular case of Fun, an arrow is epic if and only if
it is surjective. A symmetrical proof to the one given above for monics shows that
the composition of two epics is again epic.
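
For comparison, the symmetrical calculation can be written out in full. The names d and D below are introduced only for this illustration: suppose e : B ← C and d : C ← D are epic, and let f, g : A ← B.

```latex
\begin{align*}
        & f \cdot e \cdot d = g \cdot e \cdot d \\
\equiv\ & \{\text{since } d \text{ is epic}\} \\
        & f \cdot e = g \cdot e \\
\equiv\ & \{\text{since } e \text{ is epic}\} \\
        & f = g
\end{align*}
```

Hence e · d : B ← D is epic.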

Duality

The exploitation of symmetry is very common in category theory and leads to
substantial economy in proof. It is worth while, therefore, to consider it in a bit
more detail. For any category C the opposite category C^op is defined to have the
same objects and arrows as C, but the source and target operators are interchanged
and composition is defined by swapping arguments:

f · g in C^op = g · f in C.

The category C^op may be thought of as being obtained from C by reversing all
arrows. Reversing the arrows twice does not change anything, so (C^op)^op = C.

Now, let S(C) be a statement about the objects and arrows of a category C. By
reversing the direction of all arrows in S(C), we obtain another statement
S^op(C) = S(C^op) about C. If S(C) holds for each category C, it follows that S^op(C)
also holds for each category C. The converse implication is also true, because
(C^op)^op = C. We have thus proved the equivalence

(∀C : S(C)) ≡ (∀C : S^op(C)).

This special case of symmetry is called duality.

To illustrate, recall that above we proved that for any category C, the statement
S(C) = 'the composition of two monics in C is monic' is true. Reversing the arrows
in the definition of monic gives precisely the definition of epic, and therefore the
statement S^op(C) = 'the composition of two epics in C is epic' is also true for any
category C. This argument is summarised by saying that epics are dual to monics.

Some definitions do not change when the arrows are reversed, and a typical example
is the notion of an isomorphism. An isomorphism is an arrow i : A ← B such that
there exists an arrow in the opposite direction, say j : B ← A, such that

j · i = id_B and i · j = id_A.

It is easy to show that there exists at most one j satisfying this condition, and this
unique arrow is called the inverse i⁻¹ of i. If there exists an isomorphism i : A ← B,
then the objects A and B are said to be isomorphic, and we write A ≅ B. In Fun
an arrow is an isomorphism if and only if it is a bijective function, and two objects
are isomorphic whenever they have the same cardinality. When an arrow in Fun is
both monic and epic it is also an isomorphism, but this is a particular property of
Fun that does not hold in every category (see Exercise 2.6 below).

Exercises

2.1 Suppose u : A ← A is an arrow such that f · u = f for all B and f : B ← A. Prove
that u = id_A. It follows that identity arrows are unique.

2.2 Suppose we have four arrows f : A ← B, g : C ← A, h : B ← A, and k : C ← B.
Which of the following compositions are well defined:

k · h · f · h        g · k · h ?

(Drawing a diagram will make this book-keeping exercise very easy.)



2.3 An arrow r : A ← B is a retraction if there exists an arrow r' : B ← A such that
r · r' = id_A. Show that if r : A ← B is a retraction, then for any arrow f : A ← C
there exists an arrow g : B ← C such that r · g = f. What is the dual of a retraction?
Give the dual statement of the above property of retractions.

2.4 Show that f · g = id implies that g is monic and f is epic. It follows that
retractions are epic.

2.5 Show that if f · g is epic, then f is epic. What is the dual statement?

2.6 Any preorder (A, ≤) can be regarded as a category: the objects are the elements
of A, and there exists a unique arrow a ← b precisely when a ≤ b. What are the
monic arrows? What are the epic arrows? Is every arrow that is both monic and
epic an isomorphism?
2.7 A relation R : A ← B is onto if for all a ∈ A, there exists b ∈ B such that
aRb. Is every onto relation an epic arrow in Rel? If not, are epic arrows in Rel
necessarily partial functions?
2.8 For any category A, it is possible to construct a category Arr(A) whose objects
are the arrows of A. What is a suitable choice for the arrows of Arr(A)? What are

the monic arrows in this category?

2.2 Functors

Abstractly defined, a functor is a homomorphism between categories. Given two
categories A and B, a functor F : A ← B consists of two mappings: one maps
objects to objects and the other maps arrows to arrows. Both mappings are usually,
though not always, denoted by the same letter F. (A remark on notation: because
we will need a variety of capital letters to denote relations, single-letter identifiers
for functors will be written using sans serif font. On the other hand, multiple-letter
identifiers for functors will be written in the normal italic font. For example, id
denotes both the identity functor and the identity arrow.)

The two component mappings of a functor F are required to satisfy the property

F f : FA ← FB whenever f : A ← B.

They are also required to preserve identities and composition:

F(id_A) = id_{FA} and F(f · g) = F f · F g.

Together, these properties mean that functors take diagrams to diagrams.


Some examples of functors are given below. In the literature, the definition of a
functor is often indicated by its action on objects alone. Although we will sometimes

take advantage of this convention, it is not without ambiguity, since there may be
many functors that have the same action on objects. In such cases we will, of course,
specify both parts of a functor.

Functors can be composed in the obvious way: (F · G) f = F(G f), and for every
category C there exists an identity functor id : C ← C. It follows that functors are
the arrows of a category in which the objects are themselves categories. Admittedly,
the construction of such large categories can lead to paradoxes similar to those
found in set theory; the interested reader is referred to (Lawvere 1966; Feferman
1969) for a detailed discussion. In the sequel, we will suppose that application of
functors associates to the right, so FGA = F(GA). Accordingly, we will often denote
composition of functors by juxtaposition, writing FG in preference to F · G.

Examples of functors

Let us now look at some examples of functors. As we have already mentioned, there
is an identity functor id : C ← C for every category C. This functor leaves objects
and arrows unchanged.

An equally trivial functor is the constant functor K_A : A ← B that maps each object
B of B to one and the same object A of A, and each arrow f of B to the arrow id_A
of A. This functor preserves composition since id_A · id_A = id_A.

Next, consider the squaring functor (_)² : Fun ← Fun defined by

A² = {(a, b) | a ∈ A, b ∈ A}
f² (a, b) = (f a, f b).
It is easy to check that the squaring functor preserves identities and composition
and we leave details to the reader.

Compare the squaring functor to the product functor (×) : Fun ← Fun × Fun. We
will write A × B and f × g in preference to ×(A, B) and ×(f, g). This functor is
defined by taking A × B to be the cartesian product of A and B, and

(f × g) (a, b) = (f a, g b).

Again, we leave the proof that × preserves identities and composition to the reader.
We met f × g in a programming context in the last chapter, where it was written
as cross (f, g).
Note that (×) takes two arguments (more precisely, a single argument consisting of
a pair of values); such functors are usually referred to as bifunctors. A bifunctor is
therefore a functor whose source is a product category. When F is a bifunctor, the
functor laws take the form

F(id, id) = id
F(f · h, g · k) = F(f, g) · F(h, k).
Next, consider the functor listr : Fun ← Fun that takes a set A to the set listr A
of cons-lists over A, and a function f to the function listr f that applies f to each
element of a list. We met listr in the last chapter, where we made use of the
following pair of laws:

listr (f · g) = listr f · listr g
listr id = id.

Now we can see that these laws are simply the defining properties of the action of
a functor on arrows. We can also see why this action is denoted by listr f rather
than the more traditional map f.
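
The functor laws for the squaring, product and list functors can be spot-checked on sample values. The Python sketch below is an illustration under stated assumptions, not a proof: pairs are modelled as tuples, cons-lists as Python lists, and the names compose, square, cross, double, succ and neg are chosen for this example only.

```python
def compose(f, g):
    """The composite f . g: apply g first, then f."""
    return lambda x: f(g(x))

def square(f):
    """Action of the squaring functor on arrows: f^2 (a, b) = (f a, f b)."""
    return lambda p: (f(p[0]), f(p[1]))

def cross(f, g):
    """Action of the product functor on arrows: (f x g) (a, b) = (f a, g b)."""
    return lambda p: (f(p[0]), g(p[1]))

def listr(f):
    """Action of the list functor on arrows: apply f to each element."""
    return lambda xs: [f(x) for x in xs]

identity = lambda x: x
double = lambda n: 2 * n
succ = lambda n: n + 1
neg = lambda n: -n

pairs = [(a, b) for a in range(3) for b in range(3)]

# Each functor preserves composition on the sample inputs.
square_ok = all(square(compose(double, succ))(p)
                == compose(square(double), square(succ))(p) for p in pairs)
cross_ok = all(cross(compose(double, succ), compose(neg, double))(p)
               == compose(cross(double, neg), cross(succ, double))(p)
               for p in pairs)
listr_ok = (listr(compose(double, succ))([1, 2, 3])
            == compose(listr(double), listr(succ))([1, 2, 3]))
```

Agreement on samples does not establish the laws, but a failing check would expose a mistaken definition.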

Next, the powerset functor P : Fun ← Fun maps a set A to the powerset PA, which
is defined by

PA = {x | x ⊆ A},

and a function f to the function P f that applies f to all elements of a given set. The
powerset functor is, of course, closely related to the list functor, the only difference
being that it acts on sets rather than lists.

Next, the existential image functor E : Fun ← Rel maps a set A to PA, the powerset
of A, and a relation to its existential image function:

(E R) x = { a | (∃b : a R b ∧ b ∈ x) }.

For example, the existential image of a finite set x : P Nat under the relation
(≤) : Nat ← Nat is the smallest initial segment of Nat containing x. Again, if
∈ : A ← PA denotes the membership relation on sets, then E(∈) is the function that
takes a set of sets to the union of its members; in symbols, E(∈) = union.
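
Both examples can be reproduced on finite data. In the following Python sketch, offered only as an illustration, a relation R : A ← B is assumed to be modelled as a set of pairs (a, b) with a R b, a finite range stands in for Nat, and the names existential_image, leq, sets and member are hypothetical.

```python
def existential_image(relation, xs):
    """(E R) x = { a | exists b : a R b and b in x }."""
    return {a for (a, b) in relation if b in xs}

# The relation (<=) on a finite fragment of Nat.
leq = {(a, b) for a in range(10) for b in range(10) if a <= b}

# The membership relation on a few subsets of {1, 2, 3}: pairs (a, s) with a in s.
sets = [frozenset(s) for s in ([1], [1, 2], [3], [])]
member = {(a, s) for s in sets for a in s}

# The image of {3, 5} under (<=) is the initial segment ending at 5.
segment = existential_image(leq, {3, 5})

# The image of a set of sets under membership is the union of its members.
union = existential_image(member, {frozenset([1, 2]), frozenset([3])})
```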

Note that E and P are very similar (they both send a set to its powerset), but they
are functors between different categories: E : Fun ← Rel while P : Fun ← Fun.
In fact, as we shall make more precise in a moment, P is the restriction of E to
functions.

Finally, the graph functor J : Rel ← Fun goes the other way round to E. This
functor maps every function to the corresponding set of pairs, but leaves the objects
unchanged. The graph functor is an example of an inclusion functor that embeds
a category as a subcategory of a larger one. In particular, we have P = EJ, which
formalises the statement that P is the restriction of E to functions.

Exercises

2.9 Prove that functors preserve isomorphisms. That is, for any functor F and
isomorphism i, the arrow F i is again an isomorphism.

2.10 What is a functor between preorders? (See Exercise 2.6 for the treatment of
preorders as categories.)

2.11 For any category C, define

H(A, B) = {f | f : A ← B in C}
H(f, h) g = f · g · h.

Between what categories is H a functor?

2.12 Consider the datatype of binary trees:

tree A ::= tip A | bin (tree A, tree A)

This gives a mapping taking sets to sets. Extend this mapping to a functor, i.e.
define tree on functions. (Later in this chapter we shall see how this can be done

in general.)

2.13 The functor P' : Fun ← Fun is defined by

P'A = PA
P'(f : A ← B) x = {a ∈ A | (∀b ∈ B : f b = a : b ∈ x)}.

Prove that this does indeed define a functor. Show that P' is different from P. It
follows that P cannot be defined by merely giving its action on objects.

2.3 Natural transformations

Let F, G : A ← B be functors between two categories A and B. By definition, a
transformation to F from G is a collection of arrows φ_B : FB ← GB, one for each
object B of B. These arrows are called the components of φ. A transformation is
called natural if

F h · φ_B = φ_A · G h

for all arrows h : A ← B in B. In a diagram, this equation can be pictured as

[Diagram: a square with top arrow φ_B : FB ← GB, bottom arrow φ_A : FA ← GA,
left arrow F h : FA ← FB, and right arrow G h : GA ← GB.]

We write φ : F ← G to indicate that a transformation φ to F from G is natural. One
can remember the shape of the naturality condition by picturing φ above the arrow
← between F and G and associating it both to the left (F h · φ) and to the right
(φ · G h).

Examples of natural transformations

In the first chapter we met some natural transformations in the category Fun. For
example, consider again the function inits that returns all prefixes of its argument:

inits [a1, a2, ..., an] = [[], [a1], [a1, a2], ..., [a1, a2, ..., an]].

For each set A there is an arrow inits_A : listr (listr A) ← listr A. Since

listr (listr f) · inits = inits · listr f,

we have that inits is a natural transformation inits : listr listr ← listr.

Another example, again in Fun: the function fork_A : A² ← A defined by fork a =
(a, a) is a natural transformation fork : (_)² ← id. The naturality condition is

f² · fork = fork · f.
A natural transformation is called a natural isomorphism if its components are
bijective. For example, in Fun the arrows swap_{A,B} : A × B ← B × A defined by
swap (b, a) = (a, b) form a natural isomorphism, with naturality condition

(g × f) · swap = swap · (f × g).
The above examples are typical: all polymorphic functions in functional
programming languages are natural transformations. This informal statement can be made
precise, see, for instance, (Wadler 1989), but to do so here would go beyond the
scope of the book.
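
The three naturality conditions above can be checked on sample inputs. The following Python sketch is illustrative only: lists and tuples stand in for cons-lists and pairs, the sample functions f and g are arbitrary choices, and a passing check on samples is evidence rather than proof.

```python
def listr(f):
    return lambda xs: [f(x) for x in xs]

def square(f):
    return lambda p: (f(p[0]), f(p[1]))

def cross(f, g):
    return lambda p: (f(p[0]), g(p[1]))

def inits(xs):
    """All prefixes of xs, shortest first."""
    return [xs[:i] for i in range(len(xs) + 1)]

def fork(a):
    return (a, a)

def swap(p):
    b, a = p
    return (a, b)

f = lambda n: n * n
g = str

# listr (listr f) . inits = inits . listr f
inits_natural = listr(listr(f))(inits([1, 2, 3])) == inits(listr(f)([1, 2, 3]))

# f^2 . fork = fork . f
fork_natural = square(f)(fork(3)) == fork(f(3))

# (g x f) . swap = swap . (f x g)
swap_natural = cross(g, f)(swap((2, 5))) == swap(cross(f, g)((2, 5)))
```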

Relations, that is, arrows of Rel, can also be natural transformations. For example,
the membership relations ∈_A : A ← PA are the components of a natural
transformation ∈ : id ← JE. To see what this means, recall that the existential image functor
E has type Fun ← Rel and the inclusion functor J has type Rel ← Fun. Thus,
JE : Rel ← Rel. The naturality condition, namely,

R · ∈ = ∈ · JER,

says, in effect, that for any set x and relation R, the process of choosing an element
a of x and then a value b such that bRa, is equivalent to the process of choosing
an element of the set {b | (∃a : bRa ∧ a ∈ x)}. This equivalence holds even when x
is the empty set or R is the empty relation, for in either case both processes fail to
produce a result. This particular natural transformation will be discussed at length
in Chapter 4.

Composition of natural transformations

For any functor F, the identity transformation id_F : F ← F is given by (id_F)_A = id_{FA}.
Composition of transformations is also defined componentwise. That is, if φ : F ← G
and ψ : G ← H, then the composite transformation φ · ψ : F ← H is defined by

(φ · ψ)_A = φ_A · ψ_A.

It can easily be checked that φ · ψ is natural, for one can paste two diagrams together:

[Diagram: two adjoining squares. Top row: FA ← GA ← HA with arrows φ_A and ψ_A;
bottom row: FB ← GB ← HB with arrows φ_B and ψ_B; vertical arrows F k, G k and H k.]

The outer rectangle commutes because the inner two squares do. Thus, natural
transformations form the arrows of a category whose objects are functors.

One can compose a functor H with each component of a transformation φ : F ← G
to obtain a new transformation Hφ : HF ← HG. For h : A ← B, the naturality of Hφ
follows from

HF h · Hφ_B = H(F h · φ_B) = H(φ_A · G h) = Hφ_A · HG h.

An example is the natural transformation E(∈) : E ← EJE. As we have seen, E(∈_A) =
union_A, the function that returns the union of a collection of sets over A.

In what follows we will omit subscripts when reasoning about the components of
natural transformations whenever they can be inferred from context. This is
common practice when reasoning about polymorphic functions in programming.

Exercises

2.14 The text did not explicitly state the functors in the naturality condition of
swap. What are they?

2.15 The function τ_A takes an element of A and turns it into a singleton set. Verify
that τ : P ← id. Do we also have Jτ : JE ← id?

2.16 The function cp returns the cartesian product of a sequence of sets. It is
defined by

cp [x1, x2, ..., xn] = { [a1, a2, ..., an] | (∀i : 1 ≤ i ≤ n : ai ∈ xi) }.

Is cp a natural transformation? What are the functors involved?

2.17 Let F, G : A ← B, and H : B ← C be functors. Furthermore, let φ : F ← G be
a natural transformation. Define a new transformation by (φH)_A = φ_{HA}. What is
the type of this transformation? Show that φH is a natural transformation.

In this book, we follow functional programming practice by writing φ instead of φH.

2.18 The list functor listr : Fun ← Fun can be generalised to a functor Par ← Par
by stipulating that listr f x is undefined if there exists an element in x that is not
in the domain of f. For each set A, we have an arrow head : A ← listr A in Par
that returns the first element of a list. Is head a natural transformation id ← listr?

2.19 The category A^B has as its objects functors A ← B and as its arrows natural
transformations. Take for B the category consisting of two objects, with one arrow
between them. Find a category that is isomorphic to A^B, whose description does
not make use of natural transformations or functors.

2.4 Constructing datatypes


Our objective in the next few sections is to show how the basic building blocks of
datatypes can be characterised in a categorical style. We will give properties that
characterise various kinds of datatype, such as products, sums, lists and trees, purely
in terms of composition. These definitions therefore make sense in any category,
although it can happen that, in a particular category, some datatypes do not exist.

When these definitions are interpreted in Fun they describe the datatypes we know
from programming practice. However, as we shall see, interpreting the same
definitions in Par or Rel may yield unexpected results. The discussion of these
unexpected interpretations serves both to deepen our understanding of the categorical
definitions, and as a motivation for Chapter 5, where datatypes are discussed in a
relational setting.

The simplest datatype is a datatype with only one element, so we begin with the
categorical abstraction of the notion of a singleton set.

Terminal objects

A terminal object of a category C is an object T such that for each object A of C
there is exactly one arrow T ← A. Any two terminal objects are isomorphic. If T' is
another terminal object, then there exist unique arrows f : T ← T' and g : T' ← T.
But since the identity id_T : T ← T is the only arrow of its type, it follows that
f · g = id_T and, by symmetry, g · f = id_{T'}, so T and T' are isomorphic. This is
sometimes summarised by saying that 'terminal objects are unique up to (unique)
isomorphism'.

From now on, 1 will denote some fixed terminal object, and we shall speak of the
terminal object. The unique arrow from A to 1 is written !_A. The uniqueness of !_A
can be expressed as an equivalence:

h = !_A ≡ h : 1 ← A. (2.1)

Such equivalences are called universal properties and we shall see them in abundance
in the pages to follow.

Taking 1 for A in the universal property of 1, we obtain

!_1 = id_1. (2.2)

This identity is known as the reflection law. We have also the fusion law

!_A · f = !_B ⇐ f : A ← B, (2.3)

because !_A · f : 1 ← B. Note that the fusion law may be restated as saying that
! is a natural transformation K_1 ← id, where K_A is the constant functor defined
in Section 2.2. Like universal properties, there will be many examples of other
reflection and fusion laws in due course.

In Fun the terminal object is a singleton set, say {p}. The arrow !_A is the constant
function that maps every element of A to p. The statement that the terminal object
is unique up to unique isomorphism states that all singleton sets are isomorphic in
a unique way. In Par and Rel the terminal object is the empty set; in both cases
the unique arrow { } ← A is the empty relation ∅.

Initial objects

An initial object of C is a terminal object of C^op. Thus, I is initial if for each
object A of C there is exactly one arrow of type A ← I. By duality, it follows that
initial objects are unique up to unique isomorphism. A commonly used notation for
the initial object of C is 0, and the unique arrow A ← 0 is denoted ¡_A. In Fun the
initial object is the empty set and ¡_A is the empty function. Thus, the names 0 and
1 for the initial and terminal objects connote the cardinality of the corresponding
sets in Fun. In Par and Rel the initial object is also the empty set, so in these
categories initial and terminal objects coincide.

Exercises

2.20 An element of A is an arrow e : A ← 1. An arrow c : A ← B is said to be
constant if for all other arrows f, g : B ← C we have c · f = c · g. Prove that
any element is constant. Assuming that B has at least one element, show that any
constant arrow c : A ← B can be factored as e · !_B for some element e of A.

2.21 An object A is said to be empty if the only arrows with target A are ¡_A and
id_A. What are the empty objects in Fun? Same question for Rel and Fun × Fun.

2.22 What does it mean to say that a preorder has a terminal object? (See Exercise
2.6 for the interpretation of preorders as categories.)
2.23 Let A and B be categories that have initial and terminal objects. Does A x B
have initial and terminal objects?

2.24 Assuming that A and B have terminal objects, what is the terminal object
in A^B? (For the definition of A^B, see Exercise 2.19.)

2.5 Products and coproducts

New datatypes can be built by tupling existing datatypes or by taking their disjoint
union; the categorical abstractions considered here are the notions of product and
coproduct.

Products

A product of two objects A and B consists of an object and two arrows. The object
is written as A × B and the arrows are written outl : A ← A × B and outr : B ← A × B.
These three things are required to satisfy the following property: for each pair of
arrows f : A ← C and g : B ← C there exists an arrow ⟨f, g⟩ : A × B ← C such that

h = ⟨f, g⟩ ≡ outl · h = f and outr · h = g (2.4)

for all h : A × B ← C. This is another example of a universal property: it states
that ⟨f, g⟩ is the unique arrow satisfying the property on the right. The operator
⟨f, g⟩ is pronounced 'pair f and g'. The following diagram summarises the type
information:

[Diagram: arrows outl : A ← A × B and outr : B ← A × B, together with f : A ← C,
g : B ← C and ⟨f, g⟩ : A × B ← C.]

The diagram also illustrates the cancellation properties

outl · ⟨f, g⟩ = f and outr · ⟨f, g⟩ = g, (2.5)

which are obtained by taking h = ⟨f, g⟩ in the right-hand side of (2.4).
Taking outl for f and outr for g in (2.4), we obtain the reflection law

id = ⟨outl, outr⟩.

Taking ⟨h, k⟩ · m for h in (2.4) and using (2.5), we obtain the fusion law

⟨h, k⟩ · m = ⟨f, g⟩ ⇐ h · m = f and k · m = g.

In other words,

⟨h, k⟩ · m = ⟨h · m, k · m⟩. (2.6)

Use of these rules in subsequent calculations will be signalled simply by the hint
products.

Examples of products

In Fun products are given by pairing. That is, A × B is the cartesian product of A
and B, and outl and outr are the obvious projection functions. In the last chapter
we wrote pair (f, g) for the arrow ⟨f, g⟩, with the definition

pair (f, g) a = (f a, g a).

This construction does not define a product in Par or Rel since, for example,
taking g to be the everywhere undefined partial function ∅ we obtain ⟨f, ∅⟩ = ∅ and
so outl · ⟨f, ∅⟩ = ∅, not f. The discussion of products in Par and Rel is deferred
until we have also discussed coproducts.
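
The product laws in Fun, and the failure of cancellation for partial functions, can both be observed on sample values. The Python sketch below is an illustration under stated assumptions: tuples model pairs, None models 'undefined' (so None is not treated as a legitimate value), and the names strict_pair, partial_outl and nowhere are introduced only for this example.

```python
def compose(f, g):
    return lambda x: f(g(x))

def outl(p):
    return p[0]

def outr(p):
    return p[1]

def pair(f, g):
    """pair (f, g) a = (f a, g a)."""
    return lambda a: (f(a), g(a))

def strict_pair(f, g):
    """Pairing of partial functions: undefined if either component is."""
    def h(a):
        fa, ga = f(a), g(a)
        return None if fa is None or ga is None else (fa, ga)
    return h

def partial_outl(p):
    """outl extended to propagate an undefined argument."""
    return None if p is None else p[0]

f = lambda n: n + 1
g = lambda n: 2 * n
m = lambda n: n * n
nowhere = lambda n: None   # the everywhere undefined partial function

sample = range(5)
cancellation = all(outl(pair(f, g)(a)) == f(a) and outr(pair(f, g)(a)) == g(a)
                   for a in sample)
fusion = all(compose(pair(f, g), m)(a)
             == pair(compose(f, m), compose(g, m))(a) for a in sample)
reflection = pair(outl, outr)((1, 2)) == (1, 2)

# With partial functions the cancellation law fails: f 3 is defined,
# but outl applied to the strict pairing of f with nowhere is not.
cancellation_fails_in_par = (partial_outl(strict_pair(f, nowhere)(3)) is None
                             and f(3) is not None)
```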

Any two categories A and B also have a product. As we have seen, the category
A × B has as its objects pairs (A, B), where A is an object of A and B is an
object of B. Similarly, the arrows are pairs (f, g) where f is an arrow of A and g
is an arrow of B. Composition is defined component-wise, and outl and outr are
the obvious projection functions. In fact we can turn outl and outr into functors
outl : A ← A × B and outr : B ← A × B by defining two mappings:

outl (A, B) = A and outr (A, B) = B
outl (f, g) = f and outr (f, g) = g.

Spans

There is an alternative definition of a product of A and B in a category C, namely,
as the terminal object in the category Span(A, B) of spans over A and B. A span
over A and B is a pair of arrows (f : A ← C, g : B ← C) with a common source. The
objects of Span(A, B) are spans over A and B, and the arrows m : (f, g) ← (h, k)
are arrows of C satisfying

f · m = h and g · m = k.

The information is summarised in the following diagram:

[Diagram: arrows f : A ← C and g : B ← C, arrows h : A ← D and k : B ← D, and
m : C ← D.]

Composition in Span(A, B) is the same as that in C. The particular span (outl :
A ← A × B, outr : B ← A × B) is the terminal object in Span(A, B), and ⟨f, g⟩ is just
another notation for !_(f,g). Indeed, our earlier definition (2.4) is a special case of
the universal property (2.1) of terminal objects. This fact implies that products are
unique up to unique isomorphism. Also, the reflection and fusion laws for products
are special cases of the same laws for terminal objects.

The product functor

If each pair of objects in C has a product, one says that C has products. In such a
case × can be made into a bifunctor C ← C × C by defining it on arrows as follows:

f × g = ⟨f · outl, g · outr⟩. (2.7)

We met × in the special case C = Fun in Section 2.2. For general C the proof
that × preserves identities is immediate from the reflection law ⟨outl, outr⟩ = id.
To show that × also preserves composition, that is,

(f × g) · (h × k) = (f · h) × (g · k),

it suffices to prove the absorption law

(f × g) · ⟨p, q⟩ = ⟨f · p, g · q⟩. (2.8)

Taking p = h · outl and q = k · outr in (2.8) gives the desired result. Here is a proof
of (2.8):

(f × g) · ⟨p, q⟩
= {definition of ×}
⟨f · outl, g · outr⟩ · ⟨p, q⟩
= {fusion law (2.6)}
⟨f · outl · ⟨p, q⟩, g · outr · ⟨p, q⟩⟩
= {cancellation law (2.5)}
⟨f · p, g · q⟩.

Using the definition of × and the cancellation law (2.5), we now obtain that outl
and outr are natural transformations:

outl : outl ← (×) and outr : outr ← (×).

Note the two uses of outl and outr, both as a collection of arrows and as functors
between two categories. Again, use of any of the above properties in calculations
will often be signalled from now on simply by the hint products.

Coproducts

The product of A and B in C^op is called the coproduct of A and B in C. Thus
coproducts, like products, also consist of one object and two arrows for each A and
B. The object is denoted by A + B, and the two arrows by inl : A + B ← A and
inr : A + B ← B. Given f : C ← A and g : C ← B, the unique arrow C ← A + B
is written [f, g], and pronounced 'case f or g'. Thus, the coproduct in C is defined
by the universal property

h = [f, g] ≡ h · inl = f and h · inr = g (2.9)

for all h : C ← A + B. The following diagram spells out the type information:

[Diagram: arrows inl : A + B ← A and inr : A + B ← B, together with f : C ← A,
g : C ← B and [f, g] : C ← A + B.]
The properties of coproducts follow at once from those of products by duality, but
we can also describe them in a direct approach. The cancellation properties are:

[f, g] · inl = f and [f, g] · inr = g.

These can be obtained by taking h = [f, g] on the right of (2.9). Taking inl for f
and inr for g in (2.9) we obtain the reflection law

id = [inl, inr].

Taking m · [h, k] for h, we obtain the fusion law

m · [h, k] = [f, g] ⇐ m · h = f and m · k = g,

which is the same as saying

m · [h, k] = [m · h, m · k].

Use of these laws in calculations is signalled by the hint coproducts.

The coproduct functor

We also obtain a bifunctor + whose definition on arrows is

f + g = [inl · f, inr · g].

The composition and fusion laws

(f + g) · (h + k) = f · h + g · k
[f, g] · (h + k) = [f · h, g · k]

follow at once by duality, though one can also give a direct proof.

Coproducts in Fun are disjoint unions:

A + B = {(a, 0) | a ∈ A} ∪ {(b, 1) | b ∈ B}.

Thus inl adds a 0 tag, while inr adds a 1. In a functional programming style one
can do this rather more directly, avoiding the artifice of using 0 and 1 as tags, by
defining A + B with the type declaration

A + B ::= inl A | inr B

and the case operator by

[f, g] (inl a) = f a and [f, g] (inr b) = g b.

Note that in Fun if A is of size m and B is of size n, then A + B is of size m + n
while A × B is of size m × n. Unlike products, coproducts in Par and Rel are
defined in exactly the same way as in Fun.
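
The tagged representation of coproducts and the case operator can be sketched in a few lines. The following Python fragment is an illustration only: it assumes that elements of A + B are modelled as pairs of a string tag and a value, and the function names case and plus (standing for [f, g] and f + g) are chosen for this example.

```python
def inl(a):
    return ("inl", a)

def inr(b):
    return ("inr", b)

def case(f, g):
    """[f, g] (inl a) = f a and [f, g] (inr b) = g b."""
    def h(v):
        tag, x = v
        return f(x) if tag == "inl" else g(x)
    return h

def plus(f, g):
    """f + g = [inl . f, inr . g]."""
    return case(lambda a: inl(f(a)), lambda b: inr(g(b)))

def compose(f, g):
    return lambda x: f(g(x))

f = lambda n: n + 1
g = len
m = lambda n: 10 * n
values = [inl(1), inr("ab")]

reflection = all(case(inl, inr)(v) == v for v in values)
fusion = all(compose(m, case(f, g))(v)
             == case(compose(m, f), compose(m, g))(v) for v in values)
```

Here f acts on integers and g on strings, so the two summands are visibly kept apart by their tags.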

Products in Par and Rel

As we have already indicated, we cannot define products in Par simply by taking
the cartesian product of two sets. The reason bears repeating: the cancellation laws

outl · ⟨f, g⟩ = f and outr · ⟨f, g⟩ = g

fail to hold under the interpretation ⟨f, g⟩ a = (f a, g a) when f and g are partial.
To be sure, in lazy functional programming, these laws are restored to good health
by extending the notion of function to include a bottom element ⊥ and making
constructions such as pairing non-strict. We will not go into details because this
approach is not exploited in this book.

Instead, we can define A × B in Par by

A × B ::= inl A | mid (A, B) | inr B.

The partial function outl : A ← A × B is defined by the equations

outl (inl a) = a
outl (mid (a, b)) = a
outl (inr b) = undefined,

and outr by

outr (inl a) = undefined
outr (mid (a, b)) = b
outr (inr b) = b.

The pair operator is defined by

⟨f, g⟩ a = inl (f a), if defined (f a) and undefined (g a)
         = inr (g a), if undefined (f a) and defined (g a)
         = mid (f a, g a), otherwise.

To check, for example, that (outl · ⟨f, g⟩) a = f a for all a we have to consider four
cases, depending on whether f a and g a are defined. Taking just one case, suppose
f a is undefined and g a is defined. Then we have ⟨f, g⟩ a = inr (g a). But then
outl (inr (g a)) is undefined, as required. The other cases are left as exercises.
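
The construction can be tried out directly. The Python sketch below is illustrative and rests on two assumptions that are not in the text: None models 'undefined', and the 'otherwise' clause is read as yielding an undefined result when both f a and g a are undefined (mid being strict). The sample partial functions f and g are arbitrary.

```python
UNDEFINED = None  # assumption: None models an undefined result

def inl(a):
    return ("inl", a)

def inr(b):
    return ("inr", b)

def mid(a, b):
    return ("mid", a, b)

def outl(p):
    """Defined on inl a and mid (a, b); undefined on inr b."""
    if p is UNDEFINED or p[0] == "inr":
        return UNDEFINED
    return p[1]

def outr(p):
    """Defined on mid (a, b) and inr b; undefined on inl a."""
    if p is UNDEFINED or p[0] == "inl":
        return UNDEFINED
    return p[-1]

def pair(f, g):
    """The pair operator for partial functions f and g."""
    def h(a):
        fa, ga = f(a), g(a)
        if fa is UNDEFINED and ga is UNDEFINED:
            return UNDEFINED          # assumed reading of the strict case
        if ga is UNDEFINED:
            return inl(fa)
        if fa is UNDEFINED:
            return inr(ga)
        return mid(fa, ga)
    return h

# Two sample partial functions on 0..5: f is defined on even numbers only,
# g on numbers below 3 only, so all four cases occur.
f = lambda n: n // 2 if n % 2 == 0 else UNDEFINED
g = lambda n: n + 1 if n < 3 else UNDEFINED

cancellation = all(outl(pair(f, g)(a)) == f(a) and outr(pair(f, g)(a)) == g(a)
                   for a in range(6))
```

The sample covers all four combinations of defined and undefined components, which is the case analysis the text asks for.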

The definition of products in Rel is simpler because products coincide with
coproducts. That is, we can define A × B to be the disjoint union of A and B. The reason
is that every relation has a converse and so Rel is the same as Rel^op. This is
the same reason why initial objects and terminal objects coincide in Rel. We will
discuss this situation in more depth in Chapter 5.

Polynomial functors

Functors built up from constants, products and coproducts are said to be
polynomial. More precisely, the class of polynomial functors is defined inductively by the
following clauses:

The identity functor id and the constant functors K_A for varying A are
polynomial;

If F and G are polynomial, then so are their composition FG, their pointwise
sum F + G and their pointwise product F × G. These pointwise functors are
defined by

(F + G) h = F h + G h
(F × G) h = F h × G h.

For example, the functor F defined by FX = A + X × A and F h = id_A + h × id_A is
polynomial because

F = K_A + (id × K_A),

where, in this equation, + and × denote the pointwise versions.
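
The action of this example functor on arrows can be written out concretely. In the Python sketch below, which is an illustration only, elements of A + X × A are assumed to be tagged pairs, and the name fmap_F for the mapping h ↦ F h is hypothetical.

```python
def inl(a):
    return ("inl", a)

def inr(b):
    return ("inr", b)

def fmap_F(h):
    """F h = id_A + h x id_A for the functor F X = A + X x A."""
    def Fh(v):
        tag, x = v
        if tag == "inl":
            return v              # the K_A summand is left unchanged
        y, a = x
        return inr((h(y), a))     # h acts on the X component only
    return Fh

identity = lambda x: x
h = lambda n: n + 1
k = lambda n: 2 * n
values = [inl("a"), inr((3, "b"))]

preserves_identity = all(fmap_F(identity)(v) == v for v in values)
preserves_composition = all(
    fmap_F(lambda x: h(k(x)))(v) == fmap_F(h)(fmap_F(k)(v)) for v in values)
```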

Polynomial functors are useful in the construction of datatypes, but they are not
enough by themselves; we also need type functors, which correspond to recursively
defined types. These are discussed in Section 2.7. For datatypes that make use
of function spaces, and for a categorical treatment of currying in general, we need
exponential objects; these are discussed in Chapter 3.

Exercises

2.25 The partial order (Nat, ≤) of natural numbers can be regarded as a category
(see Exercise 2.6). Does this category have products? Coproducts?

2.26 Show that in any category with a terminal object and products there exist
natural isomorphisms

unit : A ← A × 1
swap : A × B ← B × A
assocr : A × (B × C) ← (A × B) × C.

These natural isomorphisms arise in a number of examples later in the book. The
inverse arrow for assocr will be denoted by assocl; thus,

assocl : (A × B) × C ← A × (B × C)

satisfies assocl · assocr = id and assocr · assocl = id.

2.27 Prove the exchange law

⟨[f, g], [h, k]⟩ = [⟨f, h⟩, ⟨g, k⟩].

2.28 Consider products and coproducts in Fun. Are the projections (outl, outr)
epic? Are the injections (inl, inr) monic? If the answers to these two questions are
different, does this contradict duality?

2.29 Let A be a category with products. What are the products in Arr(A)? (See
Exercise 2.8 for the definition of Arr(A).)
2.30 Complete the verification of the construction of products in Par.

2.31 A lazy functional programming language can be regarded as a category, where
the types are objects and the arrows are (meanings of) programs. Does pair forming
give a categorical product in this category?

2.6 Initial algebras


In order to say exactly what a recursively defined datatype is, we need one final
piece of machinery: the notion of an initial algebra.

Let F : C ← C be a functor. By definition, an F-algebra is an arrow of type
A ← FA, the object A being called the carrier of the algebra. For example, the
algebra (Nat, +) of the natural numbers and addition is an algebra of the functor
FA = A × A and Fh = h × h.

An F-homomorphism to an algebra f : A ← FA from an algebra g : B ← FB is an
arrow h : A ← B such that

h · g = f · Fh.

The type information is provided by the diagram:

          g
    B <------- FB
    |          |
  h |          | Fh
    v          v
    A <------- FA
          f

To give just one simple illustration, consider the algebra (+) : Nat ← Nat^2 of
addition, and the algebra (⊕) : Nat_p ← Nat_p^2 of addition modulo p, where
Nat_p = {0, 1, ..., p − 1} and n ⊕ m = (n + m) mod p. The function h n = n mod p
is a (_)^2-homomorphism to ⊕ from +.
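The homomorphism condition in this illustration can be checked pointwise. The following Haskell fragment is only a sketch: the names plusMod, toMod and homAt are ours, and elements of Nat_p are assumed to be represented by Integer values in the range 0 to p − 1.

```haskell
-- Addition modulo p, taking its two arguments as a pair, as in the text.
-- Assumption: elements of Nat_p are represented by Integers in [0, p-1].
plusMod :: Integer -> (Integer, Integer) -> Integer
plusMod p (n, m) = (n + m) `mod` p

-- The candidate homomorphism h n = n mod p.
toMod :: Integer -> Integer -> Integer
toMod p n = n `mod` p

-- The homomorphism condition: applying h after (+) agrees with applying
-- addition modulo p after h on both components. Tested at a single pair.
homAt :: Integer -> (Integer, Integer) -> Bool
homAt p (n, m) = toMod p (n + m) == plusMod p (toMod p n, toMod p m)
```

Evaluating homAt over a finite range of p, n and m is of course a test rather than a proof.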

Identity arrows are homomorphisms, and the composition of two homomorphisms is
again a homomorphism, so F-algebras form the objects of a category Alg(F) whose
arrows are homomorphisms. For many functors, including the polynomial functors
of Fun, this category has an initial object, which we shall denote by α : T ← FT (the
letter T stands for 'Type' and also for 'Term' since such algebras are often called
term algebras). The proof that these initial algebras exist is beyond the scope of
the book; the interested reader should consult (Manes and Arbib 1986).

The existence of an initial F-algebra means that for any other F-algebra f : A ← FA,
there is a unique homomorphism to f from α. We will denote this homomorphism
by ([f]), so ([f]) : A ← T is characterised by the universal property

h = ([f])  ≡  h · α = f · Fh.   (2.10)

The type information is summarised in the diagram:

            α
      T <------- FT
      |          |
([f]) |          | F([f])
      v          v
      A <------- FA
            f

Arrows of the form ([f]) are called catamorphisms, and we shall refer to uses of
the above equivalence by the hint catamorphisms. (The word 'catamorphism' is
derived from the Greek preposition κατά meaning 'downwards'.) Catamorphisms,
like other constructions by universal properties, satisfy fusion and reflection laws.
Before giving these, let us first pause to give two examples that reveal the notion
of a catamorphism to be a familiar idea in abstract clothing.

Natural numbers

Initial algebras of the category Fun will be named by type declarations of the kind
commonly found in functional programming. For example,

Nat ::= zero | succ Nat

declares [zero, succ] : Nat ← F Nat to be the initial algebra of the functor F defined
by FA = 1 + A and Fh = id_1 + h. Here zero : Nat ← 1 is a constant function.
The names Nat, zero and succ are inspired by the fact that we can think of Nat
as the natural numbers, zero as the constant function returning 0, and succ as the
successor function. The functor F is polynomial, so the category Alg(F) has an
initial object; the purpose of the type declaration is to give a name to this initial
algebra.

Every algebra of the functor F : Fun ← Fun takes the form [c, f] for some constant
function c : A ← 1 and function f : A ← A. To see this, let h : A ← FA be an
F-algebra. We have h = [h · inl, h · inr], so we can set c = h · inl and f = h · inr.
It is clumsy to write ([[c, f]]) so we shall drop the inner brackets and write ([c, f])
instead.

It is helpful to spell out exactly what function h is defined by h = ([c, f]). Simplifying
the definition, we find

      h · α = [c, f] · Fh
   ≡    {definition of F}
      h · α = [c, f] · (id_1 + h)
   ≡    {coproduct}
      h · α = [c, f · h]
   ≡    {since α = [zero, succ]}
      h · [zero, succ] = [c, f · h]
   ≡    {coproduct}
      [h · zero, h · succ] = [c, f · h]
   ≡    {cancellation}
      h · zero = c and h · succ = f · h.

Writing 0 for the particular constant returned by the constant function zero and
n + 1 for succ n, we now see that h = ([c, f]) is the unique solution of the two
equations

h (0) = c
h (n + 1) = f (h n).

In other words, h = foldn (c, f). Thus ([c, f]) = foldn (c, f) in the datatype Nat.
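For readers who prefer to see the two equations as a program, here is a minimal Haskell sketch. The constructor names Zero and Succ and the helpers toInt, fromInt and plusN are our own choices for illustration; only foldn corresponds to the operator named in the text.

```haskell
-- The datatype Nat, with constructors playing the roles of zero and succ.
data Nat = Zero | Succ Nat

-- foldn (c, f) is the unique solution h of  h 0 = c  and  h (n + 1) = f (h n).
foldn :: (a, a -> a) -> Nat -> a
foldn (c, _) Zero     = c
foldn (c, f) (Succ n) = f (foldn (c, f) n)

-- Interpreting a Nat as an Int: replace zero by 0 and succ by (+ 1).
toInt :: Nat -> Int
toInt = foldn (0, (+ 1))

-- Helper for building test values; assumes a non-negative argument.
fromInt :: Int -> Nat
fromInt 0 = Zero
fromInt k = Succ (fromInt (k - 1))

-- Addition by recursion on the second argument: plusN m n applies Succ n times to m.
plusN :: Nat -> Nat -> Nat
plusN m = foldn (m, Succ)
```

For instance, toInt (plusN (fromInt 3) (fromInt 4)) evaluates to 7.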

Strings

The second example deals with lists of characters, also called strings:

String ::= nil | cons (Char, String).


In the next section we will generalise this datatype to lists over an arbitrary type,
but it is worth while considering the simpler case first. The above declaration
names [nil, cons] : String ← F String to be the initial algebra of the functor
FA = 1 + (Char × A) and Ff = id + (id × f). In particular, nil : String ← 1 is a
constant function, returning the empty string.



Like the example of Nat given above, every algebra of this functor takes the form
[c, f] for some constant c : A ← 1 and function f : A ← Char × A. Simplifying, we
find that h = ([c, f]) is the unique solution of the equations

h nil = c
h (cons (a, x)) = f (a, h x).

In other words, ([c, f]) = foldr (c, f) in the datatype String. So, once again, ([c, f])
corresponds to a fold operator.
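The same reading works for strings. The sketch below uses Haskell's built-in String type in place of the declared datatype, which is an assumption of ours; the names foldrS, lengthS and copyS are hypothetical, and the operator takes its two components as a pair to match the notation of the text.

```haskell
-- foldrS (c, f) is the unique solution h of
--   h nil = c   and   h (cons (a, x)) = f (a, h x).
foldrS :: (b, (Char, b) -> b) -> String -> b
foldrS (c, _) []      = c
foldrS (c, f) (a : x) = f (a, foldrS (c, f) x)

-- Length of a string: ignore each character and count one for it.
lengthS :: String -> Int
lengthS = foldrS (0, \(_, n) -> n + 1)

-- Folding with the constructors themselves rebuilds the string unchanged.
copyS :: String -> String
copyS = foldrS ([], \(a, x) -> a : x)
```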

Fusion

From the definition of catamorphisms we immediately obtain the reflection law

([α]) = id   (2.11)

and the very useful fusion law

h · ([f]) = ([g])  ⇐  h · f = g · Fh.   (2.12)

The fusion law can be proved by looking at the diagram

            α
      T <------- FT
      |          |
([f]) |          | F([f])
      v          v
      A <------- FA
      |     f    |
    h |          | Fh
      v          v
      B <------- FB
            g

This diagram commutes because the lower part does (by assumption) and the upper
part does (by definition of catamorphism). But since ([g]) is the unique
homomorphism from α to g, we conclude that ([g]) = h · ([f]).

The fusion law for catamorphisms is probably the most useful tool in the arsenal of
techniques for program derivation, and we shall see literally dozens of uses in the
programming examples given in the remainder of the book. In particular, it can be
used to prove that α is an isomorphism. Suppose in the statement of fusion we take
both g and h to be α. Then we obtain α · ([f]) = ([α]) = id provided α · f = α · Fα.
Clearly, we can choose f = Fα and as a result we obtain α · ([Fα]) = id. We can also
show that ([Fα]) · α = id:

      ([Fα]) · α
   =    {cata}
      Fα · F([Fα])
   =    {F functor}
      F(α · ([Fα]))
   =    {above}
      F id
   =    {F functor}
      id

The fact that α is an isomorphism was first recorded in (Lambek 1968), and it
is sometimes referred to as Lambek's Lemma. Motivated by his lemma, Lambek
called (α, T) a fixpoint of F, but we shall not use this terminology.
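Lambek's Lemma has a direct programming counterpart. The following Haskell sketch is one common encoding, not something taken from the text: the names Fix, In, cata, out and NatF are our assumptions, with In playing the role of the initial algebra and out its inverse, defined as the catamorphism of F applied to the initial algebra.

```haskell
-- A generic carrier for the initial algebra of a functor f.
-- The constructor In stands for the initial algebra itself.
newtype Fix f = In (f (Fix f))

-- The catamorphism: the unique homomorphism from In to the algebra alg.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (In t) = alg (fmap (cata alg) t)

-- The inverse of In, built exactly as in the lemma: the catamorphism
-- whose algebra is the functor applied to the initial algebra.
out :: Functor f => Fix f -> f (Fix f)
out = cata (fmap In)

-- Base functor of the natural numbers, FA = 1 + A.
data NatF x = ZeroF | SuccF x

instance Functor NatF where
  fmap _ ZeroF     = ZeroF
  fmap g (SuccF x) = SuccF (g x)

-- Interpret a natural number as an Int, for inspecting results.
toInt :: Fix NatF -> Int
toInt = cata alg
  where
    alg ZeroF     = 0
    alg (SuccF n) = n + 1
```

Applying out to the representation of 3 exposes a SuccF node around the representation of 2, and wrapping the result in In again gives back 3.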

Exercises

2.32 Let f0 : A ← B × A, f1 : A ← A × C and f2 : A ← B. Define a functor F, an
F-algebra g : A ← FA, and mappings φ_i (i = 0, 1, 2) such that φ_i g = f_i.

2.33 What is the initial algebra of the identity functor?

2.34 Let α : T ← FT be the initial algebra of F. Prove that f · m = id_T implies
that m is a catamorphism.

2.35 Show that ([f · g]) = f · ([g · Ff]).

2.36 Let α : T ← FT be the initial algebra of F. Show that if f : A ← T, then
f = outl · ([g]) for some g.

2.37 Give an example of a functor of Fun that does not have an initial algebra.
(Hint: think of an operator F taking sets to sets such that F A is not isomorphic to
A for any A.)

2.7 Type functors

Datatypes are often parameterised. For example, we can generalise the example of
strings described above to a datatype of cons-lists over an arbitrary A:

listr A ::= nil | cons (A, listr A).

This declares [nil, cons]_A : listr A ← F_A(listr A) to be the initial algebra of the
functor F_A defined by F_A(B) = 1 + (A × B) and F_A(f) = id + (id × f). We can, and
will, write F(A, B) instead of F_A(B), in which case we think of F as a bifunctor.
We will always arrange the arguments of a bifunctor so that the functor obtained
by fixing the first argument (and varying the second) is the one that describes the
initial algebra.

To illustrate this important convention, consider the declaration

listl A ::= nil | snoc (listl A, A),

which describes the type of snoc-lists over A. Snoc-lists are similar to cons-lists
except that we build them by adding to the end rather than to the beginning of the
list. The algebra [nil, snoc] is the initial algebra of the bifunctor

F(A, B) = 1 + (B × A).

Fixing the first argument gives us a functor F_A(f) = F(id_A, f) and it is this functor
that describes the initial algebra.

Let F be a bifunctor with the collection of initial algebras α_A : TA ← F(A, TA). The
construction T can be made into a functor by defining

Tf = ([α · F(f, id)]).   (2.13)

For example, the cons-list functor is defined by

listr f = ([[nil, cons] · (id + (f × id))])

which simplifies to listr f = ([nil, cons · (f × id)]). Translated to the point level, this
reads

listr f nil = nil
listr f (cons (a, x)) = cons (f a, listr f x),

so listr f is just what functional programmers would call map f, or maplist f.
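The point-level equations can be transcribed almost literally. In this Haskell sketch the names Listr, cataL, fromList and toList are ours; listr is the type functor of the text, defined as a catamorphism rather than by explicit recursion.

```haskell
-- Cons-lists, with cons taking a pair as in the text.
data Listr a = Nil | Cons (a, Listr a)

-- The catamorphism on cons-lists: replace nil by c and cons by f.
cataL :: (b, (a, b) -> b) -> Listr a -> b
cataL (c, _) Nil           = c
cataL (c, f) (Cons (a, x)) = f (a, cataL (c, f) x)

-- The type functor on arrows: apply f to the element handed to cons and
-- leave the already-processed tail alone.
listr :: (a -> b) -> Listr a -> Listr b
listr f = cataL (Nil, \(a, y) -> Cons (f a, y))

-- Conversions to and from built-in lists, for experimentation only.
fromList :: [a] -> Listr a
fromList = foldr (\a x -> Cons (a, x)) Nil

toList :: Listr a -> [a]
toList = cataL ([], \(a, x) -> a : x)
```

For example, toList (listr (+ 1) (fromList [1, 2, 3])) yields [2, 3, 4], agreeing with map (+ 1) [1, 2, 3].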

We have, of course, to prove that T preserves identities and composition, so let us
do it. First:

      T id
   =    {definition}
      ([α · F(id, id)])
   =    {bifunctors preserve identities}
      ([α])
   =    {reflection law}
      id.

Second:

      Tf · Tg
   =    {definition}
      ([α · F(f, id)]) · Tg
   =    {fusion (see below)}
      ([α · F(f, id) · F(g, id)])
   =    {F bifunctor}
      ([α · F(f · g, id)])
   =    {definition}
      T(f · g).
The appeal to fusion is justified by the following more general argument:

      ([h]) · Tg = ([h · F(g, id)])
   ≡    {definition of T}
      ([h]) · ([α · F(g, id)]) = ([h · F(g, id)])
   ⇐    {fusion}
      ([h]) · α · F(g, id) = h · F(g, id) · F(id, ([h]))
   ≡    {cata}
      h · F(id, ([h])) · F(g, id) = h · F(g, id) · F(id, ([h]))
   ≡    {F bifunctor}
      true.

This argument in effect shows that

([h]) · Tg = ([h · F(g, id)]).   (2.14)

In words, a catamorphism composed with its type functor can always be expressed
as a single catamorphism. Equation (2.14) is quite useful by itself and we shall refer
to it in calculations by the hint type functor fusion. To give just one example now:
if sum = ([zero, plus]) is the function sum : Nat ← listr Nat, then

sum · listr f = ([zero, plus · (f × id)]).
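On built-in Haskell lists, this instance of type functor fusion says that a map followed by a sum can be done in a single fold. The sketch below is illustrative only; the names sumAfterMap and sumFused are ours, and foldr and map stand in for the catamorphism and the type functor.

```haskell
-- Two traversals: first map f over the list, then sum the results.
sumAfterMap :: (a -> Integer) -> [a] -> Integer
sumAfterMap f = foldr (+) 0 . map f

-- One traversal: the fused catamorphism, with f applied to the element
-- just before it is added to the running total.
sumFused :: (a -> Integer) -> [a] -> Integer
sumFused f = foldr (\a b -> f a + b) 0
```

Both functions return 14 on the arguments (\n -> n * n) and [1, 2, 3].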
Now that we have established that T is a functor, we can show that α : T ← G is a
natural transformation, where Gf = F(f, Tf). We argue in a line:

Tf · α = α · F(f, id) · F(id, Tf) = α · F(f, Tf) = α · Gf.

In what follows we will say that (α, T) is the initial type defined by the bifunctor F.

Before passing on to examples we make three remarks. The first is that it is
important not to confuse the type functor T associated with a datatype with the functor
F that defines the structure of the datatype. We will call the latter the base functor.
For example, the datatype of cons-lists over an arbitrary A has as base functor the
functor F defined on arrows by Ff = id_1 + id_A × f, whereas the type functor listr is
defined on arrows by listr f = ([nil, cons · (f × id)]).

The second remark is that, subject to certain healthiness conditions on the functor
involved, the initial algebras in Par and Rel coincide with those in Fun. This will
be proved in Chapter 5.

The third remark concerns duality. As with the definitions of terminal objects and
products, one may dualise the above discussion to coalgebras. This gives a clean
description, for instance, of infinite lists. We shall not have any use for such infinite
data structures, however, and their discussion is therefore omitted. The interested
reader is referred to (Manes and Arbib 1986; Malcolm 1990b; Hagino 1989) for
details.

Exercises

2.38 The discussion of initial types does in fact allow bifunctors of type
F : A ← (B × A). Consider the initial type (α, T). Between what categories is T a
functor? An example where B = Fun × Fun and A = Fun is

F((f, g), h) = f + g.

What is the initial type of this bifunctor?

2.39 Let F be a bifunctor, and let (α, T) be the corresponding initial type. Let G
and H be unary functors, and define LA = F(GA, HA). Prove that if φ : H ← L, then
([φ]) : H ← TG.

2.40 A monad is a functor H : A ← A, together with two natural transformations
η : H ← id and μ : H ← HH, such that

μ · Hη = id = μ · η   and   μ · μ = μ · Hμ.

Many initial types give rise to a monad, and the purpose of this exercise is to prove
that fact. Let F be a bifunctor given by

F(f, g) = f + Gg,

for some other functor G. Let (α, T) be the initial type of F. Define φ = α · inl and
ψ = ([id, α · inr]). Prove that (T, φ, ψ) is a monad, and work out what this means
for the special case where Gg = g × g.
Bibliographical remarks

The material presented in this chapter is well documented in the literature. There
is now a variety of textbooks on category theory that are aimed at the computing
science community, for instance (Asperti and Longo 1991; Barr and Wells 1990;
Pierce 1991; Rydeheard and Burstall 1988; Walters 1992a). The idea to use initiality
for reasoning about programs goes back at least to (Burstall and Landin 1969), and
was reinforced in (Goguen 1980). However, this work did not make use of F-algebras
and thus lacks the conciseness that gives the approach its charm. Nevertheless,
the advantages of algebra in program construction were amply demonstrated by
the CIP-L project, see e.g. (Bauer, Berghammer, Broy, Dosch, Geiselbrechtinger,
Gnatz, Hangel, Hesse, Krieg-Brückner, B., Laut, A., Matzner, T., Möller, B., Nickl,
F., Partsch, H., Pepper, P., Samelson, K., Wirsing, M., and Wössner, H. 1985;
Bauer, Ehler, Horsch, Möller, Partsch, Paukner, O., and Pepper, P. 1987; Partsch
1990).

The notion of F-algebras first appeared in the categorical literature during the 1960s,
for instance in (Lambek 1968). Long before the applications to program derivation
were realised, numerous authors, e.g. (Lehmann and Smyth 1981; Manes and Arbib
1986), pointed out the advantages of F-algebras in the area of program semantics.
Hagino used a generalisation of F-algebras in designing a categorical programming
language (Hagino 1987a, 1987b, 1989, 1993), and (Cockett and Fukushima 1991)
have similar goals.

It is (Malcolm 1990a, 1990b) who deserves credit for first making the program
derivation community aware of this work. The particular treatment of datatypes
given here is strongly influenced by the presentations of our colleagues in (Spivey
1989; Gibbons 1991, 1993; Fokkinga 1992a, 1992b, 1992c; Jeuring 1991, 1993;
Meertens 1992; Meijer 1992; Paterson 1988). In particular, Fokkinga's thesis
contains a much more thorough account of the foundations, and Jeuring presents some
spectacular applications. The paper by (Meijer, Fokkinga, and Paterson 1991) is
an introduction specially aimed at functional programmers.

One topic that we avoid in this book (except briefly in Section 5.6) is the categorical
treatment of datatypes that satisfy equational laws. An example of such a datatype
is, for instance, the datatype of finite bags. Our reason for not discussing such
datatypes is that we feel the benefits in later chapters are not quite justified by the
technical machinery required. The neatest categorical approach that we know of to
datatypes with laws is (Fokkinga 1996); see also (Manes 1975). There are of course
many data structures that are not easily expressed in terms of initial algebras, but
recently it has been suggested that even graphs fit the framework presented here,
provided laws are introduced (Gibbons 1995).

Another issue that we shall not address is that of mechanised reasoning. We are
hopeful, however, that the material presented here can be successfully employed in
a mechanised reasoning system: see, for instance, (Martin and Nipkow 1990).

Chapter 3

Applications

Let us now come down to earth and illustrate some of the abstract machinery we
have set up in the preceding chapter with a number of programming techniques
and examples. We also take the opportunity to discuss some features of functional
programming that have not been covered so far in a categorical setting. These
include the use of currying and conditionals.

3.1 Banana-split
Recall that the type of cons-lists over A is defined by

listr A ::= nil | cons (A, listr A).

The function sum returns the sum of a list of numbers and is defined by the
catamorphism

sum = ([zero, plus]),

where plus (a, b) = a + b. Similarly, the function length is defined by a catamorphism

length = ([zero, succ · outr]).

Given these two functions we can define the function average by

average = div · ⟨sum, length⟩,

where div (m, n) = m/n. Of course, applied to the empty list average returns 0/0
and we had better fix this problem if average is to be a total function. So let
div (0, 0) = 0.

Naive implementation of this definition of average yields a program that traverses
its argument list twice: once for the computation of sum, and once for the
computation of length. An obvious strategy to obtain a one-pass program is to express
⟨sum, length⟩ as a single catamorphism. This is in fact possible for any pair of
catamorphisms, irrespective of the details of this particular problem: we have

⟨([h]), ([k])⟩ = ([⟨h · F outl, k · F outr⟩]),

where F is the (so far unmentioned) base functor of the catamorphism. The above
identity is known among researchers in the field as the banana-split law (because
catamorphism brackets are like bananas, and because the pairing operator has also
been called 'split' in the literature). To prove the banana-split law, it suffices by
the universal property of catamorphisms to show that

⟨([h]), ([k])⟩ · α = ⟨h · F outl, k · F outr⟩ · F⟨([h]), ([k])⟩.

This equation can be verified as follows:

      ⟨([h]), ([k])⟩ · α
   =    {split fusion}
      ⟨([h]) · α, ([k]) · α⟩
   =    {catamorphisms}
      ⟨h · F([h]), k · F([k])⟩
   =    {split cancellation (backwards)}
      ⟨h · F(outl · ⟨([h]), ([k])⟩), k · F(outr · ⟨([h]), ([k])⟩)⟩
   =    {F functor}
      ⟨h · F outl · F⟨([h]), ([k])⟩, k · F outr · F⟨([h]), ([k])⟩⟩
   =    {split fusion (backwards)}
      ⟨h · F outl, k · F outr⟩ · F⟨([h]), ([k])⟩.

Applying the banana-split law to the particular problem of writing ⟨sum, length⟩ as
a catamorphism, we find that

⟨sum, length⟩ = ([zeros, pluss])

where zeros = ⟨zero, zero⟩, and pluss (a, (b, n)) = (a + b, n + 1). The banana-split
law is a perfect example of the power of the categorical approach: a simple technique
of program optimisation involving the merging of two loops is generalised to
structural recursion over arbitrary datatypes and proved with a short and convincing
argument.
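The one-pass program that results can be written down directly. This Haskell sketch works on built-in lists of Integer and uses integer division for div, both of which are assumptions of ours rather than choices made in the text; the names sumLength and divTotal are hypothetical.

```haskell
-- The pair of sum and length computed in a single traversal, using the
-- algebra obtained from the banana-split law.
sumLength :: [Integer] -> (Integer, Integer)
sumLength = foldr pluss zeros
  where
    zeros          = (0, 0)
    pluss a (b, n) = (a + b, n + 1)

-- average divides the sum by the length, with the convention div (0, 0) = 0.
-- Integer division is an assumption of this sketch.
average :: [Integer] -> Integer
average = divTotal . sumLength
  where
    divTotal (0, 0) = 0
    divTotal (m, n) = m `div` n
```

A zero length can only arise from the empty list, whose sum is also zero, so the second clause of divTotal never divides by zero.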

Exercises

3.1 Let FX = 1 + (N × X). Show that

⟨[zero, plus] · F outl, [zero, succ · outr] · F outr⟩ = [zeros, pluss].

3.2 Let F : C ← C, where C is a category that has products. Define
φ = ⟨F outl, F outr⟩. Between what functors is φ a natural transformation? Prove that
the naturality condition is indeed satisfied.

3.3 A list of numbers is called steep if each element is greater than the sum of the
elements that follow it:

steep nil = true
steep (cons (a, x)) = a > sum x ∧ steep x.

A naive implementation takes quadratic time. Give a linear-time program.

3.4 The pattern in the preceding exercise can be generalised as follows. Suppose
that h : B ← FB, and

          α
    T <------- FT
    |          |
  f |          | F⟨f, ([h])⟩
    v          v
    A <------- F(A × B)
          g

commutes. Construct k such that f = outl · ([k]) and prove that your construction
works.

3.5 Consider the datatype of trees:

tree A ::= null | node (tree A, A, tree A).

A tree is balanced if at each node we have

1/3 < n/(n + m + 1) < 2/3,

where n and m are the sizes of the left and right subtree respectively.

Apply the preceding exercise to obtain an efficient program for testing whether a
tree is balanced.

3.6 The function preds : list Nat ← Nat takes a natural number n and returns the
list [n, n − 1, ..., 1]. Apply Exercise 3.4 to write preds in terms of a catamorphism.

3.7 The factorial function can be defined as

fact = product · preds,

where product returns the product of a list of numbers. Use the preceding exercise
and fusion to obtain a more efficient solution.

3.8 Prove Fokkinga's mutual recursion theorem:

      f · α = h · F⟨f, g⟩  ∧  g · α = k · F⟨f, g⟩
   ≡
      ⟨f, g⟩ = ([⟨h, k⟩]).

It may be helpful to start by drawing a diagram of the types involved. Show that the
banana-split law and Exercise 3.4 are special cases of the mutual recursion theorem.

3.2 Ruby triangles and Horner's rule

The initial type of cons-lists is the basis of the circuit design language Ruby (Jones
and Sheeran 1990), which is in many ways similar to the calculus used in this book.
Ruby does, however, have a number of additional primitives. One of these primitives
is called triangle. For any function f : A ← A, the function tri f : listr A ← listr A
is defined informally by

tri f [a0, a1, ..., ai, ..., an] = [a0, f a1, ..., f^i ai, ..., f^n an].

In Ruby the single most important result for reasoning about triangles is the
following one. For all f and c,

([c, g]) · tri f = ([c, g · (id × f)])  ⇐  f · c = c and f · g = g · (f × f).

In Ruby, this fact is called Horner's rule, because it generalises the well-known
method for evaluating polynomials. If we take c = 0, g (a, b) = a + b, and f a = a × x,
then the above equation states that because

0 × x = 0
(a + b) × x = a × x + b × x,

we have

a0 + a1 × x + a2 × x^2 + ... + an × x^n
   = a0 + (a1 + (a2 + ... (an + 0) × x ...) × x) × x.

In Ruby, Horner's rule is stated only for the type of lists built up from nil and cons.
The purpose of this section is to generalise Horner's rule to arbitrary initial types,
and then to illustrate it with a small, familiar programming problem.

First, let us define tri f : listr A ← listr A formally: we have

tri f = ([nil, cons · (id × listr f)]).

The base functor of cons-lists is F(A, B) = 1 + A × B, and the initial algebra
α = [nil, cons], so we can write the above in the form

tri f = ([α · F(id, listr f)]).

This immediately gives the abstract definition: let F be a bifunctor with initial type
(α, T); then

tri f = ([α · F(id, Tf)]).

For the definition to make sense we require f to be of type A ← A for some A, in
which case tri f : TA ← TA. We aim to generalise Horner's rule by finding conditions
such that

([g]) · tri f = ([g · F(id, f)]).
The type information is illustrated in the following diagram:

[Diagram: the arrows involved are g : A ← F(A, A), tri f : TA ← TA, ([g]) : A ← TA
and f : A ← A.]

By fusion it suffices to find conditions such that

([g]) · α · F(id, Tf) = g · F(id, f) · F(id, ([g])).

We calculate:

      ([g]) · α · F(id, Tf)
   =    {catamorphisms}
      g · F(id, ([g])) · F(id, Tf)
   =    {F bifunctor}
      g · F(id, ([g]) · Tf)
   =    {type functor fusion (2.14)}
      g · F(id, ([g · F(f, id)]))
   =    {claim: see below}
      g · F(id, f · ([g]))
   =    {F bifunctor}
      g · F(id, f) · F(id, ([g])).

The claim is that ([g · F(f, id)]) = f · ([g]). Appealing to fusion a second time, this
equation holds if

f · g = g · F(f, id) · F(id, f).

Since the right-hand side equals g · F(f, f), we have shown that

([g]) · tri f = ([g · F(id, f)])  ⇐  f · g = g · F(f, f).

For the special case of lists this is precisely the statement of Horner's rule in Ruby.
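For the list case, the rule can be tried out on polynomial evaluation. The Haskell sketch below uses built-in lists and Integer coefficients, which is our choice for illustration; tri follows the formal definition as a fold, while the names evalNaive and evalHorner are hypothetical.

```haskell
-- tri f applies f zero times to the first element, once to the second,
-- twice to the third, and so on.
tri :: (a -> a) -> [a] -> [a]
tri f = foldr (\a ys -> a : map f ys) []

-- Specification: scale the i-th coefficient by x^i using tri, then sum.
evalNaive :: Integer -> [Integer] -> Integer
evalNaive x = foldr (+) 0 . tri (* x)

-- After applying Horner's rule: a single fold in which the accumulated
-- value is multiplied by x before the next coefficient is added.
evalHorner :: Integer -> [Integer] -> Integer
evalHorner x = foldr (\a b -> a + b * x) 0
```

With x = 2 and coefficients [1, 2, 3], tri (* 2) gives [1, 4, 12], and both evaluators return 17.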

Depth of a tree

Now consider the problem of computing the depth of a binary tree. We define such
trees with the declaration

tree A ::= tip A | bin (tree A, tree A).

The base functor F of this definition is F(A, B) = A + B × B, and the initial type of
F is ([tip, bin], tree). We have that f = ([g, h]) is the unique solution of the equations

f (tip a) = g a
f (bin (x, y)) = h (f x, f y),

so ([g, h]) is the generic fold operator for binary trees. In particular, the map operator
tree f for binary trees is defined by

tree f = ([[tip, bin] · F(f, id)]).

At the point level, this equation translates into two equations

tree f (tip a) = tip (f a)
tree f (bin (x, y)) = bin (tree f x, tree f y).

The function max : N ← tree N returns the maximum of a tree of numbers:

max = ([id, bmax]),

where bmax (a, b) returns the maximum of a and b. The function
depths : tree N ← tree A takes a tree and replaces every tip by its depth in the tree:

depths = tri succ · tree zero,

where zero is the constant function returning 0, and succ is the successor function.
Finally, we specify the depth of a tree by

depth = max · depths.

A direct implementation of depth will require time that is quadratic in the
number of tips. For an unbalanced tree of n tips with a single tip at every positive
depth, the computation of depths requires evaluation of succ^i for 1 ≤ i ≤ n and
this takes O(n^2) steps. We aim to improve the efficiency by applying the generalised
statement of Horner's rule to the term max · tri succ. The proviso of Horner's rule
in this case is that

succ · [id, bmax] = [id, bmax] · (succ + succ × succ).

Since succ · id = id · succ we require

succ · bmax = bmax · (succ × succ),

but this is equivalent to the fact that succ is monotonic. Therefore, we obtain

      depth
   =    {definitions}
      max · tri succ · tree zero
   =    {Horner's rule}
      ([[id, bmax] · (id + succ × succ)]) · tree zero
   =    {coproducts}
      ([id, bmax · (succ × succ)]) · tree zero
   =    {since bmax · (succ × succ) = succ · bmax}
      ([id, succ · bmax]) · tree zero
   =    {type functor fusion}
      ([zero, succ · bmax]).

This is the obvious linear-time program for computing the maximum depth of a
tree.
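The specification and the derived program can be compared side by side. In this Haskell sketch the names Tree, foldTree, treeMap, triT, depthSpec and depthFast are our own; depths are represented by Int, which is an assumption made for convenience.

```haskell
-- Binary trees with values at the tips; bin takes a pair as in the text.
data Tree a = Tip a | Bin (Tree a, Tree a)

-- The generic fold for these trees: replace tip by g and bin by h.
foldTree :: (a -> b, (b, b) -> b) -> Tree a -> b
foldTree (g, _) (Tip a)      = g a
foldTree (g, h) (Bin (x, y)) = h (foldTree (g, h) x, foldTree (g, h) y)

-- The type functor (map) on trees.
treeMap :: (a -> b) -> Tree a -> Tree b
treeMap f = foldTree (Tip . f, Bin)

-- tri for trees: at each bin node, map f over both processed subtrees,
-- so a tip at depth i ends up with f applied i times.
triT :: (a -> a) -> Tree a -> Tree a
triT f = foldTree (Tip, \(x, y) -> Bin (treeMap f x, treeMap f y))

-- The specification: label each tip with its depth, then take the maximum.
depthSpec :: Tree a -> Int
depthSpec = foldTree (id, uncurry max) . triT (+ 1) . treeMap (const 0)

-- The derived linear-time program.
depthFast :: Tree a -> Int
depthFast = foldTree (const 0, \(m, n) -> 1 + max m n)
```

On the tree Bin (Tip 'a', Bin (Tip 'b', Tip 'c')) both functions return 2.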

The moral of this example is that categorical proofs of familiar laws about lists
(such as Horner's rule) are free of the syntactic clutter that a specialised proof
would require. Furthermore, the categorically formulated law sometimes applies to
programming examples that have nothing to do with lists.

Exercises

3.9 The function slice :: list (listr+ A) ← list (listr+ A) is given informally by

slice [x0, x1, ..., x_{n−1}] = [drop 0 x0, drop 1 x1, ..., drop (n − 1) x_{n−1}],

where drop n x drops the first n elements from the list x. Define the function slice
in terms of tri.

3.10 The binary hyperproduct of a sequence of numbers [a0, a1, ..., a_{n−1}] is given
by ∏_{i=0}^{n−1} a_i^(2^i). Using Horner's rule, derive an efficient program for
computing binary hyperproducts.

3.11 Horner's rule can be generalised as follows. If h · g = g · F(f, h), then

([g]) · tri f = ([g · F(id, h)]).

Draw a diagram of the types involved and prove the new rule.

3.12 Show that, when the new rule of the preceding exercise is applied to
polynomial evaluation, there is only one possible choice for h.

3.13 Specify the problem of computing Σ_{i=0}^{n−1} i × a_i in terms of tri. Horner's rule is
not immediately applicable, but it is if you consider computing
(Σ_{i=0}^{n−1} i × a_i, Σ_{i=0}^{n−1} a_i) instead. Work out the details of this application.

3.14 Consider binary trees of type

tree A ::= tip A | node (tree A, tree A).

The weighted path length of a tree of numbers is obtained by multiplying each tip
by its depth, and then summing the tips. Define a function wpl : Nat ← tree Nat
that returns the weighted path length of a tree, using tri. Using Horner's rule,
improve the efficiency of the definition.

3.3 The TeX problem - part one

The TeX problem (Knuth 1990; Gries 1990a) is to do with converting between
binary and decimal numbers in Knuth's text processing system TeX (used to produce
this book). TeX uses integer arithmetic, with all fractions expressed as integer
multiples of 2^−16. Since the input language of TeX documents is decimal, there is
the problem of converting between decimal fractions and their nearest representable
binary equivalents.

Here, we are interested only in the decimal to binary problem; the converse problem,
which is more difficult, will be dealt with in Chapter 10. Let x denote the decimal
fraction 0.d1 d2 ... dk and let

val(x) = Σ_{j=1}^{k} d_j / 10^j   (3.1)

be the corresponding real number. The problem is to find the integer multiple
of 2^−16 nearest to val(x), that is, to round 2^16 val(x) to the nearest integer. If
two integers are equally near this quantity, we will take the larger; so we want
n = ⌊2^16 val(x) + 1/2⌋. The value n will lie in the range 0 ≤ n ≤ 2^16.

So far, so good. But it is required to use integer arithmetic only in the
calculation and to keep intermediate results reasonably small, so there is a programming
problem to get round.

To formulate (3.1) in programming terms we will need the datatype

Decimal ::= nil | cons (Digit, Decimal).

The function val : Unit ← Decimal, where Unit denotes the set of real numbers r
in the unit interval 0 ≤ r < 1, is then given by the catamorphism

val = ([zero, shift])
shift (d, r) = (d + r)/10.

For example, with x = [d1, d2, d3] we obtain that val x is the number

(d1 + (d2 + (d3 + 0)/10)/10)/10 = d1/10 + d2/100 + d3/1000.

Writing [0, 2^16] for the set of integers n in the range 0 ≤ n ≤ 2^16, our problem is to
compute intern : [0, 2^16] ← Decimal, where

intern = round · val
round r = ⌊(2^17 r + 1)/2⌋,

under the restriction that only integer arithmetic is allowed.

For completeness, we specify the converse problem, which is to compute a function
extern : Decimal ← [0, 2^16), where [0, 2^16) denotes the set of integers n in the
range 0 ≤ n < 2^16. The function extern is defined by the condition that for all
arguments n the value of extern n should be a shortest decimal fraction satisfying
intern (extern n) = n. We cannot yet formalise this specification, let alone solve
the problem, since the definition does not identify a unique decimal fraction, and
so extern cannot be described solely within a functional framework. On the other
hand, extern can be specified using relations, a point that motivates the remainder
of the book.

Let us return to the problem of computing intern. Given its definition, it is tempting
to try and use the fusion law for catamorphisms, promoting the computation of
round into the catamorphism. However, this idea does not quite work. To solve the
problem, we need to make use of the following 'rule of floors': for integers a and b,
with b > 0, and real r we have

⌊(a + r)/b⌋ = ⌊(a + ⌊r⌋)/b⌋.

Applied to the function round, the rule of floors gives that

round = halve · convert
halve n = (n + 1) div 2
convert r = ⌊2^17 r⌋.

This division of round into two components turns out to be necessary because, as
we shall see, we can apply fusion with convert but not with halve.

To see if we can apply fusion with convert, we calculate:

      (convert · shift) (d, r)
   =    {definitions of convert and shift}
      ⌊2^17 (d + r)/10⌋
   =    {rule of floors, since 2^17 d is an integer}
      ⌊(2^17 d + ⌊2^17 r⌋)/10⌋
   =    {definition of convert}
      ⌊(2^17 d + convert (r))/10⌋
   =    {introducing cshift; see below}
      cshift (d, convert (r)),

where cshift (d, n) = (2^17 d + n) div 10. Since we also have convert (0) = 0, we now
obtain

convert · [zero, shift] = [zero, cshift] · (id + (id × convert)),

and hence, by fusion, intern = halve · ([zero, cshift]). This concludes the derivation.
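The derived program translates directly into code, and can be compared against the specification on small inputs. The Haskell sketch below assumes that a Decimal is represented as a built-in list of digits and that exact rational arithmetic may stand in for the reals in the specification; the name internSpec is ours.

```haskell
type Digit = Integer

-- Specification side, using exact rationals in place of real numbers.
val :: [Digit] -> Rational
val = foldr (\d r -> (fromInteger d + r) / 10) 0

internSpec :: [Digit] -> Integer
internSpec ds = floor ((2 ^ 17 * val ds + 1) / 2)

-- Derived program, using integer arithmetic only.
cshift :: (Digit, Integer) -> Integer
cshift (d, n) = (2 ^ 17 * d + n) `div` 10

halve :: Integer -> Integer
halve n = (n + 1) `div` 2

intern :: [Digit] -> Integer
intern = halve . foldr (curry cshift) 0
```

For example, intern [5] is 32768 and intern [2, 5] is 16384, that is, 2^16 times 0.5 and 0.25 respectively. Every intermediate value in intern stays below 2^17 times 10.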

Two further remarks are in order. The first is a small calculation to show that the
expression halve · ([zero, cshift]) cannot be optimised by a second appeal to fusion.
We have

      (halve · cshift) (d, n)
   =    {definitions of halve and cshift}
      ⌊(⌊(2^17 d + n)/10⌋ + 1)/2⌋
   =    {arithmetic}
      ⌊(2^17 d + n + 10)/20⌋

Now, in order to appeal to fusion, we have to write this last expression in the form
f (d, halve n) for some function f. Since halve (2k) = halve (2k − 1) for all k > 0, we
therefore require that

f (d, halve (2k)) = f (d, halve (2k − 1)).

In other words, we need

⌊(2^17 d + 2k + 10)/20⌋ = ⌊(2^17 d + 2k + 9)/20⌋

for all k > 0. But, taking d = 0 and k = 5, this gives 1 = 0, so no function f can
exist and the attempt to use fusion a second time fails.
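The counterexample with d = 0 and k = 5 can be replayed mechanically. In the Haskell sketch below, cshift and halve are as defined in the text, while the names sameAfterHalve and differAfterCshift are ours.

```haskell
cshift :: (Integer, Integer) -> Integer
cshift (d, n) = (2 ^ 17 * d + n) `div` 10

halve :: Integer -> Integer
halve n = (n + 1) `div` 2

-- The inputs 10 and 9 (that is, 2k and 2k - 1 with k = 5) are
-- indistinguishable after halve ...
sameAfterHalve :: Bool
sameAfterHalve = halve 10 == halve 9

-- ... yet halve after cshift tells them apart when d = 0, so the result
-- cannot be a function of d and halve n alone.
differAfterCshift :: Bool
differAfterCshift = halve (cshift (0, 10)) /= halve (cshift (0, 9))
```

Both values are True: halve maps 10 and 9 to 5, whereas halve (cshift (0, 10)) is 1 and halve (cshift (0, 9)) is 0.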

The second remark concerns the fact that nowhere above have we exploited any
property of 2^17 except that it was a non-negative integer. For the particular value
2^17, the algorithm can be optimised: except for the first 17, all digits of the given
decimal can be discarded since they do not affect the answer. A proof of this fact
can be found in (Knuth 1990).

Exercises

3.15 Taking Decimal = listr Digit (why is it valid to do so?), the function val could
be specified

      val = sum · tri (/10) · listr (/10).

Derive the catamorphism in the text.

3.16 Supposing we take 2^2 rather than 2^16, characterise those decimals whose
intern values are n, for 0 ≤ n ≤ 4.

3.17 Show that intern = intern · take 17.

3.18 The rule of indirect equality states that two integers m and n are equal iff

      k ≤ m ≡ k ≤ n,

for all k.

Prove the rule of indirect equality. Can you generalise the rule to arbitrary ordered
sets?

3.19 The floor of a real number x is defined by the property that, for all integers n,

      n ≤ x ≡ n ≤ ⌊x⌋.

Prove the rule of floors using this definition and the rule of indirect equality.

3.20 Show that the rule of floors is not valid when a or b is not an integer.

3.21 Show that if f : A ← B is injective, then for any binary operator (⊕) : B ← C × B
there exists a binary operator (⊗) : A ← C × A such that

      f (c ⊕ b) = c ⊗ f b.

(Cf. Exercise 2.34.)


3.22 Let f : A ← B and (⊕) : B ← C × B. To prove that there exists no binary
operator (⊗) : A ← C × A such that

      f (c ⊕ b) = c ⊗ f b,

it suffices to find c, b0 and b1 such that

      f b0 = f b1 and f (c ⊕ b0) ≠ f (c ⊕ b1).

Apply this strategy to prove that fusion does not apply to round · val.

3.4 Conditions and conditionals

We have already shown how many features of current functional programming
languages can be expressed and characterised in a purely categorical setting. But there
are two important omissions: definition by cases and currying. Currying will be
dealt with in the following section; here we are concerned with how to characterise
definition by cases.

The coproduct construction permits a restricted kind of definition by cases,
essentially definition by pattern-matching. As we have seen, this is sufficient for the
description of many functions. However, programmers also make use of
conditionals; for example, the function filter p is defined using a mixture of pattern-matching
and case analysis:

      filter p []            = []
      filter p (cons (a, x)) = cons (a, filter p x),  if p a
                             = filter p x,            otherwise.

Given the McCarthy conditional form (p → f, g) for writing conditionals, we can
express filter p as a catamorphism on cons-lists:

      filter p = ([nil, test p])
      test p   = (p · outl → cons, outr).

The question thus arises: how can we express and characterise the conditional form
(p → f, g) in a categorical setting?
In functional programming the datatype Bool is declared by

      Bool ::= true | false.

Thus, Bool = 1 + 1, with injection functions inl = true and inr = false. Using this
datatype, we can define the function not : Bool ← Bool by

      not = [false, true].

The negation of a condition p : Bool ← A can now be defined as not · p. Although
this is straightforward enough, the construction of binary operators such as and and
or is a little more problematic. As we shall see, we need the assumption that the
underlying category is distributive. In a distributive category one can also construct
conditionals.

Distributive categories

In any category with products and coproducts there is a natural transformation

      undistr : A × (B + C) ← (A × B) + (A × C)

defined by undistr = [id × inl, id × inr] (undistr is short for 'un-distribute-right').
Thus,

      (f × (g + h)) · undistr = undistr · ((f × g) + (f × h))

for all f, g and h of the appropriate types. In a distributive category undistr is, by
assumption, a natural isomorphism. This means that there is an arrow

      distr : (A × B) + (A × C) ← A × (B + C)

such that distr · undistr = id and undistr · distr = id.

There is a second requirement on a distributive category. In any category with
products and initial objects, there is a (unique) arrow

      unnull : A × 0 ← 0

for each A. In a distributive category unnull, like undistr, is assumed to be an
isomorphism. Thus, there is an arrow null : 0 ← A × 0 such that null · unnull = id
and unnull · null = id.

In other words, in a distributive category we have the natural isomorphisms

      A × (B + C) ≅ (A × B) + (A × C)
      A × 0 ≅ 0,

as well as the natural isomorphisms

      A × (B × C) ≅ (A × B) × C        A + (B + C) ≅ (A + B) + C
      A × B ≅ B × A                    A + B ≅ B + A
      A × 1 ≅ A                        A + 0 ≅ A,

described in Exercise 2.26. Below we shall sometimes omit brackets in products and
coproducts that consist of more than two components.

One consequence of a category being distributive is that there are non-trivial arrows
whose source is a product, the trivial arrows being the identity arrow and the
projections. In particular, there is an isomorphism

      quad : 1 + 1 + 1 + 1 ← Bool².

We will leave the proof as an exercise. It follows that we can define arrows of type
Bool ← Bool² in terms of arrows of type Bool ← 1 + 1 + 1 + 1. For example, we can
define

      and = [true, false, false, false] · quad

and the conjunction of p, q : Bool ← A by and · ⟨p, q⟩. Other boolean connectives
can also be defined by such 'truth tables'.
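
A small sketch of the truth-table construction, under stated assumptions: the coproduct 1 + 1 + 1 + 1 is modelled by a tag in 0..3 with an empty payload, Bool by Python booleans, and the particular ordering chosen for quad is an assumption (any bijection would do). The helper names are hypothetical.

```python
def case(*branches):
    """Coproduct case [f0, f1, ...] on a tagged value (tag, payload)."""
    return lambda tagged: branches[tagged[0]](tagged[1])

true = lambda _unit: True     # true  : Bool <- 1
false = lambda _unit: False   # false : Bool <- 1

def quad(pair):
    # One possible isomorphism 1 + 1 + 1 + 1 <- Bool x Bool:
    # (T, T) -> 0, (T, F) -> 1, (F, T) -> 2, (F, F) -> 3.
    b1, b2 = pair
    tag = (0 if b1 else 2) + (0 if b2 else 1)
    return (tag, ())

def compose(f, g):
    return lambda x: f(g(x))

def split(p, q):
    """Pairing <p, q>."""
    return lambda a: (p(a), q(a))

# and = [true, false, false, false] . quad
and_ = compose(case(true, false, false, false), quad)
# another connective defined by its truth table
or_ = compose(case(true, true, true, false), quad)

positive = lambda n: n > 0
even = lambda n: n % 2 == 0
positive_and_even = compose(and_, split(positive, even))   # and . <p, q>
```

Each connective is determined by the four entries of its table, read in the order fixed by quad.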

A distributive category also gives us the means to construct, given a function
p : Bool ← A and two functions f, g : B ← A, a conditional function (p → f, g) : B ← A.
The idea is to associate with each condition p : Bool ← A an arrow p? : A + A ← A,
for then we can define

      (p → f, g) = [f, g] · p?.                                        (3.2)

The arrow p? is defined by

      p? = (unit + unit) · distr · ⟨id, p⟩.

The types are shown by the following diagram:

      [Diagram: p? : A + A ← A factors as ⟨id, p⟩ : A × Bool ← A, followed by
      distr : (A × 1) + (A × 1) ← A × Bool, followed by
      unit + unit : A + A ← (A × 1) + (A × 1).]

The association of conditions p with arrows p? is injective (see Exercise 3.25). Using
definition (3.2), let us now show that the following three properties of conditionals
hold:

      h · (p → f, g) = (p → h · f, h · g)                              (3.3)
      (p → f, g) · h = (p · h → f · h, g · h)                          (3.4)
      (p → f, f)     = f.                                              (3.5)

Equation (3.3) is immediate from (3.2) using the distributivity property of
coproducts. For (3.4) it is sufficient to show

      (h + h) · (p · h)? = p? · h.

The proof is:

      (h + h) · (p · h)?
   =    {definition}
      (h + h) · (unit + unit) · distr · ⟨id, p · h⟩
   =    {naturality of distr and unit}
      (unit + unit) · distr · (h × id) · ⟨id, p · h⟩
   =    {products}
      (unit + unit) · distr · ⟨id, p⟩ · h
   =    {definition}
      p? · h.

For (3.5) it is sufficient to show that [f, f] · p? = f. We argue:

      [f, f] · p?
   =    {definition}
      [f, f] · (unit + unit) · distr · ⟨id, p⟩
   =    {coproducts; naturality of unit}
      unit · [f × id, f × id] · distr · ⟨id, p⟩
   =    {claim: see below}
      unit · (f × id) · ⟨id, p⟩
   =    {products}
      unit · ⟨f, p⟩
   =    {since unit = outl}
      f.

The claim is an instance of the fact that

      f × [g, h] = [f × g, f × h] · distr.

The proof, which we leave as a short exercise, uses the definition of undistr and the
fact that undistr · distr = id.
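
The construction of p? and the three laws can be spot-checked in the category of sets. The sketch below is not a proof: it assumes Bool is modelled by Python booleans (True playing the role of inl), coproducts by tagged pairs, the terminal object by the empty tuple, and unit by outl; all helper names are hypothetical.

```python
def compose(*fs):
    """Right-to-left composition f1 . f2 . ... . fn."""
    def run(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return run

inl = lambda a: ("inl", a)
inr = lambda a: ("inr", a)

def case(f, g):
    """Coproduct case [f, g]."""
    return lambda t: f(t[1]) if t[0] == "inl" else g(t[1])

def plus(f, g):
    """Coproduct functor f + g."""
    return case(compose(inl, f), compose(inr, g))

def split(f, g):
    """Pairing <f, g>."""
    return lambda a: (f(a), g(a))

identity = lambda a: a
unit = lambda pair: pair[0]              # unit : A <- A x 1, i.e. outl

def distr(pair):
    # distr : (A x 1) + (A x 1) <- A x Bool
    a, b = pair
    return inl((a, ())) if b else inr((a, ()))

def guard(p):
    # p? = (unit + unit) . distr . <id, p>
    return compose(plus(unit, unit), distr, split(identity, p))

def cond(p, f, g):
    # (p -> f, g) = [f, g] . p?                                   (3.2)
    return compose(case(f, g), guard(p))

p = lambda n: n > 2
f = lambda n: n + 10
g = lambda n: n * 2
h = lambda n: n - 1
samples = range(-3, 7)

law_33 = all(h(cond(p, f, g)(x)) == cond(p, compose(h, f), compose(h, g))(x)
             for x in samples)
law_34 = all(cond(p, f, g)(h(x))
             == cond(compose(p, h), compose(f, h), compose(g, h))(x)
             for x in samples)
law_35 = all(cond(p, f, f)(x) == f(x) for x in samples)
```

The function guard plays the role of p?; the three booleans record whether (3.3), (3.4) and (3.5) hold on the sample inputs.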

Exercises

3.23 Show that if

      [Diagram: a triangle on the objects A, A × 0 and 0, with edges labelled outl and outr.]

commutes, then there must exist an arrow unnull such that unnull · null = id and
null · unnull = id.

3.24 Is Rel a distributive category?

3.25 Prove that (_)? is injective with inverse (_)¿ defined by

      t¿ = (! + !) · t.

Hint: first show that (! + !) · distr = (! + !) · outr.

3.26 Prove that

      (unit + unit) · distr : F ← G,

where FA = A + A and GA = A × Bool.

3.27 Suppose in a distributive category that there is an arrow h : 0 ← A. Show
that h is an isomorphism and hence that A is also an initial object.

3.28 Prove that f × [g, h] = [f × g, f × h] · distr.

3.29 Show that Bool² ≅ 1 + 1 + 1 + 1.

3.30 Prove that filter p · listr f = listr f · filter (p · f) using the following definition
of filter:

      filter p = concat · listr (p → wrap, nil),

where wrap a = [a] and nil : listr A ← A is a constant returning the empty list.

3.5 Concatenation and currying

Consider once more the type listr A of cons-lists over A. In functional programming
the function cat : listr A ← listr A × listr A is written as an infix operator ++ and
defined by the equations

      [] ++ y          = y
      cons (a, x) ++ y = cons (a, x ++ y).

In terms of our categorical combinators these equations become

      cat · (nil × id)  = outr
      cat · (cons × id) = cons · (id × cat) · assocr,

where assocr is the natural isomorphism assocr : A × (B × C) ← (A × B) × C
described in Exercise 2.26. We can combine the two equations for cat into one:

      cat · ([nil, cons] × id) = [outr, cons] · (id + id × cat) · φ,   (3.6)

where φ : (1 × C) + A × (B × C) ← (1 + A × B) × C is given by

      φ = (id + assocr) · distl

and distl is the natural isomorphism (A × C) + (B × C) ← (A + B) × C whose
companion distr was described in the preceding section.

But how do we know that equation (3.6) defines cat uniquely? The function cat
is not a catamorphism, in spite of its name, because it has two arguments, so we
cannot appeal to the unique solution property of catamorphisms.

The answer is to consider a variant ccat of cat in which the arguments are curried.
Suppose we define ccat : (listr A ← listr A) ← listr A by ccat x y = x ++ y. Then we
have

      ccat []            = id
      ccat (cons (a, x)) = ccons a · ccat x,

where we take ccons : (listr A ← listr A) ← A to be a curried version of cons. This
version of cat is a catamorphism, for we have

      ccat = ([const id, compose · (ccons × id)]),

where const f is a constant returning f and compose (f, g) = f · g. Just to check
this, we expand the catamorphism to two equations:

      ccat · nil  = const id
      ccat · cons = compose · (ccons × ccat).

Applying the first equation to the element of the terminal object, and the second
to (a, x), we obtain the pointwise versions

      ccat nil           = id
      ccat (cons (a, x)) = compose (ccons a, ccat x),

which is what we had before. The conclusion is that since the curried version of cat
is uniquely defined by this translation, the original version is, too.
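
The curried catamorphism can be written out directly. The following sketch models cons-lists by Python lists and the terminal object by the empty tuple; cata_list and the other helper names are hypothetical and chosen to mirror the equations above.

```python
def cata_list(nil, cons):
    """Catamorphism ([nil, cons]) on cons-lists, modelled on Python lists."""
    def run(xs):
        result = nil(())                 # nil is applied to the element of 1
        for a in reversed(xs):           # fold from the right
            result = cons((a, result))
        return result
    return run

identity = lambda y: y
const = lambda f: (lambda _unit: f)                       # const f returns f
compose = lambda pair: (lambda y: pair[0](pair[1](y)))    # compose (f, g) = f . g
ccons = lambda a: (lambda x: [a] + x)                     # curried cons

# ccat = ([const id, compose . (ccons x id)])
ccat = cata_list(const(identity),
                 lambda pair: compose((ccons(pair[0]), pair[1])))

def cat(pair):
    # The uncurried concatenation recovered from ccat.
    x, y = pair
    return ccat(x)(y)
```

Note that ccat x builds a function by composing one ccons a per element of x; applying that function to y performs the actual concatenation.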

All this leads to a more general problem: consider a functor F with initial type
(α, T), another functor G, and a transformation φ_{A,B} : G(A × B) ← FA × B. What
conditions on φ guarantee that the recursion

      f · (α × id) = h · Gf · φ                                        (3.7)

defines a unique function f for each choice of h?

In a diagram we have

      [Diagram: a square with α × id : T × B ← FT × B and φ : G(T × B) ← FT × B
      along the top, f and Gf down the sides, and h along the bottom.]

To solve the problem we use the same idea as before and curry the function f. To
do this we need the idea of a function space object A ← B, more usually written in
the form A^B. Function space objects are called exponentials.

Exponentials

Let C be a category with terminal object and products. An exponential of two
objects A and B is an object A^B and an arrow apply : A ← A^B × B such that for
each f : A ← C × B there is a unique arrow curry f : A^B ← C such that

      apply · (curry f × id) = f.

In other words, we have the universal property

      g = curry f ≡ apply · (g × id) = f.

For fixed A and B, this definition can be regarded as defining a terminal object in
the category Exp, constructed as follows. The objects of Exp are arrows A ← C × B
in C. An arrow h ← k of Exp is an arrow f : C × B ← D × B of C just when the
following diagram commutes:

      [Diagram: a triangle whose top edge is the arrow C × B ← D × B.]

The terminal object of Exp is apply : A ← A^B × B, and !f is given by curry f. The
reflection law of terminal objects translates in this case to

      curry apply = id,

and the fusion law reads

      curry f · g = curry (f · (g × id)).
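
In the category of sets the exponential A^B is the set of functions from B to A, and the laws can be tried out pointwise. A minimal sketch, assuming pairs are Python tuples; the sample functions add and double are arbitrary illustrations.

```python
def curry(f):
    """curry f : A^B <- C, for f : A <- C x B."""
    return lambda c: (lambda b: f((c, b)))

def apply(pair):
    """apply : A <- A^B x B."""
    g, b = pair
    return g(b)

def cross(f, g):
    """Product functor f x g."""
    return lambda pair: (f(pair[0]), g(pair[1]))

def compose(f, g):
    return lambda x: f(g(x))

identity = lambda x: x
add = lambda pair: pair[0] + pair[1]
double = lambda n: 2 * n
points = [(c, b) for c in range(-2, 3) for b in range(-2, 3)]

# Cancellation: apply . (curry f x id) = f
cancellation = all(compose(apply, cross(curry(add), identity))(pt) == add(pt)
                   for pt in points)

# Fusion: curry f . g = curry (f . (g x id)), compared pointwise
fusion = all(compose(curry(add), double)(c)(b)
             == curry(compose(add, cross(double, identity)))(c)(b)
             for (c, b) in points)

# Reflection: curry apply = id, compared on a sample function
reflection = all(curry(apply)(double)(b) == double(b) for b in range(5))
```

Functions cannot be compared for equality directly, so each law is checked by applying both sides to sample arguments.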
If a category has finite products, and for every pair of objects A and B the
exponential A^B exists, the category is said to be cartesian closed. In what follows, we
assume that we are working in a cartesian closed category.



Returning to the problem of solving equation (3.7), the fusion law gives us that

      f · (α × id) = h · Gf · φ ≡ curry f · α = curry (h · Gf · φ).

Our aim now is to find a k so that

      curry (h · Gf · φ) = k · F(curry f),

in which case we obtain curry f = ([k]). We reason:

      curry (h · Gf · φ)
   =    {curry cancellation}
      curry (h · G (apply · (curry f × id)) · φ)
   =    {functor}
      curry (h · G apply · G (curry f × id) · φ)
   =    {assumption: φ natural}
      curry (h · G apply · φ · (F(curry f) × id))
   =    {curry fusion law (backwards)}
      curry (h · G apply · φ) · F(curry f).

Hence we can take k = curry (h · G apply · φ). The only assumption in the argument
above was that φ is natural in the following sense:

      G(h × id) · φ = φ · (Fh × id).
In summary, we have proved the following structural recursion theorem.

Theorem 3.1 If φ is natural in the sense that G(h × id) · φ = φ · (Fh × id), then

      f · (α × id) = h · Gf · φ

if and only if

      f = apply · (([curry (h · G apply · φ)]) × id).

Let us now see what this gives in the case of cat. We started with

      cat · (α × id) = [outr, cons] · (id + id × cat) · φ,

where φ = (id + assocr) · distl. So, h = [outr, cons] and Gf = id + id × f. The
naturality condition on φ is

      (id + id × (h × id)) · φ = φ · ((id + (id × h)) × id),

which is easily checked. Hence, by Theorem 3.1, we find that

      cat = apply · (([curry ([outr, cons] · (id + id × apply) · (id + assocr) · distl)]) × id),

which simplifies to

      cat = apply · (([curry ([outr, cons · (id × apply) · assocr] · distl)]) × id).

We leave it as an instructive exercise to recover the pointwise definition of cat from
the above catamorphism.

Tree traversal

Let us look at another illustration. Consider again the initial type of trees
introduced in Section 3.2. The function tips returns the list of tips of a given tree:

      tips = ([wrap, cat]).

Here cat (x, y) = x ++ y is the concatenation function on lists from the last section,
and wrap is the function that converts an element into a singleton list, so wrap a =
[a]. In most functional languages, the computation of x ++ y takes time proportional
to the length of x. Therefore, when we attempt to implement the above definition
directly in such a language, the result is a quadratic-time program.

To improve the efficiency, we aim to design a curried function tipcat such that

      tipcat t x = tips t ++ x.

Since the empty list is the unit of concatenation we have tips t = tipcat t [], so tipcat
is a generalisation of our problem. The addition of an extra parameter such as x is
known as accumulation and is a well-known technique for improving the efficiency
of functional programs.

Using curry, we can write the above definition of tipcat more briefly as

      tipcat = curry cat · tips.

This suggests an application of the fusion law. Can we find an f and op so that
both of the following equations hold?

      curry cat · wrap = f
      curry cat · cat  = op · (curry cat × curry cat)

Well, since cons (a, x) = cat ([a], x), we can take f = curry cons. To find op we
reason as follows:

      (curry cat · cat) (x, y) z
   =    {application}
      (x ++ y) ++ z
   =    {since (++) is associative}
      x ++ (y ++ z)
   =    {application}
      (curry cat x · curry cat y) z
   =    {introducing compose (h, k) = h · k}
      (compose · (curry cat × curry cat)) (x, y) z.

Hence we have

      tips t = ([curry cons, compose]) t nil.

In contrast to the original definition of tips, this equation can be implemented
directly as a linear-time program.
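
Both definitions of tips can be transcribed and compared. The sketch below assumes the trees of Section 3.2 are binary trees with values at the tips, encoded here as ("tip", a) or ("bin", left, right); that encoding and the helper names are illustrative assumptions. Python lists are arrays rather than cons-lists, so the code mirrors the structure of the two programs but not their cost model.

```python
def cata_tree(f, g):
    """Catamorphism ([f, g]) on trees ("tip", a) | ("bin", left, right)."""
    def run(t):
        if t[0] == "tip":
            return f(t[1])
        return g((run(t[1]), run(t[2])))
    return run

wrap = lambda a: [a]
cat = lambda pair: pair[0] + pair[1]

# Direct definition: tips = ([wrap, cat])
tips_direct = cata_tree(wrap, cat)

curry_cons = lambda a: (lambda x: [a] + x)
compose = lambda pair: (lambda x: pair[0](pair[1](x)))   # compose (h, k) = h . k

# Accumulating definition: tipcat = ([curry cons, compose]), tips t = tipcat t nil
tipcat = cata_tree(curry_cons, compose)
tips = lambda t: tipcat(t)([])

example = ("bin", ("bin", ("tip", 1), ("tip", 2)),
                  ("bin", ("tip", 3), ("tip", 4)))
```

In the accumulating version each tip contributes a single cons onto the accumulated list, and the bin case merely composes the two functions obtained from the subtrees.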

Exercises

3.31 Show that

      cat · (nil × id)  = outr
      cat · (cons × id) = cons · (id × cat) · assocr

is equivalent to equation (3.6), using properties of products and coproducts only.

3.32 Prove that any cartesian closed category that has coproducts is distributive.

3.33 Construct the following isomorphisms:

      A^0 ≅ 1        A^1 ≅ A        A^(B+C) ≅ A^B × A^C.

3.34 Construct a bijection between arrows of type A ← B and arrows of type A^B ← 1.

3.35 What does it mean for a preorder to be cartesian closed? (See Exercise 2.6
for the interpretation of preorders as categories.)

3.36 Let B be an object in a cartesian closed category. Show how (_)^B can be
made into a functor by defining f^B for an arbitrary arrow f.

3.37 Show that if A is cartesian closed, then so is A^B. (See Exercise 2.19 for the
definition of A^B.)

3.38 The map function (as in functional programming) is a collection of arrows

      map_{A,B} : (listr A)^(listr B) ← A^B

such that map_{A,B} f = listr f. Between what functors is map a natural
transformation? Write out the naturality condition and prove that it is satisfied.

3.39 The function cpr (short for 'cartesian product, right') with type

      cpr : listr (A × B) ← A × listr B

is defined by the list comprehension

      cpr (x, b) = [(a, b) | a ← x].

Give a point-free definition of cpr in terms of listr.

3.40 A functor F is said to be strong if there exists a corresponding natural
transformation

      map_{A,B} : (FA)^(FB) ← A^B.

Show that every functor of Fun is strong. Give an example of a functor that is
not strong. (Warning: in the literature, strength usually involves a number of
additional conditions. Interested readers should consult the references at the end
of this chapter.)

3.41 What conditions guarantee that

      f · (id × α) = h · Gf · φ

has a unique solution for each choice of h?

3.42 Show that the following equations uniquely determine iter (g, h) : A ← (Nat ×
B), for each choice of g : A ← B and h : A ← A:

      iter (g, h) · (zero × id) = g · outr
      iter (g, h) · (succ × id) = h · iter (g, h).

How can addition be expressed in terms of iter?

3.43 Continuing the preceding exercise, show that

      [Diagram: a square from Nat × (Nat × A) to A, with id × iter (id, h) followed by
      iter (id, h) along one side, and assocl followed by plus × id and iter (id, h)
      along the other.]

commutes for all h : A ← A.



3.44 Consider the type definition

      tree A ::= tip A | node (tree A)^A.

Does this definition make sense in Fun? Could you write it in your favourite
functional programming language?

3.45 The introduction of an accumulation parameter in the tree traversal example
can be summarised as follows. Suppose that we have a function k and a value e
such that k a e = a (all a) and k · f = g · Fk. Then for all x, we have ([f]) x = ([g]) x e.
Prove this general statement. The following four exercises aim to apply this strategy
to other examples.

3.46 Recall the function convert : listr A ← listl A which produces the cons-list
corresponding to a given snoc list. It is defined by

      convert = ([nil, snocr]),

where snocr (x, a) = x ++ [a]. Improve the efficiency of convert by introducing an
accumulation parameter.

3.47 Using the type of cons-lists, define

      reverse = ([nil, snocr]),

where snocr was defined above. Improve the efficiency of reverse by introducing an
accumulation parameter.

3.48 The function depths, as defined in terms of tri, takes quadratic time. Derive a
linear-time implementation by introducing an accumulation parameter. Hint: take
k a n = tree (+n) a, and e = 0.

3.49 In analogy with the depth of a tree example, we can also define the minimum
depth, and the minimum depth can be written as a catamorphism. Direct evaluation
of the catamorphism is inefficient because it will explore subtrees all the way down
to the tips, even if it has already found a tip at a lesser depth. Improve the efficiency
by introducing an accumulation parameter. Hint: take k a (n, m) = min (a + n, m)
and e = (0, ∞).

3.50 Consider the recursion scheme:

      loop h · (α × id) = [id, loop h · (id × h) · assocr] · distl,

where α = [nil, snoc]. Show that for any choice of h the function loop h is determined
uniquely.

3.51 Using the preceding exercise and Exercise 3.46, check that convcat x y =
convert x ++ y satisfies the equation

      uncurry convcat = loop cons.

Hence show how cons-list catamorphisms can be implemented on snoc-lists by

      ([e, f]) · convert = loop f · ⟨id, e · !⟩.

How can snoc-list catamorphisms be implemented by a loop over cons-lists?

Bibliographical remarks

The banana-split law was first recorded by (Fokkinga 1992a), who attributes its
catchy name to Meertens and Van der Woude. Of course, similar transformations
have been studied in other contexts; they are usually classified under the names
tupling (Pettorossi 1984) or parallel loop fusion.

Our exposition on Horner's rule in Ruby does not do justice either to Ruby or to the
use of this particular rule. We entirely ignored several important aspects of Ruby,
partly because we can only introduce these once relations have been discussed. The
standard introduction to Ruby is (Jones and Sheeran 1990). Other references are
(Jones and Sheeran 1993; Hutton 1992; Sheeran 1987, 1990). (Harrison 1991)
describes a categorical approach to the synthesis of static parallel algorithms which
is very similar to the theory described here, and is also similar to Ruby. (Skillicorn
1995) considers the categorical view of datatypes as an appropriate setting for
reasoning about architecture-independent parallel programs. In (Gibbons, Cai, and
Skillicorn 1994), some parallel tree-based algorithms are discussed.

Distributive categories are the subject of new and exciting developments on the
border between computation and category theory. The exposition given here was
heavily influenced by the text (Walters 1992a), as well as by a number of research
papers (Carboni, Lack, and Walters 1993; Cockett 1993; Walters 1989, 1992b).
The connection between distributive categories and the algebra of conditionals was
definitively explored by Robin Cockett (Cockett 1991).
The subject of cartesian closed categories is rich and full of deep connections to
computation. Almost all introductory books on category theory mention cartesian
closed categories; the most comprehensive treatment can be found in (Lambek and
Scott 1986). The trick of using currying to define such operations as concatenation
in terms of catamorphisms goes at least back to Lawvere's Recursion theorem for
natural numbers: see e.g. (Lambek and Scott 1986). In (Cockett 1990; Cockett and
Spencer 1992) it is considered how the same effect can be achieved in categories that
do not have exponentials. The key is to concentrate on functors that have tensorial
strength, that is a natural transformation φ : F(A × B) ← FA × B satisfying certain
coherence conditions. For more information on strength, the interested reader is
also referred to (Kock 1972; Moggi 1991). Some interesting applications of these
categorical concepts to programming languages and program construction can be
found in (Jay 1994; Jay and Cockett 1994; Jay 1995).

The interplay between fusion and accumulation parameters was first studied in
(Bird 1984). Our appreciation of the connection with currying grew while reading
(Meijer 1992; Meijer and Hutton 1995), and by discussions with Masato Takeichi
and his colleagues at Tokyo University (Hu, Iwasaki, and Takeichi 1996).
Chapter 4

Relations and Allegories

We now generalise from functions to relations. There are a number of reasons for
this step. First, like the move from real numbers to complex ones, the move to
relations increases our powers of expression. Relations, unlike functions, are essentially
nondeterministic and one can employ them to specify nondeterministic problems.
For instance, an optimisation problem can be specified in terms of finding an
optimal solution among a set of candidates without also having to specify precisely
which one should be chosen. Every relation has a well-defined converse, so one can
specify problems in terms of converses of other problems.

A second reason concerns the structure of certain proofs. There are deterministically
specified programming problems with deterministic solutions where, nevertheless,
it is helpful to consider nondeterministic expressions in passing from the former to
the latter. The proofs become easier, more structure is revealed, and directions for
useful generalisation are clearly signposted. So it is with problems about functions
of real variables that are solved more easily in the complex plane.

On the other hand, in the hundred years or so of its existence, the calculus of
relations has gained a good deal of notoriety for the apparently enormous number
of operators and laws that one has to memorise in order to do proofs effectively.
In this chapter we aim to cope with this problem by presenting the calculus in five
successive stages, each of which is motivated by categorical considerations and is
sufficiently small to be studied as a unit. We will see how these parts interact, and
how they can be put to use in developing a concise and effective style of reasoning.

4.1 Allegories

Allegories are to the algebra of relations as categories are to the algebra of
functions. An allegory A is a category endowed with three operators in addition to
target, source, composition and identities. These extra operators are inspired by
the category Rel of sets and relations. Briefly, we can compare relations with a
partial order ⊆, take the intersection of two relations with ∩, and take a relation to
its converse with the unary operator (_)°. The purpose of this section is to describe
these operators axiomatically.

Inclusion

The first assumption is that any two arrows with the same source and target can be
compared with a partial order ⊆, and that composition is monotonic with respect
to this order: that is,

      (S1 ⊆ S2) and (T1 ⊆ T2) implies (S1 · T1) ⊆ (S2 · T2).

In Rel, where a relation R : A ← B is interpreted as a subset R ⊆ A × B, inclusion
of relations is the same as set-theoretic inclusion; thus

      R ⊆ S ≡ (∀ a, b : aRb ⇒ aSb).

Monotonicity of composition is so fundamental that we often apply it tacitly in
proofs. An expression of the form S ⊆ T is called an inequation, and most of the
laws in the relational calculus are inequations rather than equations. The proof
format used in the preceding chapter adapts easily to reasoning about inequations,
as long as we don't mix reasoning with ⊆ and reasoning with ⊇. A proof of R = S
by two separate proofs, one of R ⊆ S and one of S ⊆ R, is sometimes called a ping-pong
argument. Use of ping-pong arguments can often be avoided either by direct
equational reasoning, or by an indirect proof in which the following equivalence is
exploited:

      R = S ≡ (X ⊆ R ≡ X ⊆ S) for all X.

Thus, an indirect proof is equational reasoning with ≡.

It will occasionally be helpful to illustrate inequations by diagrams similar to those
given in the preceding chapter. The fact that a diagram illustrates an inequation
rather than an equation is signalled by inserting an inclusion sign at an appropriate
point. For instance, the diagram

      [Diagram]

depicts the inequation S1 · T1 ⊆ S2 · T2. In such cases, one says that a diagram
semi-commutes.
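
In Rel these notions are easy to experiment with. The sketch below represents a relation R : A ← B as a frozenset of pairs (a, b) with aRb, and checks monotonicity and the rule of indirect equality exhaustively over the sixteen relations on a two-element set. It is a finite check, not a proof, and the function names are hypothetical.

```python
from itertools import combinations, product

def compose(r, s):
    """R . S for R : A <- B and S : B <- C: a (R.S) c iff aRb and bSc for some b."""
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)

def all_relations(xs):
    """Every relation on the finite carrier xs, as frozensets of pairs."""
    pairs = list(product(xs, xs))
    return [frozenset(c) for n in range(len(pairs) + 1)
            for c in combinations(pairs, n)]

RELS = all_relations([0, 1])     # 16 relations on a two-element set

# Monotonicity: S1 <= S2 and T1 <= T2 implies S1 . T1 <= S2 . T2
# (on frozensets, <= is set-theoretic inclusion)
monotonic = all(compose(s1, t1) <= compose(s2, t2)
                for s1 in RELS for s2 in RELS if s1 <= s2
                for t1 in RELS for t2 in RELS if t1 <= t2)

# Indirect equality: R = S  iff  (X <= R iff X <= S) for all X
indirect = all((r == s) == all((x <= r) == (x <= s) for x in RELS)
               for r in RELS for s in RELS)
```

The same representation of relations is convenient for testing the other laws of this chapter on small carriers.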

Meet

The second assumption is that for all arrows R, S : A ← B there is an arrow
R ∩ S : A ← B, called the meet of R and S, and characterised by the universal
property

      X ⊆ (R ∩ S) ≡ (X ⊆ R) and (X ⊆ S),                              (4.1)

for all X : A ← B. In words, R ∩ S is the greatest lower bound of R and S. Using this
universal property of meet it can easily be established that meet is commutative,
associative and idempotent. In symbols:

      R ∩ S       = S ∩ R
      R ∩ (S ∩ T) = (R ∩ S) ∩ T
      R ∩ R       = R.

Using meet, we can restate the axiom of monotonicity as two inclusions:

      R · (S ∩ T) ⊆ (R · S) ∩ (R · T)
      (R ∩ S) · T ⊆ (R · T) ∩ (S · T).

Given ∩ as an associative, commutative, and idempotent operation, we need not
postulate inclusion of arrows as a primitive concept, for R ⊆ S can be defined as
an abbreviation for R ∩ S = R.

Converse

Finally, for each arrow R : A ← B there is an arrow R° : B ← A called the converse
of R (and also known as the reverse or reciprocal of R). The converse operator has
three properties. First, it is an involution:

      (R°)° = R.                                                       (4.2)

Second, it is order-preserving:

      R ⊆ S ≡ R° ⊆ S°.                                                 (4.3)

Third, it is contravariant:

      (R · S)° = S° · R°.                                              (4.4)

Using (4.2) and (4.3), together with the universal property (4.1), we obtain that
converse distributes over meet:

      (R ∩ S)° = R° ∩ S°.                                              (4.5)

Use of these four properties in calculations will usually be signalled just by the hint
converse.

The modular law

There is one more axiom that connects all three operators in an allegory. The axiom
is called the modular law and states that

      (R · S) ∩ T ⊆ R · (S ∩ (R° · T)).                                (4.6)

The modular law is also known as Dedekind's rule. The modular law holds in Rel,
the proof being as follows:

      (∃b : aRb ∧ bSc) ∧ aTc
   ≡    {predicate calculus}
      (∃b : aRb ∧ bSc ∧ aTc)
   ⇒    {since aRb ∧ aTc ⇒ b(R° · T)c}
      (∃b : aRb ∧ bSc ∧ b(R° · T)c)
   ≡    {meet}
      (∃b : aRb ∧ b(S ∩ (R° · T))c).

One can think of the modular law as a weak converse of the distributivity of
composition over meet.
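
The modular law can also be confirmed by brute force on a small carrier. The sketch below again represents relations as frozensets of pairs (a, b) with aRb and enumerates all triples of relations on a two-element set; it is a finite check under that representation, with hypothetical helper names.

```python
from itertools import combinations, product

def compose(r, s):
    """R . S: a (R.S) c iff aRb and bSc for some b."""
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)

def converse(r):
    """R°: b R° a iff a R b."""
    return frozenset((b, a) for (a, b) in r)

def all_relations(xs):
    pairs = list(product(xs, xs))
    return [frozenset(c) for n in range(len(pairs) + 1)
            for c in combinations(pairs, n)]

RELS = all_relations([0, 1])

def modular_lhs(r, s, t):
    # (R . S) ∩ T
    return compose(r, s) & t

def modular_rhs(r, s, t):
    # R . (S ∩ (R° . T))
    return compose(r, s & compose(converse(r), t))

# (4.6) holds for every triple of relations on the carrier.
modular_law = all(modular_lhs(r, s, t) <= modular_rhs(r, s, t)
                  for r in RELS for s in RELS for t in RELS)

# The inclusion is not an equality in general: collect the strict cases.
strict = [(r, s, t) for r in RELS for s in RELS for t in RELS
          if modular_lhs(r, s, t) != modular_rhs(r, s, t)]
```

A non-empty list strict shows that (4.6) is a genuine inequation in Rel.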

By applying converse to both sides of the modular law and renaming, we obtain the
dual variant

      (R · S) ∩ T ⊆ (R ∩ (T · S°)) · S.                                (4.7)

In fact, the modular law can be stated symmetrically in R and S:

      (R · S) ∩ T ⊆ (R ∩ (T · S°)) · (S ∩ (R° · T)).                   (4.8)

Let us prove that (4.8) is equivalent to the preceding two versions. First,
monotonicity of composition gives at once that (4.8) implies both (4.6) and (4.7). For
the other direction, we reason:

      (R · S) ∩ T
   =    {meet idempotent}
      (R · S) ∩ T ∩ T
   ⊆    {inequation (4.7), writing U = R ∩ (T · S°)}
      (U · S) ∩ T
   ⊆    {inequation (4.6)}
      U · (S ∩ (U° · T))
   ⊆    {since U ⊆ R; converse; monotonicity}
      U · (S ∩ (R° · T)).

In particular, taking T = id and replacing R by R° in (4.8), we obtain

      (R° · S) ∩ id ⊆ (R ∩ S)° · (R ∩ S).                              (4.9)

This inclusion is useful when reasoning about the range operator, defined below.

A proof similar to the above one gives that

      R ⊆ R · R° · R.                                                  (4.10)

This completes the formal definition of an allegory. Note that neither the join
operation (∪) nor the complementation operator (¬) on arrows are part of the
definition of an allegory, even though both are meaningful in Rel.

To explore the consequences of the axiomatisation we will need some additional
concepts and notation, and we turn to these next.

Exercises

4.1 Using an indirect proof and the universal property of meet, prove that meet is
associative: R ∩ (S ∩ T) = (R ∩ S) ∩ T.

4.2 Translate the following semi-commutative diagrams into inequations:

      [Two semi-commutative diagrams.]

4.3 Find a counter-example for (R · S) ∩ (R · T) ⊆ R · (S ∩ T).

4.4 The term universal property of meet suggests that R ∩ S is the terminal object
in a certain category. Is it?

4.5 Show that R ∩ (S · T) = R ∩ S · ((S° · R) ∩ T).

4.6 Prove that R ⊆ R · R° · R.

4.7 Prove that if A and B are allegories, then so is A × B.



4.2 Special properties of arrows

Various properties of relations that relate to order are familiar from ordinary set
theory, and can be stated very concisely in the language of relations and allegories.

An arrow R : A ← A is said to be reflexive if id_A ⊆ R and transitive if R · R ⊆ R.
An arrow that is both reflexive and transitive is called a preorder. The converse of
a preorder is again a preorder, and monotonicity of composition gives us that the
meet of two preorders is a preorder as well.

An arrow R : A ← A is said to be symmetric if R ⊆ R°. Because converse is a
monotonic involution, this is the same as saying that R = R°. Again it is easy to
check that the meet of two symmetric arrows is symmetric.

An arrow R : A ← A is said to be anti-symmetric if R ∩ R° ⊆ id_A. An anti-symmetric
preorder is called a partial order. A symmetric preorder is called an equivalence. If
R is a preorder, then R ∩ R° is an equivalence.

A less familiar notion, but one which turns out to be extremely useful, is that of
a coreflexive arrow. An arrow C : A ← A is called coreflexive if C ⊆ id_A. One
can think of C as a subset of A. Every coreflexive arrow is both transitive and
symmetric. Here is a proof of symmetry:

      C
   ⊆    {inequation (4.10)}
      C · C° · C
   ⊆    {since C coreflexive}
      id · C° · id
   =    {identity arrows}
      C°.

The proof of transitivity is even easier and is left as an exercise.

Range and domain

Associated with every arrow R : A ← B are two coreflexives ran R : A ← A and
dom R : B ← B, called the range and domain of R respectively. Below we shall only
discuss range; the properties of domain follow by duality since, by definition,

      dom R = ran R°.

One way of defining ran R : A ← A is by the universal property

      ran R ⊆ X ≡ R ⊆ X · R for all X ⊆ id_A.                         (4.11)

The intended interpretation in Rel is that a(ran R)a holds if there exists an element
b such that aRb.

We can also define ran R directly:

      ran R = (R · R°) ∩ id.                                           (4.12)

To prove that (4.12) implies (4.11), note that

      R = R ∩ R ⊆ ((R · R°) ∩ id) · R = ran R · R

by the modular law and so, by monotonicity, we obtain

      ran R ⊆ X ⇒ R ⊆ X · R

for any X. Conversely,

      (R · R°) ∩ id
   ⊆    {assume R ⊆ X · R}
      (X · R · R°) ∩ id
   ⊆    {modular law}
      X · ((R · R°) ∩ X°)
   ⊆    {meet}
      X · X°
   ⊆    {assuming X is a coreflexive}
      X,

completing the proof.

If X is coreflexive, then X · R ⊆ R, and so R ⊆ X · R if and only if R = X · R. In
particular, taking X = ran R in (4.11) we obtain

      R = ran R · R.                                                   (4.13)

Taking R = S · T and X = ran S in (4.11), and using (4.13), we obtain ran (S · T) ⊆
ran S. In fact, this result can be sharpened: we have

      ran (R · S) = ran (R · ran S).                                   (4.14)

In one direction the proof is

      ran (R · S) = ran (R · ran S · S) ⊆ ran (R · ran S).

The other direction follows from (4.12).
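
The properties of range derived above can be checked exhaustively on a two-element carrier. As before, this is a sketch in which relations are frozensets of pairs (a, b) with aRb; the constant and function names are hypothetical, and a finite check is no substitute for the allegorical proofs.

```python
from itertools import combinations, product

CARRIER = [0, 1]
IDENT = frozenset((x, x) for x in CARRIER)

def compose(r, s):
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)

def converse(r):
    return frozenset((b, a) for (a, b) in r)

def ran(r):
    # ran R = (R . R°) ∩ id                                   (4.12)
    return compose(r, converse(r)) & IDENT

def all_relations(xs):
    pairs = list(product(xs, xs))
    return [frozenset(c) for n in range(len(pairs) + 1)
            for c in combinations(pairs, n)]

RELS = all_relations(CARRIER)
COREFLEXIVES = [x for x in RELS if x <= IDENT]

# Intended interpretation: a (ran R) a iff aRb for some b
pointwise = all(ran(r) == frozenset((a, a) for (a, _b) in r) for r in RELS)

# (4.11)  ran R ⊆ X  ≡  R ⊆ X . R, for every coreflexive X
law_411 = all((ran(r) <= x) == (r <= compose(x, r))
              for r in RELS for x in COREFLEXIVES)

# (4.13)  R = ran R . R
law_413 = all(r == compose(ran(r), r) for r in RELS)

# (4.14)  ran (R . S) = ran (R . ran S)
law_414 = all(ran(compose(r, s)) == ran(compose(r, ran(s)))
              for r in RELS for s in RELS)
```

Each boolean records whether the corresponding statement holds for all relations on the carrier.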

Finally, let us consider briefly how the range operator interacts with meet. From
the direct definition of range (4.12) and monotonicity, we have

      ran (R ∩ S) ⊆ id ∩ (R · S°).

The converse inequation also holds, by (4.9), and therefore

      ran (R ∩ S) = id ∩ (R · S°).                                     (4.15)

Simple and entire arrows

An allegory has three subcategories of special interest, the categories formed by


taking just: (i) the simple arrows, also called partial functions; (ii) the entire arrows,
also called total relations; and (iii) those arrows that are both simple and entire,
that is, functions. We now examine each of these subcategories in some detail.

An arrow S : A ← B is said to be simple if

   S · S° ⊆ id_A.

Simple arrows are also known as imps (short for implementations) or partial
functions. In set-theoretic terms, S is simple if for every b there exists at most one a
such that aSb. Simple arrows satisfy various algebraic properties not enjoyed by
arbitrary arrows. For example, the modular law can be strengthened to an identity:

   (S · R) ∩ T = S · (R ∩ (S° · T)) provided S is simple.     (4.16)

The inclusion (⊇) is proved as follows:

   S · (R ∩ (S° · T)) ⊆ (S · R) ∩ (S · S° · T) ⊆ (S · R) ∩ T.

We also have that composition of simple arrows right-distributes through meets:

   (R ∩ T) · S = (R · S) ∩ (T · S) provided S is simple.     (4.17)

Again, the proof makes essential use of the modular law.

An arrow R : A ← B is said to be entire if

   id_B ⊆ R° · R.

Equivalently, R is entire when dom R = id_B. In set-theoretic terms R is entire if
for every b there exists at least one a such that aRb. Since dom (R · S) ⊆ dom S we
have that S is entire whenever R · S is for any R. Also clear is the fact that if R is
entire and R ⊆ S, then S is entire. Finally, using (4.15) it is easy to show that

   R ∩ S entire ≡ id ⊆ R° · S.     (4.18)
This condition will be useful below.
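In Rel the two definitions can be tested directly from their point-free form. The sketch below assumes relations are finite Python sets of pairs (a, b) meaning aRb; the names is_simple, is_entire and the sample relations are hypothetical.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def converse(r):
    return {(b, a) for (a, b) in r}

def ident(xs):
    return {(x, x) for x in xs}

def is_simple(s):
    # S · S° ⊆ id: no b is related to two different a's.
    return all(a == a2 for (a, a2) in compose(s, converse(s)))

def is_entire(r, source):
    # id_B ⊆ R° · R: every b in the source B is related to some a.
    return ident(source) <= compose(converse(r), r)

B = {"p", "q"}
partial = {(1, "p")}                       # simple, not entire
total = {(1, "p"), (2, "p"), (1, "q")}     # entire, not simple
fn = {(1, "p"), (1, "q")}                  # both, hence a function

# Instance of (4.18): R ∩ S is entire exactly when id ⊆ R° · S.
lhs = is_entire(total & fn, B)
rhs = ident(B) <= compose(converse(total), fn)
print(lhs == rhs)
```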

An arrow that is both simple and entire is said to be a function. Single lower-case
letters will always denote functions, even if we do not say so explicitly. For any
allegory A, its subcategory of functions will be denoted by Fun(A). In particular,
Fun(Rel) = Fun.

The following two shunting rules for functions are very useful:

   f · R ⊆ S ≡ R ⊆ f° · S     (4.19)
   R · f° ⊆ S ≡ R ⊆ S · f.     (4.20)

To prove (4.19) we reason:

      f · R ⊆ S
   ⇒     {monotonicity}
      f° · f · R ⊆ f° · S
   ⇒     {since f is entire}
      R ⊆ f° · S
   ⇒     {monotonicity}
      f · R ⊆ f · f° · S
   ⇒     {since f is simple}
      f · R ⊆ S.
The dual form (4.20) is obtained by taking converses. Any arrow / satisfying either
(4.19) or (4.20) for all R and S is necessarily a function; the proof is left to the
reader.
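For small finite sets, the shunting rule (4.19) can be checked exhaustively in Rel. This is only a sketch under the assumption that relations are Python sets of pairs (a, b) meaning aRb; the carriers A, B, C and the helper names are made up for the illustration.

```python
from itertools import combinations, product

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def converse(r):
    return {(b, a) for (a, b) in r}

def all_relations(rows, cols):
    # Every subset of rows x cols, i.e. every relation rows <- cols.
    pairs = list(product(rows, cols))
    return [set(c) for n in range(len(pairs) + 1) for c in combinations(pairs, n)]

A, B, C = [0, 1], ["p", "q"], ["x", "y"]
# All functions f : A <- B, stored as pairs (f b, b).
functions = [set(zip(images, B)) for images in product(A, repeat=len(B))]
rs = all_relations(B, C)   # candidates for R : B <- C
ss = all_relations(A, C)   # candidates for S : A <- C

def shunting_holds(f):
    # f · R ⊆ S  ≡  R ⊆ f° · S, for every R and S.
    return all((compose(f, r) <= s) == (r <= compose(converse(f), s))
               for r in rs for s in ss)

print(all(shunting_holds(f) for f in functions))
# An arrow that is neither simple nor entire fails the rule.
g = {(0, "p"), (1, "p")}
print(shunting_holds(g))
```

The second check illustrates the remark that only functions satisfy the rule for all R and S: with R = {("q", "x")} and S empty, g · R ⊆ S holds but R ⊆ g° · S does not.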

An easy consequence of the shunting rules is the fact that inclusion of functions
reduces to equality:

   (f ⊆ g) ≡ (f = g) ≡ (f ⊇ g).

We reason:

      f ⊆ g
   ≡     {shunting}
      id ⊆ f° · g
   ≡     {shunting}
      g° ⊆ f°
   ≡     {converse is a monotonic involution}
      g ⊆ f.

This fact is used frequently.



Functions can also be characterised without making explicit use of the converse
operator. This result will be of fundamental importance in the following chapter,
so we record it as a proposition.

Proposition 4.1 Suppose that R : A ← B and S : B ← A satisfy R · S ⊆ id and
id ⊆ S · R. Then S = R°, and so R is a function.

Proof. First observe that id ⊆ S · R implies that S · R is entire. Hence R is entire
as well.

We now reason:

      S
   ⊆     {since R is entire}
      R° · R · S
   ⊆     {since R · S ⊆ id}
      R°.

By taking R = S° and S = R° in the above argument, we also have R° ⊆ S, and
so S = R°.

Exercises

4.8 Prove that coreflexives are transitive.

4.9 Let A and B be coreflexive arrows. Prove that A · B = A ∩ B.

4.10 Let C be coreflexive. Prove that (C · R) ∩ S = C · (R ∩ S).

4.11 Let C be coreflexive. Prove that

   (C · X) ∩ id = (X · C) ∩ id
                = (C · X · C) ∩ id
                = C · (X ∩ id)
                = (X ∩ id) · C.

4.12 Show that, when C is coreflexive, ran (C · R) = C · ran R.

4.13 An arrow is said to be idempotent if R · R = R. Prove that an arrow which
is both symmetric and transitive is idempotent.

4.14 Prove that R is symmetric and transitive if and only if R = R · R°.

4.15 Prove that if S is simple, S = S · S° · S. Does this equation imply simplicity?

4.16 Prove that ran (R ∩ (S · T)) = ran ((R · T°) ∩ S).

4.17 Prove that dom R · f = f · dom (R · f).

4.18 A locale is a partial order (V, ⊑) in which every subset X ⊆ V has a least
upper bound ⊔X, and any two elements a, b have a greatest lower bound a ⊓ b.
Furthermore, it is required that

   (⊔X) ⊓ b = ⊔{a ⊓ b | a ∈ X}.

A V-valued relation of type A ← B is a function V ← (A × B). Show that V-valued
relations form an allegory.

4.3 Tabular allegories


The definition of an allegory is very general and admits models that are quite
different from set-theoretic relations. Surprisingly, however, one only needs to make
two additional assumptions, the existence of tabulations and a unit, to get very close
to set-theoretic relations, at least in terms of proofs. The existence of tabulations
makes it possible to mimic pointwise proofs in a categorical setting. In a pointwise
proof we reason about relations as binary predicates, manipulating aRb instead of
R itself. In some cases pointwise proofs are more effective than point-free proofs;
indeed it may even happen that no point-free proof is available. Tabulations give
us a means of overcoming this phenomenon and thus the best of both worlds.

Tabulations

Let R : A ← B. A pair of functions f : A ← C and g : B ← C is said to be a
tabulation of R if

   R = f · g° and (f° · f) ∩ (g° · g) = id.

An allegory is said to be tabular if every arrow has a tabulation.

In particular, the allegory Rel is tabular. In Rel a relation R : A ← B can be
identified with a subset C of A × B. Taking f and g to be the projection functions
outl : A ← C and outr : B ← C, we obtain R = outl · outr°. Moreover, in Rel the
projection functions satisfy

   (outl° · outl) ∩ (outr° · outr) = id,

as one can easily check, so the second condition is satisfied as well.
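The construction in Rel is short enough to write out. The following sketch assumes relations are finite Python sets of pairs (a, b) meaning aRb, and takes the object C to be R itself; the function name tabulate is hypothetical.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def converse(r):
    return {(b, a) for (a, b) in r}

def tabulate(r):
    c = set(r)                               # C: the pairs of R themselves
    outl = {(a, (a, b)) for (a, b) in c}     # outl : A <- C
    outr = {(b, (a, b)) for (a, b) in c}     # outr : B <- C
    return c, outl, outr

R = {(1, "x"), (1, "y"), (2, "y")}
C, outl, outr = tabulate(R)

# First condition: R = outl · outr°.
print(compose(outl, converse(outr)) == R)
# Second condition: (outl° · outl) ∩ (outr° · outr) = id on C.
joint = compose(converse(outl), outl) & compose(converse(outr), outr)
print(joint == {(p, p) for p in C})
```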

In any tabular allegory, the condition (f° · f) ∩ (g° · g) = id is equivalent to saying
that the pair of functions (f, g) is jointly monic, that is, for all functions h and k
we have

   h = k ≡ (f · h = f · k and g · h = g · k).

In one direction we reason:

      f · h = f · k and g · h = g · k
   ≡     {shunting of functions}
      h · k° ⊆ f° · f and h · k° ⊆ g° · g
   ≡     {meet}
      h · k° ⊆ (f° · f) ∩ (g° · g)
   ≡     {assumption}
      h · k° ⊆ id
   ≡     {shunting of functions}
      h ⊆ k
   ≡     {inclusion of functions is equality}
      h = k.

For the other direction, assume that (h, k) is a tabulation of (f° · f) ∩ (g° · g):

      h · k° ⊆ (f° · f) ∩ (g° · g)
   ≡     {as before}
      f · h = f · k and g · h = g · k
   ≡     {assuming (f, g) is jointly monic}
      h = k,

and so (f° · f) ∩ (g° · g) = h · h° ⊆ id. But since f and g are entire, this inclusion
can be strengthened to an equality.

The kind of reasoning seen in the last part of this proof is typical: we are essentially
doing pointwise proofs in a relational framework. Similar reasoning arises in the
proof of the next result, which gives a characterisation of inclusion in terms of
functions.

Proposition 4.2 Let (f, g) be a tabulation of R. Then h · k° ⊆ R if and only if
there exists a (necessarily unique) function m such that h = f · m and k = g · m.

Proof. Given the existence of a function m such that

   [Diagram: the triangles formed by h, k, f, g and m, with h = f · m and k = g · m]

commutes, we have

      h · k°
   =     {assumption, converse}
      f · m · m° · g°
   ⊆     {m simple}
      f · g°
   =     {(f, g) tabulates R}
      R.

In the other direction, define m = (f° · h) ∩ (g° · k). We first show that m is simple:

      m · m°
   =     {definition and converse}
      ((f° · h) ∩ (g° · k)) · ((h° · f) ∩ (k° · g))
   ⊆     {monotonicity}
      (f° · h · h° · f) ∩ (g° · k · k° · g)
   ⊆     {h and k are simple}
      (f° · f) ∩ (g° · g)
   =     {(f, g) tabulates R}
      id.

To show that m is entire, we can appeal to (4.18) and prove id ⊆ h° · f · g° · k. The
argument is

      id
   ⊆     {h and k are entire}
      h° · h · k° · k
   ⊆     {assumption and (f, g) tabulates R}
      h° · f · g° · k.

Since we now know m is a function and

   f · m ⊆ f · f° · h ⊆ h,

we obtain that f · m = h because inclusion of functions is equality. By symmetry,
it is also the case that g · m = k. Finally, the fact that m is uniquely defined by
f · m = h and g · m = k follows at once from the fact that (f, g) is jointly monic.

One import of this result is that tabulations are unique up to unique isomorphism,
that is, if both (f, g) and (h, k) tabulate R, then there is a unique isomorphism m
such that h = f · m and k = g · m.

Unit

A unit in an allegory is an object U with two properties. First, id_U is the largest
arrow of type U ← U, that is,

   R ⊆ id_U ⇐ R : U ← U.

In other words, every arrow U ← U is a coreflexive. The second property is that
for every object A there exists an entire arrow p_A : U ← A. This entire arrow is
necessarily a function because p_A · p_A° ⊆ id_U by the first condition, so p_A is simple
as well. An allegory possessing a unit is called unitary.

The allegory Rel is unitary: a unit is a singleton set and p_A is the unique function
mapping every element of A to the sole inhabitant of the singleton. Recall that a
singleton set in Rel is a terminal object in Fun, its subcategory of functions.

In a unitary allegory we have, for any relation R : A ← B, that

      R ⊆ p_A° · p_B
   ≡     {shunting functions}
      p_A · R · p_B° ⊆ id_U
   ≡     {definition unit}
      true.

In other words, p_A° · p_B is the largest arrow of type A ← B. From now on, we shall
denote this arrow by Π : A ← B. In Rel, the arrow Π is just the product A × B.

As a special case of the above result we have that Π : U ← A is the arrow p_A, and
since inclusion of functions is equality, it follows that p_A is the only function with
this type. Thus a unit in an allegory is always a terminal object in its subcategory
of functions, a point we illustrated above in the case of Rel.

Restricted to tabular allegories, the converse is also true: in a tabular allegory, a
terminal object in the subcategory of functions is a unit of the allegory. By the
definition of a unit, it suffices to show that id_1 is the largest arrow of type 1 ← 1.
To this end, let R : 1 ← 1, and let (f, g) be a tabulation of R. Because 1 is terminal
and f, g : 1 ← A we have f = g = !. Hence,

   R = f · g° = ! · !° ⊆ id,

and the claim is proved. Based on this equivalence of terminal objects and units,
we will write !_A in preference to p_A, and 1 instead of U when discussing units.

Finally, let us consider the tabulation (f, g) of Π : A ← B, where f : A ← C and
g : B ← C for some C. Since h · k° ⊆ Π for any two functions h : A ← D and
k : B ← D, we obtain from Proposition 4.2 that m = (f° · h) ∩ (g° · k) is the unique
arrow such that h = f · m and k = g · m. But this just says that C, together with
f and g, is a product of A and B in the subcategory of functions. So without loss
of generality, we may assume that C = A × B, and (f, g) = (outl, outr). Also,
m = ⟨h, k⟩ and so

   ⟨h, k⟩ = (outl° · h) ∩ (outr° · k).

We will use this fact in the following chapter, when we discuss how to obtain a
useful notion of products in a relational setting.

Set-theoretic relations

Finally, let us briefly return to the question of pointless and pointwise proofs. There
is a meta-theorem about unitary tabular allegories which makes precise our intuition
that they are very much like relations in set theory.

A Horn-sentence is a predicate of the form

   E1 = D1 ∧ E2 = D2 ∧ ... ∧ En = Dn ⇒ En+1 = Dn+1,

where Ei and Di are expressions that refer only to the operators of an allegory, as
well as tabulations and units. The meta-theorem is that a Horn-sentence holds for
every unitary tabular allegory if and only if it is true for the specific category Rel
of sets and relations. In other words, everything we have said so far could have
been proved by checking it in set theory. A proof of this remarkable meta-theorem
is outside the scope of this book and the interested reader is referred to (Freyd and
Scedrov 1990) for details.

Although set-theoretic proofs are valid for any unitary tabular allegory, direct proofs
from the axioms are usually simpler, possess more structure, and are more revealing.

Accordingly, we resort to proof by tabulation only when other methods fail.



Exercises

4.19 Prove that a function m is monic if and only if m° · m = id.

4.20 Prove that for every function f there exist functions c and m such that

   f = m · c and c · c° = id and m° · m = id.

Is this factorisation in some sense unique? (In text books on category theory, this
factorisation is introduced under the heading 'epi-monic factorisation'.)

4.21 Show that if (f, g) tabulates R and R is simple, then g is monic.

4.22 Show that if R = f · g° and R is entire, g · g° = id.

4.23 Using the above two exercises, show that if (f, g) tabulates R and R is a
function, then g is an isomorphism.

4.24 Show that h · k° ⊆ R · S iff there exists an entire relation Q such that

   h · Q° ⊆ R and Q · k° ⊆ S.

4.25 Is the allegory of V-valued relations in Exercise 4.18 tabular?

4.26 Prove that (X ⊆ Y) ≡ (ran X ⊆ ran Y) for all X, Y : A ← 1.

4.27 Prove that dom S = id ∩ (Π · S).

4.4 Locally complete allegories


It is now time to study the operator that seems, somewhat mysteriously, to have
been left out of the discussion so far: the operator ∪ that returns the union of
two relations. In fact, we will consider the more general operator ⋃ that returns
the union of a set of relations. In particular, we shall see how its distributivity
properties give rise to two other operations, implication (⇒) and division (\).

Join and zero

An allegory is said to be locally complete if for any set H of arrows A ← B there is
an arrow ⋃H : A ← B, called the join of H, characterised by the universal property

   ⋃H ⊆ X ≡ (∀R ∈ H : R ⊆ X)

for all X : A ← B.

It is assumed that meet and composition distribute over join:

   (⋃H) ∩ S = ⋃{R ∩ S | R ∈ H}
   (⋃H) · S = ⋃{R · S | R ∈ H}.

Neither of these equations follows from the universal property of join. On the other
hand, the universal property does give us that converse distributes over join:

   (⋃H)° = ⋃{R° | R ∈ H}.

This is because converse is a monotonic involution. In the special case where H is
the empty set we write 0 for ⋃H; when H = {R, S} we write R ∪ S. By taking H
to be the empty set we obtain that 0 is a zero both of meet and composition. In
Rel, the arrow 0 is the empty relation.

Like meet, the binary join U is associative, commutative and idempotent. It is


important to bear in mind, however, that in a locally complete allegory there does
not exist the symmetry between meet and join found in the predicate calculus: for
meet one only has the modular law, while composition properly distributes over
join.

Implication

Given two arrows R, S : A ← B, the implication (R ⇒ S) : A ← B can be defined
by the universal property

   X ⊆ (R ⇒ S) ≡ (X ∩ R) ⊆ S for all X : A ← B.

The intended interpretation in set theory is that a(R ⇒ S)b ≡ (aRb ⇒ aSb).
Implication can also be defined directly as a join:

   R ⇒ S = ⋃{X | (X ∩ R) ⊆ S}.

To prove that this definition satisfies the universal property, assume first that
(X ∩ R) ⊆ S. Then we obtain X ∈ {X | (X ∩ R) ⊆ S} and so X ⊆ (R ⇒ S) by the
universal property of join. For the other direction we argue:

      X ⊆ (R ⇒ S)
   ≡     {definition of R ⇒ S}
      X ⊆ ⋃{Y | (Y ∩ R) ⊆ S}
   ⇒     {meet distributes over join}
      X ∩ R ⊆ ⋃{Y ∩ R | (Y ∩ R) ⊆ S}
   ⇒     {universal property of join}
      X ∩ R ⊆ S.

Composing preorders

One use of implication is in composing preorders. Consider, for instance, the lexical
ordering on pairs of numbers:

   (a, b) ≤ (c, d) ≡ a < c ∨ (a = c ∧ b ≤ d).

We can write this in an alternative way:

   (a, b) ≤ (c, d) ≡ a ≤ c ∧ (c ≤ a ⇒ b ≤ d).

This suggests defining R ; S (pronounced 'R then S') by

   R ; S = R ∩ (R° ⇒ S).

The relation R ; S first compares two elements by R, and if the two elements are
equivalent in R, it then compares them by S. In particular, the lexical ordering on
pairs of numbers is rendered as

   (outl° · leq · outl) ; (outr° · leq · outr),

where leq is the prefix name for ≤.

If R and S are preorders, then R ; S is a preorder. The proof makes use of the
symmetric modular law (4.8) and is left as an exercise. One can also show that (;)
is associative with unit II. This, too, is left as an exercise.
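Read pointwise, R ; S says: x is related to y by R, and if y is also related to x by R then x must be related to y by S. A minimal sketch of this reading, assuming relations are modelled as Python predicates on two arguments (the names then, on_fst, on_snd and lex are illustrative only):

```python
def then(r, s):
    # R ; S = R ∩ (R° ⇒ S), pointwise: x R y and (y R x implies x S y).
    return lambda x, y: r(x, y) and ((not r(y, x)) or s(x, y))

def leq(a, b):
    return a <= b

def on_fst(x, y):
    # outl° · leq · outl: compare first components.
    return leq(x[0], y[0])

def on_snd(x, y):
    # outr° · leq · outr: compare second components.
    return leq(x[1], y[1])

lex = then(on_fst, on_snd)

pairs = [(a, b) for a in range(3) for b in range(3)]
# Python compares tuples lexicographically, which gives a reference to test against.
agrees = all(lex(x, y) == (x <= y) for x in pairs for y in pairs)
print(agrees)
```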

Division

Given two arrows R and S with a common target, the left-division operation R\S
(pronounced 'R under S') is defined by the universal property

   X ⊆ R\S ≡ R · X ⊆ S.

In a diagram, X = R\S is the largest arrow that makes the triangle

   [Diagram: triangle with X : A ← C, R : B ← A and S : B ← C]

semi-commute. The interpretation of R\S as a predicate is

   a(R\S)c ≡ (∀b : bRa ⇒ bSc),

so the operation (\) gives us a way of expressing specifications that involve universal
quantification.

Left-division can also be defined explicitly as a join:

   R\S = ⋃{X | R · X ⊆ S}.

The proof that this works hinges on the fact that composition distributes over join,
and is analogous to the argument for implication spelled out above. Note that R\S
is monotonic in S, but anti-monotonic in R. In fact, we have

   (R ∪ S)\T = (R\T) ∩ (S\T) and R\(S ∩ T) = (R\S) ∩ (R\T).

The universal property of left-division also gives the identity

   (R · S)\T = S\(R\T),

but nothing interesting can be said about R\(S · T).

The cancellation law of division is

   R · (R\S) ⊆ S

and its proof is an immediate consequence of the universal property.
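In Rel, left-division can be computed from its pointwise reading and then compared against the universal property by enumerating every candidate X over small carriers. The sketch below assumes relations are Python sets of pairs (a, b) meaning aRb; ldiv, all_relations and the sample data are hypothetical.

```python
from itertools import combinations, product

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def ldiv(r, s, A, C):
    # Pointwise reading of R\S for R : B <- A and S : B <- C:
    # a (R\S) c holds when every b with b R a also satisfies b S c.
    return {(a, c) for a in A for c in C
            if all((b, c) in s for (b, a2) in r if a2 == a)}

def all_relations(rows, cols):
    pairs = list(product(rows, cols))
    return [set(c) for n in range(len(pairs) + 1) for c in combinations(pairs, n)]

A, C = [0, 1], ["x", "y"]
R = {("u", 0), ("v", 0), ("u", 1)}
S = {("u", "x"), ("v", "x"), ("u", "y")}
D = ldiv(R, S, A, C)

# Universal property: X ⊆ R\S  ≡  R · X ⊆ S, for every X : A <- C.
universal = all((X <= D) == (compose(R, X) <= S) for X in all_relations(A, C))
# Cancellation: R · (R\S) ⊆ S.
cancellation = compose(R, D) <= S
print(universal, cancellation)
```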

Right-division

Since composition is symmetric in both arguments we can define the dual operation
of right-division S/R (pronounced 'S over R') for any relations S and R with a
common source:

   X ⊆ S/R ≡ X · R ⊆ S.

At the point level we have

   a(S/R)b ≡ (∀c : aSc ⇐ bRc).

Since converse is a monotonic involution the two division operators can be defined
in terms of each other:

   R\S = (S°/R°)° and S/R = (R°\S°)°.

Sometimes it's better to use one version of division rather than the other; the choice
is usually dictated by the desire to reduce the number of converses in an expression.

Exercises

4.28 Prove that the meet of a collection H of arrows can be constructed as a join:

   ⋂H = ⋃{S | (∀R ∈ H : S ⊆ R)}.

4.29 Prove that ran (⋃H) = ⋃{ran X | X ∈ H}.

4.30 Show that there exists an operator (−) such that

   R − S ⊆ X ≡ R ⊆ S ∪ X

for all X. Using this universal property, show that

   R − 0 = R
   R ∪ S = R ∪ (S − R)
   R − (S ∪ T) = R − S − T
   (R ∪ S) − T = (R − T) ∪ (S − T).

4.31 Prove that R = 0 if and only if ran R = 0. Show how this may be used to
prove that

   ((R · S) ∩ T = 0) ≡ (R ∩ (T · S°) = 0).

4.32 Prove the following properties of implication using its universal property:

   R ⇒ (S ⇒ T) = (R ∩ S) ⇒ T
   (R ∪ S) ⇒ T = (R ⇒ T) ∩ (S ⇒ T)
   R ⇒ (S ∩ T) = (R ⇒ S) ∩ (R ⇒ T)
   R ∩ (R ⇒ S) = R ∩ S.

4.33 Prove the following property of implication

   f° · (R ⇒ S) · g = (f° · R · g) ⇒ (f° · S · g).

4.34 Prove that R ; S is a preorder if R and S are. Also prove that (;) is associative
with unit Π.

4.35 Prove the laws

   (R\S) · f = R\(S · f)
   f° · (R\S) = (R · f)\S.

4.36 Let (≤, A) and (⊑, B) be preorders (in the ordinary set-theoretic sense, not as
arrows in an allegory). A Galois connection is a pair (f, g) of monotonic mappings
f : A ← B and g : B ← A such that

   x ≤ g y ≡ f x ⊑ y.

The function f is called the lower adjoint, and g the upper adjoint. For example,
defining

   f X = X ∩ R and g Y = R ⇒ Y,

we see that the universal property of ⇒ in fact asserts the existence of a Galois
connection. Spot the Galois connection in the universal properties of division and
subtraction (see Exercise 4.30).

The following few exercises continue the theme of Galois connections.

4.37 What does it mean to say that the mapping X ↦ X · R is lower adjoint to
Y ↦ S · Y?

4.38 In Exercise 3.19, we defined the floor of a rational number using a universal
property. This property can be phrased as a Galois connection; identify the relevant
preorders and the adjoints.

4.39 Show how the universal property of binary meet can be viewed as a Galois
connection.

4.40 Now consider a Galois connection between complete lattices (partially ordered
sets where every subset has both a least upper bound (lub) and a greatest lower
bound (glb)). Prove that the following two statements are equivalent:

(f, g) is a Galois connection.

f preserves least upper bounds and for all y,

   g y = lub{x | f x ≤ y}.

4.5 Boolean allegories

One operator is still missing, namely the operator ¬ of negation. In a locally
complete allegory, one can define negation by

   ¬R = (R ⇒ 0).

This notion of negation satisfies a number of the properties one would expect. First
of all, negation is order-reversing. Furthermore, we have De Morgan's law

   ¬(R ∪ S) = (¬R) ∩ (¬S).

In general, however, it is not true that negation is an involution.

If the equation ¬(¬R) = R is satisfied, then the allegory is called boolean; in
particular, Rel is boolean. Boolean allegories satisfy many properties that are not
valid in other allegories. For instance, division can be defined in terms of negation:

   X/Y = ¬(¬X · Y°).     (4.21)

(From now on, we omit the brackets round ¬X, giving ¬ the highest priority in
formulae.) This definition expresses the relationship between universal and existential
quantification in classical logic. To prove (4.21), it suffices to show that

   ¬R/Y = ¬(R · Y°),     (4.22)

because taking R = ¬X and using X = ¬(¬X) gives the desired result. It turns
out that equation (4.22) is valid in any locally complete allegory:

      X ⊆ ¬R/Y
   ≡     {division}
      X · Y ⊆ ¬R
   ≡     {negation}
      X · Y ⊆ (R ⇒ 0)
   ≡     {implication}
      X · Y ∩ R ⊆ 0
   ≡     {Exercise (4.31)}
      X ∩ R · Y° ⊆ 0
   ≡     {implication; negation}
      X ⊆ ¬(R · Y°).

Notice that the above proof uses indirect equational reasoning, proving that R = S
by showing that X ⊆ R ≡ X ⊆ S for arbitrary X.
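In Rel, where negation is complement with respect to the full product, the identity X/Y = ¬(¬X · Y°) can be checked on a concrete example. This is a sketch under the assumption that relations are finite Python sets of pairs; neg, rdiv and the carriers A, B, C are made up for the illustration.

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def converse(r):
    return {(b, a) for (a, b) in r}

def neg(r, rows, cols):
    # Negation in Rel: complement within rows x cols.
    return {(x, y) for x in rows for y in cols} - r

def rdiv(x, y, A, B, C):
    # Pointwise X/Y for X : A <- C and Y : B <- C:
    # a (X/Y) b holds when a X c for every c with b Y c.
    return {(a, b) for a in A for b in B
            if all((a, c) in x for c in C if (b, c) in y)}

A, B, C = [0, 1], ["p", "q"], ["x", "y"]
X = {(0, "x"), (1, "x"), (1, "y")}
Y = {("p", "x"), ("q", "x"), ("q", "y")}

direct = rdiv(X, Y, A, B, C)
via_negation = neg(compose(neg(X, A, C), converse(Y)), A, B)
print(direct == via_negation)
```

The inner negation and composition express "there is some c with bYc but not aXc"; negating once more turns this into the universal quantification of division.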

In our own experience, it is best to avoid proofs that involve negation as much as

possible, since the number of rules in the relational calculus becomes quite
unmanageable when boolean negation is considered.

Exercises

4.41 Without assuming the allegory is boolean, prove that:

   ¬Π = 0
   ¬R = ¬¬¬R
   ¬(R ∪ S) = ¬R ∩ ¬S
   ¬¬(R ∪ ¬R) = Π.

4.42 Prove that a locally complete allegory is boolean iff R ∪ ¬R = Π for all R.

4.43 In relational calculi that take negation as primitive, the use of division is often
replaced by an appeal to Schröder's rule, which asserts that

   (¬T · S° ⊆ ¬R) ≡ (R · S ⊆ T) ≡ (R° · ¬T ⊆ ¬S).
Prove that Schroder's rule is valid in any boolean allegory.

4.6 Power allegories


In set theory, relations are usually defined as subsets of a cartesian product, a
fact we have used a number of times already. But it is important to observe that
this is a more or less arbitrary decision, since relations could have been introduced
as boolean-valued functions of two arguments, or as set-valued functions. In this
section, we shall show how the notion of powersets may be defined in an allegory
by exploiting the isomorphism between relations and set-valued functions.

Power transpose and epsilon

In order to model set-valued functions in an allegory A we need three things:

for each object A in A an object PA, called the power-object of A;

a function Λ, called power transpose, that takes an arrow R : A ← B and
returns a function ΛR : PA ← B;

an arrow ∈ : A ← PA, called the membership relation of P.

These three things are defined (up to unique isomorphism) by the following universal
property. For all R : A ← B and f : PA ← B,

   f = ΛR ≡ ∈ · f = R.

The following diagram summarises the type information:

   [Diagram: triangle with f = ΛR : PA ← B, ∈ : A ← PA and R : A ← B]

It is immediate from the universal property that ΛR = ΛS implies R = S, so Λ
is an isomorphism between relations and (set-valued) functions. In set theory, Λ is


defined by the set comprehension

   (ΛR) b = {a | aRb}.

Indeed, one might say that the definition of Λ is merely a restatement of the
comprehension principle in axiomatic set theory.

Let us now see how the universal property of Λ can be used to prove some simple
identities in set theory. First of all, by taking f = ΛR, we have the cancellation
property

   ∈ · ΛR = R,

so the diagram above commutes.

As a consequence of Λ-cancellation we obtain the fusion law

   Λ(R · f) = ΛR · f,

which is valid only if f is a function.

Finally, we have the reflection law Λ∈ = id, which is proved by taking f = id and
R = ∈ in the universal property.
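The set-theoretic reading of power transpose can be tried out on finite sets. The following sketch assumes relations are Python sets of pairs (a, b) meaning aRb and represents subsets as frozensets; the names powerset, member and transpose are hypothetical stand-ins for P, ∈ and Λ.

```python
from itertools import combinations

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

def member(A):
    # The membership relation ∈ : A <- PA.
    return {(a, x) for x in powerset(A) for a in x}

def transpose(r, source):
    # ΛR : PA <- B as pairs ((ΛR) b, b), where (ΛR) b = {a | a R b}.
    return {(frozenset(a for (a, b2) in r if b2 == b), b) for b in source}

A, B, D = [1, 2], ["p", "q", "r"], [0, 1, 2]
R = {(1, "p"), (2, "p"), (1, "q")}         # R : A <- B
f = {("p", 0), ("p", 1), ("q", 2)}         # a function f : B <- D

cancellation = compose(member(A), transpose(R, B)) == R
fusion = transpose(compose(R, f), D) == compose(transpose(R, B), f)
reflection = transpose(member(A), powerset(A)) == {(x, x) for x in powerset(A)}
print(cancellation, fusion, reflection)
```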

For completeness we remark that the definition of (PA, Λ, ∈) can also be phrased as
asserting the existence of a terminal object in a certain category. Given an allegory
A and an object A, consider the category A/A whose objects are all arrows R of
A with target A, and whose arrows are those functions f : R ← S for which the
following diagram commutes:

   [Diagram: triangle with f between the sources of R and S, and R and S both with target A]

Composition in A/A is the same as that in A. Now, the terminal object of A/A is
the relation ∈_A : A ← PA because

   f = ΛR ≡ f : ∈_A ← R

and so ΛR is just another notation for !_R.

Existential image

It is a general principle in category theory that any suitable operator on objects
can be extended to a functor. Since we have just introduced the operator P on
the objects of an allegory A, it is natural to look for a corresponding functor. In
the present case there are several possibilities, one of which is the power functor
P : A ← A, and another is the existential image functor E : Fun(A) ← A. It is
not immediately obvious what the action of the power functor should be on the
arrows of an allegory, so we will postpone consideration of this possibility to the
next chapter.

The existential image functor is defined by

   E R = Λ(R · ∈).

In set theory we have

   (E R) x = {a | (∃b : aRb ∧ b ∈ x)}.

It is easy to see from the reflection law Λ∈ = id that E preserves identities. To
show that E also preserves composition, it suffices to prove the absorption property

   E R · ΛS = Λ(R · S).

Taking S = T · ∈ gives us E R · E T = E(R · T), which is what is required.

Here is a proof of the absorption property:

      E R · ΛS = Λ(R · S)
   ≡     {defining property of Λ}
      ∈ · E R · ΛS = R · S
   ≡     {definition of E, Λ-cancellation}
      R · ∈ · ΛS = R · S
   ≡     {Λ-cancellation}
      true.

As an immediate consequence of Λ-cancellation, we obtain that ∈ is a natural
transformation id ← JE, where J : A ← Fun(A) is the inclusion functor.

The restriction of E to functions is called the power functor P; thus P = EJ. Note
that P : Fun(A) ← Fun(A), while E : Fun(A) ← A. In set theory, P is the map
operation that applies a function to all elements of a set:

   P f x = {f a | a ∈ x}.

In the following chapter we shall show how to extend P to a functor P : A ← A.

From now on we will omit J in the composition JE, silently embedding Fun(A) in
A; thus we assume that E : A ← A.
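The definition E R = Λ(R · ∈) and the absorption property can be exercised in Rel on small carriers. A minimal sketch, assuming relations are Python sets of pairs and subsets are frozensets; the helper names (including existential_image) are illustrative and not taken from the text.

```python
from itertools import combinations

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

def member(A):
    # ∈ : A <- PA
    return {(a, x) for x in powerset(A) for a in x}

def transpose(r, source):
    # ΛR as pairs ((ΛR) b, b)
    return {(frozenset(a for (a, b2) in r if b2 == b), b) for b in source}

def existential_image(r, B):
    # E R = Λ(R · ∈), a function PA <- PB for R : A <- B.
    return transpose(compose(r, member(B)), powerset(B))

B, C = ["p", "q"], [0, 1]
R = {(1, "p"), (2, "p"), (2, "q")}     # R : A <- B
T = {("p", 0), ("q", 1), ("p", 1)}     # T : B <- C

# Absorption: E R · ΛT = Λ(R · T).
absorption = compose(existential_image(R, B), transpose(T, C)) == transpose(compose(R, T), C)
# Functoriality: E R · E T = E (R · T).
functorial = (compose(existential_image(R, B), existential_image(T, C))
              == existential_image(compose(R, T), C))
print(absorption, functorial)
```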

Singleton and union

For any object A, the singleton function τ : PA ← A is defined by τ = Λid. In
set theory, τ takes an element and returns a singleton set. Using the fusion law
Λ(R · f) = ΛR · f, we obtain

   Λf = Λ(id · f) = Λid · f = τ · f

and so

   Pf · τ = Ef · Λid = Λ(f · id) = Λf = τ · f.

Thus τ is a natural transformation P ← id. These and similar facts illustrate the
difference between P and E, for τ is not a natural transformation E ← id.

For each A, the union function μ : PA ← PPA is given by μ = E∈. In words, μ
returns the union of a collection of sets. Since ∈ : id ← E, we have μ : E ← EE.
Union satisfies the one-point properties

   μ · Pτ = id = μ · τ

as well as the distribution property

   μ · μ = μ · Pμ.

This last result follows from the definition of μ plus naturality:

   μ · Pμ = μ · PE∈ = μ · EE∈ = E∈ · μ = μ · μ.

In later chapters we will use union as a synonym for μ.
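In set theory these laws are the familiar facts about singleton sets and big union. The sketch below models them with Python frozensets, under the assumption that functions are ordinary Python functions; tau, mu and P are hypothetical names for the singleton function, union and the power functor.

```python
from itertools import combinations

def tau(a):
    # Singleton: an element goes to the set containing just that element.
    return frozenset([a])

def mu(xss):
    # Union of a collection of sets.
    return frozenset().union(*xss)

def P(f):
    # Power functor on functions: apply f to every element of a set.
    return lambda x: frozenset(f(a) for a in x)

def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

subsets = powerset([1, 2, 3])
# One-point properties: union of singletons, and union of a singleton collection.
one_point = all(mu(P(tau)(x)) == x == mu(tau(x)) for x in subsets)

# Naturality of singleton: P f · tau = tau · f.
double = lambda n: 2 * n
natural = all(P(double)(tau(a)) == tau(double(a)) for a in [1, 2, 3])

# Distribution: flattening three levels of nesting in either order.
xsss = frozenset([
    frozenset([frozenset([1, 2]), frozenset([3])]),
    frozenset([frozenset([3, 4])]),
    frozenset(),
])
distribution = mu(mu(xsss)) == mu(P(mu)(xsss))
print(one_point, natural, distribution)
```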

The subset relation

For any A, the subset relation subset : PA ← PA is defined by subset = ∈\∈.
Interpreted in set theory we have

   x subset y ≡ (∀a : a ∈ x ⇒ a ∈ y).

Note the distinction between subset and ⊆: the former models the inclusion relation
between sets, while the latter is the primitive operation that compares arrows in an
allegory.

Based on its set-theoretic interpretation, we would expect that subset is a partial
order. Reflexivity and transitivity are immediate from the properties of division,
but the proof that subset is anti-symmetric, that is, subset ∩ subset° = id, requires
a little more effort, and in fact holds only for a unitary tabular allegory. Given this
assumption, we will prove a more general fact, namely, that

   ΛR = (∈\R) ∩ (R\∈)°.

Anti-symmetry of subset then follows from the reflection law Λ∈ = id.

To prove the above equation for Λ, we invoke the rule of indirect equality and
establish that

   X ⊆ ΛR ≡ X ⊆ (∈\R) ∩ (R\∈)°

for all X. Assume that (f, g) is a tabulation of X. We reason:

      f · g° ⊆ ΛR
   ≡     {shunting of functions}
      f = ΛR · g
   ≡     {fusion}
      f = Λ(R · g)
   ≡     {universal property of Λ}
      ∈ · f = R · g
   ≡     {anti-symmetry of ⊆}
      (∈ · f ⊆ R · g) and (R · g ⊆ ∈ · f)
   ≡     {shunting of functions}
      (∈ · f · g° ⊆ R) and (R · g · f° ⊆ ∈)
   ≡     {division}
      (f · g° ⊆ ∈\R) and (g · f° ⊆ R\∈)
   ≡     {converse, meet}
      f · g° ⊆ (∈\R) ∩ (R\∈)°,

and we are done.

Exercises

4.44 Consider the equation Λ(R · f) = ΛR · f. Why is it not possible to replace the
function f by an arbitrary arrow? Give a counter-example.

4.45 The notion of non-empty power objects (corresponding to non-empty subsets)
can be defined by changing the defining property of power transpose slightly. What
is the required change?

4.46 Show that τ is monic.



4.47 Prove that ΛR = E R · τ, and E R = μ · P(ΛR).

4.48 Prove that (ΛR)° · ΛS = (R\S) ∩ (S\R)°.

4.49 Prove that R is a preorder if and only if R = R\R. Using this, show that R
is a preorder if and only if there exists a partial order S and a function f such that
R = f° · S · f.

4.50 Prove that ∈/(R\∈) = R.

4.51 A predicate transformer is a function of type PA ← PB. One can define a
partial order on predicate transformers by

   f ≤ g ≡ ∈ · f ⊆ ∈ · g.

A predicate transformer h is said to be monotonic if h · subset ⊆ subset · h. Prove
that h is monotonic if and only if

   f ≤ g implies h · f ≤ h · g

for all f and g.

4.52 For R : B ← A, consider the predicate transformer wlp R : PA ← PB defined
by wlp R = Λ(R\∈). Prove that wlp (R · S) = wlp S · wlp R, and

   R ⊆ S ≡ wlp S ≤ wlp R.

(Exercise 4.50 will come in handy here.) Finally, show how to associate with any
predicate transformer p : PA ← PB an arrow S : B ← A so that

   p ≤ wlp R ≡ wlp S ≤ wlp R

for any R : B ← A. (This Exercise is the topic of (Morgan 1993).)

Bibliographical remarks

The calculus of relations has a rich history, going back to (De Morgan 1860), (Peirce
1870) and (Schroder 1895). The subject as we know it today was mostly shaped
by Tarski and his students in a series of articles, starting with (Tarski 1941). An
overview of the origins of the relational calculus can be found in (Maddux 1991;
Pratt 1992).

During the 1960s several authors started to explore relations in a categorical setting
(Brinkmann 1969; Mac Lane 1961; Puppe 1962). This resulted in a consensus
that regular categories are the appropriate setting for studying relations in general
(Grillet 1970; Kawahara 1973b). In fact, a category is regular if and only if it
is isomorphic to the subcategory of functions of a tabular unitary allegory. The
study of categories of relations is still a buoyant subject, see for instance (Carboni,
Kasangian, and Street 1984; Carboni and Street 1986; Carboni and Walters 1987).
The definitive introduction to this area of category theory is the text book by Freyd
and Scedrov (Freyd and Scedrov 1990), on which our presentation is based.

Although we have omitted to elaborate on this, the axioms introduced here, namely
those of a tabular allegory that has a unit and power objects, precisely constitute
the definition of a topos. There are many books on topos theory (Barr and Wells
1985; Goldblatt 1986; Mac Lane and Moerdijk 1992; McLarty 1992), and several of
the results quoted here are merely special cases of theorems that can be found in
these books.

During the 1970s the calculus of relations was applied to various areas of computing
science, see e.g. (De Bakker and De Roever 1973; De Roever 1972, 1976). This work
culminated in (Sanderson 1980), where it was suggested that relational algebra could
be a unifying instrument in theoretical computing. Our own understanding of the
applications to program semantics has been much influenced by (Hoare and He
1986a, 1986b, 1987; Hoare, He, and Sanders 1987). It is probably fair to say that
until recently, most of this work was based on Tarski's axiomatisation; the standard
reference is (Schmidt and Strohlein 1993). Related references are (Berghammer
and Zierer 1986; Berghammer, Kempf, Schmidt, and Strohlein 1991; Desharnais,
Mili, and Mili 1993; Mili 1983; Mili, Desharnais, and Mili 1987). (Mili, Desharnais,
and Mili 1994) is particularly similar to this book, in that it focuses on program
derivation.

The categorical viewpoint of relations was first exploited in computing science by
(Sheeran 1990), and later also by (Backhouse, De Bruin, Malcolm, Voermans, and
Van der Woude 1991; Backhouse, De Bruin, Hoogendijk, Malcolm, Voermans, and
Van der Woude 1992; Martin 1991; Kawahara 1990; Brown and Hutton 1994). The
presentation in this chapter has greatly benefited from discussions with Roland
Backhouse, Jaap van der Woude and their colleagues at Eindhoven University. The
use of Galois connections, made explicit in the exercises, is mainly due to their
influence. An especially interesting application of Galois connections in the relational
calculus is presented by (Backhouse and Van der Woude 1993).

The calculus of binary relations is of course quite restrictive, and therefore a
number of researchers have started to explore generalisations. In (Möller 1991, 1993;
Möller and Russling 1994; Russling 1995) relations are taken to be sets of sequences
rather than sets of pairs: this is very convenient, for instance, when reasoning
about graph algorithms. Another generalisation is to drop converse (Von Karger
and Hoare 1995; Berghammer and Von Karger 1995): this leads to a calculus of
processes. Another attempt at dealing with distributed algorithms in a relational
setting is (Rietman 1995). Many of these developments are summarised in a
forthcoming book on relational methods in computer science (Brink and Schmidt 1996).

A topic that we shall not address in this book is that of executing relational
expressions. Clearly, it would be desirable to do so, but it is as yet unclear what the
model of computation should be. One promising proposal, which involves rewriting using
the axioms of an allegory, has been put forward by (Broome and Lipton 1994).
Chapter 5

Datatypes in Allegories

The idea now is to replace categories by allegories as the mathematical basis of a
useful calculus for deriving programs. However, there is a major stumbling block:
categorical definitions of datatypes are not suitable when working in an allegory.
Each allegory is isomorphic to its opposite (via converse), so dual categorical
constructs coincide. In particular, products coincide with coproducts, which is not
what one wants in a sensible theory of datatypes.

The solution proposed in this chapter is to define all relevant datatype constructions
in Fun(A), and then extend them in some canonical way to A. In fact, we show
that the base functors of datatypes in Fun(A), the power functor, and type functors
can all be extended to monotonic functors of A. In particular this means that a
monotonic extension of a categorical product exists in A, and this extension,
called relational product, can be used in place of a categorical product. Crucial
to the success of the whole enterprise is the notion of tabulations introduced in the
preceding chapter.

As a result, catamorphisms can be extended to include relational algebras. Relational
catamorphisms are powerful tools for problem specification, and we go on
to illustrate their use by showing how some standard combinatorial functions can
be defined very succinctly using relations. This material is used heavily in later
chapters on solving optimisation problems. The chapter ends with a discussion of
how natural transformations can be generalised in a relational setting.

5.1 Relators

Let A and B be tabular allegories. By definition, a relator is a monotonic functor
F : A <- B, that is, a functor F satisfying

    R ⊆ S  ⇒  FR ⊆ FS

for all R and S.

As we shall see in Theorem 5.1 below, a relator can also be characterised as a functor
on relations that preserves converse, that is,

    (FR)° = F(R°).

As a first step towards proving this result, we prove the following lemma.

Lemma 5.1 Let F be a relator and f a function. Then Ff is a function, and

    (Ff)° = F(f°).

Proof. Since functions are entire and simple, we have

    Ff · F(f°) = F(f · f°) ⊆ F id = id
    F(f°) · Ff = F(f° · f) ⊇ F id = id.

Now recall Proposition 4.1 of the preceding chapter, which states that R is a function
if and only if there exists an S such that R · S ⊆ id and id ⊆ S · R. Furthermore,
these two inequations imply that S = R°. It follows that Ff is a function with
converse F(f°).
□

Theorem 5.1 A functor is a relator if and only if it preserves converse.

Proof. First, assume F is a relator and let (f, g) be a tabulation of R. Using
Lemma 5.1 we have:

    F(R°) = F((f · g°)°) = F(g · f°) = Fg · F(f°)
    (FR)° = (Ff · F(g°))° = (Ff · (Fg)°)° = Fg · (Ff)° = Fg · F(f°).

Thus F(R°) = (FR)°.
For the reverse direction we again use tabulations. Suppose R = h · k° ⊆ f · g° = S,
with (f, g) jointly monic. By Proposition 4.2 of the preceding chapter there exists
a function m such that h = f · m and k = g · m. Hence, we can reason:

        FR
    =     {definition of R and F a functor}
        Fh · F(k°)
    =     {F preserves converse}
        Fh · (Fk)°
    =     {definition of m and F is a functor}
        Ff · Fm · (Fm)° · (Fg)°
    ⊆     {since Fm is a function and so is simple}
        Ff · (Fg)°
    =     {F preserves converse, F a functor and definition of S}
        FS,

and so F is monotonic.
□

Corollary 5.1 If two relators F and G agree on functions, that is, if Ff = Gf for
all f, then F = G.

Proof. Let R be an arbitrary relation, and (f, g) a tabulation of R. Then

    FR = F(f · g°) = Ff · (Fg)° = Gf · (Gg)° = GR.
□

One consequence of Theorem 5.1 is that, when F is a relator, it is safe to write FR°
to mean either (FR)° or F(R°), a convention we henceforth adopt.
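
To make the statement of Theorem 5.1 concrete, here is a small Haskell sketch that checks converse preservation for the list functor on a handful of sample lists. The encoding is an assumption of the sketch, not part of the text: relations in Rel are modelled as Boolean-valued predicates, and the names Rel, conv, listRel, divides and samples are hypothetical. It tests one instance only and proves nothing in general.

```haskell
-- A relation is modelled as a Boolean-valued predicate: r a b means "a R b".
type Rel a b = a -> b -> Bool

-- Converse swaps the two arguments.
conv :: Rel a b -> Rel b a
conv r b a = r a b

-- Assumed action of the list functor on relations: two lists are related
-- when they have equal length and are related position by position.
listRel :: Rel a b -> Rel [a] [b]
listRel r xs ys = length xs == length ys && and (zipWith r xs ys)

-- A sample relation on integers: a R b when a divides b.
divides :: Rel Int Int
divides a b = a /= 0 && b `mod` a == 0

-- All lists over 1..3 of length at most two.
samples :: [[Int]]
samples = [[]] ++ [[a] | a <- [1 .. 3]] ++ [[a, b] | a <- [1 .. 3], b <- [1 .. 3]]

-- Compare (list R)° with list (R°) on every pair of sample lists.
conversePreserved :: Bool
conversePreserved =
  and [ conv (listRel divides) ys xs == listRel (conv divides) ys xs
      | xs <- samples, ys <- samples ]
```

Evaluating conversePreserved compares the two sides of (FR)° = F(R°) pointwise on the samples.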

Exercises

5.1 Give an example of a non-monotonic functor F : Rel <- Rel.

5.2 Show that any relator preserves meets of coreflexives, that is,

    F(X ∩ Y) = FX ∩ FY,

for all X, Y ⊆ id. Is the restriction X, Y ⊆ id necessary?

5.3 Denoting the set of all functions A <- X by A^X, the exponential functor
(_)^X : Fun <- Fun is defined on arrows by f^X h = f · h. Is the exponential functor a
relator? What is its generalisation to relations?

5.4 Consider a functor F : Fun <- Fun defined on objects by

    FA = {},   if A = {}
    FA = {0},  otherwise.

This defines the action of F on arrows uniquely; what is it?

Now suppose F can be extended to a relator on Rel. Consider the constant functions
one, two : {1,2} <- {0} returning 1 and 2 respectively. Use the definition of F to
show that F(one° · two) = id, where id : {0} <- {0}. Now use the assumption that
F preserves converse to show that F(one° · two) = ∅.
(This exercise shows that not every functor of Fun can be extended to a monotonic
functor of Rel.)

5.5 Show that any relator preserves the domain operator:

    F(dom R) = dom (FR).

5.2 Relational products

Let us now see how we can extend the product functor to a relator. Recall from the
discussion of units in the preceding chapter that Fun(A) has products whenever A
is a unitary tabular allegory. Recall also that (outl, outr) is the tabulation of Π and
the pairing operator satisfies

    (f, g) = (outl° · f) ∩ (outr° · g).

This immediately suggests how pairing might be generalised to relations: define

    (R, S) = (outl° · R) ∩ (outr° · S).                (5.1)

The interpretation of (R, S) in Rel is, of course, that (a, b)(R, S)c when aRc and
bSc. Given (5.1), we can define × in the normal way by

    R × S = (R · outl, S · outr).                      (5.2)

Note that the same sign × is used for the generalised product construction (hereafter
called the relational product) in A as for the categorical product in the subcategory
Fun(A). However, relational product is not a categorical product.

The task now is to show that relational product is a monotonic bifunctor. First,
it is clear that (R, S) is monotonic both in R and S, and from the definition of ×
we obtain by a short proof that (R × S)° = R° × S°. Thus × preserves converse.
Furthermore, × preserves identity arrows (because they are functions), so the nub
of the matter is to show that it also preserves composition, that is,

    (R × S) · (U × V) = (R · U) × (S · V).

This result follows from the more general absorption property

    (R × S) · (X, Y) = (R · X, S · Y).                 (5.3)

In one direction (⊆), the proof of (5.3) is easy: expand the definitions, use
monotonicity and the fact that outl and outr are simple. We leave details as an exercise.

The proof in the other direction is a little tricky, so we will break it into stages.
Below, we will give a direct proof of the special case

    (R · X, Y) ⊆ (R × id) · (X, Y).                    (5.4)

By a symmetrical argument, we also obtain the special case

    (X, S · Y) ⊆ (id × S) · (X, Y).                    (5.5)

Now we argue:

        (R · X, S · Y)
    ⊆     {(5.4)}
        (R × id) · (X, S · Y)
    ⊆     {(5.5)}
        (R × id) · (id × S) · (X, Y)
    ⊆     {since (R × id) · (id × S) ⊆ (R × S) (exercise)}
        (R × S) · (X, Y).

To prove (5.4) we argue:

        (R · X, Y)
    =     {(5.1)}
        (outl° · R · X) ∩ (outr° · Y)
    =     {claim: outl · (R × id) = R · outl for all R; converse}
        ((R × id) · outl° · X) ∩ (outr° · Y)
    ⊆     {modular law}
        (R × id) · ((outl° · X) ∩ ((R° × id) · outr° · Y))
    ⊆     {claim: outr · (R × S) ⊆ S · outr for all R, S; converse}
        (R × id) · ((outl° · X) ∩ (outr° · Y))
    =     {(5.1)}
        (R × id) · (X, Y).

In the above calculation we appealed to two claims, both of which follow from the
more general facts

    outl · (R, S) = R · dom S                          (5.6)
    outr · (R, S) = S · dom R.                         (5.7)

The proof of (5.6) is

        outl · (R, S)
    =     {(5.1)}
        outl · ((outl° · R) ∩ (outr° · S))
    =     {modular identity, since outl simple; outl · outl° = id}
        R ∩ (outl · outr° · S)
    =     {since outl · outr° = Π}
        R ∩ (Π · S)
    =     {Exercise 4.27}
        R · dom S.

The proof of (5.7) is symmetrical.

Equation (5.6) indicates why (outl, outr) does not form a categorical product in the
allegorical setting: for any arrow R, we have outl · (R, ∅) = ∅, not R.
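
The pairing (5.1) and equation (5.6) can be tried out on finite relations. The following Haskell sketch assumes a list-of-pairs model of Rel in which (a, b) in the list means a R b; the names Rel, pair, outlAfter, restrictToDom, r1, s1 and noRel are hypothetical and chosen only for this illustration.

```haskell
import Data.List (nub)

-- A finite relation R : A <- B as a list of pairs (a, b) with a R b
-- (target first, source second, following the book's arrow direction).
type Rel a b = [(a, b)]

-- Pairing (5.1) read in Rel: (a, b) (R, S) c when a R c and b S c.
pair :: (Eq a, Eq b, Eq c) => Rel a c -> Rel b c -> Rel (a, b) c
pair r s = nub [ ((a, b), c) | (a, c) <- r, (b, c') <- s, c == c' ]

-- outl composed after a relation into pairs: forget the second component.
outlAfter :: (Eq a, Eq c) => Rel (a, b) c -> Rel a c
outlAfter t = nub [ (a, c) | ((a, _), c) <- t ]

-- R . dom S: keep the pairs of R whose source lies in the domain of S.
restrictToDom :: Eq c => Rel a c -> Rel b c -> Rel a c
restrictToDom r s = [ (a, c) | (a, c) <- r, c `elem` map snd s ]

r1 :: Rel Int Char
r1 = [(1, 'x'), (2, 'x'), (3, 'y')]

s1 :: Rel Bool Char
s1 = [(True, 'x')]

-- The empty relation, used for the counter-example outl . (R, 0) = 0.
noRel :: Rel Bool Char
noRel = []
```

In this model outlAfter (pair r1 s1) and restrictToDom r1 s1 both keep only the pairs of r1 whose source is 'x', and pairing r1 with noRel leaves nothing for outl to project.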

Finally, let us prove the following useful cancellation law:

    (R, S)° · (X, Y) = (R° · X) ∩ (S° · Y).            (5.8)

The proof is

        (R, S)° · (X, Y)
    =     {converse, absorption (backwards)}
        (id, id)° · (R° × S°) · (X, Y)
    =     {(5.3)}
        (id, id)° · (R° · X, S° · Y)
    =     {(5.1)}
        (id, id)° · ((outl° · R° · X) ∩ (outr° · S° · Y))
    =     {distribution over meet, since (id, id) is a function}
        ((id, id)° · outl° · R° · X) ∩ ((id, id)° · outr° · S° · Y)
    =     {products}
        (R° · X) ∩ (S° · Y).

It is worthwhile observing that all the above equations and inclusions can also be
proved by an appeal to the meta-theorem of Section 4.3. Such indirect proofs are
quite short as compared to the excruciating symbol manipulation found above. On
the other hand, practice in the style of calculation given here will be useful later on
when the meta-theorem cannot be applied.

Exercises

5.6 Prove that (R × id) · (id × S) ⊇ (R × S) using only (5.4) and (5.2).

5.7 Show that (P, Q) · (R, S)° ⊆ (P · R°) × (Q · S°).

5.8 Prove that (R × S) · (X, Y) ⊆ (R · X, S · Y).

5.9 Prove that (R, S) · f = (R · f, S · f). Is this equation true when f is replaced
by an arbitrary arrow?

5.10 Let F : A <- A be a relator. Define unzip (F) = (F outl, F outr). Prove that

    unzip (F) · F(R × S) = (FR × FS) · unzip (F)

for all R, S. (Hint: first consider the case S = id.)

5.11 Recall the definition of exponentials from Chapter 3. An exponential of two
objects A and B is an object A^B and an arrow eval : A <- A^B × B such that for
each f : A <- C × B there is a unique arrow curry f : A^B <- C such that

    g = curry f  ≡  eval · (g × id) = f.

Reading (×) as relational product, does Rel have exponentials?

5.3 Relational coproducts

Fortunately, coproducts are simpler than products, at least in the setting of power
allegories. Let (inl, inr, A + B) be the coproduct of A and B in Fun(A), where A
is a power allegory. Then it is also a coproduct in the whole allegory A:

        T · inl = R and T · inr = S
    ≡     {power transpose isomorphism}
        Λ(T · inl) = ΛR and Λ(T · inr) = ΛS
    ≡     {Λ fusion (backwards)}
        ΛT · inl = ΛR and ΛT · inr = ΛS
    ≡     {coproduct of functions}
        ΛT = [ΛR, ΛS]
    ≡     {Λ-cancellation}
        T = ∈ · [ΛR, ΛS].

Hence we can define [R, S] = ∈ · [ΛR, ΛS]. The following diagram illustrates this
calculation:

The border arrows of the diagram also suggest an explicit formula for [R, S]:

    [R, S] = (R · inl°) ∪ (S · inr°).                  (5.9)

The proof of (5.9) is left as Exercise 5.12.

Given the definition of [R, S], we can define the coproduct relator + in the usual
way by

    R + S = [inl · R, inr · S].                        (5.10)

It is easy to check that + is monotonic in both arguments, so + is a relator. In
analogy with products, we obtain the useful cancellation law

    [R, S] · [U, V]° = (R · U°) ∪ (S · V°).            (5.11)

The proof is easier than the corresponding cancellation law for products, and details
are left as an exercise.
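
As a sanity check rather than a proof, formula (5.9) and the cancellation law (5.11) can be evaluated on small finite relations. The Haskell sketch below assumes a list-of-pairs model of Rel with Either standing for the coproduct; the names Rel, conv, comp, caseRel and the sample relations r, s, u, v are hypothetical.

```haskell
import Data.List (nub, sort)

-- A finite relation R : A <- B as a list of pairs (a, b) with a R b.
type Rel a b = [(a, b)]

-- Converse of a relation.
conv :: Rel a b -> Rel b a
conv r = [ (b, a) | (a, b) <- r ]

-- Composition: a (R . S) c when a R b and b S c for some b.
comp :: (Eq a, Eq b, Eq c) => Rel a b -> Rel b c -> Rel a c
comp r s = nub [ (a, c) | (a, b) <- r, (b', c) <- s, b == b' ]

-- [R, S] following (5.9): use R on left injections and S on right ones.
caseRel :: Rel c a -> Rel c b -> Rel c (Either a b)
caseRel r s = [ (c, Left a) | (c, a) <- r ] ++ [ (c, Right b) | (c, b) <- s ]

r, u :: Rel Int Char
r = [(1, 'a'), (2, 'b')]
u = [(10, 'a')]

s, v :: Rel Int Bool
s = [(3, True)]
v = [(20, True), (30, False)]

-- Both sides of (5.11), normalised to sorted duplicate-free lists.
lhs, rhs :: [(Int, Int)]
lhs = sort (comp (caseRel r s) (conv (caseRel u v)))
rhs = sort (nub (comp r (conv u) ++ comp s (conv v)))
```

On these samples lhs and rhs agree, because a left injection can only meet a left injection and likewise on the right.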

Exercises

5.12 Define X = [id, ∅]. Use the universal property of coproducts to show that
X · inl = id and X · inr = ∅. Give the corresponding equations satisfied by
Y = [∅, id]. Hence prove that

    (inl · X) ∪ (inr · Y) = [inl, inr] = id.

Now use Proposition 4.1 to conclude that X = inl° and Y = inr°. Hence prove

    [R, S] = (R · inl°) ∪ (S · inr°).

5.13 Prove (5.11). Why not say simply that this result follows by duality from the
corresponding law for products?

5.14 Prove the equation

    (R + S) ∩ ([U, V]° · [P, Q]) = (R ∩ (U° · P)) + (S ∩ (V° · Q)).

5.4 The power relator

Next we show how to extend P to a relator. Recall from the last chapter that,
over functions, P was defined as the restriction of existential image to functions:
P = EJ. Furthermore, we know that Pf = Λ(f · ∈) because ER = Λ(R · ∈), and
we also know from the final section of the preceding chapter the explicit formula
ΛR = (∈\R) ∩ (R\∈)° for Λ. Putting these bits together we obtain

    Pf = (∈\(f · ∈)) ∩ ((f · ∈)\∈)°.

The second term can be rephrased using the fact that (f · R)\S = R\(f° · S), so

    Pf = (∈\(f · ∈)) ∩ ((∋ · f)/∋),

where ∋ is a convenient abbreviation for ∈°.

The last identity suggests the following generalisation of P to relations:

    PR = (∈\(R · ∈)) ∩ ((∋ · R)/∋).

In Rel, this reads

    x(PR)y  ≡  (∀a ∈ x : ∃b ∈ y : aRb) ∧ (∀b ∈ y : ∃a ∈ x : aRb).

In words, if x(PR)y, then every element of x is related by R to some element in y
and conversely.
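
The pointwise reading of PR in Rel translates directly into a small executable check. In this Haskell sketch, finite sets are represented by lists and a relation by a Boolean-valued predicate; both choices, and the name powerRel, are assumptions made for illustration.

```haskell
-- x (PR) y for finite sets given as lists, with r a b meaning "a R b":
-- every element of x is related to some element of y, and every element
-- of y has some element of x related to it.
powerRel :: (a -> b -> Bool) -> [a] -> [b] -> Bool
powerRel r xs ys =
  all (\a -> any (\b -> r a b) ys) xs &&
  all (\b -> any (\a -> r a b) xs) ys
```

For instance, with R taken to be equality, powerRel (==) holds exactly when the two lists contain the same elements, which is the set-level reading of P id = id.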

It is immediate from the monotonicity of division that PR is monotonic in R. We
also have that P id = id, since this is just a restatement of the anti-symmetry of
subset, proved in the preceding chapter. So to show that P is a relator, we are left
with the task of proving that P distributes over composition.

The proof is in two parts. To show that

    PR · PS ⊆ P(R · S),

observe that the right-hand side is the meet of two relations. By the universal
properties of meet and division, the result follows if we can prove the inclusions

    ∈ · PR · PS ⊆ R · S · ∈
    PR · PS · ∋ ⊆ ∋ · R · S.

Both follow from the definition of P and the cancellation laws of division.

Now for the hard part, which is to show that P(R · S) ⊆ PR · PS. The proof involves
tabulations. Let (x, z) be a tabulation of P(R · S) and define

    y = Λ((R° · ∈ · x) ∩ (S · ∈ · z)).

We aim to justify the following steps:

    P(R · S) = x · z° ⊆ x · y° · y · z° ⊆ PR · PS.

The first inclusion follows because y is a function and functions are entire. For
the second inclusion, we prove that x · y° ⊆ PR, and appeal to the symmetrical
argument to obtain y · z° ⊆ PS.

By definition of division, x · y° ⊆ PR is equivalent to

    ∈ · x · y° ⊆ R · ∈  and  x · y° · ∋ ⊆ ∋ · R.

For the first inclusion, we argue

        ∈ · x · y° ⊆ R · ∈
    ≡     {shunting of functions}
        ∈ · x ⊆ R · ∈ · y
    ≡     {definition of y, Λ-cancellation}
        ∈ · x ⊆ R · ((R° · ∈ · x) ∩ (S · ∈ · z))
    ⇐     {modular law}
        ∈ · x ⊆ (∈ · x) ∩ (R · S · ∈ · z)
    ≡     {definition of meet}
        ∈ · x ⊆ R · S · ∈ · z
    ≡     {shunting of z; division}
        x · z° ⊆ ∈\(R · S · ∈)
    ≡     {since x · z° = P(R · S)}
        true.

The second inclusion is proved as follows:

        x · y° · ∋
    =     {definition of y, Λ-cancellation}
        x · ((R° · ∈ · x) ∩ (S · ∈ · z))°
    ⊆     {monotonicity, converse}
        x · x° · ∋ · R
    ⊆     {since x is simple}
        ∋ · R.

This completes the proof.

Exercises

5.15 A relator is completely determined by its action on functions. Since P and E
coincide on functions, they are equal. What is wrong with this argument?

5.16 Prove that ER · P(dom R) ⊆ PR.

5.5 Relational catamorphisms


Since the type functors of Fun(A) are defined as catamorphisms, we first check that
catamorphisms can be extended to include relational algebras.

Let F be a relator and suppose that F has initial algebra α : T <- FT in the
subcategory of functions. By analogy with the above discussion of coproducts, we
can show that α is also initial in the whole allegory:

    (X · α = R · FX)  ≡  (X = ∈ · ([Λ(R · F∈)])).      (5.12)

The proof is:

        X · α = R · FX
    ≡     {Λ is an isomorphism}
        Λ(X · α) = Λ(R · FX)
    ≡     {Λ-cancellation (backwards)}
        Λ(X · α) = Λ(R · F(∈ · ΛX))
    ≡     {relators; Λ fusion (backwards, twice)}
        ΛX · α = Λ(R · F∈) · FΛX
    ≡     {catamorphisms of functions}
        ΛX = ([Λ(R · F∈)])
    ≡     {Λ-cancellation}
        X = ∈ · ([Λ(R · F∈)]).

The proof is summarised in the following diagram:

                       α
             T <-------------- FT
             |                 |
      ([Λ(R · F∈)])      F([Λ(R · F∈)])
             |                 |
             v                 v
            PA <-------------- FPA
             |    Λ(R · F∈)    |
             ∈                 F∈
             |                 |
             v                 v
             A <-------------- FA
                       R

It follows that we can define ([R]) by the equation

    ([R]) = ∈ · ([Λ(R · F∈)]).

Equivalently, Λ([R]) = ([Λ(R · F∈)]). This identity was first exploited in (Eilenberg
and Wright 1967) to reason algebraically about the equivalence between
deterministic and nondeterministic automata. For this reason we will refer to it
subsequently as the Eilenberg-Wright Lemma.
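
The content of the Eilenberg-Wright Lemma can be illustrated for cons-lists. The Haskell sketch below is an informal model under stated assumptions: a relational algebra is encoded as a pair of nondeterministic choices, sets are sorted duplicate-free lists, and the names RelAlg, cataAll, cataSet and subsetSums are hypothetical. It compares a naive exploration of every choice with a single functional catamorphism over result sets, in the spirit of Λ(R · F∈).

```haskell
import Data.List (nub, sort)

-- An assumed encoding of a relational algebra for cons-lists: the possible
-- results for nil, and the possible results of combining a head with a
-- result already obtained for the tail.
data RelAlg a b = RelAlg { onNil :: [b], onCons :: a -> b -> [b] }

-- Every result related to a list, found by exploring each choice in turn.
cataAll :: RelAlg a b -> [a] -> [b]
cataAll alg []       = onNil alg
cataAll alg (a : as) = [ y | x <- cataAll alg as, y <- onCons alg a x ]

-- The same results as a functional catamorphism whose carrier is a set:
-- each step maps a whole set of tail results to a set of results.
cataSet :: Ord b => RelAlg a b -> [a] -> [b]
cataSet alg = foldr step (norm (onNil alg))
  where
    step a xs = norm (concatMap (onCons alg a) xs)
    norm      = nub . sort

-- Example algebra: each element is either skipped or added to the total,
-- so the results are the sums of the subsequences of the input.
subsetSums :: RelAlg Int Int
subsetSums = RelAlg { onNil = [0], onCons = \a x -> [x, x + a] }
```

On the input [1, 2, 2] the naive version produces eight results with repetitions, while cataSet carries at most one copy of each sum at every step, which is the subset construction of automata theory in miniature.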

Type relators

Let F be a binary relator with initial type (α, T), so T is a type functor. To show
that T is a relator, it is sufficient to prove that it preserves converse:

        (TR)° = T(R°)
    ≡     {definition of T, catamorphisms}
        (TR)° · α = α · F((TR)°, R°)
    ≡     {since α is an isomorphism}
        (TR)° = α · F((TR)°, R°) · α°
    ≡     {converse, F relator}
        TR = α · F(TR, R) · α°
    ≡     {as before}
        true.

Exercises

5.17 Provided the allegory is Boolean, every coreflexive C can be associated with
another coreflexive ~C such that

    C ∩ ~C = ∅  and  C ∪ ~C = id.

For any coreflexive C define guard C = [C, ~C]°. Prove that guard C is a function.
Define the conditional (C -> R, S) by

    (C -> R, S) = [R, S] · guard C.

Now prove that conditionals are very similar to cases:

    R ⊆ (C -> S, T)  ≡  (R · C ⊆ S) and (R · ~C ⊆ T).

This useful exercise gives us another way of modelling conditional expressions, and
is the one that we will adopt in the future.

5.18 Let F be a binary relator, with initial type (α, T). Suppose that F preserves
meets, i.e.,

    F(R ∩ X, S ∩ Y) = F(R, S) ∩ F(X, Y).

Show that T also preserves meets. Note that this proves, for instance, that the list
functor preserves meets.

5.19 Prove that ([R]) is entire when R is entire. Hint: use reflection to show that
dom ([R]) = id.

5.6 Combinatorial functions

To counter-balance the foregoing, rather technical material, we devote the rest of
this chapter to giving some immediate feeling for the increase in descriptive power
that one obtains in a relational setting. The functional programmer is familiar with
a range of list-theoretic functions that can be used in the specification and solution
of many combinatorial problems. In this section, we define some of these functions
in terms of relational catamorphisms. All the functions defined below will re-appear
in later chapters, so this section is quite important.

We also take the opportunity to fix some conventions. In the second half of the book
we will be writing functional programs to solve a number of combinatorial problems
and, since cons-lists are given privileged status in functional programming, we will
stipulate that, in the future, lists mean cons-lists except where otherwise stated.
We will henceforth write list rather than listr to denote the type functor for lists.

Subsequences

The subsequence relation subseq : list A <- list A can be defined by

    subseq = ([nil, cons ∪ outr]).

The function Λsubseq takes a list x and returns the set of all subsequences of x. This
definition is very succinct, but is not available in functional programming, which
does not allow either relations or sets. To express Λsubseq without using relations,
recall the Eilenberg-Wright lemma, which states that

    Λ([R]) = ([Λ(R · F(id, ∈))]).

If we can find e and f such that

    Λ([nil, cons ∪ outr] · F(id, ∈)) = [e, f],

where F(A, B) = 1 + (A × B), then we obtain Λsubseq = ([e, f]).

To determine e and f we just expand the left-hand side and simplify:

        Λ([nil, cons ∪ outr] · F(id, ∈))
    =     {definition of F}
        Λ([nil, cons ∪ outr] · (id + id × ∈))
    =     {coproduct}
        Λ[nil, (cons ∪ outr) · (id × ∈)]
    =     {power transpose of case}
        [Λnil, Λ((cons ∪ outr) · (id × ∈))].

So we can certainly take e = Λnil = τ · nil, where τ converts its argument into a
singleton set.

To find an appropriate expression for f we will need the power transpose of the join
of two relations. This is given by

    Λ(R ∪ S) = cup · (ΛR, ΛS),

where cup = Λ((∈ · outl) ∪ (∈ · outr)) is the function that returns the union of
two sets. The proof of the above equation is a simple exercise using the universal
property of Λ, and we omit details.

Now we continue:

        Λ((cons ∪ outr) · (id × ∈))
    =     {composition over join, naturality of outr}
        Λ((cons · (id × ∈)) ∪ (∈ · outr))
    =     {power transpose of join}
        cup · (Λ(cons · (id × ∈)), Λ(∈ · outr))
    =     {power transpose of composition, cons and outr are functions}
        cup · (Pcons · Λ(id × ∈), outr).

It follows that we can take

    f = cup · (Pcons · Λ(id × ∈), outr),

and so

    Λsubseq = ([τ · nil, cup · (Pcons · Λ(id × ∈), outr)]).

The final task in implementing Λsubseq is to replace the sets by lists. The result is
simple enough to see if we write e and f in the form

    e = {[]}
    f (a, xs) = {cons (a, x) | x ∈ xs} ∪ xs.

In the implementation of Λsubseq these definitions are replaced by

    e = [[]]
    f (a, xs) = [cons (a, x) | x <- xs] ++ xs.

In other words, we define the implementation subseqs of Λsubseq by

    subseqs = ([wrap · nil, cat · (list cons · cpr, outr)]),

where cpr : list (A × B) <- A × list B (short for "cartesian product, right") is
defined by

    cpr (a, x) = [(a, b) | b <- x].
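
The list-based definition of subseqs transcribes almost literally into Haskell. In the sketch below the catamorphism on cons-lists is expressed with foldr, which is an assumption about how ([e, f]) is rendered in the language; the helper names e and f follow the text.

```haskell
-- cpr pairs a value with every element of a list.
cpr :: (a, [b]) -> [(a, b)]
cpr (a, x) = [ (a, b) | b <- x ]

-- subseqs = ([wrap . nil, cat . (list cons . cpr, outr)]) written with foldr.
subseqs :: [a] -> [[a]]
subseqs = foldr f e
  where
    e      = [[]]                                       -- wrap . nil
    f a xs = map (uncurry (:)) (cpr (a, xs)) ++ xs      -- cat . (list cons . cpr, outr)
```

A list of length n yields 2^n subsequences, with those containing the head listed before those that omit it.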
To justify the definition of subseqs we need the function setify : PA <- list A that
turns a list into the set of its elements:

    setify = ([ω, cup · (τ × id)]),

where ω returns the empty set. With the help of setify we can formalise the
relationship between each set-theoretic operation and the list-theoretic function that
implements it. For instance,

    setify · nil     =  ω
    setify · wrap    =  τ
    setify · cat     =  cup · (setify × setify)
    setify · concat  =  union · setify · list setify
    setify · list f  =  Pf · setify
    setify · cpr     =  Λ(id × ∈) · (id × setify).

Using these identities, it is easy to show by an appeal to fusion that

    setify · subseqs = Λsubseq.

We leave the details as an exercise.
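
Two of these identities can be spot-checked in Haskell. The sketch assumes that finite sets are represented by sorted duplicate-free lists, and it uses the library function subsequences from Data.List as a stand-in for Λsubseq; the names catLaw and subseqLaw are hypothetical. Checking single instances is of course no substitute for the fusion proof.

```haskell
import Data.List (nub, sort, subsequences)

-- Assumed representation of finite sets: sorted lists without duplicates.
setify :: Ord a => [a] -> [a]
setify = nub . sort

-- Union of two sets in this representation.
cup :: Ord a => ([a], [a]) -> [a]
cup (x, y) = setify (x ++ y)

-- The list implementation of Λsubseq from the text, written with foldr.
subseqs :: [a] -> [[a]]
subseqs = foldr (\a xs -> map (a :) xs ++ xs) [[]]

-- setify . cat = cup . (setify x setify), checked at one pair of lists.
catLaw :: Bool
catLaw = setify ([3, 1, 3] ++ [2, 1 :: Int]) == cup (setify [3, 1, 3], setify [2, 1])

-- setify . subseqs = Λsubseq, with subsequences standing in for Λsubseq.
subseqLaw :: Bool
subseqLaw = setify (subseqs "abca") == setify (subsequences "abca")
```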

Cartesian product

The function cpr used above implements one member of an important class of
combinators associated with cartesian product. Two other functions, cpp and cpl,
defined by

    cpp (x, y) = [(a, b) | a <- x, b <- y]
    cpl (x, b) = [(a, b) | a <- x],

implement Λ(∈ × ∈) and Λ(∈ × id), respectively. Thus, by analogy with the identity
for cpr given above,

    setify · cpp = Λ(∈ × ∈) · (setify × setify)
    setify · cpl = Λ(∈ × id) · (setify × id).

All three functions are among the combinators catalogued in the Appendix for useful
point-free programming.

The three functions are examples of a more general pattern. For a relator F, the
function

    cp (F) : PFA <- FPA

is defined by cp (F) = ΛF(∈). In particular, cpp is an implementation of cp (F) for
FA = A × A, and cpl is a similar implementation when FA = A × B.

As another example, consider cp (list), which is described informally by

    cp (list) [x1, x2, ..., xn] = {[a1, a2, ..., an] | ai ∈ xi}.

Since list R = ([nil, cons · (R × id)]), appeal to the Eilenberg-Wright lemma gives

    cp (list) = ([Λ[nil, cons · (∈ × ∈)]]).

Expanding this definition, we obtain

    cp (list) = ([Λnil, Λ(cons · (∈ × ∈))]) = ([τ · nil, Pcons · Λ(∈ × ∈)]).

The function cp (list) is implemented by a function cplist : list (list A) <- list (list A)
obtained by representing sets by lists in the way we have seen above. The result is:

    cplist = ([wrap · nil, list cons · cpp]).

The function cplist is another example of a useful combinator for point-free
programming. We will meet the cp-family again in Section 8.3.
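
The cp-family reads naturally as Haskell list comprehensions. The sketch below transcribes cpp, cpl and cplist, again rendering the list catamorphism with foldr, which is an assumption of the sketch rather than something the text prescribes.

```haskell
-- Cartesian product of two lists.
cpp :: ([a], [b]) -> [(a, b)]
cpp (x, y) = [ (a, b) | a <- x, b <- y ]

-- Cartesian product of a list with a single value on the right.
cpl :: ([a], b) -> [(a, b)]
cpl (x, b) = [ (a, b) | a <- x ]

-- cplist = ([wrap . nil, list cons . cpp]): choose one element from each
-- component list in every possible way.
cplist :: [[a]] -> [[a]]
cplist = foldr (\x ys -> map (uncurry (:)) (cpp (x, ys))) [[]]
```

For example, cplist applied to [[1,2],[3],[4,5]] yields the four lists obtained by picking 1 or 2, then 3, then 4 or 5; if any component is empty, the result is empty.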

Prefix and suffix

The relation prefix describes the prefix relation on lists, so x prefix y when
x ++ z = y for some z. Thus, prefix = outl · cat°. Alternatively, we can define
prefix = init*, where init* is the reflexive transitive closure of the relation
init = outl · snoc° that removes the last element of a list. The reflexive transitive
closure R* of a relation R will be defined formally in the following chapter.

We can also describe prefix by a relational catamorphism:

    prefix = ([nil, nil ∪ cons]).

The first nil in the catamorphism has type list A <- 1, while the second has type
list A <- A × list A. Strictly speaking, we should write the second one in the form
nil · !, where ! : 1 <- A × list A.

Applying the Eilenberg-Wright lemma to the given definition of prefix, we find

    Λprefix = ([τ · nil, cup · (τ · nil, Pcons · Λ(id × ∈))]).

Replacing sets by lists in the usual way, we obtain an implementation of Λprefix by
a function inits defined by

    inits = ([wrap · nil, cat · (wrap · nil, list cons · cpr)]).

This translates to two familiar equations:

    inits []         = [[]]
    inits ([a] ++ x) = [[]] ++ [[a] ++ y | y <- inits x].

Note that inits returns a list of initial segments in increasing order of length.

The relation suffix is dual to prefix, but we have to use snoc-lists to describe it as
a relational catamorphism. Alternatively, suffix = tail*, where tail = outr · cons°
removes the first element from a list. The implementation of Λsuffix is by a function
tails that returns a list of tail segments in decreasing order of length. The two
equations defining tails are

    tails []         = [[]]
    tails (x ++ [a]) = [y ++ [a] | y <- tails x] ++ [[]].

This is not a legitimate implementation in most functional languages. Instead, one
can use the two equations

    tails []         = [[]]
    tails ([a] ++ x) = [[a] ++ x] ++ tails x.

We will see in Section 6.7 how these equations are obtained. Alternatively, tails can
be implemented as a catamorphism on cons-lists:

    tails = ([wrap · nil, extend]),

where

    extend (a, [x] ++ xs) = [[a] ++ x] ++ [x] ++ xs.

We can put inits and tails together to give an implementation of the function Λcat°
that splits a list in all possible ways:

    splits = zip · (inits, tails).

For this implementation to work it is essential that inits and tails order their
segments in opposite ways.
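
The equations for inits, tails and splits can be run as they stand once the catamorphisms are written with foldr. The Haskell sketch below does this; the error clause in extend is an addition of the sketch to make the pattern match total, and is never reached because the accumulated list of tails always contains at least the empty list.

```haskell
-- inits as a catamorphism: initial segments in increasing order of length.
inits :: [a] -> [[a]]
inits = foldr (\a ys -> [] : [ a : y | y <- ys ]) [[]]

-- tails as a catamorphism with extend: tail segments in decreasing order.
tails :: [a] -> [[a]]
tails = foldr extend [[]]
  where
    extend a (x : xs) = (a : x) : x : xs
    extend _ []       = error "tails: empty accumulator (cannot happen)"

-- splits pairs each initial segment with the matching tail segment.
splits :: [a] -> [([a], [a])]
splits x = zip (inits x) (tails x)
```

Because the two lists are ordered in opposite ways, every pair (y, z) returned by splits x satisfies y ++ z == x.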

Partitions

A partition of a list is a decomposition of the list into a list of non-empty contiguous
segments. For instance, the set of partitions of [1,2,3] is

    {[[1],[2],[3]], [[1,2],[3]], [[1],[2,3]], [[1,2,3]]}.

A surprising number of combinatorial problems can be phrased as problems about
partitions, and we will see examples in Chapters 7 and 8. The relation

    partition : list (list⁺ A) <- list A

is defined by partition = concat°, where concat = ([nil, cat]) and cat is restricted to
the type list A <- list⁺ A × list A. One can also express partition as a catamorphism

    partition = ([nil, new ∪ glue]),

where

    new  = cons · (wrap × id)
    glue = cons · (cons × id) · assocl · (id × cons°).

The pointwise definitions are

    new (a, xs)          = [[a]] ++ xs
    glue (a, [x] ++ xs)  = [[a] ++ x] ++ xs.

The definition of partition as a catamorphism thus describes a step-by-step
procedure, where at each step either a new singleton segment is created, or the next
element is 'glued' to the front of the first segment.

The definition of partition by a catamorphism is not as perspicuous as its definition
by the converse of a catamorphism. The definitions can be shown to be equivalent
using a theorem that we will prove in the following chapter. This theorem states that
if R : A <- FA is a surjective relation, and if f : TA <- A satisfies f · R ⊆ α · F(id, f),
where T is the type functor induced by F, then f° = ([R]).

We will apply the theorem with f = concat and R = [nil, new ∪ glue]. We have to
show that

    concat · nil  ⊆ nil
    concat · new  ⊆ cons · (id × concat)
    concat · glue ⊆ cons · (id × concat).

We prove only the third inclusion:

        concat · glue
    =     {definition of glue}
        concat · cons · (cons × id) · assocl · (id × cons°)
    =     {since concat = ([nil, cat])}
        cat · (cons × concat) · assocl · (id × cons°)
    =     {naturality of assocl}
        cat · (cons × id) · assocl · (id × (id × concat) · cons°)
    ⊆     {since concat = ([nil, cat])}
        cat · (cons × id) · assocl · (id × cat°) · (id × concat)
    =     {since cat · (cons × id) = cons · (id × cat) · assocr}
        cons · (id × cat) · assocr · assocl · (id × cat°) · (id × concat)
    ⊆     {since assocr · assocl = id and cat · cat° ⊆ id}
        cons · (id × concat).

We also have to show that [nil, new ∪ glue] is surjective, that is,

    id ⊆ (nil · nil°) ∪ ((new ∪ glue) · (new ∪ glue)°).

We leave it as an exercise to show that

    new · new°   = cons · (wrap · wrap° × id) · cons°
    glue · glue° = cons · (cons · cons° × id) · cons°.

Using these equalities, we can now conclude that

        (nil · nil°) ∪ (new · new°) ∪ (glue · glue°)
    =     {above}
        (nil · nil°) ∪ (cons · (((wrap · wrap°) ∪ (cons · cons°)) × id) · cons°)
    =     {since (wrap · wrap°) ∪ (cons · cons°) = id (on non-empty lists)}
        (nil · nil°) ∪ (cons · cons°)
    =     {since α · α° = id for all initial algebras α}
        id,

which gives the result.

By the Eilenberg-Wright lemma, we obtain

    Λpartition = ([Λnil, Λ((new ∪ glue) · (id × ∈))]).

We can simplify the second term, arguing:

        Λ((new ∪ glue) · (id × ∈))
    =     {power transpose of composition}
        union · PΛ(new ∪ glue) · Λ(id × ∈)
    =     {power transpose of join}
        union · P(cup · (Λnew, Λglue)) · Λ(id × ∈)
    =     {since new is a function}
        union · P(cup · (τ · new, Λglue)) · Λ(id × ∈).

Finally, we can implement Λpartition by a function partitions defined by

    partitions = ([wrap · nil, concat · list (cons · (new, glues)) · cpr]),

where glues implements Λglue:

    glues (a, [])         = []
    glues (a, [x] ++ xs)  = [[[a] ++ x] ++ xs].

The proof of setify · partitions = Λpartition is left as an exercise.
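
The definition of partitions can be transcribed into Haskell as follows. As in the other sketches, foldr stands for the catamorphism on cons-lists, and the local name step is hypothetical; new, glues and cpr follow the pointwise definitions in the text.

```haskell
-- cpr pairs a value with every element of a list.
cpr :: (a, [b]) -> [(a, b)]
cpr (a, x) = [ (a, b) | b <- x ]

-- new starts a fresh singleton segment.
new :: (a, [[a]]) -> [[a]]
new (a, xs) = [a] : xs

-- glues implements Λglue: gluing fails on the empty partition and
-- otherwise has exactly one result.
glues :: (a, [[a]]) -> [[[a]]]
glues (_, [])     = []
glues (a, x : xs) = [(a : x) : xs]

-- partitions = ([wrap . nil, concat . list (cons . (new, glues)) . cpr]).
partitions :: [a] -> [[[a]]]
partitions = foldr step [[]]
  where
    step a ps = concat (map (\ap -> new ap : glues ap) (cpr (a, ps)))
```

Applied to [1,2,3] this produces the four partitions listed at the start of the subsection, and concatenating any result gives back the original list.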

Permutations

Finally, consider the relation perm that holds between two lists if one is a
permutation of the other. There are a number of ways to specify perm; perhaps the
simplest is to use the type bag A of finite bags over A as an intermediate datatype.
This type can be described as a functional F-algebra [bnil, bcons] of the functor
F(A, B) = 1 + A × B. The function bcons satisfies the property that

    bcons (a, bcons (b, x)) = bcons (b, bcons (a, x)),

which in point-free style reads

    bcons · (id × bcons) = bcons · (id × bcons) · exch,

where exch : B × (A × C) <- A × (B × C) is the natural isomorphism

    exch = assocr · (swap × id) · assocl.

The property captures the fact that the order of the elements in a bag is irrelevant
but duplicates do have to be taken into account. The function bagify : bag A <- list A
turns a list into the bag of its elements, and is defined by the catamorphism

    bagify = ([bnil, bcons]).

Since every finite bag is the bag of elements of some list, bagify is a surjective
function, so bagify · bagify° = id.

We can now define perm by

    perm = bagify° · bagify.

In words, x is a permutation of y if the bag of values in x is equal to the bag of
values in y. In particular, it follows at once from the definition that perm = perm°.

The above specification of perm does not lead directly to a functional program for
computing Λperm. One possibility is to express perm as a catamorphism
perm = ([nil, add]) and then follow the path taken with all the examples given above.
It is easy to show (exercise) that

    perm · cons = perm · cons · (id × perm),

so we can take add = perm · cons, although, of course, the result is not useful for
computing perm. An alternative choice for add is the relation

    add = cat · (id × cons) · exch · (id × cat°).

In words, add (a, x) = y ++ [a] ++ z where y ++ z = x, so add (a, x) adds a somewhere
to the list x. Although this definition of add is intuitively straightforward, the proof
that perm = ([nil, add]) depends on the fact that bags can be viewed as an initial
algebra, and we won't go into it. The function Λadd can be implemented as a
function adds defined by

    adds (a, x) = [y ++ [a] ++ z | (y, z) <- splits x],

where splits is the implementation of Acat° described above. The function perms
that implements Aperm is given by

perms =
([wrap nil, list add cpr]).
We will meet perm again in the following chapter when we derive some sorting
algorithms.
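
A minimal Python sketch of splits, adds and perms follows, with sets of results represented as lists. The fold is written as an explicit loop, and the sketch assumes that the step of the catamorphism flattens the lists produced by adds; the exact Python formulation is an assumption for illustration only.

```python
def splits(x):
    """All ways of writing x as y ++ z."""
    return [(x[:i], x[i:]) for i in range(len(x) + 1)]

def adds(a, x):
    """All lists obtained by inserting a somewhere in x."""
    return [y + [a] + z for (y, z) in splits(x)]

def perms(x):
    """Fold over x: start from [[]] and insert each element at every position."""
    result = [[]]                      # the 'wrap . nil' case
    for a in reversed(x):              # a cons-list fold works from the right
        result = [p for y in result for p in adds(a, y)]
    return result
```

For a list of n distinct elements this produces n! lists, so it is useful only as an executable specification.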

Exercises

5.20 Construct functions cup, cap and cross so that

Λ(R ∪ S) = cup · (ΛR, ΛS)
Λ(R ∩ S) = cap · (ΛR, ΛS)
Λ(R × S) = cross · (ΛR × ΛS).

5.21 Prove that

setify · nil = ω
setify · wrap = τ
setify · cons = cup · (τ × setify)
setify · list f = Pf · setify.

5.22 Prove that setify · subseqs = Λsubseq.

5.23 Express subseq as the converse of a catamorphism. (Hint: think about
supersequences.)

5.24 As a function of type list A <- list+ A, the relation init can be defined as a
catamorphism. How?

5.25 Prove that

new · new° = cons · (wrap · wrap° × id) · cons°
glue · glue° = cons · (cons · cons° × id) · cons°.

5.26 Prove that setify · partitions = Λpartition.

5.27 Show that

Λpartition = ([Λnil, cup · (Pnew, union · PΛglue) · Λ(id × ∈)]),

and hence find another implementation of Λpartition.

5.28 Using bagify · bagify° = id, show that perm · cons = perm · cons · (id × perm).

5.29 A list x is an interleaving of two sequences y and z if it can be split into a
series of subsequences, with alternate subsequences extracted from y and z. For
example, [1,10,2,3,11,12,4] is an interleaving of [1,2,3,4] and [10,11,12]. The
relation interleave interleaves two lists nondeterministically. Define interleave as
the converse of a catamorphism.

5.7 Lax natural transformations

As we have seen, reasoning about datatypes in a relational setting makes it
possible to explore properties that are difficult or impossible to express in a functional
setting. On the other hand, some properties that are simple equalities in a
functional setting become inequalities in a relational one. A good example is provided
by natural transformations and lax natural transformations. A lax natural
transformation is like a natural transformation but the naturality condition becomes an
inequation. Formally, a collection of arrows φ_A : FA <- GA is called a lax natural
transformation, and is denoted by φ : F ↩ G, if

FR · φ ⊇ φ · GR (5.13)

for all R. Notice the direction ⊇ of the inclusion, which can be remembered by
relating it to the shape of the hook in ↩. The inclusion can be pictured as

[Diagram: a square with φ : FA <- GA along the top, φ : FB <- GB along the bottom,
and FR and GR down the sides.]

As one example of a lax natural transformation, we have ∈ : id ↩ P; in other
words,

R · ∈ ⊇ ∈ · PR,

for all R. This follows at once from the definition of PR.

The main result concerning lax natural transformations is the following theorem.

Theorem 5.2 Let F, G : A <- B be relators and J : B <- Fun(B) the inclusion of
functions into relations. Then φ : F ↩ G ≡ φ : FJ <- GJ.

Proof. First, assume that φ : F ↩ G, so in particular we have Ff · φ ⊇ φ · Gf. But
also

    Ff · φ ⊆ φ · Gf
≡       {shunting of functions}
    φ · Gf° ⊆ Ff° · φ
≡       {inequation (5.13) with R = f°}
    true,

and so Ff · φ = φ · Gf for all f.

Conversely, assume that φ : FJ <- GJ, so Fg · φ = φ · Gg for all functions g. By
shunting of functions, this gives Fg° · φ ⊇ φ · Gg° since F and G are relators and
thus preserve converse.

Now, we complete the proof by arguing:

    FR · φ
=       {let (f, g) be a tabulation of R}
    F(f · g°) · φ
=       {relators}
    Ff · Fg° · φ
⊇       {above}
    Ff · φ · Gg°
=       {since φ : FJ <- GJ}
    φ · Gf · Gg°
=       {relators}
    φ · G(f · g°)
=       {since f · g° tabulates R}
    φ · GR.
□

Exercises

5.30 Each of the following pairs is related by ⊆ or ⊇. State in each case what the
relationship is:

PR · τ and τ · R

(R × R) · (id, id) and (id, id) · R

cup · (PR × PR) and PR · cup.

Bibliographical Remarks

The notion of a relator was first introduced in (Kawahara 1973a); the concept
then went unnoticed for a long time, until it was reinvented in (Carboni, Kelly,
and Wood 1991). Almost simultaneously, Backhouse and his colleagues started to
write a series of papers that demonstrated the relevance of relators to computing
(Backhouse et al. 1991, 1992; Backhouse and Hoogendijk 1993). The discussion in
this chapter owes much to that work. Our definition of relators is more restrictive
than that of (Mitchell and Scedrov 1993).
Several authors have considered the use of relational products in the context of
program derivation, e.g. (De Roever 1976). The (sometimes excruciating) symbol
manipulations that result from their introduction can be made more attractive by
adapting a graphical notation (Brown and Hutton 1994; Curtis and Lowe 1995).
As already mentioned in the text, relational algebras and catamorphisms were
first employed in (Eilenberg and Wright 1967) to reason about the equivalence
of deterministic and nondeterministic automata. This work was further amplified
in (Goguen and Meseguer 1983). Numerous examples of relational catamorphisms
can be found in (Meertens 1987), and these examples awakened our own interest
in the topic (Bird and De Moor 1993c; De Moor 1992a, 1994). The circuit design
language Ruby is essentially a relational programming language based on
catamorphisms (Jones and Sheeran 1990).

In the context of imperative program derivation, it has been convincingly
demonstrated that predicate transformer semantics are preferable to a relational
framework (Dijkstra 1976; Dijkstra and Scholten 1990). Just as it is possible to lift the
type structure of Fun to Rel, one can also lift the type structure of Rel to an
appropriate category of predicate transformers. The key property that makes this
possible is that every monotonic predicate transformer can be factored as a pair of
relations, in the same way as every relation can be factored in terms of functions
(De Moor 1992b; Gardiner, Martin, and De Moor 1994; Martin 1991; Naumann
1994). Although initial results are promising, there is as yet no definitive evidence
that this will lead to a useful calculus for program derivation.
Chapter 6

Recursive Programs

The recursive programs we have seen so far have all been based, one way or the
other, on catamorphisms. But not all the recursions that arise in programming
are homomorphisms of a datatype. For example, one may want to implement the
converse of a catamorphism, or a divide and conquer scheme.

To set the scene, we begin this chapter with a simple programming problem whose
solution is given by a non-structural recursion. The solutions of a recursion equation
are called its fixed points and we continue with a discussion of some general
properties of fixed points. We then go on to discuss an important class of computations,
called hylomorphisms, that captures most of the recursive programs one is likely to
meet in practice. To illustrate the material, we give applications to the problem of
deriving fast exponentiation, one or two sorting algorithms, and an algorithm for
computing the closure of a relation.

6.1 Digits of a number

The problem in this section is simply to convert a natural number to its decimal
representation. The decimal representation of a nonzero natural number is a list of
digits starting with a nonzero digit. The representation of zero is exceptional in this
respect, in that it is a list with one element, namely zero itself. Having observed
this anomaly, we shall concentrate on deriving an algorithm for converting positive
natural numbers.

The first step is to specify the problem formally. The types involved are four in
number: the type Nat+ of positive natural numbers; the type Digit = {0, 1, ..., 9}
of digits; the type Digit+ = {1, 2, ..., 9} of nonzero digits; and finally the type of
decimal representations, which are non-empty sequences of digits beginning with a
nonzero digit. This last type can be declared as

Decimal ::= wrap Digit+ | snoc (Decimal, Digit).



Thus, ([wrap, snoc], Decimal) is the initial algebra of the functor

FA = Digit+ + (A × Digit).

The function val : Nat+ <- Decimal is a catamorphism

val = ([embed, op]),

where embed : Nat+ <- Digit+ is the inclusion of digits into natural numbers, and
op (n, d) = 10n + d. To check this, let x be the decimal

x = snoc (snoc (wrap d2, d1), d0).

Then we have

val x = 10(10 d2 + d1) + d0 = 10^2 d2 + 10^1 d1 + 10^0 d0.
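
As a sanity check, val can be sketched in Python as a left fold over a digit list. The representation of a Decimal as a non-empty Python list with the most significant digit first is an assumption of this sketch.

```python
from functools import reduce

def val(ds):
    """Value of a decimal given as a non-empty digit list, most significant digit first."""
    embed = ds[0]                        # wrap case: the leading (nonzero) digit
    op = lambda n, d: 10 * n + d         # snoc case: op (n, d) = 10n + d
    return reduce(op, ds[1:], embed)
```

For the three-digit decimal [d2, d1, d0] the fold computes 10(10 d2 + d1) + d0, matching the calculation above.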
We can now specify the function digits, which takes a number and returns its decimal
representation, by

digits ⊆ val°. (6.1)

The use of ⊆ rather than = is necessary because we do not know (at least not yet)
that val° is a function. One should read (6.1) as requiring a functional refinement
of val°. The goal is to synthesise an algorithm from this specification of digits.
As a first step, we expand the definition of val°:

    val°
=       {definition}
    ([embed, op])°
=       {catamorphisms}
    ([embed, op] · F val · [wrap, snoc]°)°
=       {converse}
    [wrap, snoc] · F val° · [embed, op]°
=       {definition of F}
    [wrap, snoc] · (id + val° × id) · [embed, op]°
=       {coproduct}
    [wrap, snoc · (val° × id)] · [embed, op]°
=       {coproduct}
    (wrap · embed°) ∪ (snoc · (val° × id) · op°).

Hence val° satisfies the recursive equation

val° = (wrap · embed°) ∪ (snoc · (val° × id) · op°).

In order to see what relation is given by op° : Nat+ × Digit <- Nat+, we reason:

    op (n, d) = m
≡       {definition of op}
    10n + d = m
≡       {arithmetic and 0 ≤ d < 10}
    n = m div 10 ∧ d = m mod 10.

To obtain the right type for op° we need to ensure that 0 < n in the above
calculation, and this means that we require 10 ≤ m as a precondition. So op° is a
partial function, defined if and only if its argument is at least 10. On the other
hand, embed° is also a partial function, defined if and only if its argument is less
than 10. The join in the recursive equation for val° can therefore be replaced by a
conditional expression, and we obtain

val° m = wrap m,                              if m < 10
       = snoc (val° (m div 10), m mod 10),    otherwise.

As a recursive program, this equation determines val° uniquely. The recursion
terminates on all arguments because m ≥ 10 and n = m div 10 together imply
n < m, and so val° is applied to successively smaller natural numbers. It therefore
follows that val° is a (total) function and we can take digits = val°.

Writing the result in functional programming style, we obtain the program

digits m = [m],                               if m < 10
         = digits (m div 10) ++ [m mod 10],   otherwise.

The program runs in quadratic time because the implementation of snoc on lists
takes linear time. To obtain a linear-time program we can introduce an
accumulation parameter and write digits m = f (m, []), where

f (m, x) = [m] ++ x,                          if m < 10
         = f (m div 10, [m mod 10] ++ x),     otherwise.

Notice that the anomalous case of 0 is treated correctly in the above algorithm.
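
Both programs can be transcribed into Python as a rough sketch. The name `digits_acc` and the use of `collections.deque` for constant-time prepending are assumptions of the sketch; the accumulating version is written as a loop rather than as the tail-recursive function f.

```python
from collections import deque

def digits(m):
    """Direct transcription: append the last digit to the digits of m div 10."""
    if m < 10:
        return [m]
    return digits(m // 10) + [m % 10]

def digits_acc(m):
    """Accumulating version: prepend digits to an accumulator, least significant first."""
    x = deque()
    while m >= 10:
        x.appendleft(m % 10)     # corresponds to [m mod 10] ++ x
        m //= 10
    x.appendleft(m)              # base case: [m] ++ x
    return list(x)
```

Both functions return [0] for the argument 0, which is the anomalous representation mentioned above.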

Simple as it is, the digits of a number example illustrates a basic strategy for
program derivation using a relational calculus. First, a function of interest is specified
as a refinement of some relation R. Then, after due process of manipulation, R is
discovered to be a solution of a certain recursion equation. Finally, the recursion is
used to implement the function. As we shall see, the due process of manipulation
can often be replaced by an appeal to a single theorem.

Exercises

6.1 Justify the final program above.

6.2 Least fixed points

Catamorphisms are defined as the unique fixed points of certain recursion equations
(as are the converses of catamorphisms). Here we are interested in the fact that,
when working in a relational context, one may also consider least fixed points.

The key result for reasoning about least fixed points is the celebrated Knaster-
Tarski theorem (Knaster 1928; Tarski 1955), which in our terminology is as follows:

Theorem 6.1 (Knaster-Tarski) Suppose φ is a monotonic mapping (not
necessarily a functor) on the arrows of a locally complete allegory, taking a relation
X : A <- B to φX : A <- B. Then each of the equations φX ⊆ X and φX = X
has a least solution and these least solutions coincide. Dually, each of the equations
X ⊆ φX and X = φX has a greatest solution and these greatest solutions coincide.

Proof. Let 𝒳 = {X | φX ⊆ X} and define R = ⋂𝒳. We first show that φR ⊆ R,
or, equivalently, that X ∈ 𝒳 implies φR ⊆ X. We reason:

    X ∈ 𝒳
⇒       {definition of R}
    R ⊆ X
⇒       {φ monotonic}
    φR ⊆ φX
⇒       {since φX ⊆ X}
    φR ⊆ X.

But now, since R ∈ 𝒳, it follows that X = R is the least solution of φX ⊆ X. It
remains to prove that R ⊆ φR:

    R ⊆ φR
⇐       {definition of R}
    φ(φR) ⊆ φR
⇐       {since φ monotonic}
    φR ⊆ R
≡       {above}
    true.
□

For brevity we henceforth write (μX : φX) for the least solution of the equation
X = φX.
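
On a finite lattice of sets, the least solution can be computed by iterating φ from the empty set, in the style of Kleene's theorem (Exercise 6.3) rather than by the intersection used in the proof. The following Python sketch assumes that φ is monotonic and that only finitely many sets are involved; the function name `lfp` and the example mapping are hypothetical.

```python
def lfp(phi):
    """Least fixed point of a monotonic phi on finite sets, by iteration from the empty set."""
    x = frozenset()
    while True:
        y = frozenset(phi(x))
        if y == x:               # phi x = x: a fixed point has been reached
            return x
        x = y

# Hypothetical example: phi X = {0} ∪ {n + 2 | n ∈ X, n + 2 < 10}
evens = lfp(lambda X: {0} | {n + 2 for n in X if n + 2 < 10})
```

The iteration produces the ascending chain ∅, {0}, {0, 2}, ... and stops at the set of even numbers below 10.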

Let us now consider what the Knaster-Tarski theorem says about datatypes and
catamorphisms. Recall that ([R]) was defined by the universal property

X = ([R]) ≡ X · α = R · FX,

where F is the base relator of the catamorphism. Because α is an isomorphism, the
equation on the right can also be written as X = R · FX · α°, so X = ([R]) is the
unique (and hence both greatest and least) solution of the equation. Since F is a
relator, the mapping φ defined by φX = R · FX · α° is monotonic. Hence by the
Knaster-Tarski theorem we obtain

([R]) ⊆ X ⇐ R · FX · α° ⊆ X (6.2)
X ⊆ ([R]) ⇐ X ⊆ R · FX · α°. (6.3)

The fusion law for catamorphisms therefore has two variants in which equality is
replaced by inclusion:

([T]) ⊆ S · ([R]) ⇐ T · FS ⊆ S · R (6.4)
S · ([R]) ⊆ ([T]) ⇐ S · R ⊆ T · FS. (6.5)

The proofs of these results are easy exercises.

Exercises

6.2 Where in the proof of the Knaster-Tarski theorem did we use the locally
complete property of the allegory?

6.3 Say that φ is continuous if it preserves joins of ascending chains of relations.
That is, if X0 ⊆ X1 ⊆ X2 ..., then φ(⋃{Xn | 0 ≤ n}) = ⋃{φXn | 0 ≤ n}. Prove
Kleene's theorem (Kleene 1952) which states that, under the conditions of the
Knaster-Tarski theorem and the assumption that φ is continuous,

(μX : φX) = ⋃{φ^n ∅ | 0 ≤ n},

where φ^n X = φ(φ(... (φX))) (n times).

6.4 Use the Knaster-Tarski theorem to justify the following method for showing
(μX : φX) ⊆ A: show that φA ⊆ A.

Use Kleene's theorem to justify the alternative method: show that X ⊆ A implies
φX ⊆ A. This method is called fixed point induction.

6.5 If φ is a monotonic mapping, then the least solution of φX ⊆ X satisfies
φX = X. Show that this is a special case of Lambek's lemma when the partial
order of arrows A <- B is regarded as a category, and φ is regarded as a functor on
this category.

6.6 Prove (6.4) and (6.5).

6.7 Prove that ([R]) ⊆ ([S]) follows from R · F([S]) ⊆ S · F([S]). Give an example
where it is not true that R ⊆ S, but that, nevertheless, ([R]) ⊆ ([S]).

6.8 An arrow is said to be difunctional if R = R · R° · R. The difunctional closure of
an arbitrary arrow R is the least difunctional relation that contains R. Construct
the difunctional closure of R as a least fixed point.

6.3 Hylomorphisms
The composition of a catamorphism with the converse of a catamorphism is called
a hylomorphism. Thus hylomorphisms are expressions of the form ([R]) · ([S])°.
Hylomorphisms are important because they capture the idea of using an intermediate
data structure in the solution of a problem.

More precisely, suppose that R : A <- FA and also that S : B <- FB. Then we have
([R]) · ([S])° : A <- B, where ([R]) : A <- T and ([S])° : T <- B, and where T is the
initial type of F. The type T is the intermediate data structure.

Practically every relation of interest can be expressed as a hylomorphism. Since
([α]) = id, all catamorphisms and converses of catamorphisms are themselves
examples of hylomorphisms. We will see many other examples in due course.

Hylomorphisms can be characterised as least fixed points. More precisely, the
following theorem holds:

Theorem 6.2 Suppose that R : A <- FA and S : B <- FB are two F-algebras.
Then ([R]) · ([S])° : A <- B is given by

([R]) · ([S])° = (μX : R · FX · S°).

Proof. First we show that ([R]) · ([S])° is a fixed point:

    R · F(([R]) · ([S])°) · S°
=       {functors}
    R · F([R]) · F([S])° · S°

=       {catamorphisms}
    ([R]) · α · F([S])° · S°
=       {converse; catamorphisms}
    ([R]) · ([S])°.

Second, we show that

([R]) · ([S])° ⊆ X ⇐ R · FX · S° ⊆ X

and appeal to Knaster-Tarski. The proof makes use of the division operation, a
typical strategy in reasoning about least fixed points:

    ([R]) · ([S])° ⊆ X
≡       {division}
    ([R]) ⊆ X/([S])°
⇐       {Knaster-Tarski, and equation (6.2)}
    R · F(X/([S])°) · α° ⊆ X/([S])°
≡       {division}
    R · F(X/([S])°) · α° · ([S])° ⊆ X
≡       {catamorphisms}
    R · F(X/([S])°) · F([S])° · S° ⊆ X
⇐       {functors and division cancellation}
    R · FX · S° ⊆ X.

When FX = GX + HX, so F-algebras are coproducts, we can appeal to the following
corollary of Theorem 6.2:

Corollary 6.1

([R1, R2]) · ([S1, S2])° = (μX : (R1 · GX · S1°) ∪ (R2 · HX · S2°)).

Proof. We reason:

    [R1, R2] · (GX + HX) · [S1, S2]°
=       {coproduct}
    [R1 · GX, R2 · HX] · [S1, S2]°
=       {coproduct}
    (R1 · GX · S1°) ∪ (R2 · HX · S2°).

Theorem 6.2, henceforth called the hylomorphism theorem, can be read as
representing a prototypical 'divide and conquer' scheme. The term S° represents the
decomposition stage, FX represents the stage of solving the subproblems
recursively, and R represents the recombination stage. We will see applications in the
next section and in Section 6.6.
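
For functions (rather than relations) and the particular functor FX = 1 + (A × X), the divide and conquer reading can be sketched in Python. The helper name `hylo`, its argument names, and the encoding of the '1' summand as `None` are assumptions of this illustration.

```python
def hylo(combine, base, decompose):
    """Solve h = R . F h . S° for F X = 1 + (A x X), with functions in place of relations.

    decompose(b) plays the role of S°: it returns None (the '1' summand)
    or a pair (a, b') consisting of a value and a smaller subproblem.
    [base, combine] plays the role of R.
    """
    def h(b):
        step = decompose(b)
        if step is None:
            return base
        a, rest = step
        return combine(a, h(rest))     # F h solves the subproblem recursively
    return h

# Factorial: the intermediate list [n, n-1, ..., 1] is never built explicitly.
factorial = hylo(lambda a, r: a * r, 1, lambda n: None if n == 0 else (n, n - 1))
```

Termination here depends on decompose producing strictly smaller subproblems, which is the concern taken up under unique fixed points in Section 6.5.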

Exercises

6.9 Specify the function that converts the binary representation of a number to its
octal representation as a hylomorphism.
6.10 Show that hylomorphisms preserve simplicity: if R is simple and S is simple,
then ([R]) · ([S°])° is simple.

6.4 Fast exponentiation and modulus computation


Consider the problem of computing a^b for natural numbers a and b. The curried
function exp : (Nat <- Nat) <- Nat is defined by the catamorphism

exp a = ([one, mult a]).

This definition encapsulates the two equations a^0 = 1 and a^(b+1) = a × a^b. The
computation of exp a b by the catamorphism takes O(b) steps, but by using a divide
and conquer scheme we can improve the running time to O(log b) steps.

To derive the fast exponentiation algorithm consider the type Bin of binary
numbers, defined by Bin = listl Bit, where Bit = {0, 1}. For example, as an element of
Bin the number 6 is given as [1,1,0]. The partial function convert : Nat <- Bin
converts a well-formed binary number, that is, a sequence of bits that is either empty
or begins with a 1, into a natural number and is given by a snoc-list catamorphism

convert = ([zero, shift]),

where shift : Nat+ <- Nat × Bit is given by shift (n, d) = 2 × n + d.

Now we can argue:

    exp a
⊇       {since convert is simple}
    exp a · convert · convert°
=       {fusion, see below, with op a (n, d) = (d = 0 → n^2, a × n^2)}
    ([one, op a]) · convert°

=       {corollary to hylomorphism theorem}
    (μX : (one · zero°) ∪ (op a · (X × id) · shift°)).

The fusion step is justified by the equations

exp a · zero = one
exp a · shift = op a · (exp a × id).

By construction, zero and shift have disjoint ranges, so we can proceed as in the
digits of a number example and replace the join by a conditional expression. The
result is the following program for computing exp a b:

exp a b = 1,                                  if b = 0
        = op a (exp a (b div 2), b mod 2),    otherwise.
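
The resulting program transcribes directly into Python. This is a sketch for checking the equations; passing a as an explicit first argument of `op` is a choice of the sketch.

```python
def op(a, n, d):
    """op a (n, d) = (d = 0 -> n^2, a * n^2)."""
    return n * n if d == 0 else a * n * n

def exp(a, b):
    """Compute a^b using O(log b) multiplications."""
    if b == 0:
        return 1
    return op(a, exp(a, b // 2), b % 2)
```

Each recursive call halves b, so the depth of the recursion is the number of bits in the binary representation of b.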

Modulus computation

Exactly the same idea can be used to compute a mod b for natural a and positive
natural b. The curried function mod : (Nat <- Nat) <- Nat+ is defined by the
catamorphism

mod b = ([zero, succ b]),

where succ b a = (a = b − 1 → 0, a + 1). The computation of a mod b by this
method takes O(a) steps. But, as before, we can argue:

    mod b
⊇       {since convert is simple}
    mod b · convert · convert°
=       {fusion, see below}
    ([zero, op b]) · convert°
=       {hylomorphisms}
    (μX : (zero · zero°) ∪ (op b · (X × id) · shift°)).

The fusion step is justified by the equations

mod b · zero = zero
mod b · shift = op b · (mod b × id),

where op b (r, d) = (n ≥ b → n − b, n) and n = 2 × r + d.

The result is the program

mod b a = 0,                            if a = 0
        = op b (mod b (a div 2), 0),    if even a
        = op b (mod b (a div 2), 1),    if odd a.

The running time is O(log a) steps.
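
A Python transcription of the modulus program, offered as a sketch: the two cases for even and odd a are merged by passing a mod 2 as the bit d, and b is an explicit first argument of `op`.

```python
def op(b, r, d):
    """op b (r, d) = (n >= b -> n - b, n) where n = 2 * r + d."""
    n = 2 * r + d
    return n - b if n >= b else n

def mod(b, a):
    """Compute a mod b for natural a and positive b, in O(log a) steps."""
    if a == 0:
        return 0
    return op(b, mod(b, a // 2), a % 2)
```

Since the recursive result r satisfies r < b, the value n = 2 × r + d is at most 2b − 1, so a single subtraction suffices.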
These simple exercises demonstrate how divide and conquer schemes can be
introduced by invoking a suitable intermediate datatype.

Exercises

6.11 Why do the programs for exponentiation and modulus terminate and deliver
functions?

6.5 Unique fixed points


The hylomorphism theorem states that a hylomorphism is the least fixed point of a
certain recursion equation. However, it is not necessarily the only fixed point. To
illustrate, consider the hylomorphism

X = ([zero, id]) · ([zero, positive])°

on natural numbers, where positive is the coreflexive that holds only on positive
integers (so positive = succ · succ°). The catamorphism ([zero, id]) describes the
constant function that always returns zero, and ([zero, positive]) describes the
coreflexive that holds only on zero. Hence X is again the coreflexive that holds only on
zero. However, the recursion equation corresponding to the hylomorphism is

X = [zero, id] · (id + X) · [zero, positive]°,

which simplifies to X = (zero · zero°) ∪ (X · positive). This equation has other
solutions, including X = id.
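
The two solutions can be checked mechanically on a finite prefix of the natural numbers. In this Python sketch relations are modelled as sets of pairs, and the prefix bound and the names `rhs`, `only_zero` and `identity` are assumptions made for illustration.

```python
N = range(6)                      # a finite prefix of the naturals, for illustration

def rhs(X):
    """(zero . zero°) ∪ (X . positive), with relations as sets of pairs."""
    return {(0, 0)} | {(a, b) for (a, b) in X if b > 0}

only_zero = {(0, 0)}              # the coreflexive that holds only on zero
identity = {(n, n) for n in N}    # the identity relation
```

Both `only_zero` and `identity` are mapped to themselves by `rhs`, and `only_zero` is the smaller of the two.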

Note also that [zero, positive]° : 1 + Nat <- Nat is a function, as is [zero, id], but the
least solution of the recursion is not even entire. However, Exercise 6.10 shows that
if R and S are simple relations, then so is (μX : R · FX · S).

It is important to know when a recursion equation X = R · FX · S° has a unique
solution, and when the solution is a function. It is not sufficient for R and S° to
be functions, as we saw above. The condition is simple to state: we need the fact
that member (F) · S° is an inductive relation, where member (F)_A : A <- FA is the
membership relation for the datatype FA. The two sections that follow explain the
essential ideas without going into full details.

Inductive relations

Basically, a relation is inductive if one can use it to perform induction. Formally,
R : A <- A is inductive if

R\X ⊆ X ⇒ Π ⊆ X

for all X : A <- B. At the point level, this definition says that if

(∀c : cRa ⇒ cXb) ⇒ aXb

holds for all a and b, then X holds everywhere.

To take a simple example, let < be the usual ordering on natural numbers. For
fixed b, the implication

(∀a : (∀c : c < a ⇒ cXb) ⇒ aXb) ⇒ (∀a : aXb)

asserts the general principle of mathematical induction for natural numbers, in
which the role of an arbitrary proposition involving a is played by the expression
aXb. As another example, take the relation tail = outr · cons°. The induction
principle here is that if a relation holds for a list x whenever it holds for tail x, then
it holds for every list.

A key result is that if S is inductive and R · R ⊆ R · S, then R is also inductive. This
result is left as an instructive exercise in division. It follows that if S is inductive
and R ⊆ S, then R is inductive. It also follows that S is inductive if and only
if S+ is, where S+ is the transitive closure of S. This relation can be defined by
S+ = (μX : S ∪ (X · S)). The reflexive transitive closure S* is the subject of a
separate section given below.
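
For a finite relation, the least fixed point defining S+ can be computed by iteration. The Python sketch below models a relation as a set of pairs (a, b) meaning a S b; the function names are assumptions of the sketch.

```python
def compose(R, S):
    """R . S: a (R . S) c holds when a R b and b S c for some b."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

def transitive_closure(S):
    """Least X with X = S ∪ (X . S), found by iterating from the empty relation."""
    X = set()
    while True:
        Y = S | compose(X, S)
        if Y == X:
            return X
        X = Y

# pred on {0, ..., 4}: n pred (n + 1)
pred = {(n, n + 1) for n in range(4)}
```

Applied to `pred`, the closure is the strict ordering < restricted to {0, ..., 4}.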

There is another way to define the notion of an inductive relation, but it requires
the allegory to be Boolean. A relation R : A <- A is well-founded if

X ⊆ X · R ⇒ X ⊆ ∅

for all X : B <- A. This corresponds to the set-theoretic notion that there are no
infinite chains a0, a1, ... such that a(i+1) R a(i) for all i ≥ 0. If a relation is inductive,
then it is also well-founded, but the converse holds only in a Boolean allegory.

Membership

The other key idea is membership. Data types record the presence of elements, so
one would expect a relator F to come equipped with a membership arrow member_A :
A <- FA for each A, such that a member x precisely when a is an element of x. In
fact, not all relators do possess a membership relation, though fortunately those
relators that arise in programming (the polynomial relators, the power relator, and
the type relators) do. Here are the membership relations for the polynomial relators,
in which we write member (F) to emphasise the dependence on F:

member (id) = id
member (K_A) = ∅
member (F + G) = [member (F), member (G)]
member (F × G) = (member (F) · outl) ∪ (member (G) · outr)
member (F · G) = member (G) · member (F).

Most of these are intuitively obvious, given the informal idea of membership. For
example, in Rel the relator FA = A × A returns pairs of elements and x is a member
of a pair (y, z) if x = y or x = z. On the other hand the constant relator K_A(B) = A
records no elements from B, so its membership relation is the empty relation.

The membership relation for the power relator is ∈, as one would expect. That
leaves the membership relation for type relators. In a power allegory, the problem
of defining the membership relation for a type functor T is the same problem as
defining setify for the type. We have

member (T) = ∈ · setify (T)
setify (T) = Λmember (T).

There is an alternative method (see Exercise 6.17) for defining the membership
relation of type functors that does not depend on sets.

So far we have not said what it means for a relation to be a membership relation.
One might expect that the formal definition would be straightforward, but in fact
it is not and we will not discuss it in the text (but see Exercise 6.18). If F does have
a membership relation member, then

R · member ⊇ member · FR

for all R, so member is a lax natural transformation member : id ↩ F. In fact,
member is (provided it exists) the largest lax natural transformation with this
type. It follows that membership relations, if they exist, are unique.

Consequences

The central result about the existence of inductive relations is that member (F) · α°
is inductive, where α is the initial F-algebra. For example, consider the initial type
([zero, succ], Nat) of the functor FX = 1 + X. The membership relation here is
[∅, id], so we now know that

[∅, id] · [zero, succ]° = succ°
is inductive. Furthermore, < is the relation pred+, where pred = succ°, so this
relation is also inductive. This remark justifies the termination of the recursion in
the digits of a number example.

As a second example, take lists. The membership relation is [∅, outr], so

[∅, outr] · [nil, cons]° = outr · cons°

is inductive. Since tail = outr · cons° we obtain that tail+, the proper suffix relation,
is inductive. With snoc-lists, init and the proper prefix relation are both inductive.

The theorem referred to earlier about unique solutions is the following one.

Theorem 6.3 If member (F) · S is inductive, then the equation X = R · FX · S
has a unique solution X = φ(R, S). Moreover, φ(R, S) is entire if both R and S are
entire.

Proof. For a full proof see (Doornbos and Backhouse 1995).
□

Corollary 6.2 Suppose member (F) · g is inductive. Then the unique solution of
X = f · FX · g is a function.

Proof. The unique solution is X = ([f]) · ([g°])°, which is entire by the theorem,
since f and g are. But Exercise 6.10 shows that the solution is also simple, since f
and g are.

For the next result, recall that R is surjective if id ⊆ R · R°. Thus, R is surjective
if and only if R° is entire.

Corollary 6.3 If member (F) · R° is inductive, then ([R]) is surjective if R is.

Proof. X = α · FX · R° has the unique solution X = ([R])°.
□

Using these results, we can now prove the theorem used in Section 5.6 to justify the
definition of partition as a catamorphism.

Theorem 6.4 If R is surjective and f · R ⊆ α · Ff, then f° = ([R]).

Proof. In one direction we argue:

    ([R]) ⊆ f°
≡       {shunting and ([α]) = id}
    f · ([R]) ⊆ ([α])
⇐       {fusion}
    f · R ⊆ α · Ff
⇐       {assumption}
    true.

In the other direction we argue:

    f° ⊆ ([R])
⇐       {claim: ([R]) is surjective}
    ([R]) · ([R])° · f° ⊆ ([R])
⇐       {since f · ([R]) ⊆ id from above; converse}
    true.

By Corollary 6.3 the claim follows by showing that member · R° is inductive. But

    member · R°
⊆       {since f · R ⊆ α · Ff, shunting}
    member · Ff° · α° · f
⊆       {since member : id ↩ F}
    f° · member · α° · f.

Now, by Exercise 6.16, f° · member · α° · f is inductive because member · α° is.
Finally, any relation included in an inductive relation is inductive, so member · R°
is inductive.

Exercises

6.12 Prove that R is inductive if and only if the equation X = R\X has a unique
solution.

6.13 Prove that if S is inductive and R · R ⊆ R · S, then R is inductive.

6.14 Is the empty relation ∅ inductive? What about Π?

6.15 Show that the meet of two inductive relations is again inductive. Give a
counter-example to show that the join of two inductive relations need not be
inductive.

6.16 Show that if R is well-founded, then so is f° · R · f for any function f.

6.17 Define inlist : A <- list+ A as a catamorphism. Why can't inlist : A <- list A
also be defined as a catamorphism? An arbitrary element of a list can be found by
taking the first element of an arbitrary suffix, thus we can define inlist = head · tail*.
Show how this definition can be generalised to define intree, where

tree A ::= tip A | bin (tree A, tree A).

How about

tree A ::= null | fork (tree A, A, tree A) ?

6.18 The formal definition of membership is this: a collection of arrows member is
a membership relation of F if

FR · (member\id) = member\R

for all R. Show that F has a membership relation member if and only if
FR · (member\S) = member\(R · S) for all R and S.

6.19 Assume that id is the largest lax natural transformation of type id ↩ id, and
that relator F has a membership relation member. Show that member is the largest
lax natural transformation of type id ↩ F.

6.20 Prove that for any relators F and G, the relation member (F)\member (G) is
the largest lax natural transformation of type F ↩ G.

6.21 Prove that in a Boolean allegory member (F) is entire if and only if F∅ = ∅.

6.6 Sorting by selection

The problem of sorting is an interesting one because of the variety of approaches
one can take. One can head for a catamorphism, the converse of a catamorphism, or
various hylomorphisms using different intermediate datatypes. We will concentrate
on just two sorting algorithms that depend on selection for their operation.

The function sort : list A <- list A sorts a given list under a connected preorder
R : A <- A. A relation R is said to be connected if R ∪ R° = Π; the normal
terminology is total but this invites confusion with the quite different idea of an
entire relation. The function sort is specified by

sort ⊆ ordered · perm, (6.6)



where perm was defined in the preceding chapter, and ordered is the coreflexive
that tests whether a list is ordered under R.

If the relation R is a linear order (that is, a connected anti-symmetric preorder), then
ordered · perm is a function and (6.6) determines sort uniquely, but we assume
only that R is a connected preorder, so the use of refinement is necessary. Strictly
speaking, we should parameterise both sort and ordered with the relation R, but
for this section it is simplest to assume that R is fixed.

We can define ordered as a relational catamorphism

ordered = ([nil, cons · ok]),

where the coreflexive ok is defined by the corresponding predicate

ok (a, x) = (∀b : b inlist x : aRb).

The relation inlist : A <- list A is the membership relation for lists. Thus ordered
rebuilds its argument list, ensuring at each step that only smaller elements are
added to the front. There is an alternative definition of ok, namely,

ok (a, x) = (x = [] ∨ aR(head x)),

but this definition turns out not to be so useful for our purposes.
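
Read as a predicate, ordered can be sketched in Python with the preorder R passed as a two-argument function. Treating the coreflexive as a Boolean test, and the parameter name `R`, are assumptions of the sketch.

```python
def ok(R, a, x):
    """ok (a, x): a is R-related to every element of x."""
    return all(R(a, b) for b in x)

def ordered(R, xs):
    """A list is ordered when ok holds for each element and the list that follows it."""
    return all(ok(R, xs[i], xs[i + 1:]) for i in range(len(xs)))
```

With `R = lambda a, b: a <= b` this is the usual test for a list in ascending order; it takes quadratic time, which is acceptable for a specification.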

Selection sort

In outline, the derivation of selection sort is as follows:

    ordered · perm
=       {since perm = perm° and ordered = ordered°}
    (perm · ordered)°
=       {since ordered = ([nil, cons · ok])}
    (perm · ([nil, cons · ok]))°
⊇       {fusion, for an appropriate relation select}
    ([nil, select°])°.

In selection sort we head for an algorithm expressed as the converse of a
catamorphism. The proviso for the fusion step is

perm · cons · ok ⊇ select° · (id × perm)

and the following calculation shows how select may be constructed:

perm cons ok
=
{since ([nil, perm cons]) (Section 5.6)}
perm =

perm cons (id x perm) ok


-

=
{claim: (id x perm) ok ok (zrf x perm) (Exercise 6.22)}
=

perm cons ok (zrf x perm)


D {specifying select C ok cons0 perm}
select0 (id x perm).

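In words, select is defined by the rule that if $(a, y) = \mathit{select}\ x$, then $[a] {+\!\!+} y$ is a permutation of $x$ with $aRb$ for all elements $b$ of $y$. The relation select is not a function because it is undefined on the empty list. But we do want it to be a function on non-empty lists. Suppose we can find base and step so that

$$(\![\mathit{base},\ \mathit{step}]\!) \cdot \mathit{embed} \subseteq \mathit{ok} \cdot \mathit{cons}^\circ \cdot (\![\mathit{nil},\ \mathit{perm} \cdot \mathit{cons}]\!),$$

where $\mathit{embed} : \mathit{list}^+ A \leftarrow \mathit{list}\ A$ converts a non-empty element of $\mathit{list}\ A$ to an element of $\mathit{list}^+ A$. Then we can take $\mathit{select} = (\![\mathit{base},\ \mathit{step}]\!) \cdot \mathit{embed}$.

The functions base and step are specified by the fusion conditions:

$$
\begin{array}{rcl}
\mathit{base} & \subseteq & \mathit{ok} \cdot \mathit{cons}^\circ \cdot \mathit{perm} \cdot \mathit{wrap} \\
\mathit{step} \cdot (\mathit{id} \times \mathit{ok} \cdot \mathit{cons}^\circ) & \subseteq & \mathit{ok} \cdot \mathit{cons}^\circ \cdot \mathit{perm} \cdot \mathit{cons}.
\end{array}
$$

These conditions are satisfied by taking

$$
\begin{array}{rcl}
\mathit{base}\ a & = & (a, [\,]) \\
\mathit{step}\,(a, (b, x)) & = & \begin{cases} (a, [b] {+\!\!+} x), & \text{if } aRb \\ (b, [a] {+\!\!+} x), & \text{otherwise.} \end{cases}
\end{array}
$$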

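We leave details as an exercise. Finally, appeal to the hylomorphism theorem gives that $X = (\![\mathit{nil},\ \mathit{select}^\circ]\!)^\circ$ is the unique solution of the equation

$$X = (\mathit{nil} \cdot \mathit{nil}^\circ) \cup (\mathit{cons} \cdot (\mathit{id} \times X) \cdot \mathit{select}),$$

so we can implement sort by

$$
\mathit{sort}\ x = \begin{cases} [\,], & \text{if } x = [\,] \\ [a] {+\!\!+} \mathit{sort}\ y, & \text{otherwise} \end{cases}
\qquad \text{where } (a, y) = \mathit{select}\ x.
$$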
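The program above can be transcribed almost directly. The following Haskell sketch is one possible rendering, assuming that the connected preorder R is passed as a Boolean function `r`; `select` follows the base and step cases given earlier, and the name `selectionSort` is ours.

```haskell
-- select as a fold over a non-empty list: the last element plays the
-- role of `base`, and each earlier element is combined by `step`.
-- Assumption: r a b decides a R b for a connected preorder R.
select :: (a -> a -> Bool) -> [a] -> (a, [a])
select _ [a]     = (a, [])                     -- base a = (a, [])
select r (a : x) = step (select r x)
  where
    step (b, y)
      | r a b     = (a, b : y)                 -- a is the new minimum
      | otherwise = (b, a : y)                 -- keep the old minimum
select _ []      = error "select: undefined on the empty list"

-- sort x = [] if x = [], and [a] ++ sort y otherwise,
-- where (a, y) = select x
selectionSort :: (a -> a -> Bool) -> [a] -> [a]
selectionSort _ [] = []
selectionSort r x  = a : selectionSort r y
  where (a, y) = select r x
```

Each call to `select` traverses the remaining list once, so this sketch takes quadratic time.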

Quicksort

The so-called 'advanced' sorting algorithms (quicksort, mergesort, heapsort, and so on) all use some form of tree as an intermediate datatype. Here we sketch the development of Hoare's quicksort (Hoare 1962), which follows the path of selection sort quite closely.

Consider the type tree A defined by

$$\mathit{tree}\ A ::= \mathit{null} \mid \mathit{fork}\,(\mathit{tree}\ A, A, \mathit{tree}\ A).$$

The function $\mathit{flatten} : \mathit{list}\ A \leftarrow \mathit{tree}\ A$ is defined by

$$\mathit{flatten} = (\![\mathit{nil},\ \mathit{join}]\!),$$

where $\mathit{join}\,(x, a, y) = x {+\!\!+} [a] {+\!\!+} y$. Thus flatten produces a list of the elements in a tree in left to right order.
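For readers who prefer to see the datatype and its catamorphism as code, here is a minimal Haskell sketch. The constructor names `Null` and `Fork` and the helper `foldTree` are our own choices for illustration.

```haskell
-- tree A ::= null | fork (tree A, A, tree A)
data Tree a = Null | Fork (Tree a) a (Tree a)

-- The catamorphism ([c, f]) on trees: replace Null by c and Fork by f.
foldTree :: b -> (b -> a -> b -> b) -> Tree a -> b
foldTree c _ Null         = c
foldTree c f (Fork l a r) = f (foldTree c f l) a (foldTree c f r)

-- flatten = ([nil, join]), where join (x, a, y) = x ++ [a] ++ y
flatten :: Tree a -> [a]
flatten = foldTree [] join
  where join x a y = x ++ [a] ++ y
```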

In outline, the derivation of quicksort is

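$$
\begin{array}{ll}
 & \mathit{ordered} \cdot \mathit{perm} \\
\supseteq & \{\text{since } \mathit{flatten} \text{ is a function}\} \\
 & \mathit{ordered} \cdot \mathit{flatten} \cdot \mathit{flatten}^\circ \cdot \mathit{perm} \\
= & \{\text{claim: } \mathit{ordered} \cdot \mathit{flatten} = \mathit{flatten} \cdot \mathit{inordered} \text{ (see below)}\} \\
 & \mathit{flatten} \cdot \mathit{inordered} \cdot \mathit{flatten}^\circ \cdot \mathit{perm} \\
= & \{\text{converses}\} \\
 & \mathit{flatten} \cdot (\mathit{perm} \cdot \mathit{flatten} \cdot \mathit{inordered})^\circ \\
\supseteq & \{\text{fusion, for an appropriate definition of } \mathit{split}\} \\
 & \mathit{flatten} \cdot (\![\mathit{nil},\ \mathit{split}^\circ]\!)^\circ.
\end{array}
$$

In quicksort we head for an algorithm expressed as a hylomorphism using trees as an intermediate datatype.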

The coreflexive inordered on trees is defined by

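$$\mathit{inordered} = (\![\mathit{null},\ \mathit{fork} \cdot \mathit{check}]\!)$$

where the coreflexive check holds for $(x, a, y)$ if

$$(\forall b : b\ \mathit{intree}\ x \Rightarrow bRa) \land (\forall b : b\ \mathit{intree}\ y \Rightarrow aRb).$$

The relation intree is the membership test for trees. Introducing $\mathsf{F}f = f \times \mathit{id} \times f$ for brevity, the proviso for the fusion step in the above calculation is

$$\mathit{split}^\circ \cdot \mathsf{F}(\mathit{perm} \cdot \mathit{flatten}) \subseteq \mathit{perm} \cdot \mathit{flatten} \cdot \mathit{fork} \cdot \mathit{check}.$$

To establish this condition we need the coreflexive check' that holds for $(x, a, y)$ if

$$(\forall b : b\ \mathit{inlist}\ x \Rightarrow bRa) \land (\forall b : b\ \mathit{inlist}\ y \Rightarrow aRb).$$

Thus check' is similar to check except for the switch to lists.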

We now reason:

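$$
\begin{array}{ll}
 & \mathit{perm} \cdot \mathit{flatten} \cdot \mathit{fork} \cdot \mathit{check} \\
= & \{\text{catamorphisms, since } \mathit{flatten} = (\![\mathit{nil},\ \mathit{join}]\!)\} \\
 & \mathit{perm} \cdot \mathit{join} \cdot \mathsf{F}\mathit{flatten} \cdot \mathit{check} \\
= & \{\text{claim: } \mathsf{F}\mathit{flatten} \cdot \mathit{check} = \mathit{check}' \cdot \mathsf{F}\mathit{flatten}\} \\
 & \mathit{perm} \cdot \mathit{join} \cdot \mathit{check}' \cdot \mathsf{F}\mathit{flatten} \\
= & \{\text{claim: } \mathit{perm} \cdot \mathit{join} = \mathit{perm} \cdot \mathit{join} \cdot \mathsf{F}\mathit{perm}\} \\
 & \mathit{perm} \cdot \mathit{join} \cdot \mathsf{F}\mathit{perm} \cdot \mathit{check}' \cdot \mathsf{F}\mathit{flatten} \\
= & \{\text{claim: } \mathsf{F}\mathit{perm} \cdot \mathit{check}' = \mathit{check}' \cdot \mathsf{F}\mathit{perm}; \text{ functors}\} \\
 & \mathit{perm} \cdot \mathit{join} \cdot \mathit{check}' \cdot \mathsf{F}(\mathit{perm} \cdot \mathit{flatten}) \\
\supseteq & \{\text{taking } \mathit{split} \subseteq \mathit{check}' \cdot \mathit{join}^\circ \cdot \mathit{perm}\} \\
 & \mathit{split}^\circ \cdot \mathsf{F}(\mathit{perm} \cdot \mathit{flatten}).
\end{array}
$$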

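Formal proofs of the three claims are left as exercises. In words, split is defined by the rule that if $(y, a, z) = \mathit{split}\ x$, then $y {+\!\!+} [a] {+\!\!+} z$ is a permutation of $x$ with $bRa$ for all $b$ in $y$ and $aRb$ for all $b$ in $z$. As in the case of selection sort, we can implement split with a catamorphism on non-empty lists:

$$\mathit{split} = (\![\mathit{base},\ \mathit{step}]\!) \cdot \mathit{embed}.$$

The fusion conditions are:

$$
\begin{array}{rcl}
\mathit{base} & \subseteq & \mathit{check}' \cdot \mathit{join}^\circ \cdot \mathit{perm} \cdot \mathit{wrap} \\
\mathit{step} \cdot (\mathit{id} \times \mathit{check}' \cdot \mathit{join}^\circ) & \subseteq & \mathit{check}' \cdot \mathit{join}^\circ \cdot \mathit{perm} \cdot \mathit{cons}.
\end{array}
$$

These conditions are satisfied by taking

$$
\begin{array}{rcl}
\mathit{base}\ a & = & ([\,], a, [\,]) \\
\mathit{step}\,(a, (x, b, y)) & = & \begin{cases} ([a] {+\!\!+} x, b, y), & \text{if } aRb \\ (x, b, [a] {+\!\!+} y), & \text{otherwise.} \end{cases}
\end{array}
$$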

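Finally, appeal to the hylomorphism theorem gives that $X = \mathit{flatten} \cdot (\![\mathit{nil},\ \mathit{split}^\circ]\!)^\circ$ is the least solution of the equation

$$X = (\mathit{nil} \cdot \mathit{nil}^\circ) \cup (\mathit{join} \cdot (X \times \mathit{id} \times X) \cdot \mathit{split}).$$

Hence sort can be implemented by

$$
\mathit{sort}\ x = \begin{cases} [\,], & \text{if } x = [\,] \\ \mathit{sort}\ y {+\!\!+} [a] {+\!\!+} \mathit{sort}\ z, & \text{otherwise} \end{cases}
\qquad \text{where } (y, a, z) = \mathit{split}\ x.
$$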
The derivation of quicksort is thus very similar to that of selection sort except for
the introduction of trees as an intermediate datatype.
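A Haskell sketch of the resulting program is given below, again assuming that R is passed as a Boolean function `r`. Because split is a fold over a non-empty cons-list, the pivot in this rendering is the last element of the list; that is a consequence of the base and step cases above rather than a design decision of ours. The intermediate tree never appears explicitly, since the hylomorphism builds and flattens it in one recursion.

```haskell
-- split as a fold over a non-empty list (base and step as in the text).
-- Assumption: r a b decides a R b for a connected preorder R.
split :: (a -> a -> Bool) -> [a] -> ([a], a, [a])
split _ [a]     = ([], a, [])                  -- base a = ([], a, [])
split r (a : x) = step (split r x)
  where
    step (y, b, z)
      | r a b     = (a : y, b, z)              -- a goes left of the pivot b
      | otherwise = (y, b, a : z)              -- a goes right of the pivot b
split _ []      = error "split: undefined on the empty list"

-- sort x = [] if x = [], and sort y ++ [a] ++ sort z otherwise,
-- where (y, a, z) = split x
quicksort :: (a -> a -> Bool) -> [a] -> [a]
quicksort _ [] = []
quicksort r x  = quicksort r y ++ [a] ++ quicksort r z
  where (y, a, z) = split r x
```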

Exercises

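6.22 Using Exercise 6.19, show that $\mathit{inlist} = \mathit{inlist} \cdot \mathit{perm}$. Give a point-free definition of ok.
Using the preceding exercise, prove that $(\mathit{id} \times \mathit{perm}) \cdot \mathit{ok} = \mathit{ok} \cdot (\mathit{id} \times \mathit{perm})$.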

6.23 Why is the recursion in the programs for selection sort and quicksort
guaranteed to terminate?

6.24 Writing $\mathit{ordered}\ R$ to show explicitly the dependence on the preorder $R$, prove that $\mathit{ordered}\ R \cdot \mathit{ordered}\ S = \mathit{ordered}\,(R \cap S)$, stating any assumption you use.

6.25 Consider the problem of sorting a (finite) set. Why is the second of the
specifications
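$$
\begin{array}{rcl}
\mathit{sort} & \subseteq & \mathit{ordered}\ R \cdot \mathit{setify}^\circ \\
\mathit{sort} & \subseteq & \mathit{ordered}\,(R \cap \mathit{neq}) \cdot \mathit{setify}^\circ,
\end{array}
$$

more sensible? The relation neq satisfies $a\ \mathit{neq}\ b$ if $a \neq b$. Develop the second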
specification to a version of selection sort, assuming that the input is presented as
a list possibly with duplicates.

6.26 Sort using the type tree A as in quicksort, but changing the definition of
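$\mathit{flatten} = (\![\mathit{nil},\ \mathit{join}]\!)$ by taking $\mathit{join}\,(x, a, y) = [a] {+\!\!+} x {+\!\!+} y$.

6.27 Repeat the preceding exercise but with $\mathit{join}\,(x, a, y) = x {+\!\!+} y {+\!\!+} [a]$.

6.28 Repeat the preceding exercise but with $\mathit{join}\,(x, a, y) = [a] {+\!\!+} \mathit{merge}\,(x, y)$, where merge merges two ordered lists into one:

$$
\begin{array}{rcl}
\mathit{merge}\,(x, [\,]) & = & x \\
\mathit{merge}\,([\,], y) & = & y \\
\mathit{merge}\,([a] {+\!\!+} x, [b] {+\!\!+} y) & = & \begin{cases} [a] {+\!\!+} \mathit{merge}\,(x, [b] {+\!\!+} y), & \text{if } aRb \\ [b] {+\!\!+} \mathit{merge}\,([a] {+\!\!+} x, y), & \text{otherwise.} \end{cases}
\end{array}
$$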

6.29 What goes wrong if one attempts to sort using the intermediate datatype
$$\mathit{tree}\ A ::= \mathit{null} \mid \mathit{tip}\ A \mid \mathit{fork}\,(\mathit{tree}\ A, \mathit{tree}\ A)\ ?$$

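6.30 Recall from Section 5.6 that $\mathit{perm} = (\![\mathit{nil},\ \mathit{add}]\!)$, where

$$\mathit{add} = \mathit{cat} \cdot (\mathit{id} \times \mathit{cons}) \cdot \mathit{exch} \cdot (\mathit{id} \times \mathit{cat}^\circ),$$

and $\mathit{exch} = \mathit{assocr} \cdot (\mathit{swap} \times \mathit{id}) \cdot \mathit{assocl}$. Using this characterisation of perm, we can reason:

$$
\begin{array}{ll}
 & \mathit{ordered} \cdot \mathit{perm} \\
= & \{\text{using } \mathit{perm} = (\![\mathit{nil},\ \mathit{add}]\!)\} \\
 & \mathit{ordered} \cdot (\![\mathit{nil},\ \mathit{add}]\!) \\
= & \{\text{fusion}\} \\
 & (\![\mathit{nil},\ \mathit{ordered} \cdot \mathit{add}]\!) \\
\supseteq & \{\text{for a suitable function } \mathit{insert}\} \\
 & (\![\mathit{nil},\ \mathit{insert}]\!).
\end{array}
$$

Verify the fusion condition $\mathit{ordered} \cdot \mathit{add} = \mathit{ordered} \cdot \mathit{add} \cdot (\mathit{id} \times \mathit{ordered})$. Describe a function insert satisfying $\mathit{insert} \cdot (\mathit{id} \times \mathit{ordered}) \subseteq \mathit{ordered} \cdot \mathit{add}$, and hence justify the last step. The resulting algorithm is known as 'insertion sort'.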

6.7 Closure

A good illustration of the problem of how to compute the least fixed point of a recursion equation, when other fixed points may exist, is provided by relational closure. For every relation $R : A \leftarrow A$, there exists a smallest preorder $R^*$ containing $R$, called the reflexive transitive closure of $R$. Our primary aim in this section is to show how to compute $\mathsf{E}(R^*) : \mathsf{P}A \leftarrow \mathsf{P}A$ whenever the result is known to be a finite set (so the computation will terminate). Many graph algorithms make use of such a computation, for instance in determining the set of vertices reachable from a given vertex.

The closure of $R$ is characterised by the universal property

$$R \subseteq X \ \equiv\ R^* \subseteq X \quad \text{for all preorders } X.$$

It can also be defined explicitly by either of the equations

$$R^* = (\mu X : \mathit{id} \cup (X \cdot R)) \tag{6.7}$$

$$R^* = (\mu X : \mathit{id} \cup (R \cdot X)). \tag{6.8}$$

The proof that these definitions coincide is left as an exercise.
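To justify (6.7) we have to prove that $S = (\mu X : \mathit{id} \cup (X \cdot R))$ is the smallest preorder containing $R$. Since

$$\mathit{id} \subseteq \mathit{id} \cup (S \cdot R) \subseteq S,$$

we have that $S$ is reflexive. Using this, we obtain

$$R \subseteq \mathit{id} \cup R \subseteq \mathit{id} \cup (S \cdot R) \subseteq S,$$

and so $S$ contains $R$. For transitivity, we argue:

$$
\begin{array}{ll}
 & S \cdot S \subseteq S \\
\equiv & \{\text{left-division}\} \\
 & S \subseteq S \backslash S \\
\Leftarrow & \{\text{definition of } S\} \\
 & \mathit{id} \cup (S \backslash S) \cdot R \subseteq S \backslash S \\
\equiv & \{\text{division}\} \\
 & S \cdot (\mathit{id} \cup (S \backslash S) \cdot R) \subseteq S \\
\equiv & \{\text{composition over join}\} \\
 & S \cup (S \cdot (S \backslash S) \cdot R) \subseteq S \\
\Leftarrow & \{\text{cancellation of division}\} \\
 & S \cup (S \cdot R) \subseteq S \\
\equiv & \{\text{definition of } S\} \\
 & \text{true}.
\end{array}
$$

Note the similarity of the proof to that of Theorem 6.2 with a switch of division operator. Finally, suppose $X$ is a preorder that contains $R$. Then we have

$$\mathit{id} \cup (X \cdot R) \subseteq \mathit{id} \cup (X \cdot X) \subseteq X,$$

and so $S \subseteq X$.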

Computing closure

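It is a fact that the equation $X = \mathit{id} \cup (X \cdot R)$ has a unique solution, necessarily $X = R^*$, if and only if $R$ is an inductive relation. In particular, tail is inductive, so suffix is characterised by the equation

$$\mathit{suffix} = \mathit{id} \cup (\mathit{suffix} \cdot \mathit{tail}).$$

Simple calculation leads to the following recursion equation for $\Lambda\mathit{suffix}$:

$$\Lambda\mathit{suffix} = \mathit{cup} \cdot \langle \tau, \Lambda(\mathit{suffix} \cdot \mathit{tail}) \rangle.$$

Using the fact that tail is not defined on the empty list, we can introduce a case analysis, obtaining

$$
\begin{array}{rcl}
(\Lambda\mathit{suffix})\,[\,] & = & \{[\,]\} \\
(\Lambda\mathit{suffix})\,([a] {+\!\!+} x) & = & \{[a] {+\!\!+} x\} \cup (\Lambda\mathit{suffix})\,x.
\end{array}
$$

Representing sets by lists in the usual way, we now obtain the following recursive program for computing the function tails of Section 5.6:

$$
\begin{array}{rcl}
\mathit{tails}\,[\,] & = & [[\,]] \\
\mathit{tails}\,([a] {+\!\!+} x) & = & [[a] {+\!\!+} x] {+\!\!+} \mathit{tails}\ x.
\end{array}
$$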
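Written in Haskell, the two equations for tails become the following short program. It is named `tails'` here only to avoid a clash with the library function `Data.List.tails`, which computes the same result.

```haskell
-- tails [] = [[]]
-- tails ([a] ++ x) = [[a] ++ x] ++ tails x
tails' :: [a] -> [[a]]
tails' []      = [[]]
tails' (a : x) = (a : x) : tails' x
```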

All this is very straightforward, but the method only works because the basic recursion equation has a unique solution. In this section we show how to compute $\Lambda(R^*)$ when $R$ is not an inductive relation.

Rather than attempt to derive a method for computing $\Lambda(R^*)$ directly, we concentrate first on giving an alternative recursive formulation for $R^*$. This recursion will be designed for efficient computation once we bring in the sets. The reason for this strategy is that it will enable us to employ relational reasoning for as long as possible.

In the following development use is made of relational subtraction. Recall from Exercise 4.30 that $R - S$ is defined by the universal property

$$R - S \subseteq T \ \equiv\ R \subseteq S \cup T.$$

From this we obtain a number of expected properties, including

$$
\begin{array}{rcl}
R - \emptyset & = & R \\
R \cup S & = & R \cup (S - R) \\
R - (S \cup T) & = & R - S - T \\
(R \cup S) - T & = & (R - T) \cup (S - T).
\end{array}
$$

In the third identity the subtraction operator is assumed to associate to the left, so $R - S - T = (R - S) - T$. Use of these rules will be signalled just with the hint subtraction.
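Since a relation can be regarded as a set of pairs, the four subtraction identities are instances of familiar facts about set difference. The sketch below checks them on finite sets using `Data.Set` from the containers package; the function name `subtractionLaws` is ours, and the check is a sanity test on particular inputs, not a proof.

```haskell
import qualified Data.Set as Set

-- Check the four subtraction identities for three finite sets.
subtractionLaws :: Ord a => Set.Set a -> Set.Set a -> Set.Set a -> Bool
subtractionLaws r s t = and
  [ minus r Set.empty == r                           -- R - {} = R
  , cup r s == cup r (minus s r)                     -- R u S = R u (S - R)
  , minus r (cup s t) == minus (minus r s) t         -- R - (S u T) = R - S - T
  , minus (cup r s) t == cup (minus r t) (minus s t) -- (R u S) - T = (R - T) u (S - T)
  ]
  where
    minus a b = Set.difference a b
    cup a b   = Set.union a b
```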

We will also make use of the following property of least fixed points, called the rolling rule:

$$(\mu X : \phi(\psi X)) = \phi(\mu X : \psi(\phi X)).$$

The proof of the rolling rule is left as an exercise, as are two other identities for manipulating least fixed points.

Finally, we make use of the fact that

$$R^* \cdot S = (\mu X : S \cup (R \cdot X)).$$

This too is left as an exercise.

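Now for the main calculation. The idea is to define $\theta$ by

$$\theta(P, Q) = P \cup (\mu X : Q \cup (R \cdot X - P)), \tag{6.9}$$

and use $\theta$ to obtain a recursive method for computing $R^* \cdot S$. From above we have $\theta(\emptyset, S) = R^* \cdot S$, so the aim is to show how to compute $\theta(\emptyset, S)$. Since

$$\theta(P, \emptyset) = P \cup (\mu X : R \cdot X - P) = P \cup \emptyset = P,$$

it follows that $\theta(P, \emptyset) = P$. We can also obtain an expression for $\theta(P, Q)$, arguing:

$$
\begin{array}{ll}
 & \theta(P, Q) \\
= & \{\text{definition}\} \\
 & P \cup (\mu X : Q \cup (R \cdot X - P)) \\
= & \{\text{subtraction}\} \\
 & P \cup (\mu X : Q \cup (R \cdot X - P - Q)) \\
= & \{\text{rolling with } \phi X = Q \cup X \text{ and } \psi X = (R \cdot X - P - Q)\} \\
 & P \cup Q \cup (\mu X : R \cdot (Q \cup X) - P - Q) \\
= & \{\text{subtraction}\} \\
 & P \cup Q \cup (\mu X : (R \cdot Q - P - Q) \cup (R \cdot X - P - Q)) \\
= & \{\text{definition of } \theta\} \\
 & \theta(P \cup Q,\ R \cdot Q - P - Q).
\end{array}
$$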

Summarising, we have shown that

$$
\begin{array}{rcl}
\theta(\emptyset, S) & = & R^* \cdot S \\
\theta(P, \emptyset) & = & P \\
\theta(P, Q) & = & \theta(P \cup Q,\ R \cdot Q - P - Q).
\end{array}
$$

These three equations can be used in a recursive method for computing $R^* \cdot S$: compute $\theta(\emptyset, S)$, where

$$
\theta(P, Q) = \begin{cases} P, & \text{if } Q = \emptyset \\ \theta(P \cup Q,\ R \cdot Q - P - Q), & \text{otherwise.} \end{cases}
$$

The algorithm will not terminate unless, regarded as a set of pairs, $R^* \cdot S$ is a finite relation; under this restriction the algorithm will terminate because the size of $P$ increases at each recursive step. If it does terminate, then the three properties of $\theta$ given above guarantee that the result is $R^* \cdot S$.

The above algorithm is non-standard in that relations appear as data objects. But we can rewrite the algorithm to use sets as data rather than relations. Think of the relations $P$, $Q$ and $S$ as being elements (that is, having type $A \leftarrow 1$, where $R : A \leftarrow A$ is given) and let $p = \Lambda P$, $q = \Lambda Q$ and $s = \Lambda S$. Then $p$, $q$ and $s$ are the corresponding elements of type $\mathsf{P}A \leftarrow 1$. By applying $\Lambda$ to everything in sight, and recalling that $\Lambda(R^* \cdot S) = \mathsf{E}(R^*) \cdot \Lambda S$, we obtain the following method for computing $\mathsf{E}(R^*)(s)$: compute $\mathit{close}\,(\emptyset, s)$, where

$$
\mathit{close}\,(p, q) = \begin{cases} p, & \text{if } q = \emptyset \\ \mathit{close}\,(p \cup q,\ (\mathsf{E}R)\,q - p - q), & \text{otherwise.} \end{cases}
$$

In this algorithm the operations are set-theoretic rather than relational; thus $\cup$ is set union and $(-)$ is set difference. As before, the algorithm is not guaranteed to terminate unless the closure of $s$ under $R$ is a finite set.
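The set-based algorithm can be sketched in Haskell with `Data.Set` from the containers package. We assume, for illustration only, that the relation R is presented as a function `succs` returning the list of elements related to a given element, so that the existential image E R of a set is obtained by mapping `succs` over its members. The names `close`, `reachable` and `succs` are ours.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- close (p, q): p holds the elements already processed, q the frontier.
-- Assumption: succs a lists the elements related to a by R.
close :: Ord a => (a -> [a]) -> Set a -> Set a -> Set a
close succs p q
  | Set.null q = p
  | otherwise  = close succs p' q'
  where
    p'    = p `Set.union` q
    image = Set.fromList (concatMap succs (Set.toList q))  -- (E R) q
    q'    = image `Set.difference` p'                      -- (E R) q - p - q

-- E(R*) s is computed as close (empty, s).
reachable :: Ord a => (a -> [a]) -> [a] -> Set a
reachable succs s = close succs Set.empty (Set.fromList s)
```

With `succs` describing the edges of a graph, `reachable succs [v]` yields the set of vertices reachable from `v`; as in the text, termination is only to be expected when that set is finite.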

Exercises

6.31 Justify the alternative definition (6.8) of closure.

6.32 Show that $R \cdot S^* = (\mu X : R \cup (X \cdot S))$ and $S^* \cdot R = (\mu X : R \cup (S \cdot X))$.

6.33 Show that $R^* = (\![\mathit{id},\ R]\!) \cdot (\![\mathit{id},\ \mathit{id}]\!)^\circ$, where the intermediate datatype is

$$\mathit{iterate}\ A ::= \mathit{once}\ A \mid \mathit{again}\,(\mathit{iterate}\ A).$$

6.34 Give a catamorphism chainR on non-empty lists so that $R^* = \mathit{head} \cdot \mathit{chainR}^\circ$.

6.35 The $\mu$-calculus. There are just two defining properties of $(\mu X : \phi X)$:

$$
\begin{array}{l}
\phi(\mu X : \phi X) = (\mu X : \phi X) \\
\phi Y \subseteq Y \ \Rightarrow\ (\mu X : \phi X) \subseteq Y.
\end{array}
$$

The first one states that $(\mu X : \phi X)$ is a fixed point of $\phi$, and the second one states that $(\mu X : \phi X)$ is a lower bound on all fixed points. Use these two rules to give a proof of the rolling rule

$$(\mu X : \phi(\psi X)) = \phi(\mu X : \psi(\phi X)).$$

The diagonal rule of the $\mu$-calculus states that

$$(\mu X : \mu Y : \phi(X, Y)) = (\mu X : \phi(X, X)).$$

Prove the diagonal rule.

Finally, the substitution rule states that

$$(\mu X : \phi(\mu Y : \psi(X, Y))) = \phi(\mu X : \psi(\phi X, X)).$$

The proof of the substitution rule is a simple combination of the preceding two rules. What is it?

6.36 Using the diagonal rule of the $\mu$-calculus, show that

$$(R \cup S)^* = R^* \cdot (S \cdot R^*)^*.$$

6.37 Using the preceding exercise, show that for any coreflexive $C$

$$R \cdot C = C \ \Rightarrow\ R^* = (R \cdot {\sim}C)^*,$$

where ${\sim}C$ is defined in Exercise 5.17.

Bibliographical remarks

The first account of the Knaster-Tarski fixed point theorem occurs in (Knaster 1928). That version only applied to powersets, and the generalisation to arbitrary complete lattices was published in (Tarski 1955). The use of this theorem (and of Kleene's celebrated result) is all-pervading in computing science. A much more in-depth account of the relevant theory can be found in (Davey and Priestley 1990). The term hylomorphism was coined in (Meijer 1992). According to Meijer, the terminology is inspired by the Aristotelian philosophy that form and matter are one, ὕλος meaning 'dust' or 'matter'. The original proof that hylomorphisms can be characterised as least solutions of certain equations in the relational calculus is due to Backhouse and his colleagues (Aarts, Backhouse, Hoogendijk, Voermans, and Van der Woude 1992). In (Takano and Meijer 1995), hylomorphisms are used to explain some important optimisations in functional programming.

Results about unique solutions to recursion equations can be found in most introductory text books on set theory, e.g. (Enderton 1977). These expositions do not parameterise the recursion scheme by datatype constructors, because that requires a categorical view of datatypes. The systematic exploration in a categorical setting was initiated by (Mikkelsen 1976), and further elaborated in (Brook 1977). The account given here is based on (Doornbos and Backhouse 1995), which contains proofs of the quoted results, as well as various techniques for establishing that a relation is inductive. The calculus of least fixed points, partly developed in the exercises, has its roots in (De Bakker and De Roever 1973) in the context of relations. A more modern account can be found in (Mathematics of Program Construction Group 1995). The concept of membership appears to be original work by De Moor, in collaboration with Hoogendijk. An application appears in (Bird, Hoogendijk, and De Moor 1996), and a full account can be found in (Hoogendijk 1996).

The idea of using function converses in program specification and synthesis was originally suggested by (Dijkstra 1979), and has since been elaborated by various authors (Chen and Udding 1990; Harrison and Khoshnevisan 1992; Gries 1981). Our own interest in the topic was revived by reading (Augusteijn 1992; Schoenmakers 1992; Knapen 1993), and this led to the statement of Theorem 6.4. Indeed, this theorem seems to be at the heart of several of the references cited above.

The idea that algorithms can be classified through their synthesis is fundamental to this book, and it is a recurring theme in the literature on formal program development. Clark and Darlington first illustrated the idea by a classification of sorting algorithms (Darlington 1978; Clark and Darlington 1980), and the exposition given here is inspired by that pioneering work. An even more impressive classification of parsing algorithms was undertaken by (Partsch 1986); in (Bird and De Moor 1994) we have attempted to improve over a tiny portion of Partsch's results using the framework of this book.


Chapter 7

Optimisation Problems

In the remaining four chapters we concentrate on a single class of problems; the aim is to develop a useful body of results for solving such problems efficiently. The problems are those that can be specified in the form

$$\mathit{min}\ R \cdot \Lambda((\![S]\!) \cdot (\![T]\!)^\circ).$$

This asks for a minimum element under the relation $R$ in the set of results returned by a hylomorphism. Problems of this form will be referred to as optimisation problems.

Formalising a programming problem as one of optimisation is attractive because the specification is short, the idiom is widely applicable, and there are a number of well-known strategies for arriving at efficient solutions. We will study two such strategies in some depth: the greedy method, and dynamic programming.

The present chapter and Chapter 10 deal with greedy algorithms, while Chapters 8 and 9 are concerned with dynamic programming. This chapter and Chapter 8 consider a restricted class of optimisation problem in which $T$ is the initial algebra of the intermediate datatype of the hylomorphism, so the problems take the form $\mathit{min}\ R \cdot \Lambda(\![S]\!)$. Chapters 9 and 10 deal with the general case.

The central result of this chapter is Theorem 7.2, which gives a simple condition under which an optimum result can be computed by computing an optimum partial result at each stage. The theoretical material is followed by three applications; each application ends with a functional program, written in Gofer, that solves the problem. The same format (specification, derivation, program) is followed for each optimisation problem that we solve in the remainder of the book.

We begin by defining the relation $\mathit{min}\ R$ formally and establishing its properties. Some proofs illustrate the interaction of division, membership and power-transpose, while others show the occasional need to bring in tabulations; many are left as instructive exercises in the relational calculus.

7.1 Minimum and maximum

For any relation $R : A \leftarrow A$ the relation $\mathit{min}\ R : A \leftarrow \mathsf{P}A$ is defined by

$$\mathit{min}\ R = {\in} \cap (R/{\ni}).$$

In words, $a$ is a minimum element of $x$ under $R$ if $a$ is both an element of $x$ and a lower bound of $x$ under $R$. The definition of $\mathit{min}\ R$ does not require that $R$ be a preorder, but it is only really useful when such is the case. The definition of $\mathit{min}\ R$ can be phrased as the universal property

$$X \subseteq \mathit{min}\ R \ \equiv\ X \subseteq {\in} \ \text{ and }\ X \cdot {\ni} \subseteq R$$

for all $X : A \leftarrow \mathsf{P}A$. We can also define

$$\mathit{max}\ R = \mathit{min}\ R^\circ,$$

so a maximum element under $R$ is a minimum element under $R^\circ$.
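On finite sets the definition can be read off directly. The Haskell sketch below represents a set by a list and the relation R by a Boolean function `r` (both are assumptions of the sketch), and returns every minimum element; since min R is a relation, the result may be empty or contain several elements. The names `minsBy` and `maxsBy` are ours.

```haskell
-- a is a minimum of x under R if a is an element of x and
-- a R b holds for every element b of x.
minsBy :: (a -> a -> Bool) -> [a] -> [a]
minsBy r x = [a | a <- x, all (r a) x]

-- max R = min (converse of R)
maxsBy :: (a -> a -> Bool) -> [a] -> [a]
maxsBy r = minsBy (flip r)
```

For a relation that is not connected, such as divisibility on `[2, 3]`, no element is a lower bound of the whole set and the result is empty.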

The following three properties of lower bounds are easy consequences of the fact that $(R/S) \cdot f = R/(f^\circ \cdot S)$:

$$(R/{\ni}) \cdot \tau = R \tag{7.1}$$

$$(R/{\ni}) \cdot \Lambda S = R/S^\circ \tag{7.2}$$

$$(R/{\ni}) \cdot \mathit{union} = (R/{\ni})/{\ni}. \tag{7.3}$$

From (7.1) and (7.2) we obtain

$$\mathit{min}\ R \cdot \tau = \mathit{id} \cap R \tag{7.4}$$

$$\mathit{min}\ R \cdot \Lambda S = S \cap (R/S^\circ). \tag{7.5}$$

Equation (7.4) gives that $R$ is reflexive if and only if the minimum element under $R$ of a singleton set is its sole inhabitant. Equation (7.5) can be rephrased as the universal property

$$X \subseteq \mathit{min}\ R \cdot \Lambda S \ \equiv\ X \subseteq S \ \text{ and }\ X \cdot S^\circ \subseteq R.$$

This rule is used frequently and is indicated in calculations by the hint universal property of min.

Another useful rule is the following one:

$$\mathit{min}\ R \cdot \Lambda S = \mathit{min}\,(R \cap (S \cdot S^\circ)) \cdot \Lambda S. \tag{7.6}$$

For the proof we argue as follows:

$$
\begin{array}{ll}
 & \mathit{min}\,(R \cap (S \cdot S^\circ)) \cdot \Lambda S \\
= & \{(7.5)\} \\
 & S \cap ((R \cap (S \cdot S^\circ))/S^\circ) \\
= & \{\text{division}\} \\
 & S \cap (R/S^\circ) \cap ((S \cdot S^\circ)/S^\circ) \\
= & \{\text{commutativity of meet, and } S \subseteq (S \cdot S^\circ)/S^\circ\} \\
 & S \cap (R/S^\circ) \\
= & \{(7.5)\} \\
 & \mathit{min}\ R \cdot \Lambda S.
\end{array}
$$

Equation (7.6) allows us to bring in context into an optimisation problem. It states that for the purpose of taking a minimum under $R$ on sets returned by $\Lambda S$, it is sufficient to constrain $R$ to those values that are related to one and the same element by $S$. This context condition can be helpful in the task of checking the conditions we need to hold in order to solve an optimisation problem in a particular way. Below, we will refer to uses of (7.6) by the hint context.

Fusion with the power functor

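Since $\mathsf{E}S = \Lambda(S \cdot {\in})$, equation (7.5) leads to:

$$\mathit{min}\ R \cdot \mathsf{E}S = (S \cdot {\in}) \cap (R/(S \cdot {\in})^\circ). \tag{7.7}$$

One application of (7.7) is the following result, which shows how to shunt a function through a minimum:

$$\mathit{min}\ R \cdot \mathsf{P}f = f \cdot \mathit{min}\,(f^\circ \cdot R \cdot f). \tag{7.8}$$

We reason:

$$
\begin{array}{ll}
 & \mathit{min}\ R \cdot \mathsf{P}f \\
= & \{(7.7) \text{ and } \mathsf{E} = \mathsf{P} \text{ on functions}\} \\
 & (f \cdot {\in}) \cap (R/(f \cdot {\in})^\circ) \\
= & \{\text{converse; division; } f \text{ a function}\} \\
 & (f \cdot {\in}) \cap ((R \cdot f)/{\ni}) \\
= & \{\text{modular law, } f \text{ simple}\} \\
 & f \cdot ({\in} \cap (f^\circ \cdot (R \cdot f)/{\ni})) \\
= & \{\text{division}\} \\
 & f \cdot ({\in} \cap ((f^\circ \cdot R \cdot f)/{\ni})) \\
= & \{\text{definition of } \mathit{min}\} \\
 & f \cdot \mathit{min}\,(f^\circ \cdot R \cdot f).
\end{array}
$$

As an intermediate step in the above proof, we showed that

$$\mathit{min}\ R \cdot \mathsf{P}f = (f \cdot {\in}) \cap ((R \cdot f)/{\ni}).$$

This suggests the truth of

$$\mathit{min}\ R \cdot \mathsf{P}S = (S \cdot {\in}) \cap ((R \cdot S)/{\ni}). \tag{7.9}$$

Equation (7.9) does in fact hold provided that $R$ is reflexive. In one direction the proof involves tabulations. The other half, namely,

$$\mathit{min}\ R \cdot \mathsf{P}S \subseteq (S \cdot {\in}) \cap ((R \cdot S)/{\ni}), \tag{7.10}$$

is easier and is all we will need later on. For the proof, observe that by the universal property of meet, (7.10) is equivalent to

$$\mathit{min}\ R \cdot \mathsf{P}S \subseteq S \cdot {\in} \ \text{ and }\ \mathit{min}\ R \cdot \mathsf{P}S \cdot {\ni} \subseteq R \cdot S.$$

We argue in two lines, using the naturality of $\in$:

$$
\begin{array}{l}
\mathit{min}\ R \cdot \mathsf{P}S \subseteq {\in} \cdot \mathsf{P}S \subseteq S \cdot {\in} \\
\mathit{min}\ R \cdot \mathsf{P}S \cdot {\ni} \subseteq \mathit{min}\ R \cdot {\ni} \cdot S \subseteq R \cdot S.
\end{array}
$$

Inclusion (7.10) is referred to subsequently as fusion with the power functor.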
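Equation (7.8) has a familiar pointwise reading: taking a minimum of the image of a set under f is the same as taking a minimum of the set under the order induced by f and then applying f. The Haskell sketch below checks this on lists standing in for finite sets; the names `minsBy`, `lhs` and `rhs` are ours, and R is assumed to be given as a Boolean function `r`.

```haskell
-- All minimum elements of x under r.
minsBy :: (a -> a -> Bool) -> [a] -> [a]
minsBy r x = [a | a <- x, all (r a) x]

-- Left-hand side of (7.8): min R . Pf
lhs :: (b -> b -> Bool) -> (a -> b) -> [a] -> [b]
lhs r f x = minsBy r (map f x)

-- Right-hand side of (7.8): f . min (f° . R . f), where
-- a (f° . R . f) b holds exactly when (f a) R (f b).
rhs :: (b -> b -> Bool) -> (a -> b) -> [a] -> [b]
rhs r f x = map f (minsBy (\a b -> r (f a) (f b)) x)
```

For example, with `f = length` and `r = (<=)` both sides applied to `["pear", "fig", "kiwi"]` give `[3]`.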

Distribution over union

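Given a collection of non-empty sets, one can select a minimum of the union by selecting a minimum element in each collection and then taking a minimum of the set of minimums. Since a minimum of the empty set is not defined, the procedure breaks down if any set in the collection is empty, which is why we have inclusion rather than equality in:

$$\mathit{min}\ R \cdot \mathsf{P}(\mathit{min}\ R) \subseteq \mathit{min}\ R \cdot \mathit{union}. \tag{7.11}$$

Inclusion (7.11) only holds if $R$ is a preorder. Under the same assumption we can strengthen (7.11) to read

$$\mathit{min}\ R \cdot \mathsf{P}(\mathit{min}\ R) = \mathit{min}\ R \cdot \mathit{union} \cdot \mathsf{P}(\mathit{dom}\,(\mathit{min}\ R)). \tag{7.12}$$

The proof of (7.11) is straightforward using (7.5) and the fact that ${\in} : \mathit{id} \hookleftarrow \mathsf{P}$. We leave the details as an exercise. The direction $\subseteq$ in (7.12) is also easy, using $\mathit{min}\ R = \mathit{min}\ R \cdot \mathit{dom}\,(\mathit{min}\ R)$.

Using fusion with the power functor, the other half, namely

$$\mathit{min}\ R \cdot \mathit{union} \cdot \mathsf{P}(\mathit{dom}\,(\mathit{min}\ R)) \subseteq \mathit{min}\ R \cdot \mathsf{P}(\mathit{min}\ R),$$

follows from the two inclusions

$$
\begin{array}{l}
\mathit{min}\ R \cdot \mathit{union} \cdot \mathsf{P}(\mathit{dom}\,(\mathit{min}\ R)) \subseteq \mathit{min}\ R \cdot {\in} \\
\mathit{min}\ R \cdot \mathit{union} \cdot \mathsf{P}(\mathit{dom}\,(\mathit{min}\ R)) \cdot {\ni} \subseteq R \cdot \mathit{min}\ R.
\end{array}
$$

The proofs are left as exercises.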
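In the special case of the usual ordering on integers, distribution over union says that the minimum of a union equals the minimum of the minimums. The following Haskell sketch checks this on a non-empty list of non-empty lists; the function name `minOfUnion` is ours, and the precondition matters because `minimum` is undefined on the empty list, which mirrors the restriction discussed above.

```haskell
-- Precondition (assumed, not checked): xss is non-empty and so is
-- every list it contains.
minOfUnion :: [[Int]] -> Bool
minOfUnion xss = minimum (concat xss) == minimum (map minimum xss)
```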

Implementing min

We cannot refine $\mathit{min}\ R$ to an implementable function except on non-empty finite sets; even then we require $R$ to be a connected preorder. Given $\mathit{setify} : \mathsf{P}A \leftarrow \mathit{list}^+ A$, the specification of $\mathit{minlist}\ R : A \leftarrow \mathit{list}^+ A$ reads:

$$\mathit{minlist}\ R \subseteq \mathit{min}\ R \cdot \mathit{setify}.$$

Assuming $R$ is connected, we can take $\mathit{minlist}\ R = (\![\mathit{id},\ \mathit{bmin}\ R]\!)$, where $\mathit{bmin}\ R$ (short for 'binary minimum') is defined by

$$\mathit{bmin}\ R\,(a, b) = (aRb \rightarrow a, b).$$

The function $\mathit{minlist}\ R$ chooses the leftmost minimum element in the case of ties. In the Appendix, minlist is defined as a function that takes as argument a Boolean function of type $\mathit{Bool} \leftarrow A \times A$.
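A possible Haskell rendering is sketched below. It is not the Appendix definition, which we have not reproduced; it simply reads the catamorphism on non-empty cons-lists as `foldr1` and curries the Boolean function for convenience.

```haskell
-- bmin R (a, b) = (a R b -> a, b): return a when a R b holds, else b.
bmin :: (a -> a -> Bool) -> a -> a -> a
bmin r a b = if r a b then a else b

-- minlist R = ([id, bmin R]) on non-empty lists.
-- Assumption: r is a connected preorder and the list is non-empty.
minlist :: (a -> a -> Bool) -> [a] -> a
minlist r = foldr1 (bmin r)
```

Because `bmin` prefers its left argument whenever the two are related, ties are resolved in favour of the leftmost element, as stated in the text.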

Exercises

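7.1 Prove that $(R/{\ni}) \cdot \mathit{subset}^\circ = R/{\ni}$, where $\mathit{subset} = {\in} \backslash {\in}$.

7.2 Prove that $\mathit{subset} \cdot \mathsf{E}R = {\in} \backslash (R \cdot {\in})$.

7.3 Prove that $(R \cdot S)/T = (R/{\ni}) \cdot (({\ni} \cdot S)/T)$ by rewriting $R$ in the form $({\in} \cdot \Lambda R^\circ)^\circ$.

7.4 Prove that $(R/{\ni}) \cdot \mathsf{P}S = (R \cdot S)/{\ni}$. (Hint: Exercises 7.1, 7.2, and 7.3 will be useful, as well as the fact that $\mathit{Inc} : \mathsf{P} \leftarrow \mathsf{E}$.)

7.5 Show that if $R$ is a preorder, then $R \cdot (R/{\ni}) = R/{\ni}$.

7.6 Prove that $\mathit{min}\,(R \cap S) = (\mathit{min}\ R) \cap (\mathit{min}\ S)$.

7.7 Prove that ${\in} \cdot {\ni} = \Pi$. What well-known principle of set theory does this equation express? Using the result, prove that $\mathit{min}\ R = {\in}$ if and only if $R = \Pi$.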

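7.8 Prove that if $R$ and $S$ are reflexive, then $R \cap S^\circ = \mathit{min}\ R \cdot (\mathit{min}\ S)^\circ$. (Hint: for the direction $\subseteq$ use tabulations, letting $(f, g)$ tabulate $R \cap S^\circ$ and $h = \Lambda(f \cup g)$.)

7.9 Using the preceding exercise prove that if $R$ is reflexive, then $R = \mathit{min}\ R \cdot {\ni}$.

7.10 Suppose that $R$ and $S$ are reflexive. Prove that $\mathit{min}\ R \subseteq \mathit{min}\ S$ if and only if $R \subseteq S$.

7.11 Prove that $\mathit{min}\ R$ is a simple relation if and only if $R$ is anti-symmetric.

7.12 Suppose that $R$ is a preorder. Using Exercise 7.5, show that $\mathit{min}\ R = {\in} \cap (R \cdot \mathit{min}\ R)$.

7.13 Show that if $R$ is a preorder and $S$ is a function, then $R \cap (S^\circ \cdot S)$ is a preorder.

7.14 Prove that if $R$ is a preorder, then $\mathit{max}\ R \cdot \Lambda R = R \cap R^\circ$.

7.15 Prove that $\langle \mathit{min}\ R \cdot \Lambda S,\ \mathit{min}\ R \cdot \Lambda T \rangle \subseteq \mathit{min}\,(R \times R) \cdot \Lambda\langle S, T \rangle$.

7.16 Prove that if $R$ is reflexive and $S$ is a preorder, then $\mathit{min}\ R \cdot \Lambda(\mathit{min}\ S) = \mathit{min}\,(S; R)$, where $S; R = S \cap (S^\circ \Rightarrow R)$.

7.17 The supremum operator can be defined in two ways:

$$
\begin{array}{rcl}
\mathit{sup}\ R & = & \mathit{min}\ R \cdot \Lambda(R^\circ/{\ni}) \\
\mathit{sup}\ R & = & (({\in} \backslash R)/R)^\circ \cap (R/({\in} \backslash R)).
\end{array}
$$

Prove that these two definitions are equivalent if $R$ is a preorder.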

7.18 One proof of the other half of (7.9) makes use of (7.4). Given (7.4) it suffices to show

$$(S \cdot {\in}) \cap ((R/{\ni}) \cdot \mathsf{P}S) \subseteq \mathit{min}\ R \cdot \mathsf{P}S.$$

The proof is a difficult exercise in tabulation.

7.19 The following few exercises, many of which originate from (Bleeker 1994), deal with minimal elements. Informally, a minimal element of a set $x$ under a relation $R$ is an element $a \in x$ such that for all $b \in x$ with $bRa$ we have $aRb$. The formal definition is

$$\mathit{mnl}\ R = \mathit{min}\,(R^\circ \Rightarrow R).$$

Prove that $(R^\circ \Rightarrow R)$ is reflexive for any $R$, but that $(R^\circ \Rightarrow R)$ is not necessarily a preorder even when $R$ is.

7.20 Prove that $\mathit{min}\ R \subseteq \mathit{mnl}\ R$ with equality only if $R$ is a connected preorder.

7.21 Is it the case that $\mathit{mnl}\ R \subseteq \mathit{mnl}\ S$ if $R \subseteq S$?

7.22 Prove that $\mathit{mnl}\ R \cdot \mathsf{P}f = f \cdot \mathit{mnl}\,(f^\circ \cdot R \cdot f)$.

7.23 Prove that $\mathit{mnl}\ R = {\in}$ if and only if $R$ is a symmetric relation.

7.24 Express $\mathit{mnl}\,(R + S)$ in terms of $\mathit{mnl}\ R$ and $\mathit{mnl}\ S$.

7.25 For an equivalence relation $Q$ define $\mathit{class}\ Q$ by

$$\mathit{class}\ Q = \mathit{cap} \cdot \langle \mathit{id}, \Lambda Q \cdot {\in} \rangle,$$

where cap returns the intersection of two sets. Informally, $\mathit{class}\ Q$ takes a set and returns some equivalence class under $Q$. Prove that if $R$ is a preorder, then

$$\mathit{mnl}\,(R; S) = \mathit{mnl}\ S \cdot \mathit{class}\,(R \cap R^\circ) \cdot \Lambda(\mathit{mnl}\ R).$$

7.26 The remaining exercises deal with the notion of a well-bounded preorder. In set-theoretic terms, a preorder $R$ is well bounded if every non-empty set has a minimum under $R$; this translates to

$$\mathit{dom}\,({\in}) = \mathit{dom}\,(\mathit{min}\ R).$$

Why is a well-bounded preorder necessarily a connected preorder?

As a difficult exercise in tabulations, show that $R$ is well bounded if and only if $R \cap \neg R^\circ$, the strict part of $R$, is a well-founded (equivalently, inductive) relation.

7.27 Prove that if $R$ is well bounded, then so is $f^\circ \cdot R \cdot f$ for all functions $f$.

7.28 Show that $R$ is well bounded if and only if ${\in} \subseteq R^\circ \cdot \mathit{min}\ R$.

7.29 Using the preceding exercise, show that if $R$ is a well-bounded preorder, then

$$\mathit{min}\ R \cdot \mathit{union} = \mathit{min}\ R \cdot \mathsf{E}(\mathit{min}\ R).$$

This result strengthens (7.11). Using this fact, show how $\mathit{min}\ R \cdot \mathit{setify}$ can be expressed as a catamorphism.

7.30 A relation $R$ is said to be well supported if

$$\mathit{dom}\,({\in}) = \mathit{dom}\,(\mathit{mnl}\ R).$$

Show that well-supportedness is a weaker notion than well-boundedness.

7.31 Prove that if $R$ is a well-supported preorder, then ${\in} \subseteq R^\circ \cdot \mathit{mnl}\ R$.

7.32 Prove that if $R$ is a well-supported preorder, then $\mathit{mnl}\ R \cdot \mathit{union} = \mathit{mnl}\ R \cdot \mathsf{E}(\mathit{mnl}\ R)$.

7.2 Monotonic algebras


We come now to an important idea that will dominate the remaining chapters. By
definition, an F-algebra S : A <- FA is monotonic on a relation R : A <- A if

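$$S \cdot \mathsf{F}R \subseteq R \cdot S.$$

To illustrate, consider the function $\mathit{plus} : \mathit{Nat} \leftarrow \mathit{Nat} \times \mathit{Nat}$. Addition of natural numbers is monotonic on leq, the normal linear ordering on numbers, a fact we can express as

$$\mathit{plus} \cdot (\mathit{leq} \times \mathit{leq}) \subseteq \mathit{leq} \cdot \mathit{plus}.$$

At the point level this reads

$$c = a + b \ \land\ a \leq a' \ \land\ b \leq b' \ \Rightarrow\ c \leq a' + b'.$$

When $S = f$, a function, monotonicity can be expressed in either of the following equivalent forms:

$$f \cdot \mathsf{F}R \cdot f^\circ \subseteq R \quad \text{and} \quad \mathsf{F}R \subseteq f^\circ \cdot R \cdot f.$$

By shunting we also obtain that $f$ is monotonic on $R$ if and only if it is monotonic on $R^\circ$. However, none of these equivalences hold for general relations; in particular, it does not follow that if $S$ is monotonic on $R$, then $S$ is also monotonic on $R^\circ$.

For functions, monotonicity is equivalent to distributivity. We say that $f : A \leftarrow \mathsf{F}A$ distributes over $R$ if

$$f \cdot \mathsf{F}(\mathit{min}\ R) \subseteq \mathit{min}\ R \cdot \Lambda(f \cdot \mathsf{F}{\in}).$$

For example, the pointwise version of the fact that $+$ distributes over $\leq$ is

$$\mathit{min}\ x + \mathit{min}\ y = \mathit{min}\,\{a + b \mid a \in x \land b \in y\},$$

provided that $x$ and $y$ are non-empty. Here $\mathit{min} = \mathit{min}\ \mathit{leq}$.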
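The pointwise statement is easy to test on finite sets represented by non-empty lists. The Haskell sketch below parameterises the check by the binary operation so that a non-monotonic operation can be tried as well; the name `distributesOver` is ours, and the check covers only the particular sets supplied.

```haskell
-- Does op distribute over the minimum for the given non-empty sets?
--   min x `op` min y == min { a `op` b | a in x, b in y }
distributesOver :: (Int -> Int -> Int) -> [Int] -> [Int] -> Bool
distributesOver op x y =
  minimum x `op` minimum y == minimum [a `op` b | a <- x, b <- y]
```

Addition passes the check on `[3, 1, 4]` and `[1, 5, 9]`. Multiplication on integers, which is not monotonic once negative numbers are allowed, fails it on `[-2, 1]` and `[-3, 2]`: the product of the minimums is 6, whereas the minimum product is -4.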

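Theorem 7.1 Function $f$ is monotonic over $R$ if and only if it distributes over $R$.

Proof. We argue:

$$
\begin{array}{ll}
 & f \cdot \mathsf{F}(\mathit{min}\ R) \subseteq \mathit{min}\ R \cdot \Lambda(f \cdot \mathsf{F}{\in}) \\
\equiv & \{\text{universal property of } \mathit{min}\} \\
 & f \cdot \mathsf{F}(\mathit{min}\ R) \subseteq f \cdot \mathsf{F}{\in} \ \text{ and }\ f \cdot \mathsf{F}(\mathit{min}\ R) \cdot (f \cdot \mathsf{F}{\in})^\circ \subseteq R \\
\equiv & \{\text{since } \mathit{min}\ R \subseteq {\in}\} \\
 & f \cdot \mathsf{F}(\mathit{min}\ R) \cdot (f \cdot \mathsf{F}{\in})^\circ \subseteq R \\
\equiv & \{\text{converse; relators}\} \\
 & f \cdot \mathsf{F}(\mathit{min}\ R \cdot {\ni}) \cdot f^\circ \subseteq R \\
\equiv & \{\text{since } \mathit{min}\ R \cdot {\ni} = R \text{ if } R \text{ is reflexive}\} \\
 & f \cdot \mathsf{F}R \cdot f^\circ \subseteq R.
\end{array}
$$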

In this chapter the main result about monotonicity is the following, which we will
refer to subsequently as the greedy theorem.

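Theorem 7.2 If $S$ is monotonic on a preorder $R^\circ$, then

$$(\![\mathit{min}\ R \cdot \Lambda S]\!) \subseteq \mathit{min}\ R \cdot \Lambda(\![S]\!).$$

Proof. We reason:

$$
\begin{array}{ll}
 & (\![\mathit{min}\ R \cdot \Lambda S]\!) \subseteq \mathit{min}\ R \cdot \Lambda(\![S]\!) \\
\equiv & \{\text{universal property of } \mathit{min}\} \\
 & (\![\mathit{min}\ R \cdot \Lambda S]\!) \subseteq (\![S]\!) \ \text{ and }\ (\![\mathit{min}\ R \cdot \Lambda S]\!) \cdot (\![S]\!)^\circ \subseteq R \\
\equiv & \{\text{since } \mathit{min}\ R \cdot \Lambda S \subseteq S\} \\
 & (\![\mathit{min}\ R \cdot \Lambda S]\!) \cdot (\![S]\!)^\circ \subseteq R \\
\Leftarrow & \{\text{hylomorphism theorem (see below)}\} \\
 & \mathit{min}\ R \cdot \Lambda S \cdot \mathsf{F}R \cdot S^\circ \subseteq R \\
\Leftarrow & \{\text{monotonicity: } \mathsf{F}R \cdot S^\circ \subseteq S^\circ \cdot R\} \\
 & \mathit{min}\ R \cdot \Lambda S \cdot S^\circ \cdot R \subseteq R \\
\Leftarrow & \{\text{since } \mathit{min}\ R \cdot \Lambda S \subseteq R/S^\circ; \text{ division}\} \\
 & R \cdot R \subseteq R \\
\equiv & \{\text{transitivity of } R\} \\
 & \text{true}.
\end{array}
$$

Recall that the hylomorphism theorem (Theorem 6.2) expressed a hylomorphism as a least fixed point of a certain recursion equation; thus by Knaster-Tarski, the hylomorphism $(\![\mathit{min}\ R \cdot \Lambda S]\!) \cdot (\![S]\!)^\circ$ is included in $R$ if $R$ satisfies the associated recursion inequation.

For an alternative formulation of the greedy theorem see Exercise 7.37. For problems involving max rather than min, the relevant condition of the greedy theorem is that $S$ should be monotonic on $R$, not $R^\circ$. Note also that we can always bring in context if we need to, and show that $S$ is monotonic on $R^\circ \cap ((\![S]\!) \cdot (\![S]\!)^\circ)$.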

The exercises given below explore some simple consequences of the greedy theorem.
In the remainder of this chapter we will look at three other problems, each chosen
to bring out a different aspect of the theory.

Exercises

7.33 Express the fact that $a + b \leq a + b'$ implies that $b \leq b'$ in a point-free manner.

7.34 Let $\alpha$ be the initial F-algebra. Prove that if $\alpha$ is monotonic on $R$, then $R$ is reflexive.

7.35 Sometimes we want monotonicity and distributivity to hold only on the set of values returned by a relation. Find a suitably weakened definition of monotonicity that implies

$$f \cdot \mathsf{F}(\mathit{min}\ R \cdot \Lambda S) \subseteq \mathit{min}\ R \cdot \Lambda(f \cdot \mathsf{F}S).$$

7.36 Use the preceding exercise to give a necessary, as well as a sufficient, condition for establishing the conclusion of the greedy theorem.

7.37 Prove the following variation of the greedy theorem: if $f$ is monotonic on $R$ and $f \subseteq \mathit{min}\ R \cdot \Lambda S$, then $(\![f]\!) \subseteq \mathit{min}\ R \cdot \Lambda(\![S]\!)$.

7.38 Prove that if $S$ is monotonic on $R^\circ$, then $\mathit{min}\ R \cdot \Lambda S \cdot \mathit{min}\,(\mathsf{F}R) \subseteq \mathit{min}\ R \cdot \mathsf{E}S$.

7.39 The function takewhile of functional programming can be specified by

$$\mathit{takewhile}\ p = \mathit{max}\ R \cdot \Lambda(\mathit{list}\ p \cdot \mathit{prefix}),$$

where $R = \mathit{length}^\circ \cdot \mathit{leq} \cdot \mathit{length}$. In words, $\mathit{takewhile}\ p\ x$ returns the longest prefix of $x$ with the property that all its elements satisfy $p$. (Question: why the longest here rather than a longest?) Using $\mathit{prefix} = (\![\mathit{nil},\ \mathit{cons} \cup \mathit{nil}]\!)$ and the greedy theorem, derive the standard implementation of takewhile.

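7.40 The maximum segment sum problem (Gries 1984, 1990b) is specified by

$$\mathit{mss} = \mathit{max} \cdot \Lambda(\mathit{sum} \cdot \mathit{segment}),$$

where max is an abbreviation for $\mathit{max}\ \mathit{leq}$. Using $\mathit{segment} = \mathit{prefix} \cdot \mathit{suffix}$, express this problem in the form

$$\mathit{mss} = \mathit{max} \cdot \mathsf{P}(\mathit{max} \cdot \Lambda(\mathit{sum} \cdot \mathit{prefix})) \cdot \Lambda\mathit{suffix}.$$

Express prefix as a catamorphism on cons-lists, and use fusion to express $\mathit{sum} \cdot \mathit{prefix}$ as a catamorphism. Hence use the greedy theorem to show that

$$(\![\mathit{zero},\ \mathit{oplus}]\!) \subseteq \mathit{max} \cdot \Lambda(\mathit{sum} \cdot \mathit{prefix}),$$

where $\mathit{oplus} = \mathit{max} \cdot \Lambda(\mathit{zero} \cup \mathit{plus})$. Finally, express $\mathit{list}\,(\![c, f]\!) \cdot \mathit{tails}$ as a catamorphism and hence show how to implement mss by a linear-time algorithm.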


7.41 The function filter can be specified by $\mathit{filter}\ p = \mathit{max}\ R \cdot \Lambda(\mathit{list}\ p \cdot \mathit{subseq})$. In words, $\mathit{filter}\ p\ x$ returns the longest subsequence of $x$ with the property that all its elements satisfy $p$. (Question: again, why the rather than a longest subsequence?) Using $\mathit{subseq} = (\![\mathit{nil},\ \mathit{cons} \cup \mathit{outr}]\!)$ and the greedy theorem, derive the standard program for filter.


7.42 Let L denote the normal lexical (i.e. dictionary) ordering on sequences. Justify
the monotonicity condition

    cons · (id × L) ⊆ L · cons.

Hence show that ([nil, max L · Λ(cons ∪ outr)]) ⊆ max L · Λsubseq.

Now justify the facts that: (i) a lexically largest subsequence of a given sequence is
necessarily a descending sequence; and (ii) if x is descending and a > head x, then
[a] ++ x is lexically larger than x. Use point-free versions of these facts to prove
(formally!) that

    ([nil, (ok → cons, outr)]) = ([nil, max L · Λ(cons ∪ outr)]),

where ok holds for (a, x) if x = [] or a > head x. Give an example to show

    (ok → cons, outr) ≠ max L · Λ(cons ∪ outr).

7.3 Planning a company party

The following problem appears as an exercise in (Cormen, Leiserson, and Rivest
1990) in their chapter on dynamic programming:

    Professor McKenzie is consulting for the president of the A.-B.
    Corporation, which is planning a company party. The company has a
    hierarchical structure; that is, the supervisor relation forms a tree rooted
    at the president. The personnel office has ranked each employee with a
    conviviality rating, which is a real number. In order to make the party
    fun for all attendees, the president does not want both an employee and
    his or her immediate supervisor to attend.

    a. Describe an algorithm to make up the guest list. The goal should be
       to maximise the sum of the conviviality ratings of the guests. Analyze
       the running time of your algorithm.
    b. How can the professor ensure that the president gets invited to his
       or her own party?

We will solve this problem with a greedy algorithm. The moral of the exercise
is that our classification of what is a greedy algorithm can include problems that
others might view as applications of dynamic programming.

The company structure is given by a tree of type tree Employee, where

tree A ::= node (A, list (tree A)).


The base functor is F(A, B) = A × list B. Given party : list A ← tree A, our problem
is to compute max R · Λparty, where

    R = (sum · list rating)° · leq · (sum · list rating),

and rating : Real ← Employee is the conviviality function for individual employees.

We can define party : list A ← tree A in terms of a catamorphism that produces two
parties, one that includes the root and one that excludes it:

    party = choose · ([⟨include, exclude⟩]).

The relation choose is defined by choose = outl ∪ outr. The relation include includes
the root of the tree, so by the president's ruling the roots of the immediate subtrees
have to be excluded. The relation exclude excludes the root, so we have an arbitrary
choice between including or excluding the roots of the immediate subtrees. The
formal definitions are:

    include = cons · (id × (concat · list outr))
    exclude = outr · (id × (concat · list choose)).



Note that include is a function but exclude is not.
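
The nondeterminism in exclude can be made concrete with a small executable reading of the specification. The following Haskell sketch is an illustration only, not part of the derivation: it assumes that employees are identified by integer ratings and that lists stand in for sets of results, and the names pairs, parties and bestParty are hypothetical. It enumerates every party related to a tree by choose · ([⟨include, exclude⟩]) and takes a maximum by exhaustive search.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Company hierarchy: a node holds an employee's rating and the subordinates.
-- (Assumption: employees are identified by an integer rating.)
data Tree = Node Int [Tree]

-- Every pair (party with the root, party without the root) related to the
-- tree by the catamorphism; lists stand in for sets of results.
pairs :: Tree -> [([Int], [Int])]
pairs (Node a ts) =
  [ (a : concatMap snd ps, concat cs)
  | ps <- mapM pairs ts                  -- one result pair per subtree
  , cs <- mapM (\(i, e) -> [i, e]) ps    -- exclude: a free choice per subtree
  ]

-- choose: either component of any pair is a legal party.
parties :: Tree -> [[Int]]
parties t = concat [[i, e] | (i, e) <- pairs t]

-- Exhaustive reading of max R . Λparty (exponential; for checking only).
bestParty :: Tree -> [Int]
bestParty = maximumBy (comparing sum) . parties
```

The include component is the same for every choice made by exclude, which reflects the remark that include is a function while exclude is not.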

Derivation

The derivation involves two appeals to monotonicity, both of which we will justify
afterwards.

We argue:

    max R · Λparty
  =    {definition of party}
    max R · Λ(choose · ([⟨include, exclude⟩]))
  =    {since Λ(X · Y) = EX · ΛY}
    max R · Echoose · Λ([⟨include, exclude⟩])
  ⊇    {claim: Exercise 7.38 is applicable}
    max R · Λchoose · max (R × R) · Λ([⟨include, exclude⟩])
  ⊇    {claim: the greedy theorem is applicable}
    max R · Λchoose · ([max (R × R) · Λ⟨include, exclude⟩]).

The first claim requires us to show that choose is monotonic on R, that is,

    choose · (R × R) ⊆ R · choose.

The proof is left as a simple exercise. The second claim requires us to show that
⟨include, exclude⟩ is monotonic on R × R, that is,

    ⟨include, exclude⟩ · (id × list (R × R)) ⊆ (R × R) · ⟨include, exclude⟩.

To justify this we argue:

    ⟨include, exclude⟩ · (id × list (R × R))
  =    {products}
    ⟨include · (id × list (R × R)), exclude · (id × list (R × R))⟩
  ⊆    {claims}
    ⟨R · include, R · exclude⟩
  =    {products}
    (R × R) · ⟨include, exclude⟩.

We outline the proof of one subclaim and leave the other as an exercise. We argue:

    include · (id × list (R × R))
  =    {definition of include; functors}
    cons · (id × concat · list (outr · (R × R)))
  ⊆    {products; functors}
    cons · (id × concat · list R · list outr)
  ⊆    {claim: concat · list R ⊆ R · concat (exercise)}
    cons · (id × R · concat · list outr)
  ⊆    {since cons is monotonic on R (exercise)}
    R · cons · (id × concat · list outr)
  =    {definition of include}
    R · include.

It remains to refine max (R × R) · Λ⟨include, exclude⟩ to a function. By Exercise 7.15
this expression is refined by

    ⟨max R · Λinclude, max R · Λexclude⟩.

Since include is a function, the first term simplifies to include. We will leave it as
a simple exercise to show that the second term refines to

    concat · list (max R · Λchoose) · outr.

In summary, we have derived (renaming exclude)

    party   = max R · Λchoose · ([⟨include, exclude⟩])
    include = cons · (id × (concat · list outr))
    exclude = concat · list (max R · Λchoose) · outr.

The program

For efficiency, a list x is represented by the pair (x, sum (list rating x)). The relation
max R · Λchoose is refined to the standard function bmax R that chooses the left-hand
argument in the case of ties. All functions not defined in the following Gofer program
appear in the list of standard functions given in the Appendix. (Actually, bmax r
is a standard function, but is given here for clarity.) Employees are identified by
their conviviality rating:

> party    = bmax r . treecata (pair (include, exclude))
> include  = cons' . cross (id, concat' . list outr)
> exclude  = concat' . list (bmax r) . outr

> cons'    = cross (cons, plus) . dupl
> concat'  = cross (concat, sum) . unzip

> r        = leq . cross (outr, outr)
> bmax r   = cond (r . swap) (outl, outr)

> data Tree = Node (Int, [Tree])
> treecata f (Node (a,ts)) = f (a, list (treecata f) ts)
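
For readers without the Appendix to hand, the same algorithm can be sketched in plain Haskell using only the Prelude. This is a transliteration under stated assumptions rather than the book's program: Node is curried here, and the names Party, better and both are introduced for illustration. As in the Gofer version, employees are identified by their rating and each party carries its total so that sums are never recomputed.

```haskell
-- A rose tree of employees; each node carries that employee's rating.
data Tree = Node Int [Tree]

-- A guest list paired with the sum of its ratings.
type Party = ([Int], Int)

-- Fold over a tree: combine the root label with the results for the subtrees.
treecata :: (Int -> [b] -> b) -> Tree -> b
treecata f (Node a ts) = f a (map (treecata f) ts)

-- Pick the better of two parties, preferring the left one on ties
-- (the role played by bmax r).
better :: (Party, Party) -> Party
better (p, q) = if snd q > snd p then q else p

-- For each subtree: (best party with the root, best party without it).
both :: Tree -> (Party, Party)
both = treecata step
  where
    step a rs = (include a rs, exclude rs)
    -- The root attends, so every immediate subordinate must stay away.
    include a rs = let (xs, s) = merge (map snd rs) in (a : xs, a + s)
    -- The root stays away, so each subtree contributes its better party.
    exclude rs = merge (map better rs)
    -- Concatenate guest lists and add up their totals.
    merge ps = (concatMap fst ps, sum (map snd ps))

party :: Tree -> Party
party = better . both
```

For example, with t = Node 5 [Node 3 [Node 10 []], Node 4 []], the party that includes the president has total 15 and the one that excludes the president has total 14, so party t returns ([5,10],15).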

Exercises

7.43 Supply the missing proofs in the derivation.

7.44 Answer the remaining questions in the problem, namely (i) what is the running
time of the algorithm; and (ii) how can the professor ensure that the president gets
invited to his or her own party?

7.4 Shortest paths on a cylinder


The following problem is taken from (Reingold, Nievergelt, and Deo 1977), but is
rephrased slightly to avoid drawing a cylinder in LaTeX:

    Consider an n × m array of positive integers, rolled into a cylinder around
    a horizontal axis. For instance, the array

             11  53  34  73  18  53  99  52  31  54
              4  72  24   6  46  17  63  82  89  25
        →    67  22  10  97  99  64  33  45  81  76    →
             24  71  46  62  18  11  54  40  17  51
             99   8  57  76   7  51  90  92  51  21

    is rolled into a cylinder by taking the top and bottom rows to be
    adjacent. A path is to be threaded from the entry side of the cylinder
    to the exit side, subject to the restriction that from a given square it
    is possible to go to only one of the three positions in the next column
    adjacent to the current position. The path may begin at any position
    on the entry side and may end at any position on the exit side. The cost
    of such a path is the sum of the integers in the squares through which
    it passes. Thus the cost of the sample path shown above (in boldface)
    is 429. Show how the dynamic programming approach to exhaustive
    search allows a path of least cost to be found in O(n × m) time.

Once again this exercise in dynamic programming is solvable by the methods given
in this chapter, although it is Theorem 7.1 rather than the greedy theorem that
is the crux. The other feature of interest is that the specification is motivated by
paying due attention to types.

We will suppose that the input is represented as a non-empty cons-list of n-tuples,
one tuple for each column of the array. Let F denote the base functor of non-empty
cons-lists, so F(A, X) = A + (A × X), and let L be a convenient abbreviation for
the type functor list+. Finally, let N denote the functor that sends A to the set of
n-tuples over A. In the final program, n-tuples are represented by lists of length n.

Our problem is to compute min R · paths, where R = sum° · leq · sum and paths is a
relation with type paths : PL Nat ← LN Nat. Because of the restriction on moves it
is not possible to define paths by the power transpose of a relational catamorphism,
so that, strictly speaking, the problem does not fall within the class described at
the outset of the chapter. Instead, we will define it in terms of a relation

    generate : NPLA ← F(NA, NPLA).

In words, generate takes a new tuple and a tuple of sets of paths, and produces
a tuple of sets of extended paths. Thus, the catamorphism ([generate]) returns an
n-tuple of sets of paths; the set associated with the kth component of the tuple is
the set of valid paths that can start in component k of the first column. We can
now define

    paths = union · setify · ([generate]),

where setify : PA ← NA converts an n-tuple into a set of its components.

Note that the type assigned to generate is parameterised by A; the restriction
A = Nat is required only for comparing paths under the sum ordering. Accordingly,
generate will be a lax natural transformation. Recall from Section 5.7 that this
means

    NPLR · generate ⊇ generate · F(NR, NPLR)

for any relation R. To define generate we will need a number of other lax natural
transformations of different types; what follows is an attempt to motivate their
introduction.

First of all, it is clear that we have to take into account the restriction on moves in
generating legal paths. The relation moves : PNA ← NA is defined by

    moves x = {up x, x, down x},

where up and down rotate columns:

    up (a_1, a_2, ..., a_n)   = (a_n, a_1, ..., a_{n-1})
    down (a_1, a_2, ..., a_n) = (a_2, a_3, ..., a_n, a_1).

These functions are easily implemented when tuples are represented by lists. The
relation F(id, moves) has type

    F(NA, PNPLA) ← F(NA, NPLA),

and we will define generate = S · F(id, moves) for an appropriate relation S.

The next step is to make use of a function trans : NPA ← PNA that transposes a
set of n-tuples. For example,

    trans {(a, b, c), (x, y, z)} = ({a, x}, {b, y}, {c, z}).

In the final program, when sets and n-tuples are both represented by lists, trans
will be implemented by a catamorphism of type LLA ← LLA.

The relation F(id, trans · moves) has type

    F(NA, NPPLA) ← F(NA, NPLA),

and so F(id, N union · trans · moves) has type

    F(NA, NPLA) ← F(NA, NPLA).

We now have generate = S · F(id, N union · trans · moves) for an appropriately chosen
relation S.

The next step is to make use of a function zip : NF(A, B) ← F(NA, NB) that
commutes N with F. In the final program zip is replaced by the standard function on
lists. The relation zip · F(id, N union · trans · moves) has type

    NF(A, PLA) ← F(NA, NPLA),

so now we have generate = S · zip · F(id, N union · trans · moves) for an appropriate
relation S.

The next step is to make use of the function cp : PF(A, B) ← F(A, PB), defined by
cp = ΛF(id, ∈). The relation N cp · zip · F(id, N union · trans · moves) has type

    NPF(A, LA) ← F(NA, NPLA),

so generate = S · N cp · zip · F(id, N union · trans · moves) for some relation S.

Finally, we bring in α : LA ← F(A, LA), the initial algebra of non-empty cons-lists.
The relation N(Pα · cp) · zip · F(id, N union · trans · moves) has type

    NPLA ← F(NA, NPLA),

and is the definition of generate.

The above typing information is summarised in the diagram

                            generate
        NPLA  <-------------------------------  F(NA, NPLA)
          ^                                         |
          | N(Pα · cp)                              | F(id, N union · trans · moves)
          |                                         v
     NF(A, PLA)  <----------------------------  F(NA, NPLA)
                              zip

We have motivated the definition of generate by following the arrows, but one can
also work backwards.


Derivation

The derivation that follows relies heavily on the fact that all the above functions
and relations are lax natural transformations of appropriate type. The monotonicity
condition is that α is monotonic on R and is easy to verify. Since α is a function,
Theorem 7.1 gives us that α distributes over R. Since Λ(α · F(id, ∈)) = Pα · cp, we
therefore obtain the inclusion

    α · F(id, min R) ⊆ min R · Pα · cp.                                   (7.13)

Armed with this fact, we calculate:

    min R · paths
  =    {definition of paths}
    min R · union · setify · ([generate])
  ⊇    {distribution over union (7.11), since R is a preorder}
    min R · P(min R) · setify · ([generate])
  ⊇    {naturality of setify}
    min R · setify · N(min R) · ([generate])
  ⊇    {fusion (see below for definition of Q)}
    min R · setify · ([Q]).

The condition for fusion is

    N(min R) · generate ⊇ Q · F(id, N(min R)),

and we can use this to derive a definition of Q:

    N(min R) · generate
  =    {definition of generate}
    N(min R · Pα · cp) · zip · F(id, N union · trans · moves)
  ⊇    {(7.13); functors}
    Nα · NF(id, min R) · zip · F(id, N union · trans · moves)
  ⊇    {naturality of zip}
    Nα · zip · F(N id, N(min R)) · F(id, N union · trans · moves)
  =    {functors}
    Nα · zip · F(id, N(min R · union) · trans · moves)
  ⊇    {distribution over union (7.11)}
    Nα · zip · F(id, N(min R · P(min R)) · trans · moves)
  =    {functors}
    Nα · zip · F(id, N(min R) · NP(min R) · trans · moves)
  ⊇    {naturality of trans}
    Nα · zip · F(id, N(min R) · trans · PN(min R) · moves)
  ⊇    {naturality of moves}
    Nα · zip · F(id, N(min R) · trans · moves · N(min R))
  =    {functors, introducing Q}
    Q · F(id, N(min R)),

where Q = Nα · zip · F(id, N(min R) · trans · moves).

The definition of Q can be simplified. When F is the base functor of non-empty
cons-lists, zip is a coproduct zip = id + zip', where zip' : N(A × B) ← NA × NB, so
we can write Q as a coproduct

    Q = [N wrap, N cons · zip' · (id × N(min R) · trans · moves)].

With this definition of Q, the solution is min R · setify · ([Q]).

The program

In the following Gofer program we replace both N and P by list, thereby
representing both tuples and sets by (non-empty) lists. The function zip' is then implemented
by the standard function zip. For efficiency, a path x is represented by the pair
(x, sum x). The relation min R · setify is implemented by the standard function
minlist r, whose definition is given in the Appendix. The function catallist
implements catamorphisms on non-empty cons-lists; its definition is also given in
the Appendix.

With that, the Gofer program is:

> path    = minlist r . catallist (list wrap', list cons' . step)
> step    = zip . cross (id, list (minlist r) . trans . moves)

> r       = leq . cross (outr, outr)

> wrap'   = pair (wrap, id)
> cons'   = cross (cons, plus) . dupl

> moves x = [up x, x, down x]
> up x    = tail x ++ [head x]
> down x  = [last x] ++ init x
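
The same program can be sketched in plain Haskell with library functions standing in for the Appendix. This is a hedged transliteration, not the book's code: transpose plays the role of trans, minimumBy plays the role of minlist r, and the names Path, cheapest, bestFrom and extend are introduced for illustration. The input is assumed to be a non-empty list of columns, each a non-empty list of the same length.

```haskell
import Data.List (minimumBy, transpose)
import Data.Ord (comparing)

-- A path paired with its cost, so that costs are never recomputed.
type Path = ([Int], Int)

-- Rotate a column one place in either direction (the cylinder wraps around).
up, down :: [a] -> [a]
up x   = tail x ++ [head x]
down x = [last x] ++ init x

-- The three alignments of the next column's results with the current rows.
moves :: [a] -> [[a]]
moves x = [up x, x, down x]

-- A cheapest path among a non-empty collection.
cheapest :: [Path] -> Path
cheapest = minimumBy (comparing snd)

-- For each row, extend the cheapest adjacent continuation by that row's entry.
step :: [Int] -> [Path] -> [Path]
step col best = zipWith extend col (map cheapest (transpose (moves best)))
  where extend a (p, c) = (a : p, a + c)

-- One cheapest path starting in each row of the first column.
bestFrom :: [[Int]] -> [Path]
bestFrom [col]        = [([a], a) | a <- col]
bestFrom (col : cols) = step col (bestFrom cols)
bestFrom []           = error "bestFrom: no columns"

path :: [[Int]] -> Path
path = cheapest . bestFrom
```

Each column is processed once and each row inspects three candidates, which is the shape of computation the O(n × m) bound in the problem statement asks for.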

Exercises

7.45 Did we use the fact that N was a relator in the derivation?

7.46 What change is needed to deal with a similar problem in which

    moves x = {up (up x), up x, x, down x, down (down x)}?

7.47 What if we took moves = τ?

7.5 The security van problem


Our final problem illustrates an important idea in the theory of greedy algorithms:
when the desired monotonicity condition is not met, it may nevertheless still be
possible to arrive at a greedy solution by refining the ordering.

The following problem, invented by Hans Zantema, is typical of the sort that can
be specified using the idea of partitioning a list:

    Suppose a bank has a known sequence of deposits and withdrawals.
    For security reasons the total amount of cash in the bank should never
    exceed some fixed amount N, assumed to be at least as large as any
    single transaction. To cope with demand and supply, a security van can
    be called upon to deliver funds to the bank or to take away a surplus.
    The problem is to compute a schedule under which the van visits the
    bank a minimum number of times.

Let us call a sequence [a_1, a_2, ..., a_n] of transactions secure if there is an amount
r, indicating the bank's reserves at the beginning of the sequence of transactions,
such that each of the sums

    r, r + a_1, r + a_1 + a_2, ..., r + a_1 + ... + a_n

lies between zero and N. For example, taking N = 10, the sequence [2, −5, 7] is
secure because the van can take away or deliver enough cash to ensure an initial
reserve of between three and six units. Given the constraint that N is no smaller
than any single transaction, every singleton sequence is secure, so a valid schedule
certainly exists.

To formalise the constraint, define

    ceiling = max leq · Λ(sum · prefix)
    floor   = min leq · Λ(sum · prefix),

where sum : Nat ← list Nat sums a list of numbers and prefix is the prefix relation
on non-empty lists. Then a sequence x of transactions is secure if there is an r ≥ 0
such that

    0 ≤ r + floor x ≤ N and 0 ≤ r + ceiling x ≤ N.

We leave it as a short exercise to show that this condition can be phrased in the
equivalent form

    bmax (ceiling x, ceiling x − floor x) ≤ N.
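
The two formulations can be compared on examples with a short Haskell sketch. It is illustrative only and rests on assumptions: transactions are integers, the input list is non-empty, and the names prefixSums, ceilingOf, floorOf, reserves and secureSpec are introduced here (ceilingOf and floorOf avoid a clash with the Prelude's ceiling and floor).

```haskell
-- Sums of the non-empty prefixes of a non-empty transaction list.
prefixSums :: [Int] -> [Int]
prefixSums = scanl1 (+)

ceilingOf, floorOf :: [Int] -> Int
ceilingOf = maximum . prefixSums
floorOf   = minimum . prefixSums

-- Opening reserves r >= 0 satisfying both bounds in the definition.
reserves :: Int -> [Int] -> [Int]
reserves n x = [r | r <- [0 .. bound], all (inRange r) [floorOf x, ceilingOf x]]
  where
    bound = n + sum (map abs x)   -- no larger reserve can satisfy r + ceiling <= n
    inRange r s = 0 <= r + s && r + s <= n

-- Direct reading of the definition: some suitable reserve exists.
secureSpec :: Int -> [Int] -> Bool
secureSpec n = not . null . reserves n

-- The closed form stated in the text.
secure :: Int -> [Int] -> Bool
secure n x = max c (c - f) <= n
  where
    c = ceilingOf x
    f = floorOf x
```

For N = 10 and the sequence [2, −5, 7], reserves returns [3,4,5,6], in agreement with the initial reserve of between three and six units mentioned above.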

Let secure be the coreflexive corresponding to this predicate. It is a simple
consequence of the definition that if secure holds for a sequence x, then it also holds for
an arbitrary prefix of x; in symbols,

    prefix · secure ⊆ secure · prefix.

A coreflexive satisfying this property is called prefix-closed. For most of the
derivation prefix-closure is the only property of secure that we will need. At the end, and
only to obtain an efficient implementation of the greedy algorithm, we will use the
less obvious fact that secure is also suffix-closed: if x is secure, then any suffix of x
is secure.

Our problem can now be expressed as one of computing

    min R · Λ(list secure · partition),

where R = length° · leq · length and partition : list (list+ A) ← list A is the
combinatorial relation discussed in Section 5.6.

Recall that one expression for partition is

    partition = ([nil, new ∪ glue]),

where

    new  = cons · (wrap × id)
    glue = cons · (cons × id) · assocl · (id × cons°).

Appeal to fusion (left as an exercise) shows that

    list secure · partition = ([nil, new ∪ old]),

where

    old = cons · ((secure · cons) × id) · assocl · (id × cons°),

so the task is to compute min R · Λ([nil, new ∪ old]) efficiently.

Derivation

A greedy algorithm exists if [nil, new ∪ old] is monotonic on R°. The monotonicity
condition is equivalent to two conditions:

    new · (id × R°) ⊆ R° · (new ∪ old)                                     (7.14)
    old · (id × R°) ⊆ R° · (new ∪ old).                                    (7.15)

Well, (7.14) is true but (7.15) is false.

To prove (7.14) we reason:

    new · (id × R°)
  =    {definition of new}
    cons · (wrap × R°)
  ⊆    {since cons is monotonic on R° (exercise)}
    R° · cons · (wrap × id)
  =    {definition of new}
    R° · new
  ⊆    {monotonicity of join}
    R° · (new ∪ old).

To see why (7.15) is false, let [x] ++ xs and [y] ++ ys be two equal-length partitions
of the same sequence, so, certainly,

    ([x] ++ xs) R° ([y] ++ ys).

Suppose also that [a] ++ x is secure. Then (7.15) states that one or other of the
following two possibilities must hold:

    (i)  ([[a] ++ x] ++ xs) R° ([[a]] ++ [y] ++ ys)
    (ii) ([[a] ++ x] ++ xs) R° ([[a] ++ y] ++ ys) and secure ([a] ++ y).

Since [x] ++ xs and [y] ++ ys have equal length, the first possibility fails, and the second
reduces to secure ([a] ++ y). But, in general, there is no reason why secure ([a] ++ x)
should imply secure ([a] ++ y).

However, the analysis given above does suggest a way out: if y is a prefix of x, then
secure ([a] ++ x) does imply secure ([a] ++ y) because secure is prefix-closed. Suppose
we refine the order R to R ; H, where

    H = (head° · prefix · head) ∪ (nil · nil°).

Recall from Chapter 4 that

    R ; H = R ∩ (R° ⇒ H).

In words, [] (R ; H) [] and ([y] ++ ys) (R ; H) ([x] ++ xs) if ys is strictly shorter than
xs, or it has the same length and y prefix x. Since R ; H ⊆ R we can still obtain a
greedy algorithm for our problem if we can show that S is monotonic on (R ; H)°
and that ([min (R ; H) · ΛS]) can be refined to a function. The second task is easy
since old returns a shorter result than new if it returns any result at all; in symbols,
old ⊆ (R ; H) · new. Hence we obtain

    ([wrap · wrap, (ok → glue, new)]) ⊆ ([min (R ; H) · ΛS]),

where the coreflexive ok holds on (a, xs) if xs ≠ [] and [a] ++ head xs is secure.

It remains to show that S is monotonic on (R ; H)°, that is,

    new · (id × (R ; H)°) ⊆ (R ; H)° · (new ∪ old)                         (7.16)
    old · (id × (R ; H)°) ⊆ (R ; H)° · (new ∪ old).                        (7.17)

Condition (7.16) follows from the fact that

    new · (id × Π) ⊆ H° · new.                                             (7.18)

A formal proof of (7.18) is left as an exercise. Using it, we can argue:

    new · (id × (R ; H)°)
  ⊆    {since R ; H ⊆ R}
    new · (id × R°)
  ⊆    {inclusions (7.14) and (7.18)}
    (R° · new) ∩ (H° · new)
  =    {since new is a function}
    (R° ∩ H°) · new
  ⊆    {since X ∩ Y ⊆ (X ; Y), and converses}
    (R ; H)° · new.
Condition (7.17) follows from three subsidiary claims, in which |R|, the strict part
of R, is defined by |R| = R ∩ ¬R°:

    old · (id × Π) ⊆ H° · new                                              (7.19)
    old · (id × |R|°) ⊆ R° · new                                           (7.20)
    old · (id × (R° ∩ H°)) ⊆ (R° ∩ H°) · old.                              (7.21)

Again, we leave proofs as exercises. Now we argue:

    old · (id × (R ; H)°)
  =    {since X ; Y = |X| ∪ (X ∩ Y) and |X|° = |X°|}
    old · (id × (|R°| ∪ (R° ∩ H°)))
  =    {distributing join}
    (old · (id × |R°|)) ∪ (old · (id × (R° ∩ H°)))
  ⊆    {conditions (7.19), (7.20) and (7.21)}
    ((R° ∩ H°) · new) ∪ ((R° ∩ H°) · old)
  ⊆    {since X ∩ Y ⊆ X ; Y, and converses}
    (R ; H)° · (new ∪ old).

The greedy condition is established.

Up to this point we have used no property of secure other than the fact that it
is prefix-closed. For the final program we need to implement the security test
efficiently. To do this, recall from Section 5.6 that the prefix relation
prefix : list A ← list A can be defined as a catamorphism

    prefix = ([nil, cons ∪ nil]).

Since sum · prefix = ([zero, plus ∪ zero]), two baby-sized applications of the greedy
theorem yield:

    ceiling = ([zero, omax · plus])
    floor   = ([zero, omin · plus]),

where omax a = bmax (a, 0) and omin a = bmin (a, 0). Recall that x is secure if

    bmax (ceiling x, ceiling x − floor x) ≤ N.

Since bmax (b, b − c) = b − omin c, we obtain that [a] ++ x is secure if

    omax (a + b) − omin (a + c) ≤ N,

where b = ceiling x and c = floor x. This condition implies omax b − omin c ≤ N,
so x is secure. This proves that secure is suffix-closed.
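
The catamorphisms for ceiling and floor translate directly into right folds. The Haskell sketch below is an illustration under assumptions: transactions are integers, the prefix relation is the one given by ([nil, cons ∪ nil]) and so includes the empty prefix, and the names ceilingSpec, floorSpec, ceilingOf and floorOf are hypothetical. It sets the specifications beside the folded versions so that they can be compared on examples.

```haskell
import Data.List (inits)

-- Specifications: largest and smallest sum over all prefixes
-- (inits includes the empty prefix, whose sum is 0).
ceilingSpec, floorSpec :: [Int] -> Int
ceilingSpec = maximum . map sum . inits
floorSpec   = minimum . map sum . inits

-- The derived catamorphisms ([zero, omax . plus]) and ([zero, omin . plus]).
ceilingOf, floorOf :: [Int] -> Int
ceilingOf = foldr (\a b -> max 0 (a + b)) 0
floorOf   = foldr (\a c -> min 0 (a + c)) 0

-- With the empty prefix included, floorOf x <= 0, so the security test
-- bmax (ceiling x, ceiling x - floor x) <= n reduces to a difference.
secure :: Int -> [Int] -> Bool
secure n x = ceilingOf x - floorOf x <= n
```

For [2, −5, 7] both versions give a ceiling of 4 and a floor of −3, so the sequence is secure for any N of at least 7.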

In summary, we have derived the following program for computing a valid schedule,
in which schedule is parameterised by N and ok is expressed as a predicate rather
than a coreflexive:

    schedule N           = ([nil, (ok N → glue, new)])
    ok N (a, [])         = false
    ok N (a, [x] ++ xs)  = omax (a + ceiling x) − omin (a + floor x) ≤ N.

The program

In the final Gofer program we represent the empty partition by ([],(0,0)) and a
partition [x] ++ xs by a pair

    ([x] ++ xs, (ceiling x, floor x)).

The standard function cond p (f,g) implements (p → f, g), and catalist is
the standard catamorphism former for cons-lists. The function split implements
cons°.

> schedule n = catalist (start, cond (ok n) (glue', new'))

> ok n     = cond empty (false, (<= n) . minus . outr . glue')
>            where empty = null . outl . outr
> start    = ([], (0,0))
> glue'    = cross (glue, augment) . dupl
>            where augment = cross (omax . plus, omin . plus) . dupl
> new'     = cross (new, augment) . dupl
>            where augment = pair (omax, omin) . outl

> glue     = cons . cross (cons, id) . assocl . cross (id, split)
> new      = cons . cross (wrap, id)

> omax     = cond (>= 0) (id, zero)
> omin     = cond (<= 0) (id, zero)
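
A plain Haskell rendering of the same greedy schedule may be easier to run than the point-free Gofer. It is a sketch under assumptions rather than the book's program: transactions are integers, foldr takes the place of catalist, and the names State and step are introduced for illustration. The accumulator pairs the partition built so far with the ceiling and floor of its first segment, exactly as described above.

```haskell
-- A partition paired with (ceiling, floor) of its first segment.
type State = ([[Int]], (Int, Int))

omax, omin :: Int -> Int
omax = max 0
omin = min 0

schedule :: Int -> [Int] -> [[Int]]
schedule n = fst . foldr step start
  where
    start :: State
    start = ([], (0, 0))

    step :: Int -> State -> State
    step a (xs, (c, f))
      | ok        = (glue a xs, (c', f'))          -- extend the first segment
      | otherwise = ([a] : xs, (omax a, omin a))   -- start a new segment
      where
        c' = omax (a + c)
        f' = omin (a + f)
        -- ok: the partition is non-empty and [a] ++ head xs is still secure
        ok = not (null xs) && c' - f' <= n

    glue a (x : rest) = (a : x) : rest
    glue a []         = [[a]]   -- not reached: ok requires a non-empty partition
```

For example, schedule 10 [6, 6, −3] yields [[6], [6, −3]]: the whole sequence would push the running total to 12, so one extra visit by the van is needed.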

Exercises

7.48 Prove that 0 ≤ r + floor x ≤ N and 0 ≤ r + ceiling x ≤ N for some r ≥ 0 if
and only if

    bmax (ceiling x, ceiling x − floor x) ≤ N.

7.49 Prove formally that prefix · secure ⊆ secure · prefix.

7.50 If x is secure and y is an arbitrary subsequence of x, is it necessarily the case
that y is secure?

7.51 Give details of the appeal to fusion that establishes

    list secure · partition = ([wrap · wrap, new ∪ old]).

7.52 Prove that cons is monotonic on R°.


7.53 Justify the claims (7.18), (7.19), (7.20), and (7.21).


7.54 The greedy algorithm produces a minimum length partition with a shortest
possible first component. This means that the security van may be called upon
before it is absolutely necessary to do so. Such a schedule might seem curious to
the security van company. Outline how, by switching to snoc-lists, it is possible to
reverse this phenomenon, obtaining a greedy schedule in which later visits are more
frequent than early ones.

7.55 Give details of the 'baby-sized' applications of the greedy theorem to
computing ceiling and floor.

7.56 The paragraph problem is to break a sequence of words into a sequence of
non-empty lines with the aim of forming a 'visually pleasing' paragraph. The constraint
is that no line in the paragraph should have a width that exceeds some fixed quantity
W, where the width of a line x is the sum of the lengths of the words in x, plus
some suitable value for the interword spaces. Calling the associated coreflexive fits,
argue that fits is both prefix- and suffix-closed. Why is the following formulation
not a reasonable specification of the problem?:

    paragraph ⊆ min R · Λ(list fits · partition),

where R = length° · leq · length. (Hint: Consider Exercise 7.54.)

7.57 Consider the ordering Q characterised by [] Q ys and

    ([x] ++ xs) Q ([y] ++ ys) = (x prefix y) ∧ (y prefix x ⇒ xs Q ys).

One can also define Q more succinctly by

    Q = (nil° · !) ∪ (prefix ; (tail° · Q · tail)).

This defines a preorder, and a linear order on partitions of the same sequence. Using
only the fact that secure is prefix-closed, show that both new and old are monotonic
on Q°. Although it is not true that Q ⊆ R, we nevertheless do have

    min Q · Λ([S]) ⊆ min R · Λ([S]),

provided we also use the fact that secure is suffix-closed. In words, although Q is
not a refinement of R, it is still the case that the (unique) minimum partition under
Q is a minimum partition under R. The proof is a slightly tricky combinatorial
argument. The advantage of taking this Q is that we can replace R by a more
general preorder R = cost° · leq · cost and establish general properties of cost under
which the greedy algorithm works. What are they?



Bibliographical remarks

Our own interest in optimisation problems originated in the calculus of functions
referred to in earlier chapters. That work culminated in a study of greedy algorithms
(Bird 1990, 1991, 1992a, 1992b, 1992c). Jeuring's work also concerns various kinds
of greedy algorithm (Jeuring 1990, 1993). A recurring problem with these functional
developments was the inadequate treatment of indeterminate specifications. These
difficulties motivated the generalisation to relations.

The calculus of minimum elements, in the context of categories of relations, was
first explored in (Brook 1977). Most of the ideas found there are also apparent in
earlier work on the relational calculus, for instance (Riguet 1948). We adapted those
works for applications to optimisation problems in (De Moor 1992a). Of course, the
definitions in relational calculus are obvious, and have also been applied by others,
see e.g. (Schmidt, Berghammer, and Zierer 1989).

Many researchers have attempted a classification of greedy algorithms before. An
overview can be found in (Korte, Lovász, and Schrader 1991), which proposes a
mathematical structure called greedoids as a basis for the study of greedy
algorithms. More recently, (Helman, Moret, and Shapiro 1993) have proposed a
refinement of greedoids. Although there are some obvious links to the material presented
in this book, we have not yet investigated the connection in sufficient detail. The
theory of greedoids is much more concerned with structural properties than with
the synthesis of greedy algorithms for given specifications. Also, greedoids can be
characterised by the optimality of the greedy solution for a specific class of cost
functions; no such equivalence is presented here.

Chapter 8

Thinning Algorithms

In this chapter we continue to study problems of the form min R · Λ([S]). The
greedy theorem of the last chapter gave a rather strong condition under which such
a problem could be solved by maintaining a single partial solution at each stage.
At the other extreme, the Eilenberg-Wright lemma shows that Λ([S]) can always
be implemented as a set-valued catamorphism. This leads to an exhaustive search
algorithm in which all possible partial solutions are maintained at each stage.
Between the two extremes of all and one, there is a third possibility: at each stage keep
a representative collection of partial solutions, namely those that might eventually
be extended to an optimal solution. Such algorithms are called thinning algorithms
and are the topic of the present chapter.

8.1 Thinning
Given a relation Q : A ← A, the relation thin Q : PA ← PA is defined by

    thin Q = (∈\∈) ∩ ((∋ · Q)/∋).                                          (8.1)

Informally, thin Q is a nondeterministic mapping that takes a set y, and returns
some subset x of y with the property that all elements of y have a lower bound
under Q in x. To see this, note that x(∈\∈)y means that x is a subset of y, and

    x((∋ · Q)/∋)y ≡ (∀b ∈ y : ∃a ∈ x : a Q b).

Thus, to thin a set x with thin Q means to reduce the size of x without losing the
possibility of taking a minimum element of x under Q. Unlike the case of min R,
we can implement thin Q when Q is not a connected preorder (see Section 8.3).
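
The informal reading of (8.1) can be checked on finite examples. The Haskell sketch below is an illustration under assumptions, not the implementation discussed in Section 8.3: sets are represented by lists, Q is passed as a Boolean function assumed to be a preorder, and the names isThinning, thin and dominates are hypothetical. The first function tests whether x is related to y by thin Q; the second is one deterministic refinement that discards any element lying above an element already kept.

```haskell
-- isThinning q x y: is x a subset of y in which every element of y
-- has a lower bound under q?
isThinning :: Eq a => (a -> a -> Bool) -> [a] -> [a] -> Bool
isThinning q x y = subset && bounded
  where
    subset  = all (`elem` y) x
    bounded = all (\b -> any (\a -> q a b) x) y

-- One deterministic refinement of thin q (q is assumed to be a preorder):
-- skip b if a kept element is below it; otherwise keep b and drop the
-- kept elements that b is below.
thin :: (a -> a -> Bool) -> [a] -> [a]
thin q = foldr keep []
  where
    keep b kept
      | any (`q` b) kept = kept
      | otherwise        = b : filter (not . q b) kept

-- An example preorder that is not connected: componentwise comparison of pairs.
dominates :: (Int, Int) -> (Int, Int) -> Bool
dominates (c1, w1) (c2, w2) = c1 <= c2 && w1 <= w2
```

With this ordering, thin dominates [(3,1),(2,2),(4,4),(1,3)] discards only (4,4), since the other three pairs are mutually incomparable; no single minimum exists, which is the situation in which thinning is useful and min is not.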
Definition (8.1) can be restated as the universal property

    X ⊆ thin Q · ΛS ≡ ∈ · X ⊆ S and X · S° ⊆ ∋ · Q,

which, like other universal properties, is often more useful in calculations.


Properties of thinning

It is immediate from the definition that Q ⊆ R implies that thin Q ⊆ thin R.
Furthermore, it is an easy exercise to show that thin Q is reflexive if Q is reflexive,
and transitive if Q is transitive. We will suppose in what follows that Q is a
preorder, so thin Q is a preorder too.

We can introduce thin into an optimisation problem with the following rule, called
thin-introduction:

    min R = min R · thin Q provided that Q ⊆ R.

The proof, left as an exercise, depends on the assumption that Q and R are
preorders.

We can also eliminate thin from an optimisation problem:

    thin Q ⊇ τ · min Q,                                                    (8.2)

where τ : PA ← A returns singleton sets. However, unless Q is a connected preorder,
the domain of τ · min Q is smaller than that of thin Q. For instance, thin id is entire
but the domain of τ · min id consists only of singleton sets. So, use of (8.2) may
result in an infeasible refinement. At the other extreme, thin Q ⊇ id, so thin Q can
always be refined to the identity relation on sets.
There is a useful variant of thin-elimination:

    thin Q · ΛS ⊇ τ · min R · ΛS provided that R ∩ (S · S°) ⊆ Q.           (8.3)

For the proof, observe that by the universal property of thin we have to show

    ∈ · τ · min R · ΛS ⊆ S
    τ · min R · ΛS · S° ⊆ ∋ · Q.

The first inclusion is immediate from ∈ · τ = id. For the second, we argue:

    τ · min R · ΛS · S° ⊆ ∋ · Q
  ≡    {shunting τ and ∈ · τ = id}
    min R · ΛS · S° ⊆ Q
  ≡    {context}
    min (R ∩ (S · S°)) · ΛS · S° ⊆ Q
  ⇐    {since ΛS · S° ⊆ ∋}
    min (R ∩ (S · S°)) · ∋ ⊆ Q
  ⇐    {definition of min}
    R ∩ (S · S°) ⊆ Q.


Finally, it is left as an exercise to prove that thin distributes over union:

    thin Q · union ⊇ union · P(thin Q).                                    (8.4)

The basic theorem

The following theorem and corollary show how the use of thinning can be exploited
in solving optimisation problems. Both exhaustive search and the greedy algorithm
follow as special cases. As usual, F is the base functor of the catamorphism.

Theorem 8.1 If S is monotonic on Q°, then

    ([thin Q · Λ(S · F∈)]) ⊆ thin Q · Λ([S]).

Proof. By the universal property of thin we have two conditions to check:

    ∈ · ([thin Q · Λ(S · F∈)]) ⊆ ([S])
    ([thin Q · Λ(S · F∈)]) · ([S])° ⊆ ∋ · Q.

The first is an easy exercise in fusion and, by the hylomorphism theorem, the second
follows if we can show that

    thin Q · Λ(S · F∈) · F(∋ · Q) · S° ⊆ ∋ · Q.

We reason:

    thin Q · Λ(S · F∈) · F(∋ · Q) · S°
  ⊆    {since FQ · S° ⊆ S° · Q by monotonicity and converses}
    thin Q · Λ(S · F∈) · F∋ · S° · Q
  ⊆    {since ΛX · X° ⊆ ∋}
    thin Q · ∋ · Q
  ⊆    {since thin Q · ∋ ⊆ ∋ · Q}
    ∋ · Q · Q
  =    {transitivity of Q}
    ∋ · Q.
                                                                            □

The following corollary is immediate on appeal to thin-introduction:

Corollary 8.1 If Q ⊆ R and S is monotonic on Q°, then

    min R · ([thin Q · Λ(S · F∈)]) ⊆ min R · Λ([S]).


Exercises

8.1 Prove that thin id = id.

8.2 Prove that thin Q is a preorder if Q is.

8.3 Prove that min R ⊇ min R · thin Q if Q ⊆ R.

8.4 Prove that cup · ⟨thin R, thin Q⟩ ⊆ thin R by showing more generally that

    cup · ⟨thin R, subset⟩ ⊆ thin R,

where subset is the inclusion relation on sets. You will need the inclusion

exe c (e,e)'cup,
so prove that as well. Does equality hold in the original inclusion when Q = R?

8.5 Prove that min R = τ° · thin R and hence prove the thin-elimination rule.

8.6 Prove that thin Q · ΛS = thin (Q ∩ (S · S°)) · ΛS.

8.7 Prove (8.4). Is the converse inclusion true?

8.8 Prove that the greedy algorithm is a special case of Theorem 8.1.

8.9 Show that if Q is well-supported (see Exercise 7.30), then Λ(mnl Q) ⊆ thin Q.

8.2 Paths in a layered network


Let us now give a simple illustration of the ideas introduced so far. The example is
similar to the paths on a cylinder problem given in the preceding chapter.

By definition, a layered network is a non-empty sequence of sets of vertices. A path
in a layered network xs = [x_0, x_1, ..., x_n] is a sequence of vertices
[a_0, a_1, ..., a_n] where a_j ∈ x_j for 0 ≤ j ≤ n. With each path is associated a
cost, defined by

    cost [a_0, a_1, ..., a_n] = (+j : 0 ≤ j < n : wt (a_j, a_{j+1})),

where wt is some given function on pairs of vertices. We aim to derive an algorithm
for finding a least cost path in a layered network.

To formalise the problem we will use non-empty cons-lists, thereby building paths
from right to left. The choice is dictated solely by reasons of efficiency in the final
functional program, since snoc-lists would have served equally well. Thus the input
is an element of list+ (PA). Our problem takes the form

    minpath ⊆ min R · Λ(list+ ∈),



where R = cost° · leq · cost. Using the definition of list+ as a catamorphism, we
obtain that

    minpath ⊆ min R · Λ([α · F(∈, id)]),

where α = [wrap, cons] and F is the base bifunctor of non-empty cons-lists.

It remains to define cost. This is not a catamorphism on paths, but we do have

    cost = outr · ([wrapz, consw]),

where wrapz = ⟨wrap, zero⟩ and

    consw (a, (x, n)) = (cons (a, x), wt (a, head x) + n).

Thus ([wrapz, consw]) = ⟨id, cost⟩.
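The tupling above can be sketched in Haskell. This is only an illustration: the vertex type, the weight function wt and the helper foldr1' (a fold over non-empty lists) are assumptions made for the example, not definitions from the text.

```haskell
-- Hypothetical vertex type and weight function, chosen only for illustration.
type Vertex = Int

wt :: (Vertex, Vertex) -> Int
wt (a, b) = abs (a - b)

-- Fold over a non-empty list: the first argument handles the last element,
-- the second combines an element with the result for the rest of the list.
foldr1' :: (a -> b) -> (a -> b -> b) -> [a] -> b
foldr1' f _ [a]     = f a
foldr1' f g (a : x) = g a (foldr1' f g x)
foldr1' _ _ []      = error "foldr1': empty list"

-- wrapz and consw build the pair (path, cost path) in a single pass.
wrapz :: Vertex -> ([Vertex], Int)
wrapz a = ([a], 0)

consw :: Vertex -> ([Vertex], Int) -> ([Vertex], Int)
consw a (x, n) = (a : x, wt (a, head x) + n)

-- cost is recovered by projecting out the second component.
cost :: [Vertex] -> Int
cost = snd . foldr1' wrapz consw
```

With this wt, cost [1,4,2] is wt (1,4) + wt (4,2), and the first component of the fold is the path itself.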

Derivation

In this example we have S = α · F(∈, id). The corollary to the thinning theorem
says that

    min R · ([thin Q · Λ(α · F(∈, ∈))]) ⊆ min R · Λ([α · F(∈, id)])

for any Q ⊆ R satisfying

    α · F(∈, Q°) ⊆ Q° · α · F(∈, id).

Of course, if we can take Q = R, then we can appeal to the greedy theorem,
avoiding thinning altogether. To show that we cannot take Q = R, suppose p
and q are two paths in the network [x_1, ..., x_n] with cost p ≥ cost q. Then the
monotonicity condition with Q = R says that for any set of vertices x_0, and any
a ∈ x_0, there exists a b ∈ x_0 such that

    cost ([a] ++ p) ≥ cost ([b] ++ q).

In particular, this condition should hold when x_0 = {a}, and so a = b. Using
cost ([a] ++ p) = wt (a, head p) + cost p, we therefore require

    wt (a, head p) − wt (a, head q) ≥ cost q − cost p.

However, since wt (a, head q) can be arbitrarily large, this condition fails unless
head p = head q. On the other hand, if head p = head q, then the inequality reduces
to cost p ≥ cost q, which is true by assumption.

It follows that α · F(∈, id) is monotonic on Q°, where Q = R ∩ (head° · head). Hence

    minpath ⊆ min R · ([thin Q · Λ(α · F(∈, ∈))]).



Operationally speaking, the catamorphism on the right maintains a set of partial


solutions, with at least one solution for each starting vertex. But, clearly, only
one partial solution needs to be maintained for each vertex v, namely, a shortest

path beginning with v. This motivates the following calculation, in which the term
thin Q is eliminated:

      thin Q · Λ(α · F(∈, ∈))
  =      {bifunctors}
      thin Q · Λ(α · F(id, ∈) · F(∈, id))
  =      {power transpose of composition}
      thin Q · union · PΛ(α · F(id, ∈)) · ΛF(∈, id)
  ⊇      {thin distributes over union (8.4)}
      union · P(thin Q · Λ(α · F(id, ∈))) · ΛF(∈, id)
  ⊇      {thin-elimination (8.3), see below}
      union · P(τ · min R · Λ(α · F(id, ∈))) · ΛF(∈, id)
  =      {since union · Pτ = id}
      P(min R · Λ(α · F(id, ∈))) · ΛF(∈, id)
  =      {since P = E on functions}
      P(min R · Pα · ΛF(id, ∈)) · ΛF(∈, id).

To justify the appeal to (8.3), with S = α · F(id, ∈), we have to show that
R ∩ (S · S°) ⊆ Q:

      R ∩ (S · S°) ⊆ Q
  ⇐      {definition of Q}
      S · S° ⊆ head° · head
  ≡      {shunting}
      (head · S) · (head · S)° ⊆ id
  ⇐      {since head · S ⊆ [id, outl] (exercise), so head · S is simple}
      true.

The above derivation is quite general and makes hardly any use of the specific
datatype. For the base bifunctor F of non-empty cons-lists we have

    ΛF(∈, id) = id + cpl
    min R · Pα · ΛF(id, ∈) = [wrap, step]
    step = min R · Pcons · cpr,

where the functions cpl and cpr were defined in Section 5.6. Hence, finally, we have

    minpath ⊆ min R · ([Pwrap, Pstep · cpl]).



The program

In the Gofer program we represent sets by lists in the usual way, and represent a
path p by (p, (headp, costp)). The program is parameterised by the function wt:

> path    = minlist r . cata1list (list wrap', list step . cpl)
> step    = minlist r . list cons' . cpr
> r       = leq . cross (cost, cost)
> cost    = outr . outr

> wrap'   = pair (wrap, pair (id, zero))
> cons'   = cross (cons, augment) . dupl
> augment = pair (outl, plus . cross (wt, id) . assocl)
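For readers without the book's combinator library, the same algorithm can be sketched in plain Haskell. This is an illustrative transcription under assumptions, not the program of the text: vertices are Ints, wt is a hypothetical weight function, and a candidate is stored as (path, cost) without the separate head component.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Vertex = Int
type Path   = ([Vertex], Int)        -- a path paired with its cost

-- Hypothetical weight function, chosen only for illustration.
wt :: (Vertex, Vertex) -> Int
wt (a, b) = abs (a - b)

-- For each vertex a of the new layer, keep only a cheapest extension a : p,
-- so the candidate list holds exactly one path per starting vertex.
stepLayer :: [Vertex] -> [Path] -> [Path]
stepLayer layer paths =
  [ minimumBy (comparing snd) [ (a : p, wt (a, head p) + c) | (p, c) <- paths ]
  | a <- layer ]

-- Process the layers from right to left; every layer is assumed non-empty.
minpath :: [[Vertex]] -> Path
minpath = minimumBy (comparing snd) . go
  where
    go [l]      = [ ([a], 0) | a <- l ]
    go (l : ls) = stepLayer l (go ls)
    go []       = error "minpath: empty network"
```

For example, on the network [[1,5],[2,9],[3]] the candidates after the middle layer are [2,3] and [9,3], and the final answer is the path [1,2,3] with cost 2.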

Exercises

8.10 Can we replace the cons-list bifunctor with one or both of the following bi-
functors?

    F(A, B) = A + (A × (B × B))
    F(A, B) = A + (B × B).

What is the interpretation of the generalised layered network problem?

8.11 The derivation above is an instance of the following more general result.
Suppose Q ⊆ R and S = S1 · S2 is monotonic on Q°. Furthermore, suppose
R ∩ (S1 · S1°) ⊆ Q. Then

    min R · ([P(min R · ΛS1) · Λ(S2 · F∈)]) ⊆ min R · Λ([S]).


Prove this result.

8.3 Implementing thin

In the layered network example we were fortunate in that the thinning step could
be eliminated, but most often we have to implement thinning as part of the final
algorithm. As with min R we cannot refine thin Q to an implementable function
except when thin Q is applied to finite sets; unlike min R we do not require the sets
to be non-empty, nor that Q be a connected preorder.

The function thinlist Q might be specified by

    setify · thinlist Q ⊆ thin Q · setify,



where setify : PA ← list A. However, we want to impose an extra condition upon
thinlist Q, namely that

    thinlist Q ⊆ subseq.

In words, we want thinlist Q to preserve the relative order of the elements in the
list. The reason for this additional restriction will emerge below.

The ideal implementation of thinlist Q is a linear-time program that produces a
shortest possible result. In particular, when Q is a connected preorder and x is a
non-empty list, we want

    thinlist Q x = [minlist Q x],   (8.5)

where minlist Q was defined in the preceding chapter.

A legitimate, but not useful, implementation is to take thinlist Q = id. Another is
to remove an element from a list if it is 'bumped' by one of its neighbours. This
idea is formalised in the definition

    thinlist Q = ([nil, bump Q]),

where

    bump Q (a, []) = [a]
    bump Q (a, [b] ++ x) = (aQb → [a] ++ x, bQa → [b] ++ x, [a] ++ [b] ++ x).

This gives a linear-time algorithm in the number of evaluations of Q, though it is
not always guaranteed to deliver a shortest result. There are other possible choices
for thinlist Q, some of which are explored in the exercises.
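The bumping definition translates directly into a right fold. The sketch below assumes the relation Q is supplied as a curried Boolean function q, with q a b read as aQb; the example relation dominates on (value, weight) pairs is hypothetical.

```haskell
-- bump compares a new element only with the current head of the result.
bump :: (a -> a -> Bool) -> a -> [a] -> [a]
bump _ a [] = [a]
bump q a (b : x)
  | a `q` b   = a : x        -- a bumps its neighbour b
  | b `q` a   = b : x        -- b bumps a
  | otherwise = a : b : x    -- incomparable neighbours: keep both

thinlist :: (a -> a -> Bool) -> [a] -> [a]
thinlist q = foldr (bump q) []

-- Hypothetical example relation on (value, weight) pairs:
-- at least as valuable and no heavier.
dominates :: (Int, Int) -> (Int, Int) -> Bool
dominates (v1, w1) (v2, w2) = v1 >= v2 && w1 <= w2
```

With a connected preorder such as (<=), thinlist (<=) [3,1,2] leaves the single minimum [1], in line with (8.5). With dominates, the list [(5,1),(1,0),(4,2)] is returned unchanged even though (5,1) dominates (4,2), because the two are not neighbours; this is one way in which the result can fail to be a shortest one.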

Sorting sets

In the main theorem of this section we make use of the idea of maintaining a finite
set as a sorted list. We will use a version of sort from Chapter 6, taking

    sort P = ordered P · setify°,

where P : A ← A is some connected preorder. Note that sort P is not a function,
even when P is a linear order: for example, sort leq {1,2,3} may produce [1,2,3] or
[1,1,2,3], or any one of a number of similar lists.

We will make use of a number of facts about sort P including

    thinlist Q · sort P ⊆ sort P · thin Q.   (8.6)



For the proof we argue:

      thinlist Q · sort P
  =      {definition of sort P}
      thinlist Q · ordered P · setify°
  ⊆      {claim: thinlist Q · ordered P ⊆ ordered P · thinlist Q}
      ordered P · thinlist Q · setify°
  ⊆      {specification of thinlist Q and shunting}
      ordered P · setify° · thin Q
  =      {definition of sort P}
      sort P · thin Q.

For the claim it is sufficient to show that thinlist Q · ordered P ⊆ ordered P:

      thinlist Q · ordered P
  ⊆      {specification of thinlist Q}
      subseq · ordered P
  ⊆      {since subseq · ordered P ⊆ ordered P if P is connected}
      ordered P.

It is important to note that the choice of P can affect the success of the subsequent
thinning process; ideally, sort P should bring together elements that are comparable
under Q. In particular, if Q is connected and we take P = Q, then thinning is
accomplished by simply returning the first element as a singleton list.

There are five other properties about sort P that we will need. Proofs are left as
exercises. The first four are

    minlist Q · sort P ⊆ min Q                    (8.7)
    list f · sort (f° · P · f) ⊆ sort P · Pf       (8.8)
    filter p · sort P ⊆ sort P · Ep                (8.9)
    merge P · (sort P)² ⊆ sort P · cup.            (8.10)

In (8.9) the relation p is assumed to be a coreflexive, and in (8.10) the function
merge P is as defined in Exercise 6.28.

The fifth property deals with an implementation of the general cartesian product
function cp (F) = Λ(F∈) described in Section 5.6. We met the special case cp (list)
in the paths in a layered network example. The function cp (F) is a natural
transformation of type PF ← FP, so we are looking for a function listcp (F) with type
list F ← F list. Moreover, we want this function to satisfy the condition

    listcp (F) · F(sort P) ⊆ sort (FP) · cp (F).   (8.11)

Not every functor F admits an implementation of listcp (F) satisfying (8.11); one
requirement is that F distributes over arbitrary joins. It is left as an exercise to
define listcp (F) for each polynomial functor F. It follows that if F is polynomial
and distributes over arbitrary joins (such a functor is called linear), then (8.11) can
be satisfied. In what follows we will assume that (8.11) can be satisfied.

Inclusions (8.8), (8.9) and (8.11) are used in the proof of the following lemma, which
is required in the theorem to come:

Lemma 8.1 If f is monotonic on P and p is a coreflexive, then

    filter p · list f · listcp (F) · F(sort P) ⊆ sort P · Λ(p · f · F∈).

Proof. The proof is a simple calculation:

      sort P · Λ(p · f · F∈)
  =      {Λ of composition and cp (F) = ΛF∈}
      sort P · E(p · f) · cp (F)
  =      {E is a functor and agrees with P on functions}
      sort P · Ep · Pf · cp (F)
  ⊇      {(8.9)}
      filter p · sort P · Pf · cp (F)
  ⊇      {(8.8)}
      filter p · list f · sort (f° · P · f) · cp (F)
  ⊇      {since f is monotonic on P}
      filter p · list f · sort (FP) · cp (F)
  ⊇      {(8.11)}
      filter p · list f · listcp (F) · F(sort P).

Binary thinning

With these preliminaries out of the way, the main theorem of this section can now

be stated. It will be referred to subsequently as the binary thinning theorem.

Theorem 8.2 Suppose the following three conditions are satisfied:

1. S = (p1 · f1) ∪ (p2 · f2), where p1 and p2 are coreflexives.

2. Q is a preorder with Q ⊆ R and such that p1 · f1 and p2 · f2 are both monotonic
   on Q°.

3. P is a connected preorder such that f1 and f2 are both monotonic on P.

Then

    minlist R · ([thinlist Q · merge P · ⟨g1, g2⟩ · listcp]) ⊆ min R · Λ([S]),

where gi = filter pi · list fi.

Proof. We reason:

      min R · Λ([S])
  ⊇      {thinning theorem since S is monotonic on Q°}
      min R · ([thin Q · Λ(S · F∈)])
  ⊇      {(8.7)}
      minlist R · sort P · ([thin Q · Λ(S · F∈)])
  ⊇      {fusion}
      minlist R · ([thinlist Q · merge P · ⟨g1, g2⟩ · listcp]).

The condition for fusion in the last step is verified as follows:

      sort P · thin Q · Λ(S · F∈)
  ⊇      {(8.6)}
      thinlist Q · sort P · Λ(S · F∈)
  =      {definition of S}
      thinlist Q · sort P · cup · ⟨Λ(p1 · f1 · F∈), Λ(p2 · f2 · F∈)⟩
  ⊇      {(8.10)}
      thinlist Q · merge P · ⟨sort P · Λ(p1 · f1 · F∈), sort P · Λ(p2 · f2 · F∈)⟩
  ⊇      {Lemma 8.1}
      thinlist Q · merge P · ⟨g1, g2⟩ · listcp · F(sort P).

The theorem can be generalised in the obvious way when S is a collection
S = (p1 · f1) ∪ ··· ∪ (pn · fn). We leave details as an exercise.

Exercises

8.12 Another definition of thinlist Q is as a catamorphism ([id, bump Q]) on
snoc-lists. Define bump Q and give an example to show that this version of
thinlist Q differs from that of the text.

8.13 Yet another definition is

    thinlist Q [] = []
    thinlist Q [a] = [a]
    thinlist Q ([a] ++ [b] ++ x) = thinlist Q ([a] ++ x),         if aQb
                                 = thinlist Q ([b] ++ x),         if bQa
                                 = [a] ++ thinlist Q ([b] ++ x),  otherwise.

Give examples to show that this version of thinlist Q may return a shorter or longer
result than that of the text.

8.14 Yet another definition arises from the specification

    thinlist Q ⊆ list (minlist Q) · min L · Λ(list* (connected Q) · partition),

where L = length° · leq · length and the coreflexive connected Q is defined by the
associated predicate

    connected Q x = (∀a : a inlist x : (∀b : b inlist x : aQb ∨ bQa)).

In words, we partition a list into the smallest number of components, each of whose
elements are all connected under Q, and then take a minimum under Q of each
component. Use the fact that connected Q is prefix-closed (in fact,
subsequence-closed) to give a greedy algorithm for the optimisation problem on the
right. Apply type functor fusion to obtain a catamorphism for thinlist Q.

How can the catamorphism be expressed as a more efficient algorithm if it is assumed
that Q · Q° ⊆ Q ∪ Q°?

8.15 Repeat the above exercise, replacing connected Q by leftmin Q, where

    leftmin Q ([a] ++ x) = (∀b : b inlist x : aQb).

8.16 A best possible implementation of thinlist Q would be an algorithm that
returned the subsequence of minimal elements under Q. Can such an algorithm be
implemented in linear time in the number of Q evaluations?

8.17 Prove that subseq · sort P ⊆ sort P · subset provided P is a connected preorder.

8.18 Prove (8.8) and (8.9).


8.19 Give functions for listcp (F × G) and listcp (F + G) in terms of listcp (F) and
listcp (G). What is listcp (F) when F is the identity functor, or the constant functor
K_A?

8.20 Can you define listcp (T) for an arbitrary type functor T?

8.21 Give a counter-example showing that (8.11) fails for non-linear polynomial
relators.

8.22 Formalise and prove a version of binary thinning in which the algebra S takes
the form S = (f1 · p1) ∪ (f2 · p2).

8.4 The knapsack problem

The standard example of binary thinning is the well-known knapsack problem
(Martello and Toth 1990). The objective is to pack items in a knapsack in the
best possible way. Given is a list of items which might be packed, each of which
has a given weight and value, both of which are non-negative real numbers. The
knapsack has a finite capacity w, giving an upper bound to the total weight of the
packed items, and the object of the exercise is to pack items with a greatest total
value, subject to the capacity of the knapsack not being exceeded.

Let Item denote the type of items to be packed and val, wt : Real ← Item the
associated value and weight functions. The input consists of an element x of type
list Item and a given capacity w.

We will model selections as subsequences of the given list of items. The relation
subseq : list A ← list A can be expressed in the form

    subseq = ([[nil, cons] ∪ [nil, outr]]).

The total value and weight of a selection are given by two functions value, weight :
Real ← list Item, defined by

    value = sum · list val
    weight = sum · list wt.

Our problem is to find a function knapsack w satisfying

    knapsack w ⊆ max R · Λ(within w · subseq),

where R = value° · leq · value and within w x = (weight x ≤ w). Equivalently,
replacing R by R° we obtain

    knapsack w ⊆ min R · Λ(within w · subseq),

where R = value° · geq · value and geq = leq°.

An appeal to fusion, using the fact that weights are non-negative, gives

    within w · subseq = ([(within w · [nil, cons]) ∪ [nil, outr]]).

Of course, the right-hand side simplifies to ([nil, (within w · cons) ∪ outr]); the form
above suggests that binary thinning might be applicable.

Derivation

We first check to see whether (within w · [nil, cons]) ∪ [nil, outr] is monotonic on
R° = value° · leq · value; if it is, then a greedy algorithm is possible. It is easy to
prove that [nil, cons] and [nil, outr] are both monotonic on R°, but the problem is
that within w · [nil, cons] is not. It does not follow that if value x ≤ value y and
within w ([a] ++ x), then either within w ([a] ++ y) or value ([a] ++ x) ≤ value y.

On the other hand, it is easy to prove that within w · [nil, cons] is monotonic on Q°,
where

    Q = R ∩ (weight° · leq · weight).

Furthermore, [nil, outr] is monotonic on Q°. Since the base functor of cons-lists
is linear, all the conditions of the binary thinning theorem are in place if we take
P = R, thereby sorting in descending order of value.

The result is that we can implement knapsack w as the function

    minlist R · ([thinlist Q · merge R · ⟨g1, g2⟩ · listcp]),

where g1 = filter (within w) · list [nil, cons] and g2 = list [nil, outr].

The implementation can be simplified. For the functor FA = 1 + (Item × A) we
have

    listcp = listcp (F) = wrap + cpr.

Furthermore, g1 = [list nil, h1] and g2 = [list nil, h2], where

    h1 = filter (within w) · list cons
    h2 = list outr.

An easy simplification now yields:

    knapsack w = minlist R · ([nil, thinlist Q · merge R · ⟨h1, h2⟩ · cpr]).

Finally, since packings are produced in descending order of value, we can replace
minlist R by head.

The program

We represent a list x of items by the pair (x, (value x, weight x)). The following
program is parameterised by the functions val and wt:

> knapsack w = head . catalist (start, thinlist q . merge r . step w)
> start      = [([],(0,0))]
> step w     = pair (filter (within w) . list cons', list outr) . cpr
> within w   = (<= w) . weight

> cons'      = cross (cons, augment) . dupl
> augment    = cross (addin val, addin wt) . dupl
> addin f    = plus . cross (f, id)

> r          = geq . cross (value, value)
> p          = leq . cross (weight, weight)
> q          = meet (p,r)

> value      = outl . outr
> weight     = outr . outr
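The same algorithm can be sketched in plain Haskell without the combinator library. The encoding below is an assumption made for illustration: an item is a (value, weight) pair of Doubles, a selection carries its cached totals, and thinlist is the neighbour-bumping version of Section 8.3. Weights are assumed non-negative, as in the text.

```haskell
type Item = (Double, Double)              -- (value, weight), an assumed encoding
type Sel  = ([Item], (Double, Double))    -- selection with cached (value, weight)

value, weight :: Sel -> Double
value  = fst . snd
weight = snd . snd

-- Merge two lists that are already in descending order of value (merge R).
mergeDesc :: [Sel] -> [Sel] -> [Sel]
mergeDesc [] ys = ys
mergeDesc xs [] = xs
mergeDesc (x : xs) (y : ys)
  | value x >= value y = x : mergeDesc xs (y : ys)
  | otherwise          = y : mergeDesc (x : xs) ys

-- The relation Q: at least as valuable and no heavier.
dominates :: Sel -> Sel -> Bool
dominates x y = value x >= value y && weight x <= weight y

thinlist :: (a -> a -> Bool) -> [a] -> [a]
thinlist q = foldr bump []
  where
    bump a [] = [a]
    bump a (b : x)
      | a `q` b   = a : x
      | b `q` a   = b : x
      | otherwise = a : b : x

-- Candidates stay sorted by descending value, so the head is a best packing.
knapsack :: Double -> [Item] -> Sel
knapsack w = head . foldr step [([], (0, 0))]
  where
    step item sels = thinlist dominates (mergeDesc taken sels)
      where
        taken = filter ((<= w) . weight) (map (add item) sels)
    add item@(v, wt) (xs, (tv, tw)) = (item : xs, (tv + v, tw + wt))
```

For instance, with capacity 50 and items (60,10), (100,20), (120,30), the candidate list after the last step still contains six selections, and its head is the packing of the second and third items with total value 220.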

The algorithm, though it takes exponential time in the worst case, is quite efficient
in practice. The knapsack problem is presented in many text books as an
application of dynamic programming, in which a recursive formulation of the problem
is implemented efficiently under the assumption that the weights and capacity are
integers. Dynamic programming will be the topic of the next chapter, but the
thinning approach to knapsack gives a simpler algorithm that does not depend on the
inputs being integers. Moreover, if the weights and capacity are integers, then the
algorithm is as efficient as the dynamic programming scheme.

8.5 The paragraph problem

The next application of the binary thinning theorem is to the paragraph problem
(Bird 1986; Knuth and Plass 1981). The problem has already been touched on
briefly in Exercise 7.56. Three inputs are given: a non-empty sequence of words,
a function length that returns the length of a word, and a number w giving the
maximum possible line width. The width of a line is the sum of the widths of its
words plus some measure of the interword spaces. It is assumed that w is sufficiently
large that any word will at least fit on a line by itself.

By definition, a line is a non-empty sequence of words, and a paragraph is a
non-empty sequence of lines; thus

    Line = list+ Word
    Para = list+ Line.

We will build paragraphs from right to left, so our lists are cons-lists. Certainly,
no sensible greedy algorithm for the paragraph problem can be based on cons-lists
(see Exercise 7.56), but thinning algorithms consider all possibilities and are not
sensitive to the kind of list being used.

The problem is to find a function paragraph w satisfying

    paragraph w ⊆ min R · Λ(list+ (fits w) · partition),

where R = (waste w)° · leq · (waste w) and waste w is a measure of the waste incurred
by a particular paragraph given the maximum width w.

To complete the specification we need to define waste w, fits w and partition. The
type assigned to partition is Para ← list+ Word and we can define it as a
catamorphism on non-empty lists by changing the definition given in Section 5.6
slightly:

    partition = ([wrap · wrap, new ∪ glue]),

where

    new (a, xs) = [[a]] ++ xs
    glue (a, xs) = [[a] ++ head xs] ++ tail xs.

Note that glue is a (total) function on non-empty lists, but only a partial function
on possibly empty lists. We will need the fact that glue is a function in the thinning
algorithm to come.

The coreflexive fits w holds on a line x if width x ≤ w, where width is given by a
catamorphism on non-empty lists:

    width = ([length, succ · plus · (length × id)]).

It is assumed that interword spaces contribute one unit toward the width of a line,
which accounts for the term succ in the catamorphism above.

Finally, the function waste w depends on the 'white-space' that occurs at the end
of all the lines of the paragraph, except for the very last line, which, by definition,
has no white-space associated with it. Formally,

    waste w = collect · list (white w) · init,


Greedy:

    Before proceeding
    with the derivation
    of an algorithm, we
    note that the
    obvious greedy
    algorithm does not
    solve this
    specification.

Optimal:

    Before proceeding
    with the derivation
    of an algorithm,
    we note that the
    obvious greedy
    algorithm does
    not solve this
    specification.

Figure 8.1: A greedy and an optimal paragraph.

where init : list A ← list+ A removes the last element from a list, and

    white w x = (w − width x).

Provided it satisfies certain properties, the precise definition of collect is not too
important, but for concreteness we will take

    collect = sum · list sqr,

where sqr m = m². This definition is suggested in (Knuth and Plass 1981).
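As a concrete reading of these definitions, here is a small Haskell sketch of width, white and waste. It assumes words are Strings, lines are non-empty lists of words, and the length of a word is its number of characters; these representation choices are made only for the example.

```haskell
type Line = [String]
type Para = [Line]

-- Width of a non-empty line: word lengths plus one unit per interword space.
width :: Line -> Int
width ws = sum (map length ws) + (length ws - 1)

-- White space left at the end of a line when the maximum width is w.
white :: Int -> Line -> Int
white w x = w - width x

-- collect = sum . list sqr, applied to every line except the last.
waste :: Int -> Para -> Int
waste w = sum . map ((^ (2 :: Int)) . white w) . init
```

For example, with w = 10 the paragraph [["ab","cde"],["x"]] has one line of width 6 before its last line, so its waste is (10 − 6)² = 16, while a one-line paragraph has waste 0.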

After an appeal to fusion, using the assumption that each individual word will fit
on a line by itself, we can phrase the paragraph problem in the form

    paragraph w ⊆ min R · Λ([wrap · wrap, new ∪ (ok w · glue)]),

where ok w holds on ([x] ++ xs) if width x ≤ w. Since an individual word will fit on
a line by itself, we can rewrite the algebra of the catamorphism in the form

    [wrap · wrap, new] ∪ ok w · [wrap · wrap, glue].

Since new and glue are both functions, we see that the problem is of a kind to which
binary thinning may be applicable.

Derivation

Before proceeding with the derivation of an algorithm, we note that the obvious
greedy algorithm does not solve this specification. The greedy algorithm is a left
to right algorithm, filling lines for as long as possible before starting a new line.
The left-hand side of Figure 8.1 shows the output of the greedy algorithm on the
opening sentence of this section, and an optimal paragraph (with the given definition
of collect) on the right.

One reason why the greedy algorithm fails is that glue is not monotonic on R°.
Even for paragraphs [x] ++ xs and [y] ++ ys of the same input, the implication

    waste ([x] ++ xs) ≥ waste ([y] ++ ys)
      ⇒ waste ([[a] ++ x] ++ xs) ≥ waste ([[a] ++ y] ++ ys)

does not hold unless x = y. Even then, we require an extra condition, namely that
cons is monotonic under collect° · leq · collect. This condition holds for the given
definition of collect, among others.

Given this property of collect, we do have that both new and ok w · glue are
monotonic on Q°, where

    Q = R ∩ (head° · head).

We leave the formal justification as an exercise. So all the conditions for binary
thinning are in place, except for the choice of the connected preorder P. Unlike the
case of the knapsack problem we cannot take P = R. The choice of P is a sensitive
one because sorting with P should bring together paragraphs with the same first
line, enabling thinlist Q to thin them to a single candidate. A logical choice is to
weaken the equivalence relation head° · head to a connected preorder, taking

    P = head° · L · head,

where L is some linear order on lines. Given context, we can take L = prefix,
because this is a linear order on first lines of paragraphs of the same input. And it
is easy to show that both new and glue are monotonic on P. However, all this is
overkill because a much simpler choice of P suffices, namely, P = Π, the universal
relation. Trivially, all functions are monotonic on Π. The reason why Π works is
because we have

    merge Π = cat,

and so the term g1 in the implementation given below automatically brings together
all partial solutions with the same first line.

With this choice of P the binary thinning theorem gives

    paragraph w = minlist R · ([thinlist Q · cat · ⟨g1, g2⟩ · listcp]),

where

    g1 = list [wrap · wrap, new]
    g2 = filter (ok w) · list [wrap · wrap, glue].

For the functor FA = Word + (Word × A) we have

    listcp = listcp (F) = wrap + cpr.

Hence rewriting g1 and g2 as coproducts, we obtain

    paragraph w = minlist R · ([start, thinlist Q · cat · ⟨h1, h2⟩ · cpr]),

where

    start = wrap · wrap · wrap
    h1 = list new
    h2 = filter (ok w) · list glue.

The program

For efficiency, a partition [x] ++ xs is represented by the pair

    ([x] ++ xs, (w − width x, waste w xs)).

Since waste w [] is not defined, we will assume that it is some large negative quantity
−∞; then we have that the waste of a partition (xs, (m, n)) is max {m² + n, 0}.
The resulting program is shown below. Some additional input and output
formatting has been added to make the program more useful: words divides a string
into consecutive words, leaving out spaces and newline characters; unwords does
the opposite, joining the words with single spaces; and unlines joins lists of lines
with single newline characters. These formatting functions are provided in Gofer's
standard prelude and are also defined in the Appendix:

> paragraph w = unpara . para w . words

> unpara      = unlines . list unwords . outl

> para w      = minlist r . cata1list (start w, thinlist q . step w)
> step w      = cat . pair (list (new' w), filter ok . list glue') . cpr

> start w     = wrap . pair (wrap . wrap, augment)
>               where augment = pair ((w-) . length, neginf)

> new' w      = cross (new, augment) . dupl
>               where augment = cross ((w-) . length, waste)

> glue'       = cross (glue, augment) . dupl
>               where augment = cross (reduce . swap, outr) . dupl
>                     reduce  = minus . cross (id, succ . length)
> new         = cons . cross (wrap, id)
> glue        = cons . cross (cons, id) . assocl . cross (id, split)

> r           = leq . cross (waste . outr, waste . outr)
> p           = eql . cross (outl . outr, outl . outr)
> q           = meet (r,p)

> waste       = omax . plus . cross (sqr, id)
> omax        = cond (>= 0) (id, zero)
> sqr         = times . pair (id, id)
> ok          = (>= 0) . outl . outr
> neginf      = const (-10000)
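A plain Haskell sketch of the same thinning algorithm is given below, without input and output formatting. It departs from the program above in one labelled respect: instead of a large negative number, the waste of the remaining lines is a Maybe value, with Nothing for a one-line paragraph. Words are assumed to be Strings no longer than w, and thinlist is the neighbour-bumping version of Section 8.3.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Line = [String]
-- (paragraph, (room left on its first line, waste of the lines after it)).
-- Nothing marks a one-line paragraph; it stands in for the text's
-- "large negative quantity" (an assumption of this sketch).
type Cand = ([Line], (Int, Maybe Int))

room :: Cand -> Int
room = fst . snd

wasteOf :: Cand -> Int
wasteOf (_, (_, Nothing)) = 0
wasteOf (_, (m, Just n))  = m * m + n

thinlist :: (a -> a -> Bool) -> [a] -> [a]
thinlist q = foldr bump []
  where
    bump a [] = [a]
    bump a (b : x)
      | a `q` b   = a : x
      | b `q` a   = b : x
      | otherwise = a : b : x

paragraph :: Int -> [String] -> [Line]
paragraph w = fst . minimumBy (comparing wasteOf) . go
  where
    go [a]      = [([[a]], (w - length a, Nothing))]
    go (a : ws) = thinlist q (map (new a) cs ++ filter ((>= 0) . room) (map (glue a) cs))
      where cs = go ws
    go []       = error "paragraph: no words"

    -- Start a new first line; the old paragraph's waste becomes fixed.
    new a c = ([a] : fst c, (w - length a, Just (wasteOf c)))

    -- Extend the first line by one word and one interword space.
    glue a (l : ls, (m, n)) = ((a : l) : ls, (m - 1 - length a, n))
    glue _ ([], _)          = error "paragraph: empty candidate"

    -- Q: no more wasteful, and the same room (hence the same first line).
    q c d = wasteOf c <= wasteOf d && room c == room d
```

With w = 6, the words "aa", "bb", "cc" have three feasible layouts with wastes 32, 16 and 1, and the function returns the last of these, [["aa","bb"],["cc"]].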

Exercises

8.23 Show list+ (fits w) · partition = ([wrap · wrap, new ∪ (ok w · glue)]).

8.24 One possible choice for the function f in the definition of waste is f = sum.

This leads to less pleasing paragraphs, but a greedy algorithm is possible provided
we switch to snoc-lists. Derive this algorithm.

8.6 Bitonic tours

As a final application of thinning we solve a generalisation of the following problem,
which is taken from (Cormen et al. 1990):

    The euclidean traveling-salesman problem is the problem of determining
    a shortest closed tour that connects a given set of n points in the plane.
    On the left in Figure 8.2 is the solution to a 7-point problem. The
    general problem is NP-complete, and its solution is therefore believed
    to require more than polynomial time.

    J.L. Bentley has suggested that we simplify the problem by restricting
    our attention to bitonic tours, that is, tours that start at the leftmost
    point, go strictly left to right to the rightmost point, and then go strictly
    right to left back to the starting point. On the right in Figure 8.2 is the
    shortest bitonic tour of the same 7 points. In this case, a
    polynomial-time algorithm is possible.

    Describe an O(n²)-time algorithm for determining an optimal bitonic
    tour. You may assume that no two points have the same x-coordinate.
    (Hint: Scan right to left, maintaining optimal possibilities for the two
    parts of the tour.)

Figure 8.2: An optimal and an optimal bitonic tour.

We will solve a generalised version of the bitonic tours problem in which distances
are not necessarily euclidean nor necessarily symmetric. We suppose only that with
each ordered pair (a, b) of points (called cities below) is associated a travelling
cost tc (a, b), not necessarily positive nor necessarily equal to tc (b, a). The final
algorithm will take O(n²) time, where n is the length of the input, assuming that
tc can be computed in constant time.

It does not make sense to talk about a bitonic tour of one city, so we will assume

that the input is a list of at least two cities, the order of the cities in the list being
relevant. We will take the hint in the formulation of the problem and build tours
from right to left, but this is only because cons-lists are more efficient than snoc-lists
in functional programming. Formally, all this means that we are dealing with the
base functor

    FA = (City × City) + (City × A)

of cons-lists of length at least two.

We will describe tours, much as a travel agent would, by a pair of lists (x, y), where
x represents the outward journey and y the return (reading from right to left). For
example, the tour that proceeds directly from New York to Rome but visits London
on its return is represented by the itinerary

    ([New York, Rome], [New York, London, Rome]).

This is a different itinerary to

    ([New York, London, Rome], [New York, Rome]),



because the travelling costs may depend on the direction of travel. As we have
described them, both parts of the tour are subsequences of the input, and have
lengths at least two.

Suppose the first example is extended to include Los Angeles as the new starting
point. This gives rise to two extended tours:

([Los Angeles, New York, Rome], [Los Angeles, London, Rome])


([Los Angeles, Rome], [Los Angeles, New York, London, Rome]),
It is a requirement of a tour that no city should be visited twice, so New York has
to be dropped from either the outward journey or the return.

With these assumptions, we can define tour by

    tour = ([start, dropl ∪ dropr]),

where start (a, b) = ([a, b], [a, b]) and

    dropl (a, ([b] ++ x, y)) = ([a] ++ x, [a] ++ y)
    dropr (a, (x, [b] ++ y)) = ([a] ++ x, [a] ++ y).

Each partial tour (x, y) maintains the property that the first elements of x and y
are the same, as are the last elements.

The total cost of a tour is given by a function cost defined by

    cost (x, y) = outcost x + incost y,

where

    outcost [a_0, a_1, ..., a_n] = (+j : 0 ≤ j < n : tc (a_j, a_{j+1}))
    incost [a_0, a_1, ..., a_n] = (+j : 0 < j ≤ n : tc (a_j, a_{j−1})).

Our problem now is to find a function mintour that refines min R · Λtour, where
R = cost° · leq · cost.
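The two sums differ only in the direction in which each leg is travelled, which a short Haskell sketch makes explicit. Cities are represented by Ints and tc is a hypothetical asymmetric cost, both assumed only for this example.

```haskell
type City = Int

-- Hypothetical asymmetric travelling cost, chosen only for illustration.
tc :: (City, City) -> Int
tc (a, b) = abs (a - b) + (if a < b then 0 else 1)

outcost, incost :: [City] -> Int
-- Outward legs run from each city to its successor in the list.
outcost x = sum [ tc (a, b) | (a, b) <- zip x (tail x) ]
-- Return legs run from each city to its predecessor in the list.
incost  y = sum [ tc (b, a) | (a, b) <- zip y (tail y) ]

cost :: ([City], [City]) -> Int
cost (x, y) = outcost x + incost y
```

With this tc, the tour ([1,3],[1,2,3]) costs 2 + (2 + 2) = 6 whereas ([1,2,3],[1,3]) costs (1 + 1) + 3 = 5, so the direction of travel matters.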

Derivation

As usual, analysis of why [start, dropl U dropr] is not monotonic on R° will help to
suggest an appropriate Q for the thinning step. The monotonicity condition comes
down to two inclusions:

    dropl · (id × R°) ⊆ R° · dropl
    dropr · (id × R°) ⊆ R° · dropr.

To see what these conditions mean, observe that cost (dropl (a, (x, y))) equals

    cost (x, y) + tc (a, next x) − tc (head x, next x) + tc (head y, a),

where next ([a] ++ [b] ++ x) = b. Dually, cost (dropr (a, (x, y))) equals

    cost (x, y) + tc (a, head x) − tc (next y, head y) + tc (next y, a).

Now, the first condition says that if cost (x, y) ≤ cost (u, v), then

    cost (x, y) + tc (a, next x) − tc (head x, next x) + tc (head y, a)
      ≤ cost (u, v) + tc (a, next u) − tc (head u, next u) + tc (head v, a).

The second condition is similar. Neither holds under an arbitrary function cost
unless

    (head x, head y) = (head u, head v) ∧ (next x, next y) = (next u, next v).

The first conjunct will hold whenever (x, y) and (u, v) are tours of the same input.
It is now clear that we have to define Q by

    Q = R ∩ ((next²)° · next²),

for then dropl and dropr are both monotonic under Q° (and Q).

All the conditions for the binary thinning theorem are in place, except for the choice
of the connected preorder P. As in the paragraph problem, we cannot take P = R
because the monotonicity condition is not satisfied. And, also as in the paragraph
problem, we can take P = Π. The reason is basically the same as before and
Exercise 8.25 goes into details.

Since merge Π = cat, we can appeal to binary thinning and take

mintour = minlist R · ([thinlist Q · cat · ⟨g1, g2⟩ · listcp]),

where g1 = list [start, dropl] and g2 = list [start, dropr]. As before, we have
listcp = wrap + cpr, so mintour simplifies to

minlist R · ([wrap · start, thinlist Q · cat · ⟨list dropl, list dropr⟩ · cpr]).

The algorithm takes quadratic time because just two new tours are added to the list
of partial solutions at each stage (see Exercise 8.25). If the list of partial solutions
grows linearly and it takes linear time to generate the new tours, then the total
time is quadratic in the length of the input.

The program

For efficiency a tour t is represented by the pair (t, cost t). The Gofer program is
parameterised by the function tc:

> mintour = minlist r . cata2list (wrap . start, thinlist q . step)
> step    = cat . pair (list dropl, list dropr) . cpr

> start (a,b) = (([a,b],[a,b]), tc (a,b) + tc (b,a))
> dropl (a,((x,y),m)) = ((a:tail x, a:y), m + adjustl (a,x,y))
> dropr (a,((x,y),m)) = ((a:x, a:tail y), m + adjustr (a,x,y))

> adjustl (a, b:c:x, d:y) = tc (a,c) - tc (b,c) + tc (d,a)
> adjustr (a, b:x, d:e:y) = tc (a,b) - tc (e,d) + tc (e,a)

> r = leq . cross (outr, outr)
> p = eql . cross (next2, next2)
> q = meet (r,p)
> next2 = cross (next, next) . outl
> next = head . tail

> cata2list (f,g) [a,b] = f (a,b)
> cata2list (f,g) (a:x) = g (a, cata2list (f,g) x)

Exercises

8.25 Determine next (dropl (a, (x, y))) and next (dropr (a, (x, y))). Hence show by
induction that the next values of the list of tours maintained after processing the
input [a0, a1, ..., an] are:

(an, a1), (an−1, a1), ..., (a2, a1), (a1, a2), ..., (a1, an).

8.26 Consider the case where tc is symmetric, so the tour (y, x) is essentially the
same as (x,y). Show how dropl and dropr can be modified to avoid generating the
same tour twice. What is the resulting algorithm?

8.27 One basic assumption of the problem was that a city could not be visited
both on the outward and inward part of the journey. Reformulate the problem to
remove this restriction. What is the algorithm?

8.28 The other assumption was that each city should be visited at least once.
Reformulate the problem to remove this restriction. What is the algorithm?



8.29 The longest upsequence problem is to compute max R · Λ(ordered · subseq),
where R = length° · leq · length. Derive a thinning algorithm to solve this problem.

8.30 The rally driver's problem is described as follows: imagine a long stretch of
road along which n gas stations are placed. At gas station i (1 ≤ i ≤ n) is a
quantity of fuel fi. The distance from station i to station i+1 (or to the end of the
road if i = n) is a known quantity di, where both fuel and distance are measured
in terms of the same unit. Imagine that the rally driver is at the beginning of the
road, with a quantity f0 of fuel in the tank. Suppose also that the capacity of the
fuel tank is some fixed quantity c. Assuming that the rally driver can get to the
end of the road, devise a thinning algorithm for determining a minimum number of
stops to pick up fuel. (Hint: model the problem using partitions.)

8.31 Solve the following exercise from (Denardo 1982) by a thinning algorithm: A
long one-way street consists of m blocks of equal length. A bus runs 'uptown' from
one end of the street to the other. A fixed number n of bus stops are to be located
so as to minimise the total distance walked by the population. Assume that each
person taking an uptown bus trip walks to the nearest bus stop, gets on the bus,
rides, gets off at the stop nearest his or her destination, and walks the rest of the
way. During the day, exactly Bj people from block j start uptown bus trips, and
Cj complete uptown bus trips at block j. Write a program that finds an optimal
location of bus stops.

Bibliographical remarks

The motivation of this chapter was to capture the essence of sequential decision
processes as first introduced by Bellman (Bellman 1957), and rigorously defined
by (Karp and Held 1967). In particular, Theorem 8.2 could be seen as a 'generic
program' for sequential decision processes (De Moor 1995). In that paper it is
indicated how the abstract relational expressions of Theorem 8.2 can actually be
written as an executable computer program.

The relation passed as an argument to thin corresponds roughly to what other
authors call a dominance relation. Dominance relations have received a lot of attention
in the algorithm design literature (Eppstein, Galil, Giancarlo, and Italiano 1992;
Galil and Giancarlo 1989; Hirschberg and Larmore 1987; Yao 1980, 1982). Most
of this work is concerned with improving the time complexity of naive dynamic
programming algorithms.

In programming methodology, our work is very much akin to that of (Smith and
Lowry 1990; Smith 1991). Smith's notion of problem reduction generators is quite
similar to the generic algorithm presented here, but in fact bears a closer
resemblance to the results of the following chapter.

The idea of implementing dynamic programming algorithms through merging is
well known in operations research. In the context of the 0/1 knapsack problem, it
was first suggested by (Ahrens and Finke 1975). Recently, this method has been
improved (through an extension of methods described in this book) to obtain a
novel solution to the 0/1 knapsack problem that outperforms all others in practice
(Ning 1997).
Chapter 9

Dynamic Programming

We turn now to methods for solving the optimisation problem

min R · Λ(([S]) · ([T])°).

However, we will only consider the case where S = h, a function. This chapter
discusses dynamic programming solutions, while Chapter 10 considers another class
of greedy algorithms. In outline, dynamic programming is based on the observation
that, for many problems, an optimal solution is composed of optimal solutions to
subproblems, a property known as the principle of optimality.

If the principle of optimality is satisfied, then one can decompose the problem in
all possible ways into subproblems, solve the subproblems recursively, and assemble
an optimal solution from the partial results. This is the content of Theorem 9.1.

Sometimes it is known that certain decompositions can never contribute to an
optimum solution and can be discarded; this is the content of Theorem 9.2. In the
extreme case, all but a single decomposition can be discarded, leading to a class of
greedy algorithms to be studied in Chapter 10.

The sets of decompositions associated with different subproblems are usually not
disjoint, so a naive approach to solving the subproblems recursively will involve
repeating work. For this reason there is a second phase of dynamic programming in
which the subproblems are solved more efficiently. There are two complementary
schemes: memoisation and tabulation. (The terminology is standard, but tabulation
has nothing to do with tabular allegories.)

The memoisation scheme is top-down; the computation follows that of the recursive
program but solutions to subproblems are recorded and retrieved for subsequent
use. Some functional languages provide a built-in memoisation facility as an
optional extra. By contrast, the tabulation scheme is bottom-up; using an analysis
of the dependencies between subproblems, the problems are solved in order of
dependency, and stored in a specially constructed table to make subsequent retrieval
easy. Although the dependency analysis is usually simple, the implementation of
a tabulation scheme can be rather complicated to describe and justify. We will,
however, give full details of tabulations for two of the applications described in this
chapter.
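
To make the contrast concrete, here is a small Haskell sketch of the two schemes on a toy recurrence (counting monotone lattice paths). The example problem and the names paths, pathsMemo and pathsTab are assumptions introduced for illustration; Data.Array comes from the array package shipped with GHC.

```haskell
import Data.Array

-- Naive recursion: overlapping subproblems are recomputed many times.
paths :: Int -> Int -> Integer
paths 0 _ = 1
paths _ 0 = 1
paths i j = paths (i - 1) j + paths i (j - 1)

-- Memoisation (top-down): a lazy array records each result on first demand.
pathsMemo :: Int -> Int -> Integer
pathsMemo m n = table ! (m, n)
  where
    table = array ((0, 0), (m, n))
              [((i, j), go i j) | i <- [0 .. m], j <- [0 .. n]]
    go 0 _ = 1
    go _ 0 = 1
    go i j = table ! (i - 1, j) + table ! (i, j - 1)

-- Tabulation (bottom-up): each row is built from the previous one,
-- following the dependency order of the recurrence.
pathsTab :: Int -> Int -> Integer
pathsTab m n = last (iterate nextRow firstRow !! m)
  where
    firstRow = replicate (n + 1) 1
    nextRow prev = scanl1 (+) prev
```

The memoised version keeps the shape of the recursive program, while the tabulated version needs the separate observation that a row depends only on the row before it.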

9.1 Theory
As mentioned above, we consider only the case that S = h, a function. To save ink,
define H = ([h]) · ([T])° in all that follows, where h and T are F-algebras. The basic
theorem about dynamic programming is the following one.

Theorem 9.1 Let M = min R · ΛH. If h is monotonic on R, then

(μX : min R · P(h · FX) · ΛT°) ⊆ M.

Proof. It follows from Knaster-Tarski that the conclusion holds if we can show
that

min R · P(h · FM) · ΛT° ⊆ M. (9.1)

Using the universal property of min we can rewrite (9.1) as two inclusions:

min R · P(h · FM) · ΛT° ⊆ H (9.2)
min R · P(h · FM) · ΛT° · H° ⊆ R. (9.3)

To prove (9.2) and (9.3) we will need the rule

min R · PX ⊆ (X · ∈) ∩ ((R · X)/∋) (9.4)

proved in Chapter 7.

For (9.2) we argue:

    min R · P(h · FM) · ΛT°
⊆   {since (9.4) gives min R · PX ⊆ X · ∈}
    h · FM · ∈ · ΛT°
=   {Λ cancellation}
    h · FM · T°
⊆   {definition of M and universal property of min}
    h · FH · T°
=   {definition of H and hylomorphism theorem (Theorem 6.2)}
    H.

To prove (9.3) we argue:

    min R · P(h · FM) · ΛT° · H°
⊆   {since (9.4) gives min R · PX ⊆ (R · X)/∋}
    ((R · h · FM)/∋) · ΛT° · H°
=   {definition of H and hylomorphism theorem}
    ((R · h · FM)/∋) · ΛT° · T · FH° · h°
⊆   {since ΛX · X° ⊆ ∋; division; functors}
    R · h · F(M · H°) · h°
⊆   {definition of M and universal property of min}
    R · h · FR · h°
⊆   {assumption h · FR · h° ⊆ R}
    R · R
⊆   {since R is transitive}
    R.

Theorem 9.1 describes a recursive scheme in which the input is decomposed in all
possible ways. However, with some problems we can tell that certain decompositions
will never lead to better results than others. The basic theorem can be refined by
bringing in a thinning step to eliminate unprofitable decompositions. This leads to
the following version of dynamic programming; the proof follows the preceding one
very closely, and we leave it as an exercise:

Theorem 9.2 Let M = min R · ΛH. If h is monotonic on R and Q is a preorder
satisfying h · FH · Q° ⊆ R° · h · FH, then

(μX : min R · P(h · FX) · thin Q · ΛT°) ⊆ M.

Both theorems conclude that an optimal solution can be computed as a least fixed
point of a certain equation. Theorem 6.3 says that the equation has a unique fixed
point if member (F) · T° is an inductive relation. Furthermore, if ΛT° returns finite
non-empty sets and R is a connected preorder, then the unique solution is entire.
By suitably refining min R and thin Q · ΛT°, we can then implement the solution
as a recursive function.

Since Q is a relation on FA (for some A), and FA is often a coproduct, we can
appeal to the following proposition to instantiate the conclusion of Theorem 9.2.
The proof is left as an exercise.

Proposition 9.1 Suppose that V1 and V2 have disjoint ranges, that is, suppose
that V1° · V2 = ∅. Then

min R · P[U1, U2] · thin (Q1 + Q2) · Λ[V1, V2]° = (ran V1 → W1, W2),

where Wi = min R · PUi · thin Qi · ΛVi° for i = 1, 2.

Checking the conditions

The two conditions of dynamic programming are:

h · FR ⊆ R · h
h · FH · Q° ⊆ R° · h · FH.

To ease the task of checking these conditions, we can often appeal to one or other
of the following results; the first could have been given in an earlier chapter.

Proposition 9.2 If for some functions cost and k we have

R = cost° · leq · cost
cost · h = k · F cost
k · F leq ⊆ leq · k,

then h · FR ⊆ R · h.

Proof. We argue:

    h · FR ⊆ R · h
≡   {definition of R and shunting}
    cost · h · FR ⊆ leq · cost · h
≡   {assumption on cost}
    k · F(cost · R) ⊆ leq · k · F cost
⇐   {since cost · R ⊆ leq · cost}
    k · F(leq · cost) ⊆ leq · k · F cost
⇐   {functors}
    k · F leq ⊆ leq · k
⇐   {assumption that k is monotonic on leq}
    true.
□

The following result establishes a monotonicity in context property (see also
Exercise 9.2).

Proposition 9.3 If for some functions cost and k we have

R = cost° · leq · cost
cost · h = k · F⟨cost, H°⟩
k · F(leq × id) ⊆ leq · k,

and if H° is simple, then h · F(R ∩ (H · H°)) ⊆ R · h.

Proof. We argue:

    h · F(R ∩ (H · H°)) ⊆ R · h
≡   {definition of R and shunting}
    cost · h · F((cost° · leq · cost) ∩ (H · H°)) ⊆ leq · cost · h
≡   {products}
    cost · h · F(⟨cost, H°⟩° · ⟨leq · cost, H°⟩) ⊆ leq · cost · h
≡   {assumption on cost}
    k · F(⟨cost, H°⟩ · ⟨cost, H°⟩° · ⟨leq · cost, H°⟩) ⊆ leq · k · F⟨cost, H°⟩
⇐   {since H° simple implies ⟨cost, H°⟩ simple}
    k · F⟨leq · cost, H°⟩ ⊆ leq · k · F⟨cost, H°⟩
⇐   {products; functors}
    k · F(leq × id) ⊆ leq · k
⇐   {assumption on k}
    true.

In the next result we take F to be a bifunctor, writing F(id, X) rather than FX.

Proposition 9.4 Suppose U and V are two preorders such that

h · F(U, R) ⊆ R · h and H · V° ⊆ R° · H.

Then the conditions of Theorem 9.2 are satisfied by taking Q = F(U, V).

Proof. Monotonicity follows at once from the reflexivity of U. For the second part
we argue as follows:

    h · F(id, H) · Q°
=   {taking Q = F(U, V); converse; bifunctors}
    h · F(U°, H · V°)
⊆   {assumption on V}
    h · F(U°, R° · H)
⊆   {bifunctors}
    h · F(U°, R°) · F(id, H)
⊆   {assumption on h (taking converse and shunting)}
    R° · h · F(id, H).
□

Exercises

9.1 Why is Theorem 9.1 a special case of Theorem 9.2?

9.2 The conditions of dynamic programming can be weakened by bringing in
context. More precisely, it is sufficient to show that

h · F(R ∩ (H · H°)) ⊆ R · h
h · FH · (Q ∩ (T° · T))° ⊆ R° · h · FH.

Prove this result.

9.3 Prove Theorem 9.2.

9.4 Prove that the thinning condition of Theorem 9.2 can be satisfied by taking
Q = F(M° · R · M). Why may this not be a good choice in practice?

9.5 This exercise deals with the proof of Proposition 9.1. Relations V1 and V2 have
disjoint ranges if ran V2 ⊆ ∼ran V1, where ∼ is the complementation operator on
coreflexives. Show that V1 and V2 have disjoint ranges if and only if

ran V2 = ran [V1, V2] · ∼ran V1.

Use this result to show that

Λ[V1, V2]° = (ran V1 → Λ(inl · V1°), Λ(inr · V2°)).

Now show that

thin (Q1 + Q2) · E inl = E inl · thin Q1
thin (Q1 + Q2) · E inr = E inr · thin Q2.

Using these results, prove Proposition 9.1.

9.2 The string edit problem


In the string edit problem two strings x and y are given, and it is required to
transform one string into the other by performing a sequence of editing operations.
There are many possible choices for these operations, but for simplicity we assume
that we are given just three: copy, delete and insert. Their meanings are as follows:

copy a      copy character a from x to y;
delete a    delete character a from x;
insert a    insert character a in y.

The point about these operations is that if we swap the roles of delete and insert,
then we obtain a sequence transforming the target string back into the source.
In fact, the operations contain enough information to construct both strings from
scratch: we merely have to interpret copy a as meaning "append a to both strings";
delete a as "append a to the left string"; and insert a as "append a to the right
string". Since there are many different edit sequences from which the two strings
can be reconstituted, we ask for a shortest edit sequence.

To specify the problem formally we will use cons-lists for both strings and edit
sequences; thus a string is an element of list Char and an edit sequence is an element
of list Op, where

Op ::= cpy Char | del Char | ins Char.

The function edit : (list Char × list Char) ← list Op reconstitutes the two strings:

edit = ([base, step]),

where base returns the pair ([], []) and

step (cpy a, (x, y)) = ([a] ++ x, [a] ++ y)
step (del a, (x, y)) = ([a] ++ x, y)
step (ins a, (x, y)) = (x, [a] ++ y).

The specification of the string edit problem is to find a function mle (short for
"minimum length edit") satisfying

mle ⊆ min R · Λedit°,

where R = length° · leq · length.
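
As a small sketch, the function edit can be written in Haskell as a foldr over the list of operations. The helper swapOp is not part of the text; it is introduced here only to illustrate the remark that swapping delete and insert reverses the direction of the edit.

```haskell
data Op = Cpy Char | Del Char | Ins Char deriving (Eq, Show)

-- edit rebuilds both strings from an edit sequence (a cons-list catamorphism).
edit :: [Op] -> (String, String)
edit = foldr step ([], [])
  where
    step (Cpy a) (x, y) = (a : x, a : y)
    step (Del a) (x, y) = (a : x, y)
    step (Ins a) (x, y) = (x, a : y)

-- Hypothetical helper: exchange the roles of delete and insert.
swapOp :: Op -> Op
swapOp (Del a) = Ins a
swapOp (Ins a) = Del a
swapOp op      = op
```

For example, edit [Cpy 'a', Del 'b', Ins 'c'] yields the pair ("ab", "ac"), and mapping swapOp over the same sequence yields ("ac", "ab").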

Derivation

To apply basic dynamic programming we have to show that α = [nil, cons] is
monotonic under R. But this is immediate from Proposition 9.2 using

length = ([zero, succ · outr])

and the monotonicity of succ under leq.

For this problem we can go further and make good use of a thinning step. The
intuition is that a copy operation, when available, leads to a shorter result than
delete or insert. We therefore investigate whether we can find a preorder Q over
the type F(Op, String × String), where F(A, B) = 1 + (A × B), satisfying

α · F(id, edit°) · Q° ⊆ R° · α · F(id, edit°).


From Proposition 9.4 we know that it is sufficient to take Q = F(U, V) for some
preorders U and V satisfying the two conditions:

α · F(U, R) ⊆ R · α and V · edit ⊆ edit · R.

Since α · F(Π, R) ⊆ R · α (exercise) we can always take U = Π. There is also an
obvious choice for V: take V = suffix × suffix. With this choice of V, the second
condition can be broken down into two inclusions:

(id × suffix) · edit ⊆ edit · R
(suffix × id) · edit ⊆ edit · R.

Since suffix = tail*, it is sufficient to show that

(id × tail) · edit ⊆ edit · R
(tail × id) · edit ⊆ edit · R,

because A · B ⊆ B · C implies A* · B ⊆ B · C*. We give an informal proof of the
first inclusion (a point-free proof is left as an exercise); the second one follows by a
symmetrical argument. Suppose edit es = (x, cons (b, y)), and let e be the element
of es that produces b. If e = cpy b, then replace e by del b in es; if e = ins b, then
remove e from es. The result is an edit sequence fs that is no longer than es and
satisfies edit fs = (x, y).

The result of this analysis is that a shortest edit sequence can be obtained by
computing the least fixed point of the recursion equation

X = min R · P[nil, cons · (id × X)] · thin Q · Λ[base, step]°,

where Q = id + (U × V), and U and V are given above.



Since base and step have disjoint ranges we can appeal to Proposition 9.1 and obtain

X = (empty → nil, min R · P(cons · (id × X)) · thin (U × V) · Λstep°),

where empty (x, y) holds if both x and y are empty lists.

We can implement thin (U × V) · Λstep° as a list-valued function unstep, defined by

unstep ([a] ++ x, []) = [(del a, (x, []))]
unstep ([], [b] ++ y) = [(ins b, ([], y))]

and

unstep ([a] ++ x, [b] ++ y) = [(cpy a, (x, y))],                                     if a = b
                            = [(del a, (x, [b] ++ y)), (ins b, ([a] ++ x, y))],      otherwise.

The relation min R is implemented as the function minlist R on lists. The result is
that mle can be implemented by the program

mle = (empty → nil, minlist R · list (cons · (id × mle)) · unstep).

The program terminates because the second components of unstep (x, y) are pairs
of strings whose combined length is strictly smaller than that of x and y.
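
The recursive program can be transcribed into standard Haskell roughly as follows. This is a sketch rather than the book's Gofer text: minlist R is approximated by minimumBy (comparing length), and the empty test is expressed by the first clause of mle.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

data Op = Cpy Char | Del Char | Ins Char deriving (Eq, Show)

-- unstep lists the decompositions that survive the thinning step:
-- a copy is the only candidate whenever the two heads agree.
unstep :: (String, String) -> [(Op, (String, String))]
unstep (a : x, [])    = [(Del a, (x, []))]
unstep ([], b : y)    = [(Ins b, ([], y))]
unstep (a : x, b : y)
  | a == b            = [(Cpy a, (x, y))]
  | otherwise         = [(Del a, (x, b : y)), (Ins b, (a : x, y))]
unstep ([], [])       = []

-- Naive recursion: solve every candidate subproblem and keep a shortest result.
mle :: (String, String) -> [Op]
mle ([], []) = []
mle xy = minimumBy (comparing length) [op : mle rest | (op, rest) <- unstep xy]
```

Because the same pair of tails can be reached along many different paths, this version repeats work and should only be tried on short strings.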

The problem with this implementation is that the running time is an exponential
function of the sizes of the two input strings. The reason is that the same subproblem
is solved many times over. A suitably chosen tabulation scheme can bring this
down to quadratic time, and this is the next part of the derivation.

Tabulation

The tabulation scheme for mle is motivated by the observation that in order to
compute mle (x, y) we also need to compute mle (u, v) for all tails u of x and tails
v of y. It is helpful to imagine these values arranged in columns: for example,

mle (a1a2a3, b1b2)   mle (a1a2a3, b2)   mle (a1a2a3, [])
mle (a2a3, b1b2)     mle (a2a3, b2)     mle (a2a3, [])
mle (a3, b1b2)       mle (a3, b2)       mle (a3, [])
mle ([], b1b2)       mle ([], b2)       mle ([], []).

If we define the curried function

column x y = [mle (u, y) | u ← tails x],

then the rightmost column is column x [] and the leftmost one is column x y. The
topmost entry in the leftmost column is the required value mle (x, y). We will build
the columns one by one from right to left, using each column to construct the next
one. Thus, we aim to express column x as a cons-list catamorphism

column x = ([fstcol x, nextcol x]).

It is easy to check from the definition of mle that fstcol = tails · list del. The function
nextcol is to satisfy the equation

column x ([b] ++ y) = nextcol x (b, column x y).


The general idea is to implement nextcol as a catamorphism, building the next
column from bottom to top. From the recursive characterisation of mle we have

mle ([a] ++ u, [b] ++ y) = [cpy a] ++ mle (u, y),          if a = b
                         = [del a] ++ mle (u, [b] ++ y),   if m ≤ n
                         = [ins b] ++ mle ([a] ++ u, y),   otherwise,

where m and n are the lengths of mle (u, [b] ++ y) and mle ([a] ++ u, y) respectively.


In terms of column entries the picture is

column x ([b] ++ y)          column x y

mle ([a] ++ u, [b] ++ y)     mle ([a] ++ u, y)
mle (u, [b] ++ y)            mle (u, y)

Thus, each entry in the left column may depend on the one below it (if a delete is
best), the one to the right (if an insert is best), and the one below that (if a copy
is best).

In order to have all the necessary information available in the right place, the
catamorphism for nextcol is applied to the sequence

zip (x, zip (init (column x y), tail (column x y))).

The elements of x are needed for the case analysis in the definition of mle, and
adjacent pairs of elements in column x y are needed to determine the value of mle. The
bottom element of nextcol x (b, column x y) is obtained from the bottom element of
column x y as a special case. With this explanation, the definition of nextcol is

nextcol x (b, us) = ([base (b, last us), step b]) xus,

where

xus = zip (x, zip (init us, tail us))
base (b, u) = [[ins b] ++ u],

and

step b ((a, (u, v)), ws) = [[cpy a] ++ v] ++ ws,                          if a = b
                         = [bmin R ([del a] ++ w, [ins b] ++ u)] ++ ws,   otherwise
                           where w = head ws.

The program

The only change in the Gofer program is that an edit sequence v is represented by
the pair (v, length v) for efficiency. The program is

> data Op = Cpy Char | Del Char | Ins Char

> mle (x,y) = outl (head (column x y))
> column x = catalist (fstcol x, nextcol x)
> fstcol x = zip (tails (list Del x), countdown (length x))

> nextcol x (b,us) = catalist ([ins b (last us)], step b) xus
>                    where xus = zip (x, zip (init us, tail us))

> step b ((a,(u,v)),ws)
>    = [cpy a v] ++ ws, if a == b
>    = [bmin r (del a w, ins b u)] ++ ws, otherwise
>      where r = leq . cross (outr, outr)
>            w = head ws

> cpy b (ops,n) = (Cpy b : ops, n+1)
> del a (ops,n) = (Del a : ops, n+1)
> ins a (ops,n) = (Ins a : ops, n+1)

> countdown 0 = [0]
> countdown (n+1) = (n+1) : countdown n

Finally, let us show that this program takes quadratic time. The evaluation of
column x y requires q evaluations of nextcol, where q is the length of y, and the
time to compute each evaluation of nextcol is O(p) steps, where p is the length of
x. Hence the time to construct column x y is O(p × q) steps.
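
Since the Gofer program depends on library functions that are not shown here (catalist, outl, bmin and so on), the following self-contained transcription into standard Haskell may help when experimenting. It is a sketch under the assumption that catalist behaves like foldr and that bmin prefers its first argument on ties; the type synonym Entry is introduced for illustration.

```haskell
import Data.List (tails)

data Op = Cpy Char | Del Char | Ins Char deriving (Eq, Show)

type Entry = ([Op], Int)   -- an edit sequence paired with its length

mle :: (String, String) -> [Op]
mle (x, y) = fst (head (column x y))

-- column x y lists mle (u, y) for every tail u of x; columns are built
-- from right to left by folding over y.
column :: String -> String -> [Entry]
column x = foldr (nextcol x) (fstcol x)

-- Rightmost column: against the empty string, everything is deleted.
fstcol :: String -> [Entry]
fstcol x = zip (tails (map Del x)) [n, n - 1 .. 0]
  where n = length x

-- Build the next column from bottom to top out of the previous column us.
nextcol :: String -> Char -> [Entry] -> [Entry]
nextcol x b us = foldr step [ins b (last us)] xus
  where
    xus = zip x (zip us (tail us))
    step (a, (u, v)) ws
      | a == b    = cpy a v : ws
      | otherwise = bmin (del a (head ws)) (ins b u) : ws

cpy, del, ins :: Char -> Entry -> Entry
cpy a (ops, n) = (Cpy a : ops, n + 1)
del a (ops, n) = (Del a : ops, n + 1)
ins a (ops, n) = (Ins a : ops, n + 1)

-- Binary minimum on recorded lengths, preferring the first argument on ties.
bmin :: Entry -> Entry -> Entry
bmin e f = if snd e <= snd f then e else f
```

Each call of nextcol makes one pass over x, and there is one call per character of y, which is where the O(p × q) bound comes from.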

Exercises

9.6 Prove that cons · (Π × R) ⊆ R · cons where R = length° · leq · length.

9.7 Prove formally that (id × tail) · edit ⊆ edit · R, where R = length° · leq · length.

9.3 Optimal bracketing


A standard application of dynamic programming is to the problem of building a
minimum cost binary tree. The problem is often formulated as one of bracketing
an expression a1 ⊕ a2 ⊕ ··· ⊕ an in the best possible way. It is assumed that ⊕
is an associative operation, so the way in which the expression is bracketed does
not affect its value. However, different bracketings may have different costs, and
the objective is to find a bracketing of minimum cost. Specific instances of the
bracketing problem are explored in the exercises.

The obvious choice of datatype to represent bracketings is a binary tree with values
in the tips:

tree A ::= tip A | bin (tree A, tree A).

For example, the bracketing (a1 ⊕ a2) ⊕ (a3 ⊕ a4) is represented by the tree

bin (bin (tip a1, tip a2), bin (tip a3, tip a4)),

while the alternative bracketing a1 ⊕ ((a2 ⊕ a3) ⊕ a4) is represented by the tree

bin (tip a1, bin (bin (tip a2, tip a3), tip a4)).

A tree can be flattened by the function flatten : list+ A ← tree A defined by

flatten = ([wrap, cat]),

where cat : list+ A ← (list+ A)². This function produces the list of tip values in
left to right order. Our problem, therefore, is to find a function mct (short for
"minimum cost tree") satisfying

mct ⊆ min R · Λ([wrap, cat])°,

where R = cost° · leq · cost.

The interesting part is the definition of cost. Here is the general scheme:

cost (tip a) = 0
cost (bin (x, y)) = cb (size x, size y) + cost x + cost y
size (tip a) = st a
size (bin (x, y)) = sb (size x, size y).



In words, the cost of building a single tip is zero, while the cost of building a node
is some function cb of the sizes of the expressions associated with the two subtrees,
plus the cost of building the two subtrees. The function size (which, by the way,
has no relation to the function that returns the number of elements in a tree) is a
catamorphism on trees, where st gives the size of an atomic expression and sb the
size of a compound expression in terms of its two subexpressions. Formally,

⟨cost, size⟩ = ([opt, opb]),

where opt = ⟨zero, st⟩ and

opb ((cx, sx), (cy, sy)) = (cb (sx, sy) + cx + cy, sb (sx, sy)).
We illustrate the abstract definition of cost with one specific example. Consider
the problem of computing x1 ++ x2 ++ ··· ++ xn in the best possible way. If ++ is
implemented on cons-lists, then the cost of evaluating x ++ y is proportional to the
length of x, and the size of the result is the sum of the lengths of x and y. For this
problem, cb (m, n) = m, sb (m, n) = m + n, and st = length. It turns out in this
instance that the bracketing

x1 ++ (x2 ++ (··· ++ (xn−1 ++ xn)))

is always optimal, which is one reason why concat is defined as a catamorphism on
cons-lists in functional programming.
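
The concatenation instance can be checked on small cases with a Haskell sketch of the catamorphism that computes cost and size together. The names Tree and costSize are introduced here for illustration, and the tree type takes its two subtrees as separate arguments rather than as a pair.

```haskell
data Tree a = Tip a | Bin (Tree a) (Tree a)

-- The paired catamorphism for list concatenation:
-- st = length, cb (m, n) = m and sb (m, n) = m + n.
costSize :: Tree [b] -> (Int, Int)
costSize (Tip a)   = (0, length a)
costSize (Bin x y) = (sx + cx + cy, sx + sy)
  where
    (cx, sx) = costSize x
    (cy, sy) = costSize y

cost :: Tree [b] -> Int
cost = fst . costSize
```

For the three lists "ab", "c" and "de", the right-nested bracketing has cost 3 and the left-nested one has cost 5, in line with the claim that nesting to the right is optimal.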

Derivation

For this problem we have h = [tip, bin] and ([T]) = flatten, a function. There is no
obvious criterion for preferring some decompositions over others, so the thinning
step is omitted and we will aim for an application of Theorem 9.1. To establish the
monotonicity condition we will need the assumption that the size of an expression
is dependent only on its atomic constituents, not on the bracketing. This condition
is satisfied if the function sb in the definition of size is associative. It follows that
size = sz · flatten for some function sz.

For the monotonicity condition we will use Proposition 9.3 and choose a function g
satisfying

cost · [tip, bin] = g · (id + ⟨cost, flatten⟩²) (9.5)
g · (id + (leq × id)²) ⊆ leq · g. (9.6)

We take

g = [zero, outl · opb · (id × sz)²],

where sz is the function introduced above.

For (9.5) we argue:

    g · (id + ⟨cost, flatten⟩²)
=   {definition of g; coproducts and products}
    [zero, outl · opb · ⟨cost, sz · flatten⟩²]
=   {assumption sz · flatten = size}
    [zero, outl · opb · ⟨cost, size⟩²]
=   {since ⟨cost, size⟩ = ([opt, opb])}
    [zero, outl · ⟨cost, size⟩ · bin]
=   {since cost · tip = zero; products}
    [cost · tip, cost · bin]
=   {products}
    cost · [tip, bin].

For (9.6) we argue:

    g · (id + (leq × id)²)
=   {definition of g}
    [zero, outl · opb · (leq × sz)²]
⊆   {definition of opb and + monotonic}
    [zero, leq · outl · opb · (id × sz)²]
⊆   {leq reflexive}
    leq · [zero, outl · opb · (id × sz)²]
=   {definition of g}
    leq · g.

The dynamic programming theorem is therefore applicable and says that we can
compute a minimum cost tree by computing the least fixed point of the recursion
equation

X = min R · P[tip, bin · (X × X)] · Λ[wrap, cat]°.

Since wrap and cat have disjoint ranges, an appeal to Proposition 9.1 gives

X = (single → tip · wrap°, min R · P(bin · (X × X)) · Λcat°),

where single x holds if x is a singleton list. The recursion can be implemented by
representing Λcat° as the function splits, where

splits = zip · ⟨inits+, tails+⟩,

and inits+ and tails+ return the lists of proper initial and tail segments of a list;
inits+ is an implementation of Λinit+, where init+ is the transitive closure of init
and describes the proper prefix relation; dually, tails+ is an implementation of
Λtail+.

Then we can implement mct by the recursive program:

mct = (single → tip · head, minlist R · list (bin · (mct × mct)) · splits).

The program terminates because the recursive calls are on shorter arguments: if
(y, z) is an element of splits x, then both y and z are shorter than x. As in the
case of the string editing problem, multiple evaluations of the same subproblem
mean that the program has an exponential running time, so, once again, we need a
tabulation scheme.
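
A direct Haskell rendering of this recursive program might look as follows. It is only a sketch: the cost model is fixed to the concatenation instance (atoms are list lengths, cb (m, n) = m, sb (m, n) = m + n), splits is written with splitAt instead of zipping inits+ and tails+, and minlist R becomes minimumBy.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

data Tree a = Tip a | Bin (Tree a) (Tree a) deriving (Eq, Show)

-- Assumed cost model for illustration (list concatenation).
cb, sb :: (Int, Int) -> Int
cb (m, _) = m
sb (m, n) = m + n

costSize :: Tree Int -> (Int, Int)
costSize (Tip a)   = (0, a)
costSize (Bin x y) = (cb (sx, sy) + cx + cy, sb (sx, sy))
  where
    (cx, sx) = costSize x
    (cy, sy) = costSize y

-- Each proper initial segment paired with the matching tail segment.
splits :: [a] -> [([a], [a])]
splits x = [splitAt k x | k <- [1 .. length x - 1]]

-- Naive recursion over all splits; the input must be non-empty.
mct :: [Int] -> Tree Int
mct [a] = Tip a
mct x   = minimumBy (comparing (fst . costSize))
            [Bin (mct y) (mct z) | (y, z) <- splits x]
```

On the input [2, 1, 2] this returns the right-nested tree, whose cost is 3 against 5 for the left-nested alternative.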

Tabulation

In order to compute mct x we also need to compute mct y for every non-empty
segment y of x. It is helpful to picture these values as a two-dimensional array:

mct (a1)
mct (a1a2)       mct (a2)
mct (a1a2a3)     mct (a2a3)     mct (a3)
mct (a1a2a3a4)   mct (a2a3a4)   mct (a3a4)   mct (a4).

The object is to compute the bottom entry of the leftmost column. We will represent
the array as a list of rows, although we will also need to consider individual columns.
Hence we define

array = list row · inits
row = list mct · tails
col = list mct · inits.

The functions inits and tails both have type list+(list+ A) ← list+ A; the function
inits returns the list of non-empty initial segments in increasing order of length, and
tails the tail segments in decreasing order of length.
In order to tackle the main calculation, which is to show how to compute array, we
will need various subsidiary identities, so let us begin by expressing mct in terms
of row and col. For the recursive case we can argue:

    mct
=   {recursive case of mct and definition of splits}
    minlist R · list (bin · (mct × mct)) · zip · ⟨inits+, tails+⟩
=   {since list (f × g) · zip = zip · (list f × list g)}
    minlist R · list bin · zip · ⟨list mct · inits+, list mct · tails+⟩
=   {introducing mix = minlist R · list bin · zip}
    mix · ⟨list mct · inits+, list mct · tails+⟩
=   {since inits+ = inits · init and tails+ = tails · tail}
    mix · ⟨list mct · inits · init, list mct · tails · tail⟩
=   {definition of row and col}
    mix · ⟨col · init, row · tail⟩.

Hence

mct = (single → tip · head, mix · ⟨col · init, row · tail⟩) (9.7)
mix = minlist R · list bin · zip.

Next, let us express col in terms of row and col. For the recursive case, we argue:

    col
=   {definition of col}
    list mct · inits
=   {since inits = snoc · ⟨inits · init, id⟩ on non-singletons}
    list mct · snoc · ⟨inits · init, id⟩
=   {since list f · snoc = snoc · (list f × f)}
    snoc · ⟨col · init, mct⟩
=   {(9.7) on non-singletons}
    snoc · ⟨col · init, mix · ⟨col · init, row · tail⟩⟩
=   {introducing next = snoc · ⟨outl, mix⟩}
    next · ⟨col · init, row · tail⟩.

Hence

col = (single → wrap · tip · head, next · ⟨col · init, row · tail⟩) (9.8)
next = snoc · ⟨outl, mix⟩.

Equation (9.8) can be used to justify the implementation of col as a loop (see
Exercise 9.13):

col = loop next · ⟨wrap · tip · head, list row · inits · tail⟩.

Below, we will need this equation in the equivalent form:

col · cons = process · (id × array) (9.9)
process = loop next · ((wrap · tip) × id).

As a final preparatory step, we express row in terms of mct and row. For the
recursive case we can argue:

    row
=   {definition of row}
    list mct · tails
=   {since tails = cons · ⟨id, tails · tail⟩ on non-singletons}
    list mct · cons · ⟨id, tails · tail⟩
=   {since list f · cons = cons · (f × list f)}
    cons · ⟨mct, row · tail⟩.

Hence

row = (single → wrap · tip · head, cons · ⟨mct, row · tail⟩) (9.10)


Now for the main calculation. We will compute array as a catamorphism on cons-lists,
building columns from right to left, and then using the column entries to
extend each row. Hence we want

array = ([fstcol, addcol]),

for appropriate functions fstcol and addcol. It is easy to check that

fstcol = wrap · wrap · tip,

so the problem is to compute addcol. We reason:

    array · cons
=   {definition of array}
    list row · inits · cons
=   {since inits · cons = cons · ⟨wrap · outl, tail · inits · cons⟩}
    list row · cons · ⟨wrap · outl, tail · inits · cons⟩
=   {since list f · cons = cons · (f × list f)}
    cons · ⟨row · wrap · outl, list row · tail · inits · cons⟩
=   {(9.10)}
    cons · ⟨wrap · tip · outl, list (cons · ⟨mct, row · tail⟩) · tail · inits · cons⟩.
-

We continue by simplifying the second term, abbreviating tail · inits · cons by tic:

    list (cons · ⟨mct, row · tail⟩) · tic
=   {since list ⟨f, g⟩ = zip · ⟨list f, list g⟩}
    list cons · zip · ⟨list mct, list (row · tail)⟩ · tic
=   {products}
    list cons · zip · ⟨list mct · tic, list (row · tail) · tic⟩
=   {since list f · tail = tail · list f; definition of col}
    list cons · zip · ⟨tail · col · cons, list (row · tail) · tic⟩
=   {since list tail · tic = inits · outr; definition of array}
    list cons · zip · ⟨tail · col · cons, array · outr⟩
=   {(9.9)}
    list cons · zip · ⟨tail · process · (id × array), array · outr⟩
=   {products}
    list cons · zip · ⟨tail · process, outr⟩ · (id × array).

Summarising, we have shown that array · cons = addcol · (id × array), where

addcol = cons · ⟨wrap · tip · outl, step⟩
step = list cons · zip · ⟨tail · process, outr⟩.
-

The program

The following Gofer program follows the above scheme, except that we label the
trees with cost and size information. More precisely, the tree bin (x, y) is
represented by bin (c, s) (x, y), where c = cost (bin (x, y)) and
s = size (bin (x, y)):

> data Tree a = Tip a | Bin (Int,a) (Tree a, Tree a)

> mct     = head . last . array
> array   = cata1list (fstcol, addcol)
> fstcol  = wrap . wrap . tip
> addcol  = cons . pair (wrap . tip . outl, step)
> step    = list cons . zip . pair (tail . process, outr)
> process = loop next . cross (wrap . tip, id)
> next    = snoc . pair (outl, minlist r . list bin . zip)
>           where r = leq . cross (cost, cost)

> cost (Tip a)        = 0
> cost (Bin (c,s) ts) = c
> size (Tip a)        = a
> size (Bin (c,s) ts) = s

> tip       = Tip
> bin (x,y) = Bin (c,s) (x,y)
>             where c = cb (size x, size y) + cost x + cost y
>                   s = sb (size x, size y)

Finally, let us estimate the running time of the program. To build an (n × n)
array, the operation addcol is performed n − 1 times. For execution of addcol on
an array of size (m × m), the operation step takes O(m²) steps since next is
executed m times and takes O(m) steps. So the total is O(n³) steps.
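The scheme can be tried out in Haskell. The following is a minimal sketch, not the book's program: the combinators (outl, pair, cross, loop, minlist, cata1list and so on) are our own reconstructions of the standard functions, zipp is a hypothetical name chosen to avoid clashing with the Prelude's zip, and the choice of cb and sb (matrix-chain multiplication, with sizes given as (rows, columns) pairs) is an assumption made purely for illustration.

```haskell
-- Sizes are (rows, cols) pairs: an assumed matrix-chain instance.
type Dim = (Int, Int)

data Tree a = Tip a | Bin (Int, a) (Tree a, Tree a)

-- Uncurried combinators, reconstructed for this sketch.
outl (a, _) = a
outr (_, b) = b
pair (f, g) a = (f a, g a)
cross (f, g) (a, b) = (f a, g b)
wrap a = [a]
cons (a, x) = a : x
snoc (x, a) = x ++ [a]
zipp (x, y) = zip x y
loop f (a, x) = foldl (curry f) a x
minlist r = foldl1 (\a b -> if r (a, b) then a else b)
cata1list (f, _) [a] = f a
cata1list (f, g) (a : x) = g (a, cata1list (f, g) x)

cost :: Tree Dim -> Int
cost (Tip _) = 0
cost (Bin (c, _) _) = c

size :: Tree Dim -> Dim
size (Tip a) = a
size (Bin (_, s) _) = s

-- Multiplying a (p x q) by a (q x r) matrix costs p*q*r and yields (p x r).
cb :: (Dim, Dim) -> Int
cb ((p, q), (_, r)) = p * q * r

sb :: (Dim, Dim) -> Dim
sb ((p, _), (_, r)) = (p, r)

bin :: (Tree Dim, Tree Dim) -> Tree Dim
bin (x, y) = Bin (c, s) (x, y)
  where c = cb (size x, size y) + cost x + cost y
        s = sb (size x, size y)

-- The derived program: array is built column by column.
mct :: [Dim] -> Tree Dim
mct = head . last . array

array :: [Dim] -> [[Tree Dim]]
array = cata1list (fstcol, addcol)

fstcol = wrap . wrap . Tip
addcol = cons . pair (wrap . Tip . outl, step)
step = map cons . zipp . pair (tail . process, outr)
process = loop next . cross (wrap . Tip, id)
next = snoc . pair (outl, minlist r . map bin . zipp)
  where r (x, y) = cost x <= cost y
```

Loaded into GHCi, cost (mct [(10,30),(30,5),(5,60)]) gives the cost of the cheapest bracketing of three matrices with those dimensions.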

Exercises

9.8 Consider the problem of computing the sum x_1 + x_2 + ··· + x_n in the most
efficient manner, where each x_j is a decimal numeral. What are the functions cost
and size for this bracketing problem?

9.9 Same questions as in the preceding exercise, but for the problem of computing
the product x_1 × x_2 × ··· × x_n.

9.10 Same questions, but for matrix multiplication in which we want to compute
M_1 × M_2 × ··· × M_n, where M_j is an (r_{j−1}, r_j) matrix.

9.11 Prove the claim that concat is best evaluated in terms of a catamorphism on
cons-lists.

9.12 Show that if h is associative, then

    ([g, h]) · ([wrap, cat]) = ([g, h]),

where the catamorphism ([g, h]) on the left is over non-empty lists, and ([g, h])
on the right is over trees.

9.13 The standard function loop f is defined in the Appendix by the equations

    loop f · (id × nil) = outl
    loop f · (id × cons) = loop f · (f × id) · assocl.

An equivalent characterisation of loop f in terms of snoc-lists is:

    loop f · (id × nil) = outl
    loop f · (id × snoc) = f · (loop f × id) · assocl.

Using this characterisation, prove that

    k = loop f · (g × list h · inits)

if and only if

    k · (id × nil) = g · outl
    k · (id × snoc) = f · ⟨k · (id × outl), h · snoc · outr⟩.

Hence prove (9.9).

9.14 The optimal bracketing problem can be phrased, like the knapsack problem,
in terms of catamorphisms. Using the converse function theorem, express flatten°
as a catamorphism, and hence find a thinning algorithm for the problem. (This is
a research problem.)

9.15 Explore the variation of the bracketing problem in which ⊕ is assumed to be
commutative as well as associative. This gives us the freedom to choose an optimal
bracketing from among all possible permutations of the input [a_1, a_2, ..., a_n].

9.4 Data compression

In the method of data compression by textual substitution the data to be compressed
is a string of characters. The compressed data is an element of list Code, where an
element of Code is either a character or a pointer to a substring of the part of the
string already processed:

    Code ::= sym Char | ptr (String, String⁺).


A pointer is defined as a pair of strings (but see below), the idea being that the
second string identifies the non-empty portion of the input concerned, while the
first indicates where it is to be found. We make this idea precise by describing the
process of decoding a code sequence.

We will need to use snoc-lists, so for this section suppose that

    list A ::= nil | snoc (list A, A)
    list⁺ A ::= wrap A | snoc (list⁺ A, A).

In particular, String = list Char and String⁺ = list⁺ Char. The partial function
decode : String ← list Code is defined as the catamorphism

    decode = ([nil, extend]),

where

    extend (x, sym a) = x ++ [a]
    extend (x, ptr (y, z)) = x ++ z, provided (y ++ z) init⁺ (x ++ z).

The relation init⁺ is the transitive closure of init and describes the proper prefix
relation. Note in the second equation that it is not required that y ++ z be a prefix
of x; in particular, we have

    extend ("aba", ptr ("a", "bab")) = "ababab".

The function decode is partial: if the very first code element is a pointer, then
decode is undefined since there is no y for which y ++ z is a proper prefix of z. Note
also that the range of decode is the set of all possible strings, so all strings can be
encoded.

We have chosen to define pointers as pairs of strings, but the success of data
compression in practice results from representing each pointer (y, z) simply by the
lengths of y and z. For this new representation, the decoding of a pointer is given by

    extend (x, ptr (m, n)) = x ⊗ (m, n),

where the operator ⊗ is defined recursively:

    x ⊗ (m, 0) = x
    x ⊗ (m, n + 1) = (x ++ [x_m]) ⊗ (m + 1, n).

Here, x_m is the mth element of x (counting from 0). This change of representation
yields a compact representation of strings. For instance,

    decode ['a', (0, 9)] = "aaaaaaaaaa".

A slightly more involved example is

decode ['a', 'a', 'b', (1,3), 'c', (1,2)] = "aababacab".

Bearing the new representation of pointers in mind, we define the size of a code
sequence by

    size = ([zero, plus · [id × c, id × p] · distr]),

where c and p are given constant functions returning the amount of space to store
symbols and pointers. Typically, symbols require one byte, while pointers require
four bytes (three bytes for the first number, and one byte for the second). Both
c and p are determined by the implementation of the algorithm on a particular
computer.
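Decoding with length-pair pointers, and the size of a code sequence, can be sketched in Haskell as follows. This is an illustration rather than the book's program: the name copy stands in for the recursively defined pointer operator, bytes fixes c = 1 and p = 4 (the typical figures quoted above, assumed here), and a pointer that refers beyond the string built so far simply raises a run-time error, reflecting the partiality of decode.

```haskell
-- Code elements, with a pointer represented by two lengths (m, n).
data Code = Sym Char | Ptr (Int, Int)

-- copy x (m, n): append n characters to x, reading from index m onwards
-- of the growing string (so the copied region may overlap the new text).
copy :: String -> (Int, Int) -> String
copy x (_, 0) = x
copy x (m, n) = copy (x ++ [x !! m]) (m + 1, n - 1)

extend :: (String, Code) -> String
extend (x, Sym a) = x ++ [a]
extend (x, Ptr mn) = copy x mn

-- decode as a fold over the code sequence, read left to right (snoc-list style).
decode :: [Code] -> String
decode = foldl (curry extend) ""

-- Assumed storage costs: one byte per symbol, four per pointer.
bytes :: Code -> Int
bytes (Sym _) = 1
bytes (Ptr _) = 4

size :: [Code] -> Int
size = sum . map bytes
```

For example, decode [Sym 'a', Ptr (0, 9)] reproduces the ten-character string of the first example, while its size under these assumptions is 5 bytes.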

The function size induces a preorder R = size° · leq · size, so our problem is to
compute a function encode satisfying

    encode ⊆ min R · Λdecode°.

Derivation

The monotonicity condition is easy to verify, so the basic form of dynamic
programming is applicable. But we can do better with a suitable thinning step. For
general c and p it is not possible to determine at each stage whether it is better to
pick a symbol or a pointer, assuming that both choices are possible. On the other
hand, it is possible to choose between pointers: a pointer (y, z) should be better than
(y', z') whenever z is longer than z' because a longer portion of the input will then
be consumed. More precisely, suppose

    w = extend (x, ptr (y, z)) and w = extend (x', ptr (y', z')),

so w = x ++ z = x' ++ z'. Now, z is longer than z' if and only if z' is a suffix of z.
Equivalently, z is longer than z' if and only if x is a prefix of x'.


This reasoning suggests one possible choice for the thinning relation Q: take

    Q = F(Π + Π, prefix),

where the first Π is the universal relation on symbols, and the second Π is the
universal relation on pointers. The functor F is given by

    F(Code, String) = id + (String × Code).

By Proposition 9.4 we have to check that

    α · F(Π + Π, R) ⊆ R · α   and   prefix · decode ⊆ decode · R.

The first condition is routine using the fact that the sizes of symbols and pointers
are constants (i.e. [c, p] · (Π + Π) = [c, p]), and we leave details as an exercise. The
second condition follows if we can show

    init · decode ⊆ decode · R.

We give an informal proof. Suppose decode cs = snoc (x, a); either cs ends with the
code element sym a, in which case drop it from cs, or it ends with the code element
ptr (y, z ++ [a]) for some y and z; in the second case, replace it by ptr (y, z) if
z ≠ [], or drop it if z = []. The result is a new code sequence that decodes to x,
and which has cost no greater than cs.

The dynamic programming theorem states that the data compression problem can
be solved by computing the least fixed point of the equation

    X = min R · P[nil, snoc · (X × id)] · thin Q · Λ[nil, extend]°,

where Q = id + (U × V) and U = prefix and V = Π + Π. Since nil and extend
have disjoint ranges, we can appeal to Proposition 9.1 and obtain

    X = (null → nil, min R · P(snoc · (X × id)) · thin (U × V) · Λextend°).

The final task is to implement thin (U × V) · Λextend°. Since

    (Λextend°) (w ++ [a]) =
        {(w, sym a)} ∪ {(x, ptr (y, z)) | x ++ z = w ++ [a] ∧ (y ++ z) prefix w},

we can define lrt (short for "longest repeated tail") by

    lrt w = min (U × V) {(x, (y, z)) | x ++ z = w ∧ (y ++ z) init⁺ w},

and so implement thin (U × V) · Λextend° by a function reduce defined by

    reduce (w ++ [a]) = [(w, sym a), (x, ptr (y, z))],   if z ≠ []
                      = [(w, sym a)],                    otherwise
                        where (x, (y, z)) = lrt (w ++ [a]).

There is a fast algorithm for computing lrt (Crochemore 1986) but we give only a
simple implementation.

Summarising, we can compute encode by the recursive program

    encode = (null → nil, minlist R · list (snoc · (encode × id)) · reduce).
As with preceding problems, the computation of encode is inefficient since the same
subproblem may be computed more than once. We will not, however, go into
the details of a tabulation phase; although the general scheme is clear, namely, to
compute encode on all initial segments of the input string, the details are messy.

The program

In the following program a code sequence x is stored as a pair (x, size x). The
program is parameterised by the function bytes : Nat ← Code that returns the sizes
of symbols and pointers:

> data Code = Sym Char | Ptr (String, String)
> encode  = outl . encode'
> encode' = cond null (nil', minlist r . list f . reduce)
>           where f = snoc' . cross (encode', id)
>                 r = leq . cross (outr, outr)

> nil'  = const ([],0)
> snoc' = cross (snoc, plus . cross (id, bytes)) . dupr

> reduce w = [(init w, Sym (last w)), (x, Ptr (y,z))], if z /= []
>          = [(init w, Sym (last w))], otherwise
>            where (x,(y,z)) = lrt w

> lrt w      = head [(x,(y,z)) | (x,z) <- splits w, y <- locs (w,z)]
> locs (w,z) = [y | (y,v) <- splits (init w), prefix (z,v)]

> prefix ([],v)    = True
> prefix (z,[])    = False
> prefix (a:z,b:v) = (a == b) && prefix (z,v)
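For readers without a Gofer system, the recursive encoder can be approximated in Haskell. The sketch below is our transcription under stated assumptions: bytes charges 1 for a symbol and 4 for a pointer, splits is built from the library functions inits and tails, and minimumBy replaces minlist. Like the program above it is untabulated, so it may take exponential time and is only suitable for short inputs.

```haskell
import Data.List (inits, isPrefixOf, minimumBy, tails)
import Data.Ord (comparing)

data Code = Sym Char | Ptr (String, String) deriving (Eq, Show)

-- Assumed storage costs (the text leaves c and p open).
bytes :: Code -> Int
bytes (Sym _) = 1
bytes (Ptr _) = 4

-- All ways of splitting w into a prefix and a suffix, shortest prefix first.
splits :: [a] -> [([a], [a])]
splits w = zip (inits w) (tails w)

-- Longest repeated tail: the first split (x, z) of w, together with a y such
-- that y ++ z is a proper prefix of w. Defined for non-empty w only.
lrt :: String -> (String, (String, String))
lrt w = head [(x, (y, z)) | (x, z) <- splits w, y <- locs z]
  where locs z = [y | (y, v) <- splits (init w), z `isPrefixOf` v]

-- The (at most two) candidate last code elements for a non-empty w.
reduce :: String -> [(String, Code)]
reduce w
  | null z = [(init w, Sym (last w))]
  | otherwise = [(init w, Sym (last w)), (x, Ptr (y, z))]
  where (x, (y, z)) = lrt w

-- A code sequence paired with its size, as in the text.
encode' :: String -> ([Code], Int)
encode' w
  | null w = ([], 0)
  | otherwise = minimumBy (comparing snd) [snoc' (encode' x) c | (x, c) <- reduce w]
  where snoc' (cs, s) c = (cs ++ [c], s + bytes c)

encode :: String -> [Code]
encode = fst . encode'

-- Decoder for string-pair pointers; the proviso on pointers is not re-checked.
decode :: [Code] -> String
decode = foldl ext ""
  where ext x (Sym a) = x ++ [a]
        ext x (Ptr (_, z)) = x ++ z
```

A round trip such as decode (encode "abracadabra") returns the original string, since every candidate produced by reduce extends to its argument.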

Exercises

9.16 Prove that α · F(Π + Π, R) ⊆ R · α.

9.17 Prove formally that init · decode ⊆ decode · R.

9.18 Why can't we take Q = F(Π, prefix), where Π is the universal relation on code
elements?

9.19 What simplification to the algorithm is possible if it is assumed that c = p?

9.20 We can turn decode into a total and surjective function by redefining code
sequences so that if such a sequence is not empty, then it always begins with a
symbol. This means that the converse function theorem is applicable, so decode°
can be expressed as a catamorphism. Develop a thinning algorithm to solve the
dictionary coding problem. (This is a research problem.)

Bibliographical remarks

In 1957, Bellman published the first book on dynamic programming (Bellman 1957).
Bellman showed that the use of dynamic programming is governed by the principle

of optimality, and many authors have since considered the formalisation of that
principle as a monotonicity condition, e.g. (Bonzon 1970; Mitten 1964; Karp and
Held 1967; Sniedovich 1986). The paper by Karp and Held places a lot of emphasis
on the sequential nature of dynamic programming, essentially by concentrating on

list-based programming problems. The preceding chapter deals with that type of
problem.

(Helman and Rosenthal 1985; Helman 1989a) present a wider view of dynamic
programming, generalising from lists to more general tree-like datatypes. Our approach
is a natural reformulation of those ideas to a categorical setting, making the
definitions and proofs more compact by parameterising specifications and programs
with functors. Furthermore, the relational calculus admits a clean treatment of
indeterminacy.

The work of Smith (Smith and Lowry 1990; Smith 1991) shows close parallels with
the view of dynamic programming put forward here: in fact the main difference is
in the style of presentation. Smith's work has the additional aim of mechanising the
algorithm design process. To this end, Smith has built a system that implements
his ideas (Smith 1990), and has illustrated its use with an impressive number of
examples. As said before, we have not investigated whether the results of this book
are amenable to mechanical application, although we believe they are. The ideas

underlying Smith's work are also of an algebraic nature (Smith 1993), but, again,
this is rather different in style from the approach taken here.

Another very similar approach to dynamic programming is that of (Gnesi, Montanari,
and Martelli 1981), which also starts with algebraic foundations. There it is
shown how dynamic programming can be reduced to a graph searching problem.
It is in fact possible to view our basic theorem about dynamic programming in
these terms (Ning 1997). One advantage of that view is that it allows a smooth
combination of branch-and-bound with dynamic programming. Branch-and-bound
has been studied in a calculational style by (Fokkinga 1991).

Besides Bellman's original book, there are many other texts on dynamic
programming, e.g. (Bellman and Dreyfus 1962; Denardo 1982; Dreyfus and Law 1977).
There is a fair amount of work on tabulation, and on ways in which tabulation
schemes may be formally derived (Bird 1980; Boiten 1992; Cohen 1979; Pettorossi
1984). These methods are, however, still ad-hoc, and a more generic solution to the
problem of tabulation remains elusive.

Finally, a few remarks on the applications considered in this chapter. In the special
case of matrix chain multiplication, the bracketing problem admits a much better
solution than the one derived here (Hu and Shing 1982, 1984; Yao 1982). The
part of the data compression algorithm that we have ignored (finding the longest
repeated tail) is discussed in a functional setting by (Giegerich and Kurtz 1995).
Chapter 10

Greedy Algorithms

As we said in the preceding chapter, greedy algorithms can be viewed as an extreme
case of dynamic programming in which all but a single decomposition of the input
are weeded out. The theory is essentially the same as that given in Chapter 9, so
most of what follows is devoted to applications.

10.1 Theory
As in the preceding chapter, define H = ([h]) · ([T])°, where h and T are F-algebras.
The proof of the following theorem is very similar to that of Theorem 9.2 and is left
as an exercise:

Theorem 10.1 Let M = min R · ΛH. If h is monotonic on R and Q satisfies
h · FH · Q° ⊆ R° · h · FH, then

    (μX : h · FX · min Q · ΛT°) ⊆ M.

Theorem 10.1 has exactly the same hypotheses as Theorem 9.2 but appears to give
a much stronger result. Indeed it does, but the crucial point is that it is much
harder to refine the result to a computationally useful program. To do so, we need,
in addition to the conditions described in the preceding chapter, the further, and
very strong, condition that Q is a connected preorder on sets returned by ΛT°.
This was not the case with the examples given in the preceding chapter. Since Q
is a relation on FA (for some A) and FA is often a coproduct, we can make use of
the following result, which is a variation on Proposition 9.1.

Proposition 10.1 Suppose that V_1 and V_2 have disjoint ranges, that is, suppose
that V_1° · V_2 = ∅. Then

    [U_1, U_2] · min (Q_1 + Q_2) · Λ[V_1, V_2]° = (ran V_1 → W_1, W_2),

where W_i = U_i · min Q_i · ΛV_i° for i = 1, 2.

Recall also Proposition 9.4, which states that the hypotheses of the greedy theorem
can be satisfied by taking Q = F(U, V), where U and V are preorders such that

    h · F(U, R) ⊆ R · h   and   H · V° ⊆ R° · H.

However, such a choice of Q is not always appropriate when heading for a greedy
algorithm since we also require min Q · ΛT° to be entire.

10.2 The detab-entab problem


The following two exercises are taken from (Kernighan and Ritchie 1988):

Exercise 1-20. Write a program detab that replaces tabs in the input
with the proper number of blanks to space to the next tab stop. Assume
a fixed set of tab stops, say every n columns. Should n be a variable or

a symbolic parameter?
Exercise 1-21. Write a program entab that replaces strings of blanks

by the minimum number of tabs and blanks to achieve the same spacing.
Use the same tab stops as for detab. When either a tab or a single blank
would suffice to reach a tab stop, which should be given preference?

Our aim in this section is to solve these two exercises. They go together because
entab is specified as an optimum converse to detab.

Detab

The function detab is defined as a catamorphism over snoc-lists:

    detab = ([nil, expand]),

where

    expand (x, a) = (a = TB → fill x, x ++ [a])
    fill x = x ++ blanks (n − (col x) mod n),

and

    col = ([zero, count])
    count (c, a) = (a = NL → 0, c + 1).
The expression blanks m returns a string of m blanks, TB denotes the tab character,
and NL the newline character. The function col counts the columns in each line of
the input, and tab stops occur every n columns.

The specification of detab is an executable program, except that it isn't particularly
efficient. For greater efficiency we can tuple detab and col · detab to give

    ⟨detab, col · detab⟩ = ([base, step]),

where base returns ([], 0) and

    step ((x, c), a) = (x ++ [NL], 0),           if a = NL
                     = (x ++ blanks m, c + m),   if a = TB
                     = (x ++ [a], c + 1),        otherwise
                       where m = n − c mod n.

In the following functional program, we implement the snoc-list catamorphism by
a loop:

    ([base, step]) · convert = loop step · ⟨base, id⟩,

where convert converts cons-lists to snoc-lists. The resulting Gofer program is:

> detab = outl . loop step . pair (pair (nil, zero), id)

> step ((x,c),a) = (x ++ ['\n'], 0),      if a == '\n'
>                = (x ++ blanks m, c+m),  if a == '\t'
>                = (x ++ [a], c+1),       otherwise
>                  where m = n - (c `mod` n)
> blanks 0     = []
> blanks (m+1) = ' ' : blanks m

There is another optimisation that improves efficiency still further. Observe that
base and step take a particular form, namely,

    base = ⟨nil, c_0⟩
    step ((x, c), a) = (x ++ f (c, a), g (c, a)),

for some constant c_0 and functions f and g. When base and step have this form,
we have

    outl · loop step · ⟨base, id⟩ = loop' (f, g) · ⟨c_0, id⟩,

where loop' (f, g) is defined by the two equations

    loop' (f, g) (c, []) = []
    loop' (f, g) (c, [a] ++ x) = f (c, a) ++ loop' (f, g) (g (c, a), x).

The proof is left as an exercise.

To see what this transformation buys, let c_{i+1} = g (c_i, a_i) and
x_{i+1} = f (c_i, a_i) for 0 ≤ i < n. Then,

    h  [a_0, a_1, ..., a_{n−1}] = ((x_1 ++ x_2) ++ ···) ++ x_n
    h' [a_0, a_1, ..., a_{n−1}] = x_1 ++ (x_2 ++ (··· ++ x_n)),

where h = outl · loop step · ⟨base, id⟩ and h' = loop' (f, g) · ⟨c_0, id⟩. The second
form is asymptotically more efficient to compute in any functional language in which
++ is defined in terms of cons.

Applying this transformation, and writing detab' = loop' (f, g), we obtain the
following program:

> detab x = detab' (0,x)
> detab' (c,[])  = []
> detab' (c,a:x) = ['\n'] ++ detab' (0,x),      if a == '\n'
>                = blanks m ++ detab' (c+m,x),  if a == '\t'
>                = [a] ++ detab' (c+1,x),       otherwise
>                  where m = n - c `mod` n
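The final detab program carries over to Haskell almost unchanged. The sketch below replaces Gofer's trailing "if" clauses by guards and fixes n = 8; that value is an assumption for illustration, since the text leaves n as a parameter.

```haskell
-- Tab stops every n columns (8 is an assumed value).
n :: Int
n = 8

blanks :: Int -> String
blanks m = replicate m ' '

-- detab' carries the current column c and produces output left to right,
-- so each piece of output is consed onto the result of the recursive call.
detab :: String -> String
detab x = detab' (0, x)

detab' :: (Int, String) -> String
detab' (_, []) = []
detab' (c, a : x)
  | a == '\n' = "\n" ++ detab' (0, x)
  | a == '\t' = blanks m ++ detab' (c + m, x)
  | otherwise = [a] ++ detab' (c + 1, x)
  where m = n - c `mod` n
```

With n = 8, a tab after five characters expands to three blanks, and a tab at the start of a line expands to eight.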

Entab

The more interesting problem is that of computing entab. We begin by specifying
entab formally. The statement that 'strings of blanks are to be replaced by the
minimum number of tabs and blanks to achieve the same spacing' can be interpreted
as asking for a shortest possible output. The other condition on entab is that
detab · entab = id. These two conditions can be combined to give our specification:

    entab ⊆ min R · Λdetab°,

where R = length° · leq · length.

Derivation

We aim to solve the problem with a greedy algorithm. Since nil and expand
have disjoint ranges we can try to express Q as a coproduct Q = F(U, V), where
F(U, V) = id + (V × U). Furthermore, according to Proposition 9.4, the greedy
condition holds if we can find U and V to satisfy the two conditions

    α · F(U, R) ⊆ R · α   and   V · detab ⊆ detab · R,

where α = [nil, snoc]. Bear in mind, however, the additional requirement that
min Q · Λ[nil, expand]° be entire.

Let us see whether we can take Q = F(U, V) for appropriate U and V. Since
α · F(Π, R) ⊆ R · α (see Exercise 10.3), we can choose U to be any preorder we like
on characters, including aUb if a = TB or a = b. This choice prefers tabs over
blanks. It might seem reasonable to choose V = prefix, but this idea doesn't work.
To see why, suppose n = 8 and consider the following example:

    detab [a, b, c, d, e, TB] = [a, b, c, d, e, BL, BL, BL].

Although [a, b, c, d, e, BL, BL] is a prefix of the right-hand side, it is longer than
[a, b, c, d, e, TB], so the condition prefix · detab ⊆ detab · R fails.
The resolution is to allow only those prefixes that do not cross tab stops; more
precisely, define

    V = prefix ∩ (fill° · fill).
To prove V · detab ⊆ detab · R we reason:

      V · detab
   =    {since detab is a catamorphism}
      V · [nil, expand] · F(id, detab) · [nil, snoc]°
   =    {coproducts and V · nil = nil}
      [nil, V · expand] · F(id, detab) · [nil, snoc]°
   ⊆    {claim: V · expand ⊆ expand ∪ (V · outl)}
      [nil, expand ∪ (V · outl)] · F(id, detab) · [nil, snoc]°
   =    {distributing ∪; catamorphisms and definition of F}
      detab ∪ (V · outl · (detab × id) · snoc°)
   =    {naturality of outl and init = outl · snoc°}
      detab ∪ (V · detab · init).

Leaving aside the claim for the moment, we have shown that X = V · detab is a
solution of the inequation X ⊆ detab ∪ (X · init). But init is an inductive relation,
so the greatest solution of this inequation is the unique solution of the corresponding
equation, namely X = detab ∪ (X · init). But the unique solution is
X = detab · prefix, so V · detab ⊆ detab · prefix. It is immediate that prefix ⊆ R,
so we are done.

It remains to prove the claim. We argue:

      V · expand
   =    {definition of expand}
      V · (istab · outr → fill · outl, snoc)
   =    {conditionals}
      (istab · outr → V · fill · outl, V · snoc)
   =    {claim: V · fill = fill (exercise)}
      (istab · outr → fill · outl, V · snoc)
   ⊆    {claim: V · snoc ⊆ snoc ∪ (V · outl) (exercise)}
      (istab · outr → fill · outl, snoc ∪ (V · outl))
   ⊆    {definition of expand}
      expand ∪ (V · outl).

The conditions of the greedy theorem are established, so we can solve our problem
by computing the least fixed point of the equation

    X = [nil, snoc] · (id + (X × id)) · min Q · Λ[nil, expand]°,

where Q = id + (V × U). Appeal to Proposition 10.1 gives

    X = (null → nil, snoc · (X × id) · min (V × U) · Λexpand°).

It remains to implement min (V × U) · Λexpand°. Since

    Λexpand° (x ++ [a]) = {(y, TB) | fill y = x ++ [a]} ∪ {(x, a)},

and

    (∃y : fill y = x ++ [a]) ≡ a = BL ∧ col (x ++ [a]) mod n = 0,

we have

    (min (V × U) · Λexpand°) (x ++ [a])
        = min V S,   if a = BL and col (x ++ [a]) mod n = 0
        = (x, a),    otherwise
          where S = {(y, TB) | fill y = x ++ [a]}.
Furthermore,

    min V {(y, TB) | fill y = x ++ [a]} = (unfill x, TB),

where unfill x is the shortest prefix of x satisfying

    fill (unfill x) = fill x.

We can define unfill by

    unfill [] = []
    unfill (x ++ [a]) = unfill x,   if a = BL and col (x ++ [a]) mod n ≠ 0
                      = x ++ [a],   otherwise.

Writing the resulting greedy algorithm as a Gofer program, we obtain

> entab x = [],              if null x
>         = entab y ++ [a],  otherwise
>           where (y,a) = contract x

> contract x
>   = (unfill y,'\t'),  if a == ' ' && (col x) `mod` n == 0
>   = (y,a),            otherwise
>     where (y,a) = (init x,last x)

> unfill x = [],        if null x
>          = unfill y,  if a == ' ' && col x `mod` n /= 0
>          = x,         otherwise
>            where (y,a) = (init x,last x)

> col = loop op . pair (zero, id)
> op (c,a) = 0,    if a == '\n'
>          = c+1,  otherwise
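The greedy program above translates directly into Haskell guards. The following sketch keeps the structure of contract and unfill unchanged; n = 8 is an assumed value, and col is written with foldl in place of loop.

```haskell
-- Tab stops every n columns (8 is an assumed value).
n :: Int
n = 8

-- Column reached after reading x; a newline resets the count.
col :: String -> Int
col = foldl op 0
  where op c a = if a == '\n' then 0 else c + 1

-- Strip trailing blanks back to the most recent tab stop.
unfill :: String -> String
unfill x
  | null x = []
  | a == ' ' && col x `mod` n /= 0 = unfill y
  | otherwise = x
  where (y, a) = (init x, last x)

-- Choose the last output character: a tab if x ends with a blank that
-- lands exactly on a tab stop, and the last character of x otherwise.
contract :: String -> (String, Char)
contract x
  | a == ' ' && col x `mod` n == 0 = (unfill y, '\t')
  | otherwise = (y, a)
  where (y, a) = (init x, last x)

entab :: String -> String
entab x
  | null x = []
  | otherwise = entab y ++ [a]
  where (y, a) = contract x
```

Because the where-bound pair is evaluated lazily, init and last are never applied to an empty list: the null test is always tried first.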

The program for entab involves recomputations of col. To improve efficiency, we


will express a generalisation of entab as a snoc-list catamorphism, and then apply
the same transformation that we did for detab.

The idea is to define a function tbc (short for 'trailing blanks count') satisfying

    entab x = entab (unfill x) ++ blanks (tbc x).                    (10.1)

Using the definition of entab we obtain

    tbc [] = 0
    tbc (x ++ [a]) = tbc x + 1,   if a = BL and col (x ++ [a]) mod n ≠ 0
                   = 0,           otherwise.

The pair ⟨tbc, col⟩ can now be defined as a snoc-list catamorphism:

    ⟨tbc, col⟩ = ([base, op]),

where base returns (0, 0) and

    op ((t, c), a) = (t + 1, c + 1),   if a = BL and (c + 1) mod n ≠ 0
                   = (0, c + 1),       if a = BL and (c + 1) mod n = 0
                   = (0, 0),           if a = NL
                   = (0, c + 1),       otherwise.

Furthermore, the function triple = ⟨entab · unfill, ⟨tbc, col⟩⟩ can also be expressed
as a snoc-list catamorphism:

    triple = ([base, op]),

where base returns ([], (0, 0)) and

    op ((x, (t, c)), a)
        = (x, (t + 1, c + 1)),                 if a = BL and (c + 1) mod n ≠ 0
        = (x ++ [TB], (0, c + 1)),             if a = BL and (c + 1) mod n = 0
        = (x ++ blanks t ++ [NL], (0, 0)),     if a = NL
        = (x ++ blanks t ++ [a], (0, c + 1)),  otherwise.

Using (10.1) we have

    entab = cat · (id × blanks) · outl · assocl · triple.

Finally, applying the same transformation to triple as we did to detab, we obtain

> entab x = entab' (0,0,x)

> entab' (t,c,[]) = blanks t
> entab' (t,c,a:x)
>   = entab' (t+1,c+1,x),                    if a == ' ' && d /= 0
>   = ['\t'] ++ entab' (0,c+1,x),            if a == ' ' && d == 0
>   = blanks t ++ ['\n'] ++ entab' (0,0,x),  if a == '\n'
>   = blanks t ++ [a] ++ entab' (0,c+1,x),   otherwise
>     where d = (c+1) `mod` n
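As a check on the final program, here is a Haskell sketch of entab' together with the earlier detab, so that the specification detab · entab = id can be tested on tab-free input. As before, n = 8 is an assumption, and the guards stand in for Gofer's "if" clauses.

```haskell
-- Tab stops every n columns (8 is an assumed value).
n :: Int
n = 8

blanks :: Int -> String
blanks m = replicate m ' '

-- t counts pending trailing blanks, c is the current column.
entab :: String -> String
entab x = entab' (0, 0, x)

entab' :: (Int, Int, String) -> String
entab' (t, _, []) = blanks t
entab' (t, c, a : x)
  | a == ' ' && d /= 0 = entab' (t + 1, c + 1, x)
  | a == ' ' && d == 0 = "\t" ++ entab' (0, c + 1, x)
  | a == '\n' = blanks t ++ "\n" ++ entab' (0, 0, x)
  | otherwise = blanks t ++ [a] ++ entab' (0, c + 1, x)
  where d = (c + 1) `mod` n

-- detab, repeated here so that the block is self-contained.
detab :: String -> String
detab = go 0
  where
    go _ [] = []
    go c (a : x)
      | a == '\n' = '\n' : go 0 x
      | a == '\t' = blanks m ++ go (c + m) x
      | otherwise = a : go (c + 1) x
      where m = n - c `mod` n

-- The two halves of the specification, for a tab-free string s.
roundTrips :: String -> Bool
roundTrips s = detab (entab s) == s && length (entab s) <= length s
```

Note that a single blank that reaches a tab stop is replaced by a tab, which reflects the choice of U made in the derivation.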

Exercises

10.1 Justify outl · loop step · ⟨base, id⟩ = loop' (f, g) · ⟨c_0, id⟩.

10.2 In the specification of entab why not say that the number of tabs in the output
should be maximised?

10.3 Prove that α · F(Π, R) ⊆ R · α. How does the algorithm for entab change if a
single blank is to be preferred over a single tab?

10.4 For V = prefix ∩ (fill° · fill) prove that

    V · nil = nil
    V · fill = fill
    V · snoc ⊆ snoc ∪ (V · outl).

Which of these conditions does not hold for V = prefix?

10.3 The minimum tardiness problem

The minimum tardiness problem is a scheduling problem from Operations Research
(Hochbaum and Shamir 1989; Lawler 1973). Given a bag of jobs, it is required to
find some permutation of the bag that minimises the maximum penalty incurred
if jobs are not completed on time. The permutation is called a schedule, so the
specification is

    schedule ⊆ min R · Λbagify°
    R = cost° · leq · cost,

where bagify turns a list into a bag. The function cost is defined in terms of three
positive quantities associated with each job j: (i) the completion time ct j, which
determines how long the job j takes to complete; (ii) the due time dt j, which
determines the latest time at which j should be completed (measured from the
start of the schedule); and (iii) a weighting wt j, which measures the importance
attached to job j. Given these quantities, the penalty penalty (x, j) incurred when
j is placed at the end of schedule x is defined by

    penalty (x, j) = (sum (list ct x) + ct j − dt j) × wt j.

The term sum (list ct x) gives the completion time of schedule x. If, when added
to ct j, this gives a time for completing j that is greater than the due time of
j, then a penalty is incurred, its size being proportional to the importance of j.
If the completion time is less than the due time, then the penalty is negative.
Negative penalties are bonuses, but bonuses are ignored in the definition of cost,
which measures only the maximum penalty incurred:

    cost = max leq · P([zero, penalty] · α°) · Λprefix,

where α = [nil, snoc]. We can also describe cost recursively by:

    cost [] = 0
    cost (x ++ [j]) = bmax (cost x, penalty (x, j)).


It follows that costs are never negative, and a schedule costing zero is one in which
all jobs are completed by their due time.

To illustrate the tardiness problem, consider the following three jobs:

           1    2    3
    ct     5   10   15
    dt    10   20   20
    wt     1    3    3

The best schedules are [2,3,1] and [3,2,1], each with a cost of 20; for example:

                2    3    1
    time       10   25   30
    dt         20   20   10
    penalty     0   15   20
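The example can be checked by brute force. The following Haskell sketch transcribes penalty and the recursive form of cost, hard-codes the three jobs of the table, and enumerates all schedules with the library function permutations; the name bestSchedules is ours, and the enumeration is only a specification-level check, not the greedy algorithm derived below.

```haskell
import Data.List (permutations, sort)

type Job = Int

-- Completion time, due time and weight of jobs 1, 2 and 3 (from the table).
ct, dt, wt :: Job -> Int
ct j = [5, 10, 15] !! (j - 1)
dt j = [10, 20, 20] !! (j - 1)
wt j = [1, 3, 3] !! (j - 1)

-- Penalty incurred when job j is placed at the end of schedule x.
penalty :: ([Job], Job) -> Int
penalty (x, j) = (sum (map ct x) + ct j - dt j) * wt j

-- Maximum penalty over the schedule; bonuses are ignored because of the 0.
cost :: [Job] -> Int
cost [] = 0
cost x = max (cost (init x)) (penalty (init x, last x))

-- All minimum-cost schedules of the given jobs, in sorted order.
bestSchedules :: [Job] -> [[Job]]
bestSchedules js = sort [s | s <- ss, cost s == m]
  where ss = permutations js
        m = minimum (map cost ss)
```

Evaluating bestSchedules [1, 2, 3] lists the optimal schedules, and cost can be applied to any individual schedule to reproduce a row of penalties like the one tabulated above.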

The definition of cost is given in terms of snoc-lists, although we can use either
snoc-lists or cons-lists to build schedules.

As we have seen in Chapter 7 the choice of what kind of list to use can be critical in
the success of a greedy approach. Suppose we did go for snoc-lists. Then the final
greedy algorithm, if it exists, will take the form

    schedule u = [],                  if emptybag u
               = schedule v ++ [j],   otherwise
                 where (v, j) = pick u

for some function pick. At each stage, therefore, we pick the job that is best
placed at the end of the schedule. Such algorithms are known as backward greedy
algorithms. If schedules are described by cons-lists, then the greedy algorithm would
involve picking a job that is best placed first in the schedule. In general, it does
not follow that if a greedy algorithm exists for snoc-lists, then a similar algorithm
exists for cons-lists.

However, armed with foresight, we will use snoc-lists in building schedules. As a

function on snoc-lists, bagify is defined by the catamorphism

bagify =
([nil, snag]),
where nil returns the empty bag, and snag (a contraction of snoc and bag, somewhat
more attractive than bsnoc) takes a pair (u, j) and places j in the bag u, thereby
'snagging' it.

There is another strategic decision that should be mentioned at this point. In the
final algorithm the input will be presented as a list rather than a bag. That means
we are, in effect, seeking a permutation of the input that minimises cost, so we
could have started out with the specification

    schedule ⊆ min R · Λperm.

With this specification another avenue of attack is opened up. The relation perm
can be defined as a snoc-list catamorphism (see Section 5.6):

    perm = ([nil, add]),

where add (x, j) = y ++ [j] ++ z for some decomposition x = y ++ z.
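The relational definition of perm can be modelled in Haskell by representing a relation as a function returning the list of all its results. This is a sketch under that assumption: add enumerates every decomposition x = y ++ z, and perm folds add over the input from left to right, as a snoc-list catamorphism would.

```haskell
import Data.List (inits, permutations, sort, tails)

-- add (x, j): every list obtained by inserting j somewhere into x.
add :: ([a], a) -> [[a]]
add (x, j) = [y ++ [j] ++ z | (y, z) <- zip (inits x) (tails x)]

-- perm as a fold: start from the empty list and add the elements one by one,
-- keeping all possible outcomes at each step.
perm :: [a] -> [[a]]
perm = foldl (\ps j -> concatMap (\p -> add (p, j)) ps) [[]]

-- Sanity check against the library's permutations.
agreesWithLibrary :: Ord a => [a] -> Bool
agreesWithLibrary x = sort (perm x) == sort (permutations x)
```

For a list of n elements perm returns n! results, one per way of building the list by successive insertions.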

In this form, the minimum tardiness problem might be solvable by the greedy
method of Chapter 7. However, no greedy algorithm based on catamorphisms
exists—or at least no simple one, which is why the problem appears in this chapter
and not earlier. To appreciate why, recall that a greedy method based on snoc-
list catamorphisms solves not only the problem associated with the given list, but
also the problems associated with all its prefixes. Dually, one based on cons-list
catamorphisms solves all suffixes of the input.

Now, consider again the three example jobs described above. With the input
presented as [1,2,3], the best schedule for prefix [1,2] is [1,2] itself, incurring zero cost.

However, this schedule cannot be extended to either [2,3,1] or [3,2,1], the two best
solutions for the three jobs. Dually, with a cons-list catamorphism, suppose the
input is presented as [3,2,1]; again a best schedule for [2,1] is [1,2], but [1,2] cannot
be extended to either [2,3,1] or
[3,2,1].

Derivation

Although nil and snag have disjoint ranges, a development along the lines of the
detab-entab problem does not work here. For this problem we need to bring context
into both the monotonicity and greedy conditions. As a result, the proof of the
greedy condition is a little tricky.

With α = [nil, snoc], β = [nil, snag] and FX = 1 + (X × Job), the monotonicity
and greedy conditions read:

    α · F(R ∩ (bagify° · bagify)) ⊆ R · α                          (10.2)
    α · Fbagify° · (Q° ∩ (β° · β)) ⊆ R° · α · Fbagify°.            (10.3)

To prove (10.2) we need the fact that cost can be expressed in the form

    cost [] = 0
    cost (x ++ [j]) = bmax (cost x, penalty (perm x, j)).
This is identical to the earlier recursive characterisation of cost, except for the term
perm x. It holds because penalty (x, j) depends only on the jobs in the schedule x,
not on their order. The reason is that penalty (x, j) is defined in terms of the sum
of the completion times of jobs in x, and sum applied to a list returns the same
result as sum applied to the underlying bag.

Using perm = bagify° · bagify, the new expression for cost can be put in the form
for which Proposition 9.3 applies:

    cost · α = k · F⟨cost, bagify⟩
    k = [zero, bmax · (id × (penalty · (bagify° × id))) · assocr].

It is easy to check that

    k · F(leq × id) ⊆ leq · k

so (10.2) follows on appeal to Proposition 9.3.

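For the greedy condition (10.3) we will need the fact that the original definition of
cost can be rewritten in the form

\[
\begin{aligned}
\mathit{cost} \cdot \alpha &= \mathit{bmax} \cdot \langle g, h \rangle && (10.4) \\
g &= [\mathit{zero}, \mathit{penalty}] && (10.5) \\
h &= [\mathit{zero}, \mathit{cost} \cdot \mathit{outl}]. && (10.6)
\end{aligned}
\]

We will also need two additional facts. Firstly, the cost of a schedule can only
increase when more jobs are added to it. In symbols,

\[ \mathit{add} \subseteq R^{\circ} \cdot \mathit{outl}, \qquad (10.7) \]

where add is the relation for which $\mathit{perm} = (\![\mathit{nil}, \mathit{add}]\!)$. A formal proof of (10.7) is
left as Exercise 10.7.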

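The second fact is that $\mathit{bagify}^{\circ}$ is a catamorphism on bags, that is,

\[ \mathit{bagify}^{\circ} \cdot \beta = [\mathit{nil}, \mathit{add}] \cdot \mathsf{F}\mathit{bagify}^{\circ}. \qquad (10.8) \]

The proof is left as Exercise 10.8. Putting (10.7) and (10.8) together, we obtain

\[
\begin{array}{cl}
 & \mathit{bagify}^{\circ} \cdot \beta \\
= & \{(10.8)\} \\
 & [\mathit{nil}, \mathit{add}] \cdot \mathsf{F}\mathit{bagify}^{\circ} \\
\subseteq & \{(10.7)\} \\
 & [\mathit{nil}, R^{\circ} \cdot \mathit{outl}] \cdot \mathsf{F}\mathit{bagify}^{\circ} \\
\subseteq & \{\text{definition of } R \text{ and } \mathit{nil} \subseteq \mathit{cost}^{\circ} \cdot \mathit{geq} \cdot \mathit{zero}\} \\
 & \mathit{cost}^{\circ} \cdot \mathit{geq} \cdot [\mathit{zero}, \mathit{cost} \cdot \mathit{outl}] \cdot \mathsf{F}\mathit{bagify}^{\circ} \\
= & \{\text{definition of } h\} \\
 & \mathit{cost}^{\circ} \cdot \mathit{geq} \cdot h \cdot \mathsf{F}\mathit{bagify}^{\circ}.
\end{array}
\]
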

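Now for the proof of (10.3). We start by reasoning:

\[
\begin{array}{cl}
 & \alpha \cdot \mathsf{F}\mathit{bagify}^{\circ} \cdot (Q^{\circ} \cap (\beta^{\circ} \cdot \beta)) \\
\subseteq & \{\text{monotonicity of composition}\} \\
 & (\alpha \cdot \mathsf{F}\mathit{bagify}^{\circ} \cdot Q^{\circ}) \cap (\alpha \cdot \mathsf{F}\mathit{bagify}^{\circ} \cdot \beta^{\circ} \cdot \beta) \\
= & \{\text{catamorphisms, since } \mathit{bagify} = (\![\beta]\!)\} \\
 & (\alpha \cdot \mathsf{F}\mathit{bagify}^{\circ} \cdot Q^{\circ}) \cap (\mathit{bagify}^{\circ} \cdot \beta) \\
\subseteq & \{\text{calculation above}\} \\
 & (\alpha \cdot \mathsf{F}\mathit{bagify}^{\circ} \cdot Q^{\circ}) \cap (\mathit{cost}^{\circ} \cdot \mathit{geq} \cdot h \cdot \mathsf{F}\mathit{bagify}^{\circ}) \\
\subseteq & \{\text{modular law}\} \\
 & \alpha \cdot ((\mathsf{F}\mathit{bagify}^{\circ} \cdot Q^{\circ} \cdot \mathsf{F}\mathit{bagify}) \cap (\alpha^{\circ} \cdot \mathit{cost}^{\circ} \cdot \mathit{geq} \cdot h)) \cdot \mathsf{F}\mathit{bagify}^{\circ} \\
= & \{\text{choose } Q \text{ to satisfy } \mathsf{F}\mathit{bagify}^{\circ} \cdot Q^{\circ} \cdot \mathsf{F}\mathit{bagify} = g^{\circ} \cdot \mathit{geq} \cdot g\} \\
 & \alpha \cdot ((g^{\circ} \cdot \mathit{geq} \cdot g) \cap (\alpha^{\circ} \cdot \mathit{cost}^{\circ} \cdot \mathit{geq} \cdot h)) \cdot \mathsf{F}\mathit{bagify}^{\circ} \\
= & \{\text{products}\} \\
 & \alpha \cdot \langle g, \mathit{cost} \cdot \alpha \rangle^{\circ} \cdot \langle \mathit{geq} \cdot g, \mathit{geq} \cdot h \rangle \cdot \mathsf{F}\mathit{bagify}^{\circ}.
\end{array}
\]

The choice $Q = f^{\circ} \cdot \mathit{leq} \cdot f$, where $f = [\mathit{zero}, \mathit{penalty} \cdot (\mathit{bagify}^{\circ} \times \mathit{id})]$, satisfies the
required specification. In words, a minimum under Q identifies a job with the least
penalty.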

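To complete the proof it is sufficient to show

\[ \alpha \cdot \langle g, \mathit{cost} \cdot \alpha \rangle^{\circ} \cdot \langle \mathit{geq} \cdot g, \mathit{geq} \cdot h \rangle \subseteq R^{\circ} \cdot \alpha. \]

Shunting $\mathit{cost}^{\circ}$ to the left-hand side, we reason:

\[
\begin{array}{cl}
 & \mathit{cost} \cdot \alpha \cdot \langle g, \mathit{cost} \cdot \alpha \rangle^{\circ} \cdot \langle \mathit{geq} \cdot g, \mathit{geq} \cdot h \rangle \\
\subseteq & \{\text{since } \mathit{cost} \cdot \alpha = \mathit{bmax} \cdot \langle g, \mathit{cost} \cdot \alpha \rangle \text{ and } \langle g, \mathit{cost} \cdot \alpha \rangle \text{ is simple}\} \\
 & \mathit{bmax} \cdot \langle \mathit{geq} \cdot g, \mathit{geq} \cdot h \rangle \\
\subseteq & \{\text{monotonicity of } \mathit{bmax}\} \\
 & \mathit{geq} \cdot \mathit{bmax} \cdot \langle g, h \rangle \\
= & \{(10.4)\} \\
 & \mathit{geq} \cdot \mathit{cost} \cdot \alpha.
\end{array}
\]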

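The greedy condition (10.3) is now established, so we can solve our problem by
computing the least fixed point of

\[ X = [\mathit{nil}, \mathit{snoc}] \cdot (\mathit{id} + (X \times \mathit{id})) \cdot \mathit{min}\,Q \cdot \Lambda[\mathit{nil}, \mathit{snag}]^{\circ}. \]

Appeal to Proposition 10.1 gives

\[ X = (\mathit{null} \to \mathit{nil},\ \mathit{snoc} \cdot (X \times \mathit{id}) \cdot \mathit{min}\,Q' \cdot \Lambda\mathit{snag}^{\circ}), \]

where $Q' = f^{\circ} \cdot \mathit{leq} \cdot f$ and $f = \mathit{penalty} \cdot (\mathit{bagify}^{\circ} \times \mathit{id})$.

Refining $\mathit{min}\,Q' \cdot \Lambda\mathit{snag}^{\circ}$ to a partial function pick, we obtain

\[ \mathit{schedule} = (\mathit{null} \to \mathit{nil},\ \mathit{snoc} \cdot (\mathit{schedule} \times \mathit{id}) \cdot \mathit{pick}). \]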

The program

In the Gofer program we represent bags by lists and represent a list x by the pair
(x, sum (list ct x)). The function pick is implemented by choosing the first job in
the list with minimum penalty.

> schedule = schedule' . pair (id, sum . list ct)

The second component t is the total completion time of the jobs still in x.

> schedule' (x,t) = [], if null x
>                 = schedule' (x',t') ++ [j], otherwise
>                   where x' = delete j x
>                         t' = t - ct j
>                         j  = pick (x,t)

> pick (x,t) = outl (minlist r [(j, (t - dt j) * wt j) | j <- x])
>              where r = leq . cross (outr, outr)

> delete j []    = []
> delete j (k:x) = x, if j == k
>                = k : delete j x, otherwise

The running time of this program is quadratic in the number of jobs.
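
For readers without a Gofer system, the same greedy scheme can be sketched in Haskell. This is a transliteration under stated assumptions, not the book's program: the `Job` record with fields `ct`, `dt` and `wt` is a hypothetical representation, and `minimumBy` from `Data.List` stands in for `minlist r`.

```haskell
import Data.List (delete, minimumBy)
import Data.Ord (comparing)

-- Hypothetical job record: completion time, due time and weight.
data Job = Job { ct :: Int, dt :: Int, wt :: Int } deriving (Eq, Show)

-- The bag of jobs is represented by a list, paired with the total
-- completion time t of the jobs it contains.
schedule :: [Job] -> [Job]
schedule x = schedule' (x, sum (map ct x))

-- Pick the job that is cheapest to run last, then schedule the
-- remaining jobs in front of it.
schedule' :: ([Job], Int) -> [Job]
schedule' ([], _) = []
schedule' (x, t)  = schedule' (delete j x, t - ct j) ++ [j]
  where j = pick (x, t)

-- A job with minimum penalty when it finishes at time t.
pick :: ([Job], Int) -> Job
pick (x, t) = minimumBy (comparing (\j -> (t - dt j) * wt j)) x
```

Each call to `pick` and `delete` traverses the remaining jobs once, so the sketch keeps the quadratic running time noted above.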

Exercises

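10.5 Prove that $k \cdot \mathsf{F}(\mathit{leq} \times \mathit{id}) \subseteq \mathit{leq} \cdot k$, where

\[ k = [\mathit{zero},\ \mathit{bmax} \cdot (\mathit{id} \times (\mathit{penalty} \cdot (\mathit{bagify}^{\circ} \times \mathit{id}))) \cdot \mathit{assocr}]. \]

10.6 Prove that $\mathit{cost} \cdot \alpha = \mathit{bmax} \cdot \langle g, h \rangle$.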

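10.7 To show that $\mathit{add} \subseteq R^{\circ} \cdot \mathit{outl}$ we can use a recursive characterisation of add:

\[ \mathit{add} = (\mu X : \mathit{snoc} \cup (\mathit{snoc} \cdot (X \times \mathit{id}) \cdot \mathit{exch} \cdot (\mathit{snoc}^{\circ} \times \mathit{id}))), \]

where $\mathit{exch} : (A \times C) \times B \leftarrow (A \times B) \times C$.

Prove that $\mathit{add} \subseteq R^{\circ} \cdot \mathit{outl}$ using fixed-point induction (see Exercise 6.4) and the
fact that

\[ \mathit{penalty} \cdot (\mathit{add} \times \mathit{id}) \subseteq \mathit{geq} \cdot \mathit{penalty} \cdot (\mathit{outl} \times \mathit{id}). \]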

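10.8 Using the fact that $\mathit{perm} = \mathit{bagify}^{\circ} \cdot \mathit{bagify} = (\![\mathit{nil}, \mathit{add}]\!)$, prove that

\[ \mathit{bagify}^{\circ} \cdot \beta = [\mathit{nil}, \mathit{add}] \cdot \mathsf{F}\mathit{bagify}^{\circ}. \]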

10.9 Assuming all weights are the same, give an $O(n \log n)$ algorithm for computing
the complete schedule.

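10.10 The minimum lateness problem is similar to the minimum tardiness problem,
except that the cost function is defined by

\[
\begin{aligned}
\mathit{cost}\,[\,] &= -\infty \\
\mathit{cost}\,(x \mathbin{+\!\!+} [j]) &= \mathit{bmax}\,(\mathit{penalty}\,(x, j),\ \mathit{cost}\,x).
\end{aligned}
\]

It follows that costs can be negative. How does this change affect the development?

10.11 Does the problem in which cost is defined by

\[
\begin{aligned}
\mathit{cost}\,[\,] &= 0 \\
\mathit{cost}\,(x \mathbin{+\!\!+} [j]) &= \mathit{plus}\,(\mathit{penalty}\,(x, j),\ \mathit{cost}\,x),
\end{aligned}
\]

have a greedy solution?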

10.4 The TeX problem - part two

As a final example, let us solve the second of the TeX problems described in
Chapter 3. Recall that the task is to convert between decimal fractions and integer
multiples of $2^{-16}$. The function extern has type $\mathit{Decimal} \leftarrow [0, 2^{16})$ and is
specified by the property that extern n should be some shortest decimal whose internal
representation is n:

\[
\begin{aligned}
\mathit{extern} &\subseteq \mathit{min}\,R \cdot \Lambda\mathit{intern}^{\circ} \\
R &= \mathit{length}^{\circ} \cdot \mathit{leq} \cdot \mathit{length}.
\end{aligned}
\]

The function intern is defined by the equations

\[
\begin{aligned}
\mathit{intern} &= \mathit{round} \cdot \mathit{val} \\
\mathit{round}\,r &= \lfloor 2^{16} r + 1/2 \rfloor \\
\mathit{val} &= (\![\mathit{zero}, \mathit{shift}]\!) \\
\mathit{shift}\,(d, r) &= (d + r)/10,
\end{aligned}
\]

in which val is a catamorphism on cons-lists. In Chapter 3 we showed how to
compute intern using integer arithmetic only; this restriction also has to be maintained
in the computation of extern.

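The first job is to cast the problem of computing extern into the standard mould.
Installing the definition of intern, we obtain

\[ \mathit{extern} \subseteq \mathit{min}\,R \cdot \Lambda(\mathit{val}^{\circ} \cdot \mathit{round}^{\circ}). \]

Since $\mathit{round}^{\circ}$ is not a function we cannot simply take it out of the $\Lambda$ expression.
Instead, we use the fact that

\[ n = \lfloor 2^{16} r + 1/2 \rfloor \;\equiv\; 2n - 1 \le 2^{17} r < 2n + 1 \]

to express $\mathit{round}^{\circ}$ in the form

\[ \mathit{round}^{\circ} = \mathit{inrange} \cdot \mathit{interval}, \]

where

\[
\begin{aligned}
\mathit{interval}\,n &= ((2n - 1)/2^{17},\ (2n + 1)/2^{17}) \\
r\ \mathit{inrange}\ (a, b) &= (a \le r < b).
\end{aligned}
\]

Since interval is a function, we can rewrite the specification of extern to read

\[ \mathit{extern} \subseteq \mathit{min}\,R \cdot \Lambda(\mathit{val}^{\circ} \cdot \mathit{inrange}) \cdot \mathit{interval}. \]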
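
The relationship between round, interval and inrange can be spot-checked with exact rational arithmetic. The Haskell sketch below is only an illustration: `roundT` is a hypothetical name chosen to avoid the Prelude's `round`, the constants 65536 and 131072 spell out the powers of two, and the lower bound of the interval is taken to be inclusive, as the floor in the definition of round requires.

```haskell
-- round r = floor (2^16 * r + 1/2), under a name that does not clash
-- with the Prelude's round.
roundT :: Rational -> Integer
roundT r = floor (65536 * r + 1 / 2)

-- interval n = ((2n - 1) / 2^17, (2n + 1) / 2^17)
interval :: Integer -> (Rational, Rational)
interval n = (fromInteger (2 * n - 1) / 131072, fromInteger (2 * n + 1) / 131072)

-- r inrange (a, b) holds when a <= r < b.
inrange :: Rational -> (Rational, Rational) -> Bool
inrange r (a, b) = a <= r && r < b

-- Every r lies in the interval belonging to its own rounding.
consistent :: Rational -> Bool
consistent r = r `inrange` interval (roundT r)
```

The lower end point of `interval n` rounds to `n`, while the upper end point rounds to `n + 1`, which is why the interval is closed on the left and open on the right.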

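Finally, we appeal to fusion to show that $\mathit{inrange}^{\circ} \cdot \mathit{val}$ can be expressed as a
catamorphism on cons-lists:

\[ \mathit{inrange}^{\circ} \cdot \mathit{val} = (\![\mathit{arb}, \mathit{step}]\!). \]

The conditions to be satisfied are

\[
\begin{aligned}
\mathit{inrange}^{\circ} \cdot \mathit{zero} &= \mathit{arb} \\
\mathit{inrange}^{\circ} \cdot \mathit{shift} &= \mathit{step} \cdot (\mathit{id} \times \mathit{inrange}^{\circ}).
\end{aligned}
\]
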
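The first condition determines arb and to determine step we argue:

\[
\begin{array}{cl}
 & (a, b)\ (\mathit{inrange}^{\circ} \cdot \mathit{shift})\ (d, r) \\
\equiv & \{\text{definition of } \mathit{inrange} \text{ and } \mathit{shift}\} \\
 & a \le (d + r)/10 < b \\
\equiv & \{\text{arithmetic and definition of } \mathit{inrange}\} \\
 & (10a - d,\ 10b - d)\ \mathit{inrange}^{\circ}\ r \\
\equiv & \{\text{arithmetic}\} \\
 & (\exists a', b' : a = (d + a')/10 \land b = (d + b')/10 : (a', b')\ \mathit{inrange}^{\circ}\ r) \\
\equiv & \{\text{introducing } \mathit{step}\,(d, (a', b')) = ((d + a')/10,\ (d + b')/10)\} \\
 & (a, b)\ (\mathit{step} \cdot (\mathit{id} \times \mathit{inrange}^{\circ}))\ (d, r).
\end{array}
\]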

Summarising, we now want to determine a function extern satisfying

\[ \mathit{extern} \subseteq \mathit{min}\,R \cdot \Lambda(\![\mathit{arb}, \mathit{step}]\!)^{\circ} \cdot \mathit{interval}. \]

So far we haven't considered the restriction on the problem, namely, that the
argument to extern is an integer n in the range $0 \le n < 2^{16}$. For n in this range we
have interval n = (a, b), where a and b have the property that

\[ 0 < b < 1 \ \text{ and } \ a < b. \qquad (10.9) \]

The important point is that if a' and b' satisfy (10.9), then so do a and b, where
(a, b) = step (d, (a', b')) and d is a digit. Furthermore, we can always restrict arb
so that it returns an interval (a, b) satisfying (10.9). Hence, defining Interval to be
the set of pairs (a, b) satisfying (10.9), we have

\[ [\mathit{arb}, \mathit{step}] : \mathit{Interval} \leftarrow 1 + (\mathit{Digit} \times \mathit{Interval}). \]

This type restriction is exploited in the derivation.

Derivation

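It is easy to check that $\alpha = [\mathit{nil}, \mathit{cons}]$ is monotonic under R, so this leaves the
greedy condition. From above, it is sufficient to find a Q over the type $\mathsf{F}\,\mathit{Interval}$,
where $\mathsf{F}A = 1 + (\mathit{Digit} \times A)$, satisfying

\[ Q \cdot \mathsf{F}h \cdot \alpha^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R, \]

where $h = (\![\mathit{arb}, \mathit{step}]\!)$.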
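For this problem a simple choice of Q suffices. To see why, consider the expression
$\Lambda[\mathit{arb}, \mathit{step}]^{\circ}$. Writing $*$ for the sole inhabitant of the terminal object, we have for
(a, b) of type Interval that

\[
\begin{aligned}
(\Lambda\mathit{arb}^{\circ})\,(a, b) &= (a \le 0 \to \{*\},\ \{\,\}) \\
(\Lambda\mathit{step}^{\circ})\,(a, b) &= \{(d, (10a - d,\ 10b - d)) \mid 0 < 10b - d < 1\}.
\end{aligned}
\]

But for digits $d_1$ and $d_2$, bearing (10.9) in mind,

\[ (0 < 10b - d_1 < 1) \land (0 < 10b - d_2 < 1) \Rightarrow d_1 = d_2. \]

Hence $\mathit{step}^{\circ} : (\mathit{Digit} \times \mathit{Interval}) \leftarrow \mathit{Interval}$ is, in fact, a function

\[ \mathit{step}^{\circ}\,(a, b) = (d, (10a - d,\ 10b - d)), \ \text{ where } d = \lfloor 10b \rfloor. \]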
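
That the converse of step is a function can be illustrated with exact rationals. The Haskell sketch below makes some assumptions of its own: intervals are pairs of `Rational` values, digits are plain `Integer`s, and `stepInv` is a hypothetical name for the converse of step on intervals satisfying (10.9).

```haskell
type Interval = (Rational, Rational)

-- step (d, (a', b')) = ((d + a') / 10, (d + b') / 10)
step :: (Integer, Interval) -> Interval
step (d, (a', b')) = ((fromInteger d + a') / 10, (fromInteger d + b') / 10)

-- The converse of step as a function: the digit is the integer part of 10b.
stepInv :: Interval -> (Integer, Interval)
stepInv (a, b) = (d, (10 * a - fromInteger d, 10 * b - fromInteger d))
  where d = floor (10 * b)
```

For instance, applying `stepInv` to `step (7, (-1/4, 1/2))` recovers the digit 7 together with the interval `(-1/4, 1/2)`.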

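It follows that

\[
(\Lambda[\mathit{arb}, \mathit{step}]^{\circ})\,(a, b) =
\begin{cases}
\{\mathit{inl}\,(*),\ \mathit{inr}\,(d, (10a - d,\ 10b - d))\}, & \text{if } a \le 0 \\
\{\mathit{inr}\,(d, (10a - d,\ 10b - d))\}, & \text{otherwise,}
\end{cases}
\]

and so Q need only choose between two alternatives. The appropriate definition of
Q is

\[ Q = (\mathit{inl} \cdot {!} \cdot \mathit{inr}^{\circ}) \cup \mathit{id}, \]

where $! : 1 \leftarrow (\mathit{Digit} \times \mathit{Interval})$. With this choice of Q the inhabitant of the terminal
object is preferred whenever possible.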

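To establish the greedy condition, we argue:

\[
\begin{array}{cl}
 & Q \cdot \mathsf{F}h \cdot \alpha^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R \\
\equiv & \{\text{definition of } Q\} \\
 & ((\mathit{inl} \cdot {!} \cdot \mathit{inr}^{\circ}) \cup \mathit{id}) \cdot \mathsf{F}h \cdot \alpha^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R \\
\equiv & \{\text{since } R \text{ is reflexive}\} \\
 & \mathit{inl} \cdot {!} \cdot \mathit{inr}^{\circ} \cdot \mathsf{F}h \cdot \alpha^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R \\
\equiv & \{\text{definition of } \mathsf{F}\} \\
 & \mathit{inl} \cdot {!} \cdot (\mathit{id} \times h) \cdot \mathit{inr}^{\circ} \cdot \alpha^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R \\
\equiv & \{\text{universal property of } ! \text{ and } \alpha \cdot \mathit{inr} = \mathit{cons}\} \\
 & \mathit{inl} \cdot {!} \cdot \mathit{cons}^{\circ} \subseteq \mathsf{F}h \cdot \alpha^{\circ} \cdot R \\
\equiv & \{\text{shunting}\} \\
 & \mathit{id} \subseteq {!}^{\circ} \cdot \mathit{inl}^{\circ} \cdot \mathsf{F}h \cdot \alpha^{\circ} \cdot R \cdot \mathit{cons} \\
\equiv & \{\text{definition of } \mathsf{F}\} \\
 & \mathit{id} \subseteq {!}^{\circ} \cdot \mathit{inl}^{\circ} \cdot \alpha^{\circ} \cdot R \cdot \mathit{cons} \\
\equiv & \{\text{since } \alpha \cdot \mathit{inl} = \mathit{nil}\} \\
 & \mathit{id} \subseteq {!}^{\circ} \cdot \mathit{nil}^{\circ} \cdot R \cdot \mathit{cons} \\
\equiv & \{\text{shunting}\} \\
 & \mathit{nil} \cdot {!} \subseteq R \cdot \mathit{cons} \\
\Leftarrow & \{\text{since } \mathit{length} \cdot \mathit{nil} \cdot {!} \subseteq \mathit{leq} \cdot \mathit{length} \cdot \mathit{cons}\} \\
 & \mathit{true}.
\end{array}
\]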

The greedy theorem is therefore applicable, so our problem is solved by computing
the least solution of the recursion equation

\[ X = \alpha \cdot \mathsf{F}X \cdot \mathit{min}\,Q \cdot \Lambda[\mathit{arb}, \mathit{step}]^{\circ}. \]

We know how to simplify $\mathit{min}\,Q \cdot \Lambda[\mathit{arb}, \mathit{step}]^{\circ}$ and the result is that

\[ \mathit{extern} = f \cdot \mathit{interval}, \]

where

\[
f\,(a, b) =
\begin{cases}
[\,], & \text{if } a \le 0 \\
[d] \mathbin{+\!\!+} f\,(10a - d,\ 10b - d), & \text{otherwise}
\end{cases}
\qquad \text{where } d = \lfloor 10b \rfloor.
\]

The final step is to introduce the restriction that the computation should be
performed using integer arithmetic only. This turns out to be easy: writing $w = 2^{17}$,
every interval (a, b) computed during the algorithm satisfies $a = p/w$ and $b = q/w$
for some integers p and q. Initially, we have $\mathit{interval}\,n = ((2n - 1)/w,\ (2n + 1)/w)$,
and if $a = p/w$, then $10a - d = (10p - wd)/w$, similarly for b. Representing
$(p/w, q/w)$ by $(p, q)$, we therefore obtain that $\mathit{extern}\,n = f\,(2n - 1,\ 2n + 1)$, where

\[
f\,(p, q) =
\begin{cases}
[\,], & \text{if } p \le 0 \\
[d] \mathbin{+\!\!+} f\,(10p - wd,\ 10q - wd), & \text{otherwise}
\end{cases}
\qquad \text{where } d = (10q)\ \mathrm{div}\ w.
\]

The program

Here is the final program written in Gofer:

> extern = f . interval

> f (p,q) = [], if p <= 0
>         = [d] ++ f (10*p - w*d, 10*q - w*d), otherwise
>           where d = (10*q) `div` w

> interval n = (2*n - 1, 2*n + 1)
> w = 131072
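
The program can be transliterated into Haskell and checked against the specification. In the sketch below, `extern` follows the Gofer text; `val` and `intern` are assumptions in the sense that they implement the defining equations of this section with exact `Rational` arithmetic, rather than the integer-only program for intern from Chapter 3.

```haskell
-- w = 2^17
w :: Integer
w = 131072

-- extern n: a shortest list of decimal digits whose internal
-- representation is n, for 0 <= n < 2^16.
extern :: Integer -> [Integer]
extern n = f (2 * n - 1, 2 * n + 1)
  where
    f (p, q)
      | p <= 0    = []
      | otherwise = d : f (10 * p - w * d, 10 * q - w * d)
      where d = (10 * q) `div` w

-- val = ([zero, shift]) with shift (d, r) = (d + r) / 10
val :: [Integer] -> Rational
val = foldr shift 0
  where shift d r = (fromInteger d + r) / 10

-- intern = round . val, with round r = floor (2^16 * r + 1/2)
intern :: [Integer] -> Integer
intern ds = floor (65536 * val ds + 1 / 2)
```

A direct check of the specification is that `intern (extern n) == n` for every `n` in `[0 .. 65535]`; this confirms that the result has the right internal representation, though not that it is a shortest such decimal.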

Exercises

10.12 Prove that $0 < 10b - d_1 < 1$ and $0 < 10b - d_2 < 1$ imply that $d_1 = d_2$.

10.13 The derivation of extern brought in integer arithmetic as a final step.
Using the derived program for intern, give a derivation of extern that uses integer
arithmetic from the outset.

10.14 Show that the only property of $w = 2^{17}$ assumed in the derivation is that
$10q/w$ should not be an integer for any q with $0 < q < w$. How can this restriction
be removed?

10.15 Actually, Knuth required a slightly more stringent condition on extern:
among equally short decimals, extern n should produce the one which is as close
as possible to $n/2^{16}$. Which decimal, precisely, does the given algorithm for extern
produce? What modification ensures that extern n returns the shortest and closest
decimal to $n/2^{16}$?

Bibliographical remarks

For general remarks about the literature on greedy algorithms, see Chapter 7. The
approach of this chapter is arguably more general, and closer to the view of greedy
algorithms in the literature. We originally published the idea that dynamic
programming and greedy algorithms are closely related in (Bird and De Moor 1993a).
A similar suggestion occurs in (Helman 1989b).

In this chapter we have only considered problems where the base functor F is linear:
no tree-like structures were introduced. For non-linear F, the recursion would be
more appropriately termed 'divide-and-conquer'. We have not investigated this in
detail, but we hope that some of the applications of divide-and-conquer studied by
Smith can be treated in this manner (Smith 1985, 1987).

Although the approach sketched here is applicable to a wide class of problems, it
still admits of further, meaningful generalisation. In (Curtis 1996), it is shown
how, by using a more general form of iteration, a wider class of algorithms can be
treated. Essentially, catamorphisms (and their converse) are replaced by a general
loop operator; this allows more flexibility in specifications and solutions.

Appendix

The following Gofer prelude file contains definitions of the standard functions
necessary for running all the programs in this book. As a prelude for general
functional programming it is incomplete.

Prelude for "Algebra of Programming"

Created 14 Sept, 1995, by Richard Bird

Operator precedence table:

infixr 9 .
infixl 7 *
infix 7 /, `mod`
infixl 6 +, -
infixr 5 ++, :
infix 4 ==, /=, <, <=, >=, >
infixr 3 &&
infixr 2 ||

Standard combinators:

(f . g) x = f (g x)
const k a = k
id a = a

outl (a,b) = a
outr (a,b) = b
swap (a,b) = (b,a)

assocl (a,(b,c)) = ((a,b),c)
assocr ((a,b),c) = (a,(b,c))

dupl (a,(b,c)) = ((a,b),(a,c))
dupr ((a,b),c) = ((a,c),(b,c))

pair (f,g) a = (f a, g a)
cross (f,g) (a,b) = (f a, g b)
cond p (f,g) a = if (p a) then (f a) else (g a)

curry f a b = f (a,b)
uncurry f (a,b) = f a b

Boolean functions:

false = const False
true = const True

False && x = False
True && x = x

False || x = x
True || x = True

not True = False
not False = True

otherwise = True

Relations:

leq = uncurry (<=)
less = uncurry (<)
eql = uncurry (==)
neq = uncurry (/=)
gtr = uncurry (>)
geq = uncurry (>=)

meet (r,s) = cond r (s, false)
join (r,s) = cond r (true, s)
wok r = r . swap

Numerical functions:

zero = const 0
succ = (+1)
pred = (-1)

plus = uncurry (+)
minus = uncurry (-)
times = uncurry (*)
divide = uncurry (/)

negative = (< 0)
positive = (> 0)

List-processing functions:

[] ++ y = y
(a:x) ++ y = a : (x++y)

null [] = True
null (a:x) = False

nil = const []
wrap = cons . pair (id, nil)
cons = uncurry (:)
cat = uncurry (++)
concat = catalist ([], cat)
snoc = cat . cross (id, wrap)

head (a:x) = a
tail (a:x) = x
split = pair (head, tail)

last = cata1list (id, outr)
init = cata1list (nil, cons)

inits = catalist ([[]], extend)
        where extend (a,xs) = [[]] ++ list (a:) xs

tails = catalist ([[]], extend)
        where extend (a,x:xs) = (a : x) : x : xs

splits = zip . pair (inits, tails)

cpp (x,y) = [(a,b) | a <- x, b <- y]
cpl (x,b) = [(a,b) | a <- x]
cpr (a,y) = [(a,b) | b <- y]
cplist = catalist ([[]], list cons . cpp)

minlist r = cata1list (id, bmin r)
bmin r = cond r (outl, outr)

maxlist r = cata1list (id, bmax r)
bmax r = cond (r . swap) (outl, outr)

thinlist r = catalist ([], bump r)
             where bump r (a,[]) = [a]
                   bump r (a,b:x) | r (a,b) = a:x
                                  | r (b,a) = b:x
                                  | otherwise = a:b:x

length = catalist (0, succ . outr)
sum = catalist (0, plus)
trans = cata1list (list wrap, list cons . zip)
list f = catalist ([], cons . cross (f, id))
filter p = catalist ([], cond (p . outl) (cons, outr))

catalist (c,f) [] = c
catalist (c,f) (a:x) = f (a, catalist (c,f) x)

cata1list (f,g) [a] = f a
cata1list (f,g) (a:x) = g (a, cata1list (f,g) x)

cata2list (f,g) [a,b] = f (a,b)
cata2list (f,g) (a:x) = g (a, cata2list (f,g) x)

loop f (a,[]) = a
loop f (a,b:x) = loop f (f (a,b), x)

merge r ([],y) = y
merge r (x,[]) = x
merge r (a:x,b:y) | r (a,b) = a : merge r (x,b:y)
                  | otherwise = b : merge r (a:x,y)

zip (x,[]) = []
zip ([],y) = []
zip (a:x,b:y) = (a,b) : zip (x,y)

unzip = pair (list outl, list outr)

Word and line processing functions:

words = filter (not.null) . catalist ([[]], cond ok (glue, new))
        where ok (a,xs) = (a /= ' ' && a /= '\n')
              glue (a,x:xs) = (a:x):xs
              new (a,xs) = []:xs

lines = catalist ([[]], cond ok (glue, new))
        where ok (a,xs) = (a /= '\n')
              glue (a,x:xs) = (a:x):xs
              new (a,xs) = []:xs

unwords = cata1list (id, join)
          where join (x,y) = x ++ " " ++ y

unlines = cata1list (id, join)
          where join (x,y) = x ++ "\n" ++ y

Essentials and built-in primitives:

primitive ord "primCharToInt" :: Char -> Int
primitive chr "primIntToChar" :: Int -> Char

primitive (==) "primGenericEq",
          (/=) "primGenericNe",
          (<=) "primGenericLe",
          (<) "primGenericLt",
          (>=) "primGenericGe",
          (>) "primGenericGt" :: a -> a -> Bool

primitive (+) "primPlusInt",
          (-) "primMinusInt",
          (/) "primDivInt",
          div "primDivInt",
          mod "primModInt",
          (*) "primMulInt" :: Int -> Int -> Int

primitive negate "primNegInt" :: Int -> Int
primitive primPrint "primPrint" :: Int -> a -> String -> String
primitive strict "primStrict" :: (a -> b) -> a -> b
primitive error "primError" :: String -> a

show :: a -> String
show x = primPrint 0 x []

flip f a b = f b a

End of Algebra of Programming prelude
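
To experiment with the prelude's catamorphism combinators in a present-day Haskell system, a few of the definitions can be transcribed as follows. This is a sketch rather than a port of the whole prelude: the type signatures and the explicit error case for the empty list are additions, and the Prelude's own `length` and `sum` are hidden so that the names match the listing above.

```haskell
import Prelude hiding (length, sum)

-- Catamorphism on cons-lists: c replaces [], f replaces (:).
catalist :: (b, (a, b) -> b) -> [a] -> b
catalist (c, _) []      = c
catalist (c, f) (a : x) = f (a, catalist (c, f) x)

-- Catamorphism on non-empty lists: f is applied to the last element.
cata1list :: (a -> b, (a, b) -> b) -> [a] -> b
cata1list (f, _) [a]     = f a
cata1list (f, g) (a : x) = g (a, cata1list (f, g) x)
cata1list _      []      = error "cata1list: empty list"

length :: [a] -> Int
length = catalist (0, succ . snd)

sum :: [Int] -> Int
sum = catalist (0, uncurry (+))

-- list f maps f over a list.
list :: (a -> b) -> [a] -> [b]
list f = catalist ([], \(a, x) -> f a : x)

-- bmin r (a, b) returns a when r (a, b) holds, and b otherwise.
bmin :: ((a, a) -> Bool) -> (a, a) -> a
bmin r (a, b) = if r (a, b) then a else b

-- A minimum element of a non-empty list under the relation r.
minlist :: ((a, a) -> Bool) -> [a] -> a
minlist r = cata1list (id, bmin r)
```

With `r = uncurry (<=)`, `minlist r` returns the earliest of several equal minima, which matches the way pick is described in Section 10.3 as choosing the first job with minimum penalty.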


Bibliography

Aarts, C. J., Backhouse, R. C., Hoogendijk, P. F., Voermans, E., and Van der
Woude, J. C. S. P. (1992). A relational theory of datatypes. Available from
URL http://www.win.tue.nl/win/cs/wp/papers/papers.html.

Ahrens, J. H. and Finke, G. (1975). Merging and sorting applied to the 0-1

knapsack problem. Operations Research, 23(6), 1099-1109.

Asperti, A. and Longo, G. (1991). Categories, Types, and Structures: An
Introduction to Category Theory for the Working Computer Scientist.
Foundations of Computing Series. MIT Press.

Augusteijn, A. (1992). An alternative derivation of a binary heap construction
function. In Bird, R. S., Morgan, C. C., and Woodcock, J. C. P., editors,
Mathematics of Program Construction, Volume 669 of Lecture Notes in
Computer Science, pages 368-374. Springer-Verlag.

Backhouse, R. C. and Hoogendijk, P. F. (1993). Elements of a relational theory of


datatypes. In Moller, B., Partsch, H., and Schuman, S., editors, Formal
Program Development, Volume 755 of Lecture Notes in Computer Science,
pages 7-42. Springer-Verlag.

Backhouse, R. C. and Van der Woude, J. C. S. P. (1993). Demonic operators and


monotype factors. Mathematical Structures in Computing Science, 3(4),
417-433.

Backhouse, R. C., De Bruin, P., Malcolm, G., Voermans, T. S., and Van der
Woude, J. C. S. P. (1991). Relational catamorphisms. In Moller, B., editor,
Constructing Programs from Specifications, pages 287-318. Elsevier Science
Publishers.

Backhouse, R. C., De Bruin, P., Hoogendijk, P. F., Malcolm, G., Voermans, T. S.,
and Van der Woude, J. C. S. P. (1992). Polynomial relators. In Nivat, M.,
Rattray, C. S., Rus, T., and Scollo, G., editors, Algebraic Methodology and
Software Technology, Workshops in Computing, pages 303-362.
Springer-Verlag.

Backus, J. (1978). Can programming be liberated from the Von Neumann style? a

functional style and its algebra of programs. Communications of the ACM,


21, 613-641.

Backus, J. (1981). The algebra of functional programs: function level reasoning,


linear equations and extended definitions. In Diaz, J. and Ramos, I., editors,
Formalization of Programming Concepts, Volume 107 of Lecture Notes in
Computer Science, pages 1-43. Springer-Verlag.

Backus, J. (1985). From function level semantics to program transformations and


optimization. In Ehrig, H., Floyd, C., Nivat, M., and Thatcher, J., editors,
Mathematical Foundations of Software Development, Vol. 1, Volume 185 of
Lecture Notes in Computer Science, pages 60-91. Springer-Verlag.

Barr, M. and Wells, C. (1985). Toposes, Triples and Theories, Volume 278 of
Grundlehren der Mathematischen Wissenschaften. Springer-Verlag.

Barr, M. and Wells, C. (1990). Category Theory for Computing Science.


International Series in Computer Science. Prentice Hall.

Bauer, F. L., Berghammer, R., Broy, M., Dosch, W., Geiselbrechtinger, F., Gnatz,
R., Hangel, E., Hesse, W., Krieg-Brückner, B., Laut, A., Matzner, T., Moller,
B., Nickl, F., Partsch, H., Pepper, P., Samelson, K., Wirsing, M., and
Wossner, H. (1985). The Munich Project CIP. Volume I: The Wide Spectrum
Language CIP-L, Volume 183 of Lecture Notes in Computer Science.
Springer-Verlag.

Bauer, F. L., Ehler, H., Horsch, A., Moller, B., Partsch, H., Paukner, O., and
Pepper, P. (1987). The Munich Project CIP. Volume II: The Program
Transformation System CIP-S, Volume 292 of Lecture Notes in Computer
Science. Springer-Verlag.

Bellman, R. E. and Dreyfus, S. E. (1962). Applied Dynamic Programming.


Princeton University Press.

Bellman, R. E. (1957). Dynamic Programming. Princeton University Press.

Berghammer, R. and Von Karger, B. (1995). Formal derivation of CSP programs


from temporal specifications. In Mathematics of Program Construction,
Volume 947 of Lecture Notes in Computer Science, pages 180-196.
Springer-Verlag.

Berghammer, R. and Zierer, H. (1986). Relational algebraic semantics of


deterministic and non-deterministic programs. Theoretical Computer Science,
43(2-3), 123-147.
Berghammer, R., Kempf, P., Schmidt, G., and Strohlein, T. (1991). Relation
algebra and logic of programs. In Andreka, H. and Monk, J. D., editors,
Algebraic Logic, Volume 54 of Colloquia Mathematica Societatis Janos
Bolyai, pages 37-58. North-Holland.

Bird, R. S. and De Moor, O. (1993a). From dynamic programming to greedy


algorithms. In Moller, B., Partsch, H., and Schuman, S., editors, Formal
Program Development, Volume 755 of Lecture Notes in Computer Science,
pages 43-61. Springer-Verlag.

Bird, R. S. and De Moor, O. (1993b). List partitions. Formal Aspects of


Computing, 5(1), 61-78.

Bird, R. S. and De Moor, O. (1993c). Solving optimisation problems with


catamorphisms. In Bird, R. S., Morgan, C. C., and Woodcock, J. C. P.,
editors, Mathematics of Program Construction, Volume 669 of Lecture Notes
in Computer Science, pages 45-66. Springer-Verlag.

Bird, R. S. and De Moor, O. (1994). Relational program derivation and


context-free language recognition. In Roscoe, A. W., editor, A Classical
Mind: Essays dedicated to C.A.R. Hoare, pages 17-35. Prentice Hall.

Bird, R. S. and Meertens, L. (1987). Two exercises found in a book on


algorithmics. In Meertens, L., editor, Program Specification and
Transformation, pages 451-458. North-Holland.

Bird, R. S. and Wadler, P. (1988). Introduction to Functional Programming.


International Series in Computer Science. Prentice Hall.

Bird, R. S., Gibbons, J., and Jones, G. (1989). Formal derivation of a pattern
matching algorithm. Science of Computer Programming, 12(2), 93-104.

Bird, R. S., Hoogendijk, P. F., and De Moor, O. (1996). Generic programming


with relations and functors. Journal of Functional Programming, 6(1), 1-28.

Bird, R. S. (1980). Tabulation techniques for recursive programs. Computing


Surveys, 12(4), 403-417.

Bird, R. S. (1984). The promotion and accumulation strategies in functional


programming. ACM Transactions on Programming Languages and Systems,
6(4), 487-504.

Bird, R. S. (1986). Transformational programming and the paragraph problem.


Science of Computer Programming, 6(2), 159-189.

Bird, R. S. (1987). An introduction to the theory of lists. In Broy, M., editor,


Logic of Programming and Calculi of Discrete Design, Volume 36 of NATO
ASI Series F, pages 3-42. Springer-Verlag.

Bird, R. S. (1989a). Algebraic identities for program calculation. Computer


Journal, 32(2), 122-126.

Bird, R. S. (1989b). Lectures on constructive functional programming. In Broy,


M., editor, Constructive Methods in Computing Science, Volume 55 of NATO
ASI Series F, pages 151-216. Springer-Verlag.

Bird, R. S. (1990). A calculus of functions for program derivation. In Turner,


D. A., editor, Research Topics in Functional Programming, University of
Texas at Austin Year of Programming Series, pages 287-308. Addison-Wesley.

Bird, R. S. (1991). Knuth's problem. In Moller, B., editor, Constructing Programs


from Specifications, pages 1-8. Elsevier Science Publishers.

Bird, R. S. (1992a). The smallest upravel. Science of Computer Programming,


18(3), 281-292.

Bird, R. S. (1992b). Two greedy algorithms. Journal of Functional Programming,


2(2), 237-244.

Bird, R. S. (1992c). Unravelling greedy algorithms. Journal of Functional


Programming, 2(3), 375-385.

Bleeker, A. M. (1994). The calculus of minimals. M.Sc. thesis INF/SCR-1994-01,


Department of Computer Science, Utrecht University, The Netherlands.
Available from URL
http://www.cwi.nl/~annette/Papers/calculus.minimals.ps.

Boiten, E. A. (1992). Improving recursive functions by inverting the order of


evaluation. Science of Computer Programming, 18(2), 139-179.

Bonzon, P. (1970). Necessary and sufficient conditions for dynamic programming


of combinatorial type. Journal of the ACM, 17(4), 675-682.

Brink, C. and Schmidt, G., editors. (1996). Relational Methods in Computer


Science. Springer-Verlag. Supplemental Volume of the Journal Computing, to

appear.

Brinkmann, H. B. (1969). Relations for exact categories. Journal of Algebra, 13,


465-480.

Brook, T. (1977). Order and Recursion in Topoi, Volume 9 of Notes on Pure


Mathematics. Department of Mathematics, Australian National University,
Canberra.

Broome, P. and Lipton, J. (1994). Combinatory logic programming: computing in


relation calculi. In Bruynooghe, M., editor, Logic Programming. MIT Press.

Brown, C. and Hutton, G. (1994). Categories, allegories and circuit design. In


Logic in Computer Science, pages 372-381. IEEE Computer Society Press.

Burstall, R. M. and Darlington, J. (1977). A transformation system for developing
recursive programs. Journal of the ACM, 24(1), 44-67.

Burstall, R. M. and Landin, P. J. (1969). Programs and their proofs: an algebraic


approach. In Machine Intelligence, Volume 4, pages 17-43. American Elsevier.

Carboni, A. and Street, R. (1986). Order ideals in categories. Pacific Journal of


Mathematics, 124(2), 275-288.

Carboni, A. and Walters, R. F. C. (1987). Cartesian bicategories I. Journal of


Pure and Applied Algebra, 49(1-2), 11-32.

Carboni, A., Kasangian, S., and Street, R. (1984). Bicategories of spans and
relations. Journal of Pure and Applied Algebra, 33(3), 259-267.

Carboni, A., Kelly, G. M., and Wood, R. J. (1991). A 2-categorical approach to

geometric morphisms I. Cahiers de Topologie et Geometrie Differentielle


Categoriques, 32(1), 47-95.

Carboni, A., Lack, S., and Walters, R. F. C. (1993). Introduction to extensive and
distributive categories. Journal of Pure and Applied Algebra, 84(2), 145-158.

Chen, W. and Udding, J. T. (1990). Program inversion: more than fun!. Science
of Computer Programming, 15(1), 1-13.

Clark, K. L. and Darlington, J. (1980). Algorithm classification through synthesis.


Computer Journal, 23(1), 61-65.

Cockett, J. R. B. and Fukushima, T. (1991). About Charity. Technical Report


92/480/18, Department of Computer Science, University of Calgary, Canada.
Available from URL
http://www.cpsc.ucalgary.ca/projects/charity/home.html.

Cockett, J. R. B. and Spencer, D. (1992). Strong categorical datatypes I. In Seely,


R. A. G., editor, Category Theory 1991, Volume 13 of CMS Conference
Proceedings, pages 141-169. Canadian Mathematical Society.

Cockett, J. R. B. (1990). List-arithmetic distributive categories: locoi. Journal of


Pure and Applied Algebra, 66(1), 1-29.

Cockett, J. R. B. (1991). Conditional control is not quite categorical control. In


Birtwistle, G., editor, Higher-order Workshop, Workshops in Computing,
pages 190-217. Springer-Verlag.

Cockett, J. R. B. (1993). Introduction to distributive categories. Mathematical


Structures in Computer Science, 3(3), 277-307.

Cohen, N. H. (1979). Characterization and elimination of redundancy in recursive


programs. In Principles of Programming Languages, pages 143-157.
Association for Computing Machinery.

Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1990). Introduction to


Algorithms. The MIT electrical engineering and computer science series. MIT
Press.

Crochemore, M. (1986). Transducers and repetitions. Theoretical Computer


Science, 45(1), 63-86.

Curtis, S. and Lowe, G. (1995). A graphical calculus. In Moller, B., editor,
Mathematics of Program Construction, Volume 947 of Lecture Notes in
Computer Science, pages 214-231. Springer-Verlag.

Curtis, S. (1996). A relational approach to optimization problems. D.Phil. thesis,


Computing Laboratory, Oxford, UK. Available from URL
http://www.comlab.ox.ac.uk/oucl/users/sharon.curtis/
publications.html.

Darlington, J. (1978). A synthesis of several sorting algorithms. Acta Informatica,


11(1), 1-30.
Davey, B. A. and Priestley, H. A. (1990). Introduction to Lattices and Order.
Cambridge Mathematical Textbooks. Cambridge University Press.

Davie, A. J. T. (1992). Introduction to Functional Programming Systems using


Haskell, Volume 27 of Computer Science Texts. Cambridge University Press.
De Bakker, J. W. and De Roever, W. P. (1973). A calculus for recursive program
schemes. In Nivat, M., editor, Automata, Languages and Programming, pages
167-196. North-Holland.

De Moor, O. (1992a). Categories, relations and dynamic programming. D.Phil.
thesis. Technical Monograph PRG-98, Computing Laboratory, Oxford, UK.

De Moor, O. (1992b). Inductive data types for predicate transformers.


Information Processing Letters, 43(3), 113-118.

De Moor, O. (1994). Categories, relations and dynamic programming.


Mathematical Structures in Computing Science, 4, 33-69.

De Moor, O. (1995). A generic program for sequential decision processes. In


Hermenegildo, M. and Swierstra, D. S., editors, Programming Languages:
Implementations, Logics, and Programs, Volume 982 of Lecture Notes in
Computer Science, pages 1-23. Springer-Verlag.

De Morgan, A. (1860). On the syllogism, no. IV, and on the logic of relations.
Transactions of the Cambridge Philosophical Society, 10, 331-358. Reprinted
in: (De Morgan 1966).

De Morgan, A. (1966). "On the syllogism" and other logical writings. Yale
University Press.

De Roever, W. P. (1972). A formalization of various parameter mechanisms as


products of relations within a calculus of recursive program schemes. In
Theorie des Algorithmes, des Langages et de la Programmation, pages 55-88.
Seminaires IRIA.

De Roever, W. P. (1976). Recursive program schemes: semantics and proof


theory. Mathematical Centre Tracts 70, Mathematisch Centrum, Amsterdam,
The Netherlands.

Denardo, E. V. (1982). Dynamic Programming - Models and Applications.
Prentice Hall.

Desharnais, J., Mili, A., and Mili, F. (1993). On the mathematics of sequential
decompositions. Science of Computer Programming, 20(3), 253-289.

Dijkstra, E. W. and Scholten, C. S. (1990). Predicate Calculus and Program


Semantics. Texts and Monographs in Computer Science. Springer-Verlag.

Dijkstra, E. W. (1976). A Discipline of Programming. Series in Automatic


Computation. Prentice Hall.

Dijkstra, E. W. (1979). Program inversion. In Bauer, F. L. and Broy, M., editors,


Program Construction, Volume 69 of Lecture Notes in Computer Science,
pages 54-57. Springer-Verlag.

Doornbos, H. and Backhouse, R. C. (1995). Induction and recursion on datatypes.


In Moller, B., editor, Mathematics of Program Construction, Volume 947 of
Lecture Notes in Computer Science, pages 242-256. Springer-Verlag.

Dreyfus, S. E. and Law, A. M. (1977). The Art and Theory of Dynamic


Programming, Volume 130 of Mathematics in Science and Engineering.
Academic Press.

Eilenberg, S. and Wright, J. B. (1967). Automata in general algebras. Information


and Control, 11(4), 452-470.

Enderton, H. B. (1977). Elements of Set Theory. Academic Press.

Eppstein, D., Galil, Z., Giancarlo, R., and Italiano, G. F. (1992). Sparse dynamic
programming II: Convex and concave cost functions. Journal of the ACM,
39(3), 546-567.

Feferman, S. (1969). Set-theoretical foundations of category theory. In Reports of


the Midwest Category Seminar III, Volume 106 of Lecture Notes in
Mathematics, pages 201-247. Springer-Verlag.

Fegaras, L., Sheard, T., and Stemple, D. (1992). Uniform traversal combinators:
definition, use and properties. In Kapur, D., editor, Automated Deduction,
Volume 607 of Lecture Notes in Computer Science, pages 148-162.
Springer-Verlag.

Field, A. J. and Harrison, P. G. (1988). Functional Programming. International


computer science series. Addison-Wesley.

Fokkinga, M. M. (1991). An exercise in transformational programming:


backtracking and branch-and-bound. Science of Computer Programming,
16(1), 19-48.

Fokkinga, M. M. (1992a). Calculate categorically!. Formal Aspects of Computing,


4(4), 673-692.

Fokkinga, M. M. (1992b). A gentle introduction to category theory - the
calculational approach. In Lecture Notes of the STOP 1992 Summerschool on

Constructive Algorithmics, pages 1-72. University of Utrecht. Available from


URL http://hydra.cs.utwente.nl/~fokkinga/mmf92b.html.

Fokkinga, M. M. (1992c). Law and order in algorithmics. Ph.D. thesis, Technical


University Twente, The Netherlands. Available from URL
http://hydra.cs.utwente.nl/~fokkinga/mmfphd.html.

Fokkinga, M. M. (1996). Datatype laws without signatures. Mathematical


Structures in Computer Science, 6, 1-32.

Freyd, P. J. and Scedrov, A. (1990). Categories, Allegories, Volume 39 of


Mathematical Library. North-Holland.

Galil, Z. and Giancarlo, R. (1989). Speeding up dynamic programming with


applications to molecular biology. Theoretical Computer Science, 64, 107-118.

Gardiner, P. H. B., Martin, C. E., and De Moor, O. (1994). An algebraic
construction of predicate transformers. Science of Computer Programming,
22(1-2), 21-44.

Gibbons, J., Cai, W., and Skillicorn, D. B. (1994). Efficient parallel algorithms for
tree accumulations. Science of Computer Programming, 23, 1-18.

Gibbons, J. (1991). Algebras for tree algorithms. D.Phil. thesis. Technical


Monograph PRG-94, Computing Laboratory, Oxford, UK.

Gibbons, J. (1993). Upwards and downwards accumulations on trees. In Bird,


R. S., Morgan, C. C., and Woodcock, J. C. P., editors, Mathematics of
Program Construction, Volume 669 of Lecture Notes in Computer Science,
pages 122-138. Springer-Verlag.

Gibbons, J. (1995). An initial-algebra approach to directed acyclic graphs. In


Moller, B., editor, Mathematics of Program Construction, Volume 947 of
Lecture Notes in Computer Science, pages 282-303. Springer-Verlag.

Giegerich, R. and Kurtz, S. (1995). A comparison of purely functional suffix tree


constructions. Science of Computer Programming, 25, 187-218.

Gnesi, S., Montanari, U., and Martelli, A. (1981). Dynamic programming as


graph searching: an algebraic approach. Journal of the Association for
Computing Machinery, 28(4), 737-751.

Goguen, J. A. and Meseguer, J. (1983). Correctness of recursive parallel


nondeterministic flow programs. Journal of Computer and System Sciences,
27(2), 268-290.

Goguen, J. A. (1980). How to prove inductive hypotheses without induction. In
Bibel, W. and Kowalski, R., editors, Automated Deduction, Volume 87 of
Lecture Notes in Computer Science, pages 356-373.

Goldblatt, R. (1986). Topoi - The Categorial Analysis of Logic, Volume 98 of
Studies in Logic and the Foundations of Mathematics. North-Holland.

Gries, D. (1981). The Science of Programming. Texts and Monographs in


Computer Science. Springer-Verlag.

Gries, D. (1984). A note on a standard strategy for developing loop invariants and
loops. Science of Computer Programming, 2, 207-214.

Gries, D. (1990a). Binary to decimal, one more time. In Beauty is our Business: A
Birthday Salute to Edsger W. Dijkstra, pages 141-148. Springer-Verlag.

Gries, D. (1990b). The maximum-segment sum problem. In Dijkstra, E. W.,


editor, Formal Development of Programs and Proofs. Addison-Wesley.

Grillet, P. A. (1970). Regular categories. In Barr, M., Grillet, P. A., and


Van Osdol, D. H., editors, Exact Categories and Categories of Sheaves,
Volume 236 of Lecture Notes in Mathematics, pages 121-222. Springer-Verlag.

Hagino, T. (1987a). Category theoretic approach to data types. Ph.D. thesis.


Technical Report ECS-LFCS-87-38, Laboratory for Foundations of Computer
Science, University of Edinburgh, UK.

Hagino, T. (1987b). A typed lambda calculus with categorical type constructors.


In Pitt, D. H., Poigne, A., and Rydeheard, D. E., editors, Category Theory
and Computer Science, Volume 283 of Lecture Notes in Computer Science,
pages 140-157. Springer-Verlag.

Hagino, T. (1989). Codatatypes in ML. Journal of Symbolic Computation, 8,


629-650.

Hagino, T. (1993). A categorical programming language. In Takeichi, M., editor,


Advances in Software Science and Technology, Volume 4, pages 111-135.
Academic Press.

Harrison, P. G. and Khoshnevisan, H. (1988). Algebraic transformation techniques


for functional languages. Computer Journal, 31(3), 229-242.

Harrison, P. G. and Khoshnevisan, H. (1992). On the synthesis of function


inverses. Acta Informatica, 29(3), 211-239.

Harrison, P. G. (1988). Linearisation: an optimisation for nonlinear functional


programs. Science of Computer Programming, 10(3), 281-318.

Harrison, P. G. (1991). Towards the synthesis of static parallel algorithms. In


Moller, B., editor, Constructing Programs from Specifications, pages 49-69.
Elsevier Science Publishers.

Helman, P. and Rosenthal, A. (1985). A comprehensive model of dynamic


programming. SIAM Journal on Algebraic and Discrete Methods, 6(2),
319-334.

Helman, P., Moret, B. M. E., and Shapiro, H. D. (1993). An exact characterization


of greedy structures. SIAM Journal on Discrete Mathematics, 6(2), 274-283.

Helman, P. (1989a). A common schema for dynamic programming and
branch-and-bound algorithms. Journal of the ACM, 36(1), 97-128.

Helman, P. (1989b). A theory of greedy structures based on k-ary dominance


relations. Technical report CS89-11, Department of Computer Science, The
University of New Mexico, USA.

Henson, M. (1987). Elements of Functional Languages. Computer Science Texts.


Blackwell Scientific Publications Ltd.

Hirschberg, D. S. and Larmore, L. L. (1987). The least weight subsequence


problem. SIAM Journal on Computing, 16(4), 628-638.

Hoare, C. A. R. and He, J. (1986a). The weakest prespecification, I. Fundamenta


Informaticae, 9(1), 51-84.

Hoare, C. A. R. and He, J. (1986b). The weakest prespecification, II. Fundamenta


Informaticae, 9(2), 217-251.

Hoare, C. A. R. and He, J. (1987). The weakest prespecification. Information


Processing Letters, 24(2), 127-132.

Hoare, C. A. R., He, J., and Sanders, J. W. (1987). Prespecification in data


refinement. Information Processing Letters, 25(2), 71-76.

Hoare, C. A. R. (1962). Quicksort. Computer Journal, 5, 10-15.

Hochbaum, D. S. and Shamir, R. (1989). An O(n log^2 n) algorithm for the


maximum weighted tardiness problem. Information Processing Letters, 31,
215-219.

Hoogendijk, P. F. (1996). A generic theory of datatypes. Ph.D. thesis, Department


of Computing Science, Eindhoven University of Technology, The Netherlands.

Hu, T. C. and Shing, M. T. (1982). Computation of matrix chain products, part I.


SIAM Journal on Computing, 11(2), 362-373.

Hu, T. C. and Shing, M. T. (1984). Computation of matrix chain products, part


II. SIAM Journal on Computing, 13(2), 228-251.

Hu, Z., Iwasaki, H., and Takeichi, M. (1996). Calculating accumulations.


Technical Report METR 96-0-3, Department of Mathematical Engineering,
University of Tokyo, Japan. Available from URL:
http://www.ipl.t.u-tokyo.ac.jp/~hu/pub/tech.html.

Hutton, G. (1992). Between functions and relations in calculating programs.


Ph.D. Thesis. Research report FP-93-5, Department of Computer Science,
Glasgow University, UK. Available from URL
http://www.cs.nott.ac.uk/Department/Staff/gmh/.

Jay, C. B. and Cockett, J. R. B. (1994). Shapely types and shape polymorphism.
In Sannella, D., editor, Programming Languages and Systems - ESOP '94,
Lecture Notes in Computer Science, pages 302-316. Springer-Verlag.

Jay, C. B. (1994). Matrices, monads and the fast Fourier transform. In Proceedings
of the Massey Functional Programming Workshop 1994, pages 71-80.

Jay, C. B. (1995). Polynomial polymorphism. In Kotagiri, R., editor, Proceedings
of the Eighteenth Australasian Computer Science Conference: Glenelg, South
Australia 1-3 February, 1995, Volume 17, pages 237-243. A. C. S.
Communications.

Jeuring, J. T. (1989). Deriving algorithms on binary labelled trees. In Apers, P.


M. G., Bosman, D., and van Leeuwen, J., editors, Computing Science in the
Netherlands, pages 229-249. SION.

Jeuring, J. T. (1990). Algorithms from theorems. In Broy, M. and Jones, C. B.,


editors, Programming Concepts and Methods, pages 247-266. North-Holland.

Jeuring, J. T. (1991). The derivation of hierarchies of algorithms on matrices. In


Moller, B., editor, Constructing Programs from Specifications, pages 9-32.
Elsevier Science Publishers.

Jeuring, J. T. (1993). Theories for algorithm calculation. Ph.D. thesis, University


of Utrecht, The Netherlands.

Jeuring, J. T. (1994). The derivation of on-line algorithms, with an application to

finding palindromes. Algorithmica, 11(2), 146-184.

Jeuring, J. T. (1995). Polytypic pattern matching. In Peyton-Jones, S., editor,


Functional Programming Languages and Computer Architecture, pages
238-248. Association for Computing Machinery.

Jones, G. and Sheeran, M. (1990). Circuit design in Ruby. In Staunstrup, J.,


editor, Formal Methods for VLSI Design, pages 13-70. Elsevier Science
Publications.

Jones, G. and Sheeran, M. (1993). Designing arithmetic circuits by refinement in

Ruby. In Bird, R. S., Morgan, C. C., and Woodcock, J. C. P., editors,


Mathematics of Program Construction, Volume 669 of Lecture Notes in
Computer Science, pages 208-232. Springer-Verlag.

Jones, M. P. (1994). The implementation of the Gofer functional programming


system. Research report YALEU/DCS/RR-1030, Yale University, New
Haven, Connecticut, USA. Available from URL
http://www.cs.nott.ac.uk/Department/Staff/mpj/.

Jones, M. P. (1995). A system of constructor classes: overloading and implicit


higher-order polymorphism. Journal of Functional Programming, 5(1), 1-35.

Karp, R. M. and Held, M. (1967). Finite-state processes and dynamic


programming. SIAM Journal on Applied Mathematics, 15(3), 693-718.

Kawahara, Y. (1973a). Notes on the universality of relational functors. Memoirs
of the Faculty of Science, Kyushu University, Series A, Mathematics, 27(3),
275-289.

Kawahara, Y. (1973b). Relations in categories with pullbacks. Memoirs of the


Faculty of Science, Kyushu University, Series A, Mathematics, 27(1), 149-173.

Kawahara, Y. (1990). Pushout-complements and basic concepts of grammars in


toposes. Theoretical Computer Science, 77(3), 267-289.

Kernighan, B. W. and Ritchie, D. M. (1988). The C Programming Language


(Second edition). Software series. Prentice Hall.

Kieburtz, R. B. and Lewis, J. (1995). Programming with algebras. In Jeuring,


J. T. and Meijer, E., editors, Advanced Functional Programming, Volume 925
of Lecture Notes in Computer Science, pages 267-307. Springer-Verlag.

Kleene, S. C. (1952). Introduction to Metamathematics, Volume 1 of Bibliotheca


Mathematica. North-Holland.

Knapen, E. (1993). Relational programming, program inversion and the derivation


of parsing algorithms. Computing science notes, Department of Mathematics
and Computing Science, Eindhoven University of Technology. Available from
URL http://www.win.tue.nl/win/cs/wp/papers/papers.html.

Knaster, B. (1928). Un theoreme sur les fonctions d'ensembles. Annales de la


Societe Polonaise de Mathematique, 6, 133-134.

Knuth, D. E. and Plass, M. F. (1981). Breaking paragraphs into lines. Software:


Practice and Experience, 11, 1119-1184.

Knuth, D. E. (1990). A simple program whose proof isn't. In Feijen, W., Gries,
D., and Van Gasteren, A. J. M., editors, Beauty is Our Business: A Birthday
Salute to Edsger W. Dijkstra, pages 233-242. Springer-Verlag.

Kock, A. (1972). Strong functors and monoidal monads. Archiv für Mathematik,
23, 113-120.

Korte, B., Lovasz, L., and Schrader, R. (1991). Greedoids, Volume 4 of


Algorithms and combinatorics. Springer-Verlag.

Lambek, J. and Scott, P. J. (1986). Introduction to Higher Order Categorical


Logic, Volume 7 of Cambridge Studies in Advanced Mathematics. Cambridge
University Press.

Lambek, J. (1968). A fixpoint theorem for complete categories. Mathematische


Zeitschrift, 103, 151-161.

Lawler, E. L. (1973). Optimal sequencing of a single machine subject to

precedence constraints. Management Science, 19(5), 544-546.

Lawvere, F. W. (1966). The category of categories as a foundation for
mathematics. In Eilenberg, S., Harrison, D. K., Mac Lane, S., and Rohrl, H.,
editors, Categorical Algebra, pages 1-20. Springer-Verlag.

Lehmann, D. J. and Smyth, M. B. (1981). Algebraic specification of data types: a

synthetic approach. Mathematical Systems Theory, 14(2), 97-139.

Mac Lane, S. and Moerdijk, I. (1992). Sheaves in Geometry and Logic: A First
Introduction to Topos Theory. Universitext. Springer-Verlag.

Mac Lane, S. (1961). An algebra of additive relations. Proceedings of the National


Academy of Sciences, 47, 1043-1051.

Maddux, R. D. (1991). The origin of relation algebras in the development and


axiomatization of the calculus of relations. Studia Logica, 50(3-4), 421-455.

Malcolm, G. R. (1990a). Algebraic data types and program transformation. Ph.D.


thesis, Department of Computing Science, Groningen University, The
Netherlands.

Malcolm, G. R. (1990b). Data structures and program transformation. Science of


Computer Programming, 14(2-3), 255-279.

Manes, E. G. and Arbib, M. A. (1986). Algebraic Approaches to Program


Semantics. Texts and Monographs in Computer Science. Springer-Verlag.

Manes, E. G. (1975). Algebraic Theories, Volume 26 of Graduate Texts in


Mathematics. Springer-Verlag.

Martello, S. and Toth, P. (1990). Knapsack Problems: Algorithms and Computer


Implementations. Interscience Series in Discrete Mathematics and
Optimization. Wiley.

Martin, U. and Nipkow, T. (1990). Automating Squiggol. In Broy, M. and Jones,


C. B., editors, Programming Concepts and Methods, pages 223-236.
North-Holland.

Martin, C. E. (1991). Preordered categories and predicate transformers. D.Phil.


thesis, Computing Laboratory, Oxford, UK.

Mathematics of Program Construction Group. (1995). Fixed-point calculus.


Information Processing Letters, 53, 131-136.

McLarty, C. (1992). Elementary Categories, Elementary Toposes, Volume 21 of


Oxford Logic Guides. Clarendon Press.

Meertens, L. (1987). Algorithmics - towards programming as a mathematical
activity. In De Bakker, J. W., Hazewinkel, M., and Lenstra, J. K., editors,


Mathematics and Computer Science, Volume 1 of CWI Monographs, pages
3-42. North-Holland.

Meertens, L. (1989). Constructing a calculus of programs. In Van de Snepscheut,


J. L. A., editor, Mathematics of Program Construction, Volume 375 of
Lecture Notes in Computer Science, pages 66-90. Springer-Verlag.

Meertens, L. (1992). Paramorphisms. Formal Aspects of Computing, 4(5),


413-424.

Meijer, E. and Hutton, G. (1995). Bananas in space: extending fold and unfold to

exponential types. In Peyton-Jones, S., editor, Functional Programming


Languages and Computer Architecture, pages 324-333. Association for
Computing Machinery.

Meijer, E., Fokkinga, M., and Paterson, R. (1991). Functional programming with
bananas, lenses, envelopes and barbed wire. In Hughes, J., editor,
Proceedings of the 1991 ACM Conference on Functional Programming
Languages and Computer Architecture, Volume 523 of Lecture Notes in
Computer Science, pages 124-144. Springer-Verlag.

Meijer, E. (1992). Calculating compilers. Ph.D. thesis, University of Nijmegen,


The Netherlands.

Mikkelsen, C. J. (1976). Lattice theoretic and logical aspects of elementary topoi.


Various Publications Series 25, Matematisk Institut, Aarhus Universitet,
Denmark.

Mili, A., Desharnais, J., and Mili, F. (1987). Relational heuristics for the design of
deterministic programs. Acta Informatica, 24(3), 239-276.

Mili, A., Desharnais, J., and Mili, F. (1994). Computer Program Construction.
Oxford University Press.

Mili, A. (1983). A relational approach to the design of deterministic programs.


Acta Informatica, 20(4), 315-328.

Mitchell, J. C. and Scedrov, A. (1993). Notes on sconing and relators. In Boerger,


E., editor, Computer Science Logic '92, Selected Papers, Volume 702 of
Lecture Notes in Computer Science, pages 352-378.

Mitten, L. G. (1964). Composition principles for synthesis of optimal multistage


processes. Operations Research, 12, 610-619.

Moggi, E. (1991). Notions of computation and monads. Information and


Computation, 93(1), 55-92.

Moller, B. and Russling, M. (1994). Shorter paths to graph algorithms. Science of


Computer Programming, 22(1-2), 157-180.

Moller, B. (1991). Relations as a program development language. In Moller, B.,


editor, Constructing Programs from Specifications, pages 373-397.
North-Holland.

Moller, B. (1993). Derivation of graph and pointer algorithms. In Moller, B.,


Partsch, H., and Schuman, S., editors, Formal Program Development, Volume
755 of Lecture Notes in Computer Science, pages 123-160. Springer-Verlag.

Morgan, C. C. (1993). The cuppest capjunctive capping. In Roscoe, A. W.,


editor, A Classical Mind: Essays in Honour of C.A.R. Hoare, International
Series in Computer Science, pages 317-332. Prentice Hall.

Naumann, D. A. (1994). A recursion theorem for predicate transformers on


inductive data types. Information Processing Letters, 50(6), 329-336.

Ning, M. Z. (1997). Functional programming and combinatorial optimisation.


Ph.D. thesis, Computing Laboratory, Oxford, UK. Forthcoming.

Partsch, H. A. (1986). Transformational program development in a particular


problem domain. Science of Computer Programming, 7(2), 99-241.

Partsch, H. A. (1990). Specification and Transformation of Programs - A Formal
Approach to Software Development. Texts and Monographs in Computer


Science. Springer-Verlag.

Paterson, R. (1988). Reasoning about functional programs. Ph.D. thesis,


University of Queensland, Brisbane.

Paulson, L. (1991). ML for the working programmer. Cambridge University Press.

Peirce, C. S. (1870). Description of a notation for the logic of relatives, resulting


from an amplification of the conceptions of Boole's calculus of logic. Memoirs
of the American Academy of Sciences, 9, 317-378. Reprinted in (Peirce 1933).

Peirce, C. S. (1933). Collected Papers. Harvard University Press.

Pettorossi, A. and Burstall, R. M. (1983). Deriving very efficient algorithms for


evaluating linear recurrence relations using the program transformation
technique. Acta Informatica, 18(2), 181-206.

Pettorossi, A. (1984). Methodologies for transformations and memoing in


applicative languages. Ph.D. thesis CST-29-84, University of Edinburgh,
Scotland.

Pettorossi, A. (1985). Towers of Hanoi problems: deriving iterative solutions by


program transformations. BIT, 25(2), 327-334.

Pierce, B. C. (1991). Basic category theory for computer scientists. Foundations


of Computing Series. MIT Press.

Pratt, V. R. (1992). Origins of the calculus of binary relations. In Logic in

Computer Science, pages 248-254. IEEE Computer Society Press.

Puppe, D. (1962). Korrespondenzen in abelschen kategorien. Mathematische


Annalen, 148, 1-30.

Reade, C. (1988). Elements of Functional Programming. International computer


science series. Addison-Wesley.

Reingold, E. M., Nievergelt, J., and Deo, N. (1977). Combinatorial Algorithms:


Theory and Practice. Prentice Hall.

Rietman, F. J. (1995). A relational calculus for the design of distributed


algorithms. Ph.D. thesis, Department of Computer Science, Utrecht
University, The Netherlands.

Riguet, J. (1948). Relations binaires, fermeture, correspondances de Galois.


Bulletin de la Societe Mathematique de France, 76, 114-155.

Russling, M. (1995). A general scheme for breadth-first graph traversal. In Moller,


B., editor, Mathematics of Program Construction, Volume 947 of Lecture
Notes in Computer Science, pages 380-398. Springer-Verlag.

Rydeheard, D. E. and Burstall, R. M. (1988). Computational Category Theory.


International Series in Computer Science. Prentice Hall.

Sanderson, J. G. (1980). A Relational Theory of Computing, Volume 82 of


Lecture Notes in Computer Science. Springer-Verlag.

Schmidt, G. W. and Strohlein, T. (1993). Relations and Graphs: Discrete
Mathematics for Computer Scientists. EATCS Monographs on Theoretical
Computer Science. Springer-Verlag.

Schmidt, G. W., Berghammer, R., and Zierer, H. (1989). Symmetric quotients and
domain construction. Information Processing Letters, 33(3), 163-168.

Schoenmakers, B. (1992). Inorder traversal of a binary heap and its inversion in


optimal time and space. In Bird, R. S., Morgan, C. C., and Woodcock, J.
C. P., editors, Mathematics of Program Construction, Volume 669 of Lecture
Notes in Computer Science, pages 291-301. Springer-Verlag.

Schroder, E. (1895). Vorlesungen über die Algebra der Logik (Exakte Logik).
Dritter Band: Algebra und Logik der Relative. Teubner, Leipzig.

Sheard, T. and Fegaras, L. (1993). A fold for all seasons. In Functional


Programming Languages and Computer Architecture, pages 233-242.
Association for Computing Machinery.

Sheeran, M. (1987). Relations + higher-order functions = hardware descriptions.
In Proebster, W. E. and Reiner, E., editors, VLSI and Computers, pages
303-306. IEEE.

Sheeran, M. (1990). Categories for the working hardware designer. In Leeser, M.
and Brown, G., editors, Workshop on Hardware Specification, Verification
and Synthesis: Mathematical Aspects. Cornell University 1989, Volume 408 of
Lecture Notes in Computer Science, pages 380-402. Springer-Verlag.

Skillicorn, D. B. (1995). Foundations of Parallel Programming, Volume 6 of


Cambridge International Series on Parallel Computation. Cambridge
University Press.

Smith, D. R. and Lowry, M. R. (1990). Algorithm theories and design tactics.


Science of Computer Programming, 14(2-3), 305-321.

Smith, D. R. (1985). Top-down synthesis of divide-and-conquer algorithms.


Artificial Intelligence, 27(1), 43-96.

Smith, D. R. (1987). Applications of a strategy for designing divide-and-conquer


algorithms. Science of Computer Programming, 8, 213-229.

Smith, D. R. (1990). KIDS: a semiautomatic program development system. IEEE


Transactions on Software Engineering, 16(9), 1024-1043.

Smith, D. R. (1991). Structure and design of problem reduction generators. In


Moller, B., editor, Constructing Programs from Specifications, pages 91-124.
North-Holland.

Smith, D. R. (1993). Constructing specification morphisms. Journal of Symbolic


Computation, 15, 571-606.

Sniedovich, M. (1986). A new look at Bellman's principle of optimality. Journal of


Optimization Theory and Applications, 49(1), 161-176.

Spivey, M. (1989). A categorical approach to the theory of lists. In Van de


Snepscheut, J. L. A., editor, Mathematics of Program Construction, Volume
375 of Lecture Notes in Computer Science, pages 399-408. Springer-Verlag.

Takano, A. and Meijer, E. (1995). Shortcut deforestation in calculational form. In


Peyton-Jones, S., editor, Functional Programming Languages and Computer
Architecture, pages 306-313. Association for Computing Machinery.

Tarski, A. (1941). On the calculus of relations. Journal of Symbolic Logic, 6(3),


73-89.

Tarski, A. (1955). A lattice-theoretic fixpoint theorem and its applications.


Pacific Journal of Mathematics, 5, 285-309.

Taylor, P. (1994). Commutative diagrams in TeX (version 4). Available from URL
http://theory.doc.ic.ac.uk/tex/contrib/Taylor/diagrams.

Von Karger, B. and Hoare, C. A. R. (1995). Sequential calculus. Information
Processing Letters, 53(3), 123-130.

Wadler, P. (1987). Views: a way for pattern matching to cohabit with data
abstraction. In Principles of Programming Languages, pages 307-313.
Association for Computing Machinery.

Wadler, P. (1989). Theorems for free!. In Functional Programming Languages and


Computer Architecture, pages 347-359. Association for Computing
Machinery.

Walters, R. F. C. (1989). Data types in distributive categories. Bulletin of the


Australian Mathematical Society, 40(1), 79-82.

Walters, R. F. C. (1992a). Categories and Computer Science, Volume 28 of


Cambridge Computer Science Texts. Cambridge University Press.

Walters, R. F. C. (1992b). An imperative language based on distributive


categories. Mathematical Structures in Computer Science, 2(3), 249-256.

Wikstrom, A. (1987). Functional Programming Using Standard ML.


International Series in Computer Science. Prentice Hall.

Williams, J. H. (1982). On the development of the algebra of functional programs.


ACM Transactions on Programming Languages and Systems, 4(4), 733-757.

Yao, F. F. (1980). Efficient dynamic programming using quadrangle inequalities.
In Theory of Computing, pages 429-435. Association for Computing
Machinery.

Yao, F. F. (1982). Speed-up in dynamic programming. SIAM Journal on

Algebraic and Discrete Methods, 3(4), 532-540.


Index

absorption law
  for A, 105
  for products, 41, 114
accumulation parameter, 7, 12, 74, 77, 139
Ackermann's function, 6
addition modulo p, 45
algebra, 46
allegory, 81-85
anti-symmetry, 86
arrows of a category, 25
ASCII, 1

bags, 130
banana-split law, 55, 56, 78
base functor, 52
bifunctor, 31, 40, 50
bijection, 29, 75
binary thinning theorem, 202
bitonic tours problem, 212
Boolean allegory, 101, 122
boolean operators, 67
bottom element, 43
branch-and-bound, 243
bus-stop problem, 217

cancellation law
  for coproducts, 42, 118
  for division, 99
  for power transpose, 104
  for products, 39, 41, 43, 116
carrier (of an algebra), 45
cartesian closed category, 72, 75, 78
cartesian product function, 125
case operator, 41, 42, 122
catamorphism, 46
category, 25-30
closure, 157-162
co-algebra, 52
combinatorial functions, 123
company party problem, 175
concatenation, 8, 70
conditionals, 21, 66, 122
connected relation, 151
cons-lists, 7
constant arrow, 38
constant functor, 31
constructor, 1
constructor classes, 23
context condition, 167, 223
continuous mapping, 141
contravariance, 83
converse function theorem, 128
coproduct, 41-42
coreflexive, 86, 122
currying, 2-3, 16, 44, 70, 71

data compression problem, 238
datatype, 1-3, 36-38
  parameterised, 3, 49
  with laws, 53
De Morgan's law, 101
decimal representation, 11, 17, 62, 137

Dedekind's rule, see modular law
detab-entab problem, 246
diagonal rule, 161
diagram, 27-28
  commuting, 27
  pasting, 35
diagrammatic form, 2
difunctional arrow, 142
difunctional closure, 142
disjoint union, 1, 38, 42
distributive category, 67
distributivity, 172
divide and conquer, 137, 144, 146
domain, 26, 86
dominance relation, 217
duality, 28-30, 41, 52, 118
dynamic programming, 219
dynamic programming theorem, 220

Eilenberg-Wright Lemma, 122
empty object, 38
entire arrow, 88
epi-monic factorisation, 96
epic arrow, 28
equivalence, 86
evaluating polynomials, 58, 62
exchange law, 45
existential image functor, 32, 35, 105
existential quantification, 102
exponential, 44, 72, 117
exponential functor, 113
exponentiation, 144

F-algebra, 45
factorial function, 4, 5, 57
Fibonacci function, 4, 5
fixed point
  greatest, 140
  least, 140, 142
  unique, 140, 146, 221
fixed point induction, 141, 259
fixpoint, 49
floor, 65
Fokkinga's Theorem, 58
fold operator, 5
font conventions
  for datatypes, 3
  for identifiers, 30
forest, 15
function space, see exponential
functional application, 2
functional composition, 2
functor, 30-33
fusion (with the power functor), 168
fusion law
  for catamorphisms, 48, 141
  for coproducts, 42
  for exponentials, 72
  for power transpose, 104
  for products, 39
  for terminal objects, 37, 40
  for type functors, 51

Galois connection, 100, 109
Gofer, xii, 1, 23, 165
graph algorithms, 157
graph functor, 32
greedoids, 191
Greedy theorem, 173, 245

Haskell, 1, 22
homomorphism, 5, 30, 45-46
Hope, 1
Horn sentence, 95
Horner's rule, 58
  generalisation of, 62
hylomorphism, 142, 162
Hylomorphism theorem, 144
hyperproduct, 62

idempotent arrow, 90
identity functor, 31, 49
imp, see simple arrow
inclusion functor, 32, 35
indirect equality, 65, 107
indirect proof, 82, 102
induction, 147
inductive relation, 147-151, 158

inequation, 82
infinite lists, 52
initial algebra, 45-49
initial object, 37-38
initial type, 51
injection, 17, 28, 65, 68
insertion sort, 157
inverse, 16-18, 29
involution, 83, 101
isomorphism, 29, 33, 48
iterative definition, 12, 14

jointly monic arrows, 92

Kleene's theorem, 141
knapsack problem, 205
Knaster-Tarski theorem, 140

Lambek's Lemma, 49, 142
large category, 31
Lawvere's Recursion Theorem, 78
lax natural transformation, 132, 148, 182
layered network problem, 196
lazy functional programming, 43, 45
lexical ordering, 98, 175
linear functor, 202
linear order, 152
list comprehension, 13
locale, 91
locally complete allegory, 96
longest upsequence problem, 217
loops, 12, 264
lower adjoint, 101

maximum, 166
maximum segment sum problem, 174
membership relation, 32, 34, 103, 147-151
memoisation, 219
mergesort, 156
merging loops, 56
minimal elements, 170
minimum tardiness problem, 253
Miranda, 1
modular identity, 88
modular law, 84
modulus computation, 145
monad, 52
monic arrow, 28
monotonic algebra, 172
monotonic functor, see relator
monotonicity
  of composition, 82
  of division, 99
μ-calculus, 161

natural isomorphism, 34, 67
natural transformation, 19, 33-35
naturality condition, 34, 133
negation operator, 2
non-empty lists, 13
non-empty power object, 107
non-strict constructor, 43
non-strict semantics, 22
nondeterminism, 81

objects of a category, 25
one-pass program, 56
opposite category, 28
optimal bracketing problem, 230
Orwell, 1

pair operator, 39, 43
paragraph problem, 190, 207
parallel loop fusion, 78
partial function, 26, 30, 88
partial order, 44, 86, 108
partitions, 128
pattern matching, 2, 66
permutations, 130
ping-pong argument, 82
point-free, 19-22
pointwise, 19-22
polymorphism, 18-19, 34, 35
polynomial functor, 44-45
power allegory, 103, 117
power functor, 105

power object, 103
power relator, 119
power transpose, 103
powerset functor, 32
predicate calculus, 28
predicate transformer, 108
prefix, 126
preorder, 30, 33, 38, 75, 86, 98, 108, 170
principle of optimality, 219
problem reduction generator, 217
product, 38-41
product category, 27, 40
projection function, 6, 39, 40, 45

quicksort, 154

rally driver's problem, 217
range, 26, 86
reciprocal, see relational converse
recursion, 4, 137
  mutual, 15
  non-structural, 139
  primitive, 5, 6
  structural, 5, 10, 56
refinement, 18, 138, 194
reflection law
  for catamorphisms, 48
  for coproducts, 42
  for exponentials, 72
  for products, 39
  for terminal objects, 37, 40
reflexivity, 86
regular category, 108
relational
  algebra, 121
  catamorphism, 121
  converse, 43, 83
  coproduct, 117
  difference, 100, 159
  division, 98
  implication, 97
  inclusion, 82
  join, 96
  meet, 83
  negation, 101
  product, 114
relations (as data), 161
relator, 111, 134
retraction, 30
rolling rule, 159, 161
Ruby, 58, 78, 135
Ruby triangles, 58
rule of floors, 63, 65

Schroder's rule, 103
security van problem, 184
selection sort, 152
semi-commuting diagram, 82
sequential decision process, 217
set comprehension, 104
set theory, 95, 104, 162, 169
shortest paths problem, 179
shunting rules, 89
simple arrow, 88
singleton set, 36, 37, 106
SML, 1
snoc-lists, 7
sorting, 151-157
sorting sets, 200
source operator, 25
source type, 2
spans, 40
squaring functor, 31
strict functional programming, 22
string edit problem, 225
strings, 47
strong functor, 76
structural recursion theorem, 73
subcategory, 26, 32, 88
subsequences, 123
substitution rule, 162
suffix, 126
supersequences, 132
supremum operator, 170
surjection, 17, 28
surjective relation, 149
symmetry, 28, 86

tabulation (of an arrow), 91
tabulation scheme, 6, 219, 227, 233
tags, 42
target operator, 25
target type, 2
tensorial strength, 79
term algebra, 46
terminal object, 37
TeX problem, 62, 259
thin-elimination, 194
thin-introduction, 194
thinning algorithm, 193
thinning theorem, 195
topos, 109
transitivity, 86
tree
  balanced, 57
  binary, 14
  general, 16
  weighted path length, 62
truth tables, 68
tupling, 78
type functor, 44, 49-52
type information, 27
type relator, 122

unit, 91, 94
unitary allegory, 94
universal property
  of max, 166
  of thin, 193
  of catamorphisms, 46
  of closure, 157
  of coproducts, 41
  of division, 98
  of implication, 97
  of join, 96
  of meet, 83
  of power transpose, 103
  of products, 39
  of range, 86
  of terminal object, 37
universal quantification, 98
upper adjoint, 101

well-bounded relation, 171
well-founded relation, 147, 151
well-supported relation, 171, 196
Prentice Hall International Series in Computer Science (continued)

PEYTON JONES, S. and LESTER, D., Implementing Functional Languages


POTTER, B., SINCLAIR, J. and TILL, D., An Introduction to Formal Specification and Z (2nd edn)
RABHI, F. A., and LAPALME, G., Designing Algorithms with Functional Languages
ROSCOE, A. W. (ed.), A Classical Mind: Essays in honour of C. A. R. Hoare
ROZENBERG, G., and SALOMAA, A., Cornerstones of Undecidability
RYDEHEARD, D.E. and BURSTALL, R.M., Computational Category Theory
SHARP, R., Principles of Protocol Design
SLOMAN, M. and KRAMER, J., Distributed Systems and Computer Networks
SPIVEY, J.M., An Introduction to Logic Programming through Prolog
SPIVEY, J.M., The Z Notation: A reference manual (2nd edn)
TENNENT, R.D., Semantics of Programming Languages
WATT, D. A., Programming Language Concepts and Paradigms
WATT, D. A., Programming Language Processors
WATT, D. A., Programming Language Syntax and Semantics
WATT, D. A., WICHMANN, B. A. and FINDLAY, W., ADA: Language and methodology
WELSH, J. and ELDER, J., Introduction to Modula-2
WELSH, J. and ELDER, J., Introduction to Pascal (3rd edn)
WIKSTROM, A., Functional Programming Using Standard ML
WOODCOCK, J. and DAVIES, J., Using Z: Specification, refinement, and proof
